85316 - Statistics and Data Analysis

Academic Year 2024/2025

  • Teacher: Marco Novelli
  • Credits: 6
  • SSD: SECS-S/01
  • Language: Italian
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Law and Economics (cod. 5913)

Learning outcomes

The course aims to provide skills in the use of data analysis tools and techniques in both descriptive and inferential statistics. At the end of the course, the student will be able to carry out basic tasks with one of the most widely used data analysis software packages. Moreover, the student will know, and be able to apply critically, the main tools of descriptive and inferential statistics for both the univariate case and the case of two or more populations. The lab activity aims to improve students' autonomy in data management.

Course contents

This is not an introductory course in statistics. Students without the required background knowledge in statistics and probability are expected to fill the gap before preparing for the examination.

Required background knowledge in statistics

  • Empirical frequency distributions.
  • Measures of location (mode, median, arithmetic mean).
  • Measures of dispersion, linear correlation and simple linear regression.
  • Measures of association, mean dependence, chi-squared, Cramér's V.
  • Fundamentals of parametric estimation and hypothesis testing.
  • Statistical tables for the standard normal and Student's t distributions.

Required background knowledge in probability

  • Random experiments and their sample spaces. Simple, compound and disjoint events. Impossible and certain events. Events obtained by intersection, union and negation.
  • Definitions and axioms of probability. Conditional probability. Independent events. The law of total probability. Bayes' theorem.
  • Random variables. Rules for computing probabilities for any random variable. The distribution function of a random variable. Probability mass function. Probability density function.
  • Limit theorems and convergence.

Introduction to R and RStudio.

Arithmetic, mathematics and logic in R. Data structures in R.

Creation and management of variables and dataframes. Data importing.

Iterative Control Structures/Loops.

Conditional Control Structures/Conditional Statements.

Descriptive analysis of data and graphical representations.

User-defined functions.

Statistical inference for the mean of a Gaussian population and for a proportion.

Comparison of the means of two populations.

Linear regression.

Readings/Bibliography

The following books are freely available on the internet.

Hadley Wickham, Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, O'Reilly Media, 2016. https://r4ds.had.co.nz/

Måns Thulin, Modern statistics with R, 2021. http://modernstatisticswithr.com/

The following book is available in bookshops:

Alan Agresti, Maria Kateri, Foundations of Statistics for Data Scientists with R and Python, Taylor & Francis, 2021. https://www.taylorfrancis.com/books/mono/10.1201/9781003159834/foundations-statistics-data-scientists-alan-agresti-maria-kateri

Teaching methods

Class lectures.

Each student must bring their own laptop, with R and RStudio already installed in this order:

First, install R from https://www.cran.r-project.org/

Then, install RStudio from https://www.rstudio.org/download/desktop

Given the type of activities and teaching methods adopted, attending this course requires all students to have first completed Modules 1 and 2 of the workplace safety training, in e-learning mode.

Assessment methods

The exam consists of a practical data analysis test in the computer lab, structured into two or three exercises. The test lasts 60 minutes.

Grading policy

Insufficient: below 18; sufficient: 18-23; good: 24-27; very good: 28-30; excellent: 30 cum laude.

Teaching tools

  • Material provided by the lecturer on https://virtuale.unibo.it/
  • Statistical software R: https://www.r-project.org/
  • Integrated development environment RStudio: https://www.rstudio.com/

Students with disabilities or specific learning disorders (DSA) are required to make their condition known so that the best possible accommodation of their needs can be found.

Office hours

See the website of Marco Novelli