40720 - Data Mining

Academic Year 2024/2025

Learning outcomes

This course presents statistical methods that have proven valuable for knowledge discovery in business databases, with special attention to techniques that help managers make intelligent use of data repositories by recognizing patterns and making predictions. In particular, the course enables students to:
- correctly plan a data mining process
- choose the methodology best suited to the problem at hand
- critically interpret the results

Course contents

1. Review of baseline concepts of descriptive statistics and hypothesis testing.

2. Simple linear regression and multiple linear regression: OLS estimation, inference, model comparison, analysis of the residuals, inclusion of categorical variables in the model.

3. Cluster analysis: distance measures, hierarchical and partitioning clustering algorithms, linkage methods.

Readings/Bibliography

Slides and lab materials (datasets, Stata scripts, etc.) available on Virtuale.

Recommended books:

Agresti A. (2017). Statistical Methods for the Social Sciences. Global Edition.

James G., Witten D., Hastie T., and Tibshirani R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.

Teaching methods

Theoretical lectures, practical labs using Stata and group assignments.

Assessment methods

Attending Students:
- Written Exam (max 22 points): exercises, open questions, interpretation of software results.
- Group Assignments (max 2 x 5 = 10 points): two lessons will be dedicated to group assignments, in which you will replicate in Stata the problems discussed in class using new data and discuss the results in a report. Each of the two reports will be graded on a scale from 0 to 5 points.
- Maximum final mark: 32 (a total of 31 or 32 corresponds to 30L).


Non-Attending Students:
- Written Exam (max 22 points): exercises, open questions, interpretation of software results.
- Oral Exam (max 10 points).
- Maximum final mark: 32 (a total of 31 or 32 corresponds to 30L).

Teaching tools

Theoretical lectures are accompanied by practical lab sessions in Stata. During the labs you will work through empirical examples of the methods studied, using real data. By the end of the course, you will be able to carry out your own statistical analyses using descriptive statistics as well as regression and clustering methods.

You can download Stata using the UNIBO licence from this link:

https://scienzeaziendali.unibo.it/en/department/technical-and-administrative-services/software-with-campus-licenses


Office hours

See the website of Federica Galli