
95662 - INTRODUCTION TO MACHINE LEARNING

Academic Year 2023/2024

                
Lecturer: Matteo Amabili
Credits: 3
SSD: SECS-S/06
Language of instruction: English
Teaching mode: Traditional lectures, in person
Campus: Bologna
Degree programme: Second cycle degree (Laurea Magistrale) in Quantitative Finance (cod. 8854)
Also valid for: Second cycle degree (Laurea Magistrale) in Greening Energy Market and Finance (cod. 5885)
Teaching resources: available on Virtuale

Course timetable: from 15/02/2024 to 29/02/2024

Learning outcomes

The main goal of the course is to present the fundamentals of machine learning, together with a brief review of the most important elements of numerical analysis used in the field. We also present the Python ecosystem for machine learning and the functionality provided by NumPy, Matplotlib, pandas and scikit-learn. A general discussion of supervised and unsupervised learning is then introduced. After the idea of clustering is discussed, students should understand that this class of algorithms explores input data without being given an explicit output variable, and should clearly understand when to use it. We then define supervised learning, describing from a general point of view how this class of algorithms works and when it should be used. Students should learn how to implement some of the simplest standard methods for modelling the relationship between independent input variables and dependent output variables. Regarding decision trees and their use for prediction, students should learn the potential advantages of this technique over linear or logistic regression and how to apply it to classification problems. A simple introduction to Bayesian learning is also given. Finally, we explain how different machine learning algorithms can be combined to produce composite predictions. An important example is the random forest, a procedure that generates many different decision trees and combines their results.

Prerequisites: students should be familiar with the following concepts: vector spaces, eigenfunctions and eigenvectors, operator and matrix calculus, calculus of extrema, the concept of gradient, conditions for local and global minima, conditional probability and Bayes' rule. Basic experience with Python programming is required.
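As an illustration of combining many decision trees into a composite prediction, the following minimal sketch (not taken from the course material) compares a single decision tree with a random forest using scikit-learn. The synthetic dataset and the parameters (`n_estimators=200`, `random_state=0`) are arbitrary assumptions chosen for the example. Note that scikit-learn's forest combines its trees by averaging their predicted class probabilities rather than by a majority vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative assumption, not course data)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A single, fully grown decision tree (CART)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A random forest: each tree is fitted on a bootstrap sample and considers a random
# subset of features at every split; the trees' class probabilities are averaged
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)      # test-set accuracy of the single tree
forest_acc = forest.score(X_test, y_test)  # test-set accuracy of the ensemble

print(f"single tree accuracy:   {tree_acc:.3f}")
print(f"random forest accuracy: {forest_acc:.3f}")
print(f"trees in the forest:    {len(forest.estimators_)}")
```

On data like this the forest typically generalizes better than the single tree because averaging many decorrelated trees reduces variance, though the exact figures depend on the data and the random seed.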

Contents

The first part of the course introduces the main concepts of machine learning. Its contents are:

Introduction to ML
- 1.1.What is ML: a shift from knowledge to data
- 1.2.Kinds of problems
  - 1.2.1.supervised versus unsupervised
  - 1.2.2.regression vs classification
- 1.3.Data pipeline
- 1.4.Python Basics
Data Preprocessing
- 2.1.Data normalization
- 2.3.Categorical variables: ordinal and non-ordinal
- 2.4.Outliers
- 2.5.Feature Engineering
- 2.6.Dimensionality reduction: PCA
- 2.7.Examples in Python: scikit-learn
Linear Regression
- 3.1.Estimating the coefficients: least squares method and maximum likelihood
- 3.2.Performance metrics
- 3.3.Interpreting the coefficients
- 3.4.The problem of Collinearity
- 3.5.Selecting the relevant variables: Lasso/Ridge regression
- 3.6.Kernel Regression
- 3.7.Python hands-on
Logistic Regression
- 4.1.Problem Definition
- 4.2.Estimating the coefficients: gradient descent
- 4.3.Classification Metrics:
  - 4.3.1.Precision
  - 4.3.2.Recall
  - 4.3.3.F-beta score
  - 4.3.4.Area under the ROC curve
- 4.4.Interpreting the coefficients
- 4.5.Generalized linear model: Poisson regression
- 4.6.Multilabel case
- 4.7.Python hands-on
Evaluating a Model
- 5.1.Cross-validation and hyperparameter tuning
- 5.2.Bias Variance trade-off
- 5.3.Simple cross-validation
- 5.4.N-fold cross-validation
- 5.5.Python hands-on
Tree-Based Methods
- 6.1.Simple CART for regression and classification
- 6.2.Ensemble methods: Random Forest
- 6.3.Boosting methods
- 6.4.Python hands-on
Unsupervised learning
- 7.1.Problems
- 7.2.K-means
- 7.3.Density-Based Model: DBSCAN
- 7.5.Remove outliers using unsupervised methods
- 7.6.Python Hands-on
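Several of the topics above (normalization, logistic regression, N-fold cross-validation and classification metrics) fit together in a few lines of scikit-learn. The sketch below is an illustrative assumption, not the course's own notebook: it uses the breast-cancer dataset bundled with scikit-learn, F2 (beta = 2) as an example of the F-beta score, and `LogisticRegression`'s default `lbfgs` solver, which is a quasi-Newton method rather than the plain gradient descent of item 4.2.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification dataset shipped with scikit-learn (no download needed)
X, y = load_breast_cancer(return_X_y=True)

# Standardize features, then fit a logistic regression; inside a pipeline the
# scaler is re-fitted on each training fold only, so no test-fold data leaks in
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold stratified cross-validation (each fold keeps the class proportions)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Metrics from section 4.3; precision and recall refer to the positive label 1
scoring = {
    "precision": "precision",
    "recall": "recall",
    "f2": make_scorer(fbeta_score, beta=2),  # F-beta with beta = 2 weights recall more
    "roc_auc": "roc_auc",
}
results = cross_validate(model, X, y, cv=cv, scoring=scoring)

# Report the mean and spread of each metric across the 5 folds
for name in scoring:
    scores = results[f"test_{name}"]
    print(f"{name:>9}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting the scaler inside the pipeline is the design choice worth noting: fitting it on the full dataset before cross-validation would let information from the validation folds leak into training.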

The second part of the course (taught by Prof. Petruccelli) is dedicated to applications of machine learning to natural risks. Its contents are:

Introduction to Natural Risks
Geospatial Data Analysis
Machine Learning in Weather and Climate
Extreme Events Analysis
AI and Disaster Management

Readings/Bibliography

James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. New York: Springer, 2013.
Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Vol. 2. New York: Springer, 2009.
Rogers, Simon, and Mark Girolami. A First Course in Machine Learning. Chapman and Hall/CRC, 2016.

Teaching methods

Lectures

Assessment methods

The final exam consists of a machine learning project. During the exam, students present the project they have developed and discuss its main aspects as well as the underlying theory.

Teaching tools

Slides (PowerPoint/PDF)
Selected literature
Jupyter notebooks and Python code

Office hours

See the website of Matteo Amabili