- Teacher: Laura Anderlucci
- Credits: 6
- SSD: SECS-S/01
- Language of instruction: English
- Teaching mode: Traditional - in-person lectures
- Campus: Bologna
- Course: First cycle degree programme (Laurea) in Genomics (code 9211)
- Teaching period: from 16/09/2024 to 05/12/2024
Learning outcomes
By the end of the course, the student knows current data-science techniques based on modern computational methods and software, with particular emphasis on rigorous statistical reasoning. The student is able to represent and organise knowledge about large-scale data collections, and to turn data into actionable information using concepts from statistical learning and data mining, combined with techniques for data visualisation and reproducible data analysis.
Course contents
Part I: Introduction to Statistical Learning
Part II: Data Visualization and Reporting
Part III: Supervised Learning
- Cross-Validation
- Naïve Bayes
- Logistic Regression
- k-Nearest Neighbors
- Nearest Shrunken Centroid
- Regression and classification trees
- The Bootstrap
- Bagging, Random Forests, Boosting
Part IV: Unsupervised Learning
- k-means
- Hierarchical clustering
Part V: Overview of the main machine learning methods
- Support Vector Machines
Readings/Bibliography
The primary text for the course:
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning. Second Edition. New York: Springer. ISBN: 978-1-0716-1417-4. E-book ISBN: 978-1-0716-1418-1.
The book is freely available here:
https://www.statlearning.com/
In addition, we will use:
- Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Verlag.
Freely available at: https://hastie.su.domains/Papers/ESLII.pdf
- Han, J., & Kamber, M. (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann.
Freely available at: http://myweb.sabanciuniv.edu/rdehkharghani/files/2016/02/The-Morgan-Kaufmann-Series-in-Data-Management-Systems-Jiawei-Han-Micheline-Kamber-Jian-Pei-Data-Mining.-Concepts-and-Techniques-3rd-Edition-Morgan-Kaufmann-2011.pdf
Teaching methods
Lectures complemented with practical sessions. Given the teaching methods of this course unit, all students must attend Modules 1 and 2 on Health and Safety online [http://www.unibo.it/en/services-and-opportunities/health-and-assistance/health-and-safety/online-course-on-health-and-safety-in-study-and-internship-areas].
Assessment methods
The learning assessment consists of a written test lasting 100 minutes. The written test assesses the student's ability to apply the definitions, concepts and properties learned and to solve exercises. During the written exam, students may use only the cheat sheet provided on virtuale.unibo.it, which contains references to R packages and functions. Students may not use the textbook, personal notes or mobile phones; smart watches and similar electronic data storage or communication devices are not allowed either.
The written test consists of 7-10 questions, both multiple choice and open, some of which are to be solved in R. The final grade is out of thirty.
Students who have passed the exam but feel that the result does not reflect their preparation may request an additional (optional) oral exam, which can change the grade by +/-3 points.
Teaching tools
The following material will be provided: lecture slides, exercises with solutions, and a mock exam.
Office hours
See the website of Laura Anderlucci