- Teacher: Piotr Cwiakowski
- Credits: 3
- Language: English
- Teaching Mode: Blended Learning
- Campus: Bologna
- Course: First cycle degree programme (L) in Statistical Sciences (code 8873)
- From Dec 04, 2024 to Dec 12, 2024
Learning outcomes
By the end of the course, students will have developed advanced expertise in analyzing real-world phenomena with statistical methods. In particular, they will be able to:
- implement appropriate advanced statistical analyses using statistical software (SAS, R, or SPSS);
- interpret the output of the procedures;
- critically collate results and conclusions;
- present the main results and conclusions in concise summaries;
- work independently on practical data analysis problems.
Course contents
- Text cleaning and text standardization (including stemming, lemmatization, and stopword removal)
- Creating a document-term matrix with different weighting schemes
- Data wrangling in text mining.
- Searching for relationships and patterns between words.
- Visualization techniques for text mining analysis.
- Unsupervised machine learning methods for text analysis (clustering, sentiment analysis, dimensionality reduction)
- Supervised machine learning methods and simple feature engineering of text data (Naive Bayes, KNN, decision trees, SVM, random forest).
- R software and R infrastructure for text mining and machine learning (packages: tm, tidytext, quanteda, caret, mlr).
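As an illustration of the first two topics above (text cleaning and a weighted document-term matrix), here is a minimal sketch in base R. It is not course material: the toy corpus, the tiny stopword list, and the names `tokenize`, `dtm_tf`, and `dtm_tfidf` are assumptions made for this example. Base R is used so that the snippet runs without installing anything; the packages listed above provide their own, fuller implementations, whose weighting details may differ.

```r
# Toy corpus (made up for illustration; not course data).
docs <- c(
  d1 = "Text mining is fun!",
  d2 = "Mining text data, and mining more text.",
  d3 = "The data is clean."
)

# A tiny hand-picked stopword list (assumption; real lists are much longer).
stopwords_small <- c("is", "and", "the", "more")

# Cleaning: lowercase, replace punctuation with spaces,
# split on whitespace, drop stopwords.
tokenize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  tokens <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  tokens[!tokens %in% stopwords_small]
}

tokens <- lapply(docs, tokenize)
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix with raw term-frequency weights
# (rows = documents, columns = terms).
dtm_tf <- t(vapply(
  tokens,
  function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(dtm_tf) <- vocab

# Re-weight by inverse document frequency: idf = log(N / df),
# where df is the number of documents containing the term.
doc_freq <- colSums(dtm_tf > 0)
idf <- log(nrow(dtm_tf) / doc_freq)
dtm_tfidf <- sweep(dtm_tf, 2, idf, `*`)

print(dtm_tf)
print(round(dtm_tfidf, 3))
```

Terms that occur in every document would receive a TF-IDF weight of zero under this weighting, which is one reason different weighting schemes are worth comparing.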
Readings/Bibliography
- Bird, Steven, Ewan Klein, and Edward Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009.
- Feldman, Ronen, and James Sanger. The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press, 2007.
- Friedl, Jeffrey E. F. Mastering regular expressions. O'Reilly Media, Inc., 2006.
- Kumar, Ashish, and Avinash Paul. Mastering Text Mining with R. Packt Publishing Ltd, 2016.
- Kwartler, Ted. Text mining in practice with R. John Wiley & Sons, 2017.
- Manning, Christopher D., and Hinrich Schütze. Foundations of statistical natural language processing. Cambridge, MA: MIT Press, 1999.
- Feinerer, Ingo, Kurt Hornik, and David Meyer. "Text mining infrastructure in R." Journal of Statistical Software 25.5 (2008): 1-54.
- Silge, Julia, and David Robinson. Text mining with R: A tidy approach. O'Reilly Media, Inc., 2017.
- Weiss, Sholom M., et al. Text mining: predictive methods for analyzing unstructured information. Springer Science & Business Media, 2010.
Teaching methods
Lectures and lab tutorials
Assessment methods
Attendance, take-home project.
Teaching tools
Lab tutorials & teaching notes
Office hours
See the website of Piotr Cwiakowski