B5557 - DATA SCIENCE FOR SOCIAL SCIENCES AND POPULATION ANALYTICS

Academic Year 2024/2025

  • Teaching Mode: Traditional lectures
  • Campus: Rimini
  • Degree programme: Second cycle degree programme (LM) in Statistical, Financial and Actuarial Sciences (cod. 8877)

Learning outcomes

Students will acquire specialised skills to apply data science in fields such as social sciences and demographic analysis. Educational objectives include understanding the fundamentals of data science, using a programming language to analyse demographic and social data, and becoming familiar with open science principles. Students will be able to implement research projects and communicate results effectively.

Course contents

Course Description:
This course focuses on the application of data science techniques to analyze social science and population data. Using real census data and programming in R, with an introduction to Python, students will learn to manage, analyze, and interpret large datasets to derive meaningful insights and knowledge in the field of social sciences.

Learning Objectives:
By the end of the course, students will be able to:

  • Understand and apply fundamental data science techniques in the context of social sciences.
  • Process and analyze real census data using R, with an introduction to Python.
  • Effectively visualize data to communicate results.
  • Develop critical thinking skills to evaluate and interpret data analysis outcomes.
  • Apply knowledge to real-world problems in social sciences and population analysis.
  • Utilize generative artificial intelligence as a support tool for data science programming.

Course Content:

Introduction to Data Science:

  • Overview of data science in social sciences
  • Types of data: structured and unstructured
  • Data collection methods and sources
  • Open data in demographic and social sciences

Data Visualization:

  • Principles of effective data visualization
  • Creating visualizations with R, with an introduction to Python
  • Interpreting and presenting data visually

Exploratory Data Analysis (EDA):

  • Techniques for EDA
  • Descriptive statistics and data distributions
  • Identifying patterns and anomalies
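
As a rough illustration of the descriptive statistics and anomaly detection topics listed above, the following minimal sketch uses Python's standard library (the course itself works mainly in R). The household sizes are invented for the example and do not come from any census file, and the 1.5 × IQR rule is just one common convention for flagging unusual values.

```python
import statistics

# Hypothetical household sizes; invented for illustration only.
household_sizes = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

# Basic descriptive statistics.
mean_size = statistics.mean(household_sizes)
median_size = statistics.median(household_sizes)

# Quartiles, treating the sample as the complete data ("inclusive" method).
q1, q2, q3 = statistics.quantiles(household_sizes, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's rule: flag values more than 1.5 * IQR beyond the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in household_sizes if x < lower or x > upper]

print(f"mean={mean_size}, median={median_size}, IQR={iqr}")
print(f"possible anomalies: {outliers}")
```

A flagged value is only a candidate for closer inspection: a very large household may be a data entry error or a genuine institutional household.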

Statistical Analysis:

  • Basic statistical concepts
  • Hypothesis testing and inferential statistics
  • Regression analysis and correlation

Working with Real Census Data:

  • Understanding the structure of census data
  • Importing and managing large datasets
  • Case studies and practical exercises

Applications in Social Sciences:

  • Demographic analysis
  • Socio-economic indicators
  • Public policies and population studies

Introduction to Generative Artificial Intelligence for Data Science:

  • Key generative AI models
  • Examples of prompt engineering

Prerequisites:

  • Basic knowledge of statistics
  • Familiarity with programming concepts (experience with R and Python is helpful but not mandatory)

Readings/Bibliography

Hadley Wickham, Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, O'Reilly Media, 2017.

Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design, No Starch Press, 2011.

Teaching methods

  • Lectures and interactive sessions
  • Hands-on programming exercises
  • Group projects and collaborative learning
  • Real-world case studies and applications

Assessment methods

  • Discussion of a project (completed individually or in groups of up to three people) using real demographic data. The instructor will provide the data to be analyzed, sourced from IPUMS census databases. The analyses will be based on the R scripts demonstrated during the lectures. Presentations should be prepared in PowerPoint, with the content agreed upon with the instructor.
  • Students will receive a pass/fail grade based on their performance and engagement during the preparation and presentation of the project.

Teaching tools

  • Course slides and lecture notes (provided on the course platform Virtuale)
  • Recommended textbooks and reading materials
  • Access to census datasets available at https://www.ipums.org/ and relevant software tools (R/Python)

Office hours

See the website of Francesco Scalone