- Teacher: Moreno Marzolla
- Credits: 6
- SSD: INF/01
- Language: English
- Modules: Moreno Marzolla (Module 1), Davide Rossi (Module 2)
- Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
- Campus: Bologna
- Degree programme: Second cycle degree programme (LM) in Artificial Intelligence (code 9063)
- From Sep 16, 2024 to Nov 11, 2024
- From Sep 17, 2024 to Oct 29, 2024
Learning outcomes
At the end of the course, the student has a deep understanding of the requirements that machine learning workloads place on computing systems. The student also understands the main architectures for accelerating machine learning workloads, heterogeneous architectures for embedded machine learning, and the most popular platforms that cloud providers offer specifically to support machine learning and deep learning applications.
Course contents
Module 1:
- Introduction to parallel programming.
- Parallel programming patterns: embarrassingly parallel, decomposition, master/worker, scan, reduce, ...
- Shared-memory programming with OpenMP.
- OpenMP programming model: the “omp parallel” construct, scoping constructs, other work-sharing constructs.
- GPU programming with CUDA.
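As an illustration of the reduce and scan patterns listed above, the following is a minimal sketch in plain Python rather than OpenMP or CUDA. The function names `tree_reduce` and `inclusive_scan` are hypothetical and are not taken from the course material. The code runs serially and only mirrors the data flow of the parallel versions: within each level or step, all combinations are independent of each other and could be executed concurrently.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_reduce(values: List[T], op: Callable[[T, T], T]) -> T:
    """Combine values pairwise, level by level (op must be associative)."""
    if not values:
        raise ValueError("cannot reduce an empty list")
    level = list(values)
    while len(level) > 1:
        # All pairs in one level are independent of each other, so a
        # parallel implementation could combine them at the same time.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired last element moves up unchanged
        level = nxt
    return level[0]


def inclusive_scan(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Hillis-Steele style inclusive scan (op must be associative)."""
    result = list(values)
    step = 1
    while step < len(result):
        prev = list(result)  # snapshot: every update in a step reads old values
        for i in range(step, len(result)):
            result[i] = op(prev[i - step], prev[i])
        step *= 2
    return result


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(tree_reduce(data, lambda a, b: a + b))
    print(inclusive_scan(data, lambda a, b: a + b))
```

The number of levels in `tree_reduce` and of steps in `inclusive_scan` grows logarithmically with the input length, which is why these patterns map well onto parallel hardware.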
Module 2:
- From ML to DNNs - a computational perspective
  - Introduction to key computational kernels (dot product, matrix multiply, ...)
  - Inference vs training - workload analysis and characterization
  - The NN computational zoo: DNNs, CNNs, RNNs, GNNs, attention-based networks
- Running ML workloads on programmable processors
  - Recap of processor instruction set architecture (ISA) with a focus on data processing
  - Improving processor ISAs for ML: RISC-V and ARM use cases
  - Fundamentals of parallel processor architecture and parallelization of ML workloads
- Algorithmic optimizations for ML
  - Key bottlenecks and taxonomy of optimization techniques
  - Algorithmic techniques: Strassen, Winograd, FFT
  - Topology optimization: efficient NN models (depthwise convolutions, inverted bottleneck), introduction to Neural Architecture Search
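As an illustration of the matrix multiply kernel and of the Strassen technique listed above, the following is a minimal sketch in plain Python. The function names `matmul_naive` and `strassen_2x2` are hypothetical and are not taken from the course material. The Strassen part is limited to 2x2 matrices of scalars, and the recursive application to matrix blocks is not shown.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Textbook matrix product: each output entry is one dot product."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def strassen_2x2(a: Matrix, b: Matrix) -> Matrix:
    """2x2 product with 7 multiplications instead of the usual 8."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # The seven Strassen products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the products using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    y = [[5, 6], [7, 8]]
    print(matmul_naive(x, y))
    print(strassen_2x2(x, y))
```

Trading one multiplication for several extra additions pays off only when the entries are themselves large blocks, which is the setting where the technique is applied recursively.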
Readings/Bibliography
Suggested readings for Module 1: selected parts from the following books
An Introduction to Parallel Programming
Peter Pacheco
Morgan Kaufmann, 2011, ISBN 9780123742605
https://shop.elsevier.com/books/an-introduction-to-parallel-programming/pacheco/978-0-12-374260-5
Note: there is an updated version of this book, published in 2021 and coauthored with Matthew Malensek.
CUDA C programming guide
NVIDIA Corporation
http://docs.nvidia.com/cuda/cuda-c-programming-guide/
Teaching methods
Traditional lectures for theory.
In addition, both Module 1 and Module 2 include hands-on sessions, for which students need their own laptop.
Assessment methods
Module 1: Project work
Module 2: Written exam with oral discussion
Teaching tools
Lectures with projected slides provided by the instructors.
Hands-on sessions.
Office hours
See the website of Moreno Marzolla
See the website of Davide Rossi