- Teacher: Samuele Salti
- Credits: 6
- SSD: ING-INF/05
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Bologna
- Degree programme: Second cycle degree programme (LM) in Artificial Intelligence (cod. 9063)
- Course period: from Sep 19, 2024 to Dec 19, 2024
Learning outcomes
At the end of the course, the student masters the most popular modern machine-learning approaches to computer-vision tasks, with particular reference to specialized deep-learning architectures. The student has both a theoretical understanding and the necessary practical skills required to develop state-of-the-art image and video analysis systems for real-world applications.
Course contents
The course is held in the first semester, from September to December.
Topics:
- Advanced CNNs: ResNeXt and grouped convolutions, MobileNets, EfficientNet, and RegNet.
- Attention and vision Transformers: the attention mechanism, the Transformer architecture, and image classification architectures based on Transformers.
- Object detection. Introduction to ensemble learning via boosting. The Viola-Jones detector and its applications. Specialized NN architectures for object detection. Two-stage, one-stage, and anchor-free detectors. Feature Pyramid Networks. Imbalanced learning and the focal loss (an illustrative sketch is given after this topic list). DEtection TRansformer (DETR). Hands-on session on object detection.
- Semantic/instance/panoptic segmentation. Ensemble learning via bagging and random forests. The algorithm behind the Kinect body-part segmentation. Transposed and dilated convolutions. Fully Convolutional Networks, U-Net, DeepLab. Instance segmentation and Mask R-CNN.
- Depth estimation from monocular images: photometric loss and Monodepth.
- Metric and representation learning. Deep metric learning and its applications to face recognition/identification and beyond. Contrastive and triplet loss (an illustrative sketch follows the topic list), ArcFace, NT-Xent loss, CLIP. Unsupervised representation learning. Hands-on session on metric learning.
- Image generation with GANs and diffusion models: metrics for generative tasks. Generative Adversarial Networks and denoising diffusion probabilistic models. Stable Diffusion and text-guided image generation. Hands-on session on textual inversion.
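As a flavour of the practical side of the detection topic, here is a minimal PyTorch sketch of the binary focal loss in the spirit of the RetinaNet paper (Lin et al., 2017). The function name, the default gamma/alpha values, and the toy data are illustrative assumptions, not course material.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples so training
    focuses on hard, misclassified ones.
    logits: raw predictions, shape (N,); targets: 0/1 labels, shape (N,)."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# toy usage on random data
logits = torch.randn(8)
targets = torch.randint(0, 2, (8,)).float()
print(focal_loss(logits, targets))
```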
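Likewise, for the metric-learning topic, the following is a minimal illustrative sketch of a triplet margin loss on L2-normalized embeddings; the function name, the margin value, and the random embeddings are assumptions made for the example, not course code.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: pulls the anchor towards the positive and
    pushes it away from the negative by at least `margin`."""
    anchor, positive, negative = (F.normalize(x, dim=-1) for x in (anchor, positive, negative))
    d_pos = (anchor - positive).pow(2).sum(dim=-1)  # squared distance to the positive
    d_neg = (anchor - negative).pow(2).sum(dim=-1)  # squared distance to the negative
    return F.relu(d_pos - d_neg + margin).mean()

# toy usage with random 128-dimensional embeddings
emb = lambda: torch.randn(16, 128)
print(triplet_loss(emb(), emb(), emb()))
```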
Prerequisites:
- Basic knowledge of computer vision and image processing: image formation, image digitization, camera modelling and calibration, basic image manipulation and processing, local image features.
- Basic knowledge of PyTorch: a good intro is available at https://pytorch.org/tutorials/beginner/basics/intro.html (a minimal example of the expected fluency is sketched below).
- Basic knowledge of machine learning: supervised versus unsupervised learning, classification versus regression, underfitting and overfitting, regularization; data split in training, validation and test sets; hyper-parameters and cross-validation.
If you attended Computer Vision and Image Processing (either taught by Prof. Lisanti and me or by Prof. Di Stefano), you already fulfill these prerequisites. If you did not, the slides and lab sessions of that course are available on Virtuale.
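As a rough indication of the level of PyTorch fluency assumed, the following self-contained sketch defines a small convolutional model and runs one training step on random data. The model (hypothetically named TinyConvNet) and all hyper-parameters are arbitrary choices for illustration only.

```python
import torch
from torch import nn

class TinyConvNet(nn.Module):
    """A tiny image classifier: two convolutions followed by a linear head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# one training step on a random batch, just to show the expected workflow
model = TinyConvNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
images, labels = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```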
Readings/Bibliography
The main reference material will be the slides and notes provided on Virtuale by the instructor. A set of pointers to scientific papers and technical reports for each topic will also be provided during lectures.
Several freely available online resources can usefully complement the material provided by the instructor on Virtuale.
- https://udlbook.github.io/udlbook/ - Simon J.D. Prince, "Understanding Deep Learning", MIT Press, 2023.
- http://d2l.ai/ - Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, "Dive into Deep Learning", 2020.
- http://www.deeplearningbook.org/ - Ian Goodfellow, Yoshua Bengio, and Aaron Courville, "Deep Learning", MIT Press, 2016.
- https://github.com/fastai/fastbook - Jeremy Howard and Sylvain Gugger, "Deep Learning for Coders with fastai and PyTorch", 2020.
- https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf - Eli Stevens, Luca Antiga, and Thomas Viehmann, "Deep Learning with PyTorch", 2020.
Teaching methods
Classroom lectures.
Theory lessons are complemented by in-class hands-on sessions, where selected topics will be studied from a practical point of view by using the Python language and the PyTorch library.
Assessment methods
The assessment comprises a theoretical part and a practical part.
The theoretical part is an oral exam. Students will present a recent scientific paper related to the course topics, previously agreed with the instructor. Then, students will answer questions on the paper and on the theory discussed in the course.
The practical part is an assignment on topics covered in the hands-on sessions and the theoretical lessons. The assignment must be submitted before sitting for the theoretical part.
Teaching tools
PowerPoint slides (PDF printouts of which are available on the course website before lectures) are projected and discussed during class hours.
Jupyter notebooks of all the hands-on sessions will be available on the course website.
Office hours
See the website of Samuele Salti
SDGs
This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.