91266 - Machine Learning for Computer Vision

Academic Year 2024/2025

  • Teacher: Samuele Salti
  • Credits: 6
  • SSD: ING-INF/05
  • Language: English
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Artificial Intelligence (code 9063)

Learning outcomes

At the end of the course, the student masters the most popular modern machine-learning approaches to computer-vision tasks, with particular reference to specialized deep-learning architectures. The student has both a theoretical understanding and the necessary practical skills required to develop state-of-the-art image and video analysis systems for real-world applications.

Course contents

The course is held in the first semester, from September to December.

Topics:

  1. Advanced CNNs: ResNeXt and grouped convolutions, MobileNets, EfficientNet and RegNet.
  2. Attention and vision Transformers. Transformers and attention. Image classification architectures based on Transformers.
  3. Object detection. Introduction to ensemble learning via boosting. The Viola-Jones detector and its applications. Specialized NN architectures for object detection. Two-stage, one-stage, and anchor-free detectors. Feature Pyramid Networks. Imbalanced learning and the focal loss. DEtection TRansformer (DETR). Hands-on session on object detection.
  4. Semantic/instance/panoptic segmentation. Ensemble learning via bagging and random forests. The algorithm behind the Kinect body part segmentation. Transposed and dilated convolutions. Fully Convolutional Networks, U-Net, DeepLab. Instance segmentation and Mask R-CNN.
  5. Depth estimation from monocular images: photometric loss and Monodepth.
  6. Metric and representation learning. Deep metric learning and its applications to face recognition/identification and beyond. Contrastive and triplet loss, ArcFace, NT-Xent loss, CLIP. Unsupervised representation learning. Hands-on session on metric learning.
  7. Image generation with GANs and diffusion models: metrics for generative tasks. Generative Adversarial Networks and Denoising diffusion probabilistic models. Stable diffusion and text-guided image generation. Hands-on session on textual inversion.

Prerequisites:

  1. Basic knowledge of computer vision and image processing: image formation, image digitization, camera modelling and calibration, basic image manipulation and processing, local image features.
  2. Basic knowledge of PyTorch: a good intro is available at https://pytorch.org/tutorials/beginner/basics/intro.html
  3. Basic knowledge of machine learning: supervised versus unsupervised learning, classification versus regression, underfitting and overfitting, regularization; data split in training, validation and test sets; hyper-parameters and cross-validation.

If you attended Computer Vision and Image Processing (either taught by Prof. Lisanti and me or by Prof. Di Stefano), you already fulfill these prerequisites. If you didn't, you can find the slides and lab sessions of that course on Virtuale.

Readings/Bibliography

The main reference material will be the slides and notes provided on Virtuale by the instructor. A set of pointers to scientific papers and technical reports for each topic will also be provided during lectures.

Several freely-available on-line resources can be useful to complement the material provided by the instructor on Virtuale.

Teaching methods

Taught lessons.

Theory lessons are complemented by in-class hands-on sessions, where selected topics will be studied from a practical point of view by using the Python language and the PyTorch library.

Assessment methods

The assessment comprises a theoretical part and a practical part.

The theoretical part is an oral exam. Students will present a recent scientific paper related to the course topics, agreed upon in advance with the instructor. Then, students will answer questions on the paper and on the theory discussed in the course.

The practical part is an assignment on topics covered in the hands-on sessions and the theoretical lessons. The assignment must be submitted before sitting for the theoretical part.

Teaching tools

PowerPoint slides (whose PDF printouts are available from the course website before lectures) are projected and discussed during class hours.

Jupyter notebooks of all the hands-on sessions will be available on the course website. 

Office hours

See the website of Samuele Salti

SDGs

Quality education; Industry, innovation and infrastructure

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.