Dissertation topics suggested by the teacher.
Possible Topics
- Integration of Big Data standards in bioinformatics-related processes to address data quality and data lineage
With the rapid generation of large and complex biomedical datasets, ensuring data quality and traceability is fundamental to the reliability and reproducibility of machine learning (ML) and artificial intelligence (AI) applications, and to effective data governance. Containers (e.g., Docker, Apptainer) and workflow managers (e.g., Nextflow, Snakemake) are essential for managing and executing data analysis pipelines, but complete information about data quality and data lineage is equally important, and a consistent, standardized treatment of these aspects is often lacking. This thesis will focus on integrating state-of-the-art Big Data management practices into bioinformatics pipelines to improve biomedical data governance. The aim is to enhance the reliability and impact of biomedical research and to foster better collaboration and innovation within the scientific community. The work will be validated on both synthetic and real-world data.
Keywords: bioinformatics pipelines, data governance, data quality, data lineage, big data.
Recent dissertations supervised by the teacher.
Second-cycle degree programme dissertations
- Deploying a custom Blockchain-as-a-Service environment to develop a decentralized biomedical data sharing App
- Facilitating the distribution of software or reference data using CernVM-FS over Cloud Infrastructure
- Validating ESMFold and AlphaFold2 modelling procedures for protein structure prediction
- Workflow Distribution and Parallelization Using Dask