Project in PRAIRIE
We will combine algorithms, machine learning, and statistical techniques to mine large amounts of DNA sequencing data. The plan is to develop new computational methods for the initial analysis of raw sequencing data, and then to apply supervised machine learning methods to detect clinically relevant variants.
This project connects three areas: sequence bioinformatics, AI, and a high-profile clinical application. It thus belongs to the biological and interdisciplinary side of PRAIRIE. We will also tackle the analysis of ‘very big data’: each human genome yields around 100 gigabases of raw sequencing data, and the cohorts studied typically comprise thousands of samples or more.

