A team of researchers from Rutgers University have used artificial intelligence and machine learning (AI/ML) to identify risk genes for cardiovascular diseases. The work, published this week in the journal Genomics, combined traditional bioinformatics approaches with newer AI/ML techniques in a bid to improve diagnostics and disease outcomes for those at risk.
A global problem
Cardiovascular diseases, including heart failure and atrial fibrillation, are the most common cause of death and disability worldwide. Despite the prevalence of these illnesses, the World Health Organisation estimates that around three quarters of associated deaths would be preventable if relevant risk factors could be identified. A significant amount of research has been undertaken in this field, and a number of disease-associated genetic loci have been identified through genome-wide association studies and whole genome sequencing. However, the clinical application of these results has been hindered by the sheer volume of data, a problem that AI/ML techniques could help to solve.
In this study, led by lead author Zeeshan Ahmed, the team developed a Findable, Accessible, Intelligent and Reproducible (FAIR) method that uses the Random Forest machine learning algorithm to identify genetic and demographic risk factors for cardiovascular diseases. The researchers aimed to identify traits that could be used to improve diagnostics, prevent serious illness and potentially contribute to personalised therapies.
The random forest
Prior to the application of the FAIR AI/ML approach, the team used more traditional RNA-Seq and bioinformatics methods to identify genes that were differentially expressed in individuals with cardiovascular diseases – specifically heart failure and atrial fibrillation. This transcriptomic analysis revealed a number of genes that were up- or down-regulated in patients suffering from various cardiovascular conditions compared to unaffected controls.
To validate these transcriptomic results, the researchers carried out analyses using the “Random Forest” approach with genomic, clinical and demographic data. This method is popular in the analysis of relatively small datasets. At its most basic level, a Random Forest consists of numerous “decision trees”, each of which attempts to classify data into subsets through a series of questions, represented by the nodes of the tree. The forest then combines the answers of its individual trees, typically by majority vote, to reach a final classification.
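The idea of many question-asking trees voting on a classification can be sketched in plain Python. This is a minimal illustration only, not the researchers' pipeline: the data are synthetic, the three features (age, sex and a single gene expression value) and all function names are hypothetical, and the tree-building rules (Gini impurity, bootstrap sampling, a random subset of features at each node) are the standard textbook choices rather than anything confirmed by the study.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Find the question 'is feature f <= t?' that best separates the labels."""
    best = None  # (weighted impurity, feature index, threshold)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    n_features = len(rows[0])
    # Each node only considers a random subset of the features.
    features = rng.sample(range(n_features), max(1, round(n_features ** 0.5)))
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Follow the questions from the root down to a leaf."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Majority vote across all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Synthetic, hypothetical data: [age, sex (0/1), expression of one gene].
# Label 0 = unaffected control, label 1 = patient.
controls = [[30 + 2 * i, i % 2, 1.0 + 0.15 * i] for i in range(12)]
patients = [[62 + 2 * i, i % 2, 5.0 + 0.25 * i] for i in range(12)]
rows = controls + patients
labels = [0] * 12 + [1] * 12

forest = build_forest(rows, labels)
print(predict_forest(forest, [72, 1, 6.5]))  # an older individual with high expression
print(predict_forest(forest, [25, 0, 0.8]))  # a younger individual with low expression
```

In practice, an analysis like this would more likely rely on an established library implementation than on hand-written trees; the sketch is only meant to make the "many trees, one vote" structure concrete.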
The AI/ML approach revealed that age and gender were strongly associated with the risk of heart failure and other cardiovascular diseases, whereas age and race were more closely linked to atrial fibrillation. As for genomics, seven genes were identified that were closely linked to heart failure, six of which had previously been implicated in the gene expression analysis and in other literature, supporting the results of the AI/ML method. For atrial fibrillation, another six genes were identified. One of the genes in question has previously been associated with sensitivity to warfarin – a result that highlights the potential for these data to be used in the development of personalised treatment plans.
Prevention is key
With many patients dying within five years of their diagnosis, an understanding of the intrinsic risk factors for cardiovascular diseases is key to reducing the global health burden associated with these illnesses. It is hoped that the identification of relevant risk factors could not only provide a new avenue for diagnostics, potentially allowing people access to treatments earlier, but also inform personalised therapeutic plans based on genetics, age, race and gender. Ahmed stated: “Timely understanding and precise treatment of cardiovascular disease will ultimately benefit millions of individuals by reducing the high risk for mortality and improving the quality of life.” Despite these promising results, the team know that the work does not end here. They intend to test their AI/ML method with a larger dataset, and to look at a wider range of genes to uncover more biomarkers associated with cardiovascular diseases.