Identifying whether specific mutations in cancer genes have an impact on tumour formation is a largely unsolved problem in cancer research. However, a new study by scientists based in Barcelona presents machine learning models that can pinpoint which mutations are driving tumorigenesis. This has important implications for the way we treat cancer in the future.
Drivers and passengers
Approximately 90% of mutations in cancer genes within tumours are of unknown significance. Identifying those that accelerate disease progression is vital for both understanding the mechanisms of selection and identifying potential treatments. Mutations that confer a selective growth advantage to the tumour cell and drive tumorigenesis are aptly named ‘driver mutations’. However, uncovering them is no easy task!
The problem stems from the fact that many mutations are so-called ‘passenger mutations’. They do not alter fitness, but occur in a cell that subsequently acquires a driver mutation. Therefore, in every cell descended from that cell and carrying the driver mutation, you will also find the passenger.
Machine learning-based methodology
Currently, high-throughput experimental mutagenesis or bioinformatics methods are used to identify driver mutations. The former is extremely arduous, whilst the latter often uses a less precise one-size-fits-all approach. One-size-fits-all methods are not practical, as each cancer gene and tissue possesses different molecular mechanisms.
To overcome this issue, the researchers developed a machine learning-based tool known as boostDM. BoostDM works with genomes from 28,000 tumours across 66 different types of cancer to assess cancer gene mutations in human tissues.
Mutations observed in 282 gene-tissue combinations as well as simulated neutral mutations in the same genes were used to train boostDM. Next, specific models for each of the 282 gene-tissue combinations were built. Collectively, these combined models comprise boostDM.
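To make the training setup concrete, here is a minimal sketch of the idea of fitting one classifier per gene-tissue combination, with observed mutations as positive examples and simulated neutral mutations as negative examples. It is an illustration under stated assumptions, not the boostDM implementation: the two features (whether a mutation falls in a mutation cluster, and a conservation score), the toy data, the function names and the use of a small logistic regression are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
import math


def predict(weights, bias, features):
    """Return a score in (0, 1); higher means more driver-like."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit a tiny logistic regression by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy data. Each mutation is described by two made-up features:
# [falls_in_mutation_cluster (0 or 1), conservation_score (0 to 1)].
training_data = {
    ("EGFR", "lung adenocarcinoma"): {
        "observed": [[1, 0.9], [1, 0.8], [1, 0.7], [0, 0.9]],
        "simulated_neutral": [[0, 0.2], [0, 0.3], [1, 0.1], [0, 0.4]],
    },
    ("EGFR", "glioblastoma"): {
        "observed": [[1, 0.6], [1, 0.9], [1, 0.8], [1, 0.7]],
        "simulated_neutral": [[0, 0.7], [0, 0.8], [0, 0.3], [0, 0.6]],
    },
}

# One model per gene-tissue combination: observed mutations are labelled 1,
# simulated neutral mutations are labelled 0.
models = {}
for pair, data in training_data.items():
    rows = data["observed"] + data["simulated_neutral"]
    labels = [1] * len(data["observed"]) + [0] * len(data["simulated_neutral"])
    models[pair] = train_logistic(rows, labels)


def driver_score(gene, tissue, features):
    """Score a mutation with the model for its own gene-tissue pair."""
    weights, bias = models[(gene, tissue)]
    return predict(weights, bias, features)


lung_like_driver = driver_score("EGFR", "lung adenocarcinoma", [1, 0.9])
lung_like_passenger = driver_score("EGFR", "lung adenocarcinoma", [0, 0.2])
```

Because each gene-tissue pair gets its own model, the same feature values can score differently in different tissues, which is the point of avoiding a one-size-fits-all approach.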
Interestingly, evolutionary biology inspired the construction of models of the mechanisms of tumorigenesis. Positive selection results in driver mutations occurring more often in cancer genes across tumours than would occur randomly. The models therefore learn from the observed mutations in human tumours to identify driver mutations in cancer genes.
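The positive selection signal can be illustrated with a small sketch that compares how often each position in a gene is mutated across tumours with how often it would be hit by chance. This is a simplified illustration, not the method used in the study: the gene length, the observed positions and the function names are hypothetical, and the neutral simulation here assumes every position is equally likely to mutate, which real neutral models do not assume.

```python
import random
from collections import Counter


def simulate_neutral_counts(n_positions, n_mutations, n_rounds, seed=0):
    """Average count per position when mutations land uniformly at random."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_rounds):
        for _ in range(n_mutations):
            totals[rng.randrange(n_positions)] += 1
    return {pos: totals[pos] / n_rounds for pos in range(n_positions)}


def enrichment(observed_positions, n_positions, n_rounds=1000, seed=0):
    """Ratio of observed to expected-by-chance counts at each mutated position."""
    observed = Counter(observed_positions)
    expected = simulate_neutral_counts(
        n_positions, len(observed_positions), n_rounds, seed
    )
    # Guard against division by zero if a position was never hit in simulation.
    return {pos: observed[pos] / max(expected[pos], 1e-9) for pos in observed}


# Hypothetical data: 20 mutations seen across tumours in a 50-position gene,
# 12 of them at position 7.
observed_positions = [7] * 12 + [3, 15, 22, 22, 31, 40, 44, 48]
scores = enrichment(observed_positions, n_positions=50)
most_enriched = max(scores, key=scores.get)
```

A position that recurs far more often than the neutral simulation predicts, such as position 7 in this toy data, is the kind of pattern a model can learn to associate with driver mutations.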
“We started from the premise that we only get to observe some mutations because the tumour cells with this mutation guide the development of the tumour, and we questioned what distinguishes these mutations from other possible mutations,” co-first author Dr. Ferran Muiños said. “Doing this analysis manually would be excessively laborious, but there are computational strategies that allow it to be organised systematically and efficiently.”
In practice, boostDM has performed remarkably well. All boostDM models tested outperformed experimental mutagenesis assays, as well as seven computational methods designed to identify driver mutations.
In total, 185 gene-tissue specific models generated by boostDM were able to identify specific driver mutations in a given type of cancer. This included different mutations in the same gene in two types of tumour. For example, the tool was able to identify that mutation clusters in the EGFR gene would drive either lung adenocarcinoma or glioblastoma, depending on their location within the gene.
Application to precision cancer medicine
The boostDM models built and validated in this study have the potential to classify mutations in the tumours of patients into drivers and passengers. This information could then influence the treatment an individual receives.
Additionally, boostDM can incorporate new genomes and mutation features as time goes on. This will allow the tool to identify more driver mutations in more cancer types. BoostDM is therefore at the forefront of technology, carrying out processes that haven’t been seen before.
“BoostDM goes further: it simulates each possible mutation within each gene for a specific type of cancer and indicates which ones are key in the cancer process,” said Dr. López-Bigas, head of the Biomedical Genomics lab. “This information helps us to understand how a tumour is caused at the molecular level and it can facilitate medical decisions regarding the most appropriate therapy for a patient.”