Computational program deciphers any genetic code

A recent paper has described a new computer program that can read the genome sequence of any organism and then decipher its genetic code.  

The genetic code

In simple terms, the genetic code is the set of rules that living cells use to translate the information in their genetic material into proteins: each three-nucleotide codon specifies either an amino acid or a stop signal. The genetic code was once proposed to be a “frozen accident”, as almost every organism uses the same code. However, since this concept was first proposed, researchers have discovered alternative genetic codes in over 30 different lineages of bacteria, eukaryotes and mitochondria. These discoveries indicate that the genetic code is in fact capable of evolving to some degree.

Most of these cases were found anecdotally. In addition, current methods for surveying the genetic code are either phylogenetically restricted or lack sufficiently robust and objective statistical footing to allow for accurate large-scale screening. With an incomplete set of alternative genetic codes, our ability to understand the evolutionary processes behind codon reassignment remains limited.
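To make the idea of an alternative genetic code concrete, here is a minimal Python sketch that translates the same DNA sequence under the standard code and under a code in which one codon has been reassigned. The reassignment used (TGA read as tryptophan instead of stop, as in NCBI translation table 4) is a well-known example chosen for illustration and is not taken from the paper; the function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base varies fastest); "*" marks a stop codon.
BASES = "TCAG"
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Illustrative alternative code: identical to the standard code except that
# TGA encodes tryptophan (W) instead of stop.
TGA_TRP_CODE = {**STANDARD_CODE, "TGA": "W"}


def translate(dna: str, code: dict) -> str:
    """Translate a DNA sequence codon by codon using the given code table."""
    dna = dna.upper()
    return "".join(code[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


seq = "ATGTGGTGACGATAA"
print(translate(seq, STANDARD_CODE))  # MW*R*
print(translate(seq, TGA_TRP_CODE))   # MWWR*
```

Under the standard code the sequence appears to stop after two amino acids, whereas the reassigned code reads straight through. This is the kind of mistranslation that arises when the wrong code is assumed for an organism.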


For the past five years, Yekaterina Shulgina (Harvard University) has been working on a project to decipher the genetic code and understand how it could evolve and change. To do this, Shulgina developed the statistical theory behind a new program, Codetta: a computational method for predicting the genetic code that, unlike similar methods, can scale to analyse thousands of genomes. The model, reported in the journal eLife, works by reading through the genome of an organism and then using a database of known proteins to infer the most likely genetic code.
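The general idea of inferring a code from known proteins can be illustrated with a toy example. The sketch below is not Codetta's statistical model, which the article does not describe in detail. It is a simplified majority vote, assuming we already have stretches of coding DNA aligned position by position to known protein sequences. The function name, the `min_support` threshold and the input data are all hypothetical.

```python
from collections import Counter, defaultdict


def infer_code(aligned_pairs, min_support=3):
    """Guess each codon's meaning from (coding DNA, aligned protein) pairs.

    Toy majority vote: every codon votes for the amino acid it is aligned to.
    Codons whose best-supported amino acid has fewer than `min_support`
    observations are reported as "?" (not enough evidence).
    """
    votes = defaultdict(Counter)
    for dna, protein in aligned_pairs:
        dna = dna.upper()
        for i, amino_acid in enumerate(protein):
            codon = dna[3 * i:3 * i + 3]
            if len(codon) == 3:  # ignore a trailing partial codon
                votes[codon][amino_acid] += 1

    inferred = {}
    for codon, counts in votes.items():
        amino_acid, n = counts.most_common(1)[0]
        inferred[codon] = amino_acid if n >= min_support else "?"
    return inferred


# Made-up alignments in which TGA consistently lines up with tryptophan (W).
pairs = [
    ("ATGTGATGG", "MWW"),
    ("TGAATGTGG", "WMW"),
    ("TGGTGAATG", "WWM"),
    ("CGA", "R"),  # only one observation, so CGA stays undetermined
]
print(infer_code(pairs))
# {'ATG': 'M', 'TGA': 'W', 'TGG': 'W', 'CGA': '?'}
```

A real method has to handle noisy alignments and weigh the evidence statistically. The explicit "?" outcome here is a deliberate choice to avoid calling a codon on too few observations.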

The researchers first surveyed genetic code usage in over 250,000 bacterial and archaeal genomes from the GenBank database. In doing so, they reidentified all known alternative codes in the dataset. They also discovered five sense codon reassignments, the first examples found in bacteria, all of which affected arginine codons (AGG, CGA and CGG). Overall, these findings provide insights into the evolutionary forces that could be driving alternative genetic codes to evolve.

Harvard biologist and co-author Sean Eddy explained:

“Many protein sequences in the databases these days are only conceptual translations of genomic DNA sequences. People mine these protein sequences for all sorts of useful stuff, like new enzymes or new gene editing tools and whatnot. You’d like for those protein sequences to be accurate, but if the organism is using a nonstandard code, they’ll be erroneously translated.”

The next step for these researchers is to use Codetta to search for alternative codes in viruses, eukaryotes and organellar genomes (i.e., those of mitochondria and chloroplasts).
