A recent study has revealed several microprotein-encoding genes that arose de novo from non-coding sequences within the human genome. The work, published in Cell Reports, describes how these microproteins originated and discusses their recent emergence in the context of primate evolution.
Not a small task
Small open reading frames (sORFs) are noncanonical reading frames that are often fewer than 300 nucleotides long. Despite their modest size, sORFs can encode microproteins: small proteins that perform critical biological functions, for example by interacting with other proteins. These microproteins and the reading frames that encode them are poorly understood, because their small size makes them difficult to study and their roles are often overlooked.
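To make the size definition concrete, here is a minimal Python sketch of a naive sORF scan. It assumes the simplest possible definition: an ATG start codon followed in the same frame by a standard stop codon (TAA, TAG or TGA), with a total length under 300 nucleotides, searched on the forward strand only. The function name `find_sorfs` and its `max_len` parameter are hypothetical. Real sORF annotation also considers non-ATG start codons, the reverse strand and experimental evidence of translation, so treat this as an illustration rather than a working annotation tool.

```python
from typing import List, Tuple

# Standard-code stop codons; this sketch also assumes ATG is the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_len: int = 300) -> List[Tuple[int, int]]:
    """Return (start, end) pairs (end exclusive, stop codon included) for
    ATG-initiated ORFs shorter than max_len nucleotides on the forward strand."""
    seq = seq.upper()
    hits: List[Tuple[int, int]] = []
    for frame in range(3):
        start = None  # position of the first ATG in the currently open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                end = i + 3
                if end - start < max_len:  # keep only short ORFs
                    hits.append((start, end))
                start = None  # look for the next ATG downstream
    return sorted(hits)  # order by position rather than by frame


print(find_sorfs("CCATGAAACCCGGGTTTTAGCC"))  # → [(2, 20)]
```

Nested ATGs inside an already open frame are not reported separately, and a frame that reaches the end of the sequence without a stop codon is skipped.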
De novo gene “birth” is one potential origin for these sORFs. In this process, a gene arises from previously non-coding DNA rather than from the duplication of an existing gene, so the resulting (typically short) coding sequence is entirely novel. Because such sequences are not conserved between species, they are often mistakenly labelled as non-coding and excluded from relevant studies.
In this study, a team of scientists from Ireland and Greece investigated de novo gene birth and the origins of microproteins to further our understanding of human evolution.
Figure 1: Graphical abstract of the study, summarising the comparisons made between humans and other primates to trace the evolution of these microproteins. Taken from Vakirlis et al., 2022.
The team analysed a large, recently published dataset of human sORFs to determine when and how each one originated. They compared the genomes of humans and 99 other vertebrate species, including other primates, to build a phylogenetic tree for each sORF. They then performed an ancestral reconstruction to identify the earliest common ancestor in which each sORF was present. If an ancestor lacking the sORF was immediately followed by descendants carrying an “intact” version, the sORF was deemed to have arisen via de novo gene birth.
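The origin-dating logic can be illustrated with a toy example. The sketch below is not the authors' pipeline. It swaps in Fitch parsimony, a simple textbook method, to infer whether each ancestor carried an intact sORF from presence or absence at the tips of a bifurcating tree. It then reports every branch on which an ancestor lacking the intact sORF gives rise to a descendant that carries it. The tree, the species states, the `Node` class and the tie-breaking rule (prefer "absent" when ambiguous) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # Observed for tips; inferred by fitch_down() for internal (ancestral) nodes.
    intact: Optional[bool] = None
    # Candidate states collected during the bottom-up pass.
    candidates: Set[bool] = field(default_factory=set)


def fitch_up(node: Node) -> Set[bool]:
    """Bottom-up pass: intersect the children's candidate sets, or take their union if disjoint."""
    if not node.children:
        node.candidates = {bool(node.intact)}
        return node.candidates
    child_sets = [fitch_up(child) for child in node.children]
    common = set.intersection(*child_sets)
    node.candidates = common if common else set.union(*child_sets)
    return node.candidates


def fitch_down(node: Node, parent_state: Optional[bool] = None) -> None:
    """Top-down pass: keep the parent's state if allowed, else prefer 'absent' (an assumption)."""
    if node.children:  # tips keep their observed state
        if parent_state in node.candidates:
            node.intact = parent_state
        else:
            node.intact = False if False in node.candidates else True
    for child in node.children:
        fitch_down(child, node.intact)


def find_gains(node: Node) -> List[Tuple[str, str]]:
    """List branches where an ancestor lacking an intact sORF has a descendant carrying one."""
    gains: List[Tuple[str, str]] = []
    for child in node.children:
        if not node.intact and child.intact:
            gains.append((node.name, child.name))
        gains.extend(find_gains(child))
    return gains


# Hypothetical toy tree: the sORF is intact only in human and chimpanzee.
human = Node("human", intact=True)
chimp = Node("chimpanzee", intact=True)
gorilla = Node("gorilla", intact=False)
macaque = Node("macaque", intact=False)
hc = Node("human-chimp ancestor", [human, chimp])
hcg = Node("human-chimp-gorilla ancestor", [hc, gorilla])
root = Node("root", [hcg, macaque])

fitch_up(root)
fitch_down(root)
print(find_gains(root))  # → [('human-chimp-gorilla ancestor', 'human-chimp ancestor')]
```

In this toy case the single gain falls on the branch leading to the human-chimp ancestor, which mirrors the kind of lineage-specific origin described in the study. A fuller analysis would work on reconstructed ancestral sequences and check each one for an intact reading frame, rather than on a simple presence/absence character.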
In total, 155 sORFs were identified as having originated in this manner. Of these, 12 biologically significant sORFs emerged after the origin of primates, and two of them arose only after the human and chimpanzee lineages split from their common ancestor, around 7 million years ago. Many of the microproteins encoded by these sORFs were judged to have significant functional effects, including roles in cell growth and disease. Although these functions developed over millions of years, this is still a relatively short time frame in evolutionary terms, showing that biological function can arise quickly.
An evolving field
The recent genetic changes documented in this study illustrate that humans, like other species, are still evolving, even if we don’t notice this in our day-to-day lives. Although the work provides great insight into the topic, many questions remain. In particular, why do some microproteins acquire functional significance so quickly compared with others, and what mechanisms underlie their functions? These questions may be difficult to answer, because the lack of conservation in other species makes these genes harder to study. The team are keen to stress that there are many more secrets to reveal; as author Aoife McLysaght stated, there is “a lot more functionally relevant stuff hidden in the human genome.”