Research: Functions of Gene Duplication

The motivation

It all started with an article stating that elephants carry 20 copies of the TP53 tumor suppressor gene, which is thought to give them an edge in fighting cancer. TP53 is one of the genes most commonly found mutated in human cancer cells. The duplication plays two key roles: first, it tips the scales further toward the cell killing itself when something goes wrong with its DNA, and second, it provides backup copies in case one of the TP53 genes is knocked out. That made me wonder whether keeping backup copies of tumor suppressor genes is a convergent evolutionary feature. Could we find genes that are especially important for fighting cancer based on gene duplication? So I began researching to find out.

The process

The plan was to take the human genome as a baseline, compare it against a low-cancer group and a high-cancer group of animals, and see whether some genes are consistently duplicated more in the low-cancer group than in the high-cancer group.

Most of the struggle is figuring out which data to download, where to find it, and how to process it.

More specifically, I took the CDS (coding sequence) of each individual human gene and compared it against the genomes of the low-cancer group (elephant, blue whale, naked mole rat) and the high-cancer group (mouse, human).

This starts with going to https://www.ncbi.nlm.nih.gov/labs/gquery/, searching for your desired creature, and clicking its scientific name in the Taxonomy box at the top. On that page, click Download to open a popup where you can select the genomic sequence or the CDS sequence.
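If you would rather script the download than click through the web page, NCBI also offers a command-line tool called `datasets`. The sketch below only builds that command and runs it from Python. It assumes the `datasets` CLI is installed and that `datasets download genome taxon` accepts `--reference`, `--include genome,cds` and `--filename` as described in NCBI's documentation (confirm with `datasets download genome taxon --help`). The species names are examples and not necessarily the assemblies used in this study; `datasets_command` and `download_all` are hypothetical helper names.

```python
import subprocess
from typing import Dict, List

# Example scientific names; swap in whichever species you are studying.
SPECIES: Dict[str, str] = {
    "elephant": "Loxodonta africana",
    "blue_whale": "Balaenoptera musculus",
    "naked_mole_rat": "Heterocephalus glaber",
    "mouse": "Mus musculus",
    "human": "Homo sapiens",
}


def datasets_command(taxon: str, out_zip: str) -> List[str]:
    """Build an NCBI Datasets CLI call that fetches the reference genome
    plus its CDS FASTA for one taxon into a zip archive."""
    return [
        "datasets", "download", "genome", "taxon", taxon,
        "--reference",              # only the reference assembly
        "--include", "genome,cds",  # genomic FASTA and CDS FASTA
        "--filename", out_zip,
    ]


def download_all(species: Dict[str, str] = SPECIES) -> None:
    """Download every species in turn (needs network access and the CLI)."""
    for short_name, taxon in species.items():
        subprocess.run(datasets_command(taxon, f"{short_name}.zip"), check=True)
```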

Next, download the NCBI BLAST software so you can run BLAST searches locally (a BLAST search looks for sequences in a target genome that are similar to your query sequence). You also need to download the genomes you want to study and build a local BLAST database from each one with the terminal tool makeblastdb. Since BLAST is meant to be run from the terminal, the easiest way to call it from code is as a subprocess. This gives the number of hits (gene duplicates) for a particular gene in each genome.
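The post does not spell out how BLAST hits were turned into copy counts, so the sketch below is one plausible version, not the exact method used. It assumes BLAST+ (`makeblastdb`, `blastn`) is on the PATH and calls it through `subprocess`. As an assumption, hits on the same scaffold within 100 kb of each other are treated as exons of a single copy. The helper names (`make_db`, `blast_gene`, `count_loci`) and thresholds (80% identity, e-value 1e-10, 100 kb gap) are hypothetical and worth tuning.

```python
import subprocess
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Tabular output (outfmt 6) with an explicit list of columns.
OUTFMT = "6 qseqid sseqid pident sstart send evalue"


def make_db(genome_fasta: str, db_name: str) -> None:
    """Turn a downloaded genome FASTA into a local nucleotide BLAST database."""
    subprocess.run(
        ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name],
        check=True,
    )


def blast_gene(query_fasta: str, db_name: str, evalue: float = 1e-10) -> str:
    """Search one CDS against a genome database and return the tabular hits."""
    result = subprocess.run(
        ["blastn", "-query", query_fasta, "-db", db_name,
         "-outfmt", OUTFMT, "-evalue", str(evalue)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_loci(tabular: str, min_pident: float = 80.0, max_gap: int = 100_000) -> int:
    """Estimate copy number: keep hits at or above min_pident, merge hits on
    the same scaffold lying within max_gap bases of each other, and count
    the merged regions."""
    spans: DefaultDict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in tabular.splitlines():
        if not line.strip():
            continue
        _qseqid, sseqid, pident, sstart, send, _evalue = line.split("\t")
        if float(pident) < min_pident:
            continue
        # Minus-strand hits have sstart > send, so normalise the order.
        lo, hi = sorted((int(sstart), int(send)))
        spans[sseqid].append((lo, hi))

    loci = 0
    for intervals in spans.values():
        intervals.sort()
        current_end = None
        for lo, hi in intervals:
            if current_end is None or lo - current_end > max_gap:
                loci += 1  # far from the previous hit: a new copy
                current_end = hi
            else:
                current_end = max(current_end, hi)  # same copy, extend it
    return loci
```

Typical use would be `make_db("elephant.fna", "elephant")` once, then `count_loci(blast_gene("TAF13.fa", "elephant"))` per gene (file names are placeholders). Merging nearby hits matters because a CDS searched against a genome usually aligns exon by exon, so counting raw hits would report one multi-exon gene as several copies. The trade-off is that tandem duplicates closer together than `max_gap` collapse into one.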

Then you run this code across every gene in the human genome, which takes days.
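Because the full run takes days, it helps to save results as you go so a crash does not cost you everything. A minimal sketch, assuming the human CDS file is a standard multi-record FASTA whose headers contain `[gene=SYMBOL]` (as NCBI CDS downloads typically do), and that `count_copies(sequence, organism)` is a hypothetical callable you supply, for example one that writes the sequence to a temporary FASTA and runs BLAST against that organism's database. Each isoform keeps its own record ID, so isoforms of one gene get separate rows.

```python
import csv
import os
import re
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# NCBI CDS FASTA headers usually carry the symbol as "[gene=XYZ]" (assumption).
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def _parse(header: str, chunks: Sequence[str]) -> Tuple[str, str, str]:
    """Split a header into (record_id, gene_symbol) and join the sequence."""
    record_id = header.split()[0]
    match = GENE_TAG.search(header)
    gene = match.group(1) if match else record_id
    return record_id, gene, "".join(chunks)


def iter_cds(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (record_id, gene_symbol, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _parse(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _parse(header, chunks)


def run_all(fasta_path: str, out_csv: str, organisms: Sequence[str],
            count_copies: Callable[[str, str], int]) -> None:
    """Count copies of every CDS in every organism, appending one CSV row
    per record so a multi-day run can resume after a crash."""
    done = set()
    if os.path.exists(out_csv):
        with open(out_csv, newline="") as fh:
            done = {row[0] for row in csv.reader(fh) if row}
    with open(fasta_path) as fasta, open(out_csv, "a", newline="") as out:
        writer = csv.writer(out)
        for record_id, gene, seq in iter_cds(fasta):
            if record_id in done:
                continue  # finished on a previous run
            counts = [count_copies(seq, org) for org in organisms]
            writer.writerow([record_id, gene, *counts])
            out.flush()  # keep progress on disk
```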

In the end you have the number of gene duplicates for each human gene in each organism. My preferred way of comparing the low-cancer and high-cancer groups was to take the minimum number of duplicates in the low-cancer group and the maximum in the high-cancer group, then subtract the maximum from the minimum. (Remember that the low-cancer group should have more gene duplicates and the high-cancer group fewer.) So I’m looking for the genes with the greatest separation between the groups.
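The score is simply the low-cancer minimum minus the high-cancer maximum. A small sketch of it, using the ZNF560 copy counts reported later in this post as input; the species keys and the function name are labels chosen for this example.

```python
from typing import Mapping, Sequence

LOW_CANCER = ("elephant", "blue_whale", "naked_mole_rat")
HIGH_CANCER = ("mouse", "human")


def separation(counts: Mapping[str, int],
               low: Sequence[str] = LOW_CANCER,
               high: Sequence[str] = HIGH_CANCER) -> int:
    """Smallest low-cancer copy count minus largest high-cancer copy count.
    A positive value means every low-cancer species has more copies than
    every high-cancer species."""
    return min(counts[s] for s in low) - max(counts[s] for s in high)


# ZNF560 copy counts from the results section.
znf560 = {"mouse": 1, "human": 1, "elephant": 12, "blue_whale": 5, "naked_mole_rat": 9}
print(separation(znf560))  # min(12, 5, 9) - max(1, 1) = 4
```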

End result

Of those results, 128 genes (including their isoforms) had a difference greater than or equal to 2. Some of these were present in many copies even in humans; for example, one had 18 copies in human, 3 in mouse, 49 in elephant, 22 in blue whale, and 24 in naked mole rat. So I narrowed the results further to genes with only one copy in the human genome.
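The two filters applied here (a separation of at least 2, then exactly one human copy) can be sketched as follows. This is an illustration rather than the script actually used; the table holds counts quoted in this post, and `multi_copy_example` is a hypothetical label for the unnamed gene with 18 human copies.

```python
from typing import Dict, List, Mapping

LOW_CANCER = ("elephant", "blue_whale", "naked_mole_rat")
HIGH_CANCER = ("mouse", "human")


def candidates(table: Mapping[str, Mapping[str, int]], min_sep: int = 2) -> List[str]:
    """Genes whose low/high separation is at least min_sep AND that have a
    single copy in humans."""
    keep = []
    for gene, c in table.items():
        sep = min(c[s] for s in LOW_CANCER) - max(c[s] for s in HIGH_CANCER)
        if sep >= min_sep and c["human"] == 1:
            keep.append(gene)
    return sorted(keep)


table: Dict[str, Dict[str, int]] = {
    "ZNF560": {"mouse": 1, "human": 1, "elephant": 12, "blue_whale": 5, "naked_mole_rat": 9},
    # Passes the separation filter (22 - 18 = 4) but has 18 human copies.
    "multi_copy_example": {"mouse": 3, "human": 18, "elephant": 49, "blue_whale": 22, "naked_mole_rat": 24},
    # Single human copy, but separation is 1 - 1 = 0.
    "TP53": {"mouse": 1, "human": 1, "elephant": 18, "blue_whale": 1, "naked_mole_rat": 1},
}
print(candidates(table))  # ['ZNF560']
```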

That leaves only a handful of genes:

  • TAF13 - transcription initiation

  • ASNSD1 - asparagine synthase activity

  • NDUFAB1 - mitochondrial acyl carrier protein

  • ZNF560 - DNA binding transcription repression

That last one was the interesting hit, which had:

  • 1 copy mouse

  • 1 copy human

  • 12 copies elephant

  • 5 copies blue whale

  • 9 copies naked mole rat

Predicted to enable DNA-binding transcription repressor activity, RNA polymerase II-specific and RNA polymerase II transcription regulatory region sequence-specific DNA binding activity. Predicted to be involved in negative regulation of transcription by RNA polymerase II. Predicted to be located in nucleus. [provided by Alliance of Genome Resources, Apr 2022]

GeneCards

And there are a couple of sources that suggest ZNF560 plays a role in cancer:

One interesting follow-up would be to take naked mole rat cells (the easiest of the three to obtain), knock out the extra copies, and measure how much more susceptible to cancer they become. ZNF560 may be a powerful cancer-fighting gene, since all three of these highly cancer-resistant animals appear to have independently evolved duplicates of it.

Another follow-up, to test whether duplication of tumor suppressor genes is a convergent evolutionary feature, would be to take a gene like TP53 or ZNF560, compare it across many more animals, and see whether it is duplicated in others as well. The paper already covered the elephant’s relatives, which do show numerous TP53 duplications, but for duplication to count as convergent evolution it would need to arise independently in a separate lineage.

There is a weakness to the min/max separation analysis, though. The more animals you add, the less likely you are to find a candidate gene, since a single outlier throws off the analysis. I also experimented with using the average number of gene duplicates in each group, which surfaces results like TP53: one copy in mouse, one in human, one in blue whale, one in naked mole rat, and 18 in elephant. TP53 is not consistently duplicated across the low-cancer group, but to run larger studies and cast a wider net, averaging may be the way to go.
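To see why the two scores disagree on TP53, here is a minimal side-by-side comparison using the TP53 counts above; the function names are labels chosen for this sketch. The mean-based score lets a single species with many copies carry the whole group, which makes it more tolerant of outliers as animals are added, and also explains why it surfaces genes that are not consistently duplicated.

```python
from statistics import mean
from typing import Mapping

LOW_CANCER = ("elephant", "blue_whale", "naked_mole_rat")
HIGH_CANCER = ("mouse", "human")


def min_max_separation(c: Mapping[str, int]) -> float:
    """Strict score: lowest low-cancer count minus highest high-cancer count."""
    return min(c[s] for s in LOW_CANCER) - max(c[s] for s in HIGH_CANCER)


def mean_difference(c: Mapping[str, int]) -> float:
    """Looser score: average low-cancer count minus average high-cancer count."""
    return mean(c[s] for s in LOW_CANCER) - mean(c[s] for s in HIGH_CANCER)


tp53 = {"mouse": 1, "human": 1, "elephant": 18, "blue_whale": 1, "naked_mole_rat": 1}
print(min_max_separation(tp53))         # 0, because blue whale and naked mole rat have one copy
print(round(mean_difference(tp53), 2))  # 5.67, i.e. (18 + 1 + 1) / 3 - 1
```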

Overall, it seems that convergent evolution of duplicated cancer-fighting genes as genetic backups can happen, but it does not appear to be common. Nature seems to work through multiple independent systems, and each animal appears to have evolved its own ways of tackling cancer.