Research: Aging Gene Communities
The experiment
I wanted to test whether aging has a central mechanism: a single cluster in which the aging genes sit close together and interact with one another. The idea follows from the closely connected network of aging genes found in the previous aging research.
The data
The network is downloaded from BioGRID (https://downloads.thebiogrid.org/File/BioGRID/Release-Archive/BIOGRID-5.0.255/BIOGRID-ORGANISM-5.0.255.tab3.zip), using the Mus musculus file for lab mice. Note that there are genes in BioGRID that aren’t found in gene_info: 6622, for example, is only found for humans.
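As a rough sketch of how the interaction file could be turned into an edge list, the snippet below reads a tab-separated table and keeps only mouse-to-mouse pairs. The column names are my assumption about the tab3 header, the sample is cut down to five columns, and its rows are toy data rather than real BioGRID records. Filtering on the organism columns is also an assumption: it is one way to drop cross-species entries such as 6622, not necessarily what the repository does.

```python
import csv
import io

MOUSE_TAXID = "10090"  # NCBI taxonomy ID for Mus musculus

# Assumed tab3 column names; check them against the header of the real file.
COL_A = "Entrez Gene Interactor A"
COL_B = "Entrez Gene Interactor B"
ORG_A = "Organism ID Interactor A"
ORG_B = "Organism ID Interactor B"


def load_edges(handle):
    """Return a set of undirected (id_a, id_b) pairs for mouse-only interactions."""
    edges = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        a, b = row[COL_A], row[COL_B]
        # Skip missing IDs ("-") and self-interactions.
        if a == "-" or b == "-" or a == b:
            continue
        # Skip interactions where either partner is not a mouse gene.
        if row[ORG_A] != MOUSE_TAXID or row[ORG_B] != MOUSE_TAXID:
            continue
        # Sort the pair so A-B and B-A collapse into a single edge.
        edges.add(tuple(sorted((a, b))))
    return edges


# Toy rows for illustration only (not real BioGRID records).
sample = "\n".join([
    "\t".join(["#BioGRID Interaction ID", COL_A, COL_B, ORG_A, ORG_B]),
    "\t".join(["1", "22059", "17246", "10090", "10090"]),
    "\t".join(["2", "17246", "22059", "10090", "10090"]),  # same pair, reversed
    "\t".join(["3", "22059", "6622", "10090", "9606"]),    # human partner
    "\t".join(["4", "22059", "22059", "10090", "10090"]),  # self-interaction
]) + "\n"

edges = load_edges(io.StringIO(sample))
print(edges)
```

With the real file, the same function could be given an open text handle on the extracted Mus musculus tab3 file instead of the in-memory sample.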
To convert between Entrez IDs and actual gene names, I used the gene_info file described at https://www.ncbi.nlm.nih.gov/books/NBK3840/ and available as https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz (that page notes you can convert any GeneID into its current names using the definitions in this file). Note that there are symbol collisions: the same synonym can be listed for multiple genes.
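Here is a minimal sketch of the ID-to-name mapping and of how symbol collisions could be detected. It assumes the usual gene_info layout (tab-separated, with tax_id, GeneID, Symbol, LocusTag and Synonyms as the first five columns, synonyms separated by "|" and "-" meaning none). The two mouse rows are made-up genes for illustration, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_maps(handle, taxid="10090"):
    """Map GeneID -> symbol and name -> set of GeneIDs for one organism."""
    id_to_symbol = {}
    name_to_ids = defaultdict(set)
    # QUOTE_NONE so stray quote characters in free-text columns are left alone.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#") or row[0] != taxid:
            continue
        gene_id, symbol, synonyms = row[1], row[2], row[4]
        id_to_symbol[gene_id] = symbol
        name_to_ids[symbol].add(gene_id)
        if synonyms != "-":
            for synonym in synonyms.split("|"):
                name_to_ids[synonym].add(gene_id)
    return id_to_symbol, dict(name_to_ids)


# Toy sample: the two mouse rows are invented for illustration.
sample = "\n".join([
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms",
    "10090\t1001\tGeneA\t-\tp21|Foo",
    "10090\t1002\tGeneB\t-\tp21",
    "9606\t6622\tSNCA\t-\tNACP|PARK1",
]) + "\n"

id_to_symbol, name_to_ids = build_maps(io.StringIO(sample))

# Names that point at more than one GeneID are ambiguous.
collisions = {name: ids for name, ids in name_to_ids.items() if len(ids) > 1}

print(id_to_symbol)
print(collisions)
print(id_to_symbol.get("6622"))  # human-only ID is absent from the mouse map
```

Going from ID to symbol is unambiguous, so keeping Entrez IDs as the primary key and only translating to names for display avoids the collision problem.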
The code
The code can be found here: https://github.com/luojxxx/Aging-community
Basically, I took the gene regulatory network and ran the Louvain algorithm on it at increasing resolution to separate it into communities, so that I had everything from all genes in one community to each gene in its own community.

Then, to decide which resolution (and the resulting split of the genes into communities) is optimal, each community is scored by its average score: the aging gene relatedness metric from the previous research, summed over the community and divided by the number of genes in that community. I then take the average scores of all the communities and compute their standard deviation for that particular split. The rationale is that a split with a high standard deviation has some communities with very high scores and some with very low ones, which means it separates the aging genes from the rest. Finally, the standard deviation is divided by the number of communities to penalize splits that have too many communities.
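The scoring step can be sketched as follows. This is my reading of the description above, not the repository code: genes without a score count as 0, the spread is the population standard deviation, and the function names are hypothetical. The partitions here are hand-written toy examples. In the real analysis they would come from Louvain runs at different resolutions, for example via networkx's louvain_communities and its resolution parameter, though whether the repository uses that library is an assumption on my part.

```python
from statistics import pstdev


def community_score(community, gene_scores):
    """Sum of the known gene scores divided by the community size.

    Assumption: genes without a score contribute 0.
    """
    return sum(gene_scores.get(gene, 0.0) for gene in community) / len(community)


def partition_score(partition, gene_scores):
    """Standard deviation of the community scores, divided by the community count."""
    per_community = [community_score(c, gene_scores) for c in partition]
    return pstdev(per_community) / len(per_community)


# Toy data: four genes, two of which carry an aging relatedness score.
gene_scores = {"a": 1.0, "b": 1.0}

partitions = [
    [{"a", "b", "c", "d"}],        # everything in one community
    [{"a", "b"}, {"c", "d"}],      # scored genes grouped together
    [{"a"}, {"b"}, {"c"}, {"d"}],  # every gene on its own
]

scores = [partition_score(p, gene_scores) for p in partitions]
best_index = max(range(len(partitions)), key=lambda i: scores[i])

print(scores)      # → [0.0, 0.25, 0.125]
print(best_index)  # → 1
```

In this toy case the two-community split wins: the single community has no spread at all, and the singleton split has the same spread as the two-community split but is penalized for using twice as many communities.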
The results
The scoring of the different ways to partition the genes is below:
As you can see, the highest-scoring partition is the 2nd one, with 125 communities. This is a very coarse-grained resolution, but it could still be useful: if all the aging-related genes are in one cluster, that still tells us something. And if there aren’t that many genes in the top community, it still works. So the next step is to look at the average score of each community, which is below (the number on the left is the community ID and the number on the right is the score):
At first glance, a few communities stand out from the pack, which suggests that the top community, community 32, may have a stronger connection to aging genes and may be a central mechanism of aging. I’ve posted the Entrez IDs of the genes within community 32 below.
Now here’s the score data for the genes in community 32 that have score data. We can see it’s pretty sparse compared to the 229 total genes in the community. However, since score data exists for only 1758 of the 19299 total genes in BioGRID, it’s likely to be sparse in any community. Given that it’s the top result, the other genes in it may deserve some examination, but the signal is pretty weak overall. It is also possible that the Louvain method isn’t splitting the network optimally, since the previous research showed a much more closely connected network for the aging genes than for the randomized controls. But at this point, this is the best that we can do. It may still be the case that aging is a diffuse mechanism.
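To put the sparsity in context, here is a quick back-of-the-envelope calculation using only the counts quoted above. It assumes the scored genes are spread uniformly at random across the network, which is a simplification, and gives the number of scored genes a 229-gene community would contain by chance.

```python
scored_genes = 1758    # genes with aging score data
total_genes = 19299    # genes in the BioGRID network
community_size = 229   # genes in community 32

# Fraction of the network that has any score at all.
coverage = scored_genes / total_genes

# Expected scored genes in a community of this size under a uniform spread.
expected_scored = community_size * coverage

print(f"{coverage:.1%}")         # → 9.1%
print(f"{expected_scored:.1f}")  # → 20.9
```

If the number of scored genes actually found in community 32 is well above that baseline, the community is enriched for scored genes; if it is close to it, the sparsity is just what the overall coverage predicts.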