Research: Aging Gene Communities

The experiment

I wanted to test whether aging has a central mechanism, that is, whether the aging-related genes cluster together in the regulatory network and mostly talk with each other.

The data

The gene regulatory network was downloaded from https://grand.networkmedicine.org/tissues/, and I used the skeletal muscle dataset.

To convert between ENSG IDs and actual gene names, I used the annotation files from https://ftp.ensembl.org/pub/release-74/gtf/homo_sapiens/.
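As a rough illustration of that conversion step, here is a minimal sketch that builds an ENSG ID to gene name lookup from GTF lines. It assumes the standard GTF layout (nine tab-separated columns, with `gene_id "..."` and `gene_name "..."` pairs in the last column). The function name `ensg_to_name` and the sample line are made up for the example and are not taken from the repository.

```python
import re

# Matches attribute pairs such as: gene_id "ENSG00000223972";
_ATTR = re.compile(r'(\S+) "([^"]*)"')

def ensg_to_name(gtf_lines):
    """Build a {ENSG id: gene name} dict from an iterable of GTF lines."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(_ATTR.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        gene_name = attrs.get("gene_name")
        if gene_id and gene_name:
            # Many records share one gene; keep the first name seen.
            mapping.setdefault(gene_id, gene_name)
    return mapping

# Illustrative record in GTF layout (not copied from the real file).
sample = [
    "#!genome-build GRCh37\n",
    '1\tpseudogene\tgene\t11869\t14412\t.\t+\t.\t'
    'gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_biotype "pseudogene";\n',
]
lookup = ensg_to_name(sample)
print(lookup)
```

For the real annotation, the same function could be fed the lines of the downloaded file, for example through `gzip.open(path, "rt")` if the file is gzipped.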

The code

The code can be found here: https://github.com/luojxxx/Aging-community

Basically, I took the gene regulatory network and ran the Louvain algorithm on it at increasing resolution to split it into communities, so that I had everything from all genes in one community to each gene in its own community.

To decide which resolution (and the resulting split of the genes into communities) is the optimal one, each community is scored by the average aging-relatedness of its genes: the aging gene relatedness metric from the previous research, summed over the community's genes and divided by the number of genes in that community. I then take the standard deviation of those community scores for that particular split. The rationale is that a high standard deviation means some communities have very high scores and others very low ones, which would separate the aging genes from the rest. Finally, the standard deviation is divided by the number of communities to penalize splits with too many communities.
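The scoring described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the repository. It assumes a partition is already available as a dict from community ID to a list of genes (for example, from any Louvain implementation), that `gene_score` holds the per-gene aging-relatedness values, and that the population standard deviation is used. The names `community_scores`, `partition_score`, and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def community_scores(partition, gene_score):
    """Average aging-relatedness of the genes in each community."""
    return {
        cid: mean(gene_score[g] for g in genes)
        for cid, genes in partition.items()
    }

def partition_score(partition, gene_score):
    """Std. dev. of the community scores, divided by the community count."""
    scores = list(community_scores(partition, gene_score).values())
    # Assumption: population standard deviation (0.0 for one community).
    return pstdev(scores) / len(scores)

# Toy data: two aging-related genes (a, b) and two unrelated ones (c, d).
gene_score = {"a": 1.0, "b": 0.8, "c": 0.1, "d": 0.1}

# Three candidate partitions, from coarsest to finest.
partitions = {
    "one": {0: ["a", "b", "c", "d"]},
    "two": {0: ["a", "b"], 1: ["c", "d"]},
    "singletons": {0: ["a"], 1: ["b"], 2: ["c"], 3: ["d"]},
}

results = {name: partition_score(p, gene_score) for name, p in partitions.items()}
best = max(results, key=results.get)
print(results, best)
```

In this toy case the two-community split wins. It separates the high-scoring genes from the low-scoring ones, while the single community has no spread at all and the singleton split is penalized for its community count.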

The results

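The scores for the different ways of partitioning the genes are below, in order of increasing resolution. The leading number is the partition index. The numbers are consistent with maxScore being the standard deviation of the community scores and networkAverageScore being that standard deviation divided by numCommunities, scaled by 10^5: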

0 numCommunities: 1 maxScore: 0.0 networkAverageScore: 0.0
1 numCommunities: 22 maxScore: 0.0011187988958297572 networkAverageScore: 5.085449526498897
2 numCommunities: 644 maxScore: 0.006588115803071665 networkAverageScore: 1.0229993483030537
3 numCommunities: 644 maxScore: 0.006545291349804581 networkAverageScore: 1.0163495884789722
4 numCommunities: 644 maxScore: 0.0064786919069263265 networkAverageScore: 1.0060080600817276
5 numCommunities: 644 maxScore: 0.006541465126955066 networkAverageScore: 1.0157554544961283
6 numCommunities: 644 maxScore: 0.006558828080003319 networkAverageScore: 1.0184515652179067
7 numCommunities: 692 maxScore: 0.00665511105772943 networkAverageScore: 0.9617212511169696
8 numCommunities: 1647 maxScore: 0.005918484858611513 networkAverageScore: 0.3593494146090779
9 numCommunities: 4140 maxScore: 0.004497036515497951 networkAverageScore: 0.10862407042265582
10 numCommunities: 7148 maxScore: 0.012468224833691467 networkAverageScore: 0.17442955838964
11 numCommunities: 10181 maxScore: 0.01084028088605524 networkAverageScore: 0.10647560049165349
12 numCommunities: 13074 maxScore: 0.013171569656533133 networkAverageScore: 0.10074628772015552
13 numCommunities: 15690 maxScore: 0.014691636879535761 networkAverageScore: 0.09363694633228656
14 numCommunities: 18204 maxScore: 0.016100917148011363 networkAverageScore: 0.08844713880472073
15 numCommunities: 20225 maxScore: 0.016947380580533085 networkAverageScore: 0.08379421795071983
16 numCommunities: 22073 maxScore: 0.019505791115076946 networkAverageScore: 0.08836946094811284
17 numCommunities: 23604 maxScore: 0.021115376500348195 networkAverageScore: 0.08945677215873662
18 numCommunities: 24911 maxScore: 0.023134633859424533 networkAverageScore: 0.09286914961031084
19 numCommunities: 25992 maxScore: 0.027793282289265125 networkAverageScore: 0.10693014115599078

As you can see, the highest-scoring partition (by networkAverageScore) is the 2nd one, with 22 communities. This is a very coarse-grained resolution, but it could still be useful: if all the aging-related genes were in one cluster, that would still tell us something. So the next step is to look at the average score of each community, shown below (the number on the left is the community ID and the number on the right is the score):

{5: 0.009691511175230554, 9: 0.009669555635085031, 17: 0.009565149568688562, 4: 0.009467785733168166, 10: 0.008763163566313481, 19: 0.00867775620697367, 18: 0.008377641938011443, 7: 0.0083583581322157, 3: 0.008242568753833383, 21: 0.008120827767055094, 2: 0.008060493754596169, 13: 0.007706890789515751, 16: 0.007514900807324891, 11: 0.007288466378207555, 6: 0.0072868074533334005, 15: 0.007141613886303938, 0: 0.006954825398436275, 14: 0.006885445663241918, 20: 0.006622003561744187, 1: 0.006314587889851097, 8: 0.006180129640669689, 12: 0.006033783772057245}

As we can see, aging genes are somewhat more concentrated in some communities than in others, but not by a large multiple: the top-scoring community (5) scores only about 1.6 times the bottom one (12). At this point I put the research away, considering it a null result. Later, however, I realized that we still learned something: within this GRN and with this metric, aging-relatedness is consistent with a diffuse, multi-module architecture rather than a single central cluster.
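To put numbers on how modest that concentration is, the short sketch below recomputes the spread directly from the 22 community scores listed above. The use of the population standard deviation and the 10^5 scale factor are inferred from the reported numbers, not confirmed against the repository.

```python
from statistics import mean, pstdev

# Community scores copied from the 22-community partition above.
community_scores = {
    5: 0.009691511175230554, 9: 0.009669555635085031,
    17: 0.009565149568688562, 4: 0.009467785733168166,
    10: 0.008763163566313481, 19: 0.00867775620697367,
    18: 0.008377641938011443, 7: 0.0083583581322157,
    3: 0.008242568753833383, 21: 0.008120827767055094,
    2: 0.008060493754596169, 13: 0.007706890789515751,
    16: 0.007514900807324891, 11: 0.007288466378207555,
    6: 0.0072868074533334005, 15: 0.007141613886303938,
    0: 0.006954825398436275, 14: 0.006885445663241918,
    20: 0.006622003561744187, 1: 0.006314587889851097,
    8: 0.006180129640669689, 12: 0.006033783772057245,
}

values = list(community_scores.values())

# How much higher the best community scores than the worst and the average.
top_vs_bottom = max(values) / min(values)
top_vs_mean = max(values) / mean(values)

# Population standard deviation across communities, then the penalized
# score (assumed scale factor of 1e5, inferred from the reported table).
spread = pstdev(values)
penalized = spread / len(values) * 1e5

print(f"top/bottom ratio: {top_vs_bottom:.2f}")
print(f"top/mean ratio:   {top_vs_mean:.2f}")
print(f"std of scores:    {spread:.7f}")
print(f"penalized score:  {penalized:.3f}")
```

The top community comes out well under twice the bottom one, and the recomputed spread lines up with the values reported for the 22-community partition.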

Either way, I decided to share this research as an interesting attempt that still taught me something.