Research: Aging Gene Communities

The experiment

I wanted to test whether aging has a central mechanism: whether all the aging genes are clustered together in one part of the gene regulatory network and talk to each other.

The data

The gene regulatory network was downloaded from https://grand.networkmedicine.org/tissues/; I used the skeletal muscle dataset.

To convert between ENSG IDs and actual gene names, I used the annotation files at https://ftp.ensembl.org/pub/release-74/gtf/homo_sapiens/.
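As a rough illustration of that conversion, here is a minimal sketch that builds an ENSG-to-name lookup from the attribute column of GTF lines. It is not the code from the repository. The function name `ensg_to_name` and the sample lines are made up for the example, and the sample lines are trimmed and use placeholder coordinates. The only thing it assumes about the real file is that each row carries `gene_id` and `gene_name` attributes in the ninth tab-separated column.

```python
import re

# GTF attributes look like: gene_id "ENSG..."; gene_name "TP53"; ...
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def ensg_to_name(gtf_lines):
    """Build {ENSG id: gene name} from an iterable of GTF text lines."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # not a well-formed GTF row
            continue
        gene_id = GENE_ID_RE.search(fields[8])
        gene_name = GENE_NAME_RE.search(fields[8])
        if gene_id and gene_name:
            # Many rows (exons, CDS, ...) repeat the same gene; keep the first.
            mapping.setdefault(gene_id.group(1), gene_name.group(1))
    return mapping


# Illustrative rows in GTF layout (placeholder coordinates, trimmed attributes).
SAMPLE_GTF = [
    "#example header line\n",
    "\t".join(["1", "pseudogene", "exon", "1", "100", ".", "+", ".",
               'gene_id "ENSG00000223972"; gene_name "DDX11L1";']) + "\n",
    "\t".join(["17", "protein_coding", "exon", "1", "100", ".", "-", ".",
               'gene_id "ENSG00000141510"; gene_name "TP53";']) + "\n",
    "\t".join(["17", "protein_coding", "CDS", "1", "100", ".", "-", "0",
               'gene_id "ENSG00000141510"; gene_name "TP53";']) + "\n",
]

mapping = ensg_to_name(SAMPLE_GTF)
print(mapping)
```

For the real download, the same function should work on a file handle opened with `gzip.open(path, "rt")`, since it only needs an iterable of text lines.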

The code

The code can be found here: https://github.com/luojxxx/Aging-community

Basically, I took the gene regulatory network and ran the Louvain algorithm on it at increasing resolutions to separate it into communities, so that I had everything from all genes in one community to each gene in its own community.

To decide which resolution (and the resulting split of the genes into communities) is the optimal one, each gene community is scored by its average aging-relatedness: the sum of its genes' scores on the aging gene relatedness metric from the previous research, divided by the number of genes in that community. I then take the average scores of all the communities and compute their standard deviation for that particular split. The rationale is that a split with a high standard deviation has some communities with very high scores and some with very low ones, which means it separates the aging genes from the rest. Finally, the standard deviation is divided by the number of communities to penalize splits that have too many communities.
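The scoring described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the code from the repository: the names `community_means`, `partition_score` and `sweep_resolutions` are hypothetical, genes without a score are assumed to count as 0.0, and the population standard deviation is assumed (which gives 0.0 for a single community). The sweep uses networkx's `louvain_communities` as one possible Louvain implementation; the original may have used a different library.

```python
from statistics import mean, pstdev


def community_means(communities, gene_scores):
    """Average aging-relatedness per community (sum of scores / gene count)."""
    # Assumption: a gene with no entry in gene_scores contributes 0.0.
    return [mean(gene_scores.get(g, 0.0) for g in members)
            for members in communities]


def partition_score(communities, gene_scores):
    """Std of the community averages, penalized by the number of communities."""
    means = community_means(communities, gene_scores)
    return pstdev(means) / len(means)


def sweep_resolutions(graph, gene_scores, resolutions, seed=0):
    """Run Louvain at each resolution; return (resolution, n_communities, score)."""
    # Imported here so the scoring helpers above work without networkx installed.
    from networkx.algorithms.community import louvain_communities

    results = []
    for resolution in resolutions:
        communities = louvain_communities(graph, resolution=resolution, seed=seed)
        score = partition_score(communities, gene_scores)
        results.append((resolution, len(communities), score))
    return results


# Toy example: two "aging" genes (score 1.0) and two unrelated genes (score 0.0).
toy_scores = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
one_block = [{"a", "b", "c", "d"}]
clean_split = [{"a", "b"}, {"c", "d"}]
mixed_split = [{"a", "c"}, {"b", "d"}]
singletons = [{"a"}, {"b"}, {"c"}, {"d"}]

for name, part in [("one block", one_block), ("clean split", clean_split),
                   ("mixed split", mixed_split), ("singletons", singletons)]:
    print(name, partition_score(part, toy_scores))
```

In the toy example the clean split scores highest: a single community and the mixed split both have zero spread, and the singletons have the same spread as the clean split but are penalized for using four communities instead of two.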

The results

The scoring of the different ways to partition the genes is below:

partition  numCommunities  maxScore               networkAverageScore
0          1               0.0                    0.0
1          22              0.0011187988958297572  5.085449526498897
2          644             0.006588115803071665   1.0229993483030537
3          644             0.006545291349804581   1.0163495884789722
4          644             0.0064786919069263265  1.0060080600817276
5          644             0.006541465126955066   1.0157554544961283
6          644             0.006558828080003319   1.0184515652179067
7          692             0.00665511105772943    0.9617212511169696
8          1647            0.005918484858611513   0.3593494146090779
9          4140            0.004497036515497951   0.10862407042265582
10         7148            0.012468224833691467   0.17442955838964
11         10181           0.01084028088605524    0.10647560049165349
12         13074           0.013171569656533133   0.10074628772015552
13         15690           0.014691636879535761   0.09363694633228656
14         18204           0.016100917148011363   0.08844713880472073
15         20225           0.016947380580533085   0.08379421795071983
16         22073           0.019505791115076946   0.08836946094811284
17         23604           0.021115376500348195   0.08945677215873662
18         24911           0.023134633859424533   0.09286914961031084
19         25992           0.027793282289265125   0.10693014115599078

As you can see, the partition with the highest networkAverageScore is the 2nd one (partition 1), with 22 communities. This is a very coarse-grained resolution, but it could still be useful: if all the aging-related genes sit in one cluster, that still tells us something. So the next step is to look at the average score of each community, shown below (the number on the left is the community ID and the number on the right is the score):

{5: 0.009691511175230554,
0.009669555635085031,
0.009565149568688562,
0.009467785733168166,
0.008763163566313481,
0.00867775620697367,
0.008377641938011443,
0.0083583581322157,
0.008242568753833383,
0.008120827767055094,
0.008060493754596169,
0.007706890789515751,
0.007514900807324891,
0.007288466378207555,
0.0072868074533334005,
0.007141613886303938,
0.006954825398436275,
0.006885445663241918,
0.006622003561744187,
0.006314587889851097,
0.006180129640669689,
0.006033783772057245}

As we can see, aging genes are somewhat more concentrated in some communities than in others, but not by a large multiple: the highest-scoring community averages only about 1.6 times the lowest. At this point I put the research away, considering it a null result. Later on, however, I realized that we still learned something from it: within this GRN and under this metric, aging-relatedness is consistent with a diffuse, multi-module architecture.
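The size of that spread can be checked directly from the 22 community scores listed above. The sketch below recomputes their population standard deviation, the per-community penalty, and the ratio between the highest and lowest community. One thing it suggests, as an inference from the numbers rather than something confirmed from the code, is that the field printed as maxScore for the 22-community partition is this standard deviation, and that networkAverageScore is the standard deviation divided by the number of communities and scaled by 1e5.

```python
from statistics import pstdev

# Per-community average scores copied from the listing above (22 communities).
community_scores = [
    0.009691511175230554, 0.009669555635085031, 0.009565149568688562,
    0.009467785733168166, 0.008763163566313481, 0.00867775620697367,
    0.008377641938011443, 0.0083583581322157, 0.008242568753833383,
    0.008120827767055094, 0.008060493754596169, 0.007706890789515751,
    0.007514900807324891, 0.007288466378207555, 0.0072868074533334005,
    0.007141613886303938, 0.006954825398436275, 0.006885445663241918,
    0.006622003561744187, 0.006314587889851097, 0.006180129640669689,
    0.006033783772057245,
]

# Population standard deviation of the community averages.
spread = pstdev(community_scores)
# Penalize by the number of communities, as in the scoring description.
penalized = spread / len(community_scores)
# How much more "aging-related" the top community is than the bottom one.
fold = max(community_scores) / min(community_scores)

print(f"communities:              {len(community_scores)}")
print(f"population std:           {spread:.6g}")
print(f"std / communities x 1e5:  {penalized * 1e5:.6g}")
print(f"highest / lowest:         {fold:.2f}x")
```

A ratio of roughly 1.6 between the best and worst community is what makes this look like a diffuse signal rather than one dominant aging cluster.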

Either way, I decided to share this research as an interesting attempt that still taught me something.