Machine learning run to determine epigenetic marks driving small gene sets

Open sr320 opened this issue 10 months ago • 10 comments

Using those gene sets identified in our last chat for time series (Apul)

https://sr320.github.io/tumbling-oysters/posts/41-Apul-GO/

sr320 avatar Mar 31 '25 20:03 sr320

Will do - will aim to complete this week.

AHuffmyer avatar Mar 31 '25 20:03 AHuffmyer

I can also get started with this!

shedurkin avatar Mar 31 '25 20:03 shedurkin

It's not a competition... however we do have a new sticker board! 💯 So it is kind of a competition.

sr320 avatar Mar 31 '25 20:03 sr320

@shedurkin if you could start with making a gene count matrix for the genes that have been selected that would be great.

AHuffmyer avatar Apr 01 '25 19:04 AHuffmyer

Will do! In the meantime, I've already done a trial run of your ML pipeline using miRNA as predictors and all genes as the response -- the results are pretty interesting! Model performance is very high for many of the gene PCs (each PC is essentially a group of co-regulated genes), with some R^2 values close to 1.

Image

Looking at the miRNA that contribute most to predicting some of these PCs, the results differ. Sometimes several miRNA have high importance (e.g., PC11), suggesting a more complicated interplay is influencing gene expression, while in other cases only one or two miRNA stand out (e.g., PC10, PC7).

Image

Image

Image
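For anyone following along, the setup described above (gene PCs as responses, miRNA as predictors, then ranking miRNA by importance) can be sketched roughly like this. This is a toy illustration, not the actual pipeline: the data are simulated, ridge regression is swapped in as a stand-in because the model type isn't stated in this thread, and all function and variable names (`pca_scores`, `ridge_fit`, `r_squared`, etc.) are hypothetical. For brevity the PCA is run on all samples; a real run would want to keep held-out samples out of that step too.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project samples onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge coefficients; X is assumed centered, y is centered here."""
    penalty = alpha * np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + penalty, X.T @ (y - y.mean()))

def r_squared(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Simulated toy data (assumption): 40 samples, 5 miRNA, 30 genes,
# with every gene driven by miRNA 0 plus a little noise.
rng = np.random.default_rng(0)
n_samples, n_mirna, n_genes = 40, 5, 30
mirna = rng.normal(size=(n_samples, n_mirna))
gene_loadings = rng.normal(size=n_genes)
genes = np.outer(mirna[:, 0], gene_loadings) + 0.1 * rng.normal(size=(n_samples, n_genes))

# Response: gene PCs (each PC summarizes a group of co-varying genes).
gene_pcs = pca_scores(genes, n_components=3)

# Hold out the last 10 samples; standardize predictors with training statistics.
train, test = np.arange(30), np.arange(30, 40)
mu, sd = mirna[train].mean(axis=0), mirna[train].std(axis=0)
Z_train, Z_test = (mirna[train] - mu) / sd, (mirna[test] - mu) / sd

# Fit one model for PC1 and evaluate it on the held-out samples.
y_train, y_test = gene_pcs[train, 0], gene_pcs[test, 0]
coef = ridge_fit(Z_train, y_train, alpha=1.0)
y_pred = Z_test @ coef + y_train.mean()

r2 = r_squared(y_test, y_pred)
importance = np.abs(coef)          # |standardized coefficient| as a simple importance score
top_mirna = int(np.argmax(importance))
print(f"held-out R^2 for PC1: {r2:.3f}; top miRNA index: {top_mirna}")
```

In this toy case a single miRNA dominates the importance ranking, which is the "only one or two miRNA stand out" pattern; a PC driven by several miRNA would instead spread importance across multiple coefficients.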

shedurkin avatar Apr 01 '25 19:04 shedurkin

Ok, I've isolated a bunch of gene sets that may be of interest. For each physiological/seasonal trait (e.g., host biomass, respiration, temperature, timepoint), I took all of the modules that are significantly associated with that trait and

a) saved the functional annotations for all genes contained within those modules, and
b) saved a raw counts matrix for only the genes contained within those modules.

I also did the same for all genes that were annotated with at least one of the GO terms Steven provided above.
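As a rough sketch of the subsetting logic described above (collect every gene in the modules tied to a trait, then keep only those rows of the counts matrix): the module names, gene IDs, and counts below are made up for illustration, and the helper names `genes_for_trait` and `subset_counts` are hypothetical rather than taken from the actual code.

```python
def genes_for_trait(trait, module_traits, module_genes):
    """Union of genes across all modules significantly associated with `trait`."""
    genes = set()
    for module, traits in module_traits.items():
        if trait in traits:
            genes.update(module_genes[module])
    return genes

def subset_counts(counts, genes):
    """Keep only the rows (genes) of a counts table that belong to `genes`."""
    return {gene: row for gene, row in counts.items() if gene in genes}

# Hypothetical inputs: which traits each module is significantly associated with,
# which genes each module contains, and raw counts per gene across three samples.
module_traits = {
    "module_1": {"Host_AFDW", "timepoint"},
    "module_2": {"Host_AFDW"},
    "module_3": {"temperature"},
}
module_genes = {
    "module_1": ["gene_a", "gene_b"],
    "module_2": ["gene_c"],
    "module_3": ["gene_d"],
}
counts = {
    "gene_a": [10, 12, 9],
    "gene_b": [0, 3, 1],
    "gene_c": [55, 60, 48],
    "gene_d": [7, 7, 8],
}

biomass_genes = genes_for_trait("Host_AFDW", module_traits, module_genes)
biomass_counts = subset_counts(counts, biomass_genes)
print(sorted(biomass_counts))
```

The same two helpers would cover the GO-term case if `module_traits` and `module_genes` were swapped for a GO-term-to-gene mapping.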

Code (the code for saving gene sets is at the very bottom) | Output folder

shedurkin avatar Apr 02 '25 00:04 shedurkin

Kathleen and I met last week and here are the next steps that Kathleen is working on:

  • Prediction of expression in genes of interest (those that correlate with biomass) using miRNAs
  • Prediction of expression of genes that have GO terms of interest (Steven's GO searches) using miRNAs

These will allow you to test whether miRNAs potentially regulate the expression of genes related to physiological outcomes.

AHuffmyer avatar Apr 07 '25 20:04 AHuffmyer

Finished for the following gene sets:

  • host biomass ("Host_AFDW")
  • symbiont photosynthesis ("Am")
  • List of GO terms provided above by @sr320 ("ATP_production_GO")

Notebook post summarizing results | Rendered code

shedurkin avatar Apr 08 '25 01:04 shedurkin

@shedurkin can you now add lncRNA and DNA methylation data to miRNA to predict expression of those gene sets?

sr320 avatar Apr 11 '25 15:04 sr320

@shedurkin I think we can mark this as closed now, yes?

AHuffmyer avatar May 21 '25 17:05 AHuffmyer