tcga-embedding
Uses a shallow neural network layer (an embedding) to infer gene-gene and sample-sample relationships from gene expression data.
Embedding (TCGA RNASeq)
Source code for applying embeddings to TCGA RNASeqV2 RSEM-normalized data.
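The description above says a shallow embedding layer is fit to expression data. Below is a minimal NumPy sketch of one common form such a model takes. It assumes expression is approximated by the dot product of a sample vector and a gene vector plus per-sample and per-gene biases. The semb, gemb and *_bias_* files in emb/ hint at this layout, but this is an assumption, not the repository's code.

```python
import numpy as np

# Hypothetical model for illustration only: each sample i and gene j get a
# d-dimensional vector plus a scalar bias, and expression is approximated as
#   x[i, j] ~ sample_emb[i] . gene_emb[j] + sample_bias[i] + gene_bias[j]


def predict_expression(sample_emb, gene_emb, sample_bias, gene_bias):
    """Reconstruct an (n_samples, n_genes) matrix from embeddings and biases."""
    # (n_samples, d) @ (d, n_genes) -> (n_samples, n_genes)
    dot = sample_emb @ gene_emb.T
    # Per-sample biases vary down the rows, per-gene biases across the columns.
    return dot + sample_bias[:, None] + gene_bias[None, :]


def mse(pred, target):
    """Mean squared reconstruction error, a typical objective for such a fit."""
    return float(np.mean((pred - target) ** 2))


# Toy example: 2 samples, 3 genes, 2-dimensional embeddings.
S = np.array([[1.0, 0.0], [0.0, 1.0]])
G = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
pred = predict_expression(S, G, np.array([0.5, -1.0]), np.array([0.0, 1.0, 2.0]))
# pred -> [[1.5, 4.5, 7.5], [1.0, 4.0, 7.0]]
```

Training would adjust the four arrays to minimize `mse` against the observed matrix. The repository's own training lives in train.py.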
Links:
- Web Interactive Embedding Projector (powered by TensorFlow)
- Gene Embedding Matrix from:
Source Code
Python scripts for loading data (load_data.py) and helper functions for handling embeddings (util.py) are included.
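Once a gene embedding matrix is loaded, gene-gene relationships can be read off as vector similarity. The sketch below uses cosine similarity. The function names are hypothetical and are not the API of util.py. Reading emb/gemb_normal.csv with the first column as the gene index is an assumption about the file layout.

```python
import numpy as np
import pandas as pd


def cosine_similarity_matrix(emb: pd.DataFrame) -> pd.DataFrame:
    """Pairwise cosine similarity between rows (e.g. genes) of an embedding."""
    values = emb.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    # Guard against zero-length vectors to avoid division by zero.
    norms[norms == 0] = 1.0
    unit = values / norms
    return pd.DataFrame(unit @ unit.T, index=emb.index, columns=emb.index)


def nearest_genes(emb: pd.DataFrame, gene, k: int = 5) -> pd.Series:
    """Return the k rows most similar to `gene`, excluding `gene` itself."""
    sims = cosine_similarity_matrix(emb)[gene].drop(gene)
    return sims.sort_values(ascending=False).head(k)


# Hypothetical usage (assumes gene IDs are in the first column of the CSV):
# gemb = pd.read_csv("emb/gemb_normal.csv", index_col=0)
# print(nearest_genes(gemb, gemb.index[0], k=10))
```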
Dependencies
- numpy
- pandas
- matplotlib
- seaborn
- networkx
- scipy
- sklearn
- fastai
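A small check like the one below can confirm that these dependencies are installed before running anything. Note that scikit-learn is imported under the name `sklearn`. The script is a convenience sketch and is not part of the repository.

```python
import importlib.util

# Import names for the dependencies listed above (scikit-learn imports as "sklearn").
REQUIRED = ("numpy", "pandas", "matplotlib", "seaborn", "networkx", "scipy", "sklearn", "fastai")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```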
Usage
- Clone the repo locally.
- Change directory to the local directory.
- Run:
  python train.py --data $YOUR_INPUT_DATA --out-prefix $OUT --out-dir $OUTPUT_PATH
Note: train.py can only be run on a CUDA-enabled machine.
Input data must be a .csv file with one observation per row and must include an ID column.
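The following pandas sketch loads an input file and sanity-checks it against these requirements. The ID column name `ID` is a hypothetical default. The uniqueness and numeric-column checks are assumptions about what train.py expects, not documented behavior.

```python
import pandas as pd


def load_input(path, id_col="ID"):
    """Load an input table and check it against the README's requirements.

    `id_col` is a hypothetical default; pass the name of your file's ID column.
    """
    df = pd.read_csv(path)
    if id_col not in df.columns:
        raise ValueError(f"missing ID column {id_col!r}")
    # Assumption: one observation per row implies each ID appears only once.
    if df[id_col].duplicated().any():
        raise ValueError("ID values must be unique (one observation per row)")
    # Assumption: every non-ID column holds numeric expression values.
    non_numeric = df.drop(columns=id_col).select_dtypes(exclude="number").columns
    if len(non_numeric) > 0:
        raise ValueError(f"non-numeric columns: {list(non_numeric)}")
    return df.set_index(id_col)
```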
Folder Structure
tcga-embedding
| LICENSE
| README.rst
| load_data.py
| train.py
| util.py
└───emb
| gemb_bias_CN.csv
| gemb_bias_normal.csv
| gemb_CN.csv
| gemb_normal.csv
| semb_bias_CN.csv
| semb_bias_normal.csv
| semb_CN.csv
| semb_normal.csv
└───geneSCF
| gemb_d17_top_GO_BP.tsv
| gemb_d22_top_GO_BP.tsv
| gemb_d25_top_GO_BP.tsv
| gemb_d35_bottom_GO_BP.tsv
| gemb_d43_bottom_GO_BP.tsv
| gemb_d46_bottom_GO_BP.tsv
└───ipynb
| tcga_emb_dist.ipynb
| tcga_emb_pca.ipynb
| tcga_emb_subtyping.ipynb
| tcga_ioresponse.ipynb
| tcga_plot_emb_som_pca_heatmap.ipynb
| tcga_plot_gsea_compare.ipynb
| tcga_som.ipynb
| tcga_training_CN.ipynb
| tcga_training_normal.ipynb
└───ref
| genes_gids.tsv
| sid_ca.csv
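The file names in geneSCF/ (for example gemb_d17_top_GO_BP.tsv and gemb_d35_bottom_GO_BP.tsv) suggest GO biological-process results for genes at the top or bottom of individual gene-embedding dimensions. Assuming that reading, here is a minimal pandas sketch for extracting such gene lists. The dimension label and `k` are hypothetical parameters. The selection actually used to produce those files is not documented here.

```python
import pandas as pd


def extreme_genes(gene_emb: pd.DataFrame, dim, k: int = 100):
    """Return (top, bottom) gene lists for one embedding dimension.

    `dim` is a column label of `gene_emb`; how dimensions are labelled in the
    emb/ CSVs is an assumption to check against the actual files.
    """
    scores = gene_emb[dim]
    top = list(scores.nlargest(k).index)      # highest scores first
    bottom = list(scores.nsmallest(k).index)  # lowest scores first
    return top, bottom
```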