How to best visualize multi-level transcriptomics experiment data?
@mohammedtoufiq91-17679
United States

Hi,

I am working with transcriptomics expression data (a targeted panel of 96 genes) from many samples covering 4 conditions: Healthy, Disease A, Disease B, and Disease C. The disease conditions are further sub-categorized into stages (Mild, Moderate, Severe, Remission), and the samples were stimulated with 10 different stimulations/treatments (Stim 1, Stim 2, Stim 3, Stim 4, ..., Stim 10). In short, this is a multi-level experiment with multiple samples from each subject. I performed the statistical analysis following the multi-level experiments section of the limma R package User's Guide and identified differentially expressed genes for the groups/comparisons listed below.

My question is: what is the best way to visualize this data, given that I have many comparisons? One way is to extract the differentially expressed genes with p.val = 0.05 and logFC = +/-1.0 for each comparison and plot a heatmap for each. But this will be a tedious and time-consuming task. Alternatively, I was thinking I could identify genes/biomarkers that are common or unique across all these conditions and plot a heatmap of those. Some biological contextualization, such as pathway or ontology analysis, could then follow.

Groups compared in limma:

Cont.matrix <- makeContrasts(
  DaRS2.1 = Disease_A_Remission_Stim_2 - Disease_A_Remission_Stim_1,
  DaRS3.1 = Disease_A_Remission_Stim_3 - Disease_A_Remission_Stim_1,
  DaRS4.1 = Disease_A_Remission_Stim_4 - Disease_A_Remission_Stim_1,
  # etc.

  DbRS2.1 = Disease_B_Remission_Stim_2 - Disease_B_Remission_Stim_1,
  DbRS3.1 = Disease_B_Remission_Stim_3 - Disease_B_Remission_Stim_1,
  DbRS4.1 = Disease_B_Remission_Stim_4 - Disease_B_Remission_Stim_1,
  # etc.

  DbMS2.1 = Disease_B_Mild_Stim_2 - Disease_B_Mild_Stim_1,
  DbMS3.1 = Disease_B_Mild_Stim_3 - Disease_B_Mild_Stim_1,
  DbMS4.1 = Disease_B_Mild_Stim_4 - Disease_B_Mild_Stim_1,
  # etc.

  DbaMS2.1 = Disease_B_Mild_Stim_2 - Disease_A_Mild_Stim_1,
  DbaMS3.1 = Disease_B_Mild_Stim_3 - Disease_A_Mild_Stim_1,
  DbaMS4.1 = Disease_B_Mild_Stim_4 - Disease_A_Mild_Stim_1,
  # etc.

  levels=design)

Thank you, Toufiq

Ali Barry ▴ 40
@2f691b31
United Kingdom

Hi Toufiq, a big part of this will be determining which factors actually contribute variance and are needed in your design. How does your data cluster? Can you group any of the treatments together, or pull out a signature from the principal components? Have you already plotted heatmaps across all genes, or those with the highest variance? You can also look at plotting summarized expression data by level: facet_wrap or facet_grid from ggplot2 may help with visualizations if you find you have too many factors in a single plot.
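As a rough illustration of the clustering check and the faceting idea, here is a minimal sketch on simulated data. The matrix `y`, the annotation table `meta`, and the factor levels are hypothetical stand-ins for your own log-expression matrix and sample sheet; only `prcomp` and the ggplot2 functions are real.

```r
# Simulated stand-in for the real data: 96 genes x 24 samples (hypothetical).
set.seed(1)
meta <- expand.grid(
  stim    = paste0("Stim_", 1:3),
  stage   = c("Mild", "Severe"),
  disease = c("Disease_A", "Disease_B"),
  rep     = 1:2
)
y <- matrix(rnorm(96 * nrow(meta)), nrow = 96,
            dimnames = list(paste0("gene", 1:96), NULL))

# PCA on samples: prcomp expects samples in rows, so transpose.
pca <- prcomp(t(y), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Attach the first two components to the sample annotation.
pcs <- cbind(meta, PC1 = pca$x[, 1], PC2 = pca$x[, 2])

# One panel per disease, coloured by stimulation, shaped by stage.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pcs, aes(PC1, PC2, colour = stim, shape = stage)) +
    geom_point(size = 2) +
    facet_wrap(~ disease)
  print(p)
}
```

If samples separate mainly by subject rather than by stimulation, that is a hint about which factors the design needs to account for.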

From here you can visualize contrasts in multiple ways. Plotting heatmaps of differentially regulated genes would be an appropriate option, as are volcano plots and Venn diagrams for overlapping genes. Pathway analyses may provide biological context depending on your dataset size, but it's worth thinking about your design before running these analyses.
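For the overlap question (genes common to all comparisons versus unique to one), a small base R sketch may be enough before reaching for a Venn diagram. It assumes a genes-by-contrasts matrix of calls coded -1, 0, 1, which is the shape limma's decideTests returns; the matrix `DE` below is simulated and the three contrast names are borrowed from the question purely for illustration.

```r
# Hypothetical DE calls: genes x contrasts, coded -1 (down), 0, 1 (up).
set.seed(1)
DE <- matrix(sample(c(-1, 0, 1), 96 * 3, replace = TRUE,
                    prob = c(0.1, 0.8, 0.1)),
             nrow = 96,
             dimnames = list(paste0("gene", 1:96),
                             c("DaRS2.1", "DaRS3.1", "DaRS4.1")))

# Treat up and down alike: a gene counts as DE if its call is non-zero.
is_de <- DE != 0
n_contrasts <- rowSums(is_de)

common_genes <- rownames(DE)[n_contrasts == ncol(DE)]  # DE in every contrast
unique_genes <- rownames(DE)[n_contrasts == 1]         # DE in exactly one contrast

# Number of DE genes per contrast.
colSums(is_de)
```

With more than four or five contrasts a Venn diagram gets hard to read, so a table of counts like this, or a heatmap of the call matrix itself, may scale better.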


Ali Barry, thank you for the information and input. I will work on the design further. Yes, facet_wrap is a good suggestion and I will try it. One question I have at the moment is how to calculate the variance and pick out the genes with the highest variance. I have a data.frame of fold changes, and I am interested in extracting the highly variable genes and plotting them. Is there a specific package for this, or a base function in R?


Would rowVars work for your data structure?
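A minimal sketch of what that could look like, assuming the values are in (or can be converted to) a numeric matrix with genes in rows. The matrix `x` and the cutoff of 20 genes are made up for illustration; `rowVars` comes from the matrixStats package, and the base R fallback `apply(x, 1, var)` gives the same per-row variances.

```r
set.seed(1)
# Hypothetical stand-in: 96 genes x 12 columns (samples or contrasts).
x <- matrix(rnorm(96 * 12), nrow = 96,
            dimnames = list(paste0("gene", 1:96), paste0("col", 1:12)))

# rowVars() needs a numeric matrix; as.matrix() is a no-op here but is
# required if x starts out as a data.frame.
x <- as.matrix(x)

# Per-gene variance; fall back to base R if matrixStats is not installed.
gene_var <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  matrixStats::rowVars(x)
} else {
  apply(x, 1, var)
}
names(gene_var) <- rownames(x)

# Keep the 20 most variable genes (20 is an arbitrary choice).
top_genes <- names(sort(gene_var, decreasing = TRUE))[1:20]
x_top <- x[top_genes, ]
```

`x_top` can then go straight into a heatmap function.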


Ali Barry

Sure, let me try rowVars.

@gordon-smyth
WEHI, Melbourne, Australia

"One way is to extract the differentially expressed genes with p.val = 0.05 and logFC = +/-1.0"

I strongly recommend using FDR < 0.05 and not using a fold-change cutoff.

"But this will be tedious and time consuming task"

You can easily choose genes that are significant for any of your comparisons using an F-test:

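# Fit all contrasts, then compute moderated statistics
fit2 <- contrasts.fit(fit, Cont.matrix)
fit2 <- eBayes(fit2)
# With no coef specified, topTable reports a moderated F-test across all contrasts
tab <- topTable(fit2, number=Inf, sort.by="none")
isDE <- (tab$adj.P.Val < 0.05)
# Heatmap of the genes that differ in at least one contrast
coolmap(y[isDE,])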

If there are too many DE genes to plot, just decrease the 0.05 FDR cutoff.

Alternatively you might want to choose all the genes that have significant t-tests for any comparison:

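# Test each contrast against a minimum fold change of 1.2
fit2.treat <- treat(fit2, fc=1.2)
DE <- decideTests(fit2.treat)
# decideTests codes genes as -1, 0 or 1, so count non-zero calls
# (a plain rowSums would let up and down calls cancel out)
anyDE <- (rowSums(DE != 0) > 0)
coolmap(y[anyDE,])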

It's all quite quick. Again, if you have too many genes to plot, then increase fc.

Here, y is the matrix of log-expression values.


Gordon Smyth, thank you very much. I will try your suggestions.

