Differential analysis based on selection of random samples
Beginner ▴ 60
@beginner-15939
Last seen 20 months ago
Switzerland

I have 35 tumor and 4 normal samples, and I'm using DESeq2 for differential analysis. The comparison between tumor and normal gave only two upregulated genes, which could be due to low statistical power. So I'm interested in selecting a random subset of the tumor samples, running the differential analysis on that subset, and repeating the process `n` times.

I have a matrix with genes as rows and samples as columns. Columns 1-35 are tumor and 36-39 are normal samples.

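    set.seed(123)          # set the seed before any random draw so the run is reproducible
    nb.replicates <- 10    # number of tumor samples drawn per analysis
    
    ## All four normal samples are always used (columns 36-39)
    samples.Normal <- 36:39
    
    ## Random sampling of the tumor samples (columns 1-35), without replacement
    samples.Tumor <- sample(1:35, size = nb.replicates, replace = FALSE)
    samples.Tumor
    
    selected.samples <- c(samples.Normal, samples.Tumor)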

So, with the above code I repeated the differential analysis `n` times. I get a different number of differentially expressed genes with each analysis.

Now, should I merge the results from all the analyses and consider only the common genes as differentially expressed for the whole cohort?

Each analysis compares the 4 normal samples against 10 random tumor samples.

                baseMean        log2FoldChange    lfcSE       stat    pvalue    padj
    AL357060.1    8.50582           6.1871       1.67335    3.54023    0.0003    0.03245

In another analysis, with the 4 normal samples vs another 10 random tumor samples, I see the same gene differentially expressed but with different values in the results:

                  baseMean     log2FoldChange       lfcSE        stat    pvalue    padj
    AL357060.1    10.58937424    6.552371044      1.6296950    3.85921    0.00011    0.02642

There are many genes whose results differ between analyses, so which one should I consider?

r differential gene expression deseq2 bioconductor rnaseq • 1.3k views
@mikelove
Last seen 1 day ago
United States

You will not increase power by reducing the sample size in one group.

You may get different sets of genes, and sometimes more than you get in the full analysis, but this is induced by your random process of sub-sampling. It's not an increase in power per se, just random fluctuation. So to answer "Now, from all the analysis should I merge and consider only common genes as differentially expressed genes for the whole cohort?": no, I would not recommend this random subsampling procedure.
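To see how much the list of hits can move purely because of which ten tumors were drawn, here is a minimal simulation sketch. It is not DESeq2: it swaps in Welch t-tests on simulated normal data so it runs in base R, and every setting other than the group sizes (number of genes, effect size, number of runs) is a made-up assumption for illustration.

```r
set.seed(1)
n.genes <- 1000; n.de <- 100       # hypothetical simulation settings
n.tumor <- 35;   n.normal <- 4     # group sizes from the question
n.sub   <- 10;   n.rep    <- 20    # tumor samples per run, number of runs

# Simulated log-scale expression; the first n.de genes are shifted in tumor
expr <- matrix(rnorm(n.genes * (n.tumor + n.normal)), nrow = n.genes)
tumor.cols  <- seq_len(n.tumor)
normal.cols <- n.tumor + seq_len(n.normal)
expr[seq_len(n.de), tumor.cols] <- expr[seq_len(n.de), tumor.cols] + 2

# Indices of genes with a BH-adjusted Welch t-test p-value below alpha
call.de <- function(tumor.idx, alpha = 0.05) {
  p <- apply(expr, 1, function(x) t.test(x[tumor.idx], x[normal.cols])$p.value)
  which(p.adjust(p, method = "BH") < alpha)
}

full.hits <- call.de(tumor.cols)                       # all 35 tumors
sub.hits  <- lapply(seq_len(n.rep),
                    function(i) call.de(sample(tumor.cols, n.sub)))

n.hits <- lengths(sub.hits)
length(full.hits)   # hits using all 35 tumor samples
range(n.hits)       # spread of hit counts across the random sub-samples
```

The underlying data are identical in every run, so any spread in `n.hits` comes only from the sub-sampling step.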

 

I would instead recommend increasing the FDR cutoff and looking only at the full dataset.
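As a rough illustration of what relaxing the FDR cutoff does, the sketch below applies the Benjamini-Hochberg adjustment (the default method behind DESeq2's `padj` column) to simulated p-values and counts the genes passing several cutoffs. The p-values and the list of cutoffs are hypothetical; with real data the cutoff would be set through the `alpha` argument of `results()`, as in the call quoted elsewhere in this thread.

```r
set.seed(42)
# Hypothetical p-values: 950 null genes (uniform) and 50 genes with signal
pvalue <- c(runif(950), rbeta(50, 0.5, 20))
padj   <- p.adjust(pvalue, method = "BH")   # Benjamini-Hochberg adjustment

# Number of genes called significant at each candidate FDR cutoff
cutoffs <- c(0.01, 0.05, 0.1, 0.2)
n.sig <- sapply(cutoffs, function(a) sum(padj < a))
names(n.sig) <- cutoffs
n.sig
```

A looser cutoff can only keep or lengthen the list, at the price of a larger expected share of false positives among the reported genes (about 10% at a cutoff of 0.1).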


Thanks for the reply. With the full analysis (35 tumor vs 4 normal samples) I got only 4 upregulated genes using the results function: `results(dds, lfcThreshold = log2(1.2), alpha = 0.05)`. I thought random subsampling could be used to get more upregulated genes from the different analyses, which I would then merge. You said this random subsampling procedure is not a good idea. Another reason I wanted to apply subsampling is that the tumor samples group into different clusters. MDS plot: https://imgur.com/a/YbB3wPV

Questions:

1) May I know when this subsampling can be applied?

2) You told me to increase the FDR cutoff with the full dataset analysis. So, what should the FDR cutoff be now? 0.01, 0.5, or 0.1?


I do not recommend subsampling.

You should pick an FDR cutoff that makes sense. That is up to you as the analyst.


Thanks. Could you please tell me when subsampling can be applied for differential analysis?


I have limited time to reply to users’ questions on the support site and I have to divide it among all the threads that are active. I believe I’ve already answered your question so I won’t be replying further.
