Question

Finding genes that are expressed only in one condition within a contrast


charles.foster ▴ 180

@charlesfoster-17652


Australia

Hi,

I have carried out differential expression analyses comparing conditions using DESeq2. Intuitively, I have considered genes to be expressed if they have a count of at least 10 in at least some libraries (sensu Chen et al: https://f1000research.com/articles/5-1438). Hence, I carried out a filtering step before the DE analysis using the filterByExpr function of edgeR. In my results, in addition to the p-values, LFC, etc., I have columns with per-condition baseMeans:

Gene   sampleA  sampleB  baseMeanA_cond1_vs_cond2  baseMeanB_cond1_vs_cond2
Gene1  cond1    cond2    0                         70.0618858219621
Gene2  cond1    cond2    0                         13.8155035471724

(apologies if the tab-delimited table shows up poorly)

To get these, I did (e.g.):

baseMeanA_cond1_vs_cond2 <- rowMeans(counts(dds, normalized=TRUE)[,colData(dds)$Tissue == "cond1"])
baseMeanB_cond1_vs_cond2 <- rowMeans(counts(dds, normalized=TRUE)[,colData(dds)$Tissue == "cond2"])

Now, I am looking to further refine my results to find any genes that are expressed in one condition and not expressed at all in the other. In this case, I do not want to know that Gene1 is upregulated in Condition2 relative to Condition1 but is still expressed in Condition1. I would just like to know that Gene1 is expressed in Condition2 and is not expressed in Condition1.

What would be the best way to do this?

From reading this site and the DESeq2 vignette, I know that the baseMean is "the mean of normalized counts of all samples, normalizing for sequencing depth." However, I'm a bit confused about 1) how my criterion on counts having to be >=10 to be expressed has been factored into the final baseMean results, and 2) how to subset my DE results to get expressed vs not expressed.

Is it as simple as getting all genes where the baseMean for condition1 = 0, and the baseMean for condition2 > 0? Or would it be genes where the baseMean for condition1 < 10, and the baseMean for condition2 >= 10?
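To make the difference between these two candidate rules concrete, here is a minimal sketch in base R. The matrix, gene names, sample names, and the cutoff of 10 are all made up for illustration; in practice the matrix would come from something like counts(dds, normalized=TRUE). Note that the cutoff here is applied to mean normalized counts, which is not the same thing as the raw-count filter applied before the DE analysis.

```r
# Toy matrix of normalized counts (hypothetical values): rows are genes, columns are samples
norm_counts <- matrix(
  c( 0,  0,  0, 65, 70, 75,   # geneA: absent in cond1, clearly present in cond2
     0,  0,  0,  3,  4,  5,   # geneB: absent in cond1, low in cond2
     4,  6,  5, 40, 50, 60,   # geneC: low but nonzero in cond1
    30, 35, 40, 30, 35, 40),  # geneD: present in both conditions
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:6))
)
condition <- factor(rep(c("cond1", "cond2"), each = 3))

# Per-condition means, computed the same way as the rowMeans() calls above
mean_cond1 <- rowMeans(norm_counts[, condition == "cond1", drop = FALSE])
mean_cond2 <- rowMeans(norm_counts[, condition == "cond2", drop = FALSE])

# Rule 1: mean is exactly zero in cond1 and anything above zero in cond2
rule_zero <- names(which(mean_cond1 == 0 & mean_cond2 > 0))

# Rule 2: mean is below a cutoff in cond1 and at or above it in cond2
cutoff <- 10  # assumed cutoff, for illustration only
rule_cutoff <- names(which(mean_cond1 < cutoff & mean_cond2 >= cutoff))

print(rule_zero)    # geneA and geneB
print(rule_cutoff)  # geneA and geneC
```

On this toy data the two rules disagree: the zero rule keeps geneB even though its cond2 signal is very low, while the cutoff rule keeps geneC even though it has a few counts in cond1.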

Also, if it's easier to do this separately to the DESeq2 results, I'm happy to do so, e.g. by subsetting a matrix of count values or TPM values or TMM values.

Thanks!

Charles

filter deseq2 • 1.1k views

Updated 6.1 years ago by Michael Love 43k • written 6.1 years ago by charles.foster ▴ 180

score 1 · Accepted Answer · 2019-03-14


Michael Love 43k

@mikelove


United States

We don’t have a way in DESeq2 to determine what counts or TPM correspond to “expressed” and what to “not expressed”. When students or collaborators want to do this, I typically recommend looking at histograms of abundance (TPM) over all genes.
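As a rough sketch of what such a histogram could look like, the base R snippet below plots log-scale abundance over all genes for one sample. The TPM values are simulated (a low mode plus a higher mode), and the pseudocount of 0.01 and the cutoff of 1 TPM are arbitrary assumptions for illustration, not recommended values; with real data the cutoff would be chosen by looking at where the distribution separates.

```r
set.seed(1)

# Simulated TPM values for 10,000 genes (hypothetical data):
# a low-abundance mode and a higher-abundance mode
tpm <- c(rlnorm(3000, meanlog = -2, sdlog = 1),
         rlnorm(7000, meanlog = 3,  sdlog = 1.2))

# Log-transform with a small pseudocount so zero values could also be shown
log_tpm <- log2(tpm + 0.01)

# Histogram of abundance over all genes
h <- hist(log_tpm, breaks = 60,
          main = "Abundance over all genes",
          xlab = "log2(TPM + 0.01)")

# Mark a candidate cutoff chosen by eye (here 1 TPM, an assumption)
tpm_cutoff <- 1
abline(v = log2(tpm_cutoff + 0.01), lty = 2)

# Flag genes at or above the chosen cutoff
expressed <- tpm >= tpm_cutoff
table(expressed)
```

If the histogram shows two modes, a cutoff placed in the valley between them gives a data-driven working definition of "expressed"; if it does not, any cutoff is a judgment call.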
