Question

contamination and DESeq2 performance

pablo_garcia05 ▴ 20

@pablo_garcia05-12443

Hello community, and especially M. Love,

Thank you for the support you give through this platform. I'm sure you have greatly improved the practice of RNA-seq analysis and corrected several mistakes.

I have read about the issues around low-count genes and DESeq2. I found several posts here in the support forum, and I'm fairly sure I have understood them. If I'm right, you have mentioned several times how well the independent filtering step performs, and that the "genefilter" approach is the basis for it.

However, just to clarify one specific problem: I have some contamination in my samples, coming for example from the digestive content (diets differ between the studied groups). Some people have warned me that genes from the contaminants should not appear in the results (even though they are actually there). Their reasoning is that, given the proportion of reads coming from my sample versus from the contaminants, these genes have low coverage, so I should filter out genes with low assigned counts (sorry, that's an awful sentence construction).

In contrast, I found you highlighting the risk of removing low-read-count genes and how well the pipeline deals with them.

Of course I'm quite confident in following your advice, but just to avoid arguing with my supervisor: is independent filtering the argument that supports this, or is there something more that I'm missing?

Thank you for your time, and sorry about this (maybe) awkward question.

Pablo

deseq2 low count genes independent filtering • 1.6k views

ADD COMMENT • link updated 7.0 years ago by hs.lansdell ▴ 20 • written 7.0 years ago by pablo_garcia05 ▴ 20

Thanks for that! The plot looks very strange: https://ibb.co/gARrh6

ADD REPLY • link 7.0 years ago hs.lansdell ▴ 20

Well you can see why it picks a high filter. You can disable this if it’s not desired.

ADD REPLY • link 7.0 years ago Michael Love 43k

score 0 · Answer 1 · 2017-12-12

Michael Love 43k

@mikelove

Let me rephrase and get more information. You are concerned because there are differences in diet across groups, which may affect gene expression levels, and there is an assumption that the affected genes will have low counts. Can you show me the experiment design? Which groups are you interested in comparing, and which groups had different diets?

ADD COMMENT • link 7.0 years ago Michael Love 43k

My explanation was terrible.

Yes, part of the design is focused on comparing diets. However, my problem is with those genes assembled from mRNA sequences that come from the digestive tract of my samples. Given the proportion of this material relative to the RNA extracted from my samples, I expect a lower number of counts mapping to them.

ADD REPLY • link 7.0 years ago pablo_garcia05 ▴ 20

It sounds like you don't want to do independent filtering then, as you are specifically interested in preserving low-count genes, even if this means less of an advantage for high-count genes. I'd recommend using a simple filter:

dds <- estimateSizeFactors(dds)
keep <- rowSums(counts(dds, normalized=TRUE) >= 5) >= x
dds <- dds[keep,]

Where you need to fill in a reasonable value for x (a common suggestion is the smallest sample size per group in your dataset).
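
As an illustration only (not part of the original reply), the row-sum filter above can be tried on a toy matrix in base R without DESeq2. The matrix `norm_counts`, its values, and the choice `x <- 3` are all made up for this sketch; in a real analysis the matrix would come from `counts(dds, normalized=TRUE)`.

```r
# Toy normalized count matrix: 5 genes x 6 samples (two groups of 3).
# All values are invented for illustration.
norm_counts <- matrix(
  c(120, 98, 143, 110, 131, 105,   # well expressed in every sample
      0,  1,   0,   2,   0,   1,   # near zero everywhere
      7,  9,   6,   0,   1,   0,   # low, but present in one whole group
      5,  0,   0,   0,   6,   0,   # reaches 5 in only two samples
      0,  0,   0,   0,   0,   0),  # never detected
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:6))
)

# Assumption for this sketch: smallest group size in the toy design is 3.
x <- 3

# Keep a gene if at least x samples have a normalized count of 5 or more.
keep <- rowSums(norm_counts >= 5) >= x

# Subset the matrix to the retained genes.
filtered <- norm_counts[keep, , drop = FALSE]
rownames(filtered)
```

With these made-up values, a gene that is low overall but consistently present in one group (gene3) passes the filter, while genes that reach the threshold in fewer than `x` samples are dropped.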

Then later, use:

res <- results(dds, independentFiltering=FALSE)

ADD REPLY • link 7.0 years ago Michael Love 43k

Thank you, M. Love, for spending your time here. But in fact, what I want to confirm is the effect of independent filtering in taking out low-count genes. Because if that is true, I have a contamination problem: in my samples I found a (few) differentially expressed genes that come from the gut content and not from the organisms.

ADD REPLY • link 7.0 years ago pablo_garcia05 ▴ 20