Wolfgang Huber
EMBL European Molecular Biology Laborat…
This thread has accumulated a good number of opinions and speculations
about what the best filter criterion and cutoff value are.
The 'genefilter' vignette "Diagnostics for independent filtering" [1],
which I mentioned previously, provides rational criteria for deciding
both in a data-dependent manner.
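
As a rough illustration of what "data-dependent" means here, the sketch below drops the fraction theta of genes with the lowest filter statistic, applies Benjamini-Hochberg adjustment to the remaining p-values, and counts the rejections, so that different cutoffs can be compared. It is a plain-Python sketch under stated assumptions, not the genefilter API: the function names, the toy filter statistics, and the toy p-values are all made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rejections_after_filtering(filter_stat, pvalues, theta, alpha=0.05):
    """Drop the fraction `theta` of genes with the lowest filter statistic,
    BH-adjust the survivors' p-values, and count rejections at level `alpha`."""
    n = len(filter_stat)
    n_drop = int(theta * n)
    order = sorted(range(n), key=lambda i: filter_stat[i])
    kept = order[n_drop:]
    adjusted = bh_adjust([pvalues[i] for i in kept])
    return sum(1 for p in adjusted if p <= alpha)


# Hypothetical toy data: one filter statistic (e.g. mean count) and one
# p-value per gene. The four low-count genes carry no signal.
filter_stat = [1, 2, 3, 4, 50, 60, 70, 80, 90, 100]
pvalues = [0.9, 0.8, 0.7, 0.6, 0.001, 0.5, 0.008, 0.012, 0.03, 0.04]

for theta in (0.0, 0.2, 0.4):
    n_rej = rejections_after_filtering(filter_stat, pvalues, theta)
    print(f"theta={theta:.1f}: {n_rej} rejections")
```

On this toy data, removing the four low-count genes raises the number of rejections from 3 to 5 at alpha = 0.05. The filter statistic has to be independent of the test statistic under the null hypothesis for this comparison to be valid, which is the point the vignette develops.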
Kind regards
Wolfgang
[1] http://bioconductor.org/packages/release/bioc/html/genefilter.html
On 30 Apr 2014, at 23:25, Steve Lianoglou <lianoglou.steve at gene.com> wrote:
> Hi,
>
> On Wed, Apr 30, 2014 at 1:11 PM, Ryan C. Thompson <rct at thompsonclan.org> wrote:
>> Filtering on raw counts has a statistical motivation, i.e. something like
>> "we can't do statistics with less than X reads". Filtering on CPM is
>> sometimes just used as a proxy for count-based filtering, but sometimes it
>> also has a biological motivation, i.e. "we believe that CPM < X represents
>> biological noise transcription rather than genuine regulated transcription
>> relevant to the biological system in question". So you have to consider what
>> your goals are for filtering and choose an appropriate method.
>
> Even still, in the "biological motivation" case: if you want to use
> CPM, shouldn't you really prefer {R|F}PKM so you don't "enrich" for
> removal of lowly expressed short transcripts while letting lowly
> expressed long transcripts slip through?
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Genentech