Guest User
@guest-user-4897
I am writing to ask about independent filtering for my large RNA-seq
dataset. I have raw counts for around 55,000 genes from 91
libraries/samples, consisting of 3 biological replicates for 4
different genotypes on germinating seeds. I am currently working on
differential expression and, subsequently, transcriptomic network
analysis for these samples. Before performing any of these analyses,
I'd like to apply independent filtering to my data to increase
detection power for differentially expressed genes. I will be using
your DESeq2 package (version 1.2.5) for filtering and differential
expression analysis.
Based on a recommendation from a statistician, I have decided to
perform the following steps:
1) Fit a negative binomial GLM with genotype & time effects across all
samples for all genes that have nonzero counts in at least one sample
2) Filter weakly expressed genes (for example using a filter like the
one implemented in HTSFilter)
3) Adjust p-values for genes passing the filter to correct for
multiple testing
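To check my understanding of steps 2 and 3, here is a minimal sketch in
plain Python (not DESeq2 or HTSFilter code). It assumes the filter is a
simple mean-count threshold computed without looking at the genotype or
time labels, and that the multiple-testing correction is
Benjamini-Hochberg. The function names, the threshold and the toy
numbers are all made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_adjust(mean_counts, pvalues, min_mean):
    """Adjust p-values only for genes whose mean count is >= min_mean.

    Genes that fail the filter get None instead of an adjusted p-value.
    """
    keep = [i for i, c in enumerate(mean_counts) if c >= min_mean]
    adj = benjamini_hochberg([pvalues[i] for i in keep])
    result = [None] * len(pvalues)
    for i, a in zip(keep, adj):
        result[i] = a
    return result


# Toy example: 5 genes, hypothetical mean counts and raw p-values.
mean_counts = [0.5, 120.0, 35.0, 2.0, 400.0]
pvalues = [0.80, 0.001, 0.03, 0.60, 0.02]
padj = filter_then_adjust(mean_counts, pvalues, min_mean=10.0)
print(padj)
```

In this toy case two genes are removed by the filter, so the correction
is applied over 3 tests instead of 5.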
While the DESeq2 package is nicely written, I am not a statistician
and am still unclear on a few things, mainly the workflow for my
analysis. Based on my understanding of the DESeq2 documentation, I
should be doing the following (in chronological order):
1. First, run the differential expression analysis (the DESeq function
on the dds object) on my raw gene counts, which includes library size
normalization. This step will fit a negative binomial generalized
linear model with genotype & time effects across all samples for all
genes that have nonzero counts in at least one sample.
2. Second, use the result I obtain from step 1 for the independent
filtering step, using the filtered_p function from the genefilter
package.
3. Third, use the result from step 2 to further filter weakly
expressed genes using the HTSFilter package.
4. Finally, adjust p-values for genes passing the filter to correct
for multiple testing. I am not entirely sure how to do this. Can I
perform this step using the DESeq2 package?
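For the library size normalization mentioned in step 1, my
understanding is that it follows the median-of-ratios idea. Below is a
minimal sketch of that idea in plain Python, not DESeq2's actual code:
the function name and the toy count matrix are made up, and I assume
genes with a zero count in any sample are left out of the calculation.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    usable = []   # genes with no zero counts
    log_geo = []  # log of each usable gene's geometric mean across samples
    for row in counts:
        if all(c > 0 for c in row):
            usable.append(row)
            log_geo.append(sum(math.log(c) for c in row) / n_samples)
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy example: 4 genes x 2 samples; sample 2 is sequenced twice as deeply.
counts = [[10, 20], [100, 200], [0, 5], [30, 60]]
sf = size_factors(counts)
print(sf)
```

Dividing each sample's counts by its size factor puts the samples on a
common scale; in the toy example the second factor is twice the first.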
Furthermore, does DESeq2 take care of PCR duplicate artifacts?
-- output of sessionInfo():
none
--
Sent via the guest posting facility at bioconductor.org.