Question

Filtering and Reference Level for DESeq2

kcarey • 0

@b626d890

United States

Hello All,

Grad student here. I am new to DESeq2 and I have two fundamental questions. I am working with a large dataset of 392 patients. I started with 35,458 genes, which is pretty high, so I wanted to do my own filtering. A colleague gave me code that filters on rowMeans, while the vignette filters on rowSums using smallestGroupSize. Intuitively, rowMeans filtering doesn't make sense to me if the data are raw (unnormalized) counts, so I went with the rowSums code. My smallest group is an mRNA subtype with n=82, so I used this:

Filtering:


smallestGroupSize <- 82
# Keep genes with a count of at least 10 in at least smallestGroupSize samples
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep, ]
dds
# now 26,392 genes

I was told that 10-15k genes is best for RNA-seq data, so I still have a lot of genes. DESeq2 will also perform internal filtering, correct? So this may not be an issue. I don't want to filter out biologically meaningful results. Is this the best practice for this package?

Reference Level:

My next question is about the reference level. I don't have normal vs. diseased, or control vs. treatment. I want to test for differential expression between the mRNA subtypes. Is there a way to have no reference level? I saw that the default is alphabetical order. Or is it possible to make my reference level the mRNA subtype that has the best clinical outcome (i.e., based on survival)?

Any thoughtful advice would be helpful! Thanks

DESeq2 GeneFiltering RNAseq • 1.3k views

11 months ago kcarey • 0

score 0 · Answer 1 · 2024-03-20

Michael Love 43k

@mikelove

United States

Yes, results() also performs filtering by default (independent filtering).

I would maybe lower smallestGroupSize to 40 or even 50. The point of pre-filtering is just to avoid unnecessary computation.
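To see how much the threshold matters, here is a minimal sketch in base R that compares a stricter and a more relaxed value of the minimum number of samples. The count matrix is simulated toy data, and the helper name filter_by_group_size and the thresholds 8 and 4 are made up for illustration. With a real DESeqDataSet you would pass counts(dds) instead of the toy matrix, as in the question's code.

```r
set.seed(42)

# Hypothetical toy data: 1000 genes x 20 samples of raw counts
n_genes <- 1000
n_samples <- 20
gene_means <- rexp(n_genes, rate = 1 / 20)
counts_mat <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(gene_means, n_samples), size = 2),
  nrow = n_genes
)

# Keep a gene if it has a count of at least min_count
# in at least min_samples samples (illustrative helper, not a DESeq2 function)
filter_by_group_size <- function(counts_mat, min_samples, min_count = 10) {
  rowSums(counts_mat >= min_count) >= min_samples
}

keep_strict  <- filter_by_group_size(counts_mat, min_samples = 8)
keep_relaxed <- filter_by_group_size(counts_mat, min_samples = 4)

# Number of genes retained under each threshold
c(strict = sum(keep_strict), relaxed = sum(keep_relaxed))
```

Lowering the sample threshold can only keep the same genes or more, so the relaxed filter always retains a superset of the strict one.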

"My next question is, for the reference level. I don't have a normal vs diseased, or control vs treatment."

The reference level doesn't really matter if you supply the contrast argument to results(). But yes, you can set the reference level; the code is in the vignette.
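As a rough sketch of both options: relevel() sets the reference level of a factor, and the contrast argument to results() takes the factor name, the numerator level, and the denominator level. The subtype names below are invented, and the dataset comes from makeExampleDESeqDataSet() rather than real data, so treat this as an illustration under those assumptions, not the vignette's exact code. The DESeq2 part only runs if the package is installed.

```r
# Hypothetical subtype labels for 12 samples (names are made up)
subtype <- factor(rep(c("basal", "classical", "mesenchymal"), each = 4))
levels(subtype)   # default order is alphabetical, so "basal" is the reference

# Option 1: choose the reference level explicitly
subtype <- relevel(subtype, ref = "classical")
levels(subtype)   # "classical" now comes first

# Option 2: name both groups in the contrast, so the reference is irrelevant
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  dds <- makeExampleDESeqDataSet(n = 500, m = 12)  # simulated counts
  dds$subtype <- subtype
  design(dds) <- ~ subtype
  dds <- DESeq(dds)
  # log2 fold change of mesenchymal over basal; neither is the reference level
  res <- results(dds, contrast = c("subtype", "mesenchymal", "basal"))
  head(res)
}
```

With more than two subtypes, each pairwise comparison can be pulled out with its own results() call using a different contrast, without refitting the model.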