Hello All,
Grad student here. I am new to DESeq2 and have two fundamental questions. I am working with a large dataset of 392 patients. I started with 35,458 genes, which is pretty high, so I wanted to do my own pre-filtering. A colleague gave me code that filters on rowMeans, while the vignette suggests a rowSums filter based on smallestGroupSize. Intuitively, rowMeans filtering doesn't make sense to me if the data are raw (unnormalized) counts, so I went with the rowSums code. My smallest group is an mRNA subtype with n = 82, so I used this:
Filtering:
smallestGroupSize <- 82
# Keep genes with a count of at least 10 in at least smallestGroupSize samples
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep, ]
dds
# now 26,392 genes
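To check my understanding of what this filter does compared with a plain total-count filter, here is a minimal sketch on a toy matrix. The matrix, the gene and sample names, and the group size of 3 are all made up for illustration; only the two filter expressions mirror the code above.

```r
# Toy raw-count matrix: 6 genes x 8 samples (hypothetical data, not my real counts)
counts_mat <- matrix(
  c(rep(0, 8),                 # gene1: never detected
    c(200, rep(0, 7)),         # gene2: a single outlier sample
    rep(12, 8),                # gene3: modest counts in every sample
    c(rep(15, 3), rep(0, 5)),  # gene4: detected in exactly 3 samples
    c(rep(15, 2), rep(0, 6)),  # gene5: detected in only 2 samples
    rep(9, 8)),                # gene6: just under 10 in every sample
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("s", 1:8))
)

# Assumed size of the smallest group in this toy example
smallest_group_size <- 3

# Total-count filter: at least 10 reads summed over all samples
keep_total <- rowSums(counts_mat) >= 10

# Group-size filter: a count of at least 10 in at least smallest_group_size samples
keep_group <- rowSums(counts_mat >= 10) >= smallest_group_size

print(data.frame(total = keep_total, group = keep_group))
```

In this toy case the total-count filter only drops gene1, while the group-size filter also drops the single-outlier gene (gene2), the gene seen in too few samples (gene5), and the gene that never reaches 10 in any sample (gene6).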
I was told that 10-15k genes is best for RNA sequencing data, so I still have a lot of genes. DESeq2 will also perform internal filtering, correct? If so, this may not be an issue. I don't want to lose biologically meaningful results. Is this the best practice for this package?
Reference Level:
My next question is about the reference level. I don't have normal vs. diseased or control vs. treatment. I want to see differential expression between the mRNA subtypes. Is there a way to have no reference level? I saw that the default is alphabetical order. Or is it possible to make my reference level the mRNA subtype that has the best clinical outcome (i.e., based on survival)?
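For concreteness, this is a minimal sketch of what I think setting the reference level and listing the pairwise subtype comparisons would look like. The subtype labels "A" to "D", the choice of "C" as the best-outcome subtype, and the design column name `subtype` are all hypothetical. The runnable part is base R only; the DESeq2 call is left as a comment because it needs a fitted `dds` object.

```r
# Hypothetical subtype labels; by default factor levels are alphabetical,
# so "A" would be the reference level
subtype <- factor(rep(c("A", "B", "C", "D"), each = 3))
print(levels(subtype))

# Assumption: "C" is the subtype with the best clinical outcome,
# so make it the reference (first) level
subtype <- relevel(subtype, ref = "C")
print(levels(subtype))

# Enumerate every pair of subtypes for pairwise comparisons
pairs <- combn(levels(subtype), 2, simplify = FALSE)

for (p in pairs) {
  cat("compare", p[1], "vs", p[2], "\n")
  # With a fitted DESeq2 object and a design column named `subtype` (assumed),
  # each pair could be requested as:
  # res <- results(dds, contrast = c("subtype", p[1], p[2]))
}
```

If I read the documentation correctly, the `contrast` argument takes the factor name, then the numerator level, then the denominator level, so any two subtypes can be compared directly whichever level is the reference.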
Any thoughtful advice would be helpful! Thanks
Perfect, thanks Michael!