Hi,
I am trying to improve my RNA-Seq DGE analysis practice and would like to check whether I have been doing it correctly.
The first part is about tximport. For DESeq2, I use the recommended approach:
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)
If I am not mistaken, this uses an offset matrix to correct for transcript length. For limma:
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, countsFromAbundance = "lengthScaledTPM")
y <- DGEList(txi$counts)
So my first question is: if I want to remove some of the samples from the DESeqDataSet or DGEList object, should I just subset the object, like:
dds <- dds[,sample_i_want_to_keep]
# or
y <- y[,sample_i_want_to_keep]
Or should I redo the tximport() process? Would it affect the adjustment? My understanding is that TPM adjusts for gene length and library size, so it shouldn't, but I am not sure.
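A toy base-R sketch of the "lengthScaledTPM" arithmetic may help frame this question. It assumes, from a reading of tximport rather than anything confirmed in this thread, that the scaled counts are abundance times each gene's average length across the imported samples, rescaled so every column keeps its original count total. The helper length_scaled_tpm() and all numbers are hypothetical.

```r
# Hypothetical helper mirroring the assumed "lengthScaledTPM" arithmetic.
length_scaled_tpm <- function(counts, abundance, len) {
  new_counts <- abundance * rowMeans(len)           # per-gene average length
  scale_fac <- colSums(counts) / colSums(new_counts)
  sweep(new_counts, 2, scale_fac, "*")              # restore column totals
}

# Toy inputs (made-up values): 3 genes x 3 samples.
dn <- list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
counts <- matrix(c(100, 300, 600, 200, 600, 1200, 300, 900, 1800),
                 nrow = 3, dimnames = dn)
tpm <- matrix(rep(c(2e5, 3e5, 5e5), 3), nrow = 3, dimnames = dn)
len <- matrix(c(1000, 500, 2000, 1000, 500, 2000, 2000, 500, 2000),
              nrow = 3, dimnames = dn)

keep <- c("s1", "s2")

# Option 1: scale using all samples, then subset the columns.
subset_after <- length_scaled_tpm(counts, tpm, len)[, keep]

# Option 2: subset first, then scale (analogous to re-running the import).
recomputed <- length_scaled_tpm(counts[, keep], tpm[, keep], len[, keep])

print(colSums(subset_after))
print(colSums(recomputed))
print(round(subset_after - recomputed, 1))
```

Under these assumptions the column totals agree between the two options, while individual gene values shift because the average length is taken over a different set of samples. The sketch does not show whether that shift matters for a real analysis.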
The second part of the question is about the DGE design matrix. Let's say I have something like this, where donor is the origin of the cells, and I want to compare NTC against the other conditions and also compare A, B and C with each other.
sample  donor  Condition
1_NTC   1      NTC
1_A     1      A
1_B     1      B
1_C     1      C
2_NTC   2      NTC
2_A     2      A
2_B     2      B
2_C     2      C
3_A     3      A
Due to a technical issue, 3_NTC, 3_B and 3_C could not be sequenced. When creating the design matrix, should I include donor or not? How will it affect the results when I compare NTC vs non-NTC and between A, B and C, given that donor 3 has no NTC, B or C sample? I assume the answer would be the same for DESeq2 and limma, since both are GLM-based?
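To make the design question concrete, here is a minimal base-R sketch that builds an additive ~ donor + Condition design from the table above and inspects it. Choosing NTC as the reference level is an assumption, and the sketch only examines the linear algebra of the design; it does not run DESeq2 or limma.

```r
# Sample table copied from the post; row names are the sample IDs.
samples <- data.frame(
  donor = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3)),
  Condition = factor(c("NTC", "A", "B", "C", "NTC", "A", "B", "C", "A"),
                     levels = c("NTC", "A", "B", "C")),  # NTC as reference (assumed)
  row.names = c("1_NTC", "1_A", "1_B", "1_C",
                "2_NTC", "2_A", "2_B", "2_C", "3_A")
)

# Additive design with donor as a blocking factor.
design <- model.matrix(~ donor + Condition, data = samples)

# All coefficients are estimable if the rank equals the number of columns.
full_rank <- qr(design)$rank == ncol(design)

# Leverage of each sample: diagonal of the hat matrix, computed from Q.
leverage <- rowSums(qr.Q(qr(design))^2)
names(leverage) <- rownames(design)

print(colnames(design))
print(full_rank)
print(round(leverage, 3))
```

In this sketch 3_A has a leverage of 1, because the donor 3 column is non-zero only for that sample. The donor 3 coefficient therefore fits 3_A exactly, and under this additive model the sample leaves no residual information for the Condition comparisons. How each package behaves with such a sample in practice is not shown here.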
Thanks a lot! Hope the questions make sense.
Thanks Michael Love! I just reread the tximport tutorial and have a related question about the scaling. So we don't have to scale for transcript length, but we always have to normalise for library size. My question is about limma-voom: in the tutorial you recommend using "scaledTPM" to normalise the counts to library size, but a few lines later you normalise again. Is the calcNormFactors() step necessary given that the counts are already normalised? Will it change the values of the original counts?
scaledTPM doesn't correct for library size. For example, on the files in the man page:
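Separately from the man-page files, the same point can be illustrated with a toy base-R sketch. It assumes that "scaledTPM" means the abundances are rescaled so that each column sums to that sample's original count total; the helper scaled_tpm() and all numbers are hypothetical.

```r
# Hypothetical helper mirroring the assumed "scaledTPM" arithmetic.
scaled_tpm <- function(counts, abundance) {
  scale_fac <- colSums(counts) / colSums(abundance)
  sweep(abundance, 2, scale_fac, "*")  # each column regains its count total
}

# Toy inputs (made-up values): 3 genes x 2 samples,
# with sample s2 sequenced twice as deeply as s1.
dn <- list(c("g1", "g2", "g3"), c("s1", "s2"))
counts <- matrix(c(100, 300, 600, 200, 600, 1200), nrow = 3, dimnames = dn)
tpm <- matrix(rep(c(2e5, 3e5, 5e5), 2), nrow = 3, dimnames = dn)

scaled <- scaled_tpm(counts, tpm)

print(colSums(counts))  # library sizes of the original counts
print(colSums(scaled))  # library sizes after scaling
```

Under that assumption the scaled columns still differ in total by the same factor of two as the original counts, so a library-size normalisation step downstream is not redundant.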