Dear Community,
Here are my questions:
(1) Which type of data is best for correlation or survival analysis, e.g., DESeq2-normalised counts, TPM, or FPKM?
(2) Which kind of normalised count data could be used for my desired analyses?
i. RSEM expected_count (DESeq2 standardized)
This kind of data could be fetched from UCSC XENA (https://xenabrowser.net/datapages/?dataset=TCGA-GTEx-TARGET-gene-exp-counts.deseq2-normalized.log2&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443)
ii. Transformation with the following DESeq2 code:
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = exprSet, colData = metadata, design = ~ group)
dds <- dds[rowSums(counts(dds)) > 1, ]        # drop genes with essentially no counts
vsd <- vst(dds, blind = FALSE)                # variance stabilizing transformation
expr.normalised <- as.data.frame(assay(vsd))  # used for correlation or survival analysis
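For context, this is roughly how I plan to use expr.normalised downstream (a minimal sketch; "GeneA"/"GeneB" and the time/status columns in metadata are hypothetical placeholders):
library(survival)
## Correlation between two genes across samples (placeholder gene IDs)
geneA <- unlist(expr.normalised["GeneA", ])
geneB <- unlist(expr.normalised["GeneB", ])
cor.test(geneA, geneB, method = "spearman")
## Cox model: dichotomise one gene at its median expression
## (assumes metadata carries hypothetical 'time' and 'status' columns)
grp <- ifelse(geneA > median(geneA), "high", "low")
coxph(Surv(metadata$time, metadata$status) ~ grp)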
Hi ATpoint, thanks for your reply and for pointing me to the thread. But I still have some questions:
(1) dds <- estimateSizeFactors(dds); ntd <- normTransform(dds) was suggested instead of the vst transformation because of elapsed time. May I ask whether the two methods can be substituted for each other in correlation or survival analysis?
(2) Is the vst transformation code shown above correct?
(3) Should a z-score transformation be applied after the vst transformation in the case of correlation and survival analysis?
(4) Which is the best type of data for correlation and survival analysis?
Thanks again! And looking forward to your reply!
I don't know what is optimal for survival analysis. We basically have our transformations and the motivation for them (see the workflow for detailed discussion). But it's up to you which you use for what application.
I do _not_ recommend z-score after VST. The whole point of the VST is to stabilize the features so that they are all on a comparable scale and you haven't inflated the noise in the data. Dividing by the SD undoes that.
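As a quick illustration on the vsd object from the code above (just a sketch, not part of the workflow):
## Row-wise z-scoring forces every gene back to unit variance,
## discarding the stabilized scale the VST produced
mat <- assay(vsd)
z <- t(scale(t(mat)))        # z-score each gene across samples
summary(apply(mat, 1, sd))   # SDs vary after VST; low-count genes are damped
summary(apply(z, 1, sd))     # all exactly 1 after z-scoring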
Thanks for your reply and your excellent package! Could you please tell me the difference between the vst-transformed data and the "ntd" produced by the following code? Sorry, I'm a novice in bioinformatics analysis.
(1) dds <- estimateSizeFactors(dds); ntd <- normTransform(dds); expr.normalised <- as.data.frame(assay(ntd))
(2) vsd <- vst(dds, blind = FALSE); expr.normalised <- as.data.frame(assay(vsd))
Furthermore, can I do differential expression analysis on these two kinds of data using wilcox.test? If not, are there any other methods that could be used? Or does it make no sense to call DEGs from these two kinds of data?
Besides, do these data need quantile normalization after that? Here are two boxplots made from the two kinds of expr.normalised data.
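(For reference, the boxplots were drawn roughly like this:)
## Per-sample distributions of the two transformed matrices
boxplot(as.data.frame(assay(ntd)), las = 2, main = "normTransform")
boxplot(as.data.frame(assay(vsd)), las = 2, main = "vst")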
Thanks so much! Looking forward to your reply.
VST is the variance stabilizing transformation, and in the code it produces vsd, a variance-stabilized dataset. So VST data = vsd.
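One way to see the difference between the two transforms is the mean-SD comparison used in the workflow (a sketch, assuming the ntd and vsd objects from your code):
library(vsn)
meanSdPlot(assay(ntd))   # log2(n + 1): SD inflated for low-count genes
meanSdPlot(assay(vsd))   # VST: roughly flat SD across the range of means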
No, see here:
https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#exploratory-analysis-and-visualization
We generally recommend using DESeq() for differential expression, not a Wilcoxon test on VST data.
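That is, run the standard pipeline on the raw counts (a minimal sketch; the extracted contrast depends on your design):
dds <- DESeq(dds)     # fits the DESeq2 model on the raw counts
res <- results(dds)   # default contrast: last variable in the design formula
summary(res)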
No, that's not part of our default pipeline.