Which is the best type of data for correlation or survival analysis
Yang Shi

Dear Community,

Here are my questions:

(1) Which kind of data is best for correlation or survival analysis, e.g., DESeq2-normalised counts, TPM, or FPKM?

(2) Which of the following kinds of normalised count data could be used for my desired analysis?

i. RSEM expected_count (DESeq2 standardized)

This kind of data could be fetched from UCSC XENA (https://xenabrowser.net/datapages/?dataset=TCGA-GTEx-TARGET-gene-exp-counts.deseq2-normalized.log2&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443)

ii. Transformation with the following DESeq2 code:

library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = exprSet, colData = metadata, design = ~ group)
dds <- dds[rowSums(counts(dds)) > 1, ]
vsd <- vst(dds, blind = FALSE)
expr.normalised <- as.data.frame(assay(vsd))  # used for correlation or survival analysis
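
For illustration, a minimal sketch of the intended downstream use of expr.normalised (correlation between two genes, and a Cox model for survival); the clinical columns os_time/os_event, the gene IDs, and the dummy values are placeholders, not real TCGA fields:

library(survival)  # Surv() and coxph() for the survival part

expr.mat <- as.matrix(expr.normalised)  # genes x samples, on the VST scale

# Placeholder clinical table; in a real analysis these columns come from the TCGA clinical data.
clinical <- data.frame(
  sample   = colnames(expr.mat),
  os_time  = rexp(ncol(expr.mat), rate = 0.01),            # dummy follow-up times
  os_event = rbinom(ncol(expr.mat), size = 1, prob = 0.5)  # dummy event indicator
)

# (a) Correlation between two genes ("GENE_A"/"GENE_B" are placeholder row names).
cor(expr.mat["GENE_A", ], expr.mat["GENE_B", ], method = "spearman")

# (b) Cox regression of overall survival on one gene's expression.
clinical$expr_A <- expr.mat["GENE_A", clinical$sample]
fit <- coxph(Surv(os_time, os_event) ~ expr_A, data = clinical)
summary(fit)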
Tags: TPM, RNASeq, RSEM, DESeq2
ATpoint

Asked before: DESeq2 for survival analysis

Hi ATpoint, thanks for your reply and for pointing me to that thread. But I still have some questions:

(1) In that thread, dds <- estimateSizeFactors(dds); ntd <- normTransform(dds) was suggested instead of the vst transformation because of the elapsed time (a timing sketch comparing the two follows after this post). May I ask whether the two methods can be substituted for each other for correlation or survival analysis?

(2) Is the vst transformation code shown above correct?

(3) Should a z-score transformation be done after the vst transformation for correlation and survival analysis?

(4) Which is the best type of data for correlation and survival analysis?

Thanks again! And looking forward to your reply!
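
(For reference, a minimal timing sketch for question (1), reusing the dds object built in the original post; the actual elapsed times will depend on the size of the count matrix.)

library(DESeq2)

dds <- estimateSizeFactors(dds)               # size factors are needed before normTransform()
system.time(ntd <- normTransform(dds))        # shifted log transform: log2(normalized counts + 1)
system.time(vsd <- vst(dds, blind = FALSE))   # variance stabilizing transformation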

I don't know what is optimal for survival analysis. We basically have our transformations and the motivation for them (see the workflow for detailed discussion). But it's up to you which you use for what application.

I do _not_ recommend z-score after VST. The whole point of the VST is to stabilize the features so that they are all on a comparable scale and you haven't inflated the noise in the data. Dividing by SD undoes that.
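
To make that concrete, a per-gene z-score after the VST would be something like the sketch below; it is shown only to illustrate what is being advised against, reusing the vsd object from above:

# NOT recommended: per-gene z-score of the VST matrix.
z <- t(scale(t(assay(vsd))))   # center and scale each gene (row) to mean 0, SD 1
# Every gene is now forced to SD 1, so low-count, noisy genes that the VST had
# kept on a comparable, stabilized scale are re-inflated relative to real signal.
summary(apply(z, 1, sd))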

Thanks for your reply and your excellent package! Could you please tell me the difference between the vst-transformed data and the "ntd" data produced by the following code? Sorry, I'm a novice in bioinformatics analysis.

(1) dds <- estimateSizeFactors(dds); ntd <- normTransform(dds); expr.normalised <- as.data.frame(assay(ntd))

(2) vsd <- vst(dds, blind = FALSE); expr.normalised <- as.data.frame(assay(vsd))

Furthermore, can I do differential expression (DEG) analysis on these two kinds of data using wilcox.test? If not, are there any other methods that could be used? Or does it make no sense to do DEG analysis on these two kinds of data?

Besides, do these data need quantile normalization after that? Here are two boxplots made from the two kinds of expr.normalised data.

Thanks so much! Looking forward to your reply.

[Two boxplots of the expr.normalised data from snippets (1) and (2)]

VST is the variance stabilizing transformation, and in the code it produces vsd, a variance-stabilized dataset. VST data = vsd.
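
One way to see the difference in practice is the mean-SD plot used in the rnaseqGene workflow linked below; a minimal sketch, assuming the ntd and vsd objects from the two snippets above:

library(vsn)             # provides meanSdPlot()
meanSdPlot(assay(ntd))   # log2(normalized counts + 1): SD is elevated for low-count genes
meanSdPlot(assay(vsd))   # VST: the mean-SD trend is flattened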

can I do differential expression (DEG) analysis on these two kinds of data using wilcox.test?

No, see here:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#exploratory-analysis-and-visualization

We in general recommend using DESeq() for differential expression, not Wilcoxon on VST data.
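
For completeness, a minimal sketch of that standard route, assuming the dds object from the original post with group as the design factor (the level names "tumor"/"normal" are placeholders):

dds <- DESeq(dds)                                              # fits the model on the raw counts
res <- results(dds, contrast = c("group", "tumor", "normal"))  # placeholder level names
summary(res)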

Besides, do these data need quantile normalization after that?

No, that's not part of our default pipeline.
