Question

How to find differentially expressed genes for paired tumor-normal samples without any biological replicates?

seeker • 0

@seeker-7773

Last seen 2.8 years ago

Netherlands

Dear all,

I would like to run an analysis with a multilevel design on paired samples.

My design looks like this:

> colData(dds) =

	sample	condition
	<factor>	<factor>
sample_1	sample_1	Control
sample_2	sample_1	Tumor
sample_3	sample_2	Control
sample_4	sample_2	Tumor
sample_5	sample_3	Control
sample_6	sample_3	Tumor
sample_7	sample_4	Control
sample_8	sample_4	Tumor

> dds=DESeqDataSetFromMatrix( countData = nlDe, colData = colData, design = ~ sample+ condition)

> mcols(res, use.name = T)

DataFrame with 6 rows and 2 columns

	type	description
baseMean	intermediate	mean of normalized counts for all samples
log2FoldChange	results	log2 fold change (MAP): condition Tumor vs Control
lfcSE	results	standard error: condition Tumor vs Control
stat	results	Wald statistic: condition Tumor vs Control
pvalue	results	Wald test p-value: condition Tumor vs Control
padj	results	BH adjusted p-values

> resultsNames(dds)
[1] "Intercept" "sample_1" "sample_2" "sample_3" "sample_4" "conditionControl"
[7] "conditionTumor"

I was wondering whether this is the right way to do the analysis.

rnaseq deseq2 • 2.3k views

ADD COMMENT • link updated 9.0 years ago by Michael Love 43k • written 9.0 years ago by seeker • 0

Only because this comes up so often: of course you do have biological replicates. You have four patients, not just one. This counts as replication.
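
A minimal sketch of why the four patients act as replicates, using base R only. The variable names patient and condition are illustrative and not taken from the poster's script; the layout mirrors the colData shown in the question, and the design matrix is the standard full-rank one from model.matrix, which may differ from what DESeq2 builds internally.

```r
# Paired layout from the question: 4 patients, each with Control and Tumor
patient <- factor(rep(paste0("sample_", 1:4), each = 2))
condition <- factor(rep(c("Control", "Tumor"), times = 4),
                    levels = c("Control", "Tumor"))

# Design matrix for ~ patient + condition:
# an intercept, three patient offsets, and one Tumor-vs-Control effect
X <- model.matrix(~ patient + condition)
colnames(X)

# Residual degrees of freedom = samples minus estimated coefficients
n_coef <- qr(X)$rank
resid_df <- nrow(X) - n_coef
resid_df
```

With 8 samples and 5 coefficients, 3 residual degrees of freedom remain for estimating variability, which is what the replication across patients provides.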

ADD REPLY • link 9.0 years ago Simon Anders ★ 3.8k

But these samples are from four different patients.

ADD REPLY • link 9.0 years ago seeker • 0

score 1 · Answer 1 · 2015-11-26

Michael Love 43k

@mikelove

Last seen 8 hours ago

United States

Yes, the results table you want is just: results(dds)
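
For reference, a sketch of the same call with the comparison spelled out explicitly. It assumes DESeq2 is installed and uses simulated counts from makeExampleDESeqDataSet in place of the real data; the column name patient is a hypothetical stand-in for the poster's sample column.

```r
library(DESeq2)

# Simulated counts standing in for the real data (200 genes, 8 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 8)

# Overwrite the example annotation with the paired layout from the question
dds$patient <- factor(rep(paste0("patient_", 1:4), each = 2))
dds$condition <- factor(rep(c("Control", "Tumor"), times = 4),
                        levels = c("Control", "Tumor"))

# Patient as a blocking factor, condition as the effect of interest
design(dds) <- ~ patient + condition
dds <- DESeq(dds)

# Name the comparison explicitly: Tumor (numerator) vs Control (denominator)
res <- results(dds, contrast = c("condition", "Tumor", "Control"))

# The description column states which comparison the table reports
mcols(res, use.names = TRUE)$description
```

The description column should name the Tumor vs Control comparison; whether it is labelled MAP or MLE depends on the DESeq2 version and its shrinkage defaults.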

ADD COMMENT • link 9.0 years ago Michael Love 43k

Michael,

But when I tried mcols(res, use.names = TRUE), I got:

DataFrame with 6 rows and 2 columns
	type	description
	<character>	<character>
baseMean	intermediate	mean of normalized counts for all samples
log2FoldChange	results	log2 fold change (MLE): sample_4.conditionTumor
lfcSE	results	standard error: sample_4.conditionTumor
stat	results	Wald statistic: sample_4.conditionTumor
pvalue	results	Wald test p-value: sample_4.conditionTumor
padj	results	BH adjusted p-values

ADD REPLY • link 9.0 years ago seeker • 0

You're skipping some lines of code, which makes it difficult to help. Please post all your code and the output of sessionInfo().

ADD REPLY • link 9.0 years ago Michael Love 43k

Please find the code below:

# The separator must be the tab escape "\t" (not "/t")
sample2 <- read.table("read.count.txt", sep = "\t", stringsAsFactors = FALSE, header = TRUE)

sample2 <- sample2[!duplicated(sample2$Gene),]

comSamples <- sample2[,-1]
rownames(comSamples) <- sample2[,1]

sample_1 <- colnames(comSamples)
my_conditions <- factor(rep(c("Control","Tumor"),length(sample_1)/2))
dex <- factor(rep((1:4),each =2))
sample <- paste("sample",dex,sep ="_")

colData <- data.frame(sample = sample,condition=my_conditions,row.names= sample_1)

colData(dds) =

	sample	condition
	<factor>	<factor>
sample_1	sample_1	Control
sample_2	sample_1	Tumor
sample_3	sample_2	Control
sample_4	sample_2	Tumor
sample_5	sample_3	Control
sample_6	sample_3	Tumor
sample_7	sample_4	Control
sample_8	sample_4	Tumor

thres= 8
nzIndex= as.vector(which(apply(comSamples,1,function(x){sum(x>thres)/length(x)})>=0.5))
nlDe = comSamples[nzIndex,]

dds=DESeqDataSetFromMatrix( countData = nlDe, colData = colData, design = ~sample+condition)
dds$condition <- relevel(dds$condition, "Control")
dds <- DESeq(dds)
res <- results(dds)

mcols(res, use.names=T) =
DataFrame with 6 rows and 2 columns
	type	description
	<character>	<character>
baseMean	intermediate	mean of normalized counts for all samples
log2FoldChange	results	log2 fold change (MAP): condition Tumor vs Control
lfcSE	results	standard error: condition Tumor vs Control
stat	results	Wald statistic: condition Tumor vs Control
pvalue	results	Wald test p-value: condition Tumor vs Control
padj	results	BH adjusted p-values

write.csv(as.data.frame(mcols(res, use.name = T)),file = "./output/DATE-DESeq2-test-conditions.csv")

plotMA(dds, ylim=c(-8,8),main = "RNAseq experiment")

result.table <- as.data.frame(res)
sig_gene <- row.names(result.table)[which(abs(result.table$log2FoldChange) >1)]
sig_gene_table <- result.table[which(abs(result.table$log2FoldChange) >1),]

# padj can be NA even when pvalue is not (e.g. after independent filtering), so filter on padj
sig_gene_table <- sig_gene_table[!is.na(sig_gene_table$padj),]
sig_gene_table <- sig_gene_table[sig_gene_table$padj < 0.05,]

resdata <- merge(sig_gene_table, as.data.frame(counts(dds,normalized=T)), by='row.names',sort=F)

sessionInfo()

R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (unknown)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq2_1.10.0 RcppArmadillo_0.6.200.2.0 Rcpp_0.12.2 SummarizedExperiment_1.0.1
[5] Biobase_2.30.0 GenomicRanges_1.22.1 GenomeInfoDb_1.6.1 IRanges_2.4.4
[9] S4Vectors_0.8.3 BiocGenerics_0.16.1 biomaRt_2.26.1

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3 XVector_0.10.0 bitops_1.0-6 futile.options_1.0.0
[7] tools_3.2.1 zlibbioc_1.16.0 rpart_4.1-10 digest_0.6.8 annotate_1.48.0 lattice_0.20-33
[13] RSQLite_1.0.0 gtable_0.1.2 DBI_0.3.1 proto_0.3-10 gridExtra_2.0.0 genefilter_1.52.0
[19] cluster_2.0.3 stringr_1.0.0 locfit_1.5-9.1 nnet_7.3-11 grid_3.2.1 AnnotationDbi_1.32.0
[25] survival_2.38-3 XML_3.98-1.3 BiocParallel_1.4.0 foreign_0.8-66 latticeExtra_0.6-26 Formula_1.2-1
[31] geneplotter_1.48.0 ggplot2_1.0.1 reshape2_1.4.1 lambda.r_1.1.7 magrittr_1.5 splines_3.2.1
[37] scales_0.3.0 Hmisc_3.17-0 MASS_7.3-45 xtable_1.8-0 colorspace_1.2-6 stringi_1.0-1
[43] acepack_1.3-3.3 RCurl_1.95-4.7 munsell_0.4.2

ADD REPLY • link 9.0 years ago seeker • 0

Yes, this is giving you the comparison you want. See: "log2 fold change (MAP): condition Tumor vs Control"

ADD REPLY • link 9.0 years ago Michael Love 43k

The reason I was worried about my design is that, if I look at the normalized read counts (rld) of tumor/control for the Fgfr1 gene, there is an average fold change of 1.5.

Sample_1_Cont	Sample_1_Tum	Sample_2_Cont	Sample_2_Tumor	Sample_3_Cont	Sample_3_Tumor	Sample_4_Cont	Sample_4_Tumor
1.61928676	14.44964664	247.8401588	390.4711465	907.34300164	113.9341785	264.5305537	714.9689033

But in the results table, the log2FoldChange is 0.66.

Gene	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
Fgfr1	223.1446095	0.668350599	0.173459896	3.853055467	0.000116653	0.006547737

ADD REPLY • link 9.0 years ago seeker • 0

You should be averaging on the log2 scale. While the first sample has a large, positive log2 fold change, it is not so large for the other samples, and it is large in the negative direction for sample 3.

In addition, the log2 fold change provided by DESeq2 is the maximum a posteriori (MAP) estimate, so it's not easily calculated from normalized counts. Take a look at the DESeq2 paper: http://www.genomebiology.com/2014/15/12/550
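
A rough illustration of the averaging point, using the eight Fgfr1 values quoted above and base R only. The vector names control and tumor are illustrative. This is a back-of-the-envelope check, not a reproduction of DESeq2's model-based estimate.

```r
# Fgfr1 values as quoted in the thread, ordered by patient 1 to 4
control <- c(1.61928676, 247.8401588, 907.34300164, 264.5305537)
tumor   <- c(14.44964664, 390.4711465, 113.9341785, 714.9689033)

# Per-patient log2 fold change, Tumor vs Control
lfc <- log2(tumor / control)
round(lfc, 2)

# Mean on the log2 scale: up- and down-regulation count symmetrically
mean_lfc <- mean(lfc)
mean_lfc

# Mean of the raw ratios: dominated by the single largest ratio (patient 1)
mean_ratio <- mean(tumor / control)
mean_ratio
```

The per-patient values come out at roughly 3.16, 0.66, -2.99 and 1.43, so the mean on the log2 scale is about 0.56. That is in the same range as the 0.67 in the results table, while the mean of the raw ratios is above 3 because patient 1 dominates it.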

ADD REPLY • link 9.0 years ago Michael Love 43k
