Question

Deseq2 Differential expression analysis design

alva.james • 0

@alvajames-6967

Last seen 6.6 years ago

Germany

Hello all,

I have a data set from 5 patients, each sampled at three time points (ID, CR, Rel). CR is used as the control, and ID and Rel are the treated samples.

Currently the design looks like this:

Sample        type       condition
BM_Rel_pat_4  patient_4  Rel
BM_Rel_pat_5  patient_5  Rel
BM_Rel_pat_6  patient_6  Rel
BM_Rel_pat_7  patient_7  Rel
BM_Rel_pat_8  patient_8  Rel
BM_ID_Pat_4   patient_4  ID
BM_ID_Pat_5   patient_5  ID
BM_ID_Pat_6   patient_6  ID
BM_ID_Pat_7   patient_7  ID
BM_ID_Pat_8   patient_8  ID
BM_CR_pat4    patient_4  CR
BM_CR_pat5    patient_5  CR
BM_CR_pat6    patient_6  CR
BM_CR_pat7    patient_7  CR
BM_CR_pat8    patient_8  CR

The code for the design with two conditions (ID and Rel as treated, CR as control):

library(DESeq2)

# Read the count table: genes in rows, one column per sample
single_pat = read.table( "/home/alva/AML_patients/BM_ID_Rel_CR/ID_REl_CR_BM_counts", header=TRUE, row.names=1 )
head(single_pat)

# Sample table: the 5 patients repeat once per condition, in the column order shown above
single_patDesign = data.frame(row.names = colnames( single_pat ),
    type = factor(rep(c("patient_4","patient_5","patient_6","patient_7","patient_8"), times=3)),
    condition = factor(rep(c("Rel","ID","CR"), each=5)))
condition = single_patDesign$condition
type = single_patDesign$type

# Build the data set with condition as the only term in the design
cds <- DESeqDataSetFromMatrix(countData=single_pat, colData=single_patDesign, design=~condition )

dds <- DESeq(cds)
res <- results(dds)

This is one method I have already tried.

In addition, I would like to get DE genes by comparing ID vs CR and Rel vs CR within the same analysis. How should I specify the design for this in DESeq2? I want to use this design to get a cluster dendrogram of the DE gene set with three separate clusters. Is this approach correct, or should I approach it differently?

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] gplots_2.14.2             RColorBrewer_1.0-5       
 [3] genefilter_1.42.0         DESeq2_1.0.19            
 [5] RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3              
 [7] lattice_0.20-29           Biobase_2.20.1           
 [9] GenomicRanges_1.12.5      IRanges_1.18.4           
[11] BiocGenerics_0.6.0       

loaded via a namespace (and not attached):
 [1] annotate_1.38.0      AnnotationDbi_1.22.6 bitops_1.0-6        
 [4] caTools_1.17.1       DBI_0.3.1            gdata_2.13.3        
 [7] grid_3.0.2           gtools_3.4.1         KernSmooth_2.23-13  
[10] locfit_1.5-9.1       RSQLite_0.11.4       splines_3.0.2       
[13] stats4_3.0.2         survival_2.37-7      XML_3.98-1.1        
[16] xtable_1.7-4

Thank you for any assistance.


alva.james • 0

Hello Michael,

Thanks for the reply and the solutions.

Regarding the clustering part:

"Can you say more about what kind of clustering you want to do?"

I am doing supervised clustering. The code I am using for it is:

library(gplots)        # provides heatmap.2
library(RColorBrewer)  # provides brewer.pal

# Table of DE genes (rows) by samples (columns)
y <- read.table( "Up_down_DE_ID_rel_genes", header=TRUE, row.names=1)
## Row (gene) clustering on 1 - Spearman correlation
hr <- hclust(as.dist(1-cor(t(y), method="spearman")), method="complete")
## Column clustering (adjust here distance/linkage methods to what you need!)
hc <- hclust(as.dist(1-cor(y, method="spearman")), method="complete")
### saving image here
png(file="Up_down_DE_ID_rel.png", units="in", width=11, height=8.5, res=300)
heatmap.2(as.matrix(y), Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc), scale="row", col=rev(brewer.pal(11, "RdBu")), density.info="none", trace="none", margins=c(10,5))
dev.off()

"Do you want to cluster genes, samples or patients?"

Samples, based on the differentially expressed genes (using their counts). How does the clustering differ between genes, samples and patients? How can it be different if the same dataset (the DE genes) is used? Could you please explain briefly?

Thank you for your help.


score 1 · Accepted Answer · 2014-11-03

Hi,

A couple of questions/comments: first, you are using the Bioconductor release from April 2013. You should update to the latest version of Bioconductor, which was released just a month ago. See the instructions here: http://bioconductor.org/install/

Secondly, we recommend storing phenotypic data in a separate CSV or TSV file to prevent dangerous typos in your script.
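As a minimal sketch of that recommendation, the sample table can be read from a file and checked against the columns of the count matrix before any data set is built. The file name samples.csv, its column names, and the toy count matrix below are assumptions for illustration only, not files from this thread.

```r
# Hypothetical sample sheet; in practice this file is maintained by hand or
# exported from a spreadsheet rather than written from R.
sample_file <- file.path(tempdir(), "samples.csv")
writeLines(c(
  "sample,type,condition",
  "BM_CR_pat4,patient_4,CR",
  "BM_ID_Pat_4,patient_4,ID",
  "BM_Rel_pat_4,patient_4,Rel"
), sample_file)

# Toy count matrix standing in for the real counts (columns in another order).
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("gene", 1:4),
                                 c("BM_Rel_pat_4", "BM_ID_Pat_4", "BM_CR_pat4")))

# First CSV column becomes the row names; text columns become factors.
coldata <- read.csv(sample_file, row.names = 1, stringsAsFactors = TRUE)

# Fail early if a sample is missing or misspelled, then reorder rows to match.
stopifnot(setequal(rownames(coldata), colnames(counts)))
coldata <- coldata[colnames(counts), , drop = FALSE]
stopifnot(identical(rownames(coldata), colnames(counts)))
```

With the rows of the sample table in the same order as the count columns, a mislabeled sample shows up as an error here instead of as a silently wrong comparison later.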

"I would like to get De genes by comparing ID vs CR and Rel Vs CR within the same analysis"

If you want to make these comparisons while controlling for patient effects, use a design of ~ patient + condition (or ~ type + condition with the columns you have above; note that the column names are not restricted to 'type' and 'condition'). Then build the results tables like so:

dds = DESeq(dds)
resID = results(dds, contrast=c("condition","ID","CR"))
resRel = results(dds, contrast=c("condition","Rel","CR"))
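To see what a paired design of this kind encodes, here is a small base R sketch that expands the formula with model.matrix() on a sample table laid out like the one in the question. It only illustrates the formula; DESeq2 may build its model matrix differently internally depending on the version, and the explicit relevel() to CR is an assumption about which level should act as the reference.

```r
# Sample table mirroring the question: 5 patients, each with Rel, ID and CR.
coldata <- data.frame(
  type      = factor(rep(paste0("patient_", 4:8), times = 3)),
  condition = factor(rep(c("Rel", "ID", "CR"), each = 5))
)

# Make CR the reference level explicitly so condition effects are relative to CR.
coldata$condition <- relevel(coldata$condition, ref = "CR")

# Expand the formula: one intercept, four patient offsets, two condition effects.
mm <- model.matrix(~ type + condition, data = coldata)
colnames(mm)
```

The columns conditionID and conditionRel correspond to the ID vs CR and Rel vs CR comparisons, each estimated after accounting for patient-to-patient differences.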

"I wanted to do this Design for getting a cluster dendrogram , with DE gene set with three separated clusters."

Can you say more about what kind of clustering you want to do? Do you want to cluster genes, samples or patients? Typically, clustering of samples is "unsupervised", which means you use an algorithm that does not "see" or have access to the phenotypic information. So clustering of samples is a separate task from the differential expression analysis performed by the DESeq() function.
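A rough base R sketch of the distinction, using simulated counts rather than data from this thread: clustering samples compares the columns of the expression matrix, while clustering genes compares its rows, and neither step uses the condition labels. The log2(x + 1) transform here is a simple stand-in for a proper variance-stabilizing transformation, and the distance and linkage choices are assumptions.

```r
set.seed(1)

# Simulated counts standing in for real data: 200 genes x 6 samples.
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

# Simple log transform so a few very high counts do not dominate the distances.
logc <- log2(counts + 1)

# Clustering samples: distances between columns, hence the transpose.
hc_samples <- hclust(dist(t(logc)), method = "complete")

# Clustering genes: distances between rows of the same matrix.
hc_genes <- hclust(dist(logc), method = "complete")
```

Calling plot(hc_samples) draws the sample dendrogram. Because no phenotype column enters the calculation, any grouping by condition that appears is found by the algorithm rather than imposed on it.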