Sample label swap in HTML results
2
0
Entering edit mode
maximinio • 0
@maximinio-13364
Last seen 7.4 years ago
San Gerardo Hospital, Monza, Italy

This is my sample data frame:

sampleTable = data.frame(  condition = c("ctrl","ctrl","ctrl","resistant","resistant","resistant"),
                           replicate = c(1:3,1:3),
                           row.names = c("Ctrl1","Ctrl2","Ctrl4","AS4","BD1","BS1"),
                           stringsAsFactors = TRUE, check.names = FALSE,
                           libType = c("paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end"),
                           countName = c("Ctrl1_DEXSeq.tabular", "Ctrl2_DEXSeq.tabular", "Ctrl4_DEXSeq.tabular", "AS4_DEXSeq.tabular", "BD1_DEXSeq.tabular", "BS1_DEXSeq.tabular") )


I am sure the tabular files are correct because I checked them by comparing the RPMs with those obtained with htseq.
Nevertheless, when I test for DEU and estimate exon FC, the results seem to be fine but the 3 controls and the 3 experimental groups are evidently swapped.

In this picture for instance the RPMs belonging to the last exons are clearly higher in the control tab files as compared to the others:
dexseq

Could it be that the samples are erroneously processed alphabetically and so the AS4, BD1 and BS1 come first to the Ctrls?

Thank you in advance!
Best,
Luca

dexseq dexseqhtml • 2.8k views
ADD COMMENT
0
Entering edit mode

Please post the complete R code that you used to get from the sample and count tables to the plot you posted. Thanks.

ADD REPLY
0
Entering edit mode

Could you also the output of your sessionInfo() please?

ADD REPLY
0
Entering edit mode

It'll take some time...BRB!

Thank you all!!

ADD REPLY
0
Entering edit mode
> sessionInfo()
R version 3.4.0 Patched (2017-06-02 r72765)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.1252 
[2] LC_CTYPE=Italian_Italy.1252   
[3] LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.1252    

attached base packages:
[1] stats4    parallel  stats    
[4] graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.22.0             
 [2] RColorBrewer_1.1-2        
 [3] AnnotationDbi_1.38.1      
 [4] DESeq2_1.16.1             
 [5] SummarizedExperiment_1.6.3
 [6] DelayedArray_0.2.7        
 [7] matrixStats_0.52.2        
 [8] GenomicRanges_1.28.3      
 [9] GenomeInfoDb_1.12.2       
[10] IRanges_2.10.2            
[11] S4Vectors_0.14.3          
[12] Biobase_2.36.2            
[13] BiocGenerics_0.22.0       
[14] BiocParallel_1.10.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11           
 [2] locfit_1.5-9.1         
 [3] lattice_0.20-35        
 [4] Biostrings_2.44.1      
 [5] Rsamtools_1.28.0       
 [6] digest_0.6.12          
 [7] plyr_1.8.4             
 [8] backports_1.1.0        
 [9] acepack_1.4.1          
[10] RSQLite_2.0            
[11] ggplot2_2.2.1          
[12] zlibbioc_1.22.0        
[13] rlang_0.1.1            
[14] lazyeval_0.2.0         
[15] data.table_1.10.4      
[16] annotate_1.54.0        
[17] blob_1.1.0             
[18] rpart_4.1-11           
[19] Matrix_1.2-10          
[20] checkmate_1.8.2        
[21] splines_3.4.0          
[22] statmod_1.4.30         
[23] geneplotter_1.54.0     
[24] stringr_1.2.0          
[25] foreign_0.8-69         
[26] htmlwidgets_0.8        
[27] biomaRt_2.32.1         
[28] RCurl_1.95-4.8         
[29] bit_1.1-12             
[30] munsell_0.4.3          
[31] compiler_3.4.0         
[32] base64enc_0.1-3        
[33] htmltools_0.3.6        
[34] nnet_7.3-12            
[35] tibble_1.3.3           
[36] gridExtra_2.2.1        
[37] htmlTable_1.9          
[38] GenomeInfoDbData_0.99.0
[39] Hmisc_4.0-3            
[40] XML_3.98-1.9           
[41] bitops_1.0-6           
[42] grid_3.4.0             
[43] xtable_1.8-2           
[44] gtable_0.2.0           
[45] DBI_0.7                
[46] magrittr_1.5           
[47] scales_0.4.1           
[48] stringi_1.1.5          
[49] XVector_0.16.0         
[50] hwriter_1.3.2          
[51] genefilter_1.58.1      
[52] latticeExtra_0.6-28    
[53] Formula_1.2-1          
[54] tools_3.4.0            
[55] bit64_0.9-7            
[56] survival_2.41-3        
[57] colorspace_1.3-2       
[58] cluster_2.0.6          
[59] memoise_1.1.0          
[60] knitr_1.16
ADD REPLY
0
Entering edit mode

Thank you in advance for your help!

ADD REPLY
0
Entering edit mode

Please also post the relevant lines of your count tables that lead you to think that 'control' is expressed above 'resistant', and the output of sizeFactors(dxd) 

ADD REPLY
0
Entering edit mode
> sizeFactors(dxd)
 [1] 0.9904217 0.8435785 0.9805609
 [4] 1.2139886 1.1632014 0.9221617
 [7] 0.9904217 0.8435785 0.9805609
[10] 1.2139886 1.1632014 0.9221617
ADD REPLY
1
Entering edit mode
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 4 months ago
Novartis Institutes for BioMedical Rese…

Hi Maximino. Thanks for sending your files. I think the the sample swap is in your code. Specifically the line below is not returning the tabular files in the same order as you are specifying 

list.files(full.names=TRUE,pattern=".tabular")

[1] "./AS4_DEXSeq.tabular"   "./BD1_DEXSeq.tabular"   "./BS1_DEXSeq.tabular"

[4] "./Ctrl1_DEXSeq.tabular" "./Ctrl2_DEXSeq.tabular" "./Ctrl4_DEXSeq.tabular"

Is not returning the files in the same order as your annotation table:

> sampleTable

      condition replicate    libType            countName

Ctrl1      ctrl         1 paired-end Ctrl1_DEXSeq.tabular

Ctrl2      ctrl         2 paired-end Ctrl2_DEXSeq.tabular

Ctrl4      ctrl         3 paired-end Ctrl4_DEXSeq.tabular

AS4   resistant         1 paired-end   AS4_DEXSeq.tabular

BD1   resistant         2 paired-end   BD1_DEXSeq.tabular

BS1   resistant         3 paired-end   BS1_DEXSeq.tabular
ADD COMMENT
0
Entering edit mode

Hi Alejandro, thank you for getting back to the issue.

So in other words you are saying that since

countFiles <- list.files(full.names=TRUE,pattern=".tabular")

list the files in alphabetical order, the sampleTable should list the files in the same order...right?

ADD REPLY
0
Entering edit mode

Yes! So either change the order of the sampleTable rows or the order of the file names. The important thing is that these are matching. 

ADD REPLY
0
Entering edit mode

Thank you!!!!

Best,

Luca

ADD REPLY
0
Entering edit mode
maximinio • 0
@maximinio-13364
Last seen 7.4 years ago
San Gerardo Hospital, Monza, Italy
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_AS4_S59_RNASeq_hg38Aligned.sortedByCoord.out.bam AS4_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_BD1_S61_RNASeq_hg38Aligned.sortedByCoord.out.bam BD1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_BS1_S60_RNASeq_hg38Aligned.sortedByCoord.out.bam BS1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl1Aligned.sortedByCoord.out.bam Ctrl1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl2Aligned.sortedByCoord.out.bam Ctrl2_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl4Aligned.sortedByCoord.out.bam Ctrl4_DEXSeq.tabular -f bam

 


sampleTable = data.frame( condition = c("ctrl","ctrl","ctrl","resistant","resistant","resistant"),

replicate = c(1:3,1:3),

row.names = c("Ctrl1","Ctrl2","Ctrl4","AS4","BD1","BS1"),

stringsAsFactors = TRUE, check.names = FALSE,

libType = c("paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end"),

countName = c("Ctrl1_DEXSeq.tabular", "Ctrl2_DEXSeq.tabular", "Ctrl4_DEXSeq.tabular", "AS4_DEXSeq.tabular", "BD1_DEXSeq.tabular", "BS1_DEXSeq.tabular")

)


countFiles <- list.files(full.names=TRUE,pattern=".tabular")
countFiles


flattenedFile = file.path("gencode.v26.annotation_DEXSeq.gff")
flattenedFile


fullModel <- ~ sample + exon + condition:exon

dxd = DEXSeqDataSetFromHTSeq( countfiles=countFiles,

sampleData=sampleTable,

design=fullModel,

flattenedfile=flattenedFile)


dxd = estimateSizeFactors(dxd)


dxd = estimateDispersions(dxd)


dxd <- testForDEU(dxd)
dxd <- estimateExonFoldChanges(dxd, fitExpToVar="condition")


dxr1 <- DEXSeqResults(dxd)


DEXSeqHTML( dxr1, FDR=0.1, color=c("#FF000080", "#0000FF80") )
ADD COMMENT
0
Entering edit mode

Hi Maximino. I made a couple of tests and I could not reproduce the sample swap that you mentioned. Sorry for the multiple request, but could you include the normalized counts (norCounts) in the plotDEXSeq function?

ADD REPLY
0
Entering edit mode

Not really sure I understood the Q...

The code I used for the plot above is:

plotDEXSeq( dxr1, "ENSG00000006576.16", legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )

Ant these are the counts in the tabular files:

As you can see, the 38th and the 39th exons have higher expression in the ctrls according to the tabular files and viceversa in the graph above.

ADD REPLY
0
Entering edit mode

Sorry for not being clear. Could you post the plot resulting from the following line of code?

plotDEXSeq( dxr1, "ENSG00000006576.16", norCounts=TRUE, legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )
ADD REPLY
0
Entering edit mode

Here you go:

ADD REPLY
0
Entering edit mode

I believe the answer is there: the samples are not swapped. The count matrix that you are showing is not normalized for sequencing depth. If you do normalize for sequencing depth, you should get consistent values to the normalized counts plot.

counts(dxd, normalized=TRUE)

 

ADD REPLY
0
Entering edit mode

My bad, I thought it was done by default.

Nevertheless, even with the normalized counts exons 38 and 39 are still plotted higher than controls...maybe I am missing something here.

ADD REPLY
0
Entering edit mode

From the raw count matrix that you posted, you will see that the counts for the controls are generally higher than the resistant samples (not only for exons 38 and 39, but for all exons). Once you normalize for sequencing depth, the counts for the resistant samples are higher than in the resistant samples only for exons 38 and 39. This means that the higher raw counts in the control samples are due to sequencing depth.

ADD REPLY
0
Entering edit mode

Clear, although I am still missing something...

It appears to me that exons 38 and 39 (which I took as an example, many more exons are actually different) display higher values in resistants either without or with normalization.

ADD REPLY
0
Entering edit mode

I see... in that case my previous explanation is not valid. But still, I can't reproduce these results. Would you mind sending me your raw files so I can try to reproduce the swap?

Alejandro 

ADD REPLY
0
Entering edit mode

Thank you!

At which email address should I send them?

ADD REPLY
0
Entering edit mode

alejandro.reyes.ds at gmail please!

ADD REPLY

Login before adding your answer.

Traffic: 608 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6