DESeq2: Analyzing just two normal+tumour pairs
1
0
Entering edit mode
mmokrejs • 0
@mmokrejs-16289
Last seen 6.0 years ago

Hi,

  I have RNAseq from two cancer+normal cases. I used salmon and proceeded with DESeq2. I do get the ./results/ directory out but the PDF figures are broken. The error message for example in boxplot.ENSG00000077454.15.pdf states: "Error using packet 1 arguments must have same length". The X-axis is labeled "factor", Y-axis is labeled "ENSG.... normalized counts".

$ cat samples.txt
pop    center    assay    sample    type    patient    experiment    topic    quant_file
1    lab    KAPA_stranded_RNAseq    PleioAdenVz1    cancer    patient1    PleiomorphicAdenoma1_S1    patient1_cancer    ./PleioAdenVz1/PleiomorphicAdenoma1_S1.gencode_v28.salmon_quant/quant.sf
2    lab    KAPA_stranded_RNAseq    PleioAdenVz2    normal    patient1    PleiomorphicAdenoma2_S2    patient1_normal    ./PleioAdenVz2/PleiomorphicAdenoma2_S2.gencode_v28.salmon_quant/quant.sf
3    lab    KAPA_stranded_RNAseq    PleioAdenVz9    cancer    patient2    PleiomorphicAdenoma9_S3    patient2_cancer    ./PleioAdenVz9/PleiomorphicAdenoma9_S3.gencode_v28.salmon_quant/quant.sf
4    lab    KAPA_stranded_RNAseq    PleioAdenVz10    normal    patient2    PleiomorphicAdenoma10_S4    patient2_normal    ./PleioAdenVz10/PleiomorphicAdenoma10_S4.gencode_v28.salmon_quant/quant.sf

$

 

I did something like the following:

 

$ for d in PleioAdenVz1 PleioAdenVz2 PleioAdenVz9 PleioAdenVz10; do

  pushd $d

   salmon quant --numThreads $threads --validateMappings --writeUnmappedNames -i ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all -l ISF -1 <(gunzip -c *.fwd_reads.gz) -2 <(gunzip -c *.rev_reads.gz) -o ./${d}.salmon_quant

  popd

done

$

I see I should have used --gencode argument because later tximport broke on the extra bars delimiting gene aliases:

 

[ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|, ...]
# if somebody forgot to run salmon with --gencode here is a workaround
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)

 

I think I am still doing the right thing so far.

 

library(tximportData)
library(GenomicFeatures)

txdb <- makeTxDbFromGFF(file="ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz")
txdb
saveDb(x=txdb, file = "gencode.v28.annotation.TxDb")

k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")
head(k)
head(tx2gene)
dim(tx2gene)
length(k)

dir <- '.'
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
samples

files <- file.path(samples$quant_file)
names(files) <- samples$run
all(file.exists(files))
files

library(tximport)
# work around
# [ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|, ...]
# if somebody forgot to run salmon with --gencode
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)
names(txi)

library(DESeq2)
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient)

library(RColorBrewer)

dds <- DESeq(dds)
resultsNames(dds)

# now the shaky part

# dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient + type:patient)
#dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type)
dds$type <- relevel(dds$type, ref="normal")
dds$IndividualN <- factor(c(1,1,2,2))
dds <- DESeq(dds)
#res <- results(dds,contrast=list("type","cancer","normal"))
res <- results(dds,contrast=c("type","cancer","normal"))

library(ReportingTools)
desReport <- HTMLReport(shortName = "RNAseq_analysis_with_DESeq",
                        title = "Results of differential expression analysis using DESeq2 (cancer vs. normal)",
                        reportDirectory = "./reports")
publish(res, desReport, DataSet = dds, n = 50, pvalueCutoff = 0.01, lfc = 1,
        make.plots = TRUE, factor = dds$dex)
finish(desReport)

 

As you can see I tried several DESeqDataSetFromTximport alternatives but I always ended up with no result. Well, the page ./reports/RNAseq_analysis_with_DESeq.html does show p-values but can I trust that if elsewhere is silently broke? I haven't seen an error from DESeq2 in my R session.

 

ENSG00000077454.15 mini.ENSG00000077454.15.png -7.08 1.23e-38 4.09e-35
ENSG00000114098.17 mini.ENSG00000114098.17.png -3.58 8.59e-13 5.72e-10
ENSG00000119321.8 mini.ENSG00000119321.8.png -5.37 1.56e-47

1.30e-43

 

Thank you for your help.

deseq2 reporting tools • 1.1k views
ADD COMMENT
0
Entering edit mode
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /apps/gentoo/usr/lib64/librefblas.so.3.7.0
LAPACK: /apps/gentoo/usr/lib64/libreflapack.so.3.7.0

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2          pheatmap_1.0.10            
 [3] ReportingTools_2.22.0       knitr_1.20                 
 [5] DESeq2_1.22.1               SummarizedExperiment_1.12.0
 [7] DelayedArray_0.8.0          BiocParallel_1.16.1        
 [9] matrixStats_0.54.0          tximport_1.10.0            
[11] readr_1.2.1                 GenomicFeatures_1.34.1     
[13] AnnotationDbi_1.44.0        Biobase_2.42.0             
[15] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
[17] IRanges_2.16.0              S4Vectors_0.20.1           
[19] BiocGenerics_0.28.0         tximportData_1.10.0        

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2         hwriter_1.3.2            biovizBase_1.30.0       
  [4] htmlTable_1.12           XVector_0.22.0           base64enc_0.1-3         
  [7] dichromat_2.0-0          rstudioapi_0.8           bit64_0.9-7             
 [10] R.methodsS3_1.7.1        splines_3.5.1            ggbio_1.30.0            
 [13] geneplotter_1.60.0       Formula_1.2-3            Rsamtools_1.34.0        
 [16] annotate_1.60.0          cluster_2.0.7-1          GO.db_3.7.0             
 [19] R.oo_1.22.0              graph_1.60.0             BiocManager_1.30.4      
 [22] compiler_3.5.1           httr_1.3.1               GOstats_2.48.0          
 [25] backports_1.1.2          assertthat_0.2.0         Matrix_1.2-15           
 [28] lazyeval_0.2.1           limma_3.38.2             acepack_1.4.1           
 [31] htmltools_0.3.6          prettyunits_1.0.2        tools_3.5.1             
 [34] bindrcpp_0.2.2           gtable_0.2.0             glue_1.3.0              
 [37] GenomeInfoDbData_1.2.0   Category_2.48.0          reshape2_1.4.3          
 [40] dplyr_0.7.8              Rcpp_1.0.0               Biostrings_2.50.1       
 [43] rtracklayer_1.42.1       stringr_1.3.1            ensembldb_2.6.3         
 [46] XML_3.98-1.16            edgeR_3.24.0             zlibbioc_1.28.0         
 [49] scales_1.0.0             BSgenome_1.50.0          VariantAnnotation_1.28.2
 [52] hms_0.4.2                ProtGenerics_1.14.0      RBGL_1.58.1             
 [55] AnnotationFilter_1.6.0   curl_3.2                 memoise_1.1.0           
 [58] gridExtra_2.3            ggplot2_3.1.0            biomaRt_2.38.0          
 [61] rpart_4.1-13             reshape_0.8.8            latticeExtra_0.6-28     
 [64] stringi_1.2.4            RSQLite_2.1.1            genefilter_1.64.0       
 [67] checkmate_1.8.5          rlang_0.3.0.1            pkgconfig_2.0.2         
 [70] bitops_1.0-6             lattice_0.20-38          purrr_0.2.5             
 [73] bindr_0.1.1              GenomicAlignments_1.18.0 htmlwidgets_1.3         
 [76] bit_1.1-14               tidyselect_0.2.5         GSEABase_1.44.0         
 [79] AnnotationForge_1.24.0   GGally_1.4.0             plyr_1.8.4              
 [82] magrittr_1.5             R6_2.3.0                 Hmisc_4.1-1             
 [85] DBI_1.0.0                pillar_1.3.0             foreign_0.8-71          
 [88] survival_2.43-3          RCurl_1.95-4.11          nnet_7.3-12             
 [91] tibble_1.4.2             crayon_1.3.4             OrganismDbi_1.24.0      
 [94] PFAM.db_3.7.0            progress_1.2.0           locfit_1.5-9.1          
 [97] grid_3.5.1               data.table_1.11.8        blob_1.1.1              
[100] Rgraphviz_2.26.0         digest_0.6.18            xtable_1.8-3            
[103] R.utils_2.7.0            munsell_0.5.0           
> 
ADD REPLY
0
Entering edit mode
@mikelove
Last seen 3 days ago
United States

The error is coming from ReportingTools, not DESeq2, which has a different maintainer. I’d recommend to make a post that includes just the ReportingTools code and the error and session info. You should tag that package instead of DESeq2.

ADD COMMENT
0
Entering edit mode

Thank you, Michael, I will open a new thread then. I added the ReportingTools tag but probably too late to notify its author.

Would you please comment on these attempts, line by line? I am new to this and I do not really understand the underlying differences, hence the comments on lines commented out. And a third approach would be the defaults.

I added the samples.txt to give you the underlying info I have.

DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient)
# dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient + type:patient)
#dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type)
dds$type <- relevel(dds$type, ref="normal")
dds$IndividualN <- factor(c(1,1,2,2))
dds <- DESeq(dds)
#res <- results(dds,contrast=list("type","cancer","normal"))
res <- results(dds,contrast=c("type","cancer","normal"))
ADD REPLY
0
Entering edit mode

You have three commented out lines. The first two are entirely different designs. If you don’t understand the difference between these you should consult a statistician. This forum is for software guidance, when users know what model they want to use. 

The last commented out like won’t work. See the man page for the results function for description of how the contrast argument works.

ADD REPLY

Login before adding your answer.

Traffic: 532 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6