Hello,
Im working with DRIMSeq and stageR to analyse DTU from RNASeq transcript counts estimated by salmon.
I have been following DRIMSeq and stagR's vignette. Importing my reads with tximport, using scaledTPM as scaling method. Filtering transcripts according to recommendations and calculating precision , fitting and testing with drimseq.
I created my design matrix like it was suggested in the DRIMSeq vignette in section 5.3, to include a batch effect variable.
I had no trouble going through those steps, and when looking at the results i can see various adjusted pvalues below a significance level of e.g. 0.05, on gene as well as transcript level.
When I moved on to the 2 stage testing using stageR as suggested, I end up with only NA' s in the gene and transcript column after calling the getAdjustedPValues() function. In the vignette is stated, that this means that no genes have passed the first stage screening and are thus turned to NA. Now I am wondering about three things:
- did I maybe use the wrong design matrix and end up with wrong DRIMSeq results. This thought came up because when I use the accessor function DRIMSeq::design() on my DRIMSeq object it returns another design matrix compared to what I put in. When running DRIMSeq it reported VERY high precision (~400-600), does that make sens?
- Even if none of the genes passed the first screening test in stageR, when I call the function getAdjustedPValues() with the option of only returning significant genes, I get an error message, while it should probably just return an empty list?!
- Concerning the fact that I get a long list of genes with adjusted pvalues below some alpha from the DRIMSeq results, and none after applying stageR, I wonder if I might have misunderstood the function's intention. From the stageR vignette section 3.2, I understand that if the stageRTx() is generated with the option of defining pScreen as already adjusted, the function call of stageWiseAdjustment() would just check which of those already adjusted pvalues (qvalues) in pScreen pass the threshold, and then move on to the transcript level test/adjustment (FWER) including only those genes that passed the screening.
Code:
txi <- tximport(files_perCohort,type=quant_tool,txOut=TRUE,countsFromAbundance=scale_method)
d <- dmDSdata(counts=countData[[i]],samples=info[[i]])
n.small<-min(table(DRIMSeq::samples(d)$condition))
d<-dmFilter(d,
min_samps_feature_expr=n.small,min_feature_expr=5,
min_samps_feature_prop=n.small,min_feature_prop=0.1,
min_samps_gene_expr=n*0.7,min_gene_expr=10)
design_full <- model.matrix(~condition+cov,data=DRIMSeq::samples(d))
head(design_full)
(Intercept) condition1 cov
Sid1 1 0 3.1
Sid2 1 1 5.6
Sid3 1 0 5.1
Sid4 1 1 5.9
Sid4 1 0 5.6
Sid5 1 1 6.0
head(DRIMSeq::design(d))
(Intercept) cov
Sid1 1 3.1
Sid2 1 5.6
Sid3 1 5.1
Sid4 1 5.9
Sid4 1 5.6
Sid5 1 6.0
res<-DRIMSeq::results(d)
pScreen <- **res$pvalue**
pConfirmation <- matrix(res$txp$pvalue,ncol=1)
rownames(pConfirmation) <- res$txp$feature_id
tx2gene <- res$txp[,c("feature_id","gene_id")]
stageRObj <- stageRTx(pScreen=pScreen,pConfirmation=pConfirmation,**pScreenAdjusted=F
ALSE**,tx2gene=tx2gene)
stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)
drim.padj <- getAdjustedPValues(stageRObj,order=TRUE,onlySignificantGenes=TRUE)
drim.padj
Error in alpha * 100 : non-numeric argument to binary operator
In addition: Warning message:
The returned adjusted p-values are based on a
stage-wise testing approach and are only valid for
the provided target OFDR level of 5%. If a different target OFDR level is of interest,
the entire adjustment should be re-run.
traceback()
5: paste0("No genes were found to be significant on a ", alpha *
100, "% OFDR level.")
4: message(paste0("No genes were found to be significant on a ",
alpha * 100, "% OFDR level."))
3: .getAdjustedPTx(object = object, onlySignificantGenes, order,
...)
2: getAdjustedPValues(Ss[[1]], order = TRUE, onlySignificantGenes = TRUE)
1: getAdjustedPValues(Ss[[1]], order = TRUE, onlySignificantGenes = TRUE)
drim.padj<-getAdjustedPValues(stageRObj,order=TRUE,onlySignificantGenes=FALSE)
nrow(drim.padj) == sum(is.na(drim.padj)[,"transcript"]))
TRUE
nrow(drim.padj)== sumis.na(drim.padj)[,"gene"]))
TRUE
#
Doing the same thing but passing the adjusted pvalues from DRIMSeq result to pScreen yields the same results and error message
pScreen <- **res$adj_pvalue**
stageRTx(pScreen=pScreen,pConfirmation=pConfirmation,**pScreenAdjusted=TRUE**,tx2gene=tx2gene)
stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)
drim.padj <- getAdjustedPValues(stageRObj,order=TRUE,onlySignificantGenes=TRUE)
#
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C LC_TIME=en_DK.UTF-8 LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8 LC_MESSAGES=en_DK.UTF-8 LC_PAPER=en_DK.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.32.0 AnnotationDbi_1.42.1 tximport_1.8.0
[4] stageR_1.2.22 SummarizedExperiment_1.10.1 DelayedArray_0.6.1
[7] BiocParallel_1.14.2 matrixStats_0.53.1 Biobase_2.40.0
[10] GenomicRanges_1.32.4 GenomeInfoDb_1.16.0 IRanges_2.14.10
[13] S4Vectors_0.18.3 BiocGenerics_0.26.0 DRIMSeq_1.8.0
[16] forcats_0.3.0 stringr_1.2.0 dplyr_0.7.6
[19] purrr_0.2.5 tidyr_0.8.1 tibble_1.4.2
[22] ggplot2_2.2.1 tidyverse_1.2.1 readr_1.1.1
[25] nvimcom_0.9-58
Thank you for reporting this.
There used to be a problem with getAdjustedPValues in the older versions of stageR. It would be great if you could update your version with
devtools::install_github("statOmics/stageR")
and check if the error still occurs.
Would you mind sharing (some of) your data that reproduces the error such that I can look into this in detail?
thanks for the quick reply. I will update the package and try with the new version. How could I share a subset of my data to make as much sense as possible? would you like to have the imported and scaled abundances of a subset of transcripts and a subset of samples?
I am not yet very familiar with DRIMSeq, but it would be great if you could share your `d` object after running DRIMSeq. From browsing through your code it seems like that should contain most of what I need to run stageR and check where things went wrong. You could for example send it to koen.vandenberge@ugent.be through WeTransfer or share a Dropbox link here on the forum.
I sent you the result object from the DRIMSeq analysis.
the data for stageR would then be:
pScreen <- res$res$adj_pvalue
pConfirmation <- res$txp$adj_pvalue
the error message persists, after updating as you recommended.
Hello Fiona, I am very sorry for my late reply. You have mentioned that the design matrix from DRIMSeq output is different than what you feed in. Would you like that I have a look on that? If yes, would share with me the d object after running the analysis? You could send it to gosia.nowicka@uzh.ch or via Dropbox.
Hi! sorry for the late answer I totally forgot about this thread. It was my own mistake because I called the design matrix of the reduced model...So all is good. Thanks for the help