Parallel computing for IsoformSwitchAnalyzeR
1
0
Entering edit mode
maya.kappil ▴ 30
@mayakappil-18569
Last seen 5.2 years ago

Hello! I am interested to run the IsoformSwitchAnalyzeR package to assess differential transcript usage and related downstream analysis. However, I have a large set of RNAseq samples (200) - for the native DRIMSeq and DEXSeq packages, it is possible to add a command to indicate the number of cores to utilize to reduce the running time. For example:

  d <- dmPrecision(d, design=design_full,BPPARAM = BiocParallel::SerialParam())
  d <- dmFit(d, design=design_full,BPPARAM = BiocParallel::SerialParam())
  d <- dmTest(d, coef="condition2",BPPARAM = BiocParallel::SerialParam())

Do you have suggestions for how to accomplish this when running the isoformSwitchTestDEXSeq or isoformSwitchTestDRIMSeq commands?

Thanks!

IsoformSwitchAnalyzeR • 1.4k views
ADD COMMENT
1
Entering edit mode
@kvittingseerup-7956
Last seen 15 months ago
European Union

Hi Maya

Thanks for reaching out. Unfortunately there is no way of doing that from within IsoformSwitchAnalyzeR at the moment. That said the runtime of DRIMSeq mostly scales with the number of transcripts analyzed (and a little bit with the number of samples) so if you just make sure to use preFilter() first isoformSwitchTestDRIMSeq() should run in 30-60 min for ~30k transcripts.

Cheers Kristoffer

ADD COMMENT
0
Entering edit mode

Hmm... I have somewhat more transcripts using the default preFilter setting (~80K). And, the calculated time to run DEXSeq seemed really large (17763754)... So far, I've tried running it for 15hrs (both correcting for a confounding variable and without correction) and they did not go to completion. I will try a more stringent filter to see if that helps. But just in case there is something obvious that I'm specifying incorrectly, below is the code I used:

Generate switchlist

base.dir<-getwd()

salmonQuant <- importIsoformExpression( + parentDir = paste0(base.dir,"/quants"),addIsofomIdAsColumn=TRUE)

coldata<-read.csv("Covariates.csv") summary(coldata$BW) AGA LGA SGA 112 55 33 coldata$batch<-factor(coldata$batch)

myDesign<-data.frame(sampleID=coldata$ID, + condition=coldata$BW, + batch=coldata$batch)

contrast<-data.frame(condition1="AGA",condition2="SGA")

aSwitchList_SGA <- importRdata( + isoformCountMatrix = salmonQuant$counts, + isoformRepExpression = salmonQuant$abundance, + designMatrix = myDesign, + comparisonsToMake=contrast, + isoformExonAnnoation = paste0(base.dir,"/gencode.v28.annotation.gtf.gz"), + showProgress = FALSE + )

aSwitchList_SGA This switchAnalyzeRlist list contains: 199338 isoforms from 57599 genes 1 comparison from 2 conditions Feature analyzed: [1] "ORFs"

Filter

aSwitchListSGA <- preFilter( + switchAnalyzeRlist = aSwitchListSGA, + geneExpressionCutoff = 1, + isoformExpressionCutoff = 0, + removeSingleIsoformGenes = TRUE + ) The fitering removed 121323 ( 60.86% of ) transcripts. There is now 78015 isoforms left

DEXSeq

aSwitchListSGADEXSeq <- isoformSwitchTestDEXSeq( + switchAnalyzeRlist = aSwitchList_SGA, + correctForConfoundingFactors=TRUE, + reduceToSwitchingGenes=TRUE + ) Step 1 of 4: Correcting for batch effect... Step 2 of 4: Calculating Isoform Fraction matrix... Step 3 of 4: Testing each pariwise comparisons with DEXSeq (this might be a bit slow)... Estimated time (for dataset with ~30.000 isoforms): 17763754 min

ADD REPLY
1
Entering edit mode

You are right that DEXSeq is infeasible (it scales exponentially with number of samples) so it would need to be DRIMSeq. Your code looks good. You could try filtering a bit more strict - I would suggest setting IFcutoff=0.1 - meaning you are only considering isoforms which on average contribute with at least 10% to the parent gene expression.

ADD REPLY

Login before adding your answer.

Traffic: 454 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6