Hi all,
Was wondering if somebody could help me with an issue I am having with DEXSeq. The issue is that I am getting basically 100,s of thousands of exons being deferentially used in a time course experiment I am carrying out and was wondering if somebody that has experience with this could help me/look over my code. I have about 100 samples but have cut down to organs in which I have at least 2 replicates (so far) per time point. The time points are p3, p15, p30 and p60 with an additional 4 time points coming from sequencing results in the near future.
The aim of this question I have is as follows: Is there a difference in exon usage in each organ going from p3 up to p60. After this, can I identify exons that are deferentially used across all organs or a subset of organs?
I have no trouble doing this analysis for differential gene expresion across time but I think I have become a bit confused, especially with the sample data with DEXSeq: The code is as follows:
Prepare annotations
hse <- makeTxDbFromBiomart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
exonsByGene <- exonsBy(hse, by="gene")
exonicParts = disjointExons( hse, aggregateGenes=FALSE )
bamDir<- file.path(file= "Amir Raw RNA-sq data/")
fls <-list.files( bamDir, pattern="bam$", full=TRUE )
bamlst = BamFileList( fls, index=character(), yieldSize=100000,
obeyQname=TRUE )
SE = summarizeOverlaps( exonicParts, bamlst, mode="Union",
singleEnd=FALSE, ignore.strand=TRUE, inter.feature=FALSE,
fragments=TRUE )
I am then pulling the sample data from columns in my metadata sheet name sampledataalltimepoints:
colData(SE)$condition=Sampledataalltimepoints$Time_point
colData(SE)$Organ=Sampledataalltimepoints$Organ
colData(SE)
SE$condition<-as.factor(SE$condition)
SE$Organ<-as.factor(SE$Organ)
colData(SE)$replicate=Sampledataalltimepoints$Replicate
SE$replicate<-NULL
diffexon<-DEXSeqDataSetFromSE( SE ,design= ~condition+Organ+exon )
head( counts(diffexon), 5 )
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
ENSMUSG00000000001:E001 459 1992 552 2052 1028 654 907 1576
ENSMUSG00000000001:E002 51 394 90 297 173 110 154 391
ENSMUSG00000000001:E003 126 393 99 333 203 118 144 374
ENSMUSG00000000001:E004 139 365 110 305 179 110 139 340
ENSMUSG00000000001:E005 212 439 121 403 206 152 199 418.... etc
diffexon = estimateSizeFactors( diffexon )
dxd<-estimateDispersions(diffexon ).... Very time consuming!
plotDispEsts(dxd
diffexonusage = testForDEU( dxd, fullModel = ~condition+Organ+exon, reducedModel = ~Organ, BPPARAM = BPPARAM)
DEXSeqResults(diffexonusage )
dxd = estimateExonFoldChanges(diffexonusage , fitExpToVar ="condition", BPPARAM = BPPARAM, independentFiltering = F)
dxr1 = DEXSeqResults( dxd )
DataFrame with 452908 rows and 17 columns
groupID featureID
<character> <character>
ENSMUSG00000000001:E001 ENSMUSG00000000001 E001
ENSMUSG00000000001:E002 ENSMUSG00000000001 E002
ENSMUSG00000000001:E003 ENSMUSG00000000001 E003
ENSMUSG00000000001:E004 ENSMUSG00000000001 E004
ENSMUSG00000000001:E005 ENSMUSG00000000001 E005
... ... ...
ENSMUSG00000116527:E011 ENSMUSG00000116527 E011
ENSMUSG00000116527:E012 ENSMUSG00000116527 E012
ENSMUSG00000116527:E013 ENSMUSG00000116527 E013
ENSMUSG00000116527:E014 ENSMUSG00000116527 E014
ENSMUSG00000116528:E001 ENSMUSG00000116528 E001
exonBaseMean dispersion stat
<numeric> <numeric> <numeric>
ENSMUSG00000000001:E001 657.8843 0.10376174 14.42773
ENSMUSG00000000001:E002 106.5895 0.13580335 972.78567
ENSMUSG00000000001:E003 112.4083 0.09943998 1267.37422
ENSMUSG00000000001:E004 105.6330 0.09874353 1331.44409
ENSMUSG00000000001:E005 134.5156 0.09485669 1153.30200... etc
By making the reduced model organ, am I not lookinf for the effects of time and exon usage?
This seems very vert wrong to me given the massive amount of differentially expressed exons.. and MA plot essentially is a splash of red for significant genes across the whole plot. Is the design I am using wrong, again, to test for differential exon usage across time in each organ? any help would be hugely helpful as this is an important analysis for me. Many thanks in advance!
Hi Alejandro,
Thank you so much for the quick reply! Really appreciate this. Will test these out. A few quick follow up questions:
When you write 'sample' in your design formulae, may I ask what you are referring to? The sampledata I have are age and organ (7 organs in total) 4 ages. I would like to test the effects of time on exon usage in the different organs...
Many thanks again!
With regards to subsetting p3 and p60 organs, I would also like to see if there is a progressive change in exon usage inlcuding at p15 and p30, co could I use the same design to see how exons change across all time points without subsetting?
No problem! The "sample" column is normally a column in colData that is added by default when creating a DEXSeqDataSet object, if you don't have it you should add it. This is a term that accounts for sample-specific contributions, it also absorbs other effects such as changes in gene expression. More details can be found in the paper and the software vignette.
Yes, if you would like to see progressive changes in through the time points, the same formulae should work.
Awesome! Running those 3 formulae right now and will report. Thank you so much once again! and yes, the title 'sample' was indeed included after te DEXSeq object had been created, I just wasn't sure that I had to include it!
Fingers crossed :) have a great day
Hi Alejandro,
Thank you so much for your help above! These models have helped tremendously! They all work perfectly. I was wondering if I could ask a few follow up questions:
Is there a way in estimateexonfoldchanges function to to specify both "condition" and "organ" in order to be able to plot differential exon usage for the individual organs over time. The problem at the moment is that with the first model, The plotDEXSeq function produced a plot of DE usage from p3 up to p60, however, does not specify which organ this belongs to. Are the values the average of all organs for a given exon?
It would be nice to see how they separate individually and then take all the expression data to form patterns of exons that change commonly across all organs as this isn't explicitly clear...
Many thanks!