Hi Alejandro and Michael,
Thanks a lot for helping me out with the DEXSeq package. I have some questions regarding it:
Ques1. I was reading DEXSeq.pdf and the explanation of the null model (~ sample + exon) vs. the alternative model (~ sample + exon + condition:exon).
Under the null model, what is the hypothesis? Is it that the exon/counting bin counts do not depend on condition? What does the alternative hypothesis mean here? I do not understand this part; if possible, could you explain this passage to me:
"The two models described by these formulae are fit for each counting bin, where the data supplied to the fit comprise two read count values for each sample, corresponding to the two levels of the exon factor: the number of reads mapping to the bin in question (level this), and the sum of the read counts from all other bins of the same gene (level others)."
Also, is it at this step, testForDEU, that the exonic counts are adjusted for changes in gene expression?
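To make the quoted passage concrete, here is a minimal sketch of the two count values it describes. It is written in Python purely to show the arithmetic; it is not DEXSeq code, and the bin names (E001 to E003), the counts, and the helper `this_and_others` are all invented for illustration.

```python
# Toy counts for ONE gene: keys are counting bins, values are per-sample counts.
# All names and numbers here are made up for illustration.
samples = ["c1", "c2", "kd1", "kd2"]
bin_counts = {
    "E001": [10, 12, 11, 9],
    "E002": [50, 48, 20, 22],
    "E003": [30, 33, 31, 29],
}


def this_and_others(bin_counts, bin_id):
    """Return the per-sample 'this' and 'others' counts for one bin."""
    this = bin_counts[bin_id]
    n_samples = len(this)
    # Sum over all bins of the gene, sample by sample.
    gene_total = [
        sum(counts[i] for counts in bin_counts.values())
        for i in range(n_samples)
    ]
    # 'others' is everything in the gene except the bin in question.
    others = [total - t for total, t in zip(gene_total, this)]
    return this, others


this, others = this_and_others(bin_counts, "E002")
for sample, t, o in zip(samples, this, others):
    print(f"{sample}: this={t} others={o}")
```

As far as I understand the formulae, each sample contributes exactly this pair of values to the fit for a bin, so the sample term can absorb differences in overall expression of the gene, and the condition:exon term is what allows the balance between "this" and "others" to differ between conditions.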
Ques2. I have generated, from the STAR aligner, a count matrix of splice junction reads for 2 treatment conditions, control and knockout. This is how it looks:
gene c1 c2 kd1 kd2
g1_chrVIII_33663_33698_2 0 0 0 0
g2_chrVIII_326943_327029_2 0 0 0 0
g3_chrVIII_129529_129644_1 0 3 0 0
g3_chrVIII_129529_129647_1 123 139 148 217
g4_chrVIII_400482_400648_2 0 0 0 0
g4_chrVIII_400482_400850_2 0 0 0 0
g5_chrVIII_432447_432483_1 0 0 0 0
g6_chrVIII_428459_428647_2 0 0 0 0
g7_chrVIII_119009_119035_2 0 0 0 0
g8_chrVIII_185267_185575_2 0 0 0 0
g9_chrVIII_148317_148666_2 0 0 0 0
g10_chrVIII_251156_251270_1 0 0 0 0
g10_chrVIII_251156_251258_1 5 10 3 10
g10_chrVIII_251156_251458_1 0 1 2 1
g10_chrVIII_251156_251248_1 186 189 223 233
g10_chrVIII_251156_251224_1 4 0 2 0
I want to look for differential splice junction usage between these 2 conditions.
a. Can I use DESeq2 directly on this count matrix? I think that if I do, I am not taking into account changes in gene expression between the 2 conditions.
b. I would like to use DEXSeq on the splice junction count matrix, but how can I make the DEXSeqDataSet object? With 2 conditions and 2 replicates, as in the example above, columns 5, 6, 7 and 8 should hold the total gene counts. These are generally added for me when I use DEXSeqDataSet on my exon count matrix, but here I only have a splice junction count matrix. In other words, I want to create a DEXSeqDataSet object where, in addition to the 4 columns above, I have 4 more columns containing the gene counts for the junction in question in samples c1, c2, kd1 and kd2. I have the gene counts for every sample in another file. How can I then make a DEXSeqDataSet object?
Hope to hear from you guys and thanks for all the help.
Thanks a lot, Alejandro, for making it clear. Regarding making the DEXSeqDataSet: should my alternativeCountData, which in my case is a count matrix of GENES, have the same number of rows as my initial count matrix (the junction counts)? And should the row names be the same in the 2 count matrices?
You need two matrices, countData and alternativeCountData.
Let's say that for a gene you have 5 junctions and 4 samples. countData would then have 5 rows (one per junction) and 4 columns (one per sample), and alternativeCountData should have the same shape, with the gene counts repeated for each junction.
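As a rough illustration of that layout, the sketch below builds the second table from the first. It is plain Python, not DEXSeq code. The junction rows are copied from the table in the question (genes g3 and g10 only); the gene counts are made-up placeholders for the separate gene count file, and the helper `gene_of` assumes the gene ID is everything before the first underscore in the junction name.

```python
samples = ["c1", "c2", "kd1", "kd2"]

# Junction counts, copied from the table in the question (g3 and g10 only).
junction_counts = {
    "g3_chrVIII_129529_129644_1": [0, 3, 0, 0],
    "g3_chrVIII_129529_129647_1": [123, 139, 148, 217],
    "g10_chrVIII_251156_251270_1": [0, 0, 0, 0],
    "g10_chrVIII_251156_251258_1": [5, 10, 3, 10],
    "g10_chrVIII_251156_251458_1": [0, 1, 2, 1],
    "g10_chrVIII_251156_251248_1": [186, 189, 223, 233],
    "g10_chrVIII_251156_251224_1": [4, 0, 2, 0],
}

# HYPOTHETICAL per-gene counts: invented numbers standing in for the
# gene count file mentioned in the question.
gene_counts = {
    "g3": [900, 950, 1010, 1200],
    "g10": [1500, 1480, 1700, 1750],
}


def gene_of(junction_id):
    # Assumption: the gene ID is the part before the first underscore.
    return junction_id.split("_", 1)[0]


# One row per junction, in the same order as junction_counts; each row
# repeats the counts of the gene that the junction belongs to.
alternative_counts = {
    junction: list(gene_counts[gene_of(junction)])
    for junction in junction_counts
}

print("\t".join(["junction"] + samples))
for junction, row in alternative_counts.items():
    print("\t".join([junction] + [str(x) for x in row]))
```

Both tables end up with the same rows in the same order and the same sample columns. In R, the equivalent pair of matrices would be what gets passed as countData and alternativeCountData, with the gene as the grouping ID and the junction as the feature ID; please check the DEXSeqDataSet help page for the exact arguments.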