Hi,
I want to use DESeq2 to assess allele-specific expression. I will compare gene-level allelic counts in a yeast hybrid (basically comparing counts of orthologous genes from the two parents of this hybrid).
I'm aware of this tutorial http://rstudio-pubs-static.s3.amazonaws.com/275642_e9d578fe1f7a404aad0553f52236c0a4.html.
My question: the parents may have orthologous genes of different lengths, e.g. the ortholog is 1000 bp in parent A and 1020 bp in parent B. Is it possible to account for this length difference at some step of DESeq2?
Thanks
Hi Michael,
Thanks for the reply.
Just to make sure I do everything in order:
My `count_table` looks like
The code is
`length_data` looks like
Is everything correct?
Cheers, and sorry if the formatting is a bit messed up.
EDIT: When I omit `sizeFactors(dds) <- c(as.numeric(rep("1",6)))`, I get the message `using 'avgTxLength' from assays(dds), correcting for library size`. But if I both preset the size factors to 1 and supply the gene lengths, I do not get that message. Does this mean that with predefined size factors it's not possible to account for gene length?
There are two issues here. First, you should correct for library size unless you are comparing within individual (e.g. alt to total, alt to ref, etc.). Your count table above isn't the same kind as in my example, where there were two columns for every sample.
The other issue is that you're correct: if you want to specify the size factors as 1, the `avgTxLength` method isn't the way to go. Instead, you should use `normalizationFactors`.
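A minimal sketch of how gene-length normalization factors could be built, not a definitive recipe. The gene names, lengths, and the SC1..SU3 column order below are hypothetical. The step that divides each row by its geometric mean follows the convention in the DESeq2 documentation for user-supplied normalization factors, and the final assignment (left as a comment so the snippet runs in base R) assumes a `dds` whose rows and columns match the matrix.

```r
# Hypothetical ortholog lengths in bp, genes x count columns.
# SC* columns carry the parent A length, SU* columns the parent B length.
lengths <- matrix(
  c(1000, 1000, 1000, 1020, 1020, 1020,
    2400, 2400, 2400, 2100, 2100, 2100),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"),
                  c("SC1", "SC2", "SC3", "SU1", "SU2", "SU3"))
)

# Divide each row by its geometric mean so that, per gene, the factors
# are centered around 1 and only the relative length difference remains.
normFactors <- lengths / exp(rowMeans(log(lengths)))

# With DESeq2 loaded and a matching DESeqDataSet `dds` (assumed here):
# normalizationFactors(dds) <- normFactors
# dds <- DESeq(dds)
```

Because the factors are gene- and column-specific, they take the place of size factors, so no separate `sizeFactors(dds) <- ...` line is involved in this sketch.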
Dear Michael,
Regarding normalization: thanks a lot, I will try running the code like that.
Regarding the count table: SC1 and SU1, for example, are from the same library (as are the other samples with matching numbers), and I thought my design would compare alt to ref, as you mentioned. Won't it? Is there a conceptual difference in the DESeq2 calculations if I also add a "sample" column?
Once again thank you for your time!
I'd suggest looking at how I went about it in the workflow you posted. Yes, there's a critical difference between adding the sample information and identifying which column is which allele, and not doing so.
Hi Michael,
I am sorry for my ignorance, but I still don't understand why my example wouldn't work properly: I specify in the design formula that there are two conditions (basically the parents), and I control for library size and gene length. So it will compare condition 1 vs condition 2 and show in which condition expression is higher or lower (which in theory corresponds to ASE). What does this logic miss?
Thanks again for your time
One of the strengths of the ASE approach is that you compare counts within individual, controlling for many potential biases. In the workflow I created, this is the approach. Your approach doesn’t have individual in it.
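A minimal sketch of what "having individual in it" can look like in the column annotation, assuming the SC/SU column naming mentioned earlier in the thread and a design of the form `~ sample + allele` (an assumption for illustration; the workflow remains the reference for the actual setup). It uses base R only.

```r
# Hypothetical column annotation: three libraries, each contributing
# one count column per parental allele.
coldata <- data.frame(
  sample = factor(rep(c("1", "2", "3"), times = 2)),
  allele = factor(rep(c("SC", "SU"), each = 3)),
  row.names = c("SC1", "SC2", "SC3", "SU1", "SU2", "SU3")
)

# The sample terms absorb library-specific effects; the allele term is
# the within-library SU vs SC difference.
mm <- model.matrix(~ sample + allele, data = coldata)
paired_full_rank <- qr(mm)$rank == ncol(mm)

# Contrast: if every column gets its own label, the allele term can no
# longer be separated from the label terms (rank-deficient design).
coldata$column <- factor(rownames(coldata))
mm_bad <- model.matrix(~ column + allele, data = coldata)
unpaired_full_rank <- qr(mm_bad)$rank == ncol(mm_bad)
```

Here `sample` labels the library shared by the two allele columns, which is what allows the comparison to be made within individual.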
Thanks for elaborating! Now it makes sense to me. I guess for human data (i.e. patients or any "case-control" study) this "individualized" approach would matter more than in my case. I analyze yeast, which is quite homogeneous within a culture, so I can in principle assume that between-colony variation will not bias the overall analysis.
Anyway, for my analysis I will do as you suggest with design like `~parent+sample`.
Also, just to clarify: with `normalizationFactors` I don't need to set the size factors to 1, correct?
That design is confounded. Again, I’d recommend taking a look at that workflow.
The only difference is that you will provide the normalization factors before calling `DESeq()`.
Dear Michael,
I was wondering whether it is possible in DESeq2 to simultaneously set size factors and control for gene length (as you suggested in your post)?
Thanks in advance