Hello,
This is crossposted from biostars as I was unsure which platform is right for this question.
I am looking to do a differential expression analysis for RNAaseq data in a hybrid by comparing the parental alleles to the in silico midparent value (for example like this paper). I was hoping to use modern DE tools such as DESeq2, EdgeR etc. that assume a negative binomial model. What I am confused about is how to incorporate normalization into computing the midparent values - for example if two corresponding parental genes have different lengths, how would I accommodate for this?
Suppose I am working with DESeq2 and my parents are A, B and my child is C with parent alleles Ca, Cb. The flow I was thinking of is as follows:
- Input samples from A,B, Ca, Cb into a DESeq model with gene lengths and compute normalization factors
normalizationFactors(dds)
. - Take normalized sample columns from the normalized counts matrix, say
A_norm
,B_norm
andCa_norm
,Cb_norm
- Use
round((A_norm+B_norm)/2))
,Ca_norm
,Cb_norm
as inputs to a new DESeq model with constant size factors of 1 to account for the pre-normalized values.
The problem is that from what I understand DESeq does not directly use the normalized counts but rather incorporates the normalization factors into the computation of the means and dispersion values so I am not sure if this method makes sense. In particular if DESeq directly used the normalized counts K_ij/normalizationFactors(dds)
in its modelling, then I belieeve providing the average of the normalized counts across the parents and a constant normalization matrix would have made statistical sense.
The only other method I can think of is to use a t-test on the log(TPM+1) values but I have 15 tissues and 3 replicates per tissue so in case I have an interaction between the tissue and genotype, I am afraid the normal distribution assumption of the test would not hold.
Any suggestions or feedback would be much appreciated, thank you!