Hello!
I have seen some questions similar to mine, but there are some important differences, so I want to see if anyone has advice that is more specific to my situation. I am working on a paper in which I found that when the same sample is sequenced with two different sequencing pipelines (i.e. different sequencing facility and different library prep), the VST-normalized expression values from the two pipelines have a correlation of only ~0.85. I have done this for two different sample types (A and B) and three different sequencing pipelines, and I see the same pattern when I correlate normalized counts between each pair of pipelines for each sample.

I want to see whether this divergence in expression affects differential expression between A and B (i.e. whether the different pipelines identify the same differentially expressed genes). My idea is to run separate differential expression analyses between A and B for pipeline 1, pipeline 2, and pipeline 3, then calculate correlation coefficients for the log2 fold changes (LFCs) between each pair of pipelines. The problem is that I have uneven sample sizes between A and B, and between pipelines:
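A minimal sketch of the LFC comparison described above, in R. It assumes each pipeline has already been analysed separately (e.g. with DESeq2) and that its A-vs-B LFCs are stored as a numeric vector named by gene ID. The helper name `pairwise_lfc_cor` and the toy vectors are hypothetical, and Spearman correlation is an assumed default (being rank-based, it is less sensitive than Pearson to a few extreme LFCs from low-count genes).

```r
# Hypothetical helper: pairwise correlation of A-vs-B log2 fold changes
# between pipelines. `lfc_list` is a named list (one entry per pipeline) of
# numeric vectors named by gene ID. With DESeq2 such a vector could be built as
#   res <- results(dds, contrast = c("condition", "B", "A"))
#   setNames(res$log2FoldChange, rownames(res))
# (assuming the sample-type column in colData is called "condition").
pairwise_lfc_cor <- function(lfc_list, method = "spearman") {
  # Keep only genes with a non-missing LFC in every pipeline
  common <- Reduce(intersect, lapply(lfc_list, function(x) names(x)[!is.na(x)]))
  # Genes x pipelines matrix, rows aligned by gene ID
  mat <- sapply(lfc_list, function(x) x[common])
  cor(mat, method = method)
}

# Toy LFC vectors (made-up values, not real data); gene order differs on purpose
lfc <- list(
  p1 = c(g1 = 2.0, g2 = -1.0, g3 = 0.5, g4 = 0.0),
  p2 = c(g1 = 1.8, g2 = -0.7, g3 = 0.6, g4 = NA),
  p3 = c(g2 = -1.2, g1 = 2.2, g3 = 0.1, g4 = 0.3)
)
pairwise_lfc_cor(lfc)
```

Genes with an NA LFC in any pipeline (e.g. genes with all-zero counts) are dropped before correlating, so every pair of pipelines is compared on the same gene set.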
- Pipeline 1: A = 3, B = 10
- Pipeline 2: A = 3, B = 11
- Pipeline 3: A = 3, B = 3
From other questions that people have posted, it seems that the uneven sample sizes between A and B within each DE analysis are not a problem. But I am worried that the different numbers of B replicates across the three pipelines might be problematic when comparing between pipelines, since the pipelines where B has more replicates have more power and will likely identify more DE genes than pipeline 3, where B has only 3 replicates.
Does anyone have any advice? Some options I have thought about are:
- combining the datasets from all pipelines into one big DE analysis and using an interaction term between pipeline and sample type (i.e. testing whether the A vs B LFC differs between pipelines), or
- running the DE analyses separately for each pipeline but using a different FDR threshold for each one, so that the number of DE genes is roughly equal.

Someone suggested using shrinkage, but I don't understand how shrinking the LFCs would help in this situation.
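A sketch of the two options above, in R. The first part builds a sample table from the replicate counts listed earlier and shows the coefficients that a `~ pipeline * condition` design produces; the column names `pipeline` and `condition` and the level labels `p1`, `p2`, `p3` are assumptions. The commented DESeq2 lines indicate how such a design is commonly tested but are not run here. The second part is a hypothetical helper, `topk_jaccard`, that compares equally sized top-k gene lists. Ranking by raw p-value gives the same order as BH-adjusted values, so taking the top k genes is roughly equivalent to choosing a per-pipeline FDR cutoff that returns k genes.

```r
# --- Option 1: combined model with a pipeline x condition interaction ---
# Replicate counts per pipeline and sample type, as listed in the question
groups <- data.frame(
  pipeline  = c("p1", "p1", "p2", "p2", "p3", "p3"),
  condition = c("A",  "B",  "A",  "B",  "A",  "B"),
  n         = c(3, 10, 3, 11, 3, 3)
)
# One row per sample (33 in total)
coldata <- groups[rep(seq_len(nrow(groups)), groups$n), c("pipeline", "condition")]
rownames(coldata) <- NULL
coldata$pipeline  <- factor(coldata$pipeline)                        # reference level: p1
coldata$condition <- factor(coldata$condition, levels = c("A", "B")) # reference level: A

# The interaction columns "pipelinep2:conditionB" and "pipelinep3:conditionB"
# are the differences in the B-vs-A LFC between pipeline 2 (or 3) and pipeline 1
mm <- model.matrix(~ pipeline * condition, data = coldata)
colnames(mm)

# With DESeq2 (not run here), a likelihood ratio test against the model without
# the interaction asks, per gene, whether the B-vs-A LFC differs across pipelines:
#   dds <- DESeqDataSetFromMatrix(counts, coldata, design = ~ pipeline * condition)
#   dds <- DESeq(dds, test = "LRT", reduced = ~ pipeline + condition)
#   res <- results(dds)

# --- Option 2: compare equally sized top-k gene lists between two pipelines ---
# Hypothetical helper; `p_a` and `p_b` are p-value vectors named by gene ID
topk_jaccard <- function(p_a, p_b, k) {
  top_a <- names(sort(p_a))[seq_len(min(k, sum(!is.na(p_a))))]
  top_b <- names(sort(p_b))[seq_len(min(k, sum(!is.na(p_b))))]
  length(intersect(top_a, top_b)) / length(union(top_a, top_b))
}

# Toy p-values (made-up values, not real data)
pv1 <- c(g1 = 0.001, g2 = 0.01, g3 = 0.2, g4 = 0.5, g5 = 0.03)
pv2 <- c(g1 = 0.002, g2 = 0.4, g3 = 0.02, g4 = 0.6, g5 = 0.001)
topk_jaccard(pv1, pv2, k = 3)
```

Note that this combined design treats every library as an independent sample; if the same biological samples were sequenced by more than one pipeline, that pairing is not modelled here.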
Many thanks!