Question

Is comparison of data without pairwise comparison in batches possible? Or even reasonable?

1

Delta._.43 ▴ 20

@a6fd600c

Last seen 22 months ago

Poland

I'm trying to compare gene expression levels of a certain cell type generated in vitro with the expression of multiple cell types in vivo, using DESeq2. The data can be described by the following experimental design table.

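# Cell-type labels with replicate numbers (i = in vitro, n = normal in vivo)
CT = c(sprintf("iCTa%d",1:4),sprintf("nCTa%d",1:6),sprintf("nCTb%d",1:2),sprintf("nCTc%d",1:2))
# Lab of origin; rep() takes the value first, then the number of repeats
LB = c(rep("LAB_A",4),rep("LAB_B",10))
# Sample names, used as row names of the design table
SMP = c(sprintf("A_SMP%d",1:4),sprintf("B_SMP%d",1:10))
data.frame(CT,LB,row.names = SMP)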

This gives the following table:

           CT    LB
A_SMP1  iCTa1 LAB_A
A_SMP2  iCTa2 LAB_A
A_SMP3  iCTa3 LAB_A
A_SMP4  iCTa4 LAB_A
B_SMP1  nCTa1 LAB_B
B_SMP2  nCTa2 LAB_B
B_SMP3  nCTa3 LAB_B
B_SMP4  nCTa4 LAB_B
B_SMP5  nCTa5 LAB_B
B_SMP6  nCTa6 LAB_B
B_SMP7  nCTb1 LAB_B
B_SMP8  nCTb2 LAB_B
B_SMP9  nCTc1 LAB_B
B_SMP10 nCTc2 LAB_B

The problem is that all the in vitro samples come from one lab and all the normal in vivo samples come from a different lab. I tried treating lab as a batch effect and comparing between cell types, but that fails: the design (~ 0 + CT + LB) produces a model matrix that is not full rank, which DESeq2 doesn't accept. If I understood correctly, this happens because no cell type has samples in both batches (labs).
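The rank problem can be reproduced in base R without running DESeq2. This is only a sketch: it assumes the replicate suffixes are dropped so that `CT` holds four cell-type labels (`iCTa`, `nCTa`, `nCTb`, `nCTc`), which is not how the table above is coded.

```r
# Assumption: CT holds the cell type only, without the replicate number.
CT <- factor(c(rep("iCTa", 4), rep("nCTa", 6), rep("nCTb", 2), rep("nCTc", 2)))
LB <- factor(c(rep("LAB_A", 4), rep("LAB_B", 10)))

# Every cell type occurs in exactly one lab.
print(table(CT, LB))

# Build the design matrix DESeq2 would be given for ~ 0 + CT + LB.
design <- model.matrix(~ 0 + CT + LB)
n_cols <- ncol(design)            # 5 columns: four cell types plus LBLAB_B
design_rank <- qr(design)$rank    # 4: one column is redundant

# The lab column is exactly the sum of the three in vivo cell-type columns,
# so the lab effect cannot be estimated separately from cell type.
lab_is_sum <- all(design[, "LBLAB_B"] ==
                  rowSums(design[, c("CTnCTa", "CTnCTb", "CTnCTc")]))
```

Because the rank (4) is lower than the number of columns (5), at least one coefficient has no unique estimate.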

I'm kind of new at this, so if there's any solution or suggestion, I would be grateful for it. Also, is it relevant to try something like this?

RNASeq DESeq2 • 819 views

ADD COMMENT • link updated 3.1 years ago by James W. MacDonald 67k • written 3.1 years ago by Delta._.43 ▴ 20

score 1 · Answer 1 · 2021-10-26

1

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 21 hours ago

United States

Unfortunately, your experiment is uninterpretable as designed. Any differences between the normal and in vitro samples are completely confounded with lab, so you cannot say whether a difference is biological or technical. You could hypothetically use something like RUV to account for technical differences based on known housekeeping genes, but that depends on accurately identifying those genes (i.e., genes whose expression doesn't change between the two cell types, so any observed changes are due solely to technical differences). Any results would then be completely reliant on your ability to identify true housekeeping genes.

ADD COMMENT • link 3.1 years ago James W. MacDonald 67k

0

Thanks for the suggestion. I did have doubts about the validity of such a comparison. But given the situation:

Is it plausible to normalise the two datasets separately, lab by lab, using DESeq2 and then perform a meta-analysis on the normalised counts?
And if so, can DESeq2 be used simply to normalise the data, without a model?

ADD REPLY • link 3.1 years ago Delta._.43 ▴ 20

1

It's not an issue of normalization, but instead an issue of confounding. Here's a simple example. Say you wanted to know if people in one state are heavier than those in another. So you get somebody in the first state to round up like 50 people and then weigh them, and another person in the second state to do the same. Unfortunately, the person in the first state only had this rusted old bathroom scale, whereas the person in the second state had a brand new electronic scale. If you do the comparison and the people in one state are heavier than the other, is the result accurate or not? It depends, right? I would tend to think the electronic scale is pretty accurate, but that busted old bathroom scale might be consistently biased up or down.

Now if you had sets of reference weights that you knew were exactly the same and you shipped one set to each state, they would be able to adjust their scales (and if the weights covered a range of maybe 50 - 300 lbs, so much the better). But without some external reference you cannot say if the people are different or not, because the comparison is completely confounded with the choice of scale. And there is no magic that can unpick that.
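The scale analogy can be put into numbers. The sketch below uses made-up values throughout (the weights, the bias of the old scale, and the reference set are all hypothetical): the two groups have identical true weights, the biased scale creates an apparent difference, and only the shared reference weights make it possible to remove it.

```r
# Hypothetical true weights (lbs): identical in both states.
true_a <- c(150, 180, 210)
true_b <- c(150, 180, 210)

# Assumed behaviour of the scales: state A's old scale reads 5% high plus 4 lbs,
# state B's electronic scale is accurate.
scale_a <- function(w) 1.05 * w + 4
scale_b <- function(w) w

obs_a <- scale_a(true_a)
obs_b <- scale_b(true_b)

# Naive comparison: the whole difference is an artifact of the scale.
naive_diff <- mean(obs_a) - mean(obs_b)

# Calibration: weigh the same known reference weights on scale A,
# fit reading ~ true weight, then invert the fit.
ref <- c(50, 100, 200, 300)
fit <- lm(reading ~ ref, data = data.frame(ref = ref, reading = scale_a(ref)))
cal_a <- (obs_a - coef(fit)[["(Intercept)"]]) / coef(fit)[["ref"]]

calibrated_diff <- mean(cal_a) - mean(obs_b)
print(c(naive = naive_diff, calibrated = calibrated_diff))
```

Without the `ref` measurements there is nothing to fit, and the naive difference cannot be split into a real part and a scale part.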

If you say 'This set of 50 genes absolutely don't change between the two cell types, so any differences are due only to technical issues' then you have the equivalent of the weight set, and can adjust. But in the case of the weight set you know for sure that they are accurate, whereas with your housekeeping genes all you have is your own assertion that they don't change. What if you are wrong?

Also, what you describe isn't a meta-analysis. A meta-analysis is where you have two or more experiments that have the same groups that were run in different labs, or at different times. You can't just dump all the raw data into one analysis, so instead what you do is use one of the summary statistics (the t-statistics or the p-values) from each analysis to make the meta comparison. Since you can't do that with your in vitro samples, you can't do a meta-analysis. Normalizing separately isn't the same thing.
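As a rough illustration of what combining summary statistics looks like, here is a sketch using Fisher's method for p-values. Fisher's method is just one common choice, picked here for brevity; the p-values and gene names are invented, and the sketch assumes both studies tested the same contrast on the same genes, which is exactly what is missing in your case.

```r
# Hypothetical per-gene p-values for the SAME comparison run in two labs.
p_lab1 <- c(geneA = 0.01, geneB = 0.50, geneC = 0.20)
p_lab2 <- c(geneA = 0.02, geneB = 0.60, geneC = 0.03)

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the null, where k is the number of studies.
k <- 2
fisher_stat <- -2 * (log(p_lab1) + log(p_lab2))
p_combined <- pchisq(fisher_stat, df = 2 * k, lower.tail = FALSE)

print(p_combined)
```

Note that the inputs are results of two complete analyses, not raw or normalised counts.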

Also, DESeq2 doesn't 'normalize' the data. Instead it estimates offsets to use in the generalized linear model, and those offsets are pretty much restricted to accounting for differences in library size rather than other technical variables. You account for technical variables (if possible) as part of the model fit.
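To make the offsets concrete: the default size factors come from a median-of-ratios calculation. The following is a minimal base-R sketch of that idea, not DESeq2's own code; the function name `size_factors` and the toy counts are made up for illustration.

```r
# Median-of-ratios size factors (sketch of the idea only).
size_factors <- function(counts) {
  # Per-gene log geometric mean across samples; -Inf if any count is zero.
  log_geo_mean <- rowMeans(log(counts))
  keep <- is.finite(log_geo_mean)
  # For each sample: median log-ratio to the geometric mean, back-transformed.
  apply(counts[keep, , drop = FALSE], 2, function(s) {
    exp(median(log(s) - log_geo_mean[keep]))
  })
}

# Toy counts: sample s2 is sequenced exactly twice as deeply as s1.
# gene5 contains a zero and is therefore ignored.
counts <- cbind(s1 = c(10, 20, 30, 40, 0),
                s2 = c(20, 40, 60, 80, 5))
rownames(counts) <- paste0("gene", 1:5)

sf <- size_factors(counts)
print(sf)
```

The factors only capture the two-fold depth difference between `s1` and `s2`. A lab effect that changes some genes and not others is not something a single per-sample factor can absorb.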

ADD REPLY • link 3.1 years ago James W. MacDonald 67k