Question

Batch effects between different sequencing runs

0

Entering edit mode

A ▴ 60

@a-14337

Last seen 16 months ago

United Kingdom

Hi all,

I would be really greatful if somebody could provide some insight about how to go about comparing two experiments that have both been sequenced at different times (and potentially in a different manner (for example single end vs paired end). I am trying to model the batch effect in DESeq2 in order to account for effects by batch but im running in to trouble and am unsure why:

I have two datasets with which i have combined the raw counts and the metadata.. all in order with regards to order of columns and rows of metadata etc... however, when i create the formula:

dds<-DESeqDataSetFromMatrix(countData = mergedCounts1, colData = metadataRobert, design = ~Batch+Group)

I get an error:

Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified.

As far as i am aware these are not the same as batch is simply batch 1 and batch 2, while the groups has multipls groups, control, trt 1, trt2, trt3 etc...

I guess the first question is, how to overcome this within the model. And the second question is, is a workaround to use the remove batch effect in limma on the VST counts and then re-plot the PCA and do downstream analysis?

This is particularly confusing for me as the variance is quite big between groups which also corresponds to batches.. however.. as the actual groups are completely different this may also be normal!!! so it is hard to tell if there is 90% variance or 70% variance because of a batch effect... I imagine if there were batch effects those samples that were truly similar between different sequencing types would be much closer to each other but not overlapping... not 90% different? I am unsure about how to tackle batch effects from this point of view...

thanks!

deseq2 batchcorrection • 2.7k views

ADD COMMENT • link updated 5.5 years ago by swbarnes2 ★ 1.4k • written 5.5 years ago by A ▴ 60

0

Entering edit mode

Please post your colData. You can do

as.data.frame(colData(dds))

ADD REPLY • link 5.5 years ago Michael Love 43k

0

Entering edit mode

thank you!

  SampleName
<fctr>
Organism
<fctr>
Replicate
<int>
Group
<fctr>
SequenceRun
<int>
Molecule
<fctr>
C11orf95_R1 EV_R1   Mus musculus    1   empty vector control    1   RNA
C11orf95_R2 EV_R2   Mus musculus    2   empty vector control    1   RNA
C11orf95_R3 EV_R3   Mus musculus    3   empty vector control    1   RNA
EV_R1   C11orf95_R1 Mus musculus    1   C11orf95    1   RNA
EV_R2   C11orf95_R2 Mus musculus    2   C11orf95    1   RNA
EV_R3   C11orf95_R3 Mus musculus    3   C11orf95    1   RNA
Fus1_R1 RELA_R1 Mus musculus    1   RELA WT 1   RNA
Fus1_R2 RELA_R2 Mus musculus    2   RELA WT 1   RNA
Fus1_R3 RELA_R3 Mus musculus    3   RELA WT 1   RNA
Fus2_R1 Fus1_R1 Mus musculus    1   RELA FUS1   1   RNA
Fus2_R2 Fus1_R2 Mus musculus    2   RELA FUS1   1   RNA
Fus2_R3 Fus1_R3 Mus musculus    3   RELA FUS1   1   RNA
RELA_R1 Fus2_R1 Mus musculus    1   RELA FUS2   1   RNA
RELA_R2 Fus2_R2 Mus musculus    2   RELA FUS2   1   RNA
RELA_R3 Fus2_R3 Mus musculus    3   RELA FUS2   1   RNA
M350.1  M350-1  Mus musculus    1   RHD 2   RNA
M350.2  M350-2  Mus musculus    2   RHD 2   RNA
M350.3  M350-3  Mus musculus    3   RHD 2   RNA
M350.4  M350-4  Mus musculus    4   RHD 2   RNA
M351.1  M351-1  Mus musculus    1   ZF  2   RNA
M351.2  M351-2  Mus musculus    2   ZF  2   RNA
M351.3  M351-3  Mus musculus    3   ZF  2

i hope this is clear enough? There is nothing really unusual (to me) that is immediately apparent...

if also add Batch1, Batch2 instead of 1 or 2 etc.. get the same error...

ADD REPLY • link 5.5 years ago A ▴ 60

score 0 · Answer 1 · 2019-10-04

0

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 3 days ago

San Diego

If "SequenceRun" is batch, this is hopelessly confounded. It doesn't matter that multiple groups are found in each batch, what matter is that all the members of each group are found in only one batch.

ADD COMMENT • link 5.5 years ago swbarnes2 ★ 1.4k

0

Entering edit mode

Yes, agree with this. You can't control for batch because it is confounded with the biological condition of interest. This is explicitly discussed in the vignette section that the "checkFullRank" error points you to.

ADD REPLY • link 5.5 years ago Michael Love 43k

0

Entering edit mode

I'll also note for the future that being sequenced at different times doesn't really matter. You can put things on the sequencing instrument whenever you want, that doesn't introduce technical artifacts. Prepping the RNA on different days, making the libraries on different days, that adds technical variation. You could probably dump read 2 to make the paired end samples look more like single end (I know my lab preps everything for paired end, even if they don't ever read the other end); it's not like having paired end data makes your counts twice as good as single end.

ADD REPLY • link 5.5 years ago swbarnes2 ★ 1.4k

0

Entering edit mode

Many thanks all!! appreciate the response and clarification... this indeed makes perfect sense.. a quick few follow up questions:

With this as a caveat: is it ok to simply proceed with ~Group in the model , to model differences between the difference conditions (groups)?

And secondly, in the case where I would be able to model a batch effect in a scenario like this? Taking out read 2 for PE seq makes sense..

What would i do where libraries have been prepped on different days? would it be the case that i have to representation of all groups in both batches?

so having said that, I should proceed with the caveat that neither sequencing contains the same group?? of course experimental validation would follow!

thanks in advance!!

ADD REPLY • link 5.5 years ago A ▴ 60