Question

Horizontal integration of two RNA-seq datasets derived from two different platforms

rjoana • 0

@ea48508a

Portugal

I'm currently interested in the horizontal integration of two RNA-seq datasets generated on different platforms (S5 and Illumina), one containing normal samples and the other containing abnormal samples. Since DESeq2 takes a single raw count matrix as input, I'm trying to derive a "conversion factor" from the raw counts of the two datasets to account for the normal and abnormal samples coming from different platforms. The strategy would be to base this factor on the housekeeping genes shared by the two datasets. After this step, I'd use estimateSizeFactors with controlGenes.

Do you think this would be a reliable strategy? Thanks in advance

DESeq2 dataIntegration • 1.0k views

ADD COMMENT • link updated 2.1 years ago by Michael Love 43k • written 2.1 years ago by rjoana • 0

Just to be clearer: starting from the two raw count matrices, I would apply a conversion factor based on shared housekeeping genes and lastly combine the two count matrices as DESeq2 input. Some algorithms in the literature claim to handle this, but on closer reading the datasets they integrate all derive from the same platform.
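
For illustration only, the scaling step described above can be sketched in plain Python. This is a minimal sketch of a median-of-ratios size factor computed on control genes only, which is the general idea behind estimateSizeFactors with controlGenes; it is not DESeq2's actual code. The gene names and counts are made up, and `size_factors` is a hypothetical helper.

```python
import math
from statistics import median


def size_factors(counts, control_genes=None):
    """Median-of-ratios size factors, optionally restricted to control genes.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    """
    genes = control_genes if control_genes is not None else list(counts)
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with any zero count are skipped.
    log_geo_means = {
        g: sum(math.log(c) for c in counts[g]) / n_samples
        for g in genes
        if all(c > 0 for c in counts[g])
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[g][j]) - m for g, m in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up toy data: sample 0 stands in for one platform, sample 1 for the other.
counts = {
    "HK1": [100, 200],
    "HK2": [50, 100],
    "HK3": [10, 20],
    "GENE_A": [30, 600],
}
factors = size_factors(counts, control_genes=["HK1", "HK2", "HK3"])
normalized = {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}
```

In this toy case the housekeeping genes line up after scaling, but each sample only ever receives one scalar. Any difference left in `GENE_A` could be biological or platform-specific, and the scaling alone cannot tell which.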

ADD REPLY • link 2.1 years ago rjoana • 0

Linear scaling is no different from the default normalization DESeq2 does anyway. Different technologies measure different sets of genes, have different dynamic ranges, and give different ratios of genes relative to a set of housekeepers, so a linear scaling will not do here. In any case, since the treatment condition seems to be nested within the technology, you cannot do any integration anyway. It's fully confounded; no stats magic will change that.
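
A small sketch of what "fully confounded" means for the design matrix, in plain Python. The sample layout is an assumption: the thread does not say which platform holds which condition, so the labels below are hypothetical, as are the helpers `matrix_rank` and `design`. When condition and platform always change together, their indicator columns are identical and the matrix loses rank, so the two effects cannot be estimated separately.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design(conditions, platforms):
    """Rows of [intercept, abnormal indicator, Illumina indicator]."""
    return [
        [1, int(c == "abnormal"), int(p == "Illumina")]
        for c, p in zip(conditions, platforms)
    ]


# Assumed layout: every normal sample on one platform, every abnormal on the other.
confounded = design(
    ["normal"] * 3 + ["abnormal"] * 3,
    ["S5"] * 3 + ["Illumina"] * 3,
)
# Hypothetical alternative: one normal sample was also run on the second platform.
bridged = design(
    ["normal"] * 3 + ["abnormal"] * 3,
    ["S5", "S5", "Illumina", "Illumina", "Illumina", "Illumina"],
)

confounded_rank = matrix_rank(confounded)  # 3 columns, but rank 2
bridged_rank = matrix_rank(bridged)        # full rank 3
```

With the confounded layout, a design such as `~ platform + condition` is not full rank, and DESeq2 stops with an error about the model matrix not being full rank. A single bridging sample restores full rank in this toy example, although one sample would still give a very weak estimate of the platform effect.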

ADD REPLY • link 2.1 years ago ATpoint ★ 4.8k

score 1 · Answer 1 · 2023-03-13

Michael Love 43k

@mikelove

United States

I think you need more than size-factor-based scaling to deal with the different technologies.

Do you have no control samples sequenced on the two platforms? It may be impossible to distinguish biological differences from technical ones.

ADD COMMENT • link 2.1 years ago Michael Love 43k

No, I don't. Normal control samples derived from the tissue I'm studying (and sequenced at my lab) are rarely included in RNA-seq studies because of their limited availability. No compatible S5 data are available in public databases. Thanks for all the comments.

ADD REPLY • link 2.1 years ago rjoana • 0

Without any samples spanning both technologies, I can't think of any way to harmonize. If you had such samples, you could use RUVSeq, which has methods that take advantage of them.

ADD REPLY • link 2.1 years ago Michael Love 43k