Question

Need advice on normalization of SMART-amplified RNA-Seq data

0


Jon Bråte ▴ 260

@jon-brate-6263


Norway

I have Illumina-sequenced mRNA libraries amplified from single cells using the SMART protocol. Each sample required a different number of PCR cycles to get enough cDNA for library prep, and the quality of the cDNA was also variable (i.e. read length distribution and amount).

The goal of the experiment is to see whether the cells differ in terms of gene expression, and whether cells with a similar morphology are more similar to each other genetically (e.g. cluster together in a PCA). Since we don't know these things yet, it is hard to say whether we have replicates or not, but we hope to also identify differentially expressed genes between the different cell types.

But I don't know how best to normalize these data. I was thinking of normalizing based on housekeeping genes. But maybe it is better to take into account the number of PCR cycles (although there is also an amplification step in the library prep)? Or simply normalize based on the total number of mapped reads? And which criteria can I use to evaluate which procedure performs best?

Thanks! Jon

normalization rna-seq deseq2 edger smart • 1.7k views

ADD COMMENT • link updated 9.7 years ago by Ryan C. Thompson ★ 7.9k • written 9.7 years ago by Jon Bråte ▴ 260

1


I think it depends, at least partly, on what you consider 'similar' gene expression in this context. Question: if you have two cells with precisely the same transcripts expressed in precisely the same relative amounts, *but* one cell expresses everything ten times more than the other, do you consider those cells similar or not? If you do, then normalising by total read count seems sufficient. For many applications I would argue that it is reasonable to consider those two cells identical: total RNA content is often simply a function of cell size, which is often not really of biological interest. If you do not consider those cells identical, then I think you need some kind of spike-in (like ERCC) to normalise against. Do you have those included in your design?
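To make the ten-fold scenario concrete, here is a minimal sketch of total-count normalisation in Python with NumPy. The function name `counts_per_million` and the toy counts are made up for illustration and are not taken from any package mentioned in this thread. It only shows that two cells with the same relative expression become indistinguishable once each is scaled by its own total.

```python
import numpy as np


def counts_per_million(counts):
    """Scale each cell (column) so that its counts sum to one million.

    `counts` is a genes x cells array of raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)  # total reads per cell
    return counts / library_sizes * 1e6


# Toy data (hypothetical): cell B has ten times the counts of cell A
# for every gene, so the relative expression is identical.
cell_a = np.array([10, 50, 40, 0])
cell_b = cell_a * 10
counts = np.column_stack([cell_a, cell_b])  # rows = genes, columns = cells

cpm = counts_per_million(counts)
print(cpm)
```

After scaling, both columns hold the same values, so any downstream distance or PCA would treat the two cells as identical.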

ADD REPLY • link 9.7 years ago alexgutteridge ▴ 50

0


Thanks for the feedback. In my case, the two cells in your example would be regarded as identical. I guess it would also be very hard to distinguish genuinely elevated gene expression from differences introduced by the PCR amplification. About the spike-in, we didn't think about that. But that is definitely something we should have added...

ADD REPLY • link 9.7 years ago Jon Bråte ▴ 260

score 1 · Answer 1 · 2015-03-11

1


Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618


Icahn School of Medicine at Mount Sinai…

In the comments, you say that you consider cells with the same relative gene expression levels to be functionally identical. In that case, I'd say that the standard normalization procedures in edgeR and DESeq2 will be fine for your purposes.
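For readers who want a feel for what such a procedure does, below is a simplified Python sketch of the median-of-ratios idea that DESeq2 uses by default for its size factors (edgeR's default, TMM, is a different method and is not shown). The function name and toy counts are hypothetical, and this is an illustration only, not a substitute for running the packages themselves.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample from a genes x samples count array.

    For each gene, take the geometric mean across samples as a reference;
    a sample's size factor is the median, over genes, of its ratio to that
    reference. Genes with a zero count in any sample are skipped.
    """
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)  # log(0) becomes -inf
    log_geo_means = log_counts.mean(axis=1)  # -inf if any sample has a zero
    usable = np.isfinite(log_geo_means)
    if not usable.any():
        raise ValueError("every gene has a zero count in at least one sample")
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


# Toy data (hypothetical): sample 2 is sequenced twice as deeply as sample 1;
# the last gene has a zero and is therefore ignored.
counts = np.array([
    [10, 20],
    [100, 200],
    [30, 60],
    [0, 5],
])

size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors  # divide each column by its size factor
print(size_factors)
```

Because only relative scaling between samples is estimated, this kind of normalization matches the case where cells with the same relative expression are treated as identical.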

ADD COMMENT • link 9.7 years ago Ryan C. Thompson ★ 7.9k