Question

Technical-biological replicates in edgeR


mo17


Hello! I would appreciate some help with the following issue, which I have been struggling with.

My experiment consists of 3 biological replicates per genotype per time point (3 genotypes in total, so 9 biological replicates per time point). I have two technical replicates per biological replicate. In my case, the technical replicates were not made by sequencing the same library multiple times and/or on different lanes. Instead, the two technical replicates are two independent Illumina 150 bp PE mRNA sequencing libraries made from a single biological replicate (e.g. Biological Replicate 1 has libraries "A" and "B", rather than one library sequenced multiple times).

I have read posts where the counts of technical replicates are merged before starting the DEG analysis in edgeR, but in those cases the technical replicates were the same library sequenced several times. Unfortunately, that is not my case, so I would really appreciate any feedback on how to proceed. My first question is: is it statistically appropriate to add up the counts from my technical replicates in this experiment? If not, what approach could I follow?

Please note that so far I have fragment counts from featureCounts for all my libraries; however, there are sometimes big differences between technical replicates (e.g. for gene "X" within one biological replicate/time point, I get 477 counts from one library but 975 from the other). Therefore, I calculated the relative standard deviation between the 2 technical replicates of each biological replicate/time point. I then kept only the genes that do not exceed a certain relative standard deviation threshold and averaged the two technical replicates for those genes, so that each biological replicate has a single set of values as input for edgeR. My second question is: could I get any comments on this last method? That would be greatly appreciated!

Thank you very much for any help in advance! :)

Tags: edgeR

Written 3.9 years ago by mo17; last updated by Gordon Smyth.

score 0 · Answer 1 · 2021-03-23


swbarnes2


If side-by-side preps of the same sample are generating massively different counts, I think you have to question the validity of the whole experiment.


score 0 · Answer 2 · 2021-03-23

You can still add up (sum) the counts of the technical replicates. There is no major problem with that. The replicates do not need to be re-sequencing of the same library prep, nor do they need to be perfectly similar.
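To make "adding the technical replicates" concrete, the sketch below sums each biological replicate's library columns gene by gene. In R, edgeR's `sumTechReps()` does this for a count matrix; the Python function `sum_tech_reps` here is a hypothetical stand-in, and the library names (`BR1_A` etc.) and all counts other than 477 and 975 are made-up toy values.

```python
def sum_tech_reps(counts, sample_of):
    """Sum the count columns of libraries that share a biological sample.

    counts:    dict of library name -> list of per-gene counts
               (every list must use the same gene order).
    sample_of: dict of library name -> biological sample ID.
    Returns a dict of biological sample ID -> summed per-gene counts.
    """
    summed = {}
    for lib, column in counts.items():
        sample = sample_of[lib]
        if sample not in summed:
            summed[sample] = [0] * len(column)
        if len(summed[sample]) != len(column):
            raise ValueError(f"library {lib} has a different number of genes")
        # Gene-by-gene addition of this library into its biological sample.
        summed[sample] = [total + c for total, c in zip(summed[sample], column)]
    return summed


# Toy data (hypothetical): three genes, two biological replicates,
# two libraries each. Only 477 and 975 come from the question.
counts = {
    "BR1_A": [477, 10, 0],
    "BR1_B": [975, 12, 3],
    "BR2_A": [500, 8, 1],
    "BR2_B": [610, 9, 0],
}
sample_of = {"BR1_A": "BR1", "BR1_B": "BR1", "BR2_A": "BR2", "BR2_B": "BR2"}

summed = sum_tech_reps(counts, sample_of)
print(summed)  # → {'BR1': [1452, 22, 3], 'BR2': [1110, 17, 1]}
```

Summing keeps the merged values as whole counts, which is the kind of input edgeR expects; averaging two libraries, as described in the question, can produce non-integer values.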

BTW you haven't convinced me yet of big differences between the replicates. It is not sufficient to simply compare counts (e.g., 477 vs 975) without taking into account the corresponding library sizes. Comparing replicates would be better done using edgeR's dispersion estimator or just by looking at an MDS plot.
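The library-size point can be illustrated with a little arithmetic. The sketch below converts the two counts for gene "X" from the question into counts per million (CPM). The library sizes are hypothetical, since the post does not report them; with these made-up values, a roughly two-fold raw difference shrinks to about 2% once library size is taken into account. This is plain CPM on raw library sizes, a simplification of edgeR's `cpm()`, which can also incorporate normalization factors (e.g. TMM from `calcNormFactors()`).

```python
def cpm(count, lib_size):
    """Counts per million: scale a raw count by its library's total count."""
    return count * 1e6 / lib_size


# Hypothetical library sizes (total assigned fragments); NOT from the post.
lib_size_a = 20_000_000
lib_size_b = 40_000_000

# Raw counts for gene "X" as quoted in the question.
count_a, count_b = 477, 975

raw_ratio = count_b / count_a          # ignores library size
cpm_a = cpm(count_a, lib_size_a)
cpm_b = cpm(count_b, lib_size_b)
cpm_ratio = cpm_b / cpm_a              # compares like with like

print(f"raw ratio B/A: {raw_ratio:.2f}")
print(f"CPM A: {cpm_a:.2f}, CPM B: {cpm_b:.2f}, CPM ratio B/A: {cpm_ratio:.2f}")
```

With real data, the MDS plot or dispersion estimate mentioned above gives a much fuller picture than any single gene.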

I don't like your ad hoc method of trimming genes with large standard deviations. Data hacking like this is unnecessary and discards data that might be valuable. It also potentially biases the remaining data. If you do have a reason to be worried about genes that are highly variable between the technical replicates (and I'm not convinced you need to be), then use the limma duplicateCorrelation approach, which will downweight such genes. That can be done most easily by running edgeR::voomLmFit() with block set to the biological sample.