Technical and biological replicates in BitSeq
Sam McInturf ▴ 300
@sam-mcinturf-5291
Last seen 9.2 years ago
United States

Bioconductors,

tl;dr: Does BitSeq differentiate between biological and technical variation? If it does, how do I tell BitSeq which files belong to which biological replicate?

I am looking to use BitSeq to analyze my single-end Illumina data. I have a combinatorial design (3x2x2) in which each condition has 3 biological replicates, and each biological replicate was split across 3 lanes to give 3 technical replicates. I can therefore run BitSeq::getExpression on every technical replicate to estimate the expression of each transcript and produce an RPKM file.

I'll name each file to describe the sample as c<condition>b<bioRep>t<techRep> (c1b1t1, c1b1t2, c1b1t3, c1b2t1, ..., c2b3t3).

If I had no technical replicates I would simply call

getDE(list("A" = c(c1b1, c1b2, c1b3), "B" = c(c2b1, c2b2, c2b3)))

If I include my technical replicates, the call becomes

getDE(list("A" = c(c1b1t1, c1b1t2, c1b1t3, c1b2t1, ..., c1b3t1, ...), "B" = c(c2b1t1, c2b1t2, c2b1t3, c2b2t1, ..., c2b3t1, ...)))

However, this does not tell BitSeq how the samples are related, i.e. which part of the variance is technical and which is biological. I have read the Bioinformatics paper (vol. 28, no. 13, 2012, pages 1721-1728) with some level of understanding, but I am by no means fluent in Bayesian concepts. I didn't see an explicit term for biological and technical variance (although I am used to dealing with technical replicates by comparing a full model against a reduced model, DESeq2 style). In Section 3.4 (DE analysis), second paragraph, the authors talk about combining the posterior probabilities, but I believe that refers directly to producing Figure 5b and 5d, not to how to feed in the data prior to DE calls.

 

Thanks for any wisdom!

Sam

BitSeq bitseq rnaseq • 1.5k views

Hi Sam,

Currently, BitSeq does not support technical replicates. I would proceed exactly as in your second getDE call, and then check the consistency of the DE results by comparing them with the results obtained when all technical replicates are combined into a single sample (e.g. c1b1 = c1b1t1 + c1b1t2 + c1b1t3, c1b2 = c1b2t1 + c1b2t2 + c1b2t3, etc.), as Ryan suggested.

 

@ryan-c-thompson-5618
Last seen 6 weeks ago
Icahn School of Medicine at Mount Sinai…

General practice is to combine technical replicates into a single sample. This is justified because, in the absence of alternative isoforms, technical variation is known to follow a Poisson distribution, and the sum of independent Poisson random variables is itself Poisson, so no information is lost by combining. This is complicated a bit by splicing, but in any case, from my reading of the BitSeq paper, it uses MCMC to estimate the distribution (i.e. technical variation) of each transcript in each sample, so I think there's no need to keep the technical replicates separate. (I'm not a BitSeq user, though; this is just based on a quick reading of the methods section of the paper.)
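
The Poisson argument can be checked with a small simulation. This is only a sketch in base R on made-up numbers: the per-lane rate of 20 reads and the three-lane layout are assumptions for illustration, and nothing here touches BitSeq itself. If the pooled counts are Poisson, their mean and variance should both be close to the sum of the per-lane rates.

```r
# Hypothetical simulation: three lanes (technical replicates) of the same library.
set.seed(42)
n <- 100000            # number of simulated transcripts/draws
lambda_per_lane <- 20  # assumed expected read count per lane

# Each column is one lane; rows are independent Poisson draws.
lanes <- replicate(3, rpois(n, lambda_per_lane))

# Pooling technical replicates corresponds to summing counts across lanes.
pooled <- rowSums(lanes)

pooled_mean <- mean(pooled)
pooled_var  <- var(pooled)

# For a Poisson variable the variance equals the mean, so this ratio should be near 1.
cat("mean:", pooled_mean, " variance:", pooled_var,
    " ratio:", pooled_var / pooled_mean, "\n")
```

A variance-to-mean ratio well above 1 in real technical replicates would suggest extra technical noise beyond Poisson sampling.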

In any case, if you are worried about excess technical variation, I would recommend quantifying the replicates separately and making a PCA plot or using other exploratory data analysis techniques. If the technical replicates all cluster closely together, you are probably justified in combining them.
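
As a rough sketch of that check, the base R code below runs a PCA on simulated counts and compares distances within and between biological replicates. Everything in it is an assumption for illustration: the counts are simulated rather than read from BitSeq output, and the sample names simply follow the c<condition>b<bioRep>t<techRep> scheme from the question. With real data, the matrix would instead hold one column of expression estimates per technical replicate.

```r
# Hypothetical data: 3 biological replicates x 3 technical replicates, simulated.
set.seed(1)
n_transcripts <- 500
bio_reps <- c("c1b1", "c1b2", "c1b3")
tech_per_bio <- 3

base <- rgamma(n_transcripts, shape = 2, scale = 50)  # baseline expression
counts <- do.call(cbind, lapply(bio_reps, function(b) {
  # Biological variation: each biological replicate has its own true mean.
  bio_mean <- base * exp(rnorm(n_transcripts, sd = 0.3))
  # Technical variation: Poisson draws around that shared biological mean.
  tech <- sapply(seq_len(tech_per_bio), function(i) rpois(n_transcripts, bio_mean))
  colnames(tech) <- paste0(b, "t", seq_len(tech_per_bio))
  tech
}))

# PCA on log-transformed values, with samples as rows.
pca <- prcomp(t(log2(counts + 1)), center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]
bio_label <- sub("t[0-9]+$", "", rownames(scores))

# Average distance between samples of the same vs. different biological replicate.
d <- as.matrix(dist(scores))
same_bio <- outer(bio_label, bio_label, "==")
within_dist  <- mean(d[same_bio & upper.tri(d)])
between_dist <- mean(d[!same_bio & upper.tri(d)])
cat("within:", within_dist, " between:", between_dist, "\n")
```

Calling plot(scores, col = factor(bio_label)) afterwards gives the usual PCA scatter plot. Technical replicates that sit far closer to each other than to other biological replicates support pooling them.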


Thank you for the response,

I have done the PCA for my samples and they cluster as expected. I guess I was just hoping that I could squeeze a little more out of the data :P
