Question

RNA-Seq, generate batch-free count matrix

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 9 days ago

United States

Sorry to resurrect a somehow old thread, but I wanted to kick the tires on this and wanted to clarify the google-record for posterity ... so: On Sun, Jul 20, 2014 at 5:18 PM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > > On Mon, 21 Jul 2014, Gordon K Smyth wrote: > >> logCPMc <- removeBatchEffect(y, batch) > > > Should have written > > logCPMc <- removeBatchEffect(y2, batch) Shouldn't the correction actually be this? R> lobCPMc <- removeBatchEffect(logCPM, batch) Because in the code snippet `y2` is a DGEList dataset (of counts). So, removing batch effects from soup to nuts: ## ------------------------------------------------ library(edgeR) y <- DGEList(counts=counts) ## Filter non-expressed genes: A <- aveLogCPM(y) y2 <- y2[A>1,] ## Then normalize and compute log2 counts-per-million with an offset y2 <- calcNormFactors(y2) logCPM <- cpm(y2, log=TRUE, prior.count=5) ## Then remove batch correct: logCPMc <- removeBatchEffect(logCPM, batch) ## ------------------------------------------------ Thanks, -steve -- Steve Lianoglou Computational Biologist Genentech

• 1.8k views

ADD COMMENT • link updated 10.7 years ago by Gordon Smyth 52k • written 10.7 years ago by Steve Lianoglou ★ 13k

score 0 · Answer 1 · 2014-08-01

Hi Steve, I think that's right. I put an example of what I was experimenting with here: https://github.com/brianjohnhaas/singleCellRNASeq note, it turned out that I really shouldn't have been trying to remove the 'batch' effect here, because my two plates of cells turned out to be quite different from each other beyond any batch issue (true biological differences). In any case, I'm still looking to identify various code snippets that are available through bioconductor to remove nuisance variation from data, and will fill out the TBD sections when I find some time, and contributions are always welcome. best, ~brian On Fri, Aug 1, 2014 at 6:02 PM, Steve Lianoglou <lianoglou.steve@gene.com> wrote: > Sorry to resurrect a somehow old thread, but I wanted to kick the > tires on this and wanted to clarify the google-record for posterity > ... so: > > On Sun, Jul 20, 2014 at 5:18 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > > > > On Mon, 21 Jul 2014, Gordon K Smyth wrote: > > > >> logCPMc <- removeBatchEffect(y, batch) > > > > > > Should have written > > > > logCPMc <- removeBatchEffect(y2, batch) > > Shouldn't the correction actually be this? > > R> lobCPMc <- removeBatchEffect(logCPM, batch) > > Because in the code snippet `y2` is a DGEList dataset (of counts). > > So, removing batch effects from soup to nuts: > > ## ------------------------------------------------ > library(edgeR) > y <- DGEList(counts=counts) > > ## Filter non-expressed genes: > A <- aveLogCPM(y) > y2 <- y2[A>1,] > > ## Then normalize and compute log2 counts-per-million with an offset > y2 <- calcNormFactors(y2) > logCPM <- cpm(y2, log=TRUE, prior.count=5) > > ## Then remove batch correct: > logCPMc <- removeBatchEffect(logCPM, batch) > ## ------------------------------------------------ > > Thanks, > -steve > > -- > Steve Lianoglou > Computational Biologist > Genentech > -- -- Brian J. Haas The Broad Institute http://broad.mit.edu/~bhaas [[alternative HTML version deleted]]

score 0 · Answer 2 · 2014-08-02

Thanks Steve, that is what I had meant to write. Gordon On Fri, 1 Aug 2014, Steve Lianoglou wrote: > So, removing batch effects from soup to nuts: > > ## ------------------------------------------------ > library(edgeR) > y <- DGEList(counts=counts) > > ## Filter non-expressed genes: > A <- aveLogCPM(y) > y2 <- y2[A>1,] > > ## Then normalize and compute log2 counts-per-million with an offset > y2 <- calcNormFactors(y2) > logCPM <- cpm(y2, log=TRUE, prior.count=5) > > ## Then remove batch correct: > logCPMc <- removeBatchEffect(logCPM, batch) > ## ------------------------------------------------ > > Thanks, > -steve > > -- > Steve Lianoglou > Computational Biologist > Genentech > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}