Estimating and removing batch effects from rna-seq dataset
2
1
Entering edit mode
AB ▴ 110
@ab-8975
Last seen 2.3 years ago
United States

Hi everyone,

I'm working on RNA-Seq datasets containing 22 samples from 3 batches. I used mm10 as the reference genome and generated the count table from the BAM files using GenomicAlignments package and then used the rpkm() function to get the rpkm data. Now I want to perform batch correction using either the count matrix or the rpkm data. How do I use the sva package for this purpose ? Is there any other way I can perform batch estimation and correction with the data I have ? Can i use removeBatchEffects() function or is it only for microarray data ?

Thanks

 

sva batch effect rnaseq removebatcheffect() limma • 11k views
ADD COMMENT
3
Entering edit mode

As Ryan suggested, you should read through the vignettes for sva and RUVSeq, which were written to address this type of general question that you are asking.

Once you have tried to apply these procedures to your data and run into specific problems, you are welcome to come back to the support site and ask a more specific question that describes your problem and includes a code snippet that outlines the steps you've taken to get to the point you might be asking about.

At that point, you'll likely receive useful help.

ADD REPLY
0
Entering edit mode

Thank you Dr.Lianoglu. I tried to implement sva but I am getting this error.

Error in dat %*% (Id - mod %*% solve(t(mod) %*% mod) %*% t(mod)) : 
  requires numeric/complex matrix/vector arguments.

this is my code

data <- read.csv("data.csv")
batch<-factor(c("b1","b1","b1","b1","b1","b1","b1","b1","b2","b2","b2","b2","b2","b2","b2","b2","b3","b3","b3","b3","b3","b3"))
type<-factor(c("ctrl","ctrl","ctrl","ctrl","mut","mut","mut","mut", "ctrl","ctrl","ctrl","ctrl","mut","mut","mut","mut", "ctrl","ctrl","mut", "mut","mut","mut"))
coldata<-data.frame(batch,type)

mod <- model.matrix(~type, colData=coldata)
mod0 <- model.matrix(~ 1, colData=coldata)
svseq <- svaseq(data, mod, mod0, n.sv=2)

 

ADD REPLY
0
Entering edit mode

You should post a new question on the support site for specific issues, but before you do that make sure that you are passing the correct data into the svaseq function.

In particular, the dat argument (your data variable) needs to be a matrix of counts (integers). A call to read.csv will return a data.frame, so you should check what types of things are stored in the columns of it (sapply(data, class)). These all need to be integer (or numeric), otherwise when you naively convert it (data) to a matrix, it will be upcast to whatever the most general column type is (maybe you have a character column in there?)

Also, I don't think your call to model.matrix is doing what you think it's doing, ie. I'm not sure what the colData argument is doing here (likely nothing), you just happen to get lucky and have the type variable defined in your global workspace. Please inspect those values (mod and mod0) to make sure they are what they should be, because the error you are getting suggest otherwise.

 

ADD REPLY
0
Entering edit mode

How do you know you have 3 batches if you don't have any known batches? The sva documentation is fairly straightforward. What part of the documentation are you having difficulty with?

ADD REPLY
4
Entering edit mode
@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia

removeBatchEffects() is not just for microarray data but (as the documentation tells you) it is intended to be used in conjunction with data exploration like a heatmap or MDS plot. It is not intended to be used in conjunction with a differential expression analysis. For a DE analysis, you can get the same effect as removeBatchEffects() just by adding the batches as an additive factor to the linear model.

ADD COMMENT
2
Entering edit mode
Jeff Leek ▴ 650
@jeff-leek-5015
Last seen 3.8 years ago
United States

Hi

sva is mostly useful for estimating batch effects when you don't know what the batches/artifacts are. if you know the batch effects, you can build a model matrix including them as terms in the matrix and then use limma/voom, DESeq, edgeR, etc. 

If you are discovering batch effects from sequencing data you need to use something like svaseq (or just take the log of the counts plus a small constant and apply sva) 

Best

Jeff

ADD COMMENT

Login before adding your answer.

Traffic: 443 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6