surrogate variable analysis using R package "sva"
1
0
Entering edit mode
fromhj304 ▴ 10
@fromhj304-11588
Last seen 8.3 years ago

Hi all, I have RNA-Seq raw count data and I want to take rlog (regularized log transform) to run surrogate variable analysis using R package.

FYI, here is my code:

>source("https://bioconductor.org/biocLite.R")
>biocLite("DESeq")
>library(DESeq2)

>dat <- read.table("TestData_09_30_2016.txt",header=T)
>row.names(dat)<-dat[,1]
>dat<-dat[,-1]
>dat <- as.matrix(dat)
>head(dat)
>(condition <- factor(c(rep("KO1", 1), rep("KO2", 1), rep("KO3", 1), rep("KO4", 1), rep("KO5", 1), rep("KO6", 1), rep("Control1", 1), rep("Control2", 1))))
>(coldata <- data.frame(row.names=colnames(dat), condition))

>dds <- DESeqDataSetFromMatrix(countData=dat, colData=coldata, design=~condition)

#run the DESeq pipeline
>dds <- DESeq(dds)

#delete all the data that have total of 0 counts
>dds<-dds[rowSums(counts(dds))>1,]

#get regularized log transformation
>rld <- rlogTransformation(dds)

 

This is my raw count data (without the data that have total of 0 counts) head(dat):

                 KO1 KO2 KO3 KO4  KO5 KO6 Control1 Control2
ENSG00000000003  770 520 650 364 1165 440     1116     1002
ENSG00000000005   23  17   9  15   30  18       31       16
ENSG00000000419 1086 473 800 425 1091 659     1187     1192
ENSG00000000457  401 231 374 156  583 312      457      485
ENSG00000000460  619 364 604 268  795 428      883      684
ENSG00000000938    0   1   0   2    0   3        0        0

And this is my head(assay(rld)):

                        KO1         KO2         KO3         KO4        KO5         KO6    Control1
ENSG00000000003  9.38864787  9.55798078  9.33509314  9.38546651  9.5167140  9.19017804  9.63294632
ENSG00000000005  4.26857690  4.32402555  4.14925099  4.32936408  4.2673566  4.28471196  4.31229727
ENSG00000000419  9.78054476  9.55463023  9.59997856  9.60364472  9.5358608  9.62371048  9.76759852
ENSG00000000457  8.42800759  8.45243834  8.45965033  8.27816421  8.5124251  8.49092122  8.45058666
ENSG00000000460  9.05770197  9.10173211  9.13116762  8.98096336  9.0382444  9.02339178  9.28249708
ENSG00000000938 -0.08566075 -0.06565163 -0.08529477 -0.04374719 -0.0864924 -0.03977232 -0.08599017
                   Control2
ENSG00000000003  9.48105503
ENSG00000000005  4.17943696
ENSG00000000419  9.71825081
ENSG00000000457  8.45220418
ENSG00000000460  9.00489782
ENSG00000000938 -0.08616627

So my question is:

1) What does an output look like when I run SVA? Do I get a list of genes without the genes that are removed? Or do I get the list of genes that are subjected to removal?

2) My final goal is to see differentially expressed genes after running SVA. When I just use dds for DESeq and run results(dds), there were so many repeated adjusted p-values, which did not give me enough information to determine differentially expressed genes. There were many suggestions in previous posts, but my professor wants me to use SVA then determine DE. What would be the simplest way to do this?

Thanks!

sva r bioinformatics • 1.4k views
ADD COMMENT
0
Entering edit mode
Diego Diez ▴ 760
@diego-diez-4520
Last seen 4.2 years ago
Japan

`sva()` returns a list with several components. One is `sv` which contains the surrogate variables. You can include those variables  in your model (with *limma/edgeR/DEseq2*). Take a look at `?sva` and the vignette (calling `browseVignettes("sva")` in R) for more details, limitations and examples on how to use it in different settings.

ADD COMMENT

Login before adding your answer.

Traffic: 878 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6