Question

surrogate variable analysis using R package "sva"

0

Entering edit mode

fromhj304 ▴ 10

@fromhj304-11588

Last seen 8.3 years ago

Hi all, I have RNA-Seq raw count data and I want to take rlog (regularized log transform) to run surrogate variable analysis using R package.

FYI, here is my code:

>source("https://bioconductor.org/biocLite.R")
>biocLite("DESeq")
>library(DESeq2)

>dat <- read.table("TestData_09_30_2016.txt",header=T)
>row.names(dat)<-dat[,1]
>dat<-dat[,-1]
>dat <- as.matrix(dat)
>head(dat)
>(condition <- factor(c(rep("KO1", 1), rep("KO2", 1), rep("KO3", 1), rep("KO4", 1), rep("KO5", 1), rep("KO6", 1), rep("Control1", 1), rep("Control2", 1))))
>(coldata <- data.frame(row.names=colnames(dat), condition))

>dds <- DESeqDataSetFromMatrix(countData=dat, colData=coldata, design=~condition)

#run the DESeq pipeline
>dds <- DESeq(dds)

#delete all the data that have total of 0 counts
>dds<-dds[rowSums(counts(dds))>1,]

#get regularized log transformation
>rld <- rlogTransformation(dds)

This is my raw count data (without the data that have total of 0 counts) head(dat):

                 KO1 KO2 KO3 KO4 KO5 KO6 Control1 Control2
ENSG00000000003 770 520 650 364 1165 440     1116     1002
ENSG00000000005   23 17   9 15   30 18       31       16
ENSG00000000419 1086 473 800 425 1091 659     1187     1192
ENSG00000000457 401 231 374 156 583 312      457      485
ENSG00000000460 619 364 604 268 795 428      883      684
ENSG00000000938    0   1   0   2    0   3        0        0

And this is my head(assay(rld)):

KO1 KO2 KO3 KO4 KO5 KO6 Control1
ENSG00000000003 9.38864787 9.55798078 9.33509314 9.38546651 9.5167140 9.19017804 9.63294632
ENSG00000000005 4.26857690 4.32402555 4.14925099 4.32936408 4.2673566 4.28471196 4.31229727
ENSG00000000419 9.78054476 9.55463023 9.59997856 9.60364472 9.5358608 9.62371048 9.76759852
ENSG00000000457 8.42800759 8.45243834 8.45965033 8.27816421 8.5124251 8.49092122 8.45058666
ENSG00000000460 9.05770197 9.10173211 9.13116762 8.98096336 9.0382444 9.02339178 9.28249708
ENSG00000000938 -0.08566075 -0.06565163 -0.08529477 -0.04374719 -0.0864924 -0.03977232 -0.08599017
Control2
ENSG00000000003 9.48105503
ENSG00000000005 4.17943696
ENSG00000000419 9.71825081
ENSG00000000457 8.45220418
ENSG00000000460 9.00489782
ENSG00000000938 -0.08616627

So my question is:

1) What does an output look like when I run SVA? Do I get a list of genes without the genes that are removed? Or do I get the list of genes that are subjected to removal?

2) My final goal is to see differentially expressed genes after running SVA. When I just use dds for DESeq and run results(dds), there were so many repeated adjusted p-values, which did not give me enough information to determine differentially expressed genes. There were many suggestions in previous posts, but my professor wants me to use SVA then determine DE. What would be the simplest way to do this?

Thanks!

sva r bioinformatics • 1.4k views

ADD COMMENT • link updated 8.3 years ago by Diego Diez ▴ 760 • written 8.3 years ago by fromhj304 ▴ 10

score 0 · Answer 1 · 2016-10-04

`sva()` returns a list with several components. One is `sv` which contains the surrogate variables. You can include those variables in your model (with *limma/edgeR/DEseq2*). Take a look at `?sva` and the vignette (calling `browseVignettes("sva")` in R) for more details, limitations and examples on how to use it in different settings.