Error trying to load counts matrix and lane information using "newSeqExpressionSet"
1
0
Entering edit mode
@rebaduncan-10914
Last seen 17 months ago
United States

Hello all,

I am not too familiar with running bioconductor programs and I am not too familiar with R in general, so I apologize if I am overlooking something that is obvious to more seasoned users.  I am trying to use EDASeq to explore my RNA-seq data and normalize the count data for downstream expression analysis.  I am stuck trying to load my counts matrix and lane information into a data object by an error message that I'm not sure how to interpret. I have been following Davide Risso's example (http://www.stat.berkeley.edu/~davide/EDASeq-example.pdf) but substituting my own data files.  Here is the code I have tried so far:

> filename <- "Pcitri.Trinity.fasta"
> fa <- readFasta(filename)
> abc <- alphabetFrequency(sread(fa), baseOnly=TRUE)
> rownames(abc) <- sapply(strsplit(as.character(id(fa))," "),function(x) x[1])
> alphabet <- abc[,1:4]
> gc <- rowSums(alphabet[,2:3])/rowSums(alphabet)
> length <- width(sread(fa))
> head(gc)
TRINITY_DN48_c0_g1_i1 TRINITY_DN27_c0_g1_i1 TRINITY_DN10_c0_g1_i1 
            0.3960396             0.4007634             0.3705882 
TRINITY_DN28_c0_g1_i1 TRINITY_DN56_c0_g1_i1 TRINITY_DN60_c0_g1_i1 
            0.3893805             0.3155340             0.4257426 
> head(length)
[1] 202 262 340 226 206 202
> geneInfo <- data.frame(length=length, gc=gc)
> head(geneInfo)
                      length        gc
TRINITY_DN48_c0_g1_i1    202 0.3960396
TRINITY_DN27_c0_g1_i1    262 0.4007634
TRINITY_DN10_c0_g1_i1    340 0.3705882
TRINITY_DN28_c0_g1_i1    226 0.3893805
TRINITY_DN56_c0_g1_i1    206 0.3155340
TRINITY_DN60_c0_g1_i1    202 0.4257426
> geneLevelCounts <- read.table("genes.counts.matrix", header=TRUE, row.names=1)
> laneInfo <- read.table("laneInfo.txt", header=TRUE, row.names=1)
> means <- rowMeans(geneLevelCounts)
> filter <- means >= 10
> table(filter)
filter
FALSE  TRUE 
97992 26569 
> geneLevelCounts <- geneLevelCounts[filter,]
> head(geneLevelCounts)
                      CVD.B CVD.WI USDA.B USDA.WI
TRINITY_DN32994_c1_g1    42    203      0       8
TRINITY_DN34242_c0_g2 45064    927  24886    1331
TRINITY_DN17158_c0_g1  2368   2081    835     548
TRINITY_DN25351_c0_g1  2461   2176    873     662
TRINITY_DN22364_c0_g1   435    246    141      53
TRINITY_DN31244_c0_g2    39     45      8      27
> head(laneInfo)
        lib_prep  conditions sample
CVD.B        CVD  Bacteriome      1
CVD.WI       CVD WholeInsect      2
USDA.B      USDA  Bacteriome      3
USDA.WI     USDA WholeInsect      4
> library(EDASeq)
> data <- newSeqExpressionSet(exprs = as.matrix(geneLevelCounts),
+ featureData = geneInfo[rownames(geneLevelCounts), ],
+ phenoData = laneInfo)
Error in assayDataNew(counts = counts, normalizedCounts = normalizedCounts,  : 
  argument "counts" is missing, with no default

Everything seems to be working up until the data <- newSeqExpressionSet part, and I am not sure what is going on.  If anyone has any ideas I would be grateful for your help!

edaseq error rna-seq normalization • 1.8k views
ADD COMMENT
1
Entering edit mode
davide risso ▴ 980
@davide-risso-5075
Last seen 8 months ago
University of Padova

Hi,

that PDF that you are following uses an outdated version of EDASeq. Please, follow the steps in the package vignette (http://bioconductor.org/packages/release/bioc/vignettes/EDASeq/inst/doc/EDASeq.pdf), in particular:

data <- newSeqExpressionSet(counts=as.matrix(geneLevelCounts),
         featureData=geneInfo[rownames(geneLevelCounts), ],
         phenoData=laneInfo)

I haven't tried it, but it should work. Let me know if it doesn't.

ADD COMMENT
0
Entering edit mode

Thanks, Davide.  That worked!  Now I have the problem of non-integer counts, so I need to go backwards and see what happened!

ADD REPLY

Login before adding your answer.

Traffic: 669 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6