Hello all,
I am not too familiar with running bioconductor programs and I am not too familiar with R in general, so I apologize if I am overlooking something that is obvious to more seasoned users. I am trying to use EDASeq to explore my RNA-seq data and normalize the count data for downstream expression analysis. I am stuck trying to load my counts matrix and lane information into a data object by an error message that I'm not sure how to interpret. I have been following Davide Risso's example (http://www.stat.berkeley.edu/~davide/EDASeq-example.pdf) but substituting my own data files. Here is the code I have tried so far:
> filename <- "Pcitri.Trinity.fasta" > fa <- readFasta(filename) > abc <- alphabetFrequency(sread(fa), baseOnly=TRUE) > rownames(abc) <- sapply(strsplit(as.character(id(fa))," "),function(x) x[1]) > alphabet <- abc[,1:4] > gc <- rowSums(alphabet[,2:3])/rowSums(alphabet) > length <- width(sread(fa)) > head(gc) TRINITY_DN48_c0_g1_i1 TRINITY_DN27_c0_g1_i1 TRINITY_DN10_c0_g1_i1 0.3960396 0.4007634 0.3705882 TRINITY_DN28_c0_g1_i1 TRINITY_DN56_c0_g1_i1 TRINITY_DN60_c0_g1_i1 0.3893805 0.3155340 0.4257426 > head(length) [1] 202 262 340 226 206 202 > geneInfo <- data.frame(length=length, gc=gc) > head(geneInfo) length gc TRINITY_DN48_c0_g1_i1 202 0.3960396 TRINITY_DN27_c0_g1_i1 262 0.4007634 TRINITY_DN10_c0_g1_i1 340 0.3705882 TRINITY_DN28_c0_g1_i1 226 0.3893805 TRINITY_DN56_c0_g1_i1 206 0.3155340 TRINITY_DN60_c0_g1_i1 202 0.4257426 > geneLevelCounts <- read.table("genes.counts.matrix", header=TRUE, row.names=1) > laneInfo <- read.table("laneInfo.txt", header=TRUE, row.names=1) > means <- rowMeans(geneLevelCounts) > filter <- means >= 10 > table(filter) filter FALSE TRUE 97992 26569 > geneLevelCounts <- geneLevelCounts[filter,] > head(geneLevelCounts) CVD.B CVD.WI USDA.B USDA.WI TRINITY_DN32994_c1_g1 42 203 0 8 TRINITY_DN34242_c0_g2 45064 927 24886 1331 TRINITY_DN17158_c0_g1 2368 2081 835 548 TRINITY_DN25351_c0_g1 2461 2176 873 662 TRINITY_DN22364_c0_g1 435 246 141 53 TRINITY_DN31244_c0_g2 39 45 8 27 > head(laneInfo) lib_prep conditions sample CVD.B CVD Bacteriome 1 CVD.WI CVD WholeInsect 2 USDA.B USDA Bacteriome 3 USDA.WI USDA WholeInsect 4 > library(EDASeq) > data <- newSeqExpressionSet(exprs = as.matrix(geneLevelCounts), + featureData = geneInfo[rownames(geneLevelCounts), ], + phenoData = laneInfo) Error in assayDataNew(counts = counts, normalizedCounts = normalizedCounts, : argument "counts" is missing, with no default
Everything seems to be working up until the data <- newSeqExpressionSet part, and I am not sure what is going on. If anyone has any ideas I would be grateful for your help!
Thanks, Davide. That worked! Now I have the problem of non-integer counts, so I need to go backwards and see what happened!