Question

Using the RPKM function for several samples simultaneously

humberto_munoz • 0

I want to apply the rpkm normalization function to 11 datasets, each containing three columns (gene ID, gene counts, and gene lengths). The samples are in 4 groups, and I use the following lines:

> y <- dir(pattern="*\\.csv$")
> RG<-DGEList(counts=y$ReadCount,lib.size=TRUE)

I am getting the following error

> dim(y)
NULL
> RG<-DGEList(counts=y$ReadCount,lib.size=TRUE)
Error in y$ReadCount : $ operator is invalid for atomic vectors

When I use readDGE instead, I also get an error:

G<-readDGE(y, group=group, labels=NULL)
Error in `[.data.frame`(d[[i]], , columns[2]) : 
  undefined columns selected

Thanks in advance for any help you can provide me.

Humberto

normalization • 1.8k views

Written 8.7 years ago by humberto_munoz • updated 4.0 years ago by Gordon Smyth

score 0 · Answer 1 · 2016-07-25

Gordon Smyth 52k

WEHI, Melbourne, Australia

Your object 'y' is just a character vector of filenames.

You haven't actually read any of those files into your R session. So you don't have any counts from which to make a DGEList object.

A basic principle of using R, and programming in general, is to look at your results at each step to check that they are correct. In this case you can look at 'y' by typing print(y) and you will see that it contains no counts.
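
As a small illustration of that kind of check, the sketch below inspects an object shaped like 'y'. The filenames are hypothetical stand-ins for whatever dir() returned; only base R is used.

```r
# Hypothetical filenames, standing in for what dir() or list.files() returns.
y <- c("sample1.csv", "sample2.csv")

class(y)  # "character"
dim(y)    # NULL, because a plain vector has no dimensions
str(y)    # compact summary of the object

# `$` only works on lists and data frames, so applying it to a character
# vector reproduces the error message quoted in the question.
msg <- tryCatch(y$ReadCount, error = function(e) conditionMessage(e))
msg
```

Seeing class "character" and a NULL dim() is the signal that the files still need to be read, for example with read.csv(), before any counts exist.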

Answered 8.7 years ago by Gordon Smyth

Gordon, thanks for your comments. Yes, I also realized that from dim(y): y only has the filenames of the 11 samples.

I used the readDGE function for TMM normalization on other datasets whose sample files have two columns (gene ID and gene counts), and it worked well. However, my datasets now have three columns per sample, and the function gives me the error below.

RG<-readDGE(y, group=group, labels=NULL)
Error in `[.data.frame`(d[[i]], , columns[2])

Reply by humberto_munoz, 8.7 years ago

First, readDGE() has no difficulty with 3-column files.

The error suggests that your csv files actually have only one column, not 3.
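
One way a three-column file can look like it has one column, offered as a possibility rather than a diagnosis: the edgeR documentation describes readDGE() as reading files with read.delim(), whose default separator is a tab, so a comma-separated file would be parsed as a single wide column. The sketch below shows the effect in base R on a temporary file with made-up values.

```r
# Write a tiny comma-separated file with hypothetical values.
tmp <- tempfile(fileext = ".csv")
writeLines(c("GeneID,ReadCount,Length",
             "g1,10,1000",
             "g2,25,2500"), tmp)

as_tab   <- read.delim(tmp)             # default separator is a tab
as_comma <- read.delim(tmp, sep = ",")  # explicit comma separator

ncol(as_tab)    # 1: the whole line lands in one column
ncol(as_comma)  # 3: GeneID, ReadCount, Length
```

If that is the cause here, passing the separator through readDGE() may be enough, since its extra arguments are documented as being forwarded to read.delim(), for example `readDGE(y, columns = c(1, 2), sep = ",")`.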

Reply by Gordon Smyth, 8.7 years ago

I created the list RPKM, which holds the rpkm values for each file, with the following code:

setwd("/Users/hmunozbarona/Documents/Normalization-R")
rm(list=ls(all=TRUE)) # remove all variables
# files <- dir(pattern="*\\.csv$")
files <- list.files(pattern = "*\\.csv$")
print(files)
my_data <- list()
RPKM <- list()
for (i in seq_along(files)) {
  my_data[[i]] <- read.csv(file = files[i])
  id <- my_data[[i]]['GeneID']
  cnts <- my_data[[i]]['ReadCount']
  lens <- my_data[[i]]['Length']
  y <- DGEList(genes = data.frame(gene = id, Length = lens), counts = cnts)
  RPKM[[i]] <- data.frame(gene = id, rpkm(y))
}
print(RPKM)
# countDf <- data.frame(gene = id, count = cnts, length = lens)
group<- c(1,1,1,1,1,1,1,1,1,1,1)
RG<-readDGE(RPKM, columns=c(1,2), group=NULL, labels=NULL)
RG$samples
keep <-rowSums(cpm(RG)>1) >=1
RG<- RG[keep, , keep.lib.sizes=FALSE]

This is the error that I am getting:

> # countDf <- data.frame(gene = id, count = cnts, length = lens)
> group<- c(1,1,1,1,1,1,1,1,1,1,1)
> RG<-readDGE(RPKM, columns=c(1,2), group=NULL, labels=NULL)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'list(GeneID = c(641612588, 641610273, 641612823, 641611428, 641610552, 641612589, 641612814, 641611971, 641612541, 641610260, 641610167, 641610059, 641611190, 641612220, 641611972, 641610056, 641611932, 641611769, 641610514, 641612072, 641612379, 641612089, 641611662, 641612486, 641610821, 641612639, 641612019, 641612117, 641612604, 641611040, 641611500, 641612071, 641610161, 641611192, 641611191, 641610177, 641611018, 641611468, 641611939, 641611041, 641610151, 641611569, 641611453, 641610223, 641611610,
641610788, 641611624, 641610374, 641611613, 641612219, 641610996, 641610491, 641612093, 641610282, 641610967, 641612218, 641610340, 641611372, 641610932, 641611470, 641610994, 641612187, 641612630, 641611703, 641610248, 641611366, 641610015, 641611940, 641610231, 641612397, 641612566, 641610351, 641610023, 641612355, 641611901, 641610060, 641611223, 641612605, 641612626, 641612357, 641611934, 641610256, 641611044, 641610852, 641612582, 641612190, 641610586, 64161120 [... truncated]

Reply by humberto_munoz, 8.7 years ago
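
The last error arises because readDGE() expects a character vector of file names, whereas RPKM is a list of data frames already in memory, so R tries to open a file named after the printed list. One possible way forward, sketched here under stated assumptions, is to skip readDGE() and bind the per-sample counts into a single genes-by-samples matrix. The sketch assumes every file lists the same genes in the same order; the sample names and values are made up, and the RPKM arithmetic is written out in base R rather than calling edgeR.

```r
# Hypothetical per-sample tables standing in for the data frames from read.csv().
samples <- list(
  s1 = data.frame(GeneID = c("g1", "g2"), ReadCount = c(100, 300), Length = c(1000, 2000)),
  s2 = data.frame(GeneID = c("g1", "g2"), ReadCount = c(50, 150),  Length = c(1000, 2000))
)

# Assumption: all samples list the same genes in the same order.
stopifnot(all(sapply(samples, function(d) identical(d$GeneID, samples[[1]]$GeneID))))

# One column of counts per sample, one row per gene.
counts <- sapply(samples, function(d) d$ReadCount)
rownames(counts) <- samples[[1]]$GeneID
gene_length <- samples[[1]]$Length

# RPKM = reads per kilobase of gene per million reads in the library.
lib_size    <- colSums(counts)
cpm_manual  <- t(t(counts) / lib_size) * 1e6   # scale each sample column by its library size
rpkm_manual <- cpm_manual / (gene_length / 1000)  # scale each gene row by its length in kb
rpkm_manual
```

With edgeR loaded, the same matrix could instead be passed on as `rpkm(DGEList(counts = counts), gene.length = gene_length)`, which also makes filtering and TMM normalization available on the combined object. Treat that call as a suggestion to check against the edgeR help pages rather than a tested recipe for these files.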