I want to apply the rpkm() normalization function to 11 datasets, each with three columns (gene ID, gene count and gene length). The samples fall into 4 groups, and I am using the following lines:
> y <- dir(pattern="*\\.csv$")
> RG<-DGEList(counts=y$ReadCount,lib.size=TRUE)
I am getting the following error:
Error in y$ReadCount : $ operator is invalid for atomic vectors
> dim(y)
NULL
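The first error can be reproduced on a toy file. dir() (an alias of list.files()) returns only a character vector of file names, so dim(y) is NULL and y$ReadCount fails before any data has been read. A minimal base-R sketch; the temporary directory and the file s1.csv are hypothetical stand-ins for the real CSVs.

```r
# A hypothetical toy file standing in for one of the 11 real CSVs
demo_dir <- tempfile("rpkm_demo")
dir.create(demo_dir)
write.csv(data.frame(GeneID = c("g1", "g2"), ReadCount = c(10, 3),
                     Length = c(1000, 500)),
          file.path(demo_dir, "s1.csv"), row.names = FALSE)

# dir() returns file names only: a plain character vector with no dim
y <- dir(demo_dir, pattern = "\\.csv$", full.names = TRUE)
is.character(y)   # TRUE
is.null(dim(y))   # TRUE, matching dim(y) -> NULL in the post

# `$` is not defined for an atomic (character) vector, hence the error
msg <- tryCatch(y$ReadCount, error = function(e) conditionMessage(e))

# The ReadCount column only exists once the file itself has been read
d <- read.csv(y[1])
d$ReadCount
```

Reading every file (for example with lapply(y, read.csv)) and binding the ReadCount columns into one matrix gives something that DGEList(counts = ...) can accept.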
When I use readDGE() instead, I also get an error:
RG<-readDGE(y, group=group, labels=NULL)
Error in `[.data.frame`(d[[i]], , columns[2]) :
undefined columns selected
Thanks in advance for any help you can provide me.
Humberto
Gordon, thanks for your comments. Yes, I also noticed from dim(y) that y only contains the file names of the 11 samples.
I have used readDGE() before on other datasets with two columns per sample (gene ID and gene count) for TMM normalization, and it worked well. Now my datasets have three columns per sample, and the function gives me the error below:
RG<-readDGE(y, group=group, labels=NULL)
Error in `[.data.frame`(d[[i]], , columns[2])
First, readDGE() has no difficulty with 3-column files.
The error suggests that your csv files actually have only one column, not 3.
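Gordon's point can be checked directly. One plausible cause, which the thread does not confirm, is the separator: readDGE() reads each file with read.delim(), whose default separator is a tab, so every line of a comma-separated file lands in a single column and asking for columns[2] fails. A sketch on a hypothetical toy file:

```r
# A hypothetical comma-separated file with the three columns from the post
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(GeneID = c("g1", "g2"), ReadCount = c(10, 3),
                     Length = c(1000, 500)),
          tmp, row.names = FALSE, quote = FALSE)

# Read as tab-delimited (read.delim's default): each whole line is one field
as_tab <- read.delim(tmp)
# Read with the matching separator: three columns
as_csv <- read.delim(tmp, sep = ",")

ncol(as_tab)  # 1
ncol(as_csv)  # 3

# Selecting a second column from the one-column table gives the same error
msg <- tryCatch(as_tab[, 2], error = function(e) conditionMessage(e))
msg  # "undefined columns selected"
```

If the separator is the problem, the readDGE() help page states that further arguments are passed on to read.delim(), so readDGE(files, columns = c(1, 2), sep = ",") may be enough; it is worth confirming against ?readDGE for the installed edgeR version.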
I created a list, RPKM, containing the RPKM values for each file, with the following code:
setwd("/Users/hmunozbarona/Documents/Normalization-R")
rm(list=ls(all=TRUE)) # remove all variables
library(edgeR) # provides DGEList(), rpkm(), readDGE() and cpm()
# files <- dir(pattern="*\\.csv$")
files <- list.files(pattern = "*\\.csv$")
print(files)
my_data <- list()
RPKM <- list()
for (i in seq_along(files)) {
  my_data[[i]] <- read.csv(file = files[i])
  id<- my_data[[i]]['GeneID']
  cnts<- my_data[[i]]['ReadCount']
  lens<- my_data[[i]]['Length']
  y <- DGEList(genes=data.frame(gene = id,Length=lens), counts=cnts)
  RPKM[[i]] <- data.frame(gene =id, rpkm(y))
}
print(RPKM)
# countDf <- data.frame(gene = id, count = cnts, length = lens)
group<- c(1,1,1,1,1,1,1,1,1,1,1)
RG<-readDGE(RPKM, columns=c(1,2), group=NULL, labels=NULL)
RG$samples
keep <-rowSums(cpm(RG)>1) >=1
RG<- RG[keep, , keep.lib.sizes=FALSE]
This is the error that I am getting:
> # countDf <- data.frame(gene = id, count = cnts, length = lens)
> group<- c(1,1,1,1,1,1,1,1,1,1,1)
> RG<-readDGE(RPKM, columns=c(1,2), group=NULL, labels=NULL)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'list(GeneID = c(641612588, 641610273, 641612823, 641611428, 641610552, ... [truncated]
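readDGE() expects file names; given the list RPKM, it tries to open each deparsed data frame as a path, which is the "cannot open file 'list(GeneID = ..." error above. A minimal base-R sketch of one alternative, assuming every file lists the same genes in the same order: combine the per-file data frames into one count matrix, filter on counts per million, and only then convert to RPKM. The toy data frames are hypothetical stand-ins for my_data, and the arithmetic matches edgeR's cpm()/rpkm() only while all normalization factors are 1 and log = FALSE.

```r
# Hypothetical stand-ins for my_data[[i]]: one data frame per CSV file
my_data <- list(
  data.frame(GeneID = c("g1", "g2", "g3"), ReadCount = c(100, 0, 900),
             Length = c(1000, 500, 2000)),
  data.frame(GeneID = c("g1", "g2", "g3"), ReadCount = c(300, 0, 700),
             Length = c(1000, 500, 2000))
)

# Assumption: every file lists the same genes in the same order
same_genes <- vapply(my_data,
                     function(d) identical(d$GeneID, my_data[[1]]$GeneID),
                     logical(1))
stopifnot(all(same_genes))

# One genes-by-samples count matrix instead of one object per file
counts <- sapply(my_data, function(d) d$ReadCount)
rownames(counts) <- as.character(my_data[[1]]$GeneID)
colnames(counts) <- paste0("sample", seq_along(my_data))
gene_length <- my_data[[1]]$Length

# Counts per million, with each column's total count as its library size
cpm_mat <- t(t(counts) / colSums(counts)) * 1e6

# Keep genes with CPM > 1 in at least one sample (the rule used in the post),
# then recompute library sizes from the retained genes (like keep.lib.sizes=FALSE)
keep <- rowSums(cpm_mat > 1) >= 1
counts_kept <- counts[keep, , drop = FALSE]
lib_kept <- colSums(counts_kept)

# RPKM: reads per kilobase of gene length per million reads in the library
rpkm_mat <- t(t(counts_kept) / lib_kept) * 1e6 / (gene_length[keep] / 1e3)
```

With edgeR loaded, the corresponding steps would be roughly y <- DGEList(counts = counts, genes = data.frame(Length = gene_length)), keep <- rowSums(cpm(y) > 1) >= 1, y <- y[keep, , keep.lib.sizes = FALSE] and rpkm(y). Building one DGEList from the raw counts avoids passing data frames to readDGE() and avoids feeding RPKM values back into cpm() as if they were counts.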