Using the RPKM function for several samples simultaneously
@humberto_munoz-10903

I want to apply the rpkm normalization function to 11 datasets that each contain three columns (gene ID, gene counts and gene lengths). The samples are in 4 groups, and I use the following lines:

> y <- dir(pattern="*\\.csv$")
> RG<-DGEList(counts=y$ReadCount,lib.size=TRUE)

I am getting the following error

> dim(y)
NULL
> RG<-DGEList(counts=y$ReadCount,lib.size=TRUE)
Error in y$ReadCount : $ operator is invalid for atomic vectors

When I use readDGE instead, I also get an error, shown below:

G<-readDGE(y, group=group, labels=NULL)
Error in `[.data.frame`(d[[i]], , columns[2]) : 
  undefined columns selected

Thanks in advance for any help you can provide me.

Humberto

@gordon-smyth
WEHI, Melbourne, Australia

Your object 'y' is just a character vector of filenames.

You haven't actually read any of those files into your R session. So you don't have any counts from which to make a DGEList object.

A basic principle of using R, and programming in general, is to look at your results at each step to check that they are correct. In this case you can look at 'y' by typing print(y) and you will see that it contains no counts.
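
As an illustration of that kind of check, here is a minimal sketch in base R. The demo directory and file name are hypothetical and exist only so the example runs on its own; the column names are the ones used later in this thread.

```r
# Hypothetical demo file, written to a temporary directory so the example is self-contained.
demo_dir <- file.path(tempdir(), "inspect_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(GeneID = c(101, 102),
                     ReadCount = c(10, 30),
                     Length = c(1000, 2000)),
          file.path(demo_dir, "sample1.csv"), row.names = FALSE)

# dir() only lists file names; it does not read anything.
y <- dir(demo_dir, pattern = "\\.csv$")
print(y)          # file names only
is.character(y)   # TRUE
dim(y)            # NULL, because a character vector has no dimensions

# The counts only exist in the session once a file has actually been read.
dat <- read.csv(file.path(demo_dir, y[1]))
str(dat)          # shows the GeneID, ReadCount and Length columns
```

Running str() or head() on each intermediate object in this way makes it clear which step produced something other than what was intended.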


Gordon, thanks for your comments. Yes, I also realized that from dim(y): y only contains the filenames of the 11 samples.

I used the readDGE function on other datasets, where each sample had two columns (gene ID and gene counts), for the TMM normalization, and it worked well. However, my datasets now have three columns per sample, and the function gives me the error below.

RG<-readDGE(y, group=group, labels=NULL)
Error in `[.data.frame`(d[[i]], , columns[2]) 


First, readDGE() has no difficulty with 3-column files.

The error suggests that your csv files actually have only one column, not 3.
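
One way to test this is to read a single file the way readDGE() does. readDGE() passes its extra arguments on to read.delim(), which splits on tabs by default, so a comma-separated file would come back as a single column. The sketch below uses a hypothetical demo file with the column names mentioned in this thread.

```r
# Hypothetical comma-separated demo file, standing in for one of the sample files.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("GeneID,ReadCount,Length",
             "101,10,1000",
             "102,30,2000"),
           demo_file)

# read.delim() assumes tab-separated input, so each line becomes one field.
as_tab <- read.delim(demo_file)
ncol(as_tab)   # 1

# read.csv() splits on commas and recovers the three columns.
as_csv <- read.csv(demo_file)
ncol(as_csv)   # 3
```

If the real files behave like this, passing the separator through, for example readDGE(files, columns = c(1, 2), sep = ","), may be enough. That call is an untested assumption about these particular files.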


 

I created a list, RPKM, that holds the rpkm values for each file, using the following code:

setwd("/Users/hmunozbarona/Documents/Normalization-R")
rm(list=ls(all=TRUE)) # remove all variables
# files <- dir(pattern="*\\.csv$")
files <- list.files(pattern = "*\\.csv$")
print(files)
my_data <- list()
RPKM <- list()
for (i in seq_along(files)) {
  my_data[[i]] <- read.csv(file = files[i])
  id<- my_data[[i]]['GeneID']
  cnts<- my_data[[i]]['ReadCount']
  lens<- my_data[[i]]['Length']
  y <- DGEList(genes=data.frame(gene = id,Length=lens), counts=cnts)
  RPKM[[i]] <- data.frame(gene =id, rpkm(y))
}
print(RPKM)
# countDf <- data.frame(gene = id, count = cnts, length = lens)
group<- c(1,1,1,1,1,1,1,1,1,1,1)
RG<-readDGE(RPKM, columns=c(1,2), group=NULL, labels=NULL)
RG$samples
keep <-rowSums(cpm(RG)>1) >=1
RG<- RG[keep, , keep.lib.sizes=FALSE]

This is the error that I am getting:

> # countDf <- data.frame(gene = id, count = cnts, length = lens)
> group<- c(1,1,1,1,1,1,1,1,1,1,1)
> RG<-readDGE(RPKM, columns=c(1,2), group=NULL, labels=NULL)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'list(GeneID = c(641612588, 641610273, 641612823, 641611428, 641610552, 641612589, 641612814, 641611971, 641612541, 641610260, 641610167, 641610059, 641611190, 641612220, 641611972, 641610056, 641611932, 641611769, 641610514, 641612072, 641612379, 641612089, 641611662, 641612486, 641610821, 641612639, 641612019, 641612117, 641612604, 641611040, 641611500, 641612071, 641610161, 641611192, 641611191, 641610177, 641611018, 641611468, 641611939, 641611041, 641610151, 641611569, 641611453, 641610223, 641611610, 
641610788, 641611624, 641610374, 641611613, 641612219, 641610996, 641610491, 641612093, 641610282, 641610967, 641612218, 641610340, 641611372, 641610932, 641611470, 641610994, 641612187, 641612630, 641611703, 641610248, 641611366, 641610015, 641611940, 641610231, 641612397, 641612566, 641610351, 641610023, 641612355, 641611901, 641610060, 641611223, 641612605, 641612626, 641612357, 641611934, 641610256, 641611044, 641610852, 641612582, 641612190, 641610586, 64161120 [... truncated]
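
The warning shows readDGE() trying to open a data frame as if it were a file name: it expects a vector of file names, not objects already in the session. A possible alternative, sketched below, is to combine the per-sample counts into one matrix and compute RPKM for all samples at once. The demo files and values are hypothetical, and the sketch assumes every file lists the same genes in the same order. The base R calculation uses the column sums as library sizes; the optional edgeR call should agree with it as long as the normalization factors are all 1.

```r
# Hypothetical demo data: two small CSV files with the columns named in this thread.
demo_dir <- file.path(tempdir(), "rpkm_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(GeneID = c(101, 102, 103),
                     ReadCount = c(10, 30, 60),
                     Length = c(1000, 2000, 500)),
          file.path(demo_dir, "sampleA.csv"), row.names = FALSE)
write.csv(data.frame(GeneID = c(101, 102, 103),
                     ReadCount = c(50, 50, 100),
                     Length = c(1000, 2000, 500)),
          file.path(demo_dir, "sampleB.csv"), row.names = FALSE)

# Read every file into a list of data frames.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
tabs <- lapply(files, read.csv)

# Assumption: all samples contain the same genes in the same order.
ids <- tabs[[1]]$GeneID
stopifnot(all(vapply(tabs, function(d) identical(d$GeneID, ids), logical(1))))

# Genes in rows, samples in columns.
counts <- vapply(tabs, function(d) as.numeric(d$ReadCount), numeric(length(ids)))
dimnames(counts) <- list(as.character(ids), sub("\\.csv$", "", basename(files)))
lens <- tabs[[1]]$Length

# RPKM by hand: counts per million reads, per kilobase of gene length.
lib_size <- colSums(counts)
rpkm_manual <- t(t(counts) / lib_size) * 1e6 / (lens / 1e3)
print(rpkm_manual)

# The same calculation through edgeR, if the package is installed.
if (requireNamespace("edgeR", quietly = TRUE)) {
  y <- edgeR::DGEList(counts = counts,
                      genes = data.frame(GeneID = ids, Length = lens))
  rpkm_edgeR <- edgeR::rpkm(y, gene.length = lens)
  print(rpkm_edgeR)
}
```

Keeping all samples in a single DGEList also means that later steps, such as filtering on cpm(), operate on raw counts rather than on values that have already been converted to RPKM.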
