Question

Problem using the readDGE function

humberto_munoz • 0

@humberto_munoz-10903

Last seen 8.4 years ago

I plan to use the readDGE function with two CSV files containing gene counts from different samples. One has 2814 genes and the other 2809 genes. The files are on my Desktop and this is the error that I get:

> files <- dir(pattern="*\\.csv$")

>
> RG <- readDGE(files)
Error in `[.data.frame`(d[[i]], , columns[2]) :
undefined columns selected
>

How can I fix the error?

readDGE edgeR • 3.3k views

ADD COMMENT • link updated 8.5 years ago by Gordon Smyth 51k • written 8.5 years ago by humberto_munoz • 0

How many columns does each of the two files have? What are the column headings?

ADD REPLY • link 8.5 years ago Gordon Smyth 51k

Each file has two columns: the first contains the gene IDs and the second the gene read counts. Does readDGE create a DGEList that includes every gene that has at least one count in one of the samples?

ADD REPLY • link 8.5 years ago humberto_munoz • 0

Aaron, I followed your comment and got these results. How can I see all the rows in data2, or compute its size? I also want to do the TMM normalization, but I get the error message below.

> readDGE(data2, sep=",")
An object of class "DGEList"
$samples
                            files group lib.size norm.factors
Dark Aerobic     Dark Aerobic.csv     1   481909            1
Dark Anaerobic Dark Anaerobic.csv     1  1033135            1

$counts
           Samples
Tags        Dark Aerobic Dark Anaerobic
  641610012           17             28
  641610013           55             36
  641610014          331           1551
  641610015         1005           2292
  641610016           96            136
2816 more rows ...

> y<-calcNormFactors(data2, method=c("TMM","RLE","upperquartile","none"),
+ refColumn=NULL, logratioTrim=.3, sumTrim=0.05, doWeighting=TRUE,
+ Acutoff=-1e10, p=0.75)
Error in colSums(x) : 'x' must be numeric
>

Thanks for your helpful comments.

ADD REPLY • link updated 8.5 years ago by Gordon Smyth 51k • written 8.5 years ago by humberto_munoz • 0

Humberto, please use the "ADD COMMENT" button to add your replies, rather than posting them as Answers. I have been moving your answers back here so that they appear as comments on the original question.

ADD REPLY • link 8.5 years ago Gordon Smyth 51k

score 2 · Answer 1 · 2016-06-15

readDGE expects each file to be tab-separated and to contain at least two columns (one of gene names/IDs and another of gene counts). Your files do not seem to follow this format: they are being read as having fewer columns than expected, probably because they use a different separator. For CSV files, set sep="," in the readDGE call, as mentioned in the documentation for the function. Also see the columns argument in ?readDGE if there are more than two columns and the first two do not correspond to the IDs and counts.
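
A minimal sketch of the call described above, assuming edgeR is installed. The two small CSV files, their names and their column headings are made up for illustration; only readDGE and its path and sep arguments come from edgeR.

```r
library(edgeR)

# Hypothetical example data: two small CSV files with a header row,
# gene IDs in column 1 and read counts in column 2.
tmp <- tempfile("counts_")
dir.create(tmp)
write.csv(data.frame(GeneID = c("g1", "g2", "g3"), Count = c(17, 55, 331)),
          file.path(tmp, "sampleA.csv"), row.names = FALSE)
write.csv(data.frame(GeneID = c("g2", "g3", "g4"), Count = c(36, 1551, 2292)),
          file.path(tmp, "sampleB.csv"), row.names = FALSE)

files <- dir(tmp, pattern = "\\.csv$")

# sep="," is passed on to read.delim, so each line is split into two columns.
RG <- readDGE(files, path = tmp, sep = ",")

dim(RG)     # number of genes and number of samples
RG$counts   # the full count matrix, not just the first rows
```

In this sketch the count matrix has one row for every gene ID that appears in either file, and a gene that is absent from a file gets a count of zero for that sample. Calling dim() on the stored object is also one way to get its size.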

score 0 · Answer 2 · 2016-06-16

Gordon Smyth 51k

@gordon-smyth

Last seen 30 minutes ago

WEHI, Melbourne, Australia

There are a few problems here:

First, as already noted, you need to specify sep="," because you have a comma-separated file.

Second, there is a problem with your files. Somewhere in one of your data files you have a character entered where you should have a number. Have a look especially at the last row of your files, as that is often the culprit. Check that your files don't contain any unnecessary spaces, because a space will be read as a character.

Third, you only have two samples in total, meaning the sample size is n=1 in each group. In other words you have no replication. So there isn't much analysis that edgeR will be able to do for you, because edgeR is designed to work with biological replicates.
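
The check suggested in the second point can be sketched in base R. The file below is a made-up example with a stray character in the last row; the column names and values are assumptions for illustration only.

```r
# Hypothetical file for illustration: the last row has a stray character
# in the count column.
f <- tempfile(fileext = ".csv")
writeLines(c("GeneID,Count",
             "641610012,17",
             "641610013,55",
             "641610014,3x1"), f)

# Read every column as character so nothing is converted silently.
d <- read.csv(f, colClasses = "character")

# Entries that cannot be converted to a number become NA.
counts <- suppressWarnings(as.numeric(d[[2]]))
bad <- which(is.na(counts))

d[bad, ]   # rows whose count is not a valid number
```

Running the same check on each real file in turn should point to the offending row, if there is one.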

ADD COMMENT • link 8.5 years ago Gordon Smyth 51k

Actually, I have samples from an experiment with 11 different conditions and no biological replicates. My intention is to apply TMM normalization using the first sample (Dark Aerobic) as the reference. To start with, I'm trying it on the first two samples (Dark Aerobic and Dark Anaerobic). According to your last comment, TMM normalization is not applicable to these data sets.
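
For reference, the call this comment is aiming at can be sketched as follows, assuming edgeR is installed and the counts have been read in as numbers and stored in a DGEList object. The count matrix here is simulated stand-in data, not the real samples. In the transcript earlier in the thread, data2 is the argument given to readDGE, so it appears to hold file names rather than counts; that is an inference from the transcript, not something stated in the thread.

```r
library(edgeR)

# Simulated stand-in counts for two samples (hypothetical data).
set.seed(1)
counts <- matrix(rpois(2000, lambda = 50), ncol = 2,
                 dimnames = list(paste0("gene", 1:1000),
                                 c("Dark Aerobic", "Dark Anaerobic")))

# Store the DGEList in an object; calcNormFactors works on this object,
# not on a character vector of file names.
y <- DGEList(counts = counts)

# TMM normalization with the first sample as the reference column.
y <- calcNormFactors(y, method = "TMM", refColumn = 1)

y$samples   # norm.factors now holds the TMM scaling factors
```

With real files, y would instead be the object returned by readDGE(..., sep=",").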

ADD REPLY • link 8.5 years ago humberto_munoz • 0

You are misinterpreting my answer. Your difficulties and my answer have nothing to do with the TMM method.

ADD REPLY • link 8.4 years ago Gordon Smyth 51k