I plan to use the readDGE function with two CSV files containing gene counts from different samples. One has 2814 genes and the other 2809 genes. The files are on my Desktop and this is the error that I get:
readDGE expects that each file is tab-separated and contains at least two columns (one of gene names/IDs and another of gene counts). It seems that your files do not follow this format, i.e., fewer columns than expected. This is probably because a different separator is involved - for CSV files, you should set sep="," in the readDGE call, as is mentioned in the documentation for the function. Also see the columns argument in ?readDGE if there are more than two columns and the first two do not correspond to the IDs and counts.
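A minimal sketch of such a call, assuming two-column CSV files with a header row. The file names, gene IDs, and counts are made up for illustration; extra arguments such as sep are passed on to the underlying file reader.

```r
library(edgeR)

# Hypothetical example files written to a temporary directory.
tmp <- tempdir()
write.csv(data.frame(GeneID = c("g1", "g2", "g3"), Count = c(17, 55, 331)),
          file.path(tmp, "sampleA.csv"), row.names = FALSE)
write.csv(data.frame(GeneID = c("g1", "g2", "g4"), Count = c(28, 36, 1551)),
          file.path(tmp, "sampleB.csv"), row.names = FALSE)

files <- c("sampleA.csv", "sampleB.csv")

# sep = "," is forwarded to the file reader;
# columns selects the ID column and the count column.
dge <- readDGE(files, path = tmp, columns = c(1, 2), sep = ",")
dge$counts
```

The two example files deliberately list different genes. The resulting count matrix has one row for every gene seen in either file, with a zero where a gene is absent from a file.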
First, as already noted, you need to specify sep="," because you have a comma-separated file.
Second, there is a problem with your files. Somewhere in one of your data files you have a character entered where you should have a number. Have a look especially at the last row of your files, as that is often the culprit. Check that your files don't contain any unnecessary spaces, because a space will be read as a character.
Third, you only have two samples in total, meaning the sample size is n=1 in each group. In other words you have no replication. So there isn't much analysis that edgeR will be able to do for you, because edgeR is designed to work with biological replicates.
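One way to hunt for a stray character is to read the count column as text and flag anything that is not a plain integer. This is only a sketch in base R: find_bad_counts is a hypothetical helper, and it assumes a header row with the counts in the second column.

```r
# Hypothetical helper: list entries in the count column that are not
# plain non-negative integers (stray letters, spaces, empty fields).
find_bad_counts <- function(file, count_col = 2) {
  # Read every column as text so that stray characters are preserved.
  raw <- read.csv(file, colClasses = "character", strip.white = FALSE)
  values <- raw[[count_col]]
  bad <- !grepl("^[0-9]+$", values)
  # 'row' is the data row number, not counting the header line.
  data.frame(row = which(bad), value = values[bad], stringsAsFactors = FALSE)
}

# Made-up file with a stray space and letter in the last row.
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("GeneID,Count", "g1,17", "g2,55", "g3, 33x"), tmp_file)
find_bad_counts(tmp_file)
```

Running the helper on each of the real files should point to any offending rows.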
Actually, my samples come from an experiment with 11 different conditions and no biological replicates. My intention is to apply TMM normalization using the first sample (Dark Aerobic) as the reference. As a first step, I'm trying it with the first two samples (Dark Aerobic and Dark Anaerobic). According to your last comment, TMM normalization is not applicable to these data sets.
How many columns does each of the two files have? What are the column headings?
Each file has two columns: the first contains the gene IDs and the second the gene read counts. Does readDGE create a DGEList that includes all genes that have at least one count in one of the samples?
Aaron, I followed your comment and got the results below. How can I see all the rows in data2, or compute its size? I also want to do the TMM normalization, but I get the error message below.
> readDGE(data2, sep=",")
An object of class "DGEList"
$samples
                            files group lib.size norm.factors
Dark Aerobic     Dark Aerobic.csv     1   481909            1
Dark Anaerobic Dark Anaerobic.csv     1  1033135            1

$counts
           Samples
Tags        Dark Aerobic Dark Anaerobic
  641610012           17             28
  641610013           55             36
  641610014          331           1551
  641610015         1005           2292
  641610016           96            136
2816 more rows ...
> y<-calcNormFactors(data2, method=c("TMM","RLE","upperquartile","none"),
+ refColumn=NULL, logratioTrim=.3, sumTrim=0.05, doWeighting=TRUE,
+ Acutoff=-1e10, p=0.75)
Error in colSums(x) : 'x' must be numeric
>
Thanks for your helpful comments.
Humberto, please use the "ADD COMMENT" button to add your replies, rather than adding them as Answers. I have been moving your answers back here so that they appear as comments on the original question.
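One possible reading of the colSums error, which nobody in the thread confirms: readDGE(data2, sep=",") printed a DGEList without storing it, which suggests data2 still holds the file names. Passing character data to calcNormFactors would be consistent with "'x' must be numeric". The sketch below stores the DGEList first, then checks its size and applies TMM with the first sample as the reference. The counts are simulated stand-ins, not the real data, and method is given as a single value rather than the full vector of choices.

```r
library(edgeR)

set.seed(1)
# Simulated counts standing in for the real files: 200 genes, 2 samples.
counts <- matrix(rnbinom(400, mu = 100, size = 5), ncol = 2,
                 dimnames = list(paste0("gene", 1:200),
                                 c("Dark Aerobic", "Dark Anaerobic")))
y <- DGEList(counts = counts)  # with the real files: y <- readDGE(data2, sep = ",")

dim(y)          # number of genes (rows) and samples (columns)
head(y$counts)  # y$counts is the full matrix; print it to see every row

# TMM normalization using the first sample as the reference column.
y <- calcNormFactors(y, method = "TMM", refColumn = 1)
y$samples
```

The norm.factors column of y$samples then holds the TMM scaling factors for each sample.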