I have an RNAseq data set generated on an Affymetrix HiSeq 2500, and am in the process of cleaning and pre-analysis, following the vignette from the edgeR package. The point where I'm stuck is removing genes that are lowly expressed. There are 6 conditions in my data set; the edger file has dimensions (24421, 6). When I run the sample code from the edgeR vignette that checks for row sums == 0, I find 5100 rows that meet this condition, leaving 19321 rows with sums > 0. So up to that point things appear to be fine. Then I run the line
> keep.expr <- filterByExpr(edger, group=group)
This gives the following warning:
Warning message in min(n[n > 1L]):
"no non-missing arguments to min; returning Inf"
and also creates the keep.expr output file. The keep.expr file should contain a boolean for each row indicating whether it was kept or not, plus the RNAseq values for the rows kept. It has no RNAseq values at all. I used
> length(which(keep.expr))
> length(which(!keep.expr))
to count the "TRUE" and "FALSE" entries in keep.expr, and found that the booleans from filterByExpr are all "FALSE". There should have been 19321 "TRUE" entries along with their RNAseq values, and 5100 "FALSE" entries. The next command in the vignette after filterByExpr uses keep.expr to trim the edger file. When I do that, the dimensions of the new edger file are (0, 6), consistent with there being no RNAseq values left after running filterByExpr. I've been studying the edgeR User's Guide but cannot find what is missing that keeps filterByExpr from working properly.
Heber
I assume you mean you used an Illumina HiSeq 2500. Affymetrix is a microarray platform and they don't make sequencers.
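As for the warning itself, the expression it names, min(n[n > 1L]), can be reproduced in base R. The sketch below is only an illustration under two assumptions that the post does not confirm: that n holds the number of samples in each group, and that each of the 6 conditions has exactly one sample (6 conditions, 6 columns). The group labels and the counts_passing vector are hypothetical, and the final comparison is a simplified stand-in for a minimum-sample-size filter, not edgeR's actual code.

```r
# Hypothetical group factor: six conditions, one sample each (an assumption).
group <- factor(c("A", "B", "C", "D", "E", "F"))

# Per-group sample counts; here every group has a count of 1.
n <- tabulate(group)

# Same expression as in the warning. No group has more than one sample,
# so n[n > 1L] is empty and min() returns Inf (with the warning suppressed here).
min_size <- suppressWarnings(min(n[n > 1L]))

# Simplified stand-in for the filter: a gene is kept only if enough samples
# pass an expression cutoff. These per-gene counts are made up for illustration.
counts_passing <- c(0, 3, 6)
keep <- counts_passing >= min_size

print(min_size)
print(keep)
```

If those assumptions hold, no finite number of samples can reach a required size of Inf, which would be consistent with every entry coming back "FALSE" and the trimmed object having 0 rows. Checking table(group) on the real data would show whether any condition actually has replicates.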