I tried to filter out genes that are not expressed using the genefilter package, as follows:
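library(genefilter)                 # provides pOverA(), filterfun() and genefilter()
f1 <- pOverA(0.25, 3.5)             # keep genes with at least 25% of values above 3.5
ffun1 <- filterfun(f1)              # combine filter functions (here only one)
flrGene <- genefilter(data, ffun1)  # logical vector: TRUE = gene passes the filter
sum(flrGene)                        # number of genes that pass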
It returns zero. Why? Does that mean I should keep all the genes? Is there another way to remove genes with very low expression across samples?
@James W. MacDonald In fact, the data are microarray data with 30,000 probes and 2,000 samples; each row represents a gene (probe) and each column a sample. The values are log fold changes. Running head(data) gives output like the following (the data set is large, so I only show a few rows and columns):
That's sort of weird, as those are Affy IDs, and Affy arrays are single-color. Are these paired samples for which you computed the fold changes manually?
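Anyway, you don't want to use pOverA() for fold change data, as you will have both positive and negative values. pOverA(p, A) keeps a gene only if at least a proportion p of its values exceed A. It is intended for single-color expression values, which are strictly positive and, after taking logs, usually range from about 3 to 14. Log fold changes are centred around zero and will rarely exceed 3.5 in a quarter of the samples, which is why no genes passed your filter.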
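To illustrate why the filter returned zero, here is a minimal sketch that re-implements the pOverA() rule in base R as a hypothetical helper, p_over_a() (this is not the genefilter function itself), and applies it to simulated data. The matrices are made up for illustration: one mimics log fold changes centred on zero, the other mimics log2 expression values between 3 and 14.

```r
# Hypothetical re-implementation of the pOverA() rule, for illustration only:
# a gene passes if the proportion of its (non-missing) values above A is >= p.
p_over_a <- function(x, p = 0.25, A = 3.5) {
  mean(x > A, na.rm = TRUE) >= p
}

set.seed(1)
# Simulated log fold changes: centred on 0, both positive and negative
lfc <- matrix(rnorm(1000 * 20, mean = 0, sd = 0.5), nrow = 1000)
# Simulated log2 expression values: strictly positive, roughly 3 to 14
expr <- matrix(runif(1000 * 20, min = 3, max = 14), nrow = 1000)

# Apply the rule to each gene (row) and count how many pass
sum(apply(lfc, 1, p_over_a))   # 0: no gene has 25% of values above 3.5
sum(apply(expr, 1, p_over_a))  # 1000: essentially every gene passes
```

The same thresholds that are sensible for log expression values are meaningless for log ratios, which is the point made above.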
If you want to filter out genes that don't appear to change, you can define a fold change below which you consider a gene unchanged (that is, not different from zero on the log scale), and then test against it:
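The code that originally followed this reply is not preserved here. A minimal base-R sketch of such a test, assuming data is a numeric matrix of log fold changes with genes in rows, might look like the following; the helper name filter_by_fold_change() and the threshold value are hypothetical choices for illustration.

```r
# Minimal sketch, assuming `data` is a numeric matrix of log fold changes
# (genes in rows, samples in columns). `fc` is a hypothetical threshold:
# log fold changes with absolute value <= fc are treated as "no change".
filter_by_fold_change <- function(data, fc) {
  # TRUE for genes with at least one |log fold change| above fc
  keep <- rowSums(abs(data) > fc, na.rm = TRUE) > 0
  data[keep, , drop = FALSE]
}

# Toy example with made-up values
data <- matrix(c( 0.05, -0.10,  0.02,
                  1.20, -0.30,  0.10,
                 -0.01,  0.03, -0.90),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("gene1", "gene2", "gene3"), NULL))

filtered <- filter_by_fold_change(data, fc = 0.5)
rownames(filtered)  # "gene2" "gene3"
```

Taking abs() means large decreases count as changes just like large increases. Requiring the threshold to be exceeded in only one sample is the most permissive choice; requiring it in some minimum number of samples is a stricter alternative.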
@James W. MacDonald what is weird about it?
You are certainly right that pOverA() is meant for positive values, and your solution is a good idea. Thanks!
Could I have your email address so I can contact you directly?
What is weird about it is that Affy arrays are single-color, meaning you hybridize only one sample to each array. Since there is only one sample per array, the data are not a ratio by default (a ratio implies two samples, and you only hybridized one to the array).
So the fact that you apparently have Affy data but also seem to have log ratios is not what I would expect. There is evidently more going on with these data than a run-of-the-mill analysis.
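For illustration only, since nothing here says how these particular data were produced: if log ratios were computed from paired single-color arrays, each one is just the difference of two log2 expression values, because log2(a/b) = log2(a) - log2(b). A minimal sketch with made-up values for three probes:

```r
# Made-up log2 expression values from two single-color arrays,
# one sample hybridized to each array
treated <- c(probe1 = 8.0, probe2 = 5.5, probe3 = 11.25)
control <- c(probe1 = 7.0, probe2 = 5.5, probe3 = 12.25)

# A ratio needs two samples; on the log scale it becomes a difference
log_ratio <- treated - control
log_ratio    # probe1 = 1, probe2 = 0, probe3 = -1

# Back-transform to fold changes
2^log_ratio  # probe1 = 2, probe2 = 1, probe3 = 0.5
```

Note that the unchanged probe gets a log ratio of 0 and the down-regulated probe a negative value, which is why a filter designed for strictly positive expression values does not fit such data.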
@James W. MacDonald What would you suggest? Do you have a reference for this?
By the way, by setting fc to 0.2, I removed over 20,000 genes. Do you think this is a good way to get rid of genes that are not highly expressed?
I have data for more than five cell types. Should I keep the same selected genes for all of them and discard the other genes? If so, how should I do it?
I'm not sure what you are asking here. In addition, as I already mentioned, these data do not fulfill my expectations for Affy data, and I have no idea why you have log ratios rather than log expression values.
I am very hesitant to give any analysis advice as a general rule, and in this case that goes double since I really have no idea about these data, nor what you are trying to do. I would highly recommend that you find a local statistician to help you with this analysis, especially if you are trying to do real science rather than just practicing.