Dear Fellows,
I have two questions, and I hope you can spare some of your time to help me with them:
1) My dataset is already processed; it is not raw data. I generated boxplots of the samples in R without problems. The next day I identified DEGs using the limma and GEOquery packages, with the adjusted P-value threshold set to < 0.5. (I know most people use 0.05, but with 0.05 I get no results, whereas with < 0.5 I get probe IDs and all the data I need.) My first question is whether this is a correct approach and whether I can take these DEGs forward to enrichment analysis. Second, since I chose 0.5 as the adjusted P-value/FDR cutoff, how can I justify it? I am new to this and have read many papers on the subject, but I still cannot work out whether an FDR/adjusted P-value cutoff of < 0.5 is acceptable.
2) I got 150 down-regulated genes and 95 up-regulated genes, so 245 DEGs in total. Should I analyse the up- and down-regulated genes separately for enrichment analysis (GO, KEGG and TF analysis), or can I run the enrichment analysis on all DEGs together (both up- and down-regulated)?
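To see why the choice in question 2 can matter, here is a minimal sketch of a standard over-representation test (one-sided hypergeometric test, as used by many GO/KEGG tools) applied to an up list, a down list and the combined list. It is written in plain Python with only the standard library so it runs on its own; the gene IDs, the 1000-gene universe and the 20-gene set are all hypothetical, and only the list sizes (95 up, 150 down) come from the post.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / comb(population, draws)


def ora_test(de_genes, gene_set, universe):
    """One-sided over-representation test of gene_set within de_genes."""
    de = set(de_genes) & universe      # only count genes that were measured
    gs = set(gene_set) & universe
    overlap = len(de & gs)
    p = hypergeom_upper_tail(overlap, len(universe), len(gs), len(de))
    return overlap, p


# Hypothetical data: 1000 measured genes, one 20-gene set, and DE lists
# with the same sizes as in the post (95 up, 150 down).
universe = {f"g{i}" for i in range(1000)}
gene_set = {f"g{i}" for i in range(20)}
up = {f"g{i}" for i in range(10)} | {f"g{i}" for i in range(100, 185)}
down = {f"g{i}" for i in range(200, 350)}
combined = up | down

ov_up, p_up = ora_test(up, gene_set, universe)
ov_down, p_down = ora_test(down, gene_set, universe)
ov_all, p_all = ora_test(combined, gene_set, universe)

for label, ov, p in [("up", ov_up, p_up), ("down", ov_down, p_down),
                     ("all", ov_all, p_all)]:
    print(f"{label:>4}: overlap={ov}, p={p:.3g}")
```

In this made-up case the set is strongly enriched among the up-regulated genes, but pooling them with 150 unrelated down-regulated genes weakens the signal, which is one common argument for testing up and down separately (optionally alongside the combined list). If you prefer to stay in R, base R's phyper(overlap - 1, set_size, universe_size - set_size, de_size, lower.tail = FALSE) computes the same tail probability.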
I have tried to make my questions as clear as possible; apologies if anything is still unclear.
In short: for the first question, I want to know whether my approach is correct; for the second, I want to know the correct or most commonly used approach for enrichment analysis (all DEGs together, or up- and down-regulated genes separately). I have searched the literature and read many posts on Biostars and Bioconductor, but it is still not clear to me.
Thank you in advance.
Dear Aaron,
First of all, thank you; that was very detailed information and my concepts are much clearer now. As you said, 0.5 is not a good cutoff, but can you suggest what I should do? When I choose an FDR cutoff of 0.05, I get no results.
I also compared the number of DEGs at the following FDR cutoffs:
FDR < 0.1: 3 DEGs
FDR < 0.3: 38 DEGs
FDR < 0.4: 155 DEGs
I am working on a cancer dataset with 10 samples, which I downloaded from the GEO database. I used the limma and GEOquery packages, and next I want to do functional enrichment analysis and miRNA analysis.
Should I go with 0.4? It gives 155 DEGs, which means about 62 of them are expected to be false positives; that is still better than the 0.5 cutoff. Your suggestion on this would be really appreciated.
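The "62 false positive genes" figure is simply the FDR cutoff multiplied by the size of the DE list. A minimal sketch of that arithmetic for the three cutoffs listed above, assuming the adjusted p-values are Benjamini-Hochberg adjusted (limma's default), so that the cutoff bounds the expected proportion of false discoveries in the list:

```python
# FDR cutoff -> number of DEGs, as reported in the post above.
de_counts = {0.1: 3, 0.3: 38, 0.4: 155}


def expected_false_positives(fdr_cutoff, n_de):
    """Rough expected number of false positives in a DE list whose
    expected false discovery proportion is at most fdr_cutoff."""
    return fdr_cutoff * n_de


efp = {q: expected_false_positives(q, n) for q, n in de_counts.items()}
for q, n in de_counts.items():
    print(f"FDR < {q}: {n} DEGs, ~{efp[q]:.1f} expected false positives")
```

Note that this is an average, not a count of specific genes: you cannot tell which of the 155 genes are the false positives.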
Well, 40% isn't much different from 50% in my opinion.
As a general rule, I wouldn't go above an FDR of 20%, which means that there is, at worst, 1 false positive on average for every 5 genes in the DE set. Of course, this probably means you won't have enough DE genes for a gene set enrichment analysis, but doing such analyses with a DE list identified at an FDR of 40-50% would be a waste of time anyway. With an expected 62 false positive genes, entire gene sets could be filled with false positives, which would be misleading.
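For readers wondering where the adj.P.Val column and these FDR cutoffs come from: to my knowledge, limma's topTable() uses adjust.method = "BH" by default, i.e. the Benjamini-Hochberg procedure. Below is a minimal Python sketch of that adjustment, intended to follow the same definition as R's p.adjust(p, method = "BH"); the ten raw p-values are made up purely for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, following the same
    definition as R's p.adjust(p, method = "BH")."""
    m = len(pvalues)
    # Walk from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # enforces monotonicity and caps values at 1
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for ten probes.
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = bh_adjust(raw)
print([round(q, 4) for q in adj])
for cutoff in (0.05, 0.09, 0.2):
    print(f"FDR < {cutoff}: {sum(q < cutoff for q in adj)} probes")
```

As in the thread, loosening the cutoff admits more probes, but each extra probe comes with a larger share of expected false positives.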