Question

Boxplot and Identification of DEG

nia ▴ 30

@nia-12707

Dear Fellows,

I have two questions and would be grateful if you could take the time to help with them:

1) My dataset is already processed, not raw. I produced a boxplot of the samples in R, then identified DEGs using the limma and GEOquery packages with the adj.P.Val threshold set to < 0.5. (I know most people prefer 0.05, but with 0.05 I get no results, whereas with < 0.5 I get probe IDs and all the desired data.) Is this a correct approach, and can I take these DEGs forward to enrichment analysis? Also, since I chose 0.5 as the adjusted p-value/FDR threshold, how do I justify it? I am new to this and have read many papers, but I still cannot justify the FDR cutoff. Is it acceptable to use FDR/adj.P.Val < 0.5?

2) I got 150 down-regulated genes and 95 up-regulated genes, so 245 DEGs in total. Should I analyse the up- and down-regulated genes separately for enrichment analysis (GO, KEGG and TF analysis), or can I run the enrichment analysis on all DEGs together (up- and down-regulated genes combined)?

I have tried to make my questions as clear as possible; apologies if anything is still unclear.

In short: for the first question, I want to know whether my approach is correct; for the second, I want to know which approach to enrichment analysis is correct or most commonly used (all DEGs together, or up- and down-regulated genes separately). I have searched the literature and read many posts on Biostars and Bioconductor, but it is still not clear to me.

Thank you in advance.

limma biobase geoquery • 1.9k views

updated 7.4 years ago by Aaron Lun ★ 28k • written 7.4 years ago by nia ▴ 30

score 1 · Answer 1 · 2017-11-22

Aaron Lun ★ 28k

@alun

For your first question: an FDR of 50% means that we can expect (on average) up to half of your detected DE genes to be false positives. In most situations, I would find this unacceptable. How can anybody be confident in a set of DE genes where half of them are expected to be false positives? Such a threshold would only be useful if your wet lab can afford to do lots of validation studies on the DE genes (many of which are likely to fail).
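To make the meaning of the adjusted p-value concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment, assuming that is the method behind the adj.P.Val column (it is the default in limma's `topTable`, but check your own call). The function names and the p-values are made up for illustration; they are not from the dataset in this thread.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(adjusted, threshold):
    """Number of tests whose adjusted p-value falls below the threshold."""
    return sum(1 for a in adjusted if a < threshold)


# Hypothetical raw p-values for ten probes (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = bh_adjust(raw)
for cutoff in (0.05, 0.2, 0.5):
    print(f"adj.P.Val < {cutoff}: {count_significant(adj, cutoff)} probes")
```

Loosening the cutoff always lets more probes through, but the cutoff is itself the proportion of the resulting list that you should be prepared to see fail validation.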

For your second question: most analyses use all DE genes in gene set tests, at least at the beginning. This is because most gene sets are not defined with any directionality, e.g., a GO term will usually include both positive and negative regulators of a process. In such cases, it is more relevant to test with all DE genes. However, there may be some interesting insights gained from repeating the tests with only up- or down-regulated genes, especially for GO terms that represent positive or negative regulation. It also depends on the test you're using - for example, ROAST will automatically test against all alternative hypotheses (up, down or mixed), which will tell you the general direction of the changes in the gene set without needing to manually subset by direction.
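As an illustration of testing with all DE genes versus one direction at a time, here is a minimal sketch of a hypergeometric over-representation test. This is one common type of gene set test, not ROAST, which works differently. The function name, the universe size, the gene set size and the overlap counts are all hypothetical; only the 245/95/150 list sizes come from the question.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_in_set, n_de, n_overlap):
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least n_overlap gene set members among n_de
    genes drawn at random from n_universe genes, of which n_in_set belong
    to the gene set.
    """
    total = comb(n_universe, n_de)
    upper = min(n_in_set, n_de)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 tested genes, a gene set with 100 members.
N_UNIVERSE, N_IN_SET = 20000, 100

# List sizes from the question; the overlap counts are invented.
de_lists = {
    "all DE": (245, 6),
    "up only": (95, 5),
    "down only": (150, 1),
}

for label, (n_de, n_overlap) in de_lists.items():
    p = overrepresentation_pvalue(N_UNIVERSE, N_IN_SET, n_de, n_overlap)
    print(f"{label}: overlap {n_overlap} of {n_de}, p = {p:.3g}")
```

Running the same test on the three lists shows how a set can look unremarkable with all DE genes yet stand out in one direction, or the reverse. The p-values from many gene sets would still need a multiple testing correction of their own.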

Of course, the second answer is somewhat irrelevant when you have an FDR of 50% in your DE list.

written 7.4 years ago by Aaron Lun ★ 28k

Dear Aaron,

First of all, thank you; the concepts are much clearer to me now after such a detailed explanation. As you said, 0.5 is not a good threshold, but can you suggest what I should do? When I use an FDR threshold of 0.05, I get no results.

I also tried the following FDR thresholds:

FDR threshold | No. of DEGs
0.1 | 3
0.3 | 38
0.4 | 155

I am working on a cancer dataset with 10 samples, which I downloaded from the GEO database. I used the limma and GEOquery packages, and I then want to do functional enrichment analysis and miRNA analysis on the results.

Should I go with 0.4? It gives 155 DEGs, which means about 62 of them are expected to be false positives, and that is much better than with 0.5. Your suggestion on this would be really appreciated.

written 7.4 years ago by nia ▴ 30

Well, 40% isn't much different from 50% in my opinion.

As a general rule, I wouldn't go above an FDR of 20%, which means that there is, at worst, 1 false positive on average for every 5 genes in the DE set. Of course, this probably means you won't have enough DE genes for a gene set enrichment analysis, but doing such analyses with a DE list identified at an FDR of 40-50% would be a waste of time anyway. With an expected 62 false positive genes, entire gene sets could be filled with false positives, which would be misleading.
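The arithmetic behind these numbers can be sketched as follows, using the DEG counts reported earlier in this thread. The helper name is made up, and the result is the worst-case average implied by the threshold, not a count of which genes are actually false.

```python
def expected_false_positives(n_degs, fdr):
    """Worst-case average number of false positives in a DE list of size
    n_degs selected at the given FDR threshold."""
    return n_degs * fdr


# (FDR threshold, number of DEGs) pairs as reported in this thread.
reported = [(0.1, 3), (0.3, 38), (0.4, 155), (0.5, 245)]

for fdr, n_degs in reported:
    fp = expected_false_positives(n_degs, fdr)
    print(
        f"FDR {fdr:.1f}: {n_degs:3d} DEGs, "
        f"up to about {fp:.1f} expected false positives"
    )
```

Going from 0.3 to 0.4 adds 117 genes to the list, but the expected number of false positives rises from about 11 to about 62, so a large share of the extra genes is not trustworthy.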

written 7.4 years ago by Aaron Lun ★ 28k