Hello, In a previous RNA-Seq work, I've successfully used the function goana to perform a Gene Ontology enrichment analysis, as shown in this edgeR quasi-likelihood pipeline.
tr <- glmTreat(fit, contrast=B.LvsP, lfc=log2(1.5))
...
go <- goana(tr, species="Mm")
Particularly, as shown above, the function takes as input a tr object, a list of DEGs, which, in a edgeR pipeline, appears similar to this (just random data to show what the format looks like):
Length Symbol logFC logCPM F PValue FDR
SMEL4.1_02g001580.1 10924 SMEL4.1_02g001580.1 0.386082756876937 5.47193248924038 25.2797509008095 4.42855378062147e-06 0.0722591857534324
SMEL4.1_07g022510.1 3129 SMEL4.1_07g022510.1 -0.748615816092187 5.36055092214935 23.9132538015929 7.36455549825986e-06 0.0722591857534324
SMEL4.1_08g023290.1 6074 SMEL4.1_08g023290.1 0.652417990788294 5.68631841662533 22.8866172367585 1.08564820963941e-05 0.0722591857534324
SMEL4.1_06g029720.1 4487 SMEL4.1_06g029720.1 3.05216960716702 -1.21675146726858 22.8190148238499 1.11482103848422e-05 0.0722591857534324
SMEL4.1_06g000870.1 9348 SMEL4.1_06g000870.1 -0.769865924141295 7.81712996389281 22.2868032192706 1.36528711320395e-05 0.0722591857534324
SMEL4.1_12g000930.1 11716 SMEL4.1_12g000930.1 -0.528781930515847 3.76724160223338 20.8372219018991 2.39400058202952e-05 0.105587395670412
SMEL4.1_04g022680.1 5765 SMEL4.1_04g022680.1 0.449553768563767 6.33454740738143 20.1535126943511 3.1322861836535e-05 0.118413841825746
SMEL4.1_11g016210.1 4515 SMEL4.1_11g016210.1 -0.626808335105108 5.93739473093748 19.0991859367415 4.76537706101889e-05 0.157632716457179
SMEL4.1_11g022810.1 3534 SMEL4.1_11g022810.1 -0.583940556612912 4.31038909943095 18.3097484326857 6.55191830346608e-05 0.158176465103364
In the past, I've succesfully used tr as imput and even provided an external list of DEGs (a sub-set of them).
But this time I'm interested in using the goana function for the output of a DESeq2 analysis.
The DEG list is quite different, and looks, as shown in the official tutorial, like this (again, an example to show the format):
baseMean log2FoldChange lfcSE stat pvalue padj
SMEL4.1_02g001580.1 2366.00298555111 0.364304598757931 0.074243385889273 4.9068963436185 9.25289021738908e-07 0.0076683327676612
SMEL4.1_07g022510.1 2196.50866292512 -0.761717027785759 0.152859394549787 -4.98312210400432 6.25664404354783e-07 0.0076683327676612
SMEL4.1_06g000870.1 12072.4148262247 -0.780492533034506 0.16593702035746 -4.70354675137095 2.55680515074842e-06 0.0105947613434138
SMEL4.1_12g000930.1 727.48576752828 -0.542466292269438 0.11410126216564 -4.75425321309718 1.99181275267645e-06 0.0105947613434138
SMEL4.1_08g023290.1 2745.00816946739 0.642851279859146 0.140000470422468 4.59177942702098 4.39482718044595e-06 0.0145688521031783
SMEL4.1_09g005850.1 5266.90686547789 -0.394356643158581 0.0875154389042871 -4.50613798086389 6.60181671396486e-06 0.0182375186723279
SMEL4.1_04g022680.1 4306.1114111016 0.436780400377881 0.0981091951616122 4.45198230052123 8.50811805120964e-06 0.0201460080998285
SMEL4.1_11g001830.1 1944.89865643966 5.53066904283725 1.26836216149314 4.3604809499571 1.29776880663403e-05 0.0268881474624488
SMEL4.1_11g016210.1 3279.56438989611 -0.635759169045224 0.148173951998916 -4.29062706682666 1.78169304175309e-05 0.032812846852286
Since, apparently, goana (belonging to the package limma) isn't designed specifically for edgeR, I'd like to run goana on a DESeq2 DEGs list provided externally. However, in order to do this, I'd still need to modify my original DESeq2 list so that it looks like an edgeR DEGs list, and I was wondering how to change it. So, trying to figure out a correspondence between the headers of the lists obtained from edgeR and DESeq2... my guess is that goana needs for sure the colums logFC and PValue to run enrichment analysis, which are log2FoldChange and pvalue, respectively, in the DESeq2 object.
I'm not sure which other parameters are necessary.
As far as I know, p-adj in DESeq2 is the equivalent of the edgeR's FDR.
logCPM isn't shown at all in DESeq2 ...
F and stat are probably both parameters related to the specific DEGs statistical analysis so may not be necessary in the further GO enrichment test...
Should I provide these parameters in the input list for goana? Are they necessary for the analysis?
Am I making wrong assumptions, or forgetting anything important? Is it correct to use goana on DESeq2 data this way? Thanks in advance for your tips!
Very interesting and specific.
It isn't a problem for the experiment I'm running right now, but in the future, in case I need to use specifically
goana
/kegga
(which somehow allows me to use my own 'customized' GO/genes correspondence dataset, that's why I'm sticking to it) in a DESeq2 pipeline, is there any way to obtain such objects? limma for instance isn't a package designed specifically for edgeR. The objects created byglmTreat, glmQLFTest
glmLRT
to the best of my knowledge are obtained in edgeR pipelines. I have no experience about the other ones you mentioned.goana
andkegga
are generic functions and they have specific methods written for limma and edgeR objects. Seegoana.DGELRT
.