Arnaud Mounier
Hi,
I've built a specific DataFrame with python pandas to compute ontology
frequencies with goProfiles in Bioconductor. I use the basicProfile
function with the option 'GOTermsFrame' but without the optional column
'Evidence'. I've got one big dataframe, as follows:
In [1]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 119626 entries, 0 to 119625
Data columns (total 3 columns):
GeneID 119626 non-null object
GOID 119626 non-null object
Ontology 119626 non-null object
dtypes: object(3)
So there are almost 120,000 entries, divided by Ontology as follows:
In [2]: df.groupby(['Ontology'])['Ontology'].count()
Ontology
BP 58802
CC 26867
MF 33957
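Note that these counts are rows (gene/GO-term pairs), not distinct genes. A minimal sketch of how both numbers could be obtained side by side, using plain Python on a few toy rows in the same (GeneID, GOID, Ontology) layout; the `summarize` helper and the `gene_B` identifier are made up for illustration. With pandas, `df.groupby('Ontology')['GeneID'].nunique()` should give the distinct-gene counts directly.

```python
from collections import defaultdict

# Toy rows in the (GeneID, GOID, Ontology) layout of the data frame.
# "gene_B" is a made-up identifier used only for this illustration.
rows = [
    ("VIT_201s0011g00010.1", "GO:0043565", "MF"),
    ("VIT_201s0011g00010.1", "GO:0003964", "MF"),
    ("VIT_201s0011g00010.1", "GO:0006278", "BP"),
    ("VIT_201s0011g00010.1", "GO:0005840", "CC"),
    ("gene_B", "GO:0006367", "BP"),
]


def summarize(rows):
    """Return {ontology: (number of rows, number of distinct genes)}."""
    row_counts = defaultdict(int)
    genes = defaultdict(set)
    for gene_id, _go_id, ontology in rows:
        row_counts[ontology] += 1
        genes[ontology].add(gene_id)
    return {onto: (row_counts[onto], len(genes[onto])) for onto in sorted(row_counts)}


summary = summarize(rows)
for onto, (n_rows, n_genes) in summary.items():
    print(onto, "rows:", n_rows, "distinct genes:", n_genes)
```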
When I compute the goProfile for all three ontologies (onto="ANY") at
level 2, I get these frequencies:
In [3]: rdf = com.convert_to_r_dataframe(df)
In [4]: %%R -i rdf
> library(goProfiles)
> rdf <- as.data.frame(rdf)
> print(head(rdf))
  GeneID               GOID       Ontology
0 VIT_201s0011g00010.1 GO:0043565 MF
1 VIT_201s0011g00010.1 GO:0003964 MF
2 VIT_201s0011g00010.1 GO:0006278 BP
3 VIT_201s0011g00010.1 GO:0006367 BP
4 VIT_201s0011g00010.1 GO:0003743 MF
5 VIT_201s0011g00010.1 GO:0005840 CC
> profiles.ANY <- basicProfile(rdf,idType='GOTermsFrame',onto="ANY",level=2)
> printProfiles(profiles.ANY,percentage=T,aTitle="Test GO Profile")
Test GO Profile
========================
[1] "MF ontology"
   Description                  GOID       Frequency
12 antioxidant activity         GO:0016209       1.0
9  binding                      GO:0005488      75.0
4  catalytic activity           GO:0003824      65.1
1  electron carrier activity... GO:0009055       3.5
15 enzyme regulator activity... GO:0030234       1.6
21 molecular transducer acti... GO:0060089       3.1
3  nucleic acid binding tran... GO:0001071       2.8
6  nutrient reservoir activi... GO:0045735       0.5
2  protein binding transcrip... GO:0000988       0.1
5  receptor activity            GO:0004872       1.2
7  structural molecule activ... GO:0005198       2.8
8  transporter activity         GO:0005215       8.2
[1] "BP ontology"
[1] Description GOID Frequency
<0 rows> (or 0-length row.names)
[1] "CC ontology"
[1] Description GOID Frequency
<0 rows> (or 0-length row.names)
So neither the BP nor the CC ontology shows up.
But when I take a slice of the first 500 rows of this big dataframe and
compute the profile the same way (any ontology, level=2), I get this:
In [5]: dft = df[0:500]
In [6]: rdft = com.convert_to_r_dataframe(dft)
In [7]: %%R -i rdft
> profs.ANY <- basicProfile(rdft,idType='GOTermsFrame',onto="ANY",level=2)
> printProfiles(profs.ANY,percentage=T,aTitle="Test GO Profile")
Test Profile
============
[1] "MF ontology"
   Description                  GOID       Frequency
9  binding                      GO:0005488      77.8
4  catalytic activity           GO:0003824      49.2
1  electron carrier activity... GO:0009055       3.2
3  nucleic acid binding tran... GO:0001071       1.6
7  structural molecule activ... GO:0005198       1.6
8  transporter activity         GO:0005215      12.7
[1] "BP ontology"
[1] Description GOID Frequency
<0 rows> (or 0-length row.names)
[1] "CC ontology"
   Description                  GOID       Frequency
3  cell                         GO:0005623      93.4
6  cell junction                GO:0030054       3.3
17 cell part                    GO:0044464      93.4
2  extracellular region         GO:0005576       8.2
9  macromolecular complex...    GO:0032991      21.3
1  membrane                     GO:0016020      34.4
8  membrane-enclosed lumen...   GO:0031974       3.3
15 membrane part                GO:0044425      19.7
4  nucleoid                     GO:0009295       1.6
10 organelle                    GO:0043226      75.4
13 organelle part               GO:0044422      21.3
19 symplast                     GO:0055044       3.3
I don't really understand why:
- there are no BP frequencies for either dataframe, whereas there are
58,802 BP entries in the main frame
- there are CC frequencies for the short frame and none at all for the
main frame, whereas the short one is just the first part of the big one.
Can the level (2 in this case) explain this big difference?
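One data-side check that could be run before the conversion to R is sketched below: it counts the raw Ontology labels in the full set and in a slice, and flags any label other than the three expected ones (for example one with stray whitespace). This is only a sanity check under the assumption that the labels should be exactly "BP", "CC" and "MF"; it does not by itself explain the empty profiles. The helper `ontology_labels`, the gene names `g1`/`g2` and the malformed `"BP "` label are made up for illustration.

```python
from collections import Counter

# Assumption: these are the only labels the Ontology column should contain.
EXPECTED = {"BP", "CC", "MF"}


def ontology_labels(rows, start=0, stop=None):
    """Count raw Ontology labels in rows[start:stop] and list unexpected ones."""
    counts = Counter(onto for _gene, _go, onto in rows[start:stop])
    unexpected = sorted(label for label in counts if label not in EXPECTED)
    return counts, unexpected


# Toy rows; the last one carries a deliberately malformed label ("BP ").
rows = [
    ("g1", "GO:0043565", "MF"),
    ("g1", "GO:0006278", "BP"),
    ("g1", "GO:0005840", "CC"),
    ("g2", "GO:0006367", "BP "),
]

counts_all, unexpected_all = ontology_labels(rows)
counts_head, unexpected_head = ontology_labels(rows, 0, 3)

print("full set:", dict(counts_all), "unexpected:", unexpected_all)
print("first 3 :", dict(counts_head), "unexpected:", unexpected_head)
```

Running the same comparison on the first 500 rows and on the full frame would at least show whether the two inputs differ in the labels they contain.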
Thanks a lot,
Arnome.
--
"If men define situations as real, they are real in their
consequences."
The Thomas theorem.
Arnaud Mounier
INRA - UMR Agroécologie 1347
CNRS - ERL IPM 6300 (Plant-Microorganism Interaction)
17, rue Sully - BP 86510 - F-21065 Dijon Cedex - France
Work phone : +33 380 693 167 - Fax : +33 380 693 753
https://www6.dijon.inra.fr/umragroecologie/Personnel/IPM/ITA/MOUNIER-Arnaud