Question

incorrect biomaRt output order of gene list

Yuqia • 0

@yuqia-15072

Last seen 3.6 years ago

Switzerland

Hello,

I have another problem with biomaRt.

My ranked list of differentially expressed genes (ENSG...) is ordered not by ID number but by expression fold change (naturally):

> library(biomaRt)
> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
> list <- as.vector(testBiomaRt)
> list

V1
1 ENSG00000185247
2 ENSG00000268089
3 ENSG00000151136
4 ENSG00000054793
5 ENSG00000121895
6 ENSG00000172264
7 ENSG00000162409
8 ENSG00000142698
9 ENSG00000132109
10 ENSG00000140090

But when I use the getBM function to get the gene names for the list:

> res <- getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),
               filters = 'ensembl_gene_id',
               values = list,
               mart = mart)

> res

the result is no longer in my input order; it is sorted by ENSG number from lowest to highest:

ensembl_gene_id external_gene_name
1 ENSG00000054793 ATP9A
2 ENSG00000121895 TMEM156
3 ENSG00000132109 TRIM21
4 ENSG00000140090 SLC24A4
5 ENSG00000142698 C1orf94
6 ENSG00000151136 BTBD11
7 ENSG00000162409 PRKAA2
8 ENSG00000172264 MACROD2
9 ENSG00000185247 MAGEA11
10 ENSG00000268089 GABRQ

How can I maintain the original rank order of my input list in the output?

Thank you!

biomart ensembldb error • 2.7k views

ADD COMMENT • link 7.1 years ago Yuqia • 0

score 0 · Answer 1 · 2018-03-07

You shouldn't expect results from biomaRt to be sorted in any particular way: at heart it is a database query, and those make no guarantee about row order. Instead, you should return the filter attribute as well (as you did in your getBM call) and then use match to restore the original order.

> library(biomaRt)
> library(org.Hs.eg.db)
> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
> ensgs <- head(keys(org.Hs.eg.db, "ENSEMBL"), 30)
> ensgs
 [1] "ENSG00000121410" "ENSG00000175899" "ENSG00000256069" "ENSG00000171428"
 [5] "ENSG00000156006" "ENSG00000196136" "ENSG00000114771" "ENSG00000127837"
 [9] "ENSG00000129673" "ENSG00000090861" "ENSG00000183044" "ENSG00000165029"
[13] "ENSG00000107331" "ENSG00000167972" "ENSG00000131269" "ENSG00000204574"
[17] "ENSG00000225989" "ENSG00000232169" "ENSG00000206490" "ENSG00000236149"
[21] "ENSG00000236342" "ENSG00000231129" "ENSG00000198691" "ENSG00000097007"
[25] "ENSG00000002726" "ENSG00000143322" "ENSG00000175164" "ENSG00000281879"
[29] "ENSG00000159842" "ENSG00000276016"
> fakedata <- data.frame(EnsGene = ensgs, fakestuff = rnorm(30), stringsAsFactors = FALSE)
> z <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "ensembl_gene_id", values = fakedata[,1], mart = mart)
> head(z)
  ensembl_gene_id hgnc_symbol
1 ENSG00000002726        AOC1
2 ENSG00000090861        AARS
3 ENSG00000097007        ABL1
4 ENSG00000107331       ABCA2
5 ENSG00000114771       AADAC
6 ENSG00000121410        A1BG
> fakedata$symbol <- z[match(fakedata[,1], z[,1]),2]
> fakedata
           EnsGene  fakestuff   symbol
1  ENSG00000121410  0.4284778     A1BG
2  ENSG00000175899  0.7939349      A2M
3  ENSG00000256069 -0.7086452    A2MP1
4  ENSG00000171428  1.0021299     NAT1
5  ENSG00000156006 -1.5783433     NAT2
6  ENSG00000196136 -0.2404067 SERPINA3
7  ENSG00000114771  0.6693119    AADAC
8  ENSG00000127837  0.7599615     AAMP
9  ENSG00000129673  0.2137799    AANAT
10 ENSG00000090861 -1.1840495     AARS
11 ENSG00000183044  1.7226415     ABAT
12 ENSG00000165029  1.0943198    ABCA1
13 ENSG00000107331 -0.9202913    ABCA2
14 ENSG00000167972 -0.9597995    ABCA3
15 ENSG00000131269  1.7312147    ABCB7
16 ENSG00000204574  0.1767857    ABCF1
17 ENSG00000225989 -1.3855398    ABCF1
18 ENSG00000232169 -0.4199767    ABCF1
19 ENSG00000206490  0.1315500    ABCF1
20 ENSG00000236149 -1.1918259    ABCF1
21 ENSG00000236342  1.2702487    ABCF1
22 ENSG00000231129  1.3142160    ABCF1
23 ENSG00000198691 -0.3801780    ABCA4
24 ENSG00000097007  1.6980101     ABL1
25 ENSG00000002726  0.3474628     AOC1
26 ENSG00000143322 -0.9090020     ABL2
27 ENSG00000175164 -2.6847646      ABO
28 ENSG00000281879 -1.2566993      ABO
29 ENSG00000159842  0.7652590      ABR
30 ENSG00000276016 -0.5721807      ABR
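Applied to the ten IDs from the question, the same match approach might look like the sketch below. To keep it runnable without a network connection, `res` is hard-coded from the getBM output shown in the question rather than fetched; the names `ids` and `ordered` are illustrative.

```r
# Original input order (ranked by fold change), copied from the question.
ids <- c("ENSG00000185247", "ENSG00000268089", "ENSG00000151136",
         "ENSG00000054793", "ENSG00000121895", "ENSG00000172264",
         "ENSG00000162409", "ENSG00000142698", "ENSG00000132109",
         "ENSG00000140090")

# Stand-in for the getBM() result shown in the question (sorted by ID).
# In a real session this data frame would come from getBM().
res <- data.frame(
  ensembl_gene_id = sort(ids),
  external_gene_name = c("ATP9A", "TMEM156", "TRIM21", "SLC24A4", "C1orf94",
                         "BTBD11", "PRKAA2", "MACROD2", "MAGEA11", "GABRQ"),
  stringsAsFactors = FALSE
)

# For each input ID, match() returns the row of res that holds it:
# NA if the ID was not returned, the first hit if it occurs more than once.
ordered <- res[match(ids, res$ensembl_gene_id), ]
rownames(ordered) <- NULL
ordered
```

Because match returns only the first hit, an ID that maps to several rows in the query result keeps just one of them, and an ID missing from the result shows up as a row of NA values.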

score 0 · Answer 2 · 2018-03-08

cherlyn.t • 0

@cherlynt-13245

Last seen 5.2 years ago

Hi,

James's method is great, but I have found the merge function to be simpler. You can merge the converted list with your original list using by = "ensembl_gene_id"; you just have to rename the "V1" column to "ensembl_gene_id" first.

I hope it helps.
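One caveat with merge is that it sorts the result on the by column by default (sort = TRUE), so on its own it would not restore the ranked order. A minimal sketch that works around this by carrying an explicit rank column is shown here; the `rank` column and the three-gene data frames are illustrative assumptions, with `res` standing in for a getBM result.

```r
# Ranked input as a one-column data frame with the automatic header V1.
ranked <- data.frame(V1 = c("ENSG00000185247", "ENSG00000268089",
                            "ENSG00000151136"),
                     stringsAsFactors = FALSE)
colnames(ranked) <- "ensembl_gene_id"   # rename V1 so both tables share a key
ranked$rank <- seq_len(nrow(ranked))    # remember the original position

# Stand-in for the getBM() result (hard-coded; sorted by ID).
res <- data.frame(
  ensembl_gene_id = c("ENSG00000151136", "ENSG00000185247", "ENSG00000268089"),
  external_gene_name = c("BTBD11", "MAGEA11", "GABRQ"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps input IDs even if the query returned nothing for them.
merged <- merge(ranked, res, by = "ensembl_gene_id", all.x = TRUE)

# Restore the original ranking explicitly instead of relying on merge's order.
merged <- merged[order(merged$rank), ]
rownames(merged) <- NULL
merged
```

Unlike match, merge keeps every matching row, so an ID with several gene names would appear more than once in the result.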

score 0 · Answer 3 · 2018-03-08

Yuqia • 0

Hi James,

Thank you very much! That sounds great! I'll try that.

Hi cherlyn, V1 is the header that appears automatically when I use as.vector(list). When I import the data with read.csv, however, the list does not have that V1. Thank you for your input; I'll try that as well.
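The V1 header in the printed output suggests that the object is a one-column data frame, not a plain vector. Under that assumption, a small sketch of pulling the IDs out as a character vector follows; `csv_text` is a hypothetical stand-in for the input file.

```r
# Hypothetical stand-in for the input file: one ID per line, no header row.
csv_text <- "ENSG00000185247\nENSG00000268089\nENSG00000151136"

# With header = FALSE, read.csv names the single column V1.
testBiomaRt <- read.csv(text = csv_text, header = FALSE,
                        stringsAsFactors = FALSE)
colnames(testBiomaRt)

# Extract the column itself to get a plain character vector of IDs,
# which can be passed as the values argument of getBM().
ids <- testBiomaRt[[1]]
ids
```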
