incorrect biomaRt output order of gene list
Yuqia • 0
@yuqia-15072
Last seen 3.2 years ago
Switzerland

Hello,

I have another problem with biomaRt.

My ranked list of differentially expressed genes (ENSG...) is ordered by expression fold change, not by ID number:

> library(biomaRt)
> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
> list <- as.vector(testBiomaRt)
> list

                V1
1  ENSG00000185247
2  ENSG00000268089
3  ENSG00000151136
4  ENSG00000054793
5  ENSG00000121895
6  ENSG00000172264
7  ENSG00000162409
8  ENSG00000142698
9  ENSG00000132109
10 ENSG00000140090

But when I use the getBM function to get the gene names for the list:

> res <- getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),
               filters = 'ensembl_gene_id',
               values = list,
               mart = mart)

> res

the result comes back sorted by ENSG number from lowest to highest instead of in my input order:

   ensembl_gene_id external_gene_name
1  ENSG00000054793              ATP9A
2  ENSG00000121895            TMEM156
3  ENSG00000132109             TRIM21
4  ENSG00000140090            SLC24A4
5  ENSG00000142698            C1orf94
6  ENSG00000151136             BTBD11
7  ENSG00000162409             PRKAA2
8  ENSG00000172264            MACROD2
9  ENSG00000185247            MAGEA11
10 ENSG00000268089              GABRQ

How can I maintain the original rank order of my input list in the output?

Thank you!

@james-w-macdonald-5106
Last seen 47 minutes ago
United States

You shouldn't expect results from biomaRt to be sorted in any particular way. At heart it's a database query, and those come back in no guaranteed order. Instead, include the filter attribute among the attributes you return (as you did by requesting ensembl_gene_id in your getBM call) and then use match to put the results back in the order of your input.

> library(org.Hs.eg.db)
> ensgs <- head(keys(org.Hs.eg.db, "ENSEMBL"), 30)
> ensgs
 [1] "ENSG00000121410" "ENSG00000175899" "ENSG00000256069" "ENSG00000171428"
 [5] "ENSG00000156006" "ENSG00000196136" "ENSG00000114771" "ENSG00000127837"
 [9] "ENSG00000129673" "ENSG00000090861" "ENSG00000183044" "ENSG00000165029"
[13] "ENSG00000107331" "ENSG00000167972" "ENSG00000131269" "ENSG00000204574"
[17] "ENSG00000225989" "ENSG00000232169" "ENSG00000206490" "ENSG00000236149"
[21] "ENSG00000236342" "ENSG00000231129" "ENSG00000198691" "ENSG00000097007"
[25] "ENSG00000002726" "ENSG00000143322" "ENSG00000175164" "ENSG00000281879"
[29] "ENSG00000159842" "ENSG00000276016"
> fakedata <- data.frame(EnsGene = ensgs, fakestuff = rnorm(30), stringsAsFactors = FALSE)
> z <- getBM(c("ensembl_gene_id","hgnc_symbol"), "ensembl_gene_id", fakedata[,1], mart)
> head(z)
  ensembl_gene_id hgnc_symbol
1 ENSG00000002726        AOC1
2 ENSG00000090861        AARS
3 ENSG00000097007        ABL1
4 ENSG00000107331       ABCA2
5 ENSG00000114771       AADAC
6 ENSG00000121410        A1BG
> fakedata$symbol <- z[match(fakedata[,1], z[,1]),2]
> fakedata
           EnsGene  fakestuff   symbol
1  ENSG00000121410  0.4284778     A1BG
2  ENSG00000175899  0.7939349      A2M
3  ENSG00000256069 -0.7086452    A2MP1
4  ENSG00000171428  1.0021299     NAT1
5  ENSG00000156006 -1.5783433     NAT2
6  ENSG00000196136 -0.2404067 SERPINA3
7  ENSG00000114771  0.6693119    AADAC
8  ENSG00000127837  0.7599615     AAMP
9  ENSG00000129673  0.2137799    AANAT
10 ENSG00000090861 -1.1840495     AARS
11 ENSG00000183044  1.7226415     ABAT
12 ENSG00000165029  1.0943198    ABCA1
13 ENSG00000107331 -0.9202913    ABCA2
14 ENSG00000167972 -0.9597995    ABCA3
15 ENSG00000131269  1.7312147    ABCB7
16 ENSG00000204574  0.1767857    ABCF1
17 ENSG00000225989 -1.3855398    ABCF1
18 ENSG00000232169 -0.4199767    ABCF1
19 ENSG00000206490  0.1315500    ABCF1
20 ENSG00000236149 -1.1918259    ABCF1
21 ENSG00000236342  1.2702487    ABCF1
22 ENSG00000231129  1.3142160    ABCF1
23 ENSG00000198691 -0.3801780    ABCA4
24 ENSG00000097007  1.6980101     ABL1
25 ENSG00000002726  0.3474628     AOC1
26 ENSG00000143322 -0.9090020     ABL2
27 ENSG00000175164 -2.6847646      ABO
28 ENSG00000281879 -1.2566993      ABO
29 ENSG00000159842  0.7652590      ABR
30 ENSG00000276016 -0.5721807      ABR
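
Applied to the ten IDs from the question, the same match idea might look like the sketch below. It runs without querying Ensembl: `res` here is typed in by hand from the output shown in the question, as a stand-in for a real getBM result, and the names `ranked`, `idx` and `ordered` are only illustrative.

```r
# Input IDs in their original rank order (copied from the question)
ranked <- c("ENSG00000185247", "ENSG00000268089", "ENSG00000151136",
            "ENSG00000054793", "ENSG00000121895", "ENSG00000172264",
            "ENSG00000162409", "ENSG00000142698", "ENSG00000132109",
            "ENSG00000140090")

# Hand-typed stand-in for the getBM() result in the question (sorted by ID)
res <- data.frame(
  ensembl_gene_id = c("ENSG00000054793", "ENSG00000121895", "ENSG00000132109",
                      "ENSG00000140090", "ENSG00000142698", "ENSG00000151136",
                      "ENSG00000162409", "ENSG00000172264", "ENSG00000185247",
                      "ENSG00000268089"),
  external_gene_name = c("ATP9A", "TMEM156", "TRIM21", "SLC24A4", "C1orf94",
                         "BTBD11", "PRKAA2", "MACROD2", "MAGEA11", "GABRQ"),
  stringsAsFactors = FALSE
)

# For each input ID, find its row in res (NA if the ID was not returned)
idx <- match(ranked, res$ensembl_gene_id)

# Rebuild the table in the input order
ordered <- data.frame(ensembl_gene_id = ranked,
                      external_gene_name = res$external_gene_name[idx],
                      stringsAsFactors = FALSE)
ordered
```

Keep in mind that match only returns the first hit for each ID, so if a query returns several rows for one gene, only the first of them is kept. IDs that are missing from the result end up as NA rather than being dropped.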
cherlyn.t • 0
@cherlynt-13245
Last seen 4.8 years ago

Hi,

James's method is great, but I have found the merge function to be simpler. You can merge the converted list and your original list with by = "ensembl_gene_id"; you just have to rename the column "V1" to "ensembl_gene_id" first.
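
A rough sketch of the merge route, under a few assumptions: `input` stands for the original one-column data frame with header V1, `res` is a hand-typed stand-in for the getBM result, and the `rank` column is a helper added here for illustration. By default merge sorts its output on the `by` column, so the original position is stored first and used to restore the order afterwards.

```r
# Original ranked input, with the automatic V1 header (first three IDs only)
input <- data.frame(V1 = c("ENSG00000185247", "ENSG00000268089",
                           "ENSG00000151136"),
                    stringsAsFactors = FALSE)

# Rename V1 so both tables share the key column name
names(input)[names(input) == "V1"] <- "ensembl_gene_id"

# Remember the original position of each gene (helper column, not from getBM)
input$rank <- seq_len(nrow(input))

# Hand-typed stand-in for the getBM() result, sorted by ID
res <- data.frame(ensembl_gene_id = c("ENSG00000151136", "ENSG00000185247",
                                      "ENSG00000268089"),
                  external_gene_name = c("BTBD11", "MAGEA11", "GABRQ"),
                  stringsAsFactors = FALSE)

# all.x = TRUE keeps input genes that have no match in res
merged <- merge(input, res, by = "ensembl_gene_id", all.x = TRUE)

# merge() sorted the rows by ID, so put them back in rank order
merged <- merged[order(merged$rank), ]
merged
```

Unlike match, merge keeps every matching row, so a gene that maps to several names appears several times in the merged table.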

I hope it helps.

 

Yuqia • 0
@yuqia-15072
Last seen 3.2 years ago
Switzerland

Hi James,

Thank you very much! That sounds great! I'll try that.

Hi cherlyn, V1 is the header that is added automatically when I use as.vector(list). When I import the data with read.csv, the list does not have that V1. Thank you for your input. I'll try that as well.
