seemingly erratic responses to different getBM biomaRt requests depending on the attributes requested
efoss ▴ 10
@efoss-8908
Last seen 3.3 years ago
United States

120315

This may be related to an earlier question I posted, "getBM biomaRt returns different results for the same attribute, depending on which attributes I request", but if so, I'm still in need of guidance:

I made a Mart object as follows: 

library(biomaRt)
Mmmart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl")

Now I will use "getBM" to ask for six different sets of attributes for one non-coding RNA (Xist) and three protein-coding genes, and I get what seem to me to be rather unpredictable responses depending on which set of attributes I request:

(1)

> getBM(attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide"),
+       filters = "mgi_symbol",
+       values = c("Xist", "Yy1", "Hist1h1c", "Hist1h3c"),
+       mart = Mmmart)
  refseq_mrna refseq_ncrna refseq_peptide
1   NM_015786           NA      NP_056601
2   NM_175653           NA      NP_783584
3   NM_009537           NA      NP_033563

Why did I get nothing for my non-coding RNAs? There is a non-coding RNA in there, and Biomart returns it to me if I use the same command except changing the attributes that I request to only the non-coding RNA:

(2)

attributes = c("refseq_ncrna")
  refseq_ncrna
1    NR_001463

So why didn't my request in (1) give something like this:

  refseq_mrna refseq_ncrna refseq_peptide
1   NM_015786           NA      NP_056601
2   NM_175653           NA      NP_783584
3   NM_009537           NA      NP_033563
4          NA    NR_001463             NA

?

(3)

Now I try to add names to my request in (1) so that I know which pair of "NM_" and "NP_" identifiers go with which gene symbol, but it complains with an error:

attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide", "mgi_symbol")
Error in getBM(attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide",  :
  Query ERROR: caught BioMart::Exception::Usage: Too many attributes selected for External References

Is 4 requested attributes really too many, or is it instead that the attributes I've requested are somehow incompatible?

(5)

If I ask for the same thing except dropping the request for the non-coding RNA attribute, it works fine, but doesn't acknowledge anything for Xist, which is still in the values:

attributes = c("refseq_mrna", "refseq_peptide", "mgi_symbol")
  refseq_mrna refseq_peptide mgi_symbol
1   NM_015786      NP_056601   Hist1h1c
2   NM_175653      NP_783584   Hist1h3c
3   NM_009537      NP_033563        Yy1

(6)

But then when I do the converse and ask for the non-coding RNA with its symbol, it acknowledges the coding genes (unlike in (5) for the non-coding RNA) and gives me two Xist rows, one with and one without an "NR_" identifier:

attributes = c("refseq_ncrna", "mgi_symbol")
  refseq_ncrna mgi_symbol
1                Hist1h1c
2                Hist1h3c
3                    Xist
4    NR_001463       Xist
5                     Yy1
>

If I do things like this with an OrgDb object, I get everything with one simple request:

desired <- c("Xist", "Yy1", "Hist1h1c", "Hist1h3c")
desiredRefs <- select(x = mouseOrgDb,
                      keys = intersect(keys(mouseOrgDb, keytype = "SYMBOL"), desired),
                      keytype = "SYMBOL",
                      columns = c("SYMBOL", "REFSEQ"))
> desiredRefs
     SYMBOL       REFSEQ
1       Yy1    NM_009537
2       Yy1    NP_033563
3       Yy1 XM_006515820
4       Yy1 XP_006515883
5  Hist1h1c    NM_015786
6  Hist1h1c    NP_056601
7      Xist    NR_001463
8      Xist    NR_001570
9  Hist1h3c    NM_175653
10 Hist1h3c    NP_783584

(I initially had requested the "refseq_ncrna_predicted" and "refseq_peptide_predicted" attributes in my getBM queries, but that caused even more trouble.)

Looking back at my getBM request (6), I see that the empty Xist slot was probably for "NR_001570", since it seems to know that there were two, though it only gave me "NR_001463".
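
The blank cells in the output of (6) are empty strings rather than NA, which makes them easy to overlook. A minimal sketch of how they could be made explicit is below; the data frame `res6` is re-typed by hand from the output of (6) as a stand-in for a live getBM result, and the variable names are only illustrative.

```r
# Output of query (6), re-typed by hand as a stand-in for a live getBM() result.
res6 <- data.frame(
  refseq_ncrna = c("", "", "", "NR_001463", ""),
  mgi_symbol   = c("Hist1h1c", "Hist1h3c", "Xist", "Xist", "Yy1"),
  stringsAsFactors = FALSE
)

# Mark blank cells as missing so they are easy to filter on.
res6$refseq_ncrna[res6$refseq_ncrna == ""] <- NA

# Symbols that came back with at least one RefSeq ncRNA identifier.
with_ncrna <- unique(res6$mgi_symbol[!is.na(res6$refseq_ncrna)])

# Number of rows returned per symbol; a count above 1 hints at several
# underlying records for the same symbol, as seen for Xist.
rows_per_symbol <- table(res6$mgi_symbol)
```

Counting rows per symbol like this only shows that Xist came back twice; it does not by itself confirm which record the blank row corresponds to.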

One response to my question could, of course, be to just forget about biomaRt and stick with OrgDb-style requests, but I would like to understand what is going on with biomaRt. Any advice would be appreciated.

Thanks.

Eric

 

Thomas Maurel ▴ 800
@thomas-maurel-5295
Last seen 21 months ago
United Kingdom

Dear Eric,

Please find below answers to your questions:

(1) I am afraid that what you observe is a known bug in the BioMart software. The refseq_peptide information is annotated at the protein level, and because of this bug the mart returns only protein-coding transcripts instead of all transcripts whenever that attribute is requested (for more details, please read our Ensembl FAQ: http://www.ensembl.org/Help/Faq?id=476). As you noticed in (2), if you remove the "refseq_peptide" attribute from your query, you get all the non-coding information back. We believe that it may be possible to change this behaviour to be more consistent and have requested such a change from the BioMart developers.

(3) To allow BioMart to return results in a reasonable amount of time, we have restricted the number of External References that you can select to 3. I am afraid that you can't see the limit when using R, but it is shown in the BioMart web interface (http://www.ensembl.org/biomart/martview), e.g. "External References (max 3)".
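
One way to stay under this limit from R is to split a long attribute list into several smaller queries that all share one key attribute, and then combine the results on that key. A minimal sketch follows; `chunk_attributes` is a hypothetical helper, not part of biomaRt, and it assumes that the key (here `mgi_symbol`) itself counts towards the limit of 3, as the error in (3) suggests.

```r
# Hypothetical helper (not a biomaRt function): split an attribute vector into
# groups that each contain the key plus at most (max_refs - 1) other attributes.
chunk_attributes <- function(attrs, key, max_refs = 3) {
  stopifnot(max_refs >= 2)
  others <- setdiff(attrs, key)            # everything except the key
  per_chunk <- max_refs - 1                # leave one slot for the key
  groups <- split(others, ceiling(seq_along(others) / per_chunk))
  lapply(unname(groups), function(g) c(key, g))
}

# The four attributes that triggered the error in (3).
wanted <- c("refseq_mrna", "refseq_ncrna", "refseq_peptide", "mgi_symbol")
chunks <- chunk_attributes(wanted, key = "mgi_symbol")
```

Each element of `chunks` could then be passed as the `attributes` argument of a separate getBM call, with the resulting data frames merged on `mgi_symbol`.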

(5) and (6) same explanation as (1) above.
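
Until that behaviour changes, a possible workaround is to query the protein-level and the non-coding attributes separately and combine them in R. The sketch below is only an illustration under stated assumptions: the two data frames are re-typed from the outputs of (5) and (6) instead of being fetched with live getBM calls, and the names `coding`, `noncoding` and `combined` are made up for this example.

```r
# Rows from query (5): protein-coding genes with mRNA and peptide identifiers.
coding <- data.frame(
  refseq_mrna    = c("NM_015786", "NM_175653", "NM_009537"),
  refseq_peptide = c("NP_056601", "NP_783584", "NP_033563"),
  mgi_symbol     = c("Hist1h1c", "Hist1h3c", "Yy1"),
  stringsAsFactors = FALSE
)

# The only non-blank row from query (6): the non-coding RNA identifier.
noncoding <- data.frame(
  refseq_ncrna = "NR_001463",
  mgi_symbol   = "Xist",
  stringsAsFactors = FALSE
)

# Full outer join on the gene symbol, so genes present in only one of the
# two results are kept and padded with NA in the other columns.
combined <- merge(coding, noncoding, by = "mgi_symbol", all = TRUE)
```

With `all = TRUE` the merge keeps Xist even though it has no row in the protein-level result, which is the single table that query (1) was expected to return.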

Hope this helps,
Best Regards,
Thomas


Dear Thomas, 

Thank you very much. I really appreciate it. 

Eric
