Elizabeth Purdom
USA/ Berkeley/UC Berkeley
Hello,
I am baffled by something I happened to discover in the results of my
query with biomaRt, and I can't figure out what's going on. I am using
getBM to pull down a large number of gene coordinates, filtering to
restrict to chromosomes 1-22, X, and Y. For some reason this procedure
(which gives no errors) is not pulling down some genes that I think it
should.
My basic code for pulling down all of this information is:
tempAll <- getBM(attributes = c("ensembl_gene_id", "start_position",
                                "end_position", "strand",
                                "chromosome_name", "biotype"),
                 filters = "chromosome_name",
                 values = c(1:22, "X", "Y"),
                 mart = mart)
A particular gene, "ENSG00000011677", is found by 'getGene' (and other
getBM queries with different filters, as I discuss below) but not in my
main query:
> getGene("ENSG00000011677", "ensembl_gene_id", mart)
  ensembl_gene_id hgnc_symbol
1 ENSG00000011677      GABRA3
  description
1 Gamma-aminobutyric acid receptor subunit alpha-3 precursor (GABA(A) receptor subunit alpha-3). [Source:Uniprot/SWISSPROT;Acc:P34903]
  chromosome_name band strand start_position end_position ensembl_gene_id
1               X  q28     -1      151086290    151370993 ENSG00000011677
> tempAll[match("ENSG00000011677", tempAll$ensembl_gene_id), ]
   ensembl_gene_id start_position end_position strand chromosome_name biotype
NA            <NA>             NA           NA     NA            <NA>    <NA>
Oddly, if I change my main code so that the chromosome_name filter is
just "X", just c("X","Y"), just c(1,"X"), or one of a couple of other
combinations I picked, then this gene correctly appears. It also appears
if I filter on 'biotype' equals 'protein_coding'. I won't show all of
these results unless someone wants them, but I just copied and pasted
the code, so the filter was definitely the only thing changing.
When I looked, of the 21,021 genes on chr1-22,X,Y returned with the
filter 'biotype' equals 'protein_coding', only 16,236 were in my main
query that restricted by chromosome ('tempAll' above). The ~5,000
missing ones are only on chr 5-9 and X,Y. I'm thinking there is some
matching problem going on, but I don't know where (or whether it's my
error).
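In case it helps anyone reproduce the comparison, this is roughly how the missing genes can be counted. It is only a sketch on toy data: 'byBiotype' and 'byChrom' are hypothetical stand-ins for the results of the biotype-filtered and chromosome-filtered queries, and the gene IDs are made up.

```r
# Toy stand-ins for the two query results (hypothetical values, not real output).
byBiotype <- data.frame(
  ensembl_gene_id = c("G1", "G2", "G3", "G4", "G5"),
  chromosome_name = c("1", "5", "X", "X", "Y"),
  stringsAsFactors = FALSE
)
byChrom <- data.frame(
  ensembl_gene_id = c("G1", "G3"),
  chromosome_name = c("1", "X"),
  stringsAsFactors = FALSE
)

# IDs returned by the biotype query but absent from the chromosome query.
missingIds <- setdiff(byBiotype$ensembl_gene_id, byChrom$ensembl_gene_id)

# Tally the missing genes by chromosome to see where they concentrate.
missingByChrom <- table(
  byBiotype$chromosome_name[byBiotype$ensembl_gene_id %in% missingIds]
)

print(missingIds)
print(missingByChrom)
```

With the real data frames in place of the toy ones, the same two steps give the number of missing genes and their distribution over chromosomes.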
For now I'm just pulling it all down and filtering myself, but I would
like to know what's going on here.
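The local filtering is along these lines. Again this is a sketch on toy data, not my exact code: 'allGenes' stands in for an unfiltered getBM result, 'filterChroms' is a helper name I made up for illustration, and the non-standard chromosome names are invented.

```r
# Chromosomes to keep, as character so they compare cleanly with the
# chromosome_name column.
keepChroms <- c(as.character(1:22), "X", "Y")

# Hypothetical helper: keep only rows whose chromosome is in 'chroms'.
filterChroms <- function(genes, chroms = keepChroms) {
  genes[as.character(genes$chromosome_name) %in% chroms, , drop = FALSE]
}

# Toy stand-in for an unfiltered query result (made-up IDs and names).
allGenes <- data.frame(
  ensembl_gene_id = c("G1", "G2", "G3", "G4"),
  chromosome_name = c("1", "X", "MT", "scaffold_1"),
  stringsAsFactors = FALSE
)

tempLocal <- filterChroms(allGenes)
print(tempLocal)
```

Filtering this way avoids passing the long chromosome list to the query at all, at the cost of downloading rows that are then discarded.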
Best,
Elizabeth