Question

getBM. Get genome location for probes that do not map to genes


Aedin Culhane ▴ 510

@aedin-culhane-1526

Last seen 5.6 years ago

United States

Hi

I am trying to get annotation for affy_hugene probes. When I look them up in Ensembl, I can see that they map uniquely to introns of genes. Can I use getBM (biomaRt) to retrieve the genome locations of the mapped probes? I could then query other databases to get annotation for those genomic regions.

library(biomaRt)
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))

getBM(attributes = c("chromosome_name", "band", "start_position", "end_position"), filters = "affy_hugene_1_0_st_v1", values = "7893529", mart = mart)

http://useast.ensembl.org/Homo_sapiens/Location/Genome?fdb=funcgen;ftype=ProbeFeature;id=7893529;ptype=pset

Thanks

Aedin

biomart annotation • 1.8k views

ADD COMMENT • link 8.5 years ago Aedin Culhane ▴ 510

score 0 · Answer 1 · 2016-10-11


Aedin Culhane ▴ 510


BTW, I also tried

select(hugene10sttranscriptcluster.db, keys = "7893529", keytype = "PROBEID", columns="MAP")


score 0 · Answer 2 · 2016-10-11

You can't use any of the regular annotation packages to do that, as they are just what Affy gives us, repackaged. And they don't say anything about the intronic probes, in general.

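This is echoed in the pdInfoPackage:

> library(pd.hugene.1.0.st.v1)
> con <- db(pd.hugene.1.0.st.v1)  # assumed setup: connection to the package's SQLite database
> dbGetQuery(con, "select * from featureSet where transcript_cluster_id='7893529';")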
   fsetid strand start stop transcript_cluster_id exon_id crosshyb_type level
1 7893529     NA    NA   NA               7893529       0             0    NA
  chrom type
1    NA   10

You can get the probe sequences from the probeset fasta file, and you could then align them against the human genome using, say, matchPDict from Biostrings, or just use BLAT at UCSC. Pasting the following into BLAT brings up the gene ANAPC5:

>probe:HuGene-1_0-st-v1:814225;474:775; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
AAATGTAAAGAGCCGCTATTCATAA
>probe:HuGene-1_0-st-v1:533338;987:507; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
CAGAAATGTAAAGAGCCGCTATTCA
>probe:HuGene-1_0-st-v1:28669;318:27; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
GAAATGTAAAGAGCCGCTATTCATA
>probe:HuGene-1_0-st-v1:699266;1015:665; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
TAAAGAGCCGCTATTCATAACAGCC
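
The header lines above carry the probe ID, the probe set ID, and the probe set type, so the intronic probes can be picked out before any alignment. Here is a minimal base-R sketch of that parsing step, using only the four records quoted above. The data frame and its column names (`probe_id`, `probeset_id`, `probeset_type`, `sequence`) are illustrative choices, and the code assumes one sequence line per record, which holds for these 25-mers but may not hold for every FASTA file.

```r
# The four FASTA records quoted in the answer above, as a character vector.
fasta_lines <- c(
  ">probe:HuGene-1_0-st-v1:814225;474:775; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "AAATGTAAAGAGCCGCTATTCATAA",
  ">probe:HuGene-1_0-st-v1:533338;987:507; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "CAGAAATGTAAAGAGCCGCTATTCA",
  ">probe:HuGene-1_0-st-v1:28669;318:27; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "GAAATGTAAAGAGCCGCTATTCATA",
  ">probe:HuGene-1_0-st-v1:699266;1015:665; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "TAAAGAGCCGCTATTCATAACAGCC"
)

# Split headers from sequences (assumes exactly one sequence line per record).
is_header <- startsWith(fasta_lines, ">")
headers   <- fasta_lines[is_header]
seqs      <- fasta_lines[!is_header]

# Pull the fields out of each header with regular expressions.
probes <- data.frame(
  probe_id      = sub("^>probe:[^:]+:([0-9]+);.*$", "\\1", headers),
  probeset_id   = sub("^.*ProbeSetID=([0-9]+);.*$", "\\1", headers),
  probeset_type = sub("^.*ProbeSetType=", "", headers),
  sequence      = seqs,
  stringsAsFactors = FALSE
)

# Keep only probes whose probe set type ends in "intron".
intronic <- probes[grepl("intron$", probes$probeset_type), ]
print(intronic)
```

For a full probe FASTA file, the same parsing would apply to the names of a DNAStringSet read with Biostrings' `readDNAStringSet()`.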

score 0 · Answer 3 · 2016-10-12


Aedin Culhane ▴ 510


Thanks James

Ensembl has already mapped all of the sequences to the genome (see http://useast.ensembl.org/Homo_sapiens/Location/Genome?fdb=funcgen;ftype=ProbeFeature;id=7893529;ptype=pset), so really I just want to pull their mappings. But I couldn't work out how to do this with biomaRt. Do you know?

Aedin



As far as I can tell, you can't get that information from BioMart. As you note, Ensembl has done the mapping, and you can get it by searching their site. They say you can also get it from BioMart, but that appears to be true only if the probes hit an exon. So I don't know of any easy way to get at it without hitting the Ensembl DB directly. They do have a Perl API that you can use if you want to get your Perl on.

I tried playing around with a direct query to their MySQL database, but A) they kick you off in like three nanoseconds if you aren't doing something, and B) doing direct queries would require some knowledge of their DB schemas, which I don't have.

If it were my project, I would probably just read the probe fasta file into a DNAStringSet, subset to just the intronic controls, and then align to the human genome using something similar to what Herve does in section 8 of this vignette. That seems like the less 'teeth gnashy' way to proceed.
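
The alignment step might look roughly like the sketch below, which uses `PDict()` and `matchPDict()` from Biostrings on the four probes of probe set 7893529. This is not the vignette's code. The subject here is a made-up toy sequence (`toy_subject`) so the example runs without downloading a genome; it is not real genomic sequence, and the coordinates it yields mean nothing biologically. For real use, each chromosome of a BSgenome package would take the place of the toy subject, and the reverse complement of the probes would need to be matched as well.

```r
library(Biostrings)

# The four 25-mer probes of probe set 7893529, copied from the FASTA
# records quoted earlier in this thread (names are the probe IDs).
probes <- DNAStringSet(c(
  "814225" = "AAATGTAAAGAGCCGCTATTCATAA",
  "533338" = "CAGAAATGTAAAGAGCCGCTATTCA",
  "28669"  = "GAAATGTAAAGAGCCGCTATTCATA",
  "699266" = "TAAAGAGCCGCTATTCATAACAGCC"
))

# ASSUMPTION: toy subject for illustration only. The middle piece is the
# region spanned by the overlapping probes; the flanks are arbitrary filler.
toy_subject <- DNAString(paste0(
  "ACGTACGTAC",
  "CAGAAATGTAAAGAGCCGCTATTCATAACAGCC",
  "GGTTGGTTGG"
))

# Preprocess the probes into a dictionary (requires constant width,
# which holds here because every probe is a 25-mer).
pdict <- PDict(probes)

# Exact matching of every probe against the subject (forward strand only).
hits <- matchPDict(pdict, toy_subject)

# Flatten the per-probe match positions into one table.
starts <- startIndex(hits)   # list with one integer vector per probe
ends   <- endIndex(hits)
hit_table <- data.frame(
  probe = rep(names(probes), lengths(starts)),
  start = unlist(starts, use.names = FALSE),
  end   = unlist(ends, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(hit_table)
```

Exact matching is the simplest case; `matchPDict()` also has a `max.mismatch` argument, though allowing mismatches places extra requirements on how the dictionary is built.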

ADD REPLY • link 8.5 years ago James W. MacDonald 68k