I am trying to get annotation for affy_hugene probes. When I look at Ensembl, I can see that these map uniquely to introns of genes. Can I use getBM (biomaRt) to retrieve the genomic location of the mapped probes? Then I could query another database to get annotation for that genomic region.
mart <- useDataset("hsapiens_gene_ensembl",useMart("ensembl"))
You can't use any of the regular annotation packages to do that, as they are just what Affy gives us, repackaged. And they don't say anything about the intronic probes, in general.
This is echoed in the pdInfoPackage:
> dbGetQuery(con, "select * from featureSet where transcript_cluster_id='7893529';")
fsetid strand start stop transcript_cluster_id exon_id crosshyb_type level
1 7893529 NA NA NA 7893529 0 0 NA
chrom type
1 NA 10
You can get the probe sequences from the probeset FASTA file, and you could then align them against the human genome using, say, matchPDict from Biostrings, or just use BLAT at UCSC. Pasting the following in blat brings up the gene ANAPC5
As far as I can tell you can't get that information from Biomart. As you note, they have done the mapping at Ensembl, and you can get it by searching (and they say you can get it from Biomart, but it appears only to be true if the probes hit an exon), so I don't know if there is any easy way to get at it without hitting the Ensembl DB directly. They do have a Perl API that you can use if you want to get your Perl on.
I tried playing around with a direct query to their MySQL database, but A) they kick you off in like three nanoseconds if you aren't doing something, and B) doing direct queries would require some knowledge of their DB schemas, which I don't have.
If it were my project, I would probably just read the probe fasta file into a DNAStringSet, subset to just the intronic controls, and then align to the human genome using something similar to what Herve does in section 8 of this vignette. That seems like the less 'teeth gnashy' way to proceed.
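A rough sketch of that idea, in case it helps. This is not the vignette's code, just my guess at a minimal version. The sequences below are made-up toy data so the example runs on its own, and find_probe_hits is a hypothetical helper name. In practice, probes would come from readDNAStringSet() on the probe FASTA file (subset to the intronic controls), and chrom would be one chromosome pulled from a BSgenome package. It only finds exact matches, and it assumes all probes have the same length, which PDict needs by default.

```r
library(Biostrings)

# Toy stand-ins (made up for illustration). Real probes would be read with
# readDNAStringSet("<probe fasta file>") and chrom taken from a BSgenome object.
chrom  <- DNAString("ACGTTTGACCATGGCATTAGCCGTAAGCTTGCA")
probes <- DNAStringSet(c(p1 = "GACCATGG", p2 = "TTACGGCT"))

# Hypothetical helper: exact-match every probe against one sequence on both
# strands and return one row per hit.
find_probe_hits <- function(probes, chrom) {
  one_strand <- function(dict, strand) {
    mindex <- matchPDict(PDict(dict), chrom)
    starts <- startIndex(mindex)   # list with one integer vector per probe
    ends   <- endIndex(mindex)
    n      <- lengths(starts)      # number of hits per probe
    data.frame(
      probe  = rep(names(probes), n),
      start  = as.integer(unlist(starts, use.names = FALSE)),
      end    = as.integer(unlist(ends, use.names = FALSE)),
      strand = rep(strand, sum(n)),
      stringsAsFactors = FALSE
    )
  }
  # Minus-strand hits are found by matching the reverse complement of each
  # probe; coordinates are still reported on the plus strand.
  rbind(one_strand(probes, "+"),
        one_strand(reverseComplement(probes), "-"))
}

hits <- find_probe_hits(probes, chrom)
print(hits)
```

For a whole genome you would loop this over the chromosomes and add a column for the chromosome name. The resulting start/end/strand coordinates are what you would then take to another database to annotate the region.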