Hi Mike,
I have a general question and what seems to be a bug or something weird.
My general goal would be:
given a set of genomic coordinates, get the genomic DNA sequence, regardless of whether the element is an exon, intron, coding region, enhancer, or anything else.
I don't need to get flanking regions upstream or downstream
I'd like to specify the genome assembly (e.g. hg38, hg19, mm10, or mm9)
I fear that biomaRt actually cannot do that, but it'd be awesome if I were to be proven wrong.
For example, let's get 20bp of a genomic region that could be an exon or an intron. The coordinates are: chrX:100636100-100636120
# 'ensembl' is a Mart object created beforehand (its setup is not shown in this post)
tmp <- biomaRt::getSequence(chromosome = "x",
                            start = 100636100,
                            end = 100636120,
                            seqType = 'cdna',
                            type = 'ensembl_gene_id',
                            mart = ensembl,
                            verbose = FALSE)
As seqType I specified cdna, as I read in the manual that it would return a nucleotide sequence. For the type argument I selected ensembl_gene_id just because I was forced to pick one ID. I would expect such a query to return a data frame with only one row containing the 20bp nucleotide sequence.
However,
dim(tmp)
[1] 5 2
nchar(tmp$cdna)
[1] 3768 3796 1025 820 900
Meaning that I get 5 rows, each with a DNA sequence of a different length.
Now, is there a way to get only the correct genomic DNA sequence?
I hoped that the seqType argument would make this possible. While playing around with it, I tried coding_gene_flank:
biomaRt::getSequence(chromosome = 'x',
                     start = 100636100,
                     end = 100636120,
                     seqType = 'coding_gene_flank',
                     type = 'ensembl_gene_id',
                     mart = ensembl,
                     verbose = FALSE)
and came across this error message:
Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery = fullXmlQuery, :
The query to the BioMart webservice returned an invalid result: the number of columns in the result table does not equal the number of attributes in the query.
Please report this on the support site at http://support.bioconductor.org
So, as requested here I am.
Thanks!
PS: I'm using biomaRt v2.42.0.
Just as a side note, maybe it'd be nice to include this info in the official documentation.