Hello list.
I'm using the following script to try to retrieve the start and end
coordinates of 3'UTRs from Ensembl via biomaRt.
rm(list = ls())
library(biomaRt)
# read in probes called present on the Affymetrix array (CPH in this script)
present <- read.table('cph_present_probes.txt', header = FALSE, sep = '\t')
# present is a set of transcript IDs
present <- as.character(present[, 1])
# get a DB connection to retrieve the required info
ensmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# get 3'UTR coordinates
utr_coords <- getBM(attributes = c('ensembl_gene_id',
                                   'sequence_3utr_start', 'sequence_3utr_end'),
                    filters = 'ensembl_transcript_id', values = present,
                    mart = ensmart)
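If the goal is coordinates rather than sequence, one option worth trying is to request exon-level "structure" attributes instead of the sequence-page ones. The sketch below assumes the dataset exposes attributes named `3_utr_start` and `3_utr_end` (the names current Ensembl BioMart uses; confirm with `listAttributes(ensmart)` for the release you query). It also requests `ensembl_transcript_id` so each row can be matched back to the input IDs. The helper and variable names are hypothetical.

```r
# Hypothetical helper: fetch exon-level 3'UTR coordinates for a set of
# Ensembl transcript IDs. Attribute names are assumptions; verify them
# with biomaRt::listAttributes(mart) for the Ensembl release you query.
utr3_attributes <- c("ensembl_transcript_id", "3_utr_start", "3_utr_end")

fetch_utr3_coords <- function(transcript_ids, mart) {
  # biomaRt is referenced with :: so the package is only needed when the
  # function is actually called (the call itself needs network access).
  biomaRt::getBM(attributes = utr3_attributes,
                 filters = "ensembl_transcript_id",
                 values = transcript_ids,
                 mart = mart)
}

# Usage with the objects from the script above (not run here):
# utr_coords <- fetch_utr3_coords(present, ensmart)
```

Because these attributes are exon-level, a transcript whose 3'UTR spans several exons would be expected to appear on several rows, and exons without any 3'UTR typically come back with missing coordinates.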
Running the script gives the following error.
  V1
1 Query ERROR: caught BioMart::Exception::Usage: Attribute 3utr_start NOT FOUND
Error in getBM(attributes = c("ensembl_gene_id", "sequence_3utr_start", :
  Number of columns in the query result doesn't equal number of attributes in query. This is probably an internal error, please report.
Presumably some transcripts have more than one 3'UTR (hence the
column-count mismatch reported above).
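One way to test that guess, once a query does return rows, is to count how often each transcript ID occurs in the result. A minimal sketch, using a hypothetical helper name and toy IDs:

```r
# Hypothetical helper: return the IDs that occur on more than one row,
# e.g. transcripts whose 3'UTR is reported as several pieces.
multi_row_ids <- function(ids) {
  counts <- table(ids)
  names(counts)[counts > 1]
}

# Usage (toy IDs for illustration only):
multi_row_ids(c("T1", "T1", "T2"))
# [1] "T1"
```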
Can anyone suggest a solution? Either a way to retrieve the start and
end coordinates of the 3'UTRs or their lengths (my real objective)
would do.
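If exon-level start/end coordinates can be retrieved, the length follows by summing the pieces per transcript. A minimal sketch, assuming inclusive coordinates and three vectors such as the columns of a result requested as `ensembl_transcript_id`, `3_utr_start`, `3_utr_end` (those attribute names are an assumption); the helper name and the toy values are hypothetical:

```r
# Hypothetical helper: total 3'UTR length per transcript from exon-level
# coordinates (inclusive start/end).
utr3_lengths <- function(ids, starts, ends) {
  stopifnot(length(ids) == length(starts), length(ids) == length(ends))
  # Rows for exons that carry no 3'UTR have missing coordinates; drop them.
  keep <- !is.na(starts) & !is.na(ends)
  # Coordinates are inclusive, so each piece spans end - start + 1 bases.
  piece_len <- ends[keep] - starts[keep] + 1
  # Sum the pieces of a 3'UTR that is split across several exons.
  vapply(split(piece_len, ids[keep]), sum, numeric(1))
}

# Usage with toy coordinates (illustrative only):
utr3_lengths(ids    = c("T1", "T1", "T1", "T2"),
             starts = c(100, NA, 500, 10),
             ends   = c(149, NA, 509, 19))
# T1 = 60, T2 = 10
```

Summing per transcript sidesteps the "more than one row per transcript" issue instead of treating it as an error.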
I have a separate script that downloads the 3'UTR sequences and then
counts the characters, but the datasets are large and that process
seems laborious if the information is directly available.
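For comparison, the character-counting step itself can be a single vectorised `nchar()` call over all sequences at once. This sketch assumes a character vector of 3'UTR sequences (for example one column of a biomaRt `getSequence()` result) and assumes that missing sequences carry the placeholder text `Sequence unavailable`; the helper name and that placeholder are assumptions to check against your own output.

```r
# Hypothetical helper: 3'UTR lengths from already-downloaded sequences.
utr3_lengths_from_seq <- function(seqs, placeholder = "Sequence unavailable") {
  len <- nchar(seqs)               # vectorised over the whole vector
  len[seqs == placeholder] <- NA   # placeholder text is not a sequence
  len
}

# Usage with toy sequences (illustrative only):
utr3_lengths_from_seq(c("ACGTACGT", "Sequence unavailable", "AAA"))
# [1]  8 NA  3
```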
Thanks
Iain
> sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-pc-linux-gnu
locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_1.16.0
loaded via a namespace (and not attached):
[1] RCurl_0.91-0 XML_1.95-3