biomaRt getSequence(): Error in .lock2
Melanie ▴ 10
@3a07eec7
Last seen 3 months ago
United States

I want to gather mRNA sequences by region (e.g. 3utr, 5utr, cds). At a certain point while accessing the sequences, I receive the following error message:

Error in .lock2(dbfile, exclusive = TRUE) : requested an exclusive lock when caller only holds a shared lock

Any suggestions would be greatly appreciated!

getRegionSequence <- function(region, start_iteration_idx){
  ensembl_list <- list()
  sequence_list <- list()
  # Process up to 5001 IDs per call without running past the end of ensembl_ids
  end_idx <- min(start_iteration_idx + 5000, nrow(ensembl_ids))
  for (idx in start_iteration_idx:end_idx){
    ensembl_id <- ensembl_ids[idx,]
    sequence_id <- getSequence(mart=ensembl_GRCh37, seqType=region, type="ensembl_gene_id", id=ensembl_id)
    # Character vector of the sequences returned for this gene
    sequences <- sequence_id[[region]]

    print(idx)

    # Prefer real sequences over the "Sequence unavailable" placeholder
    available <- sequences[!grepl("Sequence unavailable", sequences)]
    if (length(available) == 0){
      # Nothing returned, or only placeholders
      sequence <- "Sequence unavailable"
    } else {
      # Keep the longest available sequence for this gene
      sequence <- available[which.max(nchar(available))]
    }

    len <- length(sequence_list)

    ensembl_list[[len+1]] <- ensembl_id
    sequence_list[[len+1]] <- sequence
  }

  return(list("ensembl_id" = ensembl_list, "sequence" = sequence_list))
}

id_sequences <- getRegionSequence("3utr", 7985)
ensembldb getSequence • 518 views
Mike Smith ★ 6.6k
@mike-smith
Last seen 4 hours ago
EMBL Heidelberg

Thanks for the report. The actual error is coming from the BiocFileCache package, which biomaRt uses to store query results on disk so that repeated queries run faster.

Based on the error message it seems like there might be more than one R session running, and if they're both trying to get access to the BiocFileCache database at the same time this might happen. So the first thing I would check is to make sure there's only one R session.

You could also try running your call to getSequence() with the argument useCache = FALSE, which will disable this caching mechanism. That should bypass the database calls, but you obviously lose the speed benefits if you're running this over and over again.
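A minimal sketch of what that call could look like. The wrapper name get_sequence_uncached is hypothetical; the mart object and IDs are assumed to be the ones from the question.

```r
# Hypothetical wrapper: fetch sequences with biomaRt's on-disk cache disabled.
# `mart` is assumed to be a Mart object such as ensembl_GRCh37 from the question.
get_sequence_uncached <- function(ids, region, mart) {
  biomaRt::getSequence(
    id       = ids,
    type     = "ensembl_gene_id",
    seqType  = region,
    mart     = mart,
    useCache = FALSE  # skip the BiocFileCache database entirely
  )
}

# Usage (needs network access and the biomaRt package):
# sequence_id <- get_sequence_uncached(ensembl_id, "3utr", ensembl_GRCh37)
```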

Finally, I should point out that getSequence() will happily take a vector of Ensembl IDs. So rather than calling it inside this loop, you can just pass the whole of ensembl_ids to the function. There's a chance this will time out if it takes a really long time to get everything from biomaRt, but if it works it should be significantly quicker than fetching each ID one at a time inside the loop.
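One way the vectorised approach might look, as a sketch rather than a definitive implementation. The helper names fetch_in_batches and longest_per_gene and the batch size of 500 are assumptions made for illustration; splitting the IDs into batches is just one possible way to reduce the risk of a timeout. The second helper reproduces the "longest available sequence per gene" selection from the question on the combined result.

```r
# Hypothetical helper: query biomaRt in batches of IDs instead of one ID at a time.
# batch_size = 500 is an arbitrary choice; adjust it if requests time out.
fetch_in_batches <- function(ids, region, mart, batch_size = 500) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- lapply(batches, function(batch) {
    biomaRt::getSequence(id = batch, type = "ensembl_gene_id",
                         seqType = region, mart = mart)
  })
  do.call(rbind, results)
}

# Hypothetical helper: keep the longest available sequence for each gene.
# `seq_df` is assumed to have one column named after the region (e.g. "3utr")
# and one ID column, as returned by getSequence().
longest_per_gene <- function(seq_df, region, id_col = "ensembl_gene_id") {
  # Drop the "Sequence unavailable" placeholder rows
  keep <- !grepl("Sequence unavailable", seq_df[[region]], fixed = TRUE)
  available <- seq_df[keep, , drop = FALSE]
  # Sort by gene, longest sequence first, then keep the first row per gene
  ord <- order(available[[id_col]], -nchar(available[[region]]))
  available <- available[ord, , drop = FALSE]
  available[!duplicated(available[[id_col]]), , drop = FALSE]
}

# Usage (needs network access; assumes ensembl_ids is a one-column data frame):
# all_seqs <- fetch_in_batches(ensembl_ids[[1]], "3utr", ensembl_GRCh37)
# best     <- longest_per_gene(all_seqs, "3utr")
```

Genes for which only placeholder rows come back are absent from the result of longest_per_gene, so they can be identified afterwards with setdiff() against the original ID vector.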
