Entering edit mode
Elizabeth Purdom
▴
210
@elizabeth-purdom-2486
Last seen 3.0 years ago
USA/ Berkeley/UC Berkeley
Hello,
I am trying to run code like that in the pdf documentation of BSGenome
to match patterns to all the chromosomes of a genome. I have copied
the
function 'runOneStrandAnalysis' from the documentation and have
changed
runAnalysis2 to be the human genome (BSgenome.Hsapiens.UCSC.hg18) as
follows:
runAnalysis2 <- function(dict0, outfile="")
{
seqnames <- seqnames(Hsapiens)
runOneStrandAnalysis(dict0, Celegans, seqnames, "+",
outfile=outfile,
append=FALSE)
runOneStrandAnalysis(dict0, Celegans, seqnames, "-",
outfile=outfile,
append=TRUE)
}
When I run the code using the patterns in the example, it runs fine
(though of course the patterns in the dictionary were intended for C
elegans, but that's irrelevant):
runAnalysis2(ce2dict0cw15, outfile="ce2dict0cw15_ana2.txt")
Instead I want to find if regions in the genome are repeated. So I
made
a dictionary that is class 'XStringViews' by using views to grab the
regions:
MAPLENGTH<-36
WRITELENGTH<-100
myDictViews<-views(Hsapiens$chr1, 1:WRITELENGTH,
(1:WRITELENGTH)+MAPLENGTH-1)
And I get the following error:
runAnalysis2(myDictViews, outfile="sample.txt")
>>> Finding all hits in strand + of chromosome chr1 ...
Error in .match.CWdna_PDict.exact(pdict, subject, fixed, count.only) :
Biostrings internal error in _init_chrtrtable(): invalid code -1
The problem is in matchPDict:
mypdict<-PDict(myDictViews)
subject <- Hsapiens[["chr1"]]
mindex <- matchPDict(mypdict, subject)
Error in .match.CWdna_PDict.exact(pdict, subject, fixed, count.only) :
Biostrings internal error in _init_chrtrtable(): invalid code -1
I have checked and there are only A,C,G,T values in this region. Is it
because I have a 'XStringViews' object rather than a DNAStringSet
object? If this is a bug (I know this part is still under progress) is
there a quick fix, like converting from 'XStringViews' to
'DNAStringSet'?
Also, if I want to do this on a large scale, is this a reasonable way
to
go or are there tricks for this that would speed it up? I'm still not
sure that I will ultimately do this in R, but just to know.
Thanks,
Elizabeth
UC, Berkeley