Hi,
I am using Rsamtools to generate pileup from a list of positions. Rsamtools doesn't give the reference base in the output so I am trying to import my reference fasta and retrieve the reference bases from it. Here is my code:
positions_file <- read.delim('positions.txt',header=F) head(positions_file) V1 V2 10 1156771 10 37484026 10 78483209 10 82960189 10 9551751 11 19256468 fa <- FaFile(file='gr37.fasta') idx <- scanFaIndex(fa) refbase <- getSeq(fa,GRanges(positions_file$V1,IRanges(start=as.numeric(positions_file$V2),end=as.numeric(positions_file$V2)))) head(refbase) A DNAStringSet instance of length 185 width seq names [1] 1 C 10 [2] 1 C 10 [3] 1 T 10 [4] 1 A 10 [5] 1 G 10 ... ... ... [181] 1 T 3 [182] 1 A 3 [183] 1 A 3 [184] 1 A 3 [185] 1 C 4 class(refbase) [1] "DNAStringSet" attr(,"package") [1] "Biostrings" REF <- as.data.frame(refbase)$x # right now I am doing something like this to extract sequences
I can retrieve the width and names using width(refbase) and names(refbase) but I am unable to retrieve the sequences using a single function. I can retrieve it by converting it into a dataframe and extracting that column. Just wanted to know if there is an inbuilt function for that.