Question

issue retrieving metadata after psetdiff function

0

Entering edit mode

andreag1987 • 0

@andreag1987-14482

Last seen 7.0 years ago

Hello,

I am trying to fetch promoter regions that don't overlap with cds. This is what my code looks like:

phy<-makeTxDbFromGRanges(phy)
GR <- cds(phy,columns=c("GENEID","TXID","CDSNAME","CDSID"))

indexFa(file.choose())
#select the fasta file
fasta<-file.choose()
phy.fasta<-FaFile(fasta, index=sprintf("%s.fai", fasta))

#selecting the promoter regions
p<-promoters(phy,300,0,columns=c("GENEID","TXID","CDSNAME","CDSID"))
names(p)<-paste(p$GENEID)

#retrieving those sequences that don't fall into cds

hits <- findOverlaps(p,GR)
overlapl <- extractList(GR, as(hits, "List"))
promoter_seqs<-psetdiff(p, overlapl)
mcols(promoter_seqs)<-mcols(p)
promoter_seqs<-unlist(promoter_seqs)

#extracting sequences from the fasta file
promoter_seqs_fa <- getSeq(phy.fasta,promoter_seqs)

Initially I used subsetByOverlaps:

promoter_seqs<-subsetByOverlaps(p,GR,minoverlap=284,invert = TRUE)

but than I wanted to keep the sequences with less than 300bp without having overlaps, so I find this way:

hits <- findOverlaps(p,GR)
overlapl <- extractList(GR, as(hits, "List"))
promoter_seqs<-psetdiff(p, overlapl)
mcols(promoter_seqs)<-mcols(p)
promoter_seqs<-unlist(promoter_seqs)

The problem is that after calling psetdiff() and unlist (), I am not able to retrieve any metadata column using mcols. I use the metadata columns to map each fragment with the correspondent gene.

Using:

mcols(promoter_seqs)<-mcols(p)

after unlisting doesn't work because promoter_seqs and p will have different lenghts.

Thanks to everyone who can help me.

grangeslist granges • 1.3k views

ADD COMMENT • link 7.0 years ago andreag1987 • 0

score 0 · Answer 1 · 2018-01-24

By the way I found a solution, I edited the psetdiff() methods adding the following line:

mcols(ansGRanges)<-rep(mcols(x),elementNROWS(ansRanges))

Here the code with the added line:

setMethod("psetdiff", c("GRanges", "GRangesList"),
          function(x, y, ignore.strand=FALSE)
          {
            ansSeqinfo <- merge(seqinfo(x), seqinfo(y))
            if (length(x) != length(y))
              stop("'x' and 'y' must have the same length")
            ok <- compatibleSeqnames(rep(seqnames(x), elementNROWS(y)),
                                     seqnames(y@unlistData))
            if (!ignore.strand)
              ok <- ok & compatibleStrand(rep(strand(x), elementNROWS(y)),
                                          strand(y@unlistData))
            if (!all(ok)) {
              ok <-
                new2("CompressedLogicalList", unlistData=as.vector(ok),
                     partitioning=y@partitioning)
              y <- y[ok]
            }
            
            ansRanges <- gaps(ranges(y), start=start(x), end=end(x))
            ansSeqnames <- rep(seqnames(x), elementNROWS(ansRanges))
            ansStrand <- rep(strand(x), elementNROWS(ansRanges))
            
            ansGRanges <-
              GRanges(ansSeqnames, unlist(ansRanges, use.names=TRUE), ansStrand)
            seqinfo(ansGRanges) <- ansSeqinfo
            mcols(ansGRanges)<-rep(mcols(x),elementNROWS(ansRanges))
            relist(ansGRanges, PartitioningByEnd(ansRanges))
            
          }
)

Hope it can be helpful