issue retrieving metadata after psetdiff function
1
0
Entering edit mode
@andreag1987-14482
Last seen 7.0 years ago

Hello,

I am trying to fetch promoter regions that don't overlap with cds. This is what my code looks like:

phy<-makeTxDbFromGRanges(phy)
GR <- cds(phy,columns=c("GENEID","TXID","CDSNAME","CDSID"))

indexFa(file.choose())
#select the fasta file
fasta<-file.choose()
phy.fasta<-FaFile(fasta, index=sprintf("%s.fai", fasta))

#selecting the promoter regions
p<-promoters(phy,300,0,columns=c("GENEID","TXID","CDSNAME","CDSID"))
names(p)<-paste(p$GENEID)

#retrieving those sequences that don't fall into cds

hits <- findOverlaps(p,GR)
overlapl <- extractList(GR, as(hits, "List"))
promoter_seqs<-psetdiff(p, overlapl)
mcols(promoter_seqs)<-mcols(p)
promoter_seqs<-unlist(promoter_seqs)

#extracting sequences from the fasta file
promoter_seqs_fa <- getSeq(phy.fasta,promoter_seqs)

Initially I used subsetByOverlaps:

promoter_seqs<-subsetByOverlaps(p,GR,minoverlap=284,invert = TRUE)

but than I wanted to keep the sequences with less than 300bp without having overlaps, so I find this way:

hits <- findOverlaps(p,GR)
overlapl <- extractList(GR, as(hits, "List"))
promoter_seqs<-psetdiff(p, overlapl)
mcols(promoter_seqs)<-mcols(p)
promoter_seqs<-unlist(promoter_seqs)

The problem is that after calling psetdiff() and unlist (), I am not able to retrieve any metadata column using mcols. I use the metadata columns to map each fragment with the correspondent gene.

Using:

mcols(promoter_seqs)<-mcols(p)

after unlisting doesn't work because promoter_seqs and p will have different lenghts.

Thanks to everyone who can help me.

 

grangeslist granges • 1.3k views
ADD COMMENT
0
Entering edit mode
@andreag1987-14482
Last seen 7.0 years ago

By the way I found a solution, I edited the psetdiff() methods adding the following line:

mcols(ansGRanges)<-rep(mcols(x),elementNROWS(ansRanges))

Here the code with the added line:

setMethod("psetdiff", c("GRanges", "GRangesList"),
          function(x, y, ignore.strand=FALSE)
          {
            ansSeqinfo <- merge(seqinfo(x), seqinfo(y))
            if (length(x) != length(y))
              stop("'x' and 'y' must have the same length")
            ok <- compatibleSeqnames(rep(seqnames(x), elementNROWS(y)),
                                     seqnames(y@unlistData))
            if (!ignore.strand)
              ok <- ok & compatibleStrand(rep(strand(x), elementNROWS(y)),
                                          strand(y@unlistData))
            if (!all(ok)) {
              ok <-
                new2("CompressedLogicalList", unlistData=as.vector(ok),
                     partitioning=y@partitioning)
              y <- y[ok]
            }
            
            ansRanges <- gaps(ranges(y), start=start(x), end=end(x))
            ansSeqnames <- rep(seqnames(x), elementNROWS(ansRanges))
            ansStrand <- rep(strand(x), elementNROWS(ansRanges))
            
            ansGRanges <-
              GRanges(ansSeqnames, unlist(ansRanges, use.names=TRUE), ansStrand)
            seqinfo(ansGRanges) <- ansSeqinfo
            mcols(ansGRanges)<-rep(mcols(x),elementNROWS(ansRanges))
            relist(ansGRanges, PartitioningByEnd(ansRanges))
            
          }
)

Hope it can be helpful

ADD COMMENT
0
Entering edit mode

The current behavior is intentional. There is not a 1-1 correspondence between the input and output ranges (which required the use of rep() here). Therefore it does not make sense in general to carry over the metadata.

ADD REPLY

Login before adding your answer.

Traffic: 612 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6