Hello,
I am trying to fetch promoter regions that don't overlap with cds. This is what my code looks like:
phy<-makeTxDbFromGRanges(phy) GR <- cds(phy,columns=c("GENEID","TXID","CDSNAME","CDSID")) indexFa(file.choose()) #select the fasta file fasta<-file.choose() phy.fasta<-FaFile(fasta, index=sprintf("%s.fai", fasta)) #selecting the promoter regions p<-promoters(phy,300,0,columns=c("GENEID","TXID","CDSNAME","CDSID")) names(p)<-paste(p$GENEID) #retrieving those sequences that don't fall into cds hits <- findOverlaps(p,GR) overlapl <- extractList(GR, as(hits, "List")) promoter_seqs<-psetdiff(p, overlapl) mcols(promoter_seqs)<-mcols(p) promoter_seqs<-unlist(promoter_seqs) #extracting sequences from the fasta file promoter_seqs_fa <- getSeq(phy.fasta,promoter_seqs)
Initially I used subsetByOverlaps:
promoter_seqs<-subsetByOverlaps(p,GR,minoverlap=284,invert = TRUE)
but than I wanted to keep the sequences with less than 300bp without having overlaps, so I find this way:
hits <- findOverlaps(p,GR) overlapl <- extractList(GR, as(hits, "List")) promoter_seqs<-psetdiff(p, overlapl) mcols(promoter_seqs)<-mcols(p) promoter_seqs<-unlist(promoter_seqs)
The problem is that after calling psetdiff() and unlist (), I am not able to retrieve any metadata column using mcols. I use the metadata columns to map each fragment with the correspondent gene.
Using:
mcols(promoter_seqs)<-mcols(p)
after unlisting doesn't work because promoter_seqs and p will have different lenghts.
Thanks to everyone who can help me.
The current behavior is intentional. There is not a 1-1 correspondence between the input and output ranges (which required the use of rep() here). Therefore it does not make sense in general to carry over the metadata.