I'm trying to create a GenomicRanges list with mature miRNA coordinates relative to their hairpin precursor sequence. Essentially each hairpin will be its own chromosome with 1-2 mature miRNAs per "hairpin chromosome". I downloaded the human miRNA gff file from miRbase and did the following:
gff <- import('hsa.gff3') hairpin <- gff[mcols(gff)$type == 'miRNA_primary_transcript',] names(hairpin) <- mcols(hairpin)$ID mature <- gff[mcols(gff)$type == 'miRNA',] names(mature) <- mcols(mature)$Derives_from mature_relative <- shift(mature, -start(hairpin[names(mature),])) names(mature_relative) <- mcols(mature)$ID
This is what I end up with:
GRanges object with 2813 ranges and 8 metadata columns: seqnames ranges strand | source type score phase ID Alias Name Derives_from <Rle> <IRanges> <Rle> | <factor> <factor> <numeric> <integer> <character> <CharacterList> <character> <character> MIMAT0027618 chr1 [40, 62] - | <NA> miRNA <NA> <NA> MIMAT0027618 MIMAT0027618 hsa-miR-6859-5p MI0022705 MIMAT0027619 chr1 [ 0, 22] - | <NA> miRNA <NA> <NA> MIMAT0027619 MIMAT0027619 hsa-miR-6859-3p MI0022705 MIMAT0005890 chr1 [72, 92] + | <NA> miRNA <NA> <NA> MIMAT0005890 MIMAT0005890 hsa-miR-1302 MI0006363 MIMAT0027618_1 chr1 [40, 62] - | <NA> miRNA <NA> <NA> MIMAT0027618_1 MIMAT0027618 hsa-miR-6859-5p MI0026420 MIMAT0027619_1 chr1 [ 0, 22] - | <NA> miRNA <NA> <NA> MIMAT0027619_1 MIMAT0027619 hsa-miR-6859-3p MI0026420
However, I next want to count reads overlapping their mature miRNA using the summarizeOverlaps function. The miRNA reads have been mapped to miRNA hairpins so that when I import the bam file, all of the seqnames are the miRNA hairpin ID (MI00....).
In order to overlap reads, I need to replace the 'chr' seqnames in the mature_relative Genomic Ranges object with the hairpin ID in the Derives_from column. However, when I try to do this I get the following error:
seqnames(mature_relative) <- mcols(mature_relative)$Derives_from Error in `seqnames<-`(`*tmp*`, value = c("MI0022705", "MI0022705", "MI0006363", : levels of supplied 'seqnames' must be identical to 'seqlevels(x)'
Hi Michael,
When I try the second option, it doesn't add the hairpin ID as the seqnames. Could this be because there are multiple mature per hairpin?
No it is because there is a bug. I fixed this in GenomicFeatures 1.22.6, will be out in a couple days. Temporary work around is: