Entering edit mode
jhanks1981
•
0
@jhanks1981-14273
Last seen 7.2 years ago
I have already read a few pages such as this one subset GRanges
library(GenomicRanges) library(GenomicFeatures) #make the list of chromosomes to filter by #make the database and transcripts txdb <- makeTxDbFromGFF("Me/inputs/gencode.v19.chr_patch_hapl_scaff.annotation.gtf", format="gtf") tscripts <- transcriptsBy(txdb, by="gene")
> #test the subsetting.. seems to work > tscripts@unlistData@seqnames@values[tscripts@unlistData@seqnames@values == "chr1"] [1] chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 [34] chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 [67] chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 [100] chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1
But it doesnt work as expected..
> tscripts[tscripts@unlistData@seqnames@values == "chr1"] GRangesList object of length 5272: $ENSG00000000419.8 GRanges object with 7 ranges and 2 metadata columns: seqnames ranges strand | tx_id tx_name <Rle> <IRanges> <Rle> | <integer> <character> [1] chr20 [49551404, 49575087] - | 182563 ENST00000371588.5 [2] chr20 [49551404, 49575087] - | 182564 ENST00000466152.1 [3] chr20 [49551404, 49575092] - | 182565 ENST00000371582.4 [4] chr20 [49551433, 49562398] - | 182566 ENST00000494752.1 [5] chr20 [49551482, 49575058] - | 182567 ENST00000371584.4 [6] chr20 [49551490, 49575069] - | 182568 ENST00000371583.5 [7] chr20 [49552685, 49575069] - | 182569 ENST00000413082.1
I also tried this, but got empty elements
> tscripts[seqnames(tscripts) == "chr1"] GRangesList object of length 63568: $ENSG00000000003.10 GRanges object with 0 ranges and 2 metadata columns: seqnames ranges strand | tx_id tx_name <Rle> <IRanges> <Rle> | <integer> <character> $ENSG00000000005.5 GRanges object with 0 ranges and 2 metadata columns: seqnames ranges strand | tx_id tx_name $ENSG00000000419.8 GRanges object with 0 ranges and 2 metadata columns: seqnames ranges strand | tx_id tx_name
What am I doing wrong here?
I am trying to compare two GTF files.
First, I want to subset it so that I'm only including Chr1-22+X+Y from each GTF file.
Second, I want to see what features are not common to both objects
Third, I want to take each feature and compare the start and end locations between each file.
Finally, I want to make a list of every feature that does not exactly match and find out if the length is the same, or if the start/end is just shifted.
But as you can see, I'm stuck on the subsetting part, which isn't working as I expected it to.
It sounds like you might just want to get the transcripts as a GRanges directly, i.e., call
transcripts()
instead oftranscriptsBy()
.