Question

GRange and bam files

0

Entering edit mode

Merlin ▴ 10

@merlin-15723

Last seen 4.9 years ago

Vancouver

Hello,

I'm trying to do a differential expression analysis with Rstudio.

I know that GRanges represents a collection of genomic ranges that each have a single start and end location on the genome but I don't know how to set it up.

Assuming that I'm interested of a certain region of the chromosome to analyze the expression of a particular gene,
Where I can find the region that I'm looking for and how to set up the code?

Here is an example:

> gr <- GRanges(
seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),score = 1:10,GC = seq(1, 0, length=10))
> gr
       <Rle> <IRanges>  <Rle> | <integer>
GRanges object with 10 ranges and 2 metadata columns:
    seqnames    ranges strand |     score
a  chr1
b  chr2
c  chr2
. ...
h  chr3
i  chr3
j  chr3
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths

Thank you

bioconductor Grange • 1.2k views

ADD COMMENT • link 6.6 years ago Merlin ▴ 10

score 1 · Answer 1 · 2018-05-10

Generally speaking, you get that sort of information from either a TxDb package, or maybe a GTF or GFF file. You don't say what species you are working with, so here's an example for human. Say you are working with human data, and you want to use the GRCh38 genome. And further assume you want to use UCSC's knownGene table to define what is and isn't a gene.

library(BiocInstaller)
biocLite("TxDb.Hsapiens.UCSC.hg38.knownGene")
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
ex <- exonsBy(TxDb.Hsapiens.UCSC.hg38.knownGene, by = "gene")

And you will now have a GRangesList that defines the exonic locations for all the genes in that table. The names for the list will be the Entrez Gene ID, so if you just care about one or a set of those genes, you could use the org.Hs.eg.db package to get those IDs and subset.

library(org.Hs.eg.db)
genesILike <- c("BRCA1","BRCA2")
ids <- mapIds(org.Hs.eg.db, genesILike, "ENTREZID","SYMBOL")
ex.subset <- ex[[ids]]

You can also create a GRangesList based on the transcript, or the gene, etc. See the vignettes for GenomicRanges and AnnotationDbi for more information.

score 0 · Answer 2 · 2018-05-10

0