Annotation for CNVs
2
0
Entering edit mode
Haiying.Kong ▴ 110
@haiyingkong-9254
Last seen 5.8 years ago
Germany

Is there any Bioconductor tool for annotating CNVs? I could not find any.

CNV annotation • 2.7k views
ADD COMMENT
2
Entering edit mode
@martin-morgan-1513
Last seen 1 day ago
United States

How do you want to 'annotate' your CNVs? Maybe rtracklayer::import() and GenomicRanges::findOverlaps() are sufficient.

ADD COMMENT
0
Entering edit mode

Thank you so much  :)

 

ADD REPLY
0
Entering edit mode

so do you know how to do this now?

ADD REPLY
0
Entering edit mode

The goal is to make a GRanges object that contains your copy number regions, a a GRanges object that contains your annotations.

For the first, maybe you have a plain text file with chromosome, start, end, strand information. You could read that in to R as a data.frame, then use

cnv = GenomicRanges::makeGRangesFromDataFrame()

For the second, the annotations may be available in a 'TxDb' package, or in a gtf (maybe from Ensembl and available through AnnotationHub, maybe from some other source). If from a gtf file, I'd suggest creating a TxDb object

txdb = makeTxDbFromGFF()  # see other makeTxDb

For use of AnnotationHub, see it's vignette. For use of TxDb packages, the flow is to install the package via biocLite() then use it

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg38.knownGene

For either txdb, I would then extract the GRanges corresponding to the features I wish to annotate, e.g.,

genes = genes(txdb)

find the overlaps between the 'query' cnv ranges and the 'subject' genes

olaps = GenomicRanges::findOverlaps(cnv, genes)

Create a 'long form' version of the data

long_annotated = cnv[queryHits(olaps)]
long_annotated$gene_id = genes[subjectHits(olaps)]$gene_id

Alternatively annotate each overlap with the gene id

mcols(olaps)$gene_id = genes$gene_id[subjectHits(olaps)]

and summarize the genes in each cnv

cnv_factor = factor(queryHits(olaps), levels=seq_len(queryHits(olaps)))
cvn$gene_id = IRanges::splitAsList(mcols(olaps)$gene_id, cnv_factor)

A function summarizing these steps is

annotateCnvs <-
    function(cnv, txdb)
{
    stopifnot(is(cnv, "GRanges"), is(txdb, "TxDb"))
    anno = genes(txdb)
    olaps = findOverlaps(cnv, anno)
    mcols(olaps)$gene_id = genes$gene_id[subjectHits(olaps)]
    cnv_factor = factor(queryHits(olaps), levels=seq_len(queryLength(olaps)))
    cnv$gene_id = IRanges::splitAsList(mcols(olaps)$gene_id, cnv_factor)
    cnv
}        

If I had a couple of copy number regions, created 'by hand' rather than using makeGRangesFrom... or rtracklayer::import()

cnv = GRanges(c("chr1:10000000-20000000", "chr20:1-1000000"))

I could get

> annotateCnvs(cnv, txdb)
GRanges object with 2 ranges and 1 metadata column:
      seqnames               ranges strand |                           gene_id
         <Rle>            <IRanges>  <Rle> |                   <CharacterList>
  [1]     chr1 [10000000, 20000000]      * | 100133301,100302276,100379251,...
  [2]    chr20 [       1,  1000000]      * |        100507459,10616,113278,...
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

The gene_ids are Entrez gene identifiers, because that is the form of the annotation in the UCSC knownGene track.

Be careful that the coordinate system of the copy number variants are the same as the coordinate system of the annotation (e.g., 1-based, open intervals).

ADD REPLY
0
Entering edit mode

I have data from control-freec,foramt:chr start end num cnvtype(gain or loss),so how i annotate this ?

 

ADD REPLY
0
Entering edit mode
Haiying.Kong ▴ 110
@haiyingkong-9254
Last seen 5.8 years ago
Germany

Probably this is too simple to make a package, I guess.

I probably just use bedtools to compare the regions I found and gene regions that I downloaded from UCSC.

ADD COMMENT

Login before adding your answer.

Traffic: 552 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6