How to generate custom annotation for rat genome rn6 in ChIPQC
1
0
Entering edit mode
@frederic-cremazy-24161
Last seen 3.7 years ago
Paris

Hi,

I would like to use ChIPQC on rat ChIP-seq data that were generated using rn6 genome version. Despite another post explaining that it's possible to generate custom annotation (https://support.bioconductor.org/p/70893/), I haven't succeeded to do it myself for rn6.

Has anyone be able to do it?

Thanks, Fred

ChIPQC annotation rn6 • 2.3k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 22 hours ago
United States

Is there something wrong with the existing TxDb for rn6?

ADD COMMENT
0
Entering edit mode

I don't think so. I guess the problem is that I don't know how to generate a ChIPQC custom annotation from TxDb.Rnorvegicus.UCSC.rn6.refGene.

Could you tell me how to do that please?

ADD REPLY
0
Entering edit mode

Well it depends on what you want to use as your binding regions. If you want 'the usual' I guess there are two ways to go about it. You could 'cheat the system' (untested!) by doing

library(TxDb.Rnorvegicus.UCSC.rn6.refGene)
TxDb.Rnorvegicus.UCSC.rn4.ensGene <- TxDb.Rnorvegicus.UCSC.rn6.refGene
whatevs <- ChIPQC(samples, annotation = "rn4")

Which should then use the rn6 TxDb package instead of the rn4 version. Or you could just make your own annotation, using the existing getAnnotation function.

myNewGetAnnotation <- function(){
    txdb <- TxDb.Rnorvegicus.UCSC.rn6.refGene
    All5utrs <- reduce(unique(unlist(fiveUTRsByTranscript(txdb))))
    All3utrs <- reduce(unique(unlist(threeUTRsByTranscript(txdb))))
    Allcds <- reduce(unique(unlist(cdsBy(txdb,"tx"))))
    Allintrons <- reduce(unique(unlist(intronsByTranscript(txdb))))
    Alltranscripts <- reduce(unique(transcripts(txdb)))

    posAllTranscripts <- Alltranscripts[strand(Alltranscripts) == "+"]
    posAllTranscripts <- posAllTranscripts[!(start(posAllTranscripts)-20000 < 0)]
    negAllTranscripts <- Alltranscripts[strand(Alltranscripts) == "-"]
    chrLimits <- seqlengths(negAllTranscripts)[as.character(seqnames(negAllTranscripts))]
    negAllTranscripts <- negAllTranscripts[!(end(negAllTranscripts)+20000 > chrLimits)]
    Alltranscripts <- c(posAllTranscripts,negAllTranscripts)
    Promoters500 <-  reduce(flank(Alltranscripts,500))
    Promoters2000to500 <-  reduce(flank(Promoters500,1500))
    LongPromoter20000to2000  <- reduce(flank(Promoters2000to500,18000))
    if(!missing(AllChr) & !is.null(AllChr)){
      All5utrs <- GetGRanges(All5utrs,AllChr=AllChr)
      All3utrs <- GetGRanges(All3utrs,AllChr=AllChr)
      Allcds <- GetGRanges(Allcds,AllChr=AllChr)
      Allintrons <- GetGRanges(Allintrons,AllChr=AllChr)
      Alltranscripts <- GetGRanges(Alltranscripts,AllChr=AllChr)
      Promoters500 <- GetGRanges(Promoters500,AllChr=AllChr)
      Promoters2000to500 <-  GetGRanges(Promoters2000to500,AllChr=AllChr)
      LongPromoter20000to2000  <- GetGRanges(LongPromoter20000to2000,AllChr=AllChr)
    }
  }
  return(list(version=GeneAnnotation,LongPromoter20000to2000=LongPromoter20000to2000,
              Promoters2000to500=Promoters2000to500,Promoters500=Promoters500,
              All5utrs=All5utrs,Alltranscripts=Alltranscripts,Allcds=Allcds,
              Allintrons=Allintrons,All3utrs=All3utrs))
}

## after loading the rn6 TxDb, do
annot <- myNewGetAnnotation()
whatevs <- ChIPQC(samples, annotation = annot)

There may be a simpler, obvious way that I am missing, and perhaps Rory Stark or Tom Carroll will be along in a bit to clear that up. But I don't see a simple way to use a TxDb other than the choices they have hard-coded, so unfortunately it might take extra measures like this.

ADD REPLY
0
Entering edit mode

Hi James,

Sorry I was away for a while.

But unfortunately the code is not working in my hands. Can you still help to find what is wrong?

Could return(list(version=GeneAnnotation be wrong here (you spoke about the existing getAnnotation yourself).

I also got Error in missing(AllChr) : 'missing' peut seulement être utilisé pour des arguments eventually.

Well, as you can see, I'm pretty lost here! Thanks!

ADD REPLY
0
Entering edit mode

Ah. My bad. Try

myNewGetAnnotation <- function(GeneAnnotation = "rn6", AllChr){
    txdb <- TxDb.Rnorvegicus.UCSC.rn6.refGene
    All5utrs <- reduce(unique(unlist(fiveUTRsByTranscript(txdb))))
    All3utrs <- reduce(unique(unlist(threeUTRsByTranscript(txdb))))
    Allcds <- reduce(unique(unlist(cdsBy(txdb,"tx"))))
    Allintrons <- reduce(unique(unlist(intronsByTranscript(txdb))))
    Alltranscripts <- reduce(unique(transcripts(txdb)))

    posAllTranscripts <- Alltranscripts[strand(Alltranscripts) == "+"]
    posAllTranscripts <- posAllTranscripts[!(start(posAllTranscripts)-20000 < 0)]
    negAllTranscripts <- Alltranscripts[strand(Alltranscripts) == "-"]
    chrLimits <- seqlengths(negAllTranscripts)[as.character(seqnames(negAllTranscripts))]
    negAllTranscripts <- negAllTranscripts[!(end(negAllTranscripts)+20000 > chrLimits)]
    Alltranscripts <- c(posAllTranscripts,negAllTranscripts)
    Promoters500 <-  reduce(flank(Alltranscripts,500))
    Promoters2000to500 <-  reduce(flank(Promoters500,1500))
    LongPromoter20000to2000  <- reduce(flank(Promoters2000to500,18000))
    if(!missing(AllChr) & !is.null(AllChr)){
      All5utrs <- ChIPQC:::GetGRanges(All5utrs,AllChr=AllChr)
      All3utrs <- ChIPQC:::GetGRanges(All3utrs,AllChr=AllChr)
      Allcds <- ChIPQC:::GetGRanges(Allcds,AllChr=AllChr)
      Allintrons <- ChIPQC:::GetGRanges(Allintrons,AllChr=AllChr)
      Alltranscripts <- ChIPQC:::GetGRanges(Alltranscripts,AllChr=AllChr)
      Promoters500 <- ChIPQC:::GetGRanges(Promoters500,AllChr=AllChr)
      Promoters2000to500 <-  ChIPQC:::GetGRanges(Promoters2000to500,AllChr=AllChr)
      LongPromoter20000to2000  <- ChIPQC:::GetGRanges(LongPromoter20000to2000,AllChr=AllChr)
    }
  return(list(version=GeneAnnotation,LongPromoter20000to2000=LongPromoter20000to2000,
              Promoters2000to500=Promoters2000to500,Promoters500=Promoters500,
              All5utrs=All5utrs,Alltranscripts=Alltranscripts,Allcds=Allcds,
              Allintrons=Allintrons,All3utrs=All3utrs))
}

> zz <- myNewGetAnnotation(AllChr = "chr1")
> zz
$version
[1] "rn6"

$LongPromoter20000to2000
GRanges object with 1950 ranges and 0 metadata columns:
         seqnames              ranges strand
            <Rle>           <IRanges>  <Rle>
     [1]     chr1       376700-394699      +
     [2]     chr1       679413-697412      +
     [3]     chr1     1682696-1700695      +
     [4]     chr1     1751721-1782077      +
     [5]     chr1     1806170-1824169      +
     ...      ...                 ...    ...
  [1946]     chr1 281378835-281396834      -
  [1947]     chr1 281758160-281776159      -
  [1948]     chr1 281876512-281894511      -
  [1949]     chr1 282172018-282190017      -
  [1950]     chr1 282237892-282271193      -
  -------
  seqinfo: 1 sequence from rn6 genome

$Promoters2000to500
GRanges object with 2344 ranges and 0 metadata columns:
         seqnames              ranges strand
            <Rle>           <IRanges>  <Rle>
     [1]     chr1       394700-396199      +
     [2]     chr1       697413-698912      +
     [3]     chr1     1700696-1702195      +
     [4]     chr1     1769721-1771220      +
     [5]     chr1     1782078-1783577      +
     ...      ...                 ...    ...
  [2340]     chr1 281756660-281758159      -
  [2341]     chr1 281875012-281876511      -
  [2342]     chr1 282170518-282172017      -
  [2343]     chr1 282236392-282237891      -
  [2344]     chr1 282251694-282253193      -
  -------
  seqinfo: 1 sequence from rn6 genome
<snip>
ADD REPLY
0
Entering edit mode

It works great. Thanks a lot for your help James!

ADD REPLY
0
Entering edit mode

Thank you for discussing this. I hope I can get help with my question. I found the above function useful for constructing a custom annotation file to proceed with ChIPQC package and downstream analysis. However, I couldn't understand why to provide semicolumn for rn6 in ChIPQC and custom function.

I am working on Rice genome (https://rapdb.dna.affrc.go.jp/). A custom txdb file is constructed using "makeTxDbFromGFF" function. We proceeded with the resultant txdb file to make an annotation file for ChIPQC package by following the script written above.

Here is the code for preparing the txdb.

Rice_txdb <- GenomicFeatures::makeTxDbFromGFF("../Oryzasativa/RAP/Sept2023/IRGSP-1.0_transcript.gff3", 
                                         format="gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK



Rice_txdb

TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: /Oryzasativa/RAP/Sept2023/IRGSP-1.0_transcript.gff3
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 44980
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2023-12-26 22:53:20 +0500 (Tue, 26 Dec 2023)
# GenomicFeatures version at creation time: 1.52.2
# RSQLite version at creation time: 2.3.4
# DBSCHEMAVERSION: 1.2

Code given above

myNewGetAnnotation <- function(GeneAnnotation = "Rice_txdb", AllChr){
  txdb <- Rice_txdb
  All5utrs <- reduce(unique(unlist(fiveUTRsByTranscript(txdb))))
  All3utrs <- reduce(unique(unlist(threeUTRsByTranscript(txdb))))
  Allcds <- reduce(unique(unlist(cdsBy(txdb,"tx"))))
  Allintrons <- reduce(unique(unlist(intronsByTranscript(txdb))))
  Alltranscripts <- reduce(unique(transcripts(txdb)))

  posAllTranscripts <- Alltranscripts[strand(Alltranscripts) == "+"]
  posAllTranscripts <- posAllTranscripts[!(start(posAllTranscripts)-20000 < 0)]
  negAllTranscripts <- Alltranscripts[strand(Alltranscripts) == "-"]
  chrLimits <- seqlengths(negAllTranscripts)[as.character(seqnames(negAllTranscripts))]
  negAllTranscripts <- negAllTranscripts[!(end(negAllTranscripts)+20000 > chrLimits)]
  Alltranscripts <- c(posAllTranscripts,negAllTranscripts)
  Promoters500 <-  reduce(flank(Alltranscripts,500))
  Promoters2000to500 <-  reduce(flank(Promoters500,1500))
  LongPromoter20000to2000  <- reduce(flank(Promoters2000to500,18000))
  if(!missing(AllChr) & !is.null(AllChr)){
    All5utrs <- ChIPQC:::GetGRanges(All5utrs,AllChr=AllChr)
    All3utrs <- ChIPQC:::GetGRanges(All3utrs,AllChr=AllChr)
    Allcds <- ChIPQC:::GetGRanges(Allcds,AllChr=AllChr)
    Allintrons <- ChIPQC:::GetGRanges(Allintrons,AllChr=AllChr)
    Alltranscripts <- ChIPQC:::GetGRanges(Alltranscripts,AllChr=AllChr)
    Promoters500 <- ChIPQC:::GetGRanges(Promoters500,AllChr=AllChr)
    Promoters2000to500 <-  ChIPQC:::GetGRanges(Promoters2000to500,AllChr=AllChr)
    LongPromoter20000to2000  <- ChIPQC:::GetGRanges(LongPromoter20000to2000,AllChr=AllChr)
  }
  return(list(version=GeneAnnotation,LongPromoter20000to2000=LongPromoter20000to2000,
              Promoters2000to500=Promoters2000to500,Promoters500=Promoters500,
              All5utrs=All5utrs,Alltranscripts=Alltranscripts,Allcds=Allcds,
              Allintrons=Allintrons,All3utrs=All3utrs))
}

zz <- myNewGetAnnotation(AllChr = c("chr01", "chr02", "chr03", "chr04", "chr05", "chr06",
                                    "chr07", "chr08", "chr09", "chr10", "chr11", "chr12"))
Error: logical subscript contains NAs.

It ended up with the error displayed above.

Let me know if I am missing any basics in understanding the code.

Thank you

ADD REPLY

Login before adding your answer.

Traffic: 679 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6