Entering edit mode
Hello!
I have some RNASeq data that I am analyzing with edgeR and I would
like to use the cqn package to correct for GC bias. I have aligned
the data to the UCSC hg38 genome.
>From googling I have found the the bedtools 'nuc' command will give
me the GC content with ranges and the length. Providing that I have a
bed file of the hg38 genome.
What I need to make sure of is that I am calculating the length and
the GC content for the correct intervals. I bin my reads using
GenomicFeatures like this (thank you Homer)
# Load Libraries
library(GenomicFeatures)
library(GenomicRanges)
txdb <- makeTranscriptDbFromUCSC(genome='hg38',tablename='refGene')
tr_by_gene <- transcriptsBy(txdb,'gene')
library(Rsamtools)
r_AE89_m <- readGAlignmentsFromBam("sorted_trimmed_AE89_minus.bam")
r_AE89_p <- readGAlignmentsFromBam("sorted_trimmed_AE89_plus.1.bam")
r_AE90_m <- readGAlignmentsFromBam("sorted_trimmed_AE90_minus.bam")
r_AE90_p <- readGAlignmentsFromBam("sorted_trimmed_AE90_plus.bam")
# counts reads overlapping the genes.
c_AE89_m <- countOverlaps(tr_by_gene,r_AE89_m)
c_AE89_p <- countOverlaps(tr_by_gene,r_AE89_p)
c_AE90_m <- countOverlaps(tr_by_gene,r_AE90_m)
c_AE90_p <- countOverlaps(tr_by_gene,r_AE90_p)
# Create a count table
countTable <- data.frame(AE89_minus=c_AE89_m, AE89_1_plus=c_AE89_p,
AE90_minus=c_AE90_m, AE90_plus=c_AE90_p, stringsAsFactors=FALSE)
rownames(countTable)=names(tr_by_gene)
Is there a way to get lengths and gc content from 'tr_by-gene'? Is
there a way to create a bed file from 'tr_by_gene' so that I can then
use bedtools nuc?
Thank you.
Matt