Hi everybody,
I am fairly new to NGS analysis, so this might be an easy question to answer, but I�ve been desperately trying to get it to work with no success for several days now.
I have three ChIP-seq datasets on which I am trying to do metagenomic analysis. Basically, I would like to average and plot the patterns of methylation marks around given features, in my case transcription start sites (TSS). For now, I have done the following steps:
- Read in my bam files (that correspond to peaks) into variables called: aligns1, 2, 3
- Reformatted these into coverage objects called: cov1, 2, 3
- Gotten the TSS coordinates using biomaRt and saved these in an variable called: martReply
This is were I get stuck. I have tried to follow the tutorial found on the pdf "EMBL course on Short Read analysis with Bioconductor: An exercice with coverage vectors� (23 June 2009), but there seems like some functions are deprecated since it was written. I get the following error:
Error in as.vector(subseq(coverage, winCentres[i] + left, winCentres[i] + : error in evaluating the argument 'x' in selecting a method for function 'as.vector': Error in (function (classes, fdef, mtable) : unable to find an inherited method for function �subseq� for signature �"SimpleRleList"�
I would appreciate any help with this.
Thanks a lot in advance,
Patrick
# Read in alignments from bam files aligns1 <- readGAlignmentsFromBam(samplespath[1]) aligns2 <- readGAlignmentsFromBam(samplespath[2]) aligns3 <- readGAlignmentsFromBam(samplespath[3]) # Convert to coverage cov1 <- coverage(aligns1) cov2 <- coverage(aligns2) cov3 <- coverage(aligns3) # Get TSS from Biomart library("biomaRt") library("doBy") mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl") martReply <- getBM( attributes=c( "chromosome_name", "transcript_start", "transcript_end", "strand" ), mart=mart ) head(martReply) tss <- ifelse(martReply$strand == 1, martReply$transcript_start, martReply$transcript_end) deltaConvolve <- function(coverage, winCentres, revStrand,left, right){ result = rep( 0, right - left + 1 ) for(i in seq(along=winCentres)){ range = IRanges(winCentres[i]+left, winCentres[i]+right) v = as.vector(Views(coverage, range)) if(revStrand[i]) v <- rev(v) result <- result+v } return(result) } win = c(-500, 500) H3K4me3 <- deltaConvolve(coverage=cov1, winCentres=tss, martReply$strand == -1, left=win[1], right=win[2])
Have you looked at the ChIPpeakAnno or ChIPseeker packages? A good starting point is to go to their landing pages and to read their description (generally a short summary of their functionalities). If that sounds promising, then keep going and have a look at their vignettes. You might find that they provide tools for doing exactly what you are trying to do.
If you really want to stick to the "do it myself approach", it's going to be more work and will require that you learn the basics of many important containers like GRanges, GRangesList, TranscriptDb (renamed TxDb in BioC 3.0), RleList, RleViewsList, etc... and methods to operate on these objects. It might be worth it. If you run into some problems or need help with a specific piece of code, please ask and try to be as specific as you can (and show the exact code you're having problem with + the output you get). Note that the use of RangedData objects is now discouraged. We recommend that people use GRanges objects instead if they can. This is one of the many things that have changed in the last 5 years. However, some popular packages dealing with ChIPseq data still use RangedData objects so it's OK in that context.
Hope this helps,
H.