When counting reads from a bamfile, I manually extracted the reads aligned to a particular location and found that CAGEr had clustered some reads at the location that did not exist in the bamfile.
The commands I am using are:
> My_CAGEset <- new("CAGEset", genomeName = "BSgenome.Hsapiens.UCSC.hg38",
inputFiles = <input_path>, inputFilesType = "bam",
sampleLabels = c(<sample_names>))
> ctss <- getCTSS(My_CAGEset)
If I write ctss to out to a table/file and count the number of tags assigned to a position in this file vs manually counting the number of sequences that are aligned to that position in the bam/sam the numbers do not correspond to each other.
Is there some aspect of CAGEr I am misunderstanding that explains this behaviour?
Hello,
the
getCTSS
function applies several filters on the data, and when using BAM files as input it will also attempt to correct the "G bias" of CAGE, as explained in the function's documentation. If this does not answer your question, can you give me a reproductible test case illustrating your problem ?Have a nice day,
-- Charles
Hi Charles, Many thanks for getting back to me.
When I output the ctss object above as a table I get, for example, the following:
For a small region in chromosome 5 between 150412750-150412758
If I use genomecov as part of Bedtools, which displays different the data differently (piled up reads, not the start point of reads) I get the following:
The output is, of course, very different, but what is striking is the lack of reads in the + direction in the genomecov version compared to that from the ctss CAGEr output. I should note that the bamfiles used for the genomecov function were sorted and filtered for quality prior to use, but with a reasonably lax filtration step.
Finally, if I manually search the bam/samfile for positive sequences in this region I do not get any hits. This is the case if I expand the range to cover any 'pile up' sequences - although they should not be contributing.
If necessary, I can email you the bamfiles themselves?
Yes, please send me a minimal BAM file to my maintainer address.