Using summarizeOverlaps for bacterial genomes
1
0
Entering edit mode
as9309 • 0
@as9309-9643
Last seen 8.8 years ago

Hi,

I have a set of bam files for my samples and I just want to align them to the e coli reference genome to get non-normalised unique reads for each gene. I am trying to do this using summarizeOverlaps (as I'm using R and I don't have a mac for featurecounts) and I have looked at the example on bioconductor as follows:

(ebg <- exonsBy(txdb, by="gene"))
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=FALSE,
                        ignore.strand=TRUE,
                        fragments=TRUE )

I don't really understand how my code will differ for the E coli genome compared to the example; what if I don't have introns of exons and just want to align to each gene? What do I write/create for "features="?

I assume I keep the "Union" mode as shown in Fig 1 of the vignette? Are there other optional modes?

As my data is single ended, I assume I also put true and do not put anything for ignore.strand or fragments if singleEnd=TRUE?

So far my code is:

(ebg <- exonsBy(txdb, by="gene"))
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=FALSE)

Is this correct or do I need to change ebg? 

I've searched through the forum and still can't understand, sorry.

Thanks for your help.

 
summarizeoverlaps bacteria e coli unique reads • 1.7k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.8 years ago
United States

Hi,

The GenomicFeatures package has many extractors for isolating regions of interest, e.g., transcriptsBy(), genes(), etc. See the man pages ?genes and ?exonsBy. The genes(txdb) extractor returns the full range of the gene. Using this as 'features' will count reads that fall in any portion of the gene range (i.e., they aren't restricted to introns, exons, etc.). To count at the gene-level but enforce that the reads fall in the transcripts use transcriptsBy(txdb, "gene"). Similarly, exonsBy(txdb, "gene") counts at the gene-level but enforces that the reads fall in the exons, and so on.

The different counting modes for summarizeOverlaps() are all described on the man page ?summarizeOverlaps and in the 'Counting Reads with summarizeOverlaps' vignette, see browseVignettes("GenomicAlignments"). You can also provide your own count function as the 'mode' argument, this is shown in the examples.

If your data are single end, yes, you want singleEnd=TRUE. How you set ignore.strand depends on your data (how processed, what type) and what type of analysis you are planning to do. 

Valerie

ADD COMMENT

Login before adding your answer.

Traffic: 912 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6