Hello Bioconductors,
I am working on an Arabidopsis RNA seq set, and after executing my
count
reads script, I get a count table with no reads counted (ie
sum(colSums(mat)) = 0).
I mapped my reads using Tophat2 (without supplying a gtf/gff file) and
used
samtools to sort and index my accepted_hits.bam files. I loaded these
bam
files into IGV (integrated Genome Browser) and the reads appear to
have
been mapped (additionally the align_summary.txt says I have good
mapping).
Script and session info below.
Essentially I use Rsamtools to load in my bamfiles, and then
makeTranscriptsFromGFF to make a txdb object, transcriptsBy to get
transcripts, and then summarizeOverlaps to count the reads. My gff
file is
TAIR10. I am relatively sure that the genome I mapped to was the
TAIR10
release, but I am not 100% on that.
I have also used biomaRt to do this (commented out), but I had the
same
results.
I am currently upgrading to R 3.0.2 and re-installing bioconductor
(maybe
it will change something?)
Thanks for any thoughts!
Script:
gffdir <- "/data/smcintur/Annotation/TAIR10_GFF3_genes.gff";
library(ShortRead)
library(GenomicFeatures)
library(Rsamtools)
#library(biomaRt)
setwd("/data/smcintur/RNASeq/NMseq/newTophat/BamFiles/")
bf <- list.files(pattern = ".bam$")
bi <- list.files(pattern = "bam.bai$")
bfl <- BamFileList(bf, bi)
#mart <- makeTranscriptDbFromBiomart(biomart = "Public_TAIRV10",
dataset=
"tairv10")
tx <- transcriptsBy(mart)
gff <- makeTranscriptDbFromGFF(gffdir, format = "gff")
gff
cds <- cdsBy(gff)
tx <- transcriptsBy(gff)
olapTx <- summarizeOverlaps(tx, bfl)
olapCds <- summarizeOverlaps(cds, bfl)
txMat <- assays(olapTx)$counts
cdsMat <- assays(olapCds)$counts
write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
write.table(cdsMat, file = "countMatCdsGff.txt", sep = "\t")
sessionInfo
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.12.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0
[4] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
[7] BSgenome_1.30.0 DBI_0.2-7
GenomicFeatures_1.14.2
[10] GenomicRanges_1.14.3 IRanges_1.20.6 parallel_3.0.0
[13] RCurl_1.95-4.1 Rsamtools_1.14.2 RSQLite_0.11.4
[16] rtracklayer_1.22.0 stats4_3.0.0 tools_3.0.0
[19] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0
--
Sam McInturf
[[alternative HTML version deleted]]
Hi Sam,
Are you getting any error messages related to seqlevels when you
count?
If you have confirmed that annotation seqlevels match the bam files
next
look at the maximum possible overlap in a single bam file.
bf <- singleBamFile
reads <- readGAlignments(bf) ## assuming single-end reads
so <- summarizeOverlaps(tx, reads, mode=Union, inter.feature=FALSE)
'inter.feature=FALSE' with 'mode=Union' is countOverlaps with
'type=ANY'. Reads that hit multiple features will be double counted in
this case. The idea to to see if there are any common regions at all.
If there are still no counts, you could look more closely at a smaller
chromosome region in 'reads' where you would expect counts. This
visual
inspection might give you some insight as to why there is no overlap.
Valerie
On 12/05/13 18:51, Sam McInturf wrote:
> Hello Bioconductors,
> I am working on an Arabidopsis RNA seq set, and after executing
my count
> reads script, I get a count table with no reads counted (ie
> sum(colSums(mat)) = 0).
>
> I mapped my reads using Tophat2 (without supplying a gtf/gff file)
and used
> samtools to sort and index my accepted_hits.bam files. I loaded
these bam
> files into IGV (integrated Genome Browser) and the reads appear to
have
> been mapped (additionally the align_summary.txt says I have good
mapping).
> Script and session info below.
>
> Essentially I use Rsamtools to load in my bamfiles, and then
> makeTranscriptsFromGFF to make a txdb object, transcriptsBy to get
> transcripts, and then summarizeOverlaps to count the reads. My gff
file is
> TAIR10. I am relatively sure that the genome I mapped to was the
TAIR10
> release, but I am not 100% on that.
>
> I have also used biomaRt to do this (commented out), but I had the
same
> results.
>
> I am currently upgrading to R 3.0.2 and re-installing bioconductor
(maybe
> it will change something?)
>
> Thanks for any thoughts!
>
> Script:
>
> gffdir <- "/data/smcintur/Annotation/TAIR10_GFF3_genes.gff";
> library(ShortRead)
> library(GenomicFeatures)
> library(Rsamtools)
> #library(biomaRt)
> setwd("/data/smcintur/RNASeq/NMseq/newTophat/BamFiles/")
> bf <- list.files(pattern = ".bam$")
> bi <- list.files(pattern = "bam.bai$")
> bfl <- BamFileList(bf, bi)
>
> #mart <- makeTranscriptDbFromBiomart(biomart = "Public_TAIRV10",
dataset=
> "tairv10")
> tx <- transcriptsBy(mart)
> gff <- makeTranscriptDbFromGFF(gffdir, format = "gff")
> gff
>
> cds <- cdsBy(gff)
> tx <- transcriptsBy(gff)
>
> olapTx <- summarizeOverlaps(tx, bfl)
> olapCds <- summarizeOverlaps(cds, bfl)
>
> txMat <- assays(olapTx)$counts
> cdsMat <- assays(olapCds)$counts
>
> write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
> write.table(cdsMat, file = "countMatCdsGff.txt", sep = "\t")
>
> sessionInfo
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.24.0 Biobase_2.22.0
BiocGenerics_0.8.0
> [4] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
> [7] BSgenome_1.30.0 DBI_0.2-7
GenomicFeatures_1.14.2
> [10] GenomicRanges_1.14.3 IRanges_1.20.6 parallel_3.0.0
> [13] RCurl_1.95-4.1 Rsamtools_1.14.2 RSQLite_0.11.4
> [16] rtracklayer_1.22.0 stats4_3.0.0 tools_3.0.0
> [19] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0
>
Hi Sam,
It depends on whether you are using single read or paired end reads
files.
As far as I known summarizeOverlaps function only working in case of
single
read. Anyone please correct me if I am wrong.
Kind Regards
On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain
<vobencha@fhcrc.org>wrote:
> Hi Sam,
>
> Are you getting any error messages related to seqlevels when you
count? If
> you have confirmed that annotation seqlevels match the bam files
next look
> at the maximum possible overlap in a single bam file.
>
> bf <- singleBamFile
> reads <- readGAlignments(bf) ## assuming single-end reads
> so <- summarizeOverlaps(tx, reads, mode=Union,
inter.feature=FALSE)
>
> 'inter.feature=FALSE' with 'mode=Union' is countOverlaps with
'type=ANY'.
> Reads that hit multiple features will be double counted in this
case. The
> idea to to see if there are any common regions at all.
>
> If there are still no counts, you could look more closely at a
smaller
> chromosome region in 'reads' where you would expect counts. This
visual
> inspection might give you some insight as to why there is no
overlap.
>
>
> Valerie
>
>
> On 12/05/13 18:51, Sam McInturf wrote:
>
>> Hello Bioconductors,
>> I am working on an Arabidopsis RNA seq set, and after executing
my
>> count
>> reads script, I get a count table with no reads counted (ie
>> sum(colSums(mat)) = 0).
>>
>> I mapped my reads using Tophat2 (without supplying a gtf/gff file)
and
>> used
>> samtools to sort and index my accepted_hits.bam files. I loaded
these bam
>> files into IGV (integrated Genome Browser) and the reads appear to
have
>> been mapped (additionally the align_summary.txt says I have good
mapping).
>> Script and session info below.
>>
>> Essentially I use Rsamtools to load in my bamfiles, and then
>> makeTranscriptsFromGFF to make a txdb object, transcriptsBy to get
>> transcripts, and then summarizeOverlaps to count the reads. My gff
file
>> is
>> TAIR10. I am relatively sure that the genome I mapped to was the
TAIR10
>> release, but I am not 100% on that.
>>
>> I have also used biomaRt to do this (commented out), but I had the
same
>> results.
>>
>> I am currently upgrading to R 3.0.2 and re-installing bioconductor
(maybe
>> it will change something?)
>>
>> Thanks for any thoughts!
>>
>> Script:
>>
>> gffdir <- "/data/smcintur/Annotation/TAIR10_GFF3_genes.gff";
>> library(ShortRead)
>> library(GenomicFeatures)
>> library(Rsamtools)
>> #library(biomaRt)
>> setwd("/data/smcintur/RNASeq/NMseq/newTophat/BamFiles/")
>> bf <- list.files(pattern = ".bam$")
>> bi <- list.files(pattern = "bam.bai$")
>> bfl <- BamFileList(bf, bi)
>>
>> #mart <- makeTranscriptDbFromBiomart(biomart = "Public_TAIRV10",
dataset=
>> "tairv10")
>> tx <- transcriptsBy(mart)
>> gff <- makeTranscriptDbFromGFF(gffdir, format = "gff")
>> gff
>>
>> cds <- cdsBy(gff)
>> tx <- transcriptsBy(gff)
>>
>> olapTx <- summarizeOverlaps(tx, bfl)
>> olapCds <- summarizeOverlaps(cds, bfl)
>>
>> txMat <- assays(olapTx)$counts
>> cdsMat <- assays(olapCds)$counts
>>
>> write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
>> write.table(cdsMat, file = "countMatCdsGff.txt", sep = "\t")
>>
>> sessionInfo
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods
base
>>
>> other attached packages:
>> [1] BiocInstaller_1.12.0
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.24.0 Biobase_2.22.0
BiocGenerics_0.8.0
>> [4] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
>> [7] BSgenome_1.30.0 DBI_0.2-7
GenomicFeatures_1.14.2
>> [10] GenomicRanges_1.14.3 IRanges_1.20.6 parallel_3.0.0
>> [13] RCurl_1.95-4.1 Rsamtools_1.14.2 RSQLite_0.11.4
>> [16] rtracklayer_1.22.0 stats4_3.0.0 tools_3.0.0
>> [19] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0
>>
>>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.
> science.biology.informatics.conductor
>
--
Reema Singh
PhD Scholar
Computational Biology and Bioinformatics
School of Computational and Integrative Sciences
Jawaharlal Nehru University
New Delhi-110067
INDIA
[[alternative HTML version deleted]]
summarizeOverlaps() does count paired-end reads. Use option
singleEnd=FALSE. Details under 'singleEnd' in the 'Arguments' section
on
the man page.
library(GenomicAlignments) ## devel
library(GenomicRanges) ## release
?summarizeOverlaps
Valerie
On 12/08/13 09:42, Reema Singh wrote:
> Hi Sam,
>
> It depends on whether you are using single read or paired end reads
> files. As far as I known summarizeOverlaps function only working in
case
> of single read. Anyone please correct me if I am wrong.
>
> Kind Regards
>
>
> On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">> wrote:
>
> Hi Sam,
>
> Are you getting any error messages related to seqlevels when you
> count? If you have confirmed that annotation seqlevels match the
bam
> files next look at the maximum possible overlap in a single bam
file.
>
> bf <- singleBamFile
> reads <- readGAlignments(bf) ## assuming single-end reads
> so <- summarizeOverlaps(tx, reads, mode=Union,
inter.feature=FALSE)
>
> 'inter.feature=FALSE' with 'mode=Union' is countOverlaps with
> 'type=ANY'. Reads that hit multiple features will be double
counted
> in this case. The idea to to see if there are any common regions
at all.
>
> If there are still no counts, you could look more closely at a
> smaller chromosome region in 'reads' where you would expect
counts.
> This visual inspection might give you some insight as to why
there
> is no overlap.
>
>
> Valerie
>
>
> On 12/05/13 18:51, Sam McInturf wrote:
>
> Hello Bioconductors,
> I am working on an Arabidopsis RNA seq set, and after
> executing my count
> reads script, I get a count table with no reads counted (ie
> sum(colSums(mat)) = 0).
>
> I mapped my reads using Tophat2 (without supplying a gtf/gff
> file) and used
> samtools to sort and index my accepted_hits.bam files. I
loaded
> these bam
> files into IGV (integrated Genome Browser) and the reads
appear
> to have
> been mapped (additionally the align_summary.txt says I have
good
> mapping).
> Script and session info below.
>
> Essentially I use Rsamtools to load in my bamfiles, and then
> makeTranscriptsFromGFF to make a txdb object, transcriptsBy
to get
> transcripts, and then summarizeOverlaps to count the reads.
My
> gff file is
> TAIR10. I am relatively sure that the genome I mapped to
was
> the TAIR10
> release, but I am not 100% on that.
>
> I have also used biomaRt to do this (commented out), but I
had
> the same
> results.
>
> I am currently upgrading to R 3.0.2 and re-installing
> bioconductor (maybe
> it will change something?)
>
> Thanks for any thoughts!
>
> Script:
>
> gffdir <-
"/data/smcintur/Annotation/__TAIR10_GFF3_genes.gff";
> library(ShortRead)
> library(GenomicFeatures)
> library(Rsamtools)
> #library(biomaRt)
> setwd("/data/smcintur/RNASeq/__NMseq/newTophat/BamFiles/")
> bf <- list.files(pattern = ".bam$")
> bi <- list.files(pattern = "bam.bai$")
> bfl <- BamFileList(bf, bi)
>
> #mart <- makeTranscriptDbFromBiomart(__biomart =
> "Public_TAIRV10", dataset=
> "tairv10")
> tx <- transcriptsBy(mart)
> gff <- makeTranscriptDbFromGFF(__gffdir, format = "gff")
> gff
>
> cds <- cdsBy(gff)
> tx <- transcriptsBy(gff)
>
> olapTx <- summarizeOverlaps(tx, bfl)
> olapCds <- summarizeOverlaps(cds, bfl)
>
> txMat <- assays(olapTx)$counts
> cdsMat <- assays(olapCds)$counts
>
> write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
> write.table(cdsMat, file = "countMatCdsGff.txt", sep = "\t")
>
> sessionInfo
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets
methods base
>
> other attached packages:
> [1] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.24.0 Biobase_2.22.0
> BiocGenerics_0.8.0
> [4] biomaRt_2.18.0 Biostrings_2.30.1
bitops_1.0-6
> [7] BSgenome_1.30.0 DBI_0.2-7
> GenomicFeatures_1.14.2
> [10] GenomicRanges_1.14.3 IRanges_1.20.6
parallel_3.0.0
> [13] RCurl_1.95-4.1 Rsamtools_1.14.2
RSQLite_0.11.4
> [16] rtracklayer_1.22.0 stats4_3.0.0
tools_3.0.0
> [19] XML_3.98-1.1 XVector_0.2.0
zlibbioc_1.8.0
>
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org="">
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">
> Search the archives:
>
http://news.gmane.org/gmane.__science.biology.informatics.__conductor
<http: news.gmane.org="" gmane.science.biology.informatics.conductor="">
>
>
>
>
> --
> Reema Singh
> PhD Scholar
> Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> Jawaharlal Nehru University
> New Delhi-110067
> INDIA
Hi all,
Indeed summarizeOverlaps() works on paired end data. But different
counting
modes [Union, Intersection and IntersectionNotEmpty] they do not
handle
paired end reads. Here the reference [
http://www.bioconductor.org/packages/release/bioc/vignettes/GenomicRan
ges/inst/doc/summarizeOverlaps.pdf]-Page
2 [under Counting Modes].
Kind Regards
On Sun, Dec 8, 2013 at 6:31 PM, Valerie Obenchain
<vobencha@fhcrc.org>wrote:
> summarizeOverlaps() does count paired-end reads. Use option
> singleEnd=FALSE. Details under 'singleEnd' in the 'Arguments'
section on
> the man page.
>
> library(GenomicAlignments) ## devel
> library(GenomicRanges) ## release
> ?summarizeOverlaps
>
> Valerie
>
>
> On 12/08/13 09:42, Reema Singh wrote:
>
>> Hi Sam,
>>
>> It depends on whether you are using single read or paired end reads
>> files. As far as I known summarizeOverlaps function only working in
case
>> of single read. Anyone please correct me if I am wrong.
>>
>> Kind Regards
>>
>>
>> On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain
<vobencha@fhcrc.org>> <mailto:vobencha@fhcrc.org>> wrote:
>>
>> Hi Sam,
>>
>> Are you getting any error messages related to seqlevels when
you
>> count? If you have confirmed that annotation seqlevels match
the bam
>> files next look at the maximum possible overlap in a single bam
file.
>>
>> bf <- singleBamFile
>> reads <- readGAlignments(bf) ## assuming single-end reads
>> so <- summarizeOverlaps(tx, reads, mode=Union,
inter.feature=FALSE)
>>
>> 'inter.feature=FALSE' with 'mode=Union' is countOverlaps with
>> 'type=ANY'. Reads that hit multiple features will be double
counted
>> in this case. The idea to to see if there are any common
regions at
>> all.
>>
>> If there are still no counts, you could look more closely at a
>> smaller chromosome region in 'reads' where you would expect
counts.
>> This visual inspection might give you some insight as to why
there
>> is no overlap.
>>
>>
>> Valerie
>>
>>
>> On 12/05/13 18:51, Sam McInturf wrote:
>>
>> Hello Bioconductors,
>> I am working on an Arabidopsis RNA seq set, and after
>> executing my count
>> reads script, I get a count table with no reads counted (ie
>> sum(colSums(mat)) = 0).
>>
>> I mapped my reads using Tophat2 (without supplying a
gtf/gff
>> file) and used
>> samtools to sort and index my accepted_hits.bam files. I
loaded
>> these bam
>> files into IGV (integrated Genome Browser) and the reads
appear
>> to have
>> been mapped (additionally the align_summary.txt says I have
good
>> mapping).
>> Script and session info below.
>>
>> Essentially I use Rsamtools to load in my bamfiles, and
then
>> makeTranscriptsFromGFF to make a txdb object, transcriptsBy
to get
>> transcripts, and then summarizeOverlaps to count the reads.
My
>> gff file is
>> TAIR10. I am relatively sure that the genome I mapped to
was
>> the TAIR10
>> release, but I am not 100% on that.
>>
>> I have also used biomaRt to do this (commented out), but I
had
>> the same
>> results.
>>
>> I am currently upgrading to R 3.0.2 and re-installing
>> bioconductor (maybe
>> it will change something?)
>>
>> Thanks for any thoughts!
>>
>> Script:
>>
>> gffdir <-
"/data/smcintur/Annotation/__TAIR10_GFF3_genes.gff";
>> library(ShortRead)
>> library(GenomicFeatures)
>> library(Rsamtools)
>> #library(biomaRt)
>> setwd("/data/smcintur/RNASeq/__NMseq/newTophat/BamFiles/")
>>
>> bf <- list.files(pattern = ".bam$")
>> bi <- list.files(pattern = "bam.bai$")
>> bfl <- BamFileList(bf, bi)
>>
>> #mart <- makeTranscriptDbFromBiomart(__biomart =
>>
>> "Public_TAIRV10", dataset=
>> "tairv10")
>> tx <- transcriptsBy(mart)
>> gff <- makeTranscriptDbFromGFF(__gffdir, format = "gff")
>>
>> gff
>>
>> cds <- cdsBy(gff)
>> tx <- transcriptsBy(gff)
>>
>> olapTx <- summarizeOverlaps(tx, bfl)
>> olapCds <- summarizeOverlaps(cds, bfl)
>>
>> txMat <- assays(olapTx)$counts
>> cdsMat <- assays(olapCds)$counts
>>
>> write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
>> write.table(cdsMat, file = "countMatCdsGff.txt", sep =
"\t")
>>
>> sessionInfo
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets
methods
>> base
>>
>> other attached packages:
>> [1] BiocInstaller_1.12.0
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.24.0 Biobase_2.22.0
>> BiocGenerics_0.8.0
>> [4] biomaRt_2.18.0 Biostrings_2.30.1
bitops_1.0-6
>> [7] BSgenome_1.30.0 DBI_0.2-7
>> GenomicFeatures_1.14.2
>> [10] GenomicRanges_1.14.3 IRanges_1.20.6
parallel_3.0.0
>> [13] RCurl_1.95-4.1 Rsamtools_1.14.2
RSQLite_0.11.4
>> [16] rtracklayer_1.22.0 stats4_3.0.0
tools_3.0.0
>> [19] XML_3.98-1.1 XVector_0.2.0
zlibbioc_1.8.0
>>
>>
>> _________________________________________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org <mailto:bioconductor@r-project.org>
>> https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>
>> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">
>> Search the archives:
>>
http://news.gmane.org/gmane.__science.biology.informatics.__conductor<
>> http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>
>>
>>
>>
>>
>> --
>> Reema Singh
>> PhD Scholar
>> Computational Biology and Bioinformatics
>> School of Computational and Integrative Sciences
>> Jawaharlal Nehru University
>> New Delhi-110067
>> INDIA
>>
>
>
--
Reema Singh
Postdoctoral Research Assistant
College of Life Sciences
University of Dundee,
Dundee DD1 4HN, Scotland
United Kingdom
[[alternative HTML version deleted]]
Reema,
Thank you for pointing out the error in the summarizeOverlaps
vignette.
That sentence was a leftover from a previous version and should have
been removed. I have updated the vignette in both release
(GenomicRanges
1.14.4) and devel (GenomicAlignments 0.99.8).
When the 'reads' argument is a GAlignmentPairs object the stand-alone
functions of Union(), IntersectionStrict() and IntersectionNotEmpty()
count the data as paired-end. If 'reads' is a GAlignments object they
are counted as single-end. summarizeOverlaps() reads the data into a
GAlignments or GAlignmentPairs object and calls these functions
internally; they are exactly the same.
Valerie
On 12/09/2013 02:46 AM, Reema Singh wrote:
> Hi all,
>
> Indeed summarizeOverlaps() works on paired end data. But different
> counting modes [Union, Intersection and IntersectionNotEmpty] they
do
> not handle paired end reads. Here the reference
> [http://www.bioconductor.org/packages/release/bioc/vignettes/Genomic
Ranges/inst/doc/summarizeOverlaps.pdf]-
> Page 2 [under Counting Modes].
>
> Kind Regards
>
>
>
> On Sun, Dec 8, 2013 at 6:31 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">> wrote:
>
> summarizeOverlaps() does count paired-end reads. Use option
> singleEnd=FALSE. Details under 'singleEnd' in the 'Arguments'
> section on the man page.
>
> library(GenomicAlignments) ## devel
> library(GenomicRanges) ## release
> ?summarizeOverlaps
>
> Valerie
>
>
> On 12/08/13 09:42, Reema Singh wrote:
>
> Hi Sam,
>
> It depends on whether you are using single read or paired
end reads
> files. As far as I known summarizeOverlaps function only
working
> in case
> of single read. Anyone please correct me if I am wrong.
>
> Kind Regards
>
>
> On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain
> <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">
> <mailto:vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">>> wrote:
>
> Hi Sam,
>
> Are you getting any error messages related to seqlevels
> when you
> count? If you have confirmed that annotation seqlevels
> match the bam
> files next look at the maximum possible overlap in a
single
> bam file.
>
> bf <- singleBamFile
> reads <- readGAlignments(bf) ## assuming single-end
reads
> so <- summarizeOverlaps(tx, reads, mode=Union,
> inter.feature=FALSE)
>
> 'inter.feature=FALSE' with 'mode=Union' is
countOverlaps with
> 'type=ANY'. Reads that hit multiple features will be
double
> counted
> in this case. The idea to to see if there are any
common
> regions at all.
>
> If there are still no counts, you could look more
closely at a
> smaller chromosome region in 'reads' where you would
expect
> counts.
> This visual inspection might give you some insight as
to
> why there
> is no overlap.
>
>
> Valerie
>
>
> On 12/05/13 18:51, Sam McInturf wrote:
>
> Hello Bioconductors,
> I am working on an Arabidopsis RNA seq set,
and after
> executing my count
> reads script, I get a count table with no reads
counted (ie
> sum(colSums(mat)) = 0).
>
> I mapped my reads using Tophat2 (without supplying
a
> gtf/gff
> file) and used
> samtools to sort and index my accepted_hits.bam
files.
> I loaded
> these bam
> files into IGV (integrated Genome Browser) and the
> reads appear
> to have
> been mapped (additionally the align_summary.txt
says I
> have good
> mapping).
> Script and session info below.
>
> Essentially I use Rsamtools to load in my bamfiles,
and
> then
> makeTranscriptsFromGFF to make a txdb object,
> transcriptsBy to get
> transcripts, and then summarizeOverlaps to count
the
> reads. My
> gff file is
> TAIR10. I am relatively sure that the genome I
mapped
> to was
> the TAIR10
> release, but I am not 100% on that.
>
> I have also used biomaRt to do this (commented
out),
> but I had
> the same
> results.
>
> I am currently upgrading to R 3.0.2 and re-
installing
> bioconductor (maybe
> it will change something?)
>
> Thanks for any thoughts!
>
> Script:
>
> gffdir <-
> "/data/smcintur/Annotation/____TAIR10_GFF3_genes.gff";
> library(ShortRead)
> library(GenomicFeatures)
> library(Rsamtools)
> #library(biomaRt)
>
> setwd("/data/smcintur/RNASeq/____NMseq/newTophat/BamFiles/")
>
> bf <- list.files(pattern = ".bam$")
> bi <- list.files(pattern = "bam.bai$")
> bfl <- BamFileList(bf, bi)
>
> #mart <- makeTranscriptDbFromBiomart(____biomart =
>
> "Public_TAIRV10", dataset=
> "tairv10")
> tx <- transcriptsBy(mart)
> gff <- makeTranscriptDbFromGFF(____gffdir, format =
"gff")
>
> gff
>
> cds <- cdsBy(gff)
> tx <- transcriptsBy(gff)
>
> olapTx <- summarizeOverlaps(tx, bfl)
> olapCds <- summarizeOverlaps(cds, bfl)
>
> txMat <- assays(olapTx)$counts
> cdsMat <- assays(olapCds)$counts
>
> write.table(txMat, file = "countMatTxGff.txt", sep
= "\t")
> write.table(cdsMat, file = "countMatCdsGff.txt",
sep =
> "\t")
>
> sessionInfo
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils
datasets
> methods base
>
> other attached packages:
> [1] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.24.0 Biobase_2.22.0
> BiocGenerics_0.8.0
> [4] biomaRt_2.18.0 Biostrings_2.30.1
> bitops_1.0-6
> [7] BSgenome_1.30.0 DBI_0.2-7
> GenomicFeatures_1.14.2
> [10] GenomicRanges_1.14.3 IRanges_1.20.6
> parallel_3.0.0
> [13] RCurl_1.95-4.1 Rsamtools_1.14.2
> RSQLite_0.11.4
> [16] rtracklayer_1.22.0 stats4_3.0.0
> tools_3.0.0
> [19] XML_3.98-1.1 XVector_0.2.0
> zlibbioc_1.8.0
>
>
> ___________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org="">
> <mailto:bioconductor at="" r-__project.org=""> <mailto:bioconductor at="" r-project.org="">>
> https://stat.ethz.ch/mailman/____listinfo/bioconductor
> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>
> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">>
> Search the archives:
> http://news.gmane.org/gmane.____science.biology.informatics.
____conductor
> <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor="">
>
<http: news.gmane.org="" gmane.__science.biology.informatics.__conductor="">
<http: news.gmane.org="" gmane.science.biology.informatics.conductor="">>
>
>
>
>
>
> --
> Reema Singh
> PhD Scholar
> Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> Jawaharlal Nehru University
> New Delhi-110067
> INDIA
>
>
>
>
>
> --
> Reema Singh
> Postdoctoral Research Assistant
> College of Life Sciences
> University of Dundee,
> Dundee DD1 4HN, Scotland
> United Kingdom
--
Valerie Obenchain
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319
Valerie and Reema,
I am using single end reads.
I did as Valerie suggested
so <- summarizeOverlaps(tx, reads, mode="Union",type = "any")
colSums(assays(so)$counts
#185,940
which feels silly, allow for double counts and return fewer counts
than
prior
P.S. thank you for your prompt reply, my email "hid" it from me
Best
Sam
On Mon, Dec 9, 2013 at 6:08 PM, Valerie Obenchain
<vobencha@fhcrc.org>wrote:
> Reema,
>
> Thank you for pointing out the error in the summarizeOverlaps
vignette.
> That sentence was a leftover from a previous version and should have
been
> removed. I have updated the vignette in both release (GenomicRanges
1.14.4)
> and devel (GenomicAlignments 0.99.8).
>
> When the 'reads' argument is a GAlignmentPairs object the stand-
alone
> functions of Union(), IntersectionStrict() and
IntersectionNotEmpty() count
> the data as paired-end. If 'reads' is a GAlignments object they are
counted
> as single-end. summarizeOverlaps() reads the data into a GAlignments
or
> GAlignmentPairs object and calls these functions internally; they
are
> exactly the same.
>
> Valerie
>
>
>
> On 12/09/2013 02:46 AM, Reema Singh wrote:
>
>> Hi all,
>>
>> Indeed summarizeOverlaps() works on paired end data. But different
>> counting modes [Union, Intersection and IntersectionNotEmpty] they
do
>> not handle paired end reads. Here the reference
>> [http://www.bioconductor.org/packages/release/bioc/
>> vignettes/GenomicRanges/inst/doc/summarizeOverlaps.pdf]-
>> Page 2 [under Counting Modes].
>>
>> Kind Regards
>>
>>
>>
>> On Sun, Dec 8, 2013 at 6:31 PM, Valerie Obenchain
<vobencha@fhcrc.org>> <mailto:vobencha@fhcrc.org>> wrote:
>>
>> summarizeOverlaps() does count paired-end reads. Use option
>> singleEnd=FALSE. Details under 'singleEnd' in the 'Arguments'
>> section on the man page.
>>
>> library(GenomicAlignments) ## devel
>> library(GenomicRanges) ## release
>> ?summarizeOverlaps
>>
>> Valerie
>>
>>
>> On 12/08/13 09:42, Reema Singh wrote:
>>
>> Hi Sam,
>>
>> It depends on whether you are using single read or paired
end
>> reads
>> files. As far as I known summarizeOverlaps function only
working
>> in case
>> of single read. Anyone please correct me if I am wrong.
>>
>> Kind Regards
>>
>>
>> On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain
>> <vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">
>> <mailto:vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">>>
wrote:
>>
>> Hi Sam,
>>
>> Are you getting any error messages related to
seqlevels
>> when you
>> count? If you have confirmed that annotation seqlevels
>> match the bam
>> files next look at the maximum possible overlap in a
single
>> bam file.
>>
>> bf <- singleBamFile
>> reads <- readGAlignments(bf) ## assuming single-
end reads
>> so <- summarizeOverlaps(tx, reads, mode=Union,
>> inter.feature=FALSE)
>>
>> 'inter.feature=FALSE' with 'mode=Union' is
countOverlaps with
>> 'type=ANY'. Reads that hit multiple features will be
double
>> counted
>> in this case. The idea to to see if there are any
common
>> regions at all.
>>
>> If there are still no counts, you could look more
closely at
>> a
>> smaller chromosome region in 'reads' where you would
expect
>> counts.
>> This visual inspection might give you some insight as
to
>> why there
>> is no overlap.
>>
>>
>> Valerie
>>
>>
>> On 12/05/13 18:51, Sam McInturf wrote:
>>
>> Hello Bioconductors,
>> I am working on an Arabidopsis RNA seq set,
and
>> after
>> executing my count
>> reads script, I get a count table with no reads
counted
>> (ie
>> sum(colSums(mat)) = 0).
>>
>> I mapped my reads using Tophat2 (without supplying
a
>> gtf/gff
>> file) and used
>> samtools to sort and index my accepted_hits.bam
files.
>> I loaded
>> these bam
>> files into IGV (integrated Genome Browser) and the
>> reads appear
>> to have
>> been mapped (additionally the align_summary.txt
says I
>> have good
>> mapping).
>> Script and session info below.
>>
>> Essentially I use Rsamtools to load in my
bamfiles, and
>> then
>> makeTranscriptsFromGFF to make a txdb object,
>> transcriptsBy to get
>> transcripts, and then summarizeOverlaps to count
the
>> reads. My
>> gff file is
>> TAIR10. I am relatively sure that the genome I
mapped
>> to was
>> the TAIR10
>> release, but I am not 100% on that.
>>
>> I have also used biomaRt to do this (commented
out),
>> but I had
>> the same
>> results.
>>
>> I am currently upgrading to R 3.0.2 and re-
installing
>> bioconductor (maybe
>> it will change something?)
>>
>> Thanks for any thoughts!
>>
>> Script:
>>
>> gffdir <-
>> "/data/smcintur/Annotation/____TAIR10_GFF3_genes.gff";
>> library(ShortRead)
>> library(GenomicFeatures)
>> library(Rsamtools)
>> #library(biomaRt)
>>
>>
setwd("/data/smcintur/RNASeq/____NMseq/newTophat/BamFiles/")
>>
>>
>> bf <- list.files(pattern = ".bam$")
>> bi <- list.files(pattern = "bam.bai$")
>> bfl <- BamFileList(bf, bi)
>>
>> #mart <- makeTranscriptDbFromBiomart(____biomart =
>>
>>
>> "Public_TAIRV10", dataset=
>> "tairv10")
>> tx <- transcriptsBy(mart)
>> gff <- makeTranscriptDbFromGFF(____gffdir, format
=
>> "gff")
>>
>>
>> gff
>>
>> cds <- cdsBy(gff)
>> tx <- transcriptsBy(gff)
>>
>> olapTx <- summarizeOverlaps(tx, bfl)
>> olapCds <- summarizeOverlaps(cds, bfl)
>>
>> txMat <- assays(olapTx)$counts
>> cdsMat <- assays(olapCds)$counts
>>
>> write.table(txMat, file = "countMatTxGff.txt", sep
=
>> "\t")
>> write.table(cdsMat, file = "countMatCdsGff.txt",
sep =
>> "\t")
>>
>> sessionInfo
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils
datasets
>> methods base
>>
>> other attached packages:
>> [1] BiocInstaller_1.12.0
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.24.0 Biobase_2.22.0
>> BiocGenerics_0.8.0
>> [4] biomaRt_2.18.0 Biostrings_2.30.1
>> bitops_1.0-6
>> [7] BSgenome_1.30.0 DBI_0.2-7
>> GenomicFeatures_1.14.2
>> [10] GenomicRanges_1.14.3 IRanges_1.20.6
>> parallel_3.0.0
>> [13] RCurl_1.95-4.1 Rsamtools_1.14.2
>> RSQLite_0.11.4
>> [16] rtracklayer_1.22.0 stats4_3.0.0
>> tools_3.0.0
>> [19] XML_3.98-1.1 XVector_0.2.0
>> zlibbioc_1.8.0
>>
>>
>> ___________________________________________________
>>
>> Bioconductor mailing list
>> Bioconductor@r-project.org
<mailto:bioconductor@r-project.org>
>> <mailto:bioconductor@r-__project.org>> <mailto:bioconductor@r-project.org>>
>> https://stat.ethz.ch/mailman/____listinfo/bioconductor
>> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>>
>>
>> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">>
>> Search the archives:
>>
http://news.gmane.org/gmane.____science.biology.informatics.
>> ____conductor
>>
<http: news.gmane.org="" gmane.__science.biology.informatics._="">> _conductor>
>>
>>
<http: news.gmane.org="" gmane.__science.biology.informatics._="">> _conductor
>> <http: news.gmane.org="" gmane.science.biology.informatics.="">> conductor>>
>>
>>
>>
>>
>>
>> --
>> Reema Singh
>> PhD Scholar
>> Computational Biology and Bioinformatics
>> School of Computational and Integrative Sciences
>> Jawaharlal Nehru University
>> New Delhi-110067
>> INDIA
>>
>>
>>
>>
>>
>> --
>> Reema Singh
>> Postdoctoral Research Assistant
>> College of Life Sciences
>> University of Dundee,
>> Dundee DD1 4HN, Scotland
>> United Kingdom
>>
>
>
> --
> Valerie Obenchain
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: vobencha@fhcrc.org
> Phone: (206) 667-3158
> Fax: (206) 667-1319
>
--
Sam McInturf
[[alternative HTML version deleted]]
On 12/21/13 05:14, Sam McInturf wrote:
> Valerie and Reema,
> I am using single end reads.
>
> I did as Valerie suggested
>
> so <- summarizeOverlaps(tx, reads, mode="Union",type = "any")
This is not the example I used below. You did not include
'inter.feature=FALSE'. 'type' is not an argument to
summarizeOverlaps().
In other words, do not use 'type', use 'inter.feature'. Please see my
previous explanation and the man page for summarizeOverlaps().
Valerie
> colSums(assays(so)$counts
> #185,940
>
> which feels silly, allow for double counts and return fewer counts
than
> prior
>
> P.S. thank you for your prompt reply, my email "hid" it from me
> Best
>
> Sam
>
>
>
> On Mon, Dec 9, 2013 at 6:08 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">> wrote:
>
> Reema,
>
> Thank you for pointing out the error in the summarizeOverlaps
> vignette. That sentence was a leftover from a previous version
and
> should have been removed. I have updated the vignette in both
> release (GenomicRanges 1.14.4) and devel (GenomicAlignments
0.99.8).
>
> When the 'reads' argument is a GAlignmentPairs object the
> stand-alone functions of Union(), IntersectionStrict() and
> IntersectionNotEmpty() count the data as paired-end. If 'reads'
is a
> GAlignments object they are counted as single-end.
> summarizeOverlaps() reads the data into a GAlignments or
> GAlignmentPairs object and calls these functions internally;
they
> are exactly the same.
>
> Valerie
>
>
>
> On 12/09/2013 02:46 AM, Reema Singh wrote:
>
> Hi all,
>
> Indeed summarizeOverlaps() works on paired end data. But
different
> counting modes [Union, Intersection and
IntersectionNotEmpty]
> they do
> not handle paired end reads. Here the reference
> [http://www.bioconductor.org/__packages/release/bioc/__vigne
ttes/GenomicRanges/inst/__doc/summarizeOverlaps.pdf]-
> <http: www.bioconductor.org="" packages="" release="" bioc="" vignettes="" genomicranges="" inst="" doc="" summarizeoverlaps.pdf%5d-="">
> Page 2 [under Counting Modes].
>
> Kind Regards
>
>
>
> On Sun, Dec 8, 2013 at 6:31 PM, Valerie Obenchain
> <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">
> <mailto:vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">>> wrote:
>
> summarizeOverlaps() does count paired-end reads. Use
option
> singleEnd=FALSE. Details under 'singleEnd' in the
'Arguments'
> section on the man page.
>
> library(GenomicAlignments) ## devel
> library(GenomicRanges) ## release
> ?summarizeOverlaps
>
> Valerie
>
>
> On 12/08/13 09:42, Reema Singh wrote:
>
> Hi Sam,
>
> It depends on whether you are using single read or
> paired end reads
> files. As far as I known summarizeOverlaps function
> only working
> in case
> of single read. Anyone please correct me if I am
wrong.
>
> Kind Regards
>
>
> On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain
> <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">
> <mailto:vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">>
> <mailto:vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">
> <mailto:vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">>>> wrote:
>
> Hi Sam,
>
> Are you getting any error messages related to
> seqlevels
> when you
> count? If you have confirmed that annotation
seqlevels
> match the bam
> files next look at the maximum possible
overlap in
> a single
> bam file.
>
> bf <- singleBamFile
> reads <- readGAlignments(bf) ## assuming
> single-end reads
> so <- summarizeOverlaps(tx, reads,
mode=Union,
> inter.feature=FALSE)
>
> 'inter.feature=FALSE' with 'mode=Union' is
> countOverlaps with
> 'type=ANY'. Reads that hit multiple features
will
> be double
> counted
> in this case. The idea to to see if there are
any
> common
> regions at all.
>
> If there are still no counts, you could look
more
> closely at a
> smaller chromosome region in 'reads' where you
> would expect
> counts.
> This visual inspection might give you some
insight
> as to
> why there
> is no overlap.
>
>
> Valerie
>
>
> On 12/05/13 18:51, Sam McInturf wrote:
>
> Hello Bioconductors,
> I am working on an Arabidopsis RNA
seq
> set, and after
> executing my count
> reads script, I get a count table with no
> reads counted (ie
> sum(colSums(mat)) = 0).
>
> I mapped my reads using Tophat2 (without
> supplying a
> gtf/gff
> file) and used
> samtools to sort and index my
> accepted_hits.bam files.
> I loaded
> these bam
> files into IGV (integrated Genome Browser)
and the
> reads appear
> to have
> been mapped (additionally the
> align_summary.txt says I
> have good
> mapping).
> Script and session info below.
>
> Essentially I use Rsamtools to load in my
> bamfiles, and
> then
> makeTranscriptsFromGFF to make a txdb
object,
> transcriptsBy to get
> transcripts, and then summarizeOverlaps to
> count the
> reads. My
> gff file is
> TAIR10. I am relatively sure that the
genome
> I mapped
> to was
> the TAIR10
> release, but I am not 100% on that.
>
> I have also used biomaRt to do this
(commented
> out),
> but I had
> the same
> results.
>
> I am currently upgrading to R 3.0.2 and
> re-installing
> bioconductor (maybe
> it will change something?)
>
> Thanks for any thoughts!
>
> Script:
>
> gffdir <-
>
"/data/smcintur/Annotation/______TAIR10_GFF3_genes.gff";
> library(ShortRead)
> library(GenomicFeatures)
> library(Rsamtools)
> #library(biomaRt)
>
>
>
setwd("/data/smcintur/RNASeq/______NMseq/newTophat/BamFiles/")
>
>
> bf <- list.files(pattern = ".bam$")
> bi <- list.files(pattern = "bam.bai$")
> bfl <- BamFileList(bf, bi)
>
> #mart <-
> makeTranscriptDbFromBiomart(______biomart =
>
>
> "Public_TAIRV10", dataset=
> "tairv10")
> tx <- transcriptsBy(mart)
> gff <-
makeTranscriptDbFromGFF(______gffdir,
> format = "gff")
>
>
> gff
>
> cds <- cdsBy(gff)
> tx <- transcriptsBy(gff)
>
> olapTx <- summarizeOverlaps(tx, bfl)
> olapCds <- summarizeOverlaps(cds, bfl)
>
> txMat <- assays(olapTx)$counts
> cdsMat <- assays(olapCds)$counts
>
> write.table(txMat, file =
"countMatTxGff.txt",
> sep = "\t")
> write.table(cdsMat, file =
> "countMatCdsGff.txt", sep =
> "\t")
>
> sessionInfo
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu
(64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8
LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8
> LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8
> LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C
LC_NAME=C
> [9] LC_ADDRESS=C
LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8
> LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils
> datasets
> methods base
>
> other attached packages:
> [1] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.24.0
Biobase_2.22.0
> BiocGenerics_0.8.0
> [4] biomaRt_2.18.0
Biostrings_2.30.1
> bitops_1.0-6
> [7] BSgenome_1.30.0 DBI_0.2-7
> GenomicFeatures_1.14.2
> [10] GenomicRanges_1.14.3 IRanges_1.20.6
> parallel_3.0.0
> [13] RCurl_1.95-4.1
Rsamtools_1.14.2
> RSQLite_0.11.4
> [16] rtracklayer_1.22.0 stats4_3.0.0
> tools_3.0.0
> [19] XML_3.98-1.1 XVector_0.2.0
> zlibbioc_1.8.0
>
>
>
_____________________________________________________
>
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org="">
> <mailto:bioconductor at="" r-__project.org=""> <mailto:bioconductor at="" r-project.org="">>
> <mailto:bioconductor at="" r-____project.org=""> <mailto:bioconductor at="" r-__project.org="">
> <mailto:bioconductor at="" r-__project.org=""> <mailto:bioconductor at="" r-project.org="">>>
> https://stat.ethz.ch/mailman/______listinfo/bioconductor
> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>
<https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">>
>
>
>
> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>
<https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">>>
> Search the archives:
> http://news.gmane.org/gmane.______science.biology.informatic
s.______conductor
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor="">
>
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor=""> <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor="">>
>
>
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor=""> <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor="">
>
>
<http: news.gmane.org="" gmane.__science.biology.informatics.__conductor="">
<http: news.gmane.org="" gmane.science.biology.informatics.conductor="">>>
>
>
>
>
>
> --
> Reema Singh
> PhD Scholar
> Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> Jawaharlal Nehru University
> New Delhi-110067
> INDIA
>
>
>
>
>
> --
> Reema Singh
> Postdoctoral Research Assistant
> College of Life Sciences
> University of Dundee,
> Dundee DD1 4HN, Scotland
> United Kingdom
>
>
>
> --
> Valerie Obenchain
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: vobencha at fhcrc.org <mailto:vobencha at="" fhcrc.org="">
> Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>
>
>
> --
> Sam McInturf
Valerie,
Sorry for the lame error, was reading your comments about
countOverlaps.
I found my problem in that the features data had strand information,
while
my sequencing was not stranded, so by setting the strand information
to "*"
in the GRanges object and then reduce(split()) I was able to map
succesfully
Thanks for the help!
On Saturday, December 21, 2013, Valerie Obenchain wrote:
> On 12/21/13 05:14, Sam McInturf wrote:
>
>> Valerie and Reema,
>> I am using single end reads.
>>
>> I did as Valerie suggested
>>
>> so <- summarizeOverlaps(tx, reads, mode="Union",type = "any")
>>
>
> This is not the example I used below. You did not include
> 'inter.feature=FALSE'. 'type' is not an argument to
summarizeOverlaps(). In
> other words, do not use 'type', use 'inter.feature'. Please see my
previous
> explanation and the man page for summarizeOverlaps().
>
>
> Valerie
>
>
> colSums(assays(so)$counts
>> #185,940
>>
>> which feels silly, allow for double counts and return fewer counts
than
>> prior
>>
>> P.S. thank you for your prompt reply, my email "hid" it from me
>> Best
>>
>> Sam
>>
>>
>>
>> On Mon, Dec 9, 2013 at 6:08 PM, Valerie Obenchain
<vobencha@fhcrc.org>> <mailto:vobencha@fhcrc.org>> wrote:
>>
>> Reema,
>>
>> Thank you for pointing out the error in the summarizeOverlaps
>> vignette. That sentence was a leftover from a previous version
and
>> should have been removed. I have updated the vignette in both
>> release (GenomicRanges 1.14.4) and devel (GenomicAlignments
0.99.8).
>>
>> When the 'reads' argument is a GAlignmentPairs object the
>> stand-alone functions of Union(), IntersectionStrict() and
>> IntersectionNotEmpty() count the data as paired-end. If 'reads'
is a
>> GAlignments object they are counted as single-end.
>> summarizeOverlaps() reads the data into a GAlignments or
>> GAlignmentPairs object and calls these functions internally;
they
>> are exactly the same.
>>
>> Valerie
>>
>>
>>
>> On 12/09/2013 02:46 AM, Reema Singh wrote:
>>
>> Hi all,
>>
>> Indeed summarizeOverlaps() works on paired end data. But
different
>> counting modes [Union, Intersection and
IntersectionNotEmpty]
>> they do
>> not handle paired end reads. Here the reference
>> [http://www.bioconductor.org/__packages/release/bioc/__
>> vignettes/GenomicRanges/inst/__doc/summarizeOverlaps.pdf]-
>> <http: www.bioconductor.org="" packages="" release="" bioc="">> vignettes/GenomicRanges/inst/doc/summarizeOverlaps.pdf%5D->
>> Page 2 [under Counting Modes].
>>
>> Kind Regards
>>
>>
>>
>> On Sun, Dec 8, 2013 at 6:31 PM, Valerie Obenchain
>> <vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">
>> <mailto:vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">>>
wrote:
>>
>> summarizeOverlaps() does count paired-end reads. Use
option
>> singleEnd=FALSE. Details under 'singleEnd' in the
'Arguments'
>> section on the man page.
>>
>> library(GenomicAlignments) ## devel
>> library(GenomicRanges) ## release
>> ?summarizeOverlaps
>>
>> Valerie
>>
>>
>> On 12/08/13 09:42, Reema Singh wrote:
>>
>> Hi Sam,
>>
>> It depends on whether you are using single read or
>> paired end reads
>> files. As far as I known summarizeOverlaps
function
>> only working
>> in case
>> of single read. Anyone please correct me if I am
wrong.
>>
>> Kind Regards
>>
>>
>> On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain
>> <vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">
>> <mailto:vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">>
>> <mailto:vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">
>> <mailto:vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">>>>
wrote:
>>
>> Hi Sam,
>>
>> Are you getting any error messages related to
>> seqlevels
>> when you
>> count? If you have confirmed that annotation
>> seqlevels
>> match the bam
>> files next look at the maximum possible
overlap in
>> a single
>> bam file.
>>
>> bf <- singleBamFile
>> reads <- readGAlignments(bf) ## assuming
>> single-end reads
>> so <- summarizeOverlaps(tx, reads,
mode=Union,
>> inter.feature=FALSE)
>>
>> 'inter.feature=FALSE' with 'mode=Union' is
>> countOverlaps with
>> 'type=ANY'. Reads that hit multiple features
will
>> be double
>> counted
>> in this case. The idea to to see if there are
any
>> common
>> regions at all.
>>
>> If there are still no counts, you could look
more
>> closely at a
>> smaller chromosome region in 'reads' where
you
>> would expect
>> counts.
>> This visual inspection might give you some
insight
>> as to
>> why there
>> is no overlap.
>>
>>
>> Valerie
>>
>>
>> On 12/05/13 18:51, Sam McInturf wrote:
>>
>> Hello Bioconductors,
>> I am working on an Arabidopsis RNA
seq
>> set, and after
>> executing my count
>> reads script, I get a count table with no
>> reads counted (ie
>> sum(colSums(mat)) = 0).
>>
>> I mapped my reads using Tophat2 (without
>> supplying a
>> gtf/gff
>> file) and used
>> samtools to sort and index my
>> accepted_hits.bam files.
>> I loaded
>> these bam
>> files into IGV (integrated Genome
Browser) and
>> the
>> reads appear
>> to have
>> been mapped (additionally the
>> align_summary.txt says I
>> have good
>> mapping).
>> Script and session info below.
>>
>> Essentially I use Rsamtools to load in my
>> bamfiles, and
>> then
>> makeTranscriptsFromGFF to make a txdb
object,
>> transcriptsBy to get
>> transcripts, and then summarizeOverlaps
to
>> count the
>> reads. My
>> gff file is
>> TAIR10. I am relatively sure that the
genome
>> I mapped
>> to was
>> the TAIR10
>> release, but I am not 100% on that.
>>
>> I have also used biomaRt to do this
(commented
>> out),
>> but I had
>> the same
>> results.
>>
>> I am currently upgrading to R 3.0.2 and
>> re-installing
>> bioconductor (
>> "/data/smcintur/Annotation/______TAIR10_GFF3_genes.gff";
>> library(ShortRead)
>> library(GenomicFeatures)
>> library(Rsamtools)
>> #library(biomaRt)
>>
>>
>>
setwd("/data/smcintur/RNASeq/______NMseq/newTophat/BamFiles/")
>>
>>
>> bf <- list.files(pattern = ".bam$")
>> bi <- list.files(pattern = "bam.bai$")
>> bfl <- BamFileList(bf, bi)
>>
>> #mart <-
>> makeTranscriptDbFromBiomart(______biomart =
>>
>>
>> "Public_TAIRV10", dataset=
>> "tairv10")
>> tx <- transcriptsBy(mart)
>> gff <-
makeTranscriptDbFromGFF(______gffdir,
>> format = "gff")
>>
>>
>> gff
>>
>> cds <- cdsBy(gff)
>> tx <- transcriptsBy(gff)
>>
>> olapTx <- summarizeOverlaps(tx, bfl)
>> olapCds <- summarizeOverlaps(cds, bfl)
>>
>> txMat <- assays(olapTx)$counts
>> cdsMat <- assays(olapCds)$counts
>>
>> write.table(txMat, file =
"countMatTxGff.txt",
>> sep = "\t")
>> write.table(cdsMat, file =
>> "countMatCdsGff.txt", sep =
>> "\t")
>>
>> sessionInfo
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu
(64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8
LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8
>> LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8
>> LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C
LC_NAME=C
>> [9] LC_ADDRESS=C
LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8
>> LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils
>> datasets
>> methods base
>>
>> other attached packages:
>> [1] BiocInstaller_1.12.0
>>
>> loaded via a namespace (and not
attached):
>> [1] AnnotationDbi_1.24.0
Biobase_2.22.0
>> BiocGenerics_0.8.0
>> [4] biomaRt_2.18.0
Biostrings_2.30.1
>> bitops_1.0-6
>> [7] BSgenome_1.30.0 DBI_0.2-7
>> GenomicFeatures_1.14.2
>> [10] GenomicRanges_1.14.3
IRanges_1.20.6
>> parallel_3.0.0
>> [13] RCurl_1.95-4.1
Rsamtools_1.14.2
>> RSQLite_0.11.4
>> [16] rtracklayer_1.22.0 stats4_3.0.0
>> tools_3.0.0
>> [19] XML_3.98-1.1 XVector_0.2.0
>> zlibbioc_1.8.0
>>
>>
>> ______________________________
>> _______________________
>>
>> Bioconductor mailing list
>> Bioconductor@r-project.org
<mailto:bioconductor@r-project.org>
>> <mailto:bioconductor@r-__project.org>> <mailto:bioconductor@r-project.org>>
>> <mailto:bioconductor@r-____project.org>> <mailto:bioconductor@r-__project.org>
>> <mailto:bioconductor@r-__project.org>> <mailto:bioconductor@r-project.org>>>
>> https://stat.ethz.ch/mailman/______listinfo/bioconductor
>> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>>
<https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">>
>>
>>
>>
>> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>>
<https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">>>
>> Search the archives:
>> http://news.gmane.org/gmane.______science.biology.
>> informatics.______conductor
>> <http: news.gmane.org="" gmane.____science.biology.="">> informatics.____conductor>
>>
>> <http: news.gmane.org="" gmane.____science.biology.="">> informatics.____conductor
>>
<http: news.gmane.org="" gmane.__science.biology.informatics._="">> _conductor>>
>>
>>
>> <http: news.gmane.org="" gmane.____science.biology.="">> informatics.____conductor
>>
<http: news.gmane.org="" gmane.__science.biology.informatics._="">> _conductor>
>>
>>
<http: news.gmane.org="" gmane.__science.biology.informatics._="">> _conductor
>> <http: news.gmane.org="" gmane.science.biology.informatics.="">> conductor>>>
>>
>>
>>
>>
>>
>> --
>> Reema Singh
>> PhD Scholar
>> Computational Biology and Bioinformatics
>> School of Computational and Integrative Sciences
>> Jawaharlal Nehru University
>> New Delhi-110067
>> INDIA
>>
>>
>>
>>
>>
>> --
>> Reema Singh
>> Postdoctoral Research Assistant
>> College of Life Sciences
>> University of Dundee,
>> Dundee DD1 4HN, Scotland
>> United Kingdom
>>
>>
>>
>> --
>> Valerie Obenchain
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B155
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: vobencha@fhcrc.org <mailto:vobencha@fhcrc.org>
>> Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>>
>>
>>
>>
>> --
>> Sam McInturf
>>
>
>
--
Sam McInturf
[[alternative HTML version deleted]]
On 12/24/13 04:24, Sam McInturf wrote:
> Valerie,
> Sorry for the lame error, was reading your comments about
countOverlaps.
>
> I found my problem in that the features data had strand information,
> while my sequencing was not stranded, so by setting the strand
> information to "*" in the GRanges object and then reduce(split()) I
was
> able to map succesfully
Great! Glad it worked.
There is an 'ignore.strand' argument in many of the count functions.
Setting 'ignore.strand=TRUE' is equivalent to setting strand to "*" in
the GRanges/GAlignments etc. The benefit of using the argument is that
strand is ignored during counting but that information is retained in
the original GRanges.
Valerie
>
>
> Thanks for the help!
>
> On Saturday, December 21, 2013, Valerie Obenchain wrote:
>
> On 12/21/13 05:14, Sam McInturf wrote:
>
> Valerie and Reema,
> I am using single end reads.
>
> I did as Valerie suggested
>
> so <- summarizeOverlaps(tx, reads, mode="Union",type =
"any")
>
>
> This is not the example I used below. You did not include
> 'inter.feature=FALSE'. 'type' is not an argument to
> summarizeOverlaps(). In other words, do not use 'type', use
> 'inter.feature'. Please see my previous explanation and the man
page
> for summarizeOverlaps().
>
>
> Valerie
>
>
> colSums(assays(so)$counts
> #185,940
>
> which feels silly, allow for double counts and return fewer
> counts than
> prior
>
> P.S. thank you for your prompt reply, my email "hid" it from
me
> Best
>
> Sam
>
>
>
> On Mon, Dec 9, 2013 at 6:08 PM, Valerie Obenchain
> <vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">> wrote:
>
> Reema,
>
> Thank you for pointing out the error in the
summarizeOverlaps
> vignette. That sentence was a leftover from a previous
> version and
> should have been removed. I have updated the vignette
in both
> release (GenomicRanges 1.14.4) and devel
(GenomicAlignments
> 0.99.8).
>
> When the 'reads' argument is a GAlignmentPairs object
the
> stand-alone functions of Union(), IntersectionStrict()
and
> IntersectionNotEmpty() count the data as paired-end. If
> 'reads' is a
> GAlignments object they are counted as single-end.
> summarizeOverlaps() reads the data into a GAlignments
or
> GAlignmentPairs object and calls these functions
> internally; they
> are exactly the same.
>
> Valerie
>
>
>
> On 12/09/2013 02:46 AM, Reema Singh wrote:
>
> Hi all,
>
> Indeed summarizeOverlaps() works on paired end
data.
> But different
> counting modes [Union, Intersection and
> IntersectionNotEmpty]
> they do
> not handle paired end reads. Here the reference
>
> [http://www.bioconductor.org/____packages/release/bioc/____v
ignettes/GenomicRanges/inst/____doc/summarizeOverlaps.pdf]-
> <http: www.bioconductor.org="" __packages="" release="" bioc="" __vigne="" ttes="" genomicranges="" inst="" __doc="" summarizeoverlaps.pdf%5d-="">
>
> <http: www.bioconductor.org="" __packages="" release="" bioc="" __vigne="" ttes="" genomicranges="" inst="" __doc="" summarizeoverlaps.pdf%5d-=""> <http: www.bioconductor.org="" packages="" release="" bioc="" vignettes="" genomicranges="" inst="" doc="" summarizeoverlaps.pdf%5d-="">>
> Page 2 [under Counting Modes].
>
> Kind Regards
>
>
>
> On Sun, Dec 8, 2013 at 6:31 PM, Valerie Obenchain
> <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">
> <mailto:vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">>> wrote:
>
> summarizeOverlaps() does count paired-end
reads.
> Use option
> singleEnd=FALSE. Details under 'singleEnd' in
the
> 'Arguments'
> section on the man page.
>
> library(GenomicAlignments) ## devel
> library(GenomicRanges) ## release
> ?summarizeOverlaps
>
> Valerie
>
>
> On 12/08/13 09:42, Reema Singh wrote:
>
> Hi Sam,
>
> It depends on whether you are using single
read or
> paired end reads
> files. As far as I known summarizeOverlaps
> function
> only working
> in case
> of single read. Anyone please correct me
if I
> am wrong.
>
> Kind Regards
>
>
> On Sun, Dec 8, 2013 at 8:25 PM, Valerie
Obenchain
> <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">
> <mailto:vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">>
> <mailto:vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">
> <mailto:vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">>>> wrote:
>
> Hi Sam,
>
> Are you getting any error messages
related to
> seqlevels
> when you
> count? If you have confirmed that
> annotation seqlevels
> match the bam
> files next look at the maximum
possible
> overlap in
> a single
> bam file.
>
> bf <- singleBamFile
> reads <- readGAlignments(bf) ##
assuming
> single-end reads
> so <- summarizeOverlaps(tx, reads,
> mode=Union,
> inter.feature=FALSE)
>
> 'inter.feature=FALSE' with
'mode=Union' is
> countOverlaps with
> 'type=ANY'. Reads that hit multiple
> features will
> be double
> counted
> in this case. The idea to to see if
there
> are any
> common
> regions at all.
>
> If there are still no counts, you
could
> look more
> closely at a
> smaller chromosome region in 'reads'
> where you
> would expect
> counts.
> This visual inspection might give you
> some insight
> as to
> why there
> is no overlap.
>
>
> Valerie
>
>
> On 12/05/13 18:51, Sam McInturf
wrote:
>
> Hello Bioconductors,
> I am working on an
Arabidopsis
> RNA seq
> set, and after
> executing my count
> reads script, I get a count table
with no
> reads counted (ie
> sum(colSums(mat)) = 0).
>
> I mapped my reads using Tophat2
(without
> supplying a
> gtf/gff
> file) and used
> samtools to sort and index my
> accepted_hits.bam files.
> I loaded
> these bam
> files into IGV (integrated Genome
> Browser) and the
> reads appear
> to have
> been mapped (additionally the
> align_summary.txt says I
> have good
> mapping).
> Script and session info below.
>
> Essentially I use Rsamtools to
load in my
> bamfiles, and
> then
> makeTranscriptsFromGFF to make a
txdb
> object,
> transcriptsBy to get
> transcripts, and then
> summarizeOverlaps to
> count the
> reads. My
> gff file is
> TAIR10. I am relatively sure
that
> the genome
> I mapped
> to was
> the TAIR10
> release, but I am not 100% on
that.
>
> I have also used biomaRt to do
this
> (commented
> out),
> but I had
> the same
> results.
>
> I am currently upgrading to R
3.0.2 and
> re-installing
> bioconductor (
> "/data/smcintur/Annotation/________TAIR10_GFF3_genes.gff";
> library(ShortRead)
> library(GenomicFeatures)
> library(Rsamtools)
> #library(biomaRt)
>
>
>
>
setwd("/data/smcintur/RNASeq/________NMseq/newTophat/BamFiles/__")
>
>
> bf <- list.files(pattern =
".bam$")
> bi <- list.files(pattern =
"bam.bai$")
> bfl <- BamFileList(bf, bi)
>
> #mart <-
> makeTranscriptDbFromBiomart(________biomart =
>
>
> "Public_TAIRV10", dataset=
> "tairv10")
> tx <- transcriptsBy(mart)
> gff <-
> makeTranscriptDbFromGFF(________gffdir,
> format = "gff")
>
>
> gff
>
> cds <- cdsBy(gff)
> tx <- transcriptsBy(gff)
>
> olapTx <- summarizeOverlaps(tx,
bfl)
> olapCds <- summarizeOverlaps(cds,
bfl)
>
> txMat <- assays(olapTx)$counts
> cdsMat <- assays(olapCds)$counts
>
> write.table(txMat, file =
> "countMatTxGff.txt",
> sep = "\t")
> write.table(cdsMat, file =
> "countMatCdsGff.txt", sep =
> "\t")
>
> sessionInfo
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-
gnu
> (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8
> LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8
> LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8
> LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C
> LC_NAME=C
> [9] LC_ADDRESS=C
> LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8
> LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices
utils
> datasets
> methods base
>
> other attached packages:
> [1] BiocInstaller_1.12.0
>
> loaded via a namespace (and not
> attached):
> [1] AnnotationDbi_1.24.0
> Biobase_2.22.0
> BiocGenerics_0.8.0
> [4] biomaRt_2.18.0
> Biostrings_2.30.1
> bitops_1.0-6
> [7] BSgenome_1.30.0
DBI_0.2-7
> GenomicFeatures_1.14.2
> [10] GenomicRanges_1.14.3
> IRanges_1.20.6
> parallel_3.0.0
> [13] RCurl_1.95-4.1
> Rsamtools_1.14.2
> RSQLite_0.11.4
> [16] rtracklayer_1.22.0
stats4_3.0.0
> tools_3.0.0
> [19] XML_3.98-1.1
XVector_0.2.0
> zlibbioc_1.8.0
>
>
>
> _______________________________________________________
>
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org="">
> <mailto:bioconductor at="" r-__project.org=""> <mailto:bioconductor at="" r-project.org="">>
> <mailto:bioconductor at="" r-____project.org=""> <mailto:bioconductor at="" r-__project.org="">
> <mailto:bioconductor at="" r-__project.org=""> <mailto:bioconductor at="" r-project.org="">>>
> https://stat.ethz.ch/mailman/________listinfo/bioconductor
> <https: stat.ethz.ch="" mailman="" ______listinfo="" bioconductor="">
>
> <https: stat.ethz.ch="" mailman="" ______listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">>
>
> <https: stat.ethz.ch="" mailman="" ______listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>
<https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">>>
>
>
>
>
> <https: stat.ethz.ch="" mailman="" ______listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>
<https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">>
>
> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>
<https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">>>>
> Search the archives:
> http://news.gmane.org/gmane.________science.biology.__inform
atics.______conductor
> <http: news.gmane.org="" gmane.______science.biology.informati="" cs.______conductor="">
>
> <http: news.gmane.org="" gmane.______science.biology.__informa="" tics.____conductor=""> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor="">>
>
>
> <http: news.gmane.org="" gmane.______science.biology.__informa="" tics.____conductor=""> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor="">
>
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor=""> <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor="">>>
>
>
>
> <http: news.gmane.org="" gmane.______science.biology.__informa="" tics.____conductor=""> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor="">
>
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor=""> <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor="">>
>
>
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor=""> <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor="">
>
>
<http: news.gmane.org="" gmane.__science.biology.informatics.__conductor="">
<http: news.gmane.org="" gmane.science.biology.informatics.conductor="">>>>
>
>
>
>
>
> --
> Reema Singh
> PhD Scholar
> Computational Biology and Bioinformatics
> School of Computational and Integrative
Sciences
> Jawaharlal Nehru University
> New Delhi-110067
> INDIA
>
>
>
>
>
> --
> Reema Singh
> Postdoctoral Research Assistant
> College of Life Sciences
> University of Dundee,
> Dundee DD1 4HN, Scotland
> United Kingdom
>
>
>
> --
> Valerie Obenchain
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: vobencha at fhcrc.org <mailto:vobencha at="" fhcrc.org="">
> Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>
>
>
> --
> Sam McInturf
>
>
>
>
> --
> Sam McInturf