Hi,
I am wondering how to prepare input files in a range format, such as a
BED file, for the BayesPeak and chipseq packages, as well as input
files for RNA-seq packages. Assume the BAM files have already been
generated with BWA and samtools. Thanks.
John
Hi,
I am wondering how to prepare input files for the R packages BayesPeak
and chipseq, assuming the BAM files have already been generated with
BWA and samtools. Thanks.
John
Hi John,
I would try the Rsamtools package. You'd need something like this
(warning, untested code):
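library(Rsamtools)
bamFile <- "path/to/Bamfile.bam"
## skip unmapped reads: their NA positions would make IRanges() fail
p <- ScanBamParam(flag=scanBamFlag(isUnmappedQuery=FALSE),
                  what=c("rname", "strand", "pos", "qwidth"))
bam <- scanBam(bamFile, param=p)[[1]]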
BayesPeak accepts data.frames or RangedData objects. I would suggest
the easiest thing to do is construct a RangedData:
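library(IRanges)
## one range per read: leftmost mapped position plus read (query) length
IR <- IRanges(start=bam[["pos"]], width=bam[["qwidth"]])
## note: RangedData was deprecated in later Bioconductor releases
x <- RangedData(ranges=IR, strand=bam[["strand"]],
                space=bam[["rname"]])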
chipseq accepts GRanges by preference:
library(GenomicRanges)
y <- GRanges(seqnames=bam[["rname"]], ranges=IR,
strand=bam[["strand"]])
There may be a faster/cleverer way of doing it, but this should work.
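BayesPeak also accepts a plain data.frame. The following is a minimal base-R sketch that converts the list returned by scanBam() (components rname, strand, pos, qwidth) into one. bamToDataFrame() is a hypothetical helper. The column names chr/start/end/strand are an assumption about what BayesPeak expects, so check ?bayespeak before relying on them:

```r
## Hypothetical helper: turn a scanBam()-style list into a data.frame.
## Column names chr/start/end/strand are an assumption; check ?bayespeak.
bamToDataFrame <- function(bam) {
    keep <- !is.na(bam[["pos"]])          # drop unmapped reads (NA position)
    pos <- bam[["pos"]][keep]             # 1-based leftmost mapped base
    data.frame(chr    = as.character(bam[["rname"]][keep]),
               start  = pos,
               end    = pos + bam[["qwidth"]][keep] - 1L,  # closed interval
               strand = as.character(bam[["strand"]][keep]),
               stringsAsFactors = FALSE)
}

## Toy input mimicking scanBam() output (made-up values; third read unmapped)
toy <- list(rname  = factor(c("chr1", "chr1", NA)),
            strand = factor(c("+", "-", "*"), levels = c("+", "-", "*")),
            pos    = c(100L, 250L, NA),
            qwidth = c(36L, 36L, 36L))
reads_df <- bamToDataFrame(toy)
print(reads_df)
```

Filtering on NA positions here duplicates the job of the isUnmappedQuery flag. This keeps the helper safe even if the BAM was read without that flag.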
Jonathan
________________________________________
From: bioconductor-bounces@r-project.org [bioconductor-bounces@r-project.org] On Behalf Of John linux-user [johnlinuxuser@yahoo.com]
Sent: 17 September 2012 15:04
To: bioconductor at r-project.org
Subject: [BioC] ChiPseq input files?
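NOTICE AND DISCLAIMER
This e-mail (including any attachments) is intended for the above-named
person(s). If you are not the intended recipient, notify the sender
immediately, delete this email from your system and do not disclose or
use for any purpose.
We may monitor all incoming and outgoing emails in line with current
legislation. We have taken steps to ensure that this email and
attachments are free from any virus, but it remains your
responsibility to ensure that viruses do not adversely affect you.
Cancer Research UK
Registered charity in England and Wales (1089464), Scotland (SC041666)
and the Isle of Man (1103)
A company limited by guarantee. Registered company in England and
Wales (4325234) and the Isle of Man (5713F).
Registered Office Address: Angel Building, 407 St John Street, London
EC1V 4AD.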
Hi Jonathan,
Thanks for your response and code; it saves me a lot of time searching
the web. Your answers are great! But if I create a table using Python
or another scripting language and then load the table into R for
statistics, how should I decide the range (i.e. start and end) when I
count the reads at each position across the chromosomes/genome? Can
you give me more suggestions? Thanks.
Best,
John
________________________________
From: Jonathan Cairns <jonathan.cairns@cancer.org.uk>
To: John linux-user <johnlinuxuser@yahoo.com>; bioconductor@r-project.org
Sent: Monday, September 17, 2012 10:38 AM
Subject: RE: [BioC] ChiPseq input files?
Hi John,
I'm afraid I don't understand your question. It sounds like you are
trying to bin the reads? This shouldn't be necessary, as both packages
do this for you. Was that your intended query?
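Binning here means splitting each chromosome into fixed-width windows and counting the reads that fall into each window. The following is a minimal base-R sketch of the idea. It assumes reads are given as a chromosome name plus a 1-based start position. binReads() is a hypothetical helper, and the 100 bp width is an arbitrary choice, not the bin size either package uses:

```r
## Hypothetical helper: count reads per fixed-width, non-overlapping bin,
## assigning each read to the bin containing its 1-based start position.
binReads <- function(chr, start, binWidth = 100L) {
    bin <- (start - 1L) %/% binWidth             # 0-based bin index
    key <- data.frame(chr = chr, binStart = bin * binWidth + 1L,
                      stringsAsFactors = FALSE)
    counts <- aggregate(list(reads = rep(1L, length(start))), key, sum)
    counts$binEnd <- counts$binStart + binWidth - 1L
    counts <- counts[order(counts$chr, counts$binStart),
                     c("chr", "binStart", "binEnd", "reads")]
    rownames(counts) <- NULL
    counts
}

## Toy data: three reads on chr1, one on chr2
binned <- binReads(chr   = c("chr1", "chr1", "chr1", "chr2"),
                   start = c(5L, 99L, 150L, 20L))
print(binned)
```

Bins that contain no reads are simply absent from the result. A real peak caller would also need to account for those empty bins.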
Jonathan
________________________________________
From: John linux-user [johnlinuxuser@yahoo.com]
Sent: 17 September 2012 15:59
To: Jonathan Cairns; bioconductor at r-project.org
Subject: Re: [BioC] ChiPseq input files?
Hi Jonathan,
Thanks for your response. I would just like to use Python instead of R
to generate these IRanges data.
I looked over the introduction to the RNA-seq workflow, and it seems
that it simply counts the reads overlapping the annotated gene regions,
as coded below. I am wondering whether something similar is done for
ChIP-seq data. Thanks. John
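library(GenomicFeatures)    # exonsBy()
library(GenomicAlignments)  # readGAlignments(), formerly readGappedAlignments()
## txdb: a TxDb annotation object, assumed to have been created already
gnModel <- exonsBy(txdb, "gene")
counter <- function(fl, gnModel)
{
    aln <- readGAlignments(fl)
    strand(aln) <- "*" # for strand-blind sample prep protocol
    hits <- countOverlaps(aln, gnModel)
    ## keep only reads that overlap exactly one gene
    counts <- countOverlaps(gnModel, aln[hits == 1])
    names(counts) <- names(gnModel)
    counts
}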
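The counting step in that function works in two stages. It first keeps only reads that overlap exactly one gene (hits == 1), then counts those reads per gene. The following is a minimal base-R illustration of the same logic. It uses made-up, ungapped intervals on a single chromosome and ignores strand. For real data, countOverlaps() does this efficiently and also handles gapped alignments:

```r
## Toy reads and gene regions on one chromosome, as closed [start, end] intervals
reads <- data.frame(start = c(10L, 20L, 95L, 300L, 520L),
                    end   = c(45L, 30L, 130L, 335L, 555L))
genes <- data.frame(name  = c("geneA", "geneB", "geneC"),
                    start = c(1L, 100L, 500L),
                    end   = c(120L, 400L, 600L),
                    stringsAsFactors = FALSE)

## overlaps[i, j] is TRUE when read i overlaps gene j
overlaps <- outer(seq_len(nrow(reads)), seq_len(nrow(genes)),
                  function(i, j) reads$start[i] <= genes$end[j] &
                                 genes$start[j] <= reads$end[i])

hits <- rowSums(overlaps)                          # genes hit by each read
uniqueHits <- overlaps[hits == 1, , drop = FALSE]  # analogous to aln[hits == 1]
counts <- setNames(colSums(uniqueHits), genes$name)
print(counts)   # the read at 95-130 overlaps geneA and geneB, so it is dropped
```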
________________________________
From: Jonathan Cairns <jonathan.cairns@cancer.org.uk>
To: John linux-user <johnlinuxuser@yahoo.com>; bioconductor@r-project.org
Sent: Monday, September 17, 2012 11:35 AM
Subject: RE: [BioC] ChiPseq input files?
Hi,
In RNA-seq, one knows where the regions of interest (i.e. exons) are,
so binning is straightforward. No such database of "regions of
interest" exists for ChIP-seq; hence the need for peak-calling
algorithms to find them.
IRanges/RangedData/GRanges etc. are internal R objects, so you'll have
a hard time constructing such a thing in Python. If disk space is a
major issue, you could try creating a .bed file from your .bam file,
and then read that in with e.g. read.bed() in BayesPeak, or import()
in rtracklayer.
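A BED file for this purpose holds one line per mapped read. The following base-R sketch writes two made-up reads to a temporary file and reads them back. It uses the common 6-column layout (chrom, chromStart, chromEnd, name, score, strand). It is only meant to show the format and its 0-based, half-open coordinates, which read.bed() or import() would otherwise handle for you:

```r
## Two made-up reads in 6-column BED layout: chrom, chromStart, chromEnd,
## name, score, strand. chromStart is 0-based and chromEnd is exclusive.
bedFile <- tempfile(fileext = ".bed")
writeLines(c("chr1\t99\t135\tread1\t0\t+",
             "chr1\t249\t285\tread2\t0\t-"), bedFile)

bed <- read.table(bedFile, sep = "\t", header = FALSE,
                  col.names = c("chr", "chromStart", "chromEnd",
                                "name", "score", "strand"),
                  stringsAsFactors = FALSE)

## Convert to 1-based closed coordinates, as used by IRanges/GRanges
bed$start <- bed$chromStart + 1L
bed$end   <- bed$chromEnd
print(bed[, c("chr", "start", "end", "strand")])
```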
J
________________________________________
From: John linux-user [johnlinuxuser@yahoo.com]
Sent: 17 September 2012 16:55
To: Jonathan Cairns; bioconductor at r-project.org
Subject: Re: [BioC] ChiPseq input files?
Hi Jonathan,
Your clarification is great. How to create the BED file, and what
format it should have, is exactly the question I wanted to ask: e.g.
should reads be counted at each base position or in each region? If in
each region, how do I decide the length of each region? Two specific
examples of the two formats are below. It would be easy to count reads
in format 1, but with format 2 it would be hard to determine the range.
Thanks for any further suggestions. Best, John
format 1:
chr,start,end,reads
chr1,6557,6557,233
ch10,9454,94545,100
format 2:
chr,start,end,reads
chr1,6557,8567,2333
ch10,9454,194595,1000
________________________________
From: Jonathan Cairns <jonathan.cairns@cancer.org.uk>
To: John linux-user <johnlinuxuser@yahoo.com>; bioconductor@r-project.org
Sent: Monday, September 17, 2012 12:14 PM
Subject: RE: [BioC] ChiPseq input files?
See http://genome.ucsc.edu/FAQ/FAQformat.html#format1; each region
should represent a single mapped read.
Format 2 is insufficient to determine the original read locations. In
fact, so is format 1 as presented; I assume the 5 on the end of 94545
is a typo. Format 1 is also missing "strand", so if you have the
original .bam files, I'd suggest starting from those and sticking to
the bed format outlined above.
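Following that layout, the following is a minimal base-R sketch that turns scanBam()-style fields into BED lines. It takes the same rname, pos, qwidth and strand components used in the Rsamtools code earlier in the thread. toBedLines() and the placeholder read names are hypothetical. Note that qwidth is the read (query) length, which may differ from the span on the reference when the alignment has indels or clipping:

```r
## Hypothetical helper: format scanBam()-style fields as 6-column BED lines.
## BAM positions are 1-based; BED chromStart is 0-based and chromEnd exclusive.
toBedLines <- function(rname, pos, qwidth, strand) {
    keep <- !is.na(pos)                         # skip unmapped reads
    chromStart <- pos[keep] - 1L
    chromEnd   <- chromStart + qwidth[keep]     # equals the 1-based last base
    paste(as.character(rname[keep]), chromStart, chromEnd,
          paste0("read", seq_len(sum(keep))),   # placeholder read names
          0L, as.character(strand[keep]), sep = "\t")
}

bedLines <- toBedLines(rname  = factor(c("chr1", "chr1", NA)),
                       pos    = c(100L, 250L, NA),
                       qwidth = c(36L, 36L, 36L),
                       strand = factor(c("+", "-", "*")))
writeLines(bedLines)   # or writeLines(bedLines, "reads.bed")
```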
How to perform bam -> bed file conversion in python is not a
bioconductor-related question and is therefore outside of the scope of
this mailing list.
J
________________________________________
From: John linux-user [johnlinuxuser@yahoo.com]
Sent: 17 September 2012 17:37
To: Jonathan Cairns; bioconductor at r-project.org
Subject: Re: [BioC] ChiPseq input files?
Hi Jonathan,
Your clarification is great and how to create the bed file and what
format the bed file would be is the exact question I like to ask, e.g
counting reads in each base position or in each regions. If in each
regions, how to decide the length of each region? Two specific example
below for two formats. It would be easy to count reads in format1, but
if format2, it would be hard to determine the range. Thanks for
further suggestions. Best, John
format 1,
chr start end reads
chr1,6557,6557, 233
ch10,9454,94545,100
format 2,
chr start end reads
chr1, 6557,8567, 2333
ch10,9454,194595,1000
________________________________
From: Jonathan Cairns <jonathan.cairns@cancer.org.uk>
To: John linux-user <johnlinuxuser at="" yahoo.com="">; "bioconductor at
r-project.org" <bioconductor at="" r-project.org="">
Sent: Monday, September 17, 2012 12:14 PM
Subject: RE: [BioC] ChiPseq input files?
Hi,
In RNA-seq, one knows where the regions of interest (i.e. exons) are,
so binning is straightforward. No such database of "regions of
interest" exists for ChIP-seq. Hence, peak-caller algorithms, to find
them.
IRanges/RangedData/GRanges etc are internal R objects, so you'll have
a hard time constructing such a thing in python. If disk space is a
major issue, you could try creating a .bed file from your .bam file,
and then read that in with e.g. read.bed() in BayesPeak, or import()
in rtracklayer.
J
________________________________________
From: John linux-user
[johnlinuxuser@yahoo.com<mailto:johnlinuxuser@yahoo.com>]
Sent: 17 September 2012 16:55
To: Jonathan Cairns; bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
Subject: Re: [BioC] ChiPseq input files?
Hi Jonathan,
Thanks for your response. I just liked to use python instead of R to
generate these IRange data.
I looked over the introduction part of RNA-seq data and it seemed that
it just counted the read hits overlapped the annotated gene regions as
coded below,
and I am wondering if it was the similar things occurred for chip-seq
data. Thanks. John
gnModel <- exonsBy(txdb, "gene")
counter <- function(fl, gnModel)
{
aln <- readGappedAlignments(fl)
strand(aln) <- "*" # for strand-blind sample prep protocol
hits <- countOverlaps(aln, gnModel)
counts <- countOverlaps(gnModel, aln[hits==1])
names(counts) <- names(gnModel)
counts
}
________________________________
From: Jonathan Cairns
<jonathan.cairns@cancer.org.uk<mailto:jonathan.cairns@cancer.org.uk>>
To: John linux-user <johnlinuxuser at="" yahoo.com<mailto:johnlinuxuser="" at="" yahoo.com="">>; "bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">>
Sent: Monday, September 17, 2012 11:35 AM
Subject: RE: [BioC] ChiPseq input files?
Hi John,
I'm afraid I don't understand your question. It sounds like you are
trying to bin the reads? This shouldn't be necessary, as both packages
do this for you. Was that your intended query?
Jonathan
________________________________________
From: John linux-user [johnlinuxuser@yahoo.com<mailto:johnlinuxuser@ya hoo.com=""><mailto:johnlinuxuser@yahoo.com<mailto:johnlinuxuser@yahoo.com>>]
Sent: 17 September 2012 15:59
To: Jonathan Cairns; bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""><mailto:bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">>
Subject: Re: [BioC] ChiPseq input files?
Hi Jonathan,
Thanks for your response and codes. That saves me a lot of time to
look over the webs. Your answers are great! but if I try to create a
table using python or other scripts and then input the table to R for
statistics, how can I decide the range (e.g. start and end) when I
count the reads in each position across the chromosomes/genome? Can
you give me more suggestions? Thanks.
Best,
John
NOTICE AND DISCLAIMER
This e-mail (including any attachments) is intended for the above-
named person(s). If you are not the intended recipient, notify the
sender immediately, delete this email from your system and do not
disclose or use for any purpose.
We may monitor all incoming and outgoing emails in line with current
legislation. We have taken steps to ensure that this email and
attachments are free from any virus, but it remains your
responsibility to ensure that viruses do not adversely affect you.
Cancer Research UK
Registered charity in England and Wales (1089464), Scotland (SC041666)
and the Isle of Man (1103)
A company limited by guarantee. Registered company in England and
Wales (4325234) and the Isle of Man (5713F).
Registered Office Address: Angel Building, 407 St John Street, London
EC1V 4AD.