Hi all,
I wanted to get the trans factor sites that affect a set of genes. Is
there any package in bioconductor that will enable me to do this?
thanks!
On Wed, Oct 7, 2009 at 12:08 PM, Tim Smith <tim_smith_666@yahoo.com>
wrote:
> Hi all,
>
> I wanted to get the trans factor sites that affect a set of genes. Is there
> any package in bioconductor that will enable me to do this?
>
I don't know of any package that does this directly, but here are some tips.
If you have access to the (not free) TRANSFAC database, these functions will
read in the database and profile (PRF) files:
readTransFac <- function(con) {
  ## Return the value of every line that starts with the given field name
  getField <- function(name) {
    name <- paste("^", name, sep = "")
    sub(paste(name, "(.*)$"), "\\1", lines[grep(name, lines)])
  }
  lines <- readLines(con)
  nms <- getField("ID")
  npos <- getField("MATR_LENGTH")
  ## Matrix rows start with a position number; drop it and the "A:"-style labels
  mats <- sub("^[0-9]*", "", gsub("[ACGT]:", "", lines[grep("^[1-9]", lines)]))
  ## Parse the cleaned rows through an anonymous file connection
  f <- file()
  writeLines(mats, f)
  mattab <- as.matrix(read.table(f, col.names = c("A", "C", "G", "T")))
  close(f)
  ## One block of MATR_LENGTH rows per matrix
  matlist <- split.data.frame(mattab, rep(seq_along(npos), as.integer(npos)))
  matlist <- lapply(matlist, t) ## OUCH -- slow step
  names(matlist) <- nms
  attr(matlist, "labels") <- getField("NA")
  attr(matlist, "threshold") <- getField("THRESHOLD")
  matlist
}

readPRF <- function(con) {
  read.table(con, skip = 4, comment.char = "/",
             col.names = c("A", "B", "cutoff", "AC", "ID"))
}
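For anyone without the files to hand, here is a minimal sketch of what the parsing steps in readTransFac() do, run on a made-up record. The record layout, the ID "V$TOY_01" and the counts are assumptions chosen only to fit the regular expressions above; a real TRANSFAC library file may differ. It also uses read.table(text = ...) in place of the anonymous file connection, which should be equivalent here.

```r
## Hypothetical record, shaped only to fit the regular expressions used in
## readTransFac(); a real TRANSFAC matrix library may differ in detail.
lines <- c(
  "ID V$TOY_01",
  "NA Toy",
  "MATR_LENGTH 3",
  "THRESHOLD 0.85",
  "1 A:10 C:0 G:0 T:0",
  "2 A:0 C:5 G:5 T:0",
  "3 A:0 C:0 G:0 T:10"
)

## Field lookup: keep what follows the field name on matching lines
getField <- function(name) {
  name <- paste("^", name, sep = "")
  sub(paste(name, "(.*)$"), "\\1", lines[grep(name, lines)])
}

## Matrix rows: strip the base labels, then the leading position number
rows <- sub("^[0-9]*", "", gsub("[ACGT]:", "", lines[grep("^[1-9]", lines)]))
mattab <- as.matrix(read.table(text = rows, col.names = c("A", "C", "G", "T")))

## Transpose so that rows are bases and columns are motif positions
toyPwm <- t(mattab)

getField("ID")
toyPwm
```

The transposed matrix has one row per base and one column per motif position, which is the orientation the later matchPWM() call works with.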
You can use these like this:
> transfac <- readTransFac("transfac/matrixTFP92.lib")
> muscle <- readPRF("transfac/muscle_specific.prf")
> pwm <- transfac[as.character(muscle$ID)]
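Then 'pwm' is a list of matrices. You can then find the hits to a genome
using Biostrings ('Hsapiens' is the genome object provided by a
BSgenome.Hsapiens.* data package):
> library(Biostrings)
> hits <- matchPWM(pwm[[1]], Hsapiens[[1]], "90%")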
Now 'hits' represents the hits of the first PWM against Human chromosome 1,
at 90% cutoff.
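To make the "90%" cutoff concrete, here is a rough base R sketch of the kind of scan involved. It assumes, as I read the matchPWM documentation, that a percentage cutoff is taken relative to the highest score the matrix can attain. scanPWM(), the toy matrix and the toy sequence are made up for illustration; this is not how Biostrings implements it, and it only scans the strand it is given.

```r
## Toy PWM: rows are bases, columns are motif positions (values are made up)
toyPwm <- matrix(c(10, 0,  0,
                    0, 5,  0,
                    0, 5,  0,
                    0, 0, 10),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("A", "C", "G", "T"), NULL))

## Score every window of the subject and keep those at or above the cutoff.
## Assumes the subject contains only A, C, G and T.
scanPWM <- function(pwm, subject, cutoff = 0.9) {
  bases <- strsplit(subject, "")[[1]]
  width <- ncol(pwm)
  starts <- seq_len(max(0, length(bases) - width + 1))
  scores <- vapply(starts, function(s) {
    window <- bases[s:(s + width - 1)]
    ## Pick one matrix cell per motif position and add them up
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(width))])
  }, numeric(1))
  maxScore <- sum(apply(pwm, 2, max))  # best attainable score
  keep <- scores >= cutoff * maxScore
  data.frame(start = starts[keep],
             end = starts[keep] + width - 1,
             score = scores[keep])
}

toyHits <- scanPWM(toyPwm, "ACTTTAGTCA")
toyHits
```

With this toy input the best attainable score is 25, so a 90% cutoff keeps only windows scoring at least 22.5. To search the opposite strand you would scan again with the reverse-complemented matrix.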
You can convert that to an IRanges object:
> ir <- as(hits, "IRanges")
And then use that with the overlap() function in IRanges, along with some
gene annotations, like those from the GenomicFeatures package (an
experimental data package) to find associations with genes.
> library(GenomicFeatures)
> data(geneHuman)
> trans <- transcripts(geneHuman)
> hitsInPromoters <- ir[trans[1]$promoter]
This finds the hits that fall in promoters (+/- 500 bp from the TSS) on chr1.
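As a rough illustration of the overlap step without any annotation package, the sketch below pairs hits with promoter windows in base R. The hit coordinates, gene names and TSS positions are invented, strand is ignored, and the outer() comparison only suits small inputs; for real data the IRanges overlap machinery is the better tool.

```r
## Hypothetical PWM hit coordinates on a single chromosome
hitRanges <- data.frame(start = c(120, 980, 5000),
                        end   = c(129, 989, 5009))

## Hypothetical genes with their transcription start sites
genes <- data.frame(gene = c("geneA", "geneB", "geneC"),
                    tss  = c(500, 1200, 9000))

## Promoter window: 500 bp either side of each TSS, clipped at position 1
promoters <- data.frame(gene  = genes$gene,
                        start = pmax(genes$tss - 500, 1),
                        end   = genes$tss + 500)

## Two closed intervals overlap when each one starts before the other ends
ov <- outer(hitRanges$start, promoters$end, "<=") &
      outer(hitRanges$end, promoters$start, ">=")

## One row per (hit, gene) pair that overlaps
idx <- which(ov, arr.ind = TRUE)
hitsByGene <- data.frame(hit  = idx[, "row"],
                         gene = promoters$gene[idx[, "col"]])
hitsByGene
```

Here the second hit lands in two promoter windows, so it is reported once per gene; the third hit is outside every window and drops out.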
Most of this code is not tested, but it should serve as a nice outline.
Michael
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
Thanks Michael. I hadn't realized that Transfac is not free. Do you
know of any free databases that might work? -- thanks!
________________________________
From: Michael Lawrence <mflawren@fhcrc.org>
Cc: bioc <bioconductor@stat.math.ethz.ch>
Sent: Wed, October 7, 2009 6:33:57 PM
Subject: Re: [BioC] Trans sites associated with a gene (Transfac
database?)
On Thu, Oct 8, 2009 at 9:00 AM, Tim Smith <tim_smith_666@yahoo.com>
wrote:
> Thanks Michael. I hadn't realized that Transfac is not free. Do you know of
> any free databases that might work? -- thanks!
You might try:
http://jaspar.genereg.net/
Sean
The free Transfac is very old. BioBase distributes current versions via
subscription. Have you looked at Jaspar (http://jaspar.cgb.ki.se/)? I use
that with Emboss.
David
Hi Tim
You could try the tools at http://www.dcode.org.
I have used the DiRE tool previously to find TF sites associated with
a gene list (well, TF sites enriched in a gene list). As far as I can
remember, they use TRANSFAC Pro (i.e. non-free) as their TF database.
Cheers
Iain
Thanks all! I'll try the suggested sites. Thank you all very much!
________________________________
From: Iain Gallagher <iaingallagher@btopenworld.com>
Cc: bioconductor@stat.math.ethz.ch
Sent: Thu, October 8, 2009 11:19:07 AM
Subject: Re: [BioC] Trans sites associated with a gene (Transfac
database?)