Paul Shannon
Hi Jing,
I am including the Bioconductor email list so that we will have a
record of your question, and the answers we arrive at.
On Nov 18, 2012, at 5:32 PM, Jing Huang wrote:
> Hi Paul,
>
> I am wondering if this would be doable. I have a few genes that form a
> complex. They have been seen overexpressed in a variety of tumors
> simultaneously.
>
Do you hypothesize that their joint over-expression suggests that they
have common regulators?
> The package that you generated seems to fit the scenario of predicting the
> match between known transcription factors and genes. I would like to
> predict the transcription factors that are unknown.
One good approach here would be to find candidate regulatory regions
for each member of your complex. Bioc now has a getPromoterSeq
method, demonstrated at
http://bioconductor.org/help/workflows/gene-regulation-tfbs/. The
rGADEM package finds motifs de novo in a set of sequences, but this
can be an expensive and inconclusive search when your sequences are
long and your genes are few.
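To make "promoter region" concrete: tools such as getPromoterSeq take a fixed-size window around each transcription start site (TSS), and the only subtle part is the strand. The base R sketch below is illustrative only. promoterWindows is a hypothetical helper, not a Bioconductor function; its defaults (2000 bp upstream, 200 bp downstream) are an assumption chosen to mirror the defaults of promoters() in GenomicRanges, and clipping at position 1 is an added simplification.

```r
# Hypothetical helper (not part of Bioconductor): promoter windows around
# transcription start sites, in 1-based inclusive coordinates.
promoterWindows <- function(tss, strand, upstream = 2000, downstream = 200) {
  stopifnot(length(tss) == length(strand), all(strand %in% c("+", "-")))
  plus <- strand == "+"
  # "+" strand: upstream lies at lower coordinates than the TSS.
  # "-" strand: upstream lies at higher coordinates than the TSS.
  start <- ifelse(plus, tss - upstream, tss - downstream + 1)
  end <- ifelse(plus, tss + downstream - 1, tss + upstream)
  # Do not run off the beginning of the chromosome
  start <- pmax(start, 1)
  data.frame(tss = tss, strand = strand, start = start, end = end)
}

# Two made-up genes, one on each strand
w <- promoterWindows(tss = c(10000, 50000), strand = c("+", "-"))
print(w)
print(w$end - w$start + 1)  # each window is upstream + downstream = 2200 bp
```

On the minus strand "upstream" points toward higher coordinates, which is why the second window extends to the right of its TSS.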
The ENCODE project, and John Stam's group at UW in particular, have
produced a lot of new data, including DNase I hypersensitivity regions
and footprints, H3K4 methylation profiles, and transcription factor
binding sites. These can narrow your search considerably. In short, we
now know much more than we used to about what the regulatory regions
proximal to a gene are and where they lie. We have just begun
prototyping a means to provide easy access to these kinds of data in
Bioconductor.
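One way to picture how hypersensitivity data "narrow your search": keep only the parts of each promoter window that overlap a DNase I hypersensitive site (or footprint), and search just those for motifs. The base R sketch below assumes a single chromosome, 1-based inclusive coordinates, non-overlapping sites, and invented toy intervals; intersectIntervals is a hypothetical helper. With real data, findOverlaps() and the intersect() methods in IRanges/GenomicRanges do this far more efficiently than this O(n*m) loop.

```r
# Hypothetical helper (not a Bioconductor function): intersect candidate
# promoter windows with hypersensitive sites on a single chromosome.
# Both inputs are data frames with 'start' and 'end' columns (1-based,
# inclusive); 'sites' are assumed not to overlap one another.
intersectIntervals <- function(windows, sites) {
  pieces <- list()
  for (i in seq_len(nrow(windows))) {
    for (j in seq_len(nrow(sites))) {
      s <- max(windows$start[i], sites$start[j])
      e <- min(windows$end[i], sites$end[j])
      if (s <= e) {  # the two intervals overlap
        pieces[[length(pieces) + 1]] <- data.frame(window = i, start = s, end = e)
      }
    }
  }
  if (length(pieces) == 0) {
    return(data.frame(window = integer(0), start = numeric(0), end = numeric(0)))
  }
  do.call(rbind, pieces)
}

# Toy coordinates, invented for illustration only
windows <- data.frame(start = c(8000, 49801), end = c(10199, 52000))
dhs <- data.frame(start = c(9500, 20000, 51000), end = c(9800, 20300, 51150))

kept <- intersectIntervals(windows, dhs)
print(kept)

# Fraction of the original promoter sequence that is left to search
print(sum(kept$end - kept$start + 1) / sum(windows$end - windows$start + 1))
```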
Once you have some candidate transcription factor binding sequences,
the MotIV package (and the external program 'tomtom') can match them
against known motifs in MotifDb, often identifying transcription factor
candidates.
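The core idea of matching a de novo motif against known motifs can be illustrated with a deliberately simplified scorer: turn each count matrix into column probabilities, slide the shorter motif along the longer one, and average the Pearson correlation of aligned columns. This is an assumption-laden sketch, not the algorithm MotIV or tomtom actually use (real tools also try the reverse complement and attach p-values to each match). The helper names and the toy matrices are hypothetical and are not MotifDb records.

```r
bases <- c("A", "C", "G", "T")

# Build a 4 x L count matrix (rows A, C, G, T) from a consensus string,
# as if every one of n sites matched the consensus exactly.
consensusCounts <- function(consensus, n = 10) {
  chars <- strsplit(consensus, "")[[1]]
  m <- sapply(chars, function(ch) n * (bases == ch))
  dimnames(m) <- list(bases, NULL)
  m
}

# Column-normalize counts to probabilities, with a pseudocount so that
# no probability is exactly zero.
toProbs <- function(counts, pseudo = 0.25) {
  m <- counts + pseudo
  sweep(m, 2, colSums(m), "/")
}

# Slide the shorter motif along the longer one (no gaps, same strand only)
# and return the best mean column-wise Pearson correlation.
motifSimilarity <- function(a, b) {
  a <- toProbs(a)
  b <- toProbs(b)
  if (ncol(a) > ncol(b)) {
    tmp <- a; a <- b; b <- tmp
  }
  best <- -Inf
  for (offset in 0:(ncol(b) - ncol(a))) {
    r <- sapply(seq_len(ncol(a)), function(k) cor(a[, k], b[, k + offset]))
    r[is.na(r)] <- 0  # a constant column has no defined correlation
    best <- max(best, mean(r))
  }
  best
}

# Toy "de novo" motif: counts from 10 hypothetical sites resembling CACGTG
candidate <- matrix(c(0, 9, 1, 0,
                      9, 0, 1, 0,
                      0, 10, 0, 0,
                      0, 0, 10, 0,
                      0, 0, 0, 10,
                      0, 1, 9, 0),
                    nrow = 4, dimnames = list(bases, NULL))

# Toy stand-ins for database entries
known <- list(Ebox  = consensusCounts("GCACGTGA"),
              polyT = consensusCounts("TTTTTTTT"))

scores <- sapply(known, motifSimilarity, a = candidate)
print(sort(scores, decreasing = TRUE))
```

The CACGTG-like candidate scores close to 1 against the E-box-like entry at offset 1 and much lower against the unrelated poly-T entry; ranking database entries this way is what suggests transcription factor candidates.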
If you could clarify your question a bit and provide an example
(anonymizing the genes in your complex if need be), we can try to
find specific techniques for you to use.
Please reply 'on-list' so that our discussion can be archived, and so
that others with advice can chip in.
- Paul
>
> Is there any way this is doable?
>
> Many many thanks
>
> Jing
> On 10/8/12 8:38 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>
>> Hi Jing,
>>
>> This took WAY too long.
>>
>> But it is at last ready. Could you take a look? Give me comments?
>>
>> http://www.bioconductor.org/help/workflows/gene-regulation-tfbs/
>>
>> Thanks!
>>
>> - Paul
>>
>> On Jul 5, 2012, at 3:58 PM, Jing Huang wrote:
>>
>>> No hurry!
>>>
>>> Jing
>>>
>>> -----Original Message-----
>>> From: Paul Shannon [mailto:pshannon at fhcrc.org]
>>> Sent: Thursday, July 05, 2012 3:43 PM
>>> To: Jing Huang
>>> Cc: Paul Shannon
>>> Subject: Re: promoter prediction
>>>
>>> Hi Jing,
>>>
>>> Should have something ready by the end of next week.
>>>
>>> Sorry it's taken so long!
>>>
>>> - Paul
>>>
>>> On Jul 5, 2012, at 3:41 PM, Jing Huang wrote:
>>>
>>>> Hi Paul,
>>>>
>>>> Are you still going to write the package for promoter prediction? I
>>>> have been very busy with bench work and have not been able to study
>>>> this.
>>>>
>>>> It would be nice if you could write the package and present it at the
>>>> BioC12 meeting by the end of this month.
>>>>
>>>> Jing
>>>>
>>>> -----Original Message-----
>>>> From: Paul Shannon [mailto:pshannon at fhcrc.org]
>>>> Sent: Tuesday, June 12, 2012 12:53 PM
>>>> To: Jing Huang
>>>> Cc: Paul Shannon
>>>> Subject: Re: promoter prediction
>>>>
>>>> Cool!
>>>>
>>>> On Jun 12, 2012, at 12:46 PM, Jing Huang wrote:
>>>>
>>>>> Figured it out on this one.
>>>>>
>>>>> Jing
>>>>>
>>>>> On 6/12/12 11:51 AM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>>>>>
>>>>>> It's an odd error.
>>>>>>
>>>>>> Try this:
>>>>>>
>>>>>> ?load
>>>>>> ?save
>>>>>>
>>>>>> Once you understand them, ask yourself, hmmm, what could be wrong
>>>>>> here?
>>>>>>
>>>>>> (I am trying to teach you to fish, rather than just GIVE you fish!)
>>>>>>
>>>>>> - Paul
>>>>>>
>>>>>> On Jun 12, 2012, at 11:48 AM, Jing Huang wrote:
>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> What does this mean?
>>>>>>>
>>>>>>>> if (!exists ('e2f3'))
>>>>>>> + load ('symbolsToGeneIDs.RData', envir=.GlobalEnv)
>>>>>>> Error: segfault from C stack overflow
>>>>>>>
>>>>>>> Many Thanks
>>>>>>>
>>>>>>> Jing
>>>>>>>
>>>>>>> From: Paul Shannon <pshannon at fhcrc.org>
>>>>>>> To: Jing Huang <huangji at ohsu.edu>
>>>>>>> Cc: Paul Shannon <pshannon at fhcrc.org>
>>>>>>> Subject: Re: promoter prediction
>>>>>>>
>>>>>>> Hi Jing,
>>>>>>>
>>>>>>> Learning to install software is a good skill to have. It's a
>>>>>>> basic part of any bioinformatician's work!
>>>>>>>
>>>>>>> If you look at this page:
>>>>>>>
>>>>>>> http://meme.sdsc.edu/meme/meme-download.html
>>>>>>>
>>>>>>> You will see a link to 'installation instructions'. That would be a
>>>>>>> good place to begin.
>>>>>>>
>>>>>>> I apologize, I forgot to include this file. Put it in your working
>>>>>>> directory:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Treat each puzzle you encounter as an opportunity to learn!
>>>>>>>
>>>>>>> - Paul
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Jun 12, 2012, at 9:08 AM, Jing Huang wrote:
>>>>>>>
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> I am having trouble downloading MEME. I guess I am not sure what to
>>>>>>>> download. It seems that running MEME requires Perl or Python? I
>>>>>>>> don't know anything about those.
>>>>>>>>
>>>>>>>> I have tried to run your script and ran into errors:
>>>>>>>>
>>>>>>>>> if (!exists ('e2f3'))
>>>>>>>> + load ('symbolsToGeneIDs.RData', envir=.GlobalEnv)
>>>>>>>> Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
>>>>>>>> In addition: Warning message:
>>>>>>>> In readChar(con, 5L, useBytes = TRUE) :
>>>>>>>>   cannot open compressed file 'symbolsToGeneIDs.RData', probable reason 'No such file or directory'
>>>>>>>>
>>>>>>>>
>>>>>>>> Not sure what this means. I am wondering what else needs to be
>>>>>>>> installed on my computer.
>>>>>>>>
>>>>>>>>
>>>>>>>> Many thanks
>>>>>>>>
>>>>>>>> Jing
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Paul Shannon <pshannon at fhcrc.org>
>>>>>>>> To: Jing Huang <huangji at ohsu.edu>
>>>>>>>> Cc: Paul Shannon <pshannon at fhcrc.org>
>>>>>>>> Subject: Re: promoter prediction
>>>>>>>>
>>>>>>>> Hi Jing,
>>>>>>>>
>>>>>>>> My boss has some other plans for me this week :} so I am sending this
>>>>>>>> to you tonight, giving you (I think) plenty to work on, to study, and
>>>>>>>> to comprehend.
>>>>>>>>
>>>>>>>> What I include below is all you need for finding enriched motifs in
>>>>>>>> the promoters of your genes.
>>>>>>>>
>>>>>>>> What is NOT included is identifying the transcription factors that
>>>>>>>> match those motifs. Learn all of what's here, and then you will be
>>>>>>>> ready for MotIV and my new MotifDb, which should be ready to use by
>>>>>>>> the end of the week.
>>>>>>>>
>>>>>>>> There is one file attached, a somewhat improvised R script. It runs,
>>>>>>>> but it is not in a style you should emulate. Still, there's a lot to
>>>>>>>> learn if you study it, line by line, until everything makes complete
>>>>>>>> sense to you. Please do that!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's how to run the script:
>>>>>>>> 1) Install all the libraries mentioned in the file. For instance:
>>>>>>>>      source ('http://bioconductor.org/biocLite.R')
>>>>>>>>      biocLite (c ('org.Hs.eg.db', 'BSgenome.Hsapiens.UCSC.hg19',
>>>>>>>>                   'GenomicFeatures', 'TxDb.Hsapiens.UCSC.hg19.knownGene'))
>>>>>>>> 2) Install meme; fix the path to meme in the script so that it
>>>>>>>>    matches where the meme executable is on your computer.
>>>>>>>> 3) source ('go.R'); run ('redo')
>>>>>>>> meme takes maybe 20 minutes to run on my laptop.
>>>>>>>>
>>>>>>>> Having found these motifs, the next step is to use tomtom or (better
>>>>>>>> yet) the Bioconductor package MotIV and my new MotifDb.
>>>>>>>> Be aware: the p-values of these enrichments are not very strong.
>>>>>>>>
>>>>>>>> Please study the script, run meme, and get really familiar with it
>>>>>>>> all. Send me questions if you have them. Then run MotIV with its
>>>>>>>> built-in JASPAR matrices, comparing the enriched motifs meme found to
>>>>>>>> the JASPAR matrices.
>>>>>>>>
>>>>>>>> - Paul
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 8, 2012, at 2:48 PM, Jing Huang wrote:
>>>>>>>>
>>>>>>>>> Hi Paul,
>>>>>>>>>
>>>>>>>>> Here is the list, but only for you: MCM2, MCM3, MCM4, MCM5, MCM6,
>>>>>>>>> MCM7, MCM8. The corresponding Entrez IDs are
>>>>>>>>> 4171, 4172, 4173, 4174, 4175, 4176, 84515.
>>>>>>>>>
>>>>>>>>> I will play with meme as your email suggested.
>>>>>>>>>
>>>>>>>>> Have a nice weekend
>>>>>>>>>
>>>>>>>>> Jing
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Paul Shannon [mailto:pshannon at fhcrc.org]
>>>>>>>>> Sent: Friday, June 08, 2012 2:40 PM
>>>>>>>>> To: Jing Huang
>>>>>>>>> Cc: Paul Shannon
>>>>>>>>> Subject: Re: promoter prediction
>>>>>>>>>
>>>>>>>>> Well, two promoters are not enough of a sample in which to find
>>>>>>>>> motif enrichments. I'll dredge up an example dataset from elsewhere.
>>>>>>>>>
>>>>>>>>> In preparation, you could install meme and see if you can adapt the
>>>>>>>>> 'get.promoter' function I sent you, for Arabidopsis, to human.
>>>>>>>>>
>>>>>>>>> I will have a human demo ready mid-week next week.
>>>>>>>>>
>>>>>>>>> - Paul
>>>>>>>>>
>>>>>>>>> On Jun 8, 2012, at 2:36 PM, Jing Huang wrote:
>>>>>>>>>
>>>>>>>>>> I don't remember what the inputs were. Somebody posted a question
>>>>>>>>>> about the package to our mailing group, and I saw it and played
>>>>>>>>>> with it a little bit.
>>>>>>>>>>
>>>>>>>>>> The list of genes is confidential. How about I give you only two
>>>>>>>>>> of them, MCM2 and MCM3? The corresponding Entrez IDs are 4171 and
>>>>>>>>>> 4172.
>>>>>>>>>>
>>>>>>>>>> I hope this is enough information.
>>>>>>>>>>
>>>>>>>>>> Jing
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Paul Shannon [mailto:pshannon at fhcrc.org]
>>>>>>>>>> Sent: Friday, June 08, 2012 2:19 PM
>>>>>>>>>> To: Jing Huang
>>>>>>>>>> Cc: Paul Shannon
>>>>>>>>>> Subject: Re: promoter prediction
>>>>>>>>>>
>>>>>>>>>> Hi Jing,
>>>>>>>>>>
>>>>>>>>>> Do you know what inputs are used for the package you are trying to
>>>>>>>>>> remember? I cannot think what it would be.
>>>>>>>>>>
>>>>>>>>>> Also (I asked this before :}) do you have a list of specific
>>>>>>>>>> co-regulated genes? Are they confidential? If not, please send me
>>>>>>>>>> that list.
>>>>>>>>>>
>>>>>>>>>> - Paul
>>>>>>>>>>
>>>>>>>>>> On Jun 8, 2012, at 2:16 PM, Jing Huang wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>
>>>>>>>>>>> I am still studying a few packages related to predicting shared
>>>>>>>>>>> transcription factors, and waiting for your new, advanced package
>>>>>>>>>>> to be released.
>>>>>>>>>>>
>>>>>>>>>>> There is a BioC package that allows me to predict promoters. I
>>>>>>>>>>> have played with it but don't remember the name of the package. Do
>>>>>>>>>>> you know of such a package, by any chance?
>>>>>>>>>>>
>>>>>>>>>>> Many thanks
>>>>>>>>>>>
>>>>>>>>>>> Jing
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>