Dear Camila,
I have sent you the response to your earlier email. For your
convenience, I
also pasted here.
I think you could use the 1% accessibility data to modify the number
of
sites in your genome wide motif search number as I suggested below.
Alternatively, you could try set totalTest = peak1 + peak2 -
peaks.1and2
where peaks.1and2 is the number of peaks common to set1 and set2.
Best regards,
Julie
########### start of previous reply #######
For the ChIPpeakAnno paper, I have the peaks with all p-values.
If you do not have this kind of information, then the most
conservative
estimate of totalTest would be (peak1 + peak2) - peaks.1and2. The
smaller
the totalTest you set, the higher the p-value is. So the p-value you
obtain
will be the most conservative estimate. If this approach is too
conservative
for your purpose, then you could set the totalTest to be the number of
potential binding sites in the genome by searching for the motif
(could be
obtained from the literature or your peak sets). Certainly if
accessibility
data is available, then you could limit your search within those
accessible
regions.
Hope this helps.
Best regards,
Julie
On 8/21/13 1:40 PM, "Camila Lopez-anido" <lopezanido at="" wisc.edu="">
wrote:
> Hi again,?
>
> Based on findings from ENCODE (2012), another approach I thought of
was to
> assume that 1% (conservative estimate) of the rat genome contains
open
> chromatin... and use that number for the totalTest value? Since the
rat genome
> is?2,909,698,938bp, then totalTest (i.e. 1%) =?29096989.38
>
>
> However, the p-value = 0 and I'm not sure if I can publish this
p-value or
> would need to figure out a way to get at the actual p-value, even
though it's
> really really small?
>
>
> Any thoughts/insight would be appreciated!
>
>
> Thanks again,
> Camila?
>
> On 08/21/13, "Camila Lopez-anido" wrote:
>> Thanks!?
>>
>> I'm still considering different systematic approaches to determine
the
>> "totalTest" value, which I'm considering as the total possible
transcription
>> factor (TF) binding events in the rat genome. I saw that you
previously said
>> that one could add up the total number of peaks with p-value=1 or
>> FDR=1?(
http://permalink.gmane.org/gmane.science.biology.informatics
.conductor
>> /30115), but I don't have access to the raw data.
>>
>>
>> One idea I had was to *assume* that the total number of peaks with
p-value=1
>> is 1.5X greater than those called significant.
>> SO, if peak1 set = 7602 peaks, and peak2 set =?25335?peaks, then
could I use
>> totalTest = (7602 + 25335 = 32937) * 1.5 = 49405.5
>>
>>
>> If I use totalTest = (peak1 + peak2) * 1.5, then I could apply this
>> systematic approach to make different comparisons between various
datasets.
>> This seems like a better idea than arbitrarily picking a value
greater than
>> the largest peak data set.?Does this make sense, or would you
recommend a
>> different approach/assumptions depending on the size of the genome
(or area
>> of open chromatin accessible for TF binding)?
>>
>>
>> Also, I was trying to figure out how the value 1580 was chosen for
totalTest
>> in the Zhu (2010) ChIPpeakAnno manuscript? Was a systematic
approach used
>> similar to one I tried to come up with above?
>>
>>
>> Best,
>> Camila
>>
>>
>>
>> On 08/15/13, "Zhu, Lihua (Julie)"
>> wrote:
>>> Camila,
>>>
>>> TotalTest needs to be at least as large as the number of peaks in
>>> CNSallPeaks_rd and OPColig2Peaks_rd. For example, if there are
1000 peaks in
>>> CNSallPeaks_rd and 2000 peaks in OPColig2Peaks_rd, then totalTest
needs to
>>> be >= 2000.
>>>
>>> For details, please refer to an old post at
>>>
http://grokbase.com/t/r/bioconductor/123cz0jc0b/bioc-question-
about-makevenn
>>> diagram. Thanks!
>>>
>>> Best regards,
>>>
>>> Julie
>>>
>>>
>>> On 8/15/13 6:08 PM, "Camila Lopez-anido" <lopezanido at="" wisc.edu="">
wrote:
>>>
>>>> Hi,?
>>>>
>>>> I'm a graduate student at the UW-Madison, and I'm getting an
error message
>>>> when I use makeVennDiagram() -I was wondering if there is a
simple
>>>> solution? I
>>>> saw that someone else had asked about something related (similar
message)
>>>> online, but I couldn't find how they fixed the problem?
>>>>
>>>>
>>>> I inputed two RangedData files, which I successfully performed
>>>> findOverlappingPeaks() with:
>>>>
>>>>
>>>>> makeVennDiagram(RangedDataList(CNSallPeaks_rd,
OPColig2Peaks_rd),
>>>>> NameOfPeaks=c("CNSall","OPColig2"), maxgap=0, minoverlap=1,
totalTest=100,
>>>>> cex=1, counts.col="red", useFeature=FALSE)
>>>>
>>>>
>>>> #Error in seq.default(cnt, length.out = counts[i]) :
>>>> #length must be non-negative number
>>>> #In addition: Warning message:
>>>> #In phyper(p1.and.p2 - 1, p2, totalTest - p2, p1, lower.tail =
FALSE, :
>>>> #NaNs produced
>>>>
>>>>
>>>>
>>>> Is there a simple solution to this?
>>>>
>>>>
>>>> Thanks,
>>>> Camila Lopez-Anido