Question

ChIPpeakAnno-makeVennDiagram question

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 11 months ago

United States

Camila, TotalTest needs to be at least as large as the number of peaks in CNSallPeaks_rd and OPColig2Peaks_rd. For example, if there are 1000 peaks in CNSallPeaks_rd and 2000 peaks in OPColig2Peaks_rd, then totalTest needs to be >= 2000. For details, please refer to an old post at http://grokbase.com/t/r/bioconductor/123cz0jc0b/bioc-question-about- makevenn diagram. Thanks! Best regards, Julie On 8/15/13 6:08 PM, "Camila Lopez-anido" <lopezanido at="" wisc.edu=""> wrote: > Hi,? > > I'm a graduate student at the UW-Madison, and I'm getting an error message > when I use makeVennDiagram() -I was wondering if there is a simple solution? I > saw that someone else had asked about something related (similar message) > online, but I couldn't find how they fixed the problem? > > > I inputed two RangedData files, which I successfully performed > findOverlappingPeaks() with: > > >> makeVennDiagram(RangedDataList(CNSallPeaks_rd, OPColig2Peaks_rd), >> NameOfPeaks=c("CNSall","OPColig2"), maxgap=0, minoverlap=1, totalTest=100, >> cex=1, counts.col="red", useFeature=FALSE) > > > #Error in seq.default(cnt, length.out = counts[i]) : > #length must be non-negative number > #In addition: Warning message: > #In phyper(p1.and.p2 - 1, p2, totalTest - p2, p1, lower.tail = FALSE, : > #NaNs produced > > > > Is there a simple solution to this? > > > Thanks, > Camila Lopez-Anido

• 1.1k views

ADD COMMENT • link 11.1 years ago Julie Zhu ★ 4.3k

score 0 · Answer 1 · 2013-08-21

Dear Camila, For the ChIPpeakAnno paper, I have the peaks with all p-values. If you do not have this kind of information, then the most conservative estimate of totalTest would be (peak1 + peak2) - peaks.1and2. The smaller the totalTest you set, the higher the p-value is. So the p-value you obtain will be the most conservative estimate. If this approach is too conservative for your purpose, then you could set the totalTest to be the number of potential binding sites in the genome by searching for the motif (could be obtained from the literature or your peak sets). Certainly if accessibility data is available, then you could limit your search within those accessible regions. Hope this helps. Best regards, Julie On 8/21/13 12:39 PM, "Camila Lopez-anido" <lopezanido at="" wisc.edu=""> wrote: > Thanks!? > > I'm still considering different systematic approaches to determine the > "totalTest" value, which I'm considering as the total possible transcription > factor (TF) binding events in the rat genome. I saw that you previously said > that one could add up the total number of peaks with p-value=1 or > FDR=1?(http://permalink.gmane.org/gmane.science.biology.informatics. conductor/ > 30115), but I don't have access to the raw data. > > > One idea I had was to *assume* that the total number of peaks with p-value=1 > is 1.5X greater than those called significant. > SO, if peak1 set = 7602 peaks, and peak2 set =?25335?peaks, then could I use > totalTest = (7602 + 25335 = 32937) * 1.5 = 49405.5 > > > If I use totalTest = (peak1 + peak2) * 1.5, then I could apply this systematic > approach to make different comparisons between various datasets. This seems > like a better idea than arbitrarily picking a value greater than the largest > peak data set.?Does this make sense, or would you recommend a different > approach/assumptions depending on the size of the genome (or area of open > chromatin accessible for TF binding)? > > > Also, I was trying to figure out how the value 1580 was chosen for totalTest > in the Zhu (2010) ChIPpeakAnno manuscript? Was a systematic approach used > similar to one I tried to come up with above? > > > Best, > Camila > > > > On 08/15/13, "Zhu, Lihua (Julie)" > wrote: >> Camila, >> >> TotalTest needs to be at least as large as the number of peaks in >> CNSallPeaks_rd and OPColig2Peaks_rd. For example, if there are 1000 peaks in >> CNSallPeaks_rd and 2000 peaks in OPColig2Peaks_rd, then totalTest needs to >> be >= 2000. >> >> For details, please refer to an old post at >> http://grokbase.com/t/r/bioconductor/123cz0jc0b/bioc-question- about-makevenn >> diagram. Thanks! >> >> Best regards, >> >> Julie >> >> >> On 8/15/13 6:08 PM, "Camila Lopez-anido" <lopezanido at="" wisc.edu=""> wrote: >> >>> Hi,? >>> >>> I'm a graduate student at the UW-Madison, and I'm getting an error message >>> when I use makeVennDiagram() -I was wondering if there is a simple solution? >>> I >>> saw that someone else had asked about something related (similar message) >>> online, but I couldn't find how they fixed the problem? >>> >>> >>> I inputed two RangedData files, which I successfully performed >>> findOverlappingPeaks() with: >>> >>> >>>> makeVennDiagram(RangedDataList(CNSallPeaks_rd, OPColig2Peaks_rd), >>>> NameOfPeaks=c("CNSall","OPColig2"), maxgap=0, minoverlap=1, totalTest=100, >>>> cex=1, counts.col="red", useFeature=FALSE) >>> >>> >>> #Error in seq.default(cnt, length.out = counts[i]) : >>> #length must be non-negative number >>> #In addition: Warning message: >>> #In phyper(p1.and.p2 - 1, p2, totalTest - p2, p1, lower.tail = FALSE, : >>> #NaNs produced >>> >>> >>> >>> Is there a simple solution to this? >>> >>> >>> Thanks, >>> Camila Lopez-Anido

score 0 · Answer 2 · 2013-08-21

Dear Camila, I have sent you the response to your earlier email. For your convenience, I also pasted here. I think you could use the 1% accessibility data to modify the number of sites in your genome wide motif search number as I suggested below. Alternatively, you could try set totalTest = peak1 + peak2 - peaks.1and2 where peaks.1and2 is the number of peaks common to set1 and set2. Best regards, Julie ########### start of previous reply ####### For the ChIPpeakAnno paper, I have the peaks with all p-values. If you do not have this kind of information, then the most conservative estimate of totalTest would be (peak1 + peak2) - peaks.1and2. The smaller the totalTest you set, the higher the p-value is. So the p-value you obtain will be the most conservative estimate. If this approach is too conservative for your purpose, then you could set the totalTest to be the number of potential binding sites in the genome by searching for the motif (could be obtained from the literature or your peak sets). Certainly if accessibility data is available, then you could limit your search within those accessible regions. Hope this helps. Best regards, Julie On 8/21/13 1:40 PM, "Camila Lopez-anido" <lopezanido at="" wisc.edu=""> wrote: > Hi again,? > > Based on findings from ENCODE (2012), another approach I thought of was to > assume that 1% (conservative estimate) of the rat genome contains open > chromatin... and use that number for the totalTest value? Since the rat genome > is?2,909,698,938bp, then totalTest (i.e. 1%) =?29096989.38 > > > However, the p-value = 0 and I'm not sure if I can publish this p-value or > would need to figure out a way to get at the actual p-value, even though it's > really really small? > > > Any thoughts/insight would be appreciated! > > > Thanks again, > Camila? > > On 08/21/13, "Camila Lopez-anido" wrote: >> Thanks!? >> >> I'm still considering different systematic approaches to determine the >> "totalTest" value, which I'm considering as the total possible transcription >> factor (TF) binding events in the rat genome. I saw that you previously said >> that one could add up the total number of peaks with p-value=1 or >> FDR=1?(http://permalink.gmane.org/gmane.science.biology.informatics .conductor >> /30115), but I don't have access to the raw data. >> >> >> One idea I had was to *assume* that the total number of peaks with p-value=1 >> is 1.5X greater than those called significant. >> SO, if peak1 set = 7602 peaks, and peak2 set =?25335?peaks, then could I use >> totalTest = (7602 + 25335 = 32937) * 1.5 = 49405.5 >> >> >> If I use totalTest = (peak1 + peak2) * 1.5, then I could apply this >> systematic approach to make different comparisons between various datasets. >> This seems like a better idea than arbitrarily picking a value greater than >> the largest peak data set.?Does this make sense, or would you recommend a >> different approach/assumptions depending on the size of the genome (or area >> of open chromatin accessible for TF binding)? >> >> >> Also, I was trying to figure out how the value 1580 was chosen for totalTest >> in the Zhu (2010) ChIPpeakAnno manuscript? Was a systematic approach used >> similar to one I tried to come up with above? >> >> >> Best, >> Camila >> >> >> >> On 08/15/13, "Zhu, Lihua (Julie)" >> wrote: >>> Camila, >>> >>> TotalTest needs to be at least as large as the number of peaks in >>> CNSallPeaks_rd and OPColig2Peaks_rd. For example, if there are 1000 peaks in >>> CNSallPeaks_rd and 2000 peaks in OPColig2Peaks_rd, then totalTest needs to >>> be >= 2000. >>> >>> For details, please refer to an old post at >>> http://grokbase.com/t/r/bioconductor/123cz0jc0b/bioc-question- about-makevenn >>> diagram. Thanks! >>> >>> Best regards, >>> >>> Julie >>> >>> >>> On 8/15/13 6:08 PM, "Camila Lopez-anido" <lopezanido at="" wisc.edu=""> wrote: >>> >>>> Hi,? >>>> >>>> I'm a graduate student at the UW-Madison, and I'm getting an error message >>>> when I use makeVennDiagram() -I was wondering if there is a simple >>>> solution? I >>>> saw that someone else had asked about something related (similar message) >>>> online, but I couldn't find how they fixed the problem? >>>> >>>> >>>> I inputed two RangedData files, which I successfully performed >>>> findOverlappingPeaks() with: >>>> >>>> >>>>> makeVennDiagram(RangedDataList(CNSallPeaks_rd, OPColig2Peaks_rd), >>>>> NameOfPeaks=c("CNSall","OPColig2"), maxgap=0, minoverlap=1, totalTest=100, >>>>> cex=1, counts.col="red", useFeature=FALSE) >>>> >>>> >>>> #Error in seq.default(cnt, length.out = counts[i]) : >>>> #length must be non-negative number >>>> #In addition: Warning message: >>>> #In phyper(p1.and.p2 - 1, p2, totalTest - p2, p1, lower.tail = FALSE, : >>>> #NaNs produced >>>> >>>> >>>> >>>> Is there a simple solution to this? >>>> >>>> >>>> Thanks, >>>> Camila Lopez-Anido