How do I do a permutation test on one bed file of data to get the p-value of the highest peaks of distance frequency? Is permTest what I want? (https://www.rdocumentation.org/packages/regioneR/versions/1.4.0/topics/permTest)
I am measuring QKI binding sites at exon-intron junctions. Right now I have histograms of how common it is for QKI to be certain distances. For example the x-axis of one of my histograms is -300 nt intron <-> 0 intron-exon junction <-> exon 300 nt and the y-axis is the frequency of how often QKI is in that +-300nt range from the intron-exon junction in the entire transcriptome.
Below is a test file. The last column is the distance from that QKI binding site to the intron-exon junction (the junction is at the zero in the histogram). For example 77 means the QKI binding site is in the exon 77 nt downstream of the intron-exon junction and -191 means the QKI binding site is in the intron 191 nt upstream of the intron-exon junction. I think I can trim all of this data and only keep the final column and that would be the test.bed I input to permTest. I need help understanding how to make the proper randomize.function and evaluate.function so I can get this code to work: permTest(test.bed, ntimes=1000, randomize.function, evaluate.function, alternative="auto", min.parallel=1000, force.parallel=NULL, randomize.function.name=NULL, evaluate.function.name=NULL, verbose=TRUE)
pt <- permTest(A=my.genes, randomize.function=resampleRegions, universe=all.genes, evaluate.function=meanInRegions, x=methylation.levels.450K)
https://bioconductor.org/packages/3.7/bioc/vignettes/regioneR/inst/doc/regioneR.pdf
Except do not do x= and universe should be something like c(-300,300)
universe a region set in any of the formats accepted by toGRanges (GenomicRanges, data.frame, etc...)
https://bioconductor.org/packages/3.7/bioc/manuals/regioneR/man/regioneR.pdf
resampleRegions: This is the fastest of the built-in randomization functions. It uses sample to select N regions from a user provided universe. It is basically a linear algorithm with respect to the number of regions
https://bioconductor.org/packages/3.7/bioc/vignettes/regioneR/inst/doc/regioneR.pdf
resampleRegions(test.bed, universe=c(-300,300))
https://www.rdocumentation.org/packages/regioneR/versions/1.4.0/topics/resampleRegions
meanInRegions(test.bed, x=peakstobetested, col.name=NULL, ...)
https://bioconductor.org/packages/3.7/bioc/manuals/regioneR/man/regioneR.pdf
test.bed:
chr1 11922712 11922712 PARCLIP#QKI_Hafner2010c_hg19*QKI 1.0 - chr1 11922635 11923489 uc001atk.4_exon_1_0_chr1_11922636_r 0 - 0 77
chr1 15428836 15428836 PARCLIP#QKI_Hafner2010c_hg19*QKI 2.0 + chr1 15428592 15430343 uc001awh.2_exon_3_0_chr1_15428593_f 0 + 0 244
chr1 16305834 16305834 PARCLIP#QKI_Hafner2010c_hg19*QKI 1.0 - chr1 16305802 16305919 uc001ayg.4_exon_7_0_chr1_16305803_r 0 - 0 32
chr1 26281226 26281226 PARCLIP#QKI_Hafner2010c_hg19*QKI 1.0 + chr1 26281057 26281522 uc001blu.4_exon_2_0_chr1_26281058_f 0 + 0 169
chr1 27822452 27822452 PARCLIP#QKI_Hafner2010c_hg19*QKI 3.0 + chr1 27822230 27824452 uc001bou.4_exon_8_0_chr1_27822231_f 0 + 0 222
chr1 19119365 19119365 PARCLIP#QKI_Hafner2010c_hg19*QKI 4.0 - chr1 19118957 19119556 uc001bbi.4_intron_35_0_chr1_19118958_r 0 - 0 -191
chr1 20810619 20810619 PARCLIP#QKI_Hafner2010c_hg19*QKI 2.0 - chr1 20807500 20810737 uc001bef.4_intron_0_0_chr1_20807501_r 0 - 0 -118
@linuxborg2, you do not have a universe of all possible QKI peak position, so resample regions does not make sense. meanInRegions will compute the mean of a given value over the regions in A, but you do not have such a value. Please see my response below.