Hi List,
I couldn't get an answer to this in other forums, so I am posting here in the hope of getting help with computing the statistical significance of my data. Suppose I have 2 baskets (B1 and B2) on a table, each with a mix of apples and oranges, and there are 10 such tables (T1 to T10). I have computed the log-odds score of finding apples in B1 at each of the 10 tables:
2.95 5.56 6.025 7.225 7.37 7.39 7.54 7.54 6.82 7.295
To generate a control population, I randomly shuffled the fruits between B1 and B2 on every table, keeping the number of fruits in each basket the same as above, and again computed the log-odds score of finding apples in B1 at each of the 10 tables:
Scores from shuffled control-1:
0.81 1.25 0.695 0.725 -0.23 -0.25 -0.27 0.2 0.04 0.035
Scores from shuffled control-2:
-0.81 0.94 0.855 0.41 0.37 0.755 0.78 0.78 -0.075 0.59
There are 3 more shuffled controls, giving a total of 5 different controls with shuffled scores.
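For concreteness, the shuffling step for a single table can be sketched in Python as below. This is only a minimal sketch under stated assumptions: the fruit counts are made up, the score is assumed to be the natural log of apples over oranges in B1 with a 0.5 pseudocount (not necessarily the score behind the numbers above), and the names `log_odds_apples` and `shuffle_table` are hypothetical.

```python
import math
import random

def log_odds_apples(basket, pseudo=0.5):
    """Log-odds of drawing an apple from one basket.

    ASSUMPTION: log((apples + pseudo) / (oranges + pseudo)); the real
    score definition may differ.
    """
    apples = basket.count("apple")
    oranges = basket.count("orange")
    return math.log((apples + pseudo) / (oranges + pseudo))

def shuffle_table(b1, b2, rng):
    """Pool both baskets, shuffle, and re-split so basket sizes are unchanged."""
    pooled = list(b1) + list(b2)
    rng.shuffle(pooled)
    return pooled[:len(b1)], pooled[len(b1):]

# Hypothetical fruit counts for one table (illustrative, not real data).
b1 = ["apple"] * 40 + ["orange"] * 2
b2 = ["apple"] * 5 + ["orange"] * 30

rng = random.Random(0)  # fixed seed so the sketch is reproducible
observed = log_odds_apples(b1)

# One shuffled score per control; 5 controls as in the example above.
n_controls = 5
shuffled_scores = []
for _ in range(n_controls):
    s1, _s2 = shuffle_table(b1, b2, rng)
    shuffled_scores.append(log_odds_apples(s1))

print("observed:", round(observed, 3))
print("shuffled:", [round(s, 3) for s in shuffled_scores])
```

Repeating this for every table gives one observed score and a set of shuffled scores per table, which is the layout of the numbers listed above.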
How can I compute, for each table, a p-value representing the statistical significance of the log-odds score from the real (B1) basket against the scores from the shuffled (control) baskets? Could you please suggest a test or an R package for this?
Thanks
Bade
@Gordon Smyth: Many thanks for replying and for the link to your paper. I had almost lost hope of getting help on this.
Here the baskets (B1 and B2) represent the “w” and “c” strands, which are independent of each other as far as this study is concerned. In the above example we are only concerned with the “w” strand. The “apples” and “oranges” represent “double-stranded” (ds) and “single-stranded” (ss) reads respectively. Finally, the "tables" are bins of a specific length (~100nt) in an intergenic region. So the scores in my toy example are generalized log-odds ratios of ds-reads against ss-reads from bin-1 to bin-10 on strand “w”.
The scores from the shuffled controls are ds/ss-RNA log-odds scores from the same bins (1 to 10) and the same strand. These shuffled controls were generated by 1000 iterations of shuffling, and I can generate more of them if required.
I need a test that compares the bin-specific log-odds scores of the real data with those from the shuffled controls and assigns a bin-specific p-value of significance. Is this possible, and if so, which test would suit best? Is there an R package available for it? I would greatly appreciate your help.
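To make the question concrete, one candidate for such a bin-specific comparison is an empirical (permutation-style) p-value: for each bin, count how many shuffled scores are at least as large as the real score. The Python sketch below uses only the real scores and the two shuffled controls listed in my toy example. The function name `empirical_p_values`, the one-sided "greater than or equal" direction, and the add-one correction are assumptions for illustration, not an established answer to the question.

```python
# Real scores for bins 1 to 10 (from the toy example).
observed = [2.95, 5.56, 6.025, 7.225, 7.37, 7.39, 7.54, 7.54, 6.82, 7.295]

# Shuffled controls 1 and 2 (from the toy example); more rows can be appended.
controls = [
    [0.81, 1.25, 0.695, 0.725, -0.23, -0.25, -0.27, 0.2, 0.04, 0.035],
    [-0.81, 0.94, 0.855, 0.41, 0.37, 0.755, 0.78, 0.78, -0.075, 0.59],
]

def empirical_p_values(observed, controls):
    """One-sided empirical p-value per bin.

    p = (1 + number of controls with score >= observed) / (1 + number of controls)
    The add-one correction keeps p strictly above zero.
    """
    n = len(controls)
    p_values = []
    for i, obs in enumerate(observed):
        exceed = sum(1 for ctrl in controls if ctrl[i] >= obs)
        p_values.append((exceed + 1) / (n + 1))
    return p_values

p_values = empirical_p_values(observed, controls)
for bin_index, p in enumerate(p_values, start=1):
    print(f"bin-{bin_index}: p = {p:.3f}")
```

With only two controls every bin gets p = 1/3, because the smallest attainable value is 1/(n + 1) for n shuffles. With 5 controls the floor would be 1/6, so many more shuffled controls per bin would be needed before such p-values could become small.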
I know there are other possible ways to compute p-values for these scores, such as taking all the scores on a particular chromosome and using them for statistical testing. However, those methods don’t really fit the biological context of the problem.
Bade