Hi Rajarshi,
I spent a long time thinking about this problem when I did some
screening. My problem was slightly different because I had 2 siRNAs for
each gene and 2 replicates for each siRNA, but that is still not enough
to do traditional stats. The first thing I suggest is that you analyse
the data with the Bioconductor package cellHTS2, if you are not already
doing so. After performing several rounds of low-throughput confirmation
experiments, I came to the following conclusions:
1) Without more data you cannot really do better than a threshold for
selecting hit siRNAs. The only significance you can put on an siRNA in
this situation is its rank in the hit list.
I have been thinking about some sort of FDR measure that considers the
position of the siRNA relative to the distributions of both the positive
and negative controls, but I've never really taken it anywhere.
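As a rough illustration of threshold-and-rank hit selection (not what
cellHTS2 does internally), here is a minimal Python sketch. It assumes
one normalised score per siRNA, that lower scores mean a stronger
effect, and a cutoff of -2 robust z units. The function names, the toy
plate and the cutoff are all made up for the example.

```python
from statistics import median


def robust_z_scores(values):
    """Centre on the median and scale by the MAD.

    The factor 1.4826 makes the MAD comparable to a standard
    deviation when the data are roughly normal.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / (1.4826 * mad) for v in values]


def rank_hits(scores_by_sirna, threshold=-2.0):
    """Return (siRNA, z, rank) for siRNAs at or below the threshold.

    Rank 1 is the strongest effect. Assumes lower score = stronger
    effect, which is an assumption of this sketch.
    """
    names = list(scores_by_sirna)
    z = robust_z_scores([scores_by_sirna[n] for n in names])
    ordered = sorted(zip(names, z), key=lambda pair: pair[1])
    return [
        (name, score, rank)
        for rank, (name, score) in enumerate(ordered, start=1)
        if score <= threshold
    ]


# Hypothetical plate: most siRNAs sit near 1.0, one is clearly lower.
plate = {
    "si_A": 1.00, "si_B": 0.95, "si_C": 1.05, "si_D": 0.98,
    "si_E": 0.40, "si_F": 1.02, "si_G": 0.97,
}
hits = rank_hits(plate)
```

The rank is the only thing this gives you beyond pass/fail; the cutoff
itself carries no p-value.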
2) The fact that an siRNA is a hit doesn't mean that the gene is. When I
looked at the correlation between the two siRNAs targeting the same
gene, I saw that it was pretty much zero, while there was a substantial
correlation between replicates. The reasons for this are probably
twofold. Firstly, different siRNAs have different efficiencies in
knocking down the gene. Secondly, different siRNAs have different
off-target effects. If you are screening thousands of siRNAs, then those
that have off-target effects relevant to your screen will score highly.
If there are many of these (which is likely when you are screening
20,000 genes x 4 siRNAs), off-target effects are likely to dominate the
top end of your list.
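If you want to make the same comparison on your own data, something
like the following Python sketch would do it. It assumes a layout of
two siRNAs per gene with two replicate scores each (my design, not
yours). The function names, the dictionary layout and the toy numbers
are invented for illustration and are not my actual screen data.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def replicate_correlation(screen):
    """Correlate replicate 1 against replicate 2 over every siRNA."""
    rep1 = [r[0] for gene in screen.values() for r in gene.values()]
    rep2 = [r[1] for gene in screen.values() for r in gene.values()]
    return pearson(rep1, rep2)


def sirna_correlation(screen, first="si1", second="si2"):
    """Correlate the two siRNAs of each gene, using replicate means."""
    a = [sum(gene[first]) / len(gene[first]) for gene in screen.values()]
    b = [sum(gene[second]) / len(gene[second]) for gene in screen.values()]
    return pearson(a, b)


# Hypothetical scores: screen[gene][siRNA] = (replicate 1, replicate 2)
screen = {
    "geneA": {"si1": (-2.0, -1.8), "si2": (0.1, 0.3)},
    "geneB": {"si1": (0.2, 0.0), "si2": (-1.5, -1.7)},
    "geneC": {"si1": (1.0, 1.2), "si2": (0.9, 1.1)},
    "geneD": {"si1": (-0.5, -0.3), "si2": (0.4, 0.6)},
}
between_replicates = replicate_correlation(screen)
between_sirnas = sirna_correlation(screen)
```

In the toy data the replicates agree closely while the two siRNAs for
a gene do not, which is the pattern I described above.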
You could score genes based on the minimum/mean score for the 4 siRNAs.
When I did this (using the minimum of the 2 siRNAs that I had), I found
that I had to set my threshold so low that none of my putative hits
confirmed. If you do find some that confirm, you could be finding cases
where both siRNAs are having off-target effects (because of the massive
multiple testing). This might seem unlikely, but I have seen it happen.
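For what it's worth, gene-level scoring of this kind can be sketched in
a few lines of Python. This assumes that a higher score means a
stronger effect, so taking the minimum requires every siRNA for the
gene to pass. The function names, thresholds and toy scores are
hypothetical.

```python
from statistics import mean

SUMMARIES = {"min": min, "mean": mean}


def gene_scores(sirna_scores, how="min"):
    """Collapse the per-siRNA scores of each gene to one number."""
    if how not in SUMMARIES:
        raise ValueError("how must be 'min' or 'mean'")
    summarise = SUMMARIES[how]
    return {gene: summarise(scores) for gene, scores in sirna_scores.items()}


def gene_hits(sirna_scores, threshold, how="min"):
    """Genes whose summary score reaches the threshold, strongest first.

    Assumes higher score = stronger effect (an assumption of this sketch).
    """
    scores = gene_scores(sirna_scores, how)
    passing = [g for g, s in scores.items() if s >= threshold]
    return sorted(passing, key=lambda g: scores[g], reverse=True)


# Hypothetical scores for 4 siRNAs per gene.
toy = {
    "geneA": [3.1, 0.2, 2.8, 0.1],   # two strong siRNAs, two inactive
    "geneB": [2.5, 2.2, 2.9, 2.4],   # all four agree
    "geneC": [0.3, -0.1, 0.2, 0.0],  # nothing
}
strict = gene_hits(toy, threshold=2.0, how="min")
lenient = gene_hits(toy, threshold=1.5, how="mean")
```

The minimum is the strict version: geneA drops out because two of its
siRNAs do nothing. The mean lets it back in, but it also lets one or
two strong off-target siRNAs carry a gene.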
My conclusions from this are that, as you say, hit selection is just the
first step. You could use other information to winnow the initial
selection of hits, but I don't really think that there is any substitute
for experimental confirmation of hits using independent siRNAs.
Winnowing based on GO/pathway analysis might help you select which hits
you want to confirm.
Hope all this waffle helps in some way,
Ian
---
Rajarshi Guha wrote:
> Hi, I have recently started working with RNAi screening data and have
> been getting up to speed on the literature. I have a few questions,
> which are not directly related to Bioconductor (or R), but I figured
> that members of the list would probably be able to help out. If there
> are more appropriate places to post such questions I'd appreciate
> pointers.
>
> My main question is about hit selection. I'm working with assays in
> which each gene is targeted by 4 different siRNAs and the plates have
> no replicates. My understanding is that in this situation, one cannot
> really use statistical tests to select siRNAs. Instead, one employs
> threshold approaches (mean, MAD, quartile etc). Is this correct? In
> such a thresholding approach, is there any way one can provide some
> sort of significance/score to a selection of hits?
>
> Would it be correct to say that hit selection is simply a first step
> and one should use other information (GO enrichment, pathway analysis)
> to further winnow an initial selection of hits?
>
> I am also working on a sensitization screen, where I am trying to
> identify genes that are differentially knocked down. This problem
> seems analogous to microarray studies and, in that vein, I have been
> considering the 4 signals (i.e., 4 siRNAs) for each gene in the two
> conditions and using a t-test to determine whether there is a
> difference in the means.
>
> What I'm a little confused about is to what extent I need to perform
> multiple test corrections on the p-values - does the 'multiple' refer
> to the number of conditions in which the assay is run (drug and no
> drug) or the number of genes being considered?
>
> Thanks,
>
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.