Question

DESeq and paired samples - pairing vs pooling


Timothy Hughes ▴ 20


Hi,

We are performing a study of 8 individuals with cancer. We have 8 pairs of samples. Each pair consists of two samples from the same individual: one from cancerous tissue and one from normal tissue.

We have used DESeq to perform a pooled comparison between the normal and cancerous samples and find a number of genes that are differentially expressed.

We would also like to perform a paired analysis (simple comparison between the two tissue samples from the same individual). Our logic is that the pooled analysis will tend to identify genes as differentially expressed only if they are fairly consistently up or down-regulated across individuals. But, the etiology of the same cancer type may be heterogeneous and we aim to investigate this by performing the paired analysis.

In connection with this, I have two questions:

1. We read in the DESeq paper that this can be done, but are we correct in believing that we can interpret the results as I describe above?
2. Does it make sense to do a paired analysis as described above or would it make more sense to pool the normal tissues and then compare each cancerous tissue to the pool?

Thanks for your help.

-- Tim :)


score 0 · Answer 1 · 2011-05-21

On 05/21/2011 07:10 AM, Timothy Hughes wrote:

> We are performing a study of 8 individuals with cancer. We have 8 pairs of
> samples. Each pair consists of two samples from the same individual: one
> from cancerous tissue and one from normal tissue.

Hi Tim -- not really answering your question with respect to DESeq, but section 11 of the edgeR manual walks through a paired design.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
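For readers who have not seen a paired design written out, here is a minimal sketch in plain Python (not edgeR code) of the additive layout such an analysis uses: an intercept, one indicator column per patient after the first, and a single tissue column. The function name `paired_design`, the column names, and the three-patient example are hypothetical and chosen only for illustration; in R the equivalent matrix would typically come from `model.matrix(~ patient + tissue)`.

```python
def paired_design(patients, tissues, baseline="normal"):
    """Return (column names, rows) for intercept + patient + tissue."""
    levels = sorted(set(patients))
    # The first patient is absorbed into the intercept, as in treatment coding.
    columns = ["intercept"] + [f"patient_{p}" for p in levels[1:]] + ["tissue_cancer"]
    rows = []
    for patient, tissue in zip(patients, tissues):
        row = [1]
        row += [1 if patient == level else 0 for level in levels[1:]]
        row.append(0 if tissue == baseline else 1)
        rows.append(row)
    return columns, rows


# Hypothetical layout: 3 patients, each with a normal and a cancer sample.
patients = [1, 1, 2, 2, 3, 3]
tissues = ["normal", "cancer"] * 3

columns, rows = paired_design(patients, tissues)
print(columns)
for row in rows:
    print(row)
```

The patient columns soak up each individual's baseline level, so the `tissue_cancer` coefficient measures the cancer-versus-normal change within individuals.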

score 0 · Answer 3 · 2011-05-25

Dear Tim,

> From: Timothy Hughes <timothy.hughes at medisin.uio.no>
> To: bioconductor at r-project.org
> Subject: [BioC] DESeq and paired samples - pairing vs pooling
>
> We are performing a study of 8 individuals with cancer. We have 8 pairs
> of samples. Each pair consists of two samples from the same individual:
> one from cancerous tissue and one from normal tissue.
>
> We have used DESeq to perform a pooled comparison between the normal and
> cancerous samples and find a number of genes that are differentially
> expressed.
>
> We would also like to perform a paired analysis (simple comparison
> between the two tissue samples from the same individual). Our logic is
> that the pooled analysis will tend to identify genes as differentially
> expressed only if they are fairly consistently up or down-regulated
> across individuals.

Not quite sure what you mean by a pooled analysis in this context. I think you mean treating the cancer and normal tissue samples as independent groups. Basically you should perform a paired analysis here, because your data is naturally paired, and otherwise you will be ignoring the baseline differences between individuals. The DE genes you have found are probably not wrong, but you have probably missed many others.

> But, the etiology of the same cancer type may be heterogeneous and we
> aim to investigate this by performing the paired analysis.

Unfortunately, a paired analysis doesn't give you a way to handle heterogeneity of cancers. A paired analysis will still look for differential expression that is consistent across the patients. It looks for genes that have more or less consistent relative changes between normal and cancer for each patient. It will find genes that are common to the majority of the cancers.

> In connection with this,
> I have two questions:
> 1. we read in the DESeq paper that this can be done, but are we correct in
> believing that we can interpret the results as I describe above?

I wonder where you have read this? I don't think the DESeq authors claim it handles paired tests.

See above for comments on interpretation.

> 2. Does it make sense to do a paired analysis as described above or would it
> make more sense to pool the normal tissues and then compare each cancerous
> tissue to the pool?

If you want to find genes that are specific to one cancer, and not to the other cancers, nor to the normal tissues, then comparing each individual cancer to the group of normals is probably your best route, at least the simplest one. You could do a standard two-group analysis with n=1 in one of the groups. This does ignore the pairing of the cancer tissue to one of the normals but, with 8 individuals, the penalty probably isn't too high.

I can think of ways to do a more careful analysis, but they'd be harder to explain in a publication. Using the edgeR package, you could (i) fit a paired samples model, in order to extract the biological coefficient of variation (BCV) from all the individuals, then (ii) compare each individual cancer to its own paired normal tissue, using the BCV previously estimated from all the patients.

Of course, plotting the data to see how different the cancer samples seem to be should be the first step. I personally use plotMDS.dge() in the edgeR package for this purpose.

Best wishes
Gordon
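The two points made in this answer (an unpaired analysis loses power to baseline differences, and a paired analysis still rewards only consistent changes) can be seen in a toy calculation. The sketch below is plain Python on made-up log2 expression values for a single gene, using ordinary t-statistics as a stand-in; it is not the negative binomial machinery of DESeq or edgeR, and the function names `unpaired_t` and `paired_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Made-up log2 expression of one gene in the normal tissue of 8 patients.
# Baseline levels differ a lot between individuals.
normal = [2.0, 5.0, 8.0, 3.0, 9.0, 4.0, 7.0, 6.0]

# Scenario A: every cancer sample is up by roughly 1 relative to its own normal.
consistent = [n + s for n, s in zip(normal, [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9])]

# Scenario B: only one patient shows a change (a heterogeneous cancer).
one_off = [n + s for n, s in zip(normal, [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0])]


def unpaired_t(a, b):
    """Welch-style t-statistic treating the two tissues as independent groups."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


def paired_t(a, b):
    """t-statistic on within-patient differences."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


print("consistent change, unpaired:", round(unpaired_t(normal, consistent), 2))
print("consistent change, paired:  ", round(paired_t(normal, consistent), 2))
print("one-off change, paired:     ", round(paired_t(normal, one_off), 2))
```

In scenario A the between-patient spread swamps the unpaired statistic, while the paired statistic is large because the differences are nearly constant. In scenario B the paired statistic stays small even though one patient changes strongly, which is why pairing alone does not address heterogeneity.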

score 0 · Answer 4 · 2011-05-26

Dear Tim,

A good question, but I think you should keep the paired normal in. I think ignoring the pairing will make the test conservative rather than invalid. Removing the cancerous sample's normal pair, while valid, would be less powerful again, so would not offer an advantage.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852, smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

On Wed, 25 May 2011, Timothy Hughes wrote:

> Dear Gordon,
>
> Thank you very much for your detailed answer, it has really helped me get a
> better understanding of DESeq and edgeR.
>
> I see that I cannot perform a paired analysis with DESeq but can do so with
> edgeR.
>
> When it comes to getting a grip on the specifics of each cancer sample you
> say that one can compare each cancer to all the normal samples (ignoring the
> pairing between the cancer sample and one of the normals) and you also
> suggest a more careful analysis. But what about comparing each cancerous
> sample to the group of normals excluding the cancerous sample's normal pair?
> Would this be a valid approach, superior to including the normal pair and
> less difficult to explain than the "careful" analysis?
>
> Tim
>
> --
> Tim Hughes PhD (http://digitised.info)
> Medical Genetics Department
> Oslo University Hospital (Ullevål)
> Kirkeveien 166
> 0407 Oslo
> Norway
>
> Tel: (+47) 23 02 72 55
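One way to see why keeping the paired normal is "conservative rather than invalid", while dropping it costs power, is to compare the variance of the contrast (one cancer sample minus the mean of the reference normals) under a simple Gaussian model. The sketch below rests on assumptions made here for illustration and not stated in the thread: every sample has the same variance `sigma2`, different individuals are independent, and a cancer sample has correlation `rho` with its own normal. The function name `contrast_variances` is hypothetical.

```python
def contrast_variances(n, rho, sigma2=1.0):
    """Variance of (cancer - mean of reference normals) in three situations.

    n      : number of normals when the cancer's own normal is included
    rho    : assumed correlation between a cancer sample and its own normal
    sigma2 : assumed common variance of every sample
    """
    # What an unpaired two-group test assumes when all n normals are used.
    assumed_unpaired = sigma2 * (1 + 1 / n)
    # What the variance actually is with the correlated normal in the group:
    # Var(x) + Var(mean) - 2 * Cov(x, mean), where Cov(x, mean) = rho * sigma2 / n.
    actual_with_pair = sigma2 * (1 + 1 / n - 2 * rho / n)
    # Own normal removed: n - 1 independent normals, no correlation left.
    without_pair = sigma2 * (1 + 1 / (n - 1))
    return assumed_unpaired, actual_with_pair, without_pair


assumed, actual, dropped = contrast_variances(n=8, rho=0.5)
print("assumed by the unpaired test:", round(assumed, 4))
print("actual with paired normal:   ", round(actual, 4))
print("paired normal removed:       ", round(dropped, 4))
```

Under these assumptions, with a positive `rho` the unpaired test overstates the variance when the paired normal is kept, so it errs on the conservative side. Removing the paired normal makes the assumed variance correct, but that variance is larger than either of the other two, and one degree of freedom for estimating variability is lost as well.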