Light difference when using coef vs omit a group when compare 3 groups.
1
0
Entering edit mode
Chris ▴ 20
@3fdb6f97
Last seen 26 days ago
United States

Hi all,

When I use limma to compare pathway score between 3 groups, I tried two ways:

  1. Keep 3 groups in the data and use coef to specific which 2 groups I want to compare.
  2. Omit the group I don't want to compare in the data.
    The result when filter p value < 0.05 in both way is the same when I compare late state vs control. But the result is slightly different (599 vs 600 pathways) when I compare early state vs control. Would you please have an explanation? Thank you so much!
    paths_results_ES_NC <- topTable(fit2, adjust="BH", number=Inf, coef = "ESvsNC")
    filtered_paths_ES_NC <- paths_results_ES_NC[paths_results_ES_NC$P.Value < 0.05, ]
    
limma • 1.6k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia

It's a basic property of anova and linear models that results will change if you add or remove data. Why would that be surprising? It's not unique to limma but applies to any statistical procedure using regression or linear models. The differences here are miniscule.

Choosing pathways by unadjusted P.Value is not a valid procedure anyway.

ADD COMMENT
0
Entering edit mode

Thanks Gordon. My statistic knowledge currently is quite limit, so just want to make sure the result is correct. If I use adjusted P. Value, with large number of pathways like thousands, no p value is remain significant. So I think I may miss some significant pathway because the nature of the data and the sample size for early state and late state are far less than normal control.

ADD REPLY
1
Entering edit mode

I have answered many questions and comments from you about this same analysis on this forum and on Biostars. I have previously explained why omiting a group is not good practice and why it might change the results. I have previously explained that unequal sample sizes do not cause a problem. I made several suggestions how you might refine your analysis. I explained to you that pathway methods based on computing single-sample-scores are not a powerful way to find differentially expressed pathways. I suggested that you plot your data to explore whether there were any differences between the groups or any problems with the analysis. I even provided you with exact code for making the plot. The software seems to be working correctly so I don't know what more to say.

ADD REPLY
0
Entering edit mode

I really appreciate your support.

ADD REPLY
0
Entering edit mode

Hi Gordon, thank you so much for your guide! I check my data as you said there is no difference in my data. This is the MDS plot at genes level. I don't have the raw data but I think the normalized data. So can we say from the plot that there is no difference between normal control (black) and late state (red)? Just curious I have 2 big clusters. Is that because of batch effect? I am so sorry the previous MDS plot in other post! enter image description here enter image description here

ADD REPLY
1
Entering edit mode

The MDS plot shows a massive batch effect. If this is a correct plot of your data, then any analysis you have done of this data is worthless because you have not modelled the true nature of the data. The first steps of any analysis are to plot and normalize the data. You need to complete these steps satisfactorily before proceeding to any DE analysis.

I must say that the plot looks somewhat artificial compared to real microarray or RNA-seq data. According to your comments, you don't seem to know the nature of your data, what it is and how it has been normalized.

ADD REPLY
0
Entering edit mode

Hi Gordon,

Thank you so much for your guide! I don't handle the raw data but a text file with all samples that already normalized as you see in the histogram. So I need the raw data and use combat() to handle batch effect? So the text file I got, it haven't removed batch effect but normalized and log transform? The person gave me the data is not the one generate the data so she doesn't know all the information. With large number of sample, the data was created from 2005 to 2008.

ADD REPLY
1
Entering edit mode

I can't tell you how to analyse your particular dataset. I can only give general advice and help with limma syntax. I don't even know what sort of data you have, whether it is RNA-seq, microarray, proteomics or whatever. In terms of general advice, I like to have raw data but normalized data might be fine depending on what the technology is and whether the normalization was appropriate. There is never any need for combat(). You need to find out the biological cause of the batch effect, not blindly try to remove it. Once you know the cause, the extra factor is incorporated into the limma linear model. At very least, you should be exploring batch effects over time.

ADD REPLY
0
Entering edit mode

Hi Gordon, thank you so much for further guidance! I discussed this issue with psychiatrist who treat Alzheimer patients and they don't expect significant difference in gene expression from peripheral blood sample between disease and control. Based on the distribution of microarray value in histogram above, I think the data normalized and log transformed. So does the batch effect causes two big clusters?

If I use limma to compare a few thousand pathways between 3 groups, pathway A will got different p value with the case using only a few pathways. P adj will change but I don't know why p value of the same pathway also change. Basically, I just compare more. So p value of pathway A should be the same regardless number of pathway I use to compare, right? The result of this pathway should independent with other pathways. Would you please explain?

ADD REPLY

Login before adding your answer.

Traffic: 623 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6