Question

Relations between average expression and fold change

Lluís Revilla Sancho ▴ 760

@lluis-revilla-sancho

European Union

I have a dataset with a very complex setup that I don't seem to be handling well: whatever I do, I get a relationship between the fold change and the average expression of the genes.

I have 5 main variables:

Cell type: stem or differentiated cells
Group: disease or control
Location: ileum or colon
Type: from pediatric or adult samples
Creation: old or new.

So far I have decided to analyse the old and new samples as two separate cohorts, because they come from different experiments (different Matrigels) run a couple of years apart.

For the remaining variables I used an interaction-style design in which each combination of the variables is its own group in the design: STEM_disease_ileum_pediatric, STEM_disease_ileum_adult, STEM_control_ileum_pediatric, ...

However, this ends up with comparisons like this one (done via limma):

[Figure: logFC vs. average expression]

We can see that the higher the average expression is, the bigger the logFC is, while I expected that the average expression would not affect the fold change.

I tried changing the design to a simpler one with fewer interactions, and I was advised to normalize only the samples used in each comparison, but both gave worse results. I tried correcting with surrogate variables from the sva package, and it did not work (despite finding 2 surrogate variables). The PCA did not show any clear batch effect, only that stem and differentiated cells have very different expression (they separate along the first component, which explains 36.5% of the variance).

I have run out of ideas to try, and suggestions about how to design the model or normalize the data are welcome.

Tags: design, AveExpr, comparisons

Written 5.4 years ago by Lluís Revilla Sancho; updated 5.4 years ago by James W. MacDonald

score 0 · Answer 1 · 2019-11-21

James W. MacDonald 68k

@james-w-macdonald-5106

United States

You don't say what kind of data these are, but what's wrong with using limma-trend?

Answered 5.4 years ago by James W. MacDonald

Ugh. Need more coffee. Sorry for the noise.

This is really a question about how you should be analyzing your data rather than how to use Bioconductor tools. Without having data in hand, I am not sure anybody can help you, and I am not sure that any advice will be helpful if given. At that point it's just conjecture.

Reply by James W. MacDonald, 5.4 years ago

Thanks! This is RNA-seq. You made me realize that I get this with plotSA:

[Figure: plotSA output, captioned "voom trend wrong"]

I have never seen this before. Do you have any suggestion about what the error might be?

Reply by Lluís Revilla Sancho, 5.4 years ago

Well, for complex analyses I tend to filter based on the average logCPM of each gene. If you do a density plot of the rowMeans of the logCPM data, it usually is a bimodal distribution with a low point somewhere around zero. If you make the assumption that the genes to the left of the nadir are unexpressed, and those to the right are expressed, you can exclude based on that criterion. Which might help?
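The nadir-based cutoff described above can be sketched roughly as follows. This is only an illustration in plain Python, not the R code anyone in this thread ran: the per-gene mean logCPM values are made up (two artificial clusters of genes), and the helper names `kde` and `find_nadir`, the bandwidth, and the grid size are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def kde(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def find_nadir(values, bandwidth=0.5, n_grid=200):
    """Return the lowest-density point between the two tallest density peaks,
    or None if the estimated density has fewer than two peaks."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = kde(values, grid, bandwidth)
    # Interior grid points that are local maxima of the density.
    peaks = [
        i for i in range(1, n_grid - 1)
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]
    ]
    if len(peaks) < 2:
        return None
    # Keep the two tallest peaks, ordered from left to right.
    left, right = sorted(sorted(peaks, key=lambda i: dens[i], reverse=True)[:2])
    nadir = min(range(left, right + 1), key=lambda i: dens[i])
    return grid[nadir]


# Made-up per-gene mean logCPM values (assumption, for illustration only):
# 400 "unexpressed" genes centred at -3 and 600 "expressed" genes centred at 5.
unexpressed = [NormalDist(-3.0, 0.7).inv_cdf((i + 0.5) / 400) for i in range(400)]
expressed = [NormalDist(5.0, 1.5).inv_cdf((i + 0.5) / 600) for i in range(600)]
mean_logcpm = unexpressed + expressed

cutoff = find_nadir(mean_logcpm)
# Keep genes to the right of the nadir; keep everything if no nadir was found.
kept = [x for x in mean_logcpm if cutoff is None or x > cutoff]
print(f"cutoff = {cutoff}, kept {len(kept)} of {len(mean_logcpm)} genes")
```

On real data the density may not be cleanly bimodal, so it is worth looking at the density plot before trusting an automatically chosen cutoff.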

Reply by James W. MacDonald, 5.4 years ago

You might also try CQN to see if that helps.

Reply by James W. MacDonald, 5.4 years ago

Thanks for your help! In the end, the problem was that my design had a coefficient supported by just one sample. Thank you very much for your advice (and sorry for answering so late; I only just saw the notification).
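For anyone hitting the same issue, a quick check for this kind of problem might look like the sketch below. It is plain Python, not limma code, and it assumes a design with one coefficient per combination of variables, as in the question. The sample sheet is invented for illustration, and `group_counts`, `singleton_groups`, and `residual_df` are hypothetical helper names. In such a design, a group with a single sample is fitted exactly, so it contributes nothing to the residual degrees of freedom used to estimate the variance.

```python
from collections import Counter

# Hypothetical sample sheet (assumption): one tuple per sample, giving
# cell type, group, location, and type. Not the data from this thread.
samples = [
    ("STEM", "disease", "ileum", "pediatric"),
    ("STEM", "disease", "ileum", "pediatric"),
    ("STEM", "disease", "ileum", "pediatric"),
    ("STEM", "control", "ileum", "pediatric"),
    ("STEM", "control", "ileum", "pediatric"),
    ("STEM", "disease", "ileum", "adult"),
    ("STEM", "disease", "ileum", "adult"),
    ("DIFF", "control", "colon", "adult"),  # the only sample in its group
]


def group_counts(samples):
    """Count samples per combined group label, e.g. STEM_disease_ileum_adult."""
    return Counter("_".join(s) for s in samples)


def singleton_groups(samples):
    """Group labels that are represented by exactly one sample."""
    return sorted(g for g, n in group_counts(samples).items() if n == 1)


def residual_df(samples):
    """Residual degrees of freedom of a one-coefficient-per-group model:
    number of samples minus number of groups."""
    return len(samples) - len(group_counts(samples))


for label, n in sorted(group_counts(samples).items()):
    print(f"{label}: {n} sample(s)")
print("groups with a single sample:", singleton_groups(samples))
print("residual degrees of freedom:", residual_df(samples))
```

Running a check like this on the sample annotation before fitting makes it easy to spot coefficients that rest on one sample.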

Reply by Lluís Revilla Sancho, 5.4 years ago