Question

Using gcrma() and a t-test to compare single genes of interest between disease and healthy samples?

Pratik Mehta ▴ 10

Hello Bioconductor community,

I downloaded .cel files from an Affymetrix U133A array study from NCBI GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55235

Next, to get more up-to-date annotation, I used the version 25 GENCODE custom CDF files from the Brainarray website (umich.edu) and read in the data with that custom CDF and the affy package: http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

Afterwards, I used this tutorial/guide to perform gcrma() normalization: https://www.biostars.org/p/61987/#62003

Finally, the 10 rheumatoid arthritis (RA) samples were compared with the 10 ND (healthy) samples using a t-test for single genes of interest, without any p-value adjustment or modeling: just the mean of the 10 RA samples vs. the mean of the 10 healthy samples for gene x, repeated for gene y, generating a t value and its corresponding p-value at an alpha of 0.05.
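For concreteness, the per-gene comparison described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the analysis that was actually run (that was done in R). The function names `pooled_t_statistic` and `benjamini_hochberg` and all expression values are hypothetical. It computes the equal-variance two-sample t statistic for one gene and shows the Benjamini-Hochberg adjustment that would apply if p-values for several genes were considered together.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic (equal variances) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log2 expression values for one gene (10 RA vs 10 healthy).
ra = [8.1, 7.9, 8.4, 8.6, 8.0, 8.3, 8.2, 8.5, 7.8, 8.4]
nd = [7.5, 7.7, 7.4, 7.9, 7.6, 7.3, 7.8, 7.5, 7.6, 7.7]
t_value, df = pooled_t_statistic(ra, nd)
print(f"t = {t_value:.3f} on {df} degrees of freedom")
```

With only two genes tested, the adjustment changes little; it matters when the genes of interest were picked from a larger set that was also examined.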

Is this correct or flawed? Could a housekeeping gene such as GAPDH be used to normalize, or just to see how the expression of GAPDH (or another housekeeping gene) compares with that of gene x and gene y?

As a final point, I also ran the analysis as described in section 8.2 of the limma User's Guide. But I was wondering: would it be correct to leave it as it has already been done, with single genes and a t-test? And again, could a housekeeping gene be used as a reference for "baseline" expression when looking at gene x and gene y?

Thank you very much in advance.

Respectfully, Pratik

gcrma CustomCDF Affyhgu133aExpr limma affy • 1.3k views

Updated 2.0 years ago by James W. MacDonald 68k • written 2.0 years ago by Pratik Mehta ▴ 10

score 2 · Accepted Answer · 2023-04-14

James W. MacDonald 68k

You should use limma. Also, you have already normalized using GCRMA, so further normalization using a housekeeping gene will be of no benefit.

Answer • 2.0 years ago • James W. MacDonald 68k

Thank you very much for your answer.

If I am understanding correctly (and maybe extrapolating a little here), would plotting the mean GCRMA-normalized values for gene X in healthy and disease samples (as a histogram) be acceptable?

Then, as a replacement for the single-gene t-test that has already been done (for determining statistical significance), would I instead use confint=TRUE when running topTable in limma, as suggested here: Display error (error bars) for fold-change estimate from replicates in edgeR and here: Confidence intervals on edgeR logFC (to add error bars to the above histogram)?
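As a rough illustration of what such error bars would represent: gcrma() values are on the log2 scale, so the difference in group means is a log2 fold change, and an ordinary t-based 95% confidence interval for it can be sketched as below. This is a standard-library Python sketch with a hypothetical function name (`mean_difference_ci`) and made-up values. It uses the ordinary pooled variance, whereas limma's `confint=TRUE` intervals are built on its moderated variance estimates, so the numbers would differ.

```python
import math
from statistics import mean, variance


def mean_difference_ci(disease, healthy, t_crit):
    """Difference in means (disease - healthy) with a t-based confidence interval.

    t_crit is the two-sided critical value of Student's t for
    len(disease) + len(healthy) - 2 degrees of freedom.
    """
    n_d, n_h = len(disease), len(healthy)
    df = n_d + n_h - 2
    # Pooled variance under the equal-variance assumption.
    pooled_var = ((n_d - 1) * variance(disease) + (n_h - 1) * variance(healthy)) / df
    se = math.sqrt(pooled_var * (1 / n_d + 1 / n_h))
    diff = mean(disease) - mean(healthy)
    return diff, diff - t_crit * se, diff + t_crit * se


# Two-sided 95% critical value of Student's t with 18 df (10 + 10 - 2 samples).
T_CRIT_DF18 = 2.101

# Hypothetical log2 expression values for gene X.
ra = [8.1, 7.9, 8.4, 8.6, 8.0, 8.3, 8.2, 8.5, 7.8, 8.4]
nd = [7.5, 7.7, 7.4, 7.9, 7.6, 7.3, 7.8, 7.5, 7.6, 7.7]
log_fc, lower, upper = mean_difference_ci(ra, nd, T_CRIT_DF18)
print(f"log2 fold change = {log_fc:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```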

The objective here is to make sure the analysis is conventionally correct before submitting the manuscript. I provided the gcrma-normalized values to a colleague about 2 years ago, and a histogram was made with the mean gcrma-normalized values for gene of interest X, but it used the single-gene t-test (as described in the original question) to make the error bars. I just want to make sure that what we are submitting now is solid evidence for, hopefully, the truth or something closer to it.

Thank you again :)

Respectfully, Pratik

Reply • 2.0 years ago • Pratik Mehta ▴ 10

Your questions are off-topic. I am unable to say what you should plot for a manuscript. As the author that is entirely up to you.

Using limma is about as conventionally correct as you can get for microarray analyses. Over 22K citations is probably sufficient for any journal, I would imagine.

Reply • 2.0 years ago • James W. MacDonald 68k