Advice for analyzing Affy data

Tan, MinHan ▴ 180

Good evening,

I'm new to R and Affymetrix data analysis, and I'd truly appreciate some pointers on how to proceed. I'm really not sure what I'm doing wrong.

(a) I performed the ReadAffy() steps and created expression sets of my data in both MAS5 (esetmas) and RMA format. (The magnitude difference is quite startling.)

(b) I used genefilter to perform some simple filtering:

    f1 <- kOverA(4, 150)
    ffun <- filterfun(f1)
    whichmas <- genefilter(exprs(esetmas), ffun)
    exprData <- exprs(esetmas)
    filterData <- exprData[whichmas, ]

(c) I'm not sure how to perform the ideal form of unsupervised clustering, or how best to view the results as plots:

    hc <- hclust(dist(filterData), "ave")
    plot(hc)

All I see is some very skewed-looking data, with lots of the AFFX genes still present. I've tried running the GeneSOM function, but I don't quite understand the output.

Thank you!

Best regards,
Min-Han Tan, MD, MRCP(UK)
Laboratory of Cancer Genetics
Van Andel Research Institute

Cancer genefilter • 1.7k views

Updated 21.2 years ago by James W. MacDonald • written 21.2 years ago by Tan, MinHan

James W. MacDonald 67k

First off, I would recommend using the RMA expression values rather than the MAS expression values. It is not surprising that you are still seeing 'AFFX' probesets, because the MAS expression values are very noisy, especially at the low end.

Secondly, you might try an additional filter. Right now you are keeping genes where at least four samples have values larger than 150, but this doesn't remove genes that don't really change much between sample types. Since these genes are (by definition) less interesting, it is better to get rid of them before doing any clustering. I usually exclude genes where the CV (coefficient of variation) is less than some ad hoc value. I base the cutoff on the number of genes I end up filtering (real scientific, I know...).

How many genes I keep depends on what I plan to do with the clustering result. If you are only interested in seeing how the samples cluster, the number of genes used is not that critical, apart from the time and compute power required. However, if you are going to make a heat map or some other pretty picture, then you really need to limit the number of genes, because heat maps become too large to be useful at about 150 genes or so.

HTH,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
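The CV filter described above could be sketched roughly as follows. This is only an illustration in base R on simulated data, not Jim's actual code: the matrix `expr`, the cutoff `n_keep`, and all the numbers are hypothetical stand-ins for something like the RMA expression matrix. The genefilter package also offers a `cv()` filter that may be usable with `filterfun()` in the same way as `kOverA()`, but that is not shown here.

```r
# Simulated log2-scale expression matrix: genes in rows, samples in columns.
# Everything here is a hypothetical stand-in for a real expression matrix.
set.seed(1)
n_genes <- 500
n_samples <- 8
expr <- matrix(rnorm(n_genes * n_samples, mean = 8, sd = 0.2),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_len(n_samples))))

# Make the first 40 genes differ between two groups of samples.
expr[1:40, 5:8] <- expr[1:40, 5:8] + 3

# Coefficient of variation per gene: standard deviation / |mean|.
gene_cv <- apply(expr, 1, sd) / abs(rowMeans(expr))

# Ad hoc cutoff chosen by the number of genes to keep (assumed value).
n_keep <- 50
keep <- rank(-gene_cv, ties.method = "first") <= n_keep
filtered <- expr[keep, ]

# To cluster samples rather than genes, transpose before computing distances.
hc <- hclust(dist(t(filtered)), method = "average")
plot(hc)
```

Note that `dist()` works on rows, so `dist(filterData)` as in the original question clusters genes; transposing clusters the samples instead. With around 50 genes retained, the filtered matrix is also small enough for a readable heat map.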

Written 21.2 years ago by James W. MacDonald

Hi!

The Tukey biweight summary statistic method (mas) applies no data transformation when raw values are supplied in the AffyBatch object; however, it magically transforms the data when they are log or arcsinh transformed. What transformation is used? Where could I find out myself? And is there a way (an option?) to stop expresso from doing it?

Here is my code fragment:

    PTNbS.vsn.mas <- expresso(PTNbS.vsn, bg.correct = FALSE, normalize = FALSE,
                              pmcorrect.method = "subtractmm", summary.method = "mas")

Cheers,
Hinnerk

Written 21.2 years ago by Hinnerk Boriss
