Question

Affymetrix Mogene 1 0 ST

Al Ivens ▴ 30

Hi all,

I am analysing some Affy mogene-1_0-st-v1 arrays, and have noted that in their latest annotation resource file for these arrays:

http://www.affymetrix.com/Auth/analysis/downloads/na27/wtgene/MoGene-1_0-st-v1.na27.mm9.transcript.csv.zip

there is a column called "category", which is summarised below:

> summary(affy_annotation$category)
control->affx                  22
control->bgp->antigenomic      45
main                        28815
normgene->exon               1324
normgene->intron             5222
rescue->FLmRNA->unmapped       91

The "normgene->exon" loci are annotated as "positive controls", whilst the "normgene->intron" loci are annotated as "negative controls". I have searched the Affy www site for information on what the positive controls might be, but the best I have managed so far is "from an exonic region of a normalization control gene".

I have two questions:

1) Has anyone else had more success in tracking down more info on these loci (other than getting the sequences and blasting them)? I am guessing that they might be for spike-ins, but have not seen anything to confirm this.

2) Out of curiosity, I was wondering whether the positive controls could be used for normalisation, but could not find a way of doing it (i.e. as part of rma). I am used to controlspots=xyx for two-colour arrays (normalizeWithinArrays), but have no experience in trying to apply the same approach to Affy data.

Any thoughts/suggestions/guidance greatly appreciated.

Best festive wishes to you all,

a
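For readers who want to reproduce the category tally above without R, here is a minimal Python sketch. It is illustrative only: the sample rows and their IDs are made up, the "transcript_cluster_id" column name is an assumption, and the sketch also assumes the annotation CSV starts with "#"-prefixed comment lines that must be skipped before the header row. Only the "category" column name and its values come from the post.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt standing in for the unzipped transcript annotation CSV.
# The IDs are invented; only the "category" values come from the post.
SAMPLE = '''#%genome-version=mm9
"transcript_cluster_id","category"
"id_1","control->affx"
"id_2","main"
"id_3","main"
"id_4","normgene->exon"
"id_5","normgene->intron"
"id_6","normgene->intron"
'''


def count_categories(handle, column="category"):
    """Tally the values of `column`, skipping lines that start with '#'."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines)
    return Counter(row[column] for row in reader)


counts = count_categories(io.StringIO(SAMPLE))
for category, n in sorted(counts.items()):
    print(f"{category}\t{n}")
```

To run it on a real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` sample.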

Annotation Normalization affy • 2.5k views

written 16.3 years ago by Al Ivens • updated 7.1 years ago by gil.hornung

score 0 · Answer 1 · 2008-12-23

Hi Al,

I don't know about the Mouse Gene arrays, but with regard to the Human Exon arrays, I came across something in one of the Affymetrix manuals (exon_array_design_technote.pdf) that says the intron/exon control probe sets are derived from the consensus sequences of the HG-U133 Plus 2 array normalization controls.

For the mouse and rat 3' IVT arrays such as the MOE430 and RAE230, the normalization controls were the first 100 probe set IDs on the chip, i.e. probeset IDs 1415670_at to 1415769_at.

Regards,
Maria

score 0 · Answer 2 · 2018-02-28

I was also wondering about the normgene->exon and normgene->intron probe types, and after a long search found it in the "Quality Assessment of Exon and Gene Arrays" whitepaper from Affymetrix.

This is the current link:

https://tools.thermofisher.com/content/sfs/brochures/exon_gene_arrays_qa_whitepaper.pdf

And the paragraphs about the normalization genes:

neg_control is the set of putative intron based probe sets from putative housekeeping genes. Specifically, a number of species specific probesets on 3’ IVT arrays were shown to have constitutive expression over a large number of samples. The genes for these probesets were identified and multiple four probe probesets were selected against the putative intronic regions. (See the respective exon array design Technote for more information.) Thus in any given sample some (or many) of these putative intronic regions may be transcribed and retained. Furthermore, some (or many) of the genes may not be constitutive within certain data sets. These caveats aside, this collection makes for a moderately large collection of probesets which in general have very low signal values. These probesets are used to estimate the false positive rate for the pos_vs_neg_auc metric.

pos_control is the set of putative exon based probe sets from putative housekeeping genes. Specifically, a number of species specific probesets on 3’ IVT arrays were shown to have constitutive expression over a large number of samples. The genes for these probesets were identified and multiples of four probe probesets were selected against the putative exonic regions. (See the respective exon array design Technote for more information.) Thus in any given sample some (or many) of these putative exonic regions may not be transcribed or may be spliced out. Furthermore, some (or many) of the genes may not be constitutive within certain data sets. These caveats aside, this collection makes for a moderately large collection of probesets with target present which in general have moderate to high signal values. These probesets are used to estimate the true positive rate for the pos_vs_neg_auc metric.

The pos_control and all_probeset categories are useful in getting a handle on the overall quality of the data from each chip. Metrics based on these categories will reflect the quality of the whole experiment (RNA, target prep, chip, hybridization, scanning, and griding) and the nature of the data being used in downstream statistical analysis. The polya_spike category are useful for identifying potential problems with the target prep phase of the experiment; the bac_spike category are useful for identifying potential problems with the hybridization and chip. The caveat with these two categories is the limited number of spikes. Thus they should be used to troubleshoot problems whereas the pos_control and all_probeset categories should be used to assess overall quality.

The AUC metric they use to check the differences between pos and neg:

pos_vs_neg_auc is the area under the curve (AUC) for a receiver operating characteristic (ROC) plot comparing signal values for the positive controls to the negative controls. (See Section IV below for more information on the positive and negative probeset categories). The ROC curve is generated by evaluating how well the probe set signals separate the positive controls from the negative controls with the assumption that the negative controls are a measure of false positives and the positive controls are a measure of true positives. An AUC of 1 reflects perfect separation whereas as an AUC value of 0.5 would reflect no separation. Note that the AUC of the ROC curve is equivalent to a rank sum statistic used to test for differences in the center of two distributions. In the case of the exon and gene arrays the positive and negative controls are pseudo positives and negatives (see below). In practice the expected value for this metric is tissue type specific and may be sensitive to the quality of the RNA sample. Values between 0.80 and 0.90 are typical. For exon level analysis an additional ROC AUC metric is reported based on Detected Above BackGround (DABG) detection p-values, dabg_pos_vs_neg_auc.
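The equivalence between the ROC AUC and a rank sum statistic mentioned above can be sketched in a few lines of Python. This is not Affymetrix's implementation, just a minimal illustration: the function name `roc_auc` and the signal values are made up, and ties are handled with midranks (an assumption, since the whitepaper excerpt does not say how ties are treated).

```python
def roc_auc(pos, neg):
    """AUC via the Mann-Whitney U statistic.

    Equals the fraction of (positive, negative) pairs in which the positive
    control has the higher signal, counting ties as half a pair.
    """
    if not pos or not neg:
        raise ValueError("both groups need at least one value")

    # Pool the two groups, remembering which values are positive controls.
    combined = sorted([(v, 1) for v in pos] + [(v, 0) for v in neg])

    rank_sum_pos = 0.0
    i, n = 0, len(combined)
    while i < n:
        # Find the run of tied values starting at index i.
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Ranks i+1 .. j (1-based) share their average, the midrank.
        midrank = (i + 1 + j) / 2.0
        n_pos_in_run = sum(label for _, label in combined[i:j])
        rank_sum_pos += midrank * n_pos_in_run
        i = j

    n_pos, n_neg = len(pos), len(neg)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Made-up log2 signals for positive (exon) and negative (intron) controls.
pos_signal = [9.1, 8.4, 7.9, 5.2]
neg_signal = [3.0, 4.1, 5.6, 2.8]
print(roc_auc(pos_signal, neg_signal))  # 0.9375 (15 of 16 pairs ordered correctly)
```

Perfectly separated groups give 1.0 and identical groups give 0.5, matching the interpretation in the quoted paragraph.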