Hi, I have microarray data from the Affymetrix Human Genome U133 Plus 2.0 array to analyze. A previous student in my team analyzed it using Chipster with the chiptype hgu133plus2hsentrezg.db, leading to 19425 probes after normalization. When I run the analysis myself and upload the CEL files, the chiptype is hgu133plus2.db and I obtain 54675 probes after normalization. Could you please explain the difference between the two chiptypes and which one I should use?
I am not familiar with Chipster, but hgu133plus2hsentrezg.db refers to the annotation of the Affymetrix HGU133plus2 GeneChip that is based on its so-called remapped (custom) chip definition file (CDF). These custom CDFs are in essence updated CDFs, and are (were?) created by the BrainArray group of the University of Michigan. See here and here. Corresponding paper is here. The use of entrezg in its name shows that the probes were remapped based on the genome annotation provided by NCBI (on the level of 'entrez gene').
The file hgu133plus2.db contains the annotation information that is based on the original CDF created by Affymetrix at the time they designed this array.
You could say that the custom CDF-based files are more up-to-date than the original files. Again, check the paper for all details.
Please note that in Affymetrix's original design multiple probesets often target the same gene, although the specificity may differ between probesets. In contrast, in the custom CDF based on entrez gene the probesets are remapped on the level of updated gene annotations, and therefore in this case (by definition) the expression of each gene is probed by a single probeset. That is the reason why the remapped CDF 'consists' of fewer probesets.
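To make the difference in counts concrete, here is a small toy sketch in Python. It is only an illustration of the counting argument: the probeset IDs and gene names are made up, and the real remapping realigns individual probes to the genome rather than simply grouping existing probesets, so please do not read this as how the custom CDF is actually built.

```python
from collections import defaultdict

# Hypothetical original-style annotation: probeset ID -> target gene.
# None stands for a probeset with no gene assigned. All names are invented.
original_cdf = {
    "ps_0001_at": "GENE_A",
    "ps_0002_s_at": "GENE_A",
    "ps_0003_x_at": "GENE_A",
    "ps_0004_at": "GENE_B",
    "ps_0005_at": "GENE_B",
    "ps_0006_at": "GENE_C",
    "ps_0007_at": None,
}


def collapse_to_genes(annotation):
    """Group probesets by target gene, dropping those without a gene."""
    by_gene = defaultdict(list)
    for probeset, gene in annotation.items():
        if gene is not None:
            by_gene[gene].append(probeset)
    return dict(by_gene)


gene_level = collapse_to_genes(original_cdf)

# Several original probesets end up represented by one gene-level entry.
print(len(original_cdf), "probesets in the original-style annotation")
print(len(gene_level), "gene-level probesets after collapsing")
```

In this toy case seven probesets become three gene-level entries, which is the same kind of reduction you see when going from 54675 to 19425 on the real array.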
Which one to use is up to you. If you would like to use the results as intended by Affymetrix, go for the default/original CDF. If you would like to incorporate the improvements in annotation that happened after Affymetrix prepared their chip design (and also would like to just have a gene detected by a single probeset), go for the remapped/custom CDF.
Thank you very much for all this explanation!!! It is much clearer to me now!
So I think I will use hgu133plus2hsentrezg.db since it is maybe the most updated one!
Many thanks for taking the time!