Hi,
I have a quick question regarding the annotation package for the HuGene 2.0 array. Previously, with the pd.hugene.2.0.st_3.10.0 package (R_3.1.3, oligo_1.30.0), I was able to find annotation for the gene PTEN. However, after updating the package to pd.hugene.2.0.st_3.14.1 (R_3.2.1, oligo_1.32.0), I get NA for geneassignment. Output from the old version is shown first, followed by the new version:
> features <- pData(featureData(dataset$eset))
> features <- features[features$category == 'main',c('transcriptclusterid','seqname','start','stop','geneassignment','mrnaassignment', 'unigene')];
> x <- features[features$transcriptclusterid == '16707030',]
> str(x)
'data.frame': 1 obs. of 7 variables:
$ transcriptclusterid: int 16707030
$ seqname : chr "chr10"
$ start : int 89622870
$ stop : int 89731687
$ geneassignment : chr "ENST00000371953 // PTEN // phosphatase and tensin homolog // 10q23.3 // 5728 /// NM_000314 // PTEN // phosphatase and tensin ho"| __truncated__
$ mrnaassignment : chr "ENST00000371953 // ENSEMBL // cdna:known chromosome:GRCh37:10:89622870:89731687:1 gene:ENSG00000171862 gene_biotype:protein_cod"| __truncated__
$ unigene : chr "ENST00000371953 // Hs.500466 // adipose tissue| adrenal gland| bladder| blood| bone| brain| cervix| connective tissue| ear| emb"| __truncated__
-------------------------------------------------------------------------------------------------------------------------------------
> features <- pData(featureData(dataset$eset));
> features <- features[features$category == 'main',c('transcriptclusterid','seqname','start','stop','geneassignment','mrnaassignment', 'unigene')];
> x <- features[features$transcriptclusterid == '16707030',]
> str(x)
'data.frame': 1 obs. of 7 variables:
$ transcriptclusterid: int 16707030
$ seqname : chr "chr10"
$ start : int 89622870
$ stop : int 89731687
$ geneassignment : chr NA
$ mrnaassignment : chr NA
$ unigene : chr NA
Thanks,
Sylvia
Hi James,
Got it! Thanks for the quick reply! I think I'll use hugene20sttranscriptcluster.db instead then.
Best,
Sylvia
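For anyone following along, here is a minimal sketch of looking up the same transcript cluster in hugene20sttranscriptcluster.db. It assumes the package is installed from Bioconductor and that transcript cluster IDs are the PROBEID keys, as is the convention for Bioconductor ChipDb packages. The columns requested are just an illustrative choice, and no output is shown because the result depends on the annotation build in the installed version.

```r
# Assumption: hugene20sttranscriptcluster.db is installed (Bioconductor).
library(hugene20sttranscriptcluster.db)

# Transcript cluster ID from this thread; keys must be character strings.
ids <- c("16707030")

# AnnotationDbi::select() returns a data.frame with one row per mapping.
annot <- AnnotationDbi::select(
  hugene20sttranscriptcluster.db,
  keys    = ids,
  columns = c("SYMBOL", "ENTREZID", "GENENAME"),
  keytype = "PROBEID"
)
annot
```

If the symbol comes back as NA here too, the mapping is missing in that build as well.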
Hi James,
Sorry, but I just have another clarifying question. I was wondering where you obtained the sequence for blat. I went to netaffx and looked up the sequence for transcriptclusterID 16707030 using:
https://www.affymetrix.com/analysis/netaffx/exon/wtgene_transcript.affx?pk=712:16707030
and I found that the sequence overlapped with PTEN:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr10%3A89622870-89731687&hgsid=452878227_H7OFAdKA7suL8UyYDlBvVf3Owpkf
Would you mind correcting me if this was not the approach that you took? Sorry for the question. Thank you in advance for the clarification.
Best,
Sylvia
Hi Sylvia,
You're right - I just copied the first bit of the transcript from Affy. If you click on the link to get the whole transcript and then blat, it does indeed cover PTEN.
Like I said before, we are just passing on what we get from Affy. A while back a co-worker of mine who was working with Exon ST arrays noticed that the na34 build from Affy had far fewer annotations than an earlier build he had used. So we contacted Affy and they told us that they knew the na34 build was bad, and had been working on an update, which is the na35 build. It may well be that na35 isn't particularly good either, but 'good' depends on your frame of reference. If they are getting 95% of the annotations right, then it might be good for most people. But if the gene you care about is messed up, then obviously it's not good enough for you.
Unfortunately we don't have enough people to fully annotate the arrays (we supply annotation packages for way too many arrays for that to happen), so we have to rely on the manufacturers to get it right.
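To get a rough idea of how much annotation is missing in a given build, something like the sketch below may help. The helper name na_fraction is hypothetical (it is not part of oligo or any pd.* package), and the toy data frame only reuses the two versions of the row posted earlier in this thread, keeping just the first, untruncated geneassignment record. With real data you would pass in the features data frame instead. Note that this measures missing annotations only, not whether the ones present are correct.

```r
# Hypothetical helper: fraction of NA values in each requested column.
na_fraction <- function(features, cols) {
  vapply(cols, function(col) mean(is.na(features[[col]])), numeric(1))
}

# Toy input: the same transcript cluster as reported under the old and the
# new package version. Only the first geneassignment record is used, since
# the rest was truncated in the posted output.
toy <- data.frame(
  transcriptclusterid = c(16707030L, 16707030L),
  geneassignment = c(
    "ENST00000371953 // PTEN // phosphatase and tensin homolog // 10q23.3 // 5728",
    NA
  ),
  stringsAsFactors = FALSE
)

na_fraction(toy, "geneassignment")
```

Running the same call on the features data frame from each package version gives a quick side-by-side view of how the two builds compare.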
Hi James,
Thanks for the clarification! Yeah, I checked the whole transcript and found that it overlaps with PTEN. I'll contact Affy as well to see what they think about this.
Best,
Sylvia