Hello..I have a list of .CEL files from HTA 2.0 chips. I created an ExpressionSet object by reading them with the R package "oligo". This object has data on 70523 features (genes) and 48 samples. What is best way to annotate these genes so that I can create a "GenomicRanges" object with the following attributes: Chromosome, Transcript_start, Transcript end ?
Sincere thanks for your reply!…But this gives the annotation for 67758 transcripts, while I have the expression data on 70523 transcripts. How can I get the annotation for the remaining transcripts?
which removes all the data where the seqname (chromosome) is missing. It turns out there are 2995 probesets for which there is no chromosome in the annotation file. If you don't have a chromosome, how will you add it to the GRanges object? If we look a bit closer at these probesets, they don't have start/end information either:
And if we go to netaffx to see what these probesets are supposed to measure, we either return no results, or we get redirected to some random page. Try querying for 2824546_st at netaffx, or better yet, one of these PSR probesets:
I have a similar situation. I have read all my 30 CEL file into R and Normalized them using Oligo package. How can I use pd.hta.2.0 package to annotate my data ? I have been searching for codes. From what I have seen some people have tried to build their on library but I am not sure if is that a necessary step to annotate my data?
The short answer is that you can't. If you re-read the code in my previous answer, you will note this line:
which removes all the data where the seqname (chromosome) is missing. It turns out there are 2995 probesets for which there is no chromosome in the annotation file. If you don't have a chromosome, how will you add it to the GRanges object? If we look a bit closer at these probesets, they don't have start/end information either:
And if we go to netaffx to see what these probesets are supposed to measure, we either return no results, or we get redirected to some random page. Try querying for 2824546_st at netaffx, or better yet, one of these PSR probesets:
So long story short, you cannot get the position information for all of the probesets, because some of them have no such information.
The 2995 probesets in question are various types of controls, and are not considered main content for gene level analysis.