Question

What is the best way to annotate and filter transcript clusters in HTA 2.0?

Marco_aurelio • 0

@marco_aurelio-23595

Spain

Hello everybody.

I am Marco, and I work as a bioinformatician for a research company. I'm using HTA 2.0 microarrays to analyze cancer cells. My question is how to properly annotate and filter genes in an HTA 2.0 analysis.

I found different ways to do it on the internet, but I don't know which one is the most appropriate.

For example, I was following this webpage: https://www.bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html

In this case, I used this code:

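    library(oligo)          # read.celfiles(), rma()
    library(affycoretools)  # annotateEset()
    library(pd.hta.2.0)     # platform design package for HTA 2.0

    # Read CEL files (assumes they are in the current working directory)
    dat <- read.celfiles(list.celfiles())
    # RMA normalization
    eset <- rma(dat)
    # Annotation: adds SYMBOL and GENENAME to fData(eset)
    eset <- annotateEset(eset, pd.hta.2.0)
    # Load the NetAffx transcript annotation shipped with pd.hta.2.0
    load(system.file("extdata/netaffxTranscript.rda", package = "pd.hta.2.0"))
    # netaffxTranscript is an object created by load(), not a string
    annot <- pData(netaffxTranscript)
    annot <- annot[featureNames(eset), ]
    # Add the locus type as an extra feature column
    fData(eset)$LOCUSTYPE <- annot$locustype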

After annotation of the transcript clusters I added the gene symbol (SYMBOL) and a short description of the gene the cluster represents (GENENAME) and extra information (Locustype etc.). In a second step, I filtered out the probes that do not map to a gene.

    # Remove transcript clusters with NA in the SYMBOL column
    eset <- eset[!is.na(fData(eset)$SYMBOL), ]

And I got this:

[Image not available]

However, I have genes with different gene IDs but the same GeneName. Do I need to filter these as well? I thought of using the median absolute deviation (MAD) to eliminate duplicate GeneNames and keep, for each name, the entry of greatest interest. The probes of interest in our study are those that show the greatest variability.

What do you think about this?
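A minimal sketch of this MAD-based de-duplication, using base R only and a made-up toy matrix so it runs without any CEL files. MAD here means the median absolute deviation as computed by `stats::mad`. The IDs, symbols, and values are hypothetical; with a real ExpressionSet the matrix would presumably come from `exprs(eset)` and the symbols from `fData(eset)$SYMBOL`.

```r
# Toy expression matrix: rows are transcript clusters, columns are samples.
# All IDs, symbols, and values are invented for illustration only.
expr <- matrix(
  c(5.0, 5.1, 5.2, 5.1,   # TC01, GENEA, low variability
    4.0, 6.0, 8.0, 6.5,   # TC02, GENEA, high variability
    7.0, 7.5, 6.5, 7.2,   # TC03, GENEB
    3.0, 3.1, 2.9, 3.0),  # TC04, no gene symbol
  nrow = 4, byrow = TRUE,
  dimnames = list(c("TC01", "TC02", "TC03", "TC04"), paste0("S", 1:4))
)
symbol <- c("GENEA", "GENEA", "GENEB", NA)

# Median absolute deviation of each row across samples
row_mad <- apply(expr, 1, mad)

# Consider only rows that map to a gene symbol
idx <- which(!is.na(symbol))

# For each symbol, keep the index of the row with the largest MAD
best <- tapply(idx, symbol[idx], function(i) i[which.max(row_mad[i])])

# Subset the matrix to one row per symbol, preserving the original order
expr_unique <- expr[sort(as.integer(best)), , drop = FALSE]
print(rownames(expr_unique))
```

On a real object, the selected row names could then be used to subset the ExpressionSet, for example `eset[rownames(expr_unique), ]`. Whether the most variable cluster is the right representative for a gene is an analysis decision, not something the code settles.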

score 0 · Answer 1 · 2020-09-22

The HTA 2.0 arrays are Affy's 'answer' to RNA-Seq, and are intended to measure transcripts rather than genes. They are actually intended to allow people to detect transcript variants and stuff, so if you are just interested in differential gene expression, you are hunting squirrels with a bazooka. Ideally you would have used the Gene ST array, which is simpler and IMO more useful. But maybe your group got a really good deal? It seems like Affy can't give the HTA arrays away, so it's possible.

Anyway, you are asking a perennial question, and it is actually an analysis question rather than a software question. We can help you with how to get a package to do what you want, but cannot (and you shouldn't want us to) tell you how to analyze your data. If you are the analyst, then you are the analyst and it's up to you to analyze. Asking mostly pseudonymous randos on the interwebs if they think you are doing the right thing is probably not the ideal way to proceed.