Hello everybody. I'm new to this field. I recently ran the HTA 2.0 annotation using the two packages oligo and affycoretools, as outlined in the post https://support.bioconductor.org/p/89308/.
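```r
library(oligo)          # list.celfiles(), read.celfiles(), rma()
library(affycoretools)  # getMainProbes(), annotateEset()
library(pd.hta.2.0)     # platform design package for HTA 2.0

dir <- "path/to/CEL_files"  # placeholder: folder containing the .CEL files
files <- list.celfiles(dir, full.names = TRUE)
# Normalization (RMA summarized at the transcript-cluster "core" level)
norm_data <- oligo::rma(read.celfiles(filenames = files), background = TRUE, normalize = TRUE, target = "core")
# Save data
write.exprs(norm_data, file = "./BaselineVsControls/Data_normalized.txt")
# Load data
data <- read.table("./BaselineVsControls/Data_normalized.txt", header = TRUE)
# Rebuild an ExpressionSet from the bare matrix
med <- new("ExpressionSet", exprs = as.matrix(data))
# Main probes only (type == 1); which() treats NA types as FALSE, as the original replace() call did
main <- getMainProbes("pd.hta.2.0")
main <- main[which(main$type == 1), ]
# Annotate
eset.main <- annotateEset(med, pd.hta.2.0)
eset.main <- eset.main[main$transcript_cluster_id, ]
eset.main.med <- rowMedians(exprs(eset.main))
```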
Everything runs without errors; however, now I want to export only the non-coding genes.
The Thermo TAC (Transcriptome Analysis Console) program has a "non-coding gene" column in its final list, but I couldn't find a way to get that information in Bioconductor. Do you have any idea?
Thank you in advance.
Or, you could just add it to the `fData` slot.
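A minimal sketch of the "add it to the fData slot" idea, written in base R so it runs without Bioconductor. The toy matrix stands in for `exprs(eset)` and the toy data frame for `pData(netaffxTranscript)`; the column names `transcriptclusterid` and `locustype` and the value `"NonCoding"` are hypothetical placeholders, so check `colnames()` of the real annotation before adapting it. The key step is aligning the annotation to the expression rows with `match()` on the IDs rather than assuming both tables share the same row order.

```r
# Toy stand-ins (hypothetical values): rows are transcript cluster IDs
expr <- matrix(c(5.1, 7.3, 6.2, 8.0, 4.4, 9.1), nrow = 3,
               dimnames = list(c("TC_A", "TC_B", "TC_C"), c("s1", "s2")))

# Annotation table in a different row order; column names are hypothetical
annot <- data.frame(transcriptclusterid = c("TC_C", "TC_A", "TC_B"),
                    locustype = c("Coding", "NonCoding", "Coding"),
                    stringsAsFactors = FALSE)

# Feature data aligned to the expression rows by ID, never by position
feat <- data.frame(id = rownames(expr), stringsAsFactors = FALSE)
feat$locustype <- annot$locustype[match(feat$id, annot$transcriptclusterid)]

# Keep only non-coding rows; which() also drops IDs with no annotation (NA)
noncoding <- expr[which(feat$locustype == "NonCoding"), , drop = FALSE]

# With a real ExpressionSet (Biobase), the same idea would look roughly like:
# fData(eset)$locustype <- annot$locustype[match(featureNames(eset), annot$transcriptclusterid)]
# eset.nc <- eset[which(fData(eset)$locustype == "NonCoding"), ]
```

Using `which()` instead of a bare logical index matters here: IDs missing from the annotation give `NA` after `match()`, and a logical `NA` index would produce rows of `NA` instead of dropping them.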
When I run
I got an error:
Do you know how to fix this?
Yes. Don't do what I said, but instead what I meant.
And for your personal edification, you could have figured that out on your own. You got an error saying some blahblahblah about the `pData` function. At which point you should think 'Huh, what does the help page say about that function? This guy MacDonald seems to have steered me wrong.' And `?pData` would present the help page, the critical part being where it says that the object has to be either some sort of `eSet-class` or an `AnnotatedDataFrame`. And if you then did [...] you would know that you need to pass the `netaffxTranscript` object, rather than the character string "netaffxTranscript".

Thank you so much for your reply, and I apologize. I didn't know that `netaffxTranscript` existed, since I couldn't find a good guide to analyzing HTA 2.0 arrays, and this is my first time using R and Bioconductor.
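To make the object-versus-string distinction concrete, here is a base-R sketch that uses the built-in `mtcars` data set as a stand-in. The commented lines at the end show how the real `netaffxTranscript` object is usually loaded; the location `extdata/netaffxTranscript.rda` inside pd.hta.2.0 is an assumption, so confirm it with `system.file()` on your own installation.

```r
# A quoted name is only a length-1 character vector
obj_name <- "mtcars"
class(obj_name)        # "character"
class(mtcars)          # "data.frame"

# Functions behave according to the class of what they receive
nrow(obj_name)         # NULL: a string has no rows
nrow(mtcars)           # 32

# If all you have is the name, get() retrieves the object it refers to
same <- get(obj_name)
identical(same, mtcars)  # TRUE

# HTA case (assumption, verify on your system): load the AnnotatedDataFrame
# shipped with the pd package, then pass the *object* to pData():
# load(system.file("extdata", "netaffxTranscript.rda", package = "pd.hta.2.0"))
# annot <- Biobase::pData(netaffxTranscript)
```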
Hello James, one question. I promise I have tried to find the answer on the internet and on this forum, but I haven't found anything.
When I use my code to annotate the main probes, I do this:
When I do this, I get a lot of NA values in the gene symbol column. However, when I use your code, I can see that in the data frame that the `pData` function creates, most of the mRNA assignment column is not NA.
Do you know why? And what do you recommend?
Thank you very much in advance.
My recommendation is to not do what you are doing.
Why are you writing the data to disk and then having to recreate the `ExpressionSet` by hand? That makes no sense to me. You are apparently going through all the motions to generate an `HTAFeatureSet`, then throwing all that out by writing the results to a text file. The whole rationale for using Bioconductor tools in the first place is to have these nice data containers that hold all your data, which you can operate on as if they were simple `data.frame` objects.

There is nothing you can do with a text file that you couldn't have done with the original `HTAFeatureSet` object (well, except for reading it into Excel...), and there are any number of things that you could do with the `HTAFeatureSet` that you can't do with the text file.

Anyway, this isn't a site intended for people to get their code checked by an expert (or, god forbid, me); instead, it is intended to provide help in using the existing functionality in Bioconductor. Any code that you write is your responsibility.
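One concrete way the text-file round trip can bite, sketched with base `write.table()`/`read.table()` as stand-ins for `write.exprs()` and the `read.table()` call in the question: only the numbers and the row/column names are written out, and `read.table()` runs `make.names()` on the header by default (`check.names = TRUE`), so sample names that start with a digit or contain a hyphen come back altered. The sample and file names here are hypothetical.

```r
# Toy matrix with sample names shaped like CEL file names (hypothetical)
m <- matrix(c(5.1, 6.2, 7.3, 8.4), nrow = 2,
            dimnames = list(c("TC01000001.hg.1", "TC01000002.hg.1"),
                            c("01_A.CEL", "02-B.CEL")))

# Text round trip: only the numbers and dimnames are written out
txt <- tempfile(fileext = ".txt")
write.table(m, file = txt, sep = "\t", quote = FALSE)
back <- as.matrix(read.table(txt, header = TRUE, sep = "\t"))
colnames(back)   # "X01_A.CEL" "X02.B.CEL": check.names = TRUE rewrote them

# Binary round trip of the R object itself keeps everything
rds <- tempfile(fileext = ".rds")
saveRDS(m, rds)
identical(readRDS(rds), m)   # TRUE
```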
I personally just write the code to read in the CEL files and annotate the resulting object, and if I need to re-run, I re-run. You could also use BiocFileCache, or alternatively do my usual bootleg version: just save .RData or .rds files for the big things that take time to generate, and wrap the code in an `if` statement:
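A minimal sketch of that cache-and-`if` pattern in base R. `slow_step()` is a hypothetical stand-in for the expensive part (for example `read.celfiles()` plus `rma()` and `annotateEset()`), and the cache file name is made up. `saveRDS()` stores the complete R object, so an annotated `ExpressionSet` would come back with its feature and phenotype data intact, unlike a text export.

```r
# Hypothetical stand-in for a slow step such as read.celfiles() + rma()
slow_step <- function() {
  Sys.sleep(0.1)
  matrix(1:6, nrow = 3, dimnames = list(paste0("TC", 1:3), c("s1", "s2")))
}

cache_file <- file.path(tempdir(), "norm_data.rds")  # hypothetical file name

if (file.exists(cache_file)) {
  # Later runs: reuse the object saved previously
  norm_data <- readRDS(cache_file)
} else {
  # First run: compute, then save the whole R object (not a text export)
  norm_data <- slow_step()
  saveRDS(norm_data, cache_file)
}
```

Deleting the .rds file forces a fresh computation on the next run; in a real project the cache would live in the project folder rather than in `tempdir()`, which is cleared when the R session ends.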