Hi,
I am using Clariom D Assays arrays and have tried to annotate it using the following:
library(oligo)
rma <- oligo::rma(data)
library(affycoretools)
anno<- annotateEset(rma, pd.clariom.d.human)
write.table(anno, file = "anno.txt")
Unfortunately, I got not annotated txt file with samples as rows and TC* IDs as columns.
How can I get annotated file (with gene annotation and transcript genome position) and with samples as a column and IDs as a row.
Thank you
Anita
Hi James.
Thank you for your reply. Yes, you are right "write" didn't work properly, using data.frame did.
but in my annotated file I have the following columns:
"PROBEID" "ID" "SYMBOL" "GENENAME" "Xsample1" "Xsample2" (...)
1. I do not know why "X" is added at the beginning of all my sample names
2.What I would like to have is a chromosome number and chromosome start position for a gene to do expression quantitative trait loci. Is this something I could get from annotateEset?
Thank you
Anita
1.) Usually an X is appended if the column names don't meet certain criteria, in which case
make.names
is called. But if yoursampleNames
are sample1 - sampleX, that shouldn't happen:Anyway, it's simple enough to strip off the X characters. See ?gsub.
2.) You will have to do an extra couple of steps to add chromosomal positions. The genomic locations aren't in the orgDb packages any longer, so you will need the Homo.sapiens package:
Hi James,
Thanks very much. I have run:
data.rma <- oligo::rma(data)
library(affycoretools)
anno <- annotateEset(data.rma, pd.clariom.d.human)
library(Homo.sapiens)
fd <- fData(anno)
fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")
fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")
out <- data.frame(fData(fd$CHRLOC), exprs(fd$CHRLOC))
write.table(out, "data.txt")
but got an error:
Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) :
mapIds must have at least one key to match against.
Calls: mapIds -> mapIds -> mapIds_base
Execution halted
The data in the pdInfoPackage doesn't include the Entrez Gene ID. You are better off using the clariomdhumantranscriptcluster.db package.
And then proceed as I mentioned. But do note that this
isn't something I ever recommended you do, and whatever you think it will do, it won't. I leave it up to you to figure out why that is.
Thank you James, yes you are right fd$CHR and fd$CHR are list objects.
What I did was to change anno into matrix, list into data.frame and try cbind but the number or rows do not much.
matrix <- exprs(anno)
df <- data.frame(matrix(unlist(fd$CHRLOC)) )
df2 <- data.frame(matrix(unlist(fd$CHR)) )
results <- cbind(df,df2, t(matrix))
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 25762, 350
> dim(df2)
[1] 25762 1
> dim(df)
[1] 25762 1
> dim(matrix)
[1] 138745 350
Why didn't you simply use the code that I already provided?
Then you would normally proceed to analysis using e.g., limma. Or if you really want, you can do
and do whatever it is that you want to do with those data.
Thank you for that James. So, I run according to your suggestions, but the output doesn't look right:
fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")
'select()' returned 1:many mapping between keys and columns
fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")
'select()' returned 1:many mapping between keys and columns
fData(anno) <- fd
out <- data.frame(fd, exprs(anno))
write.table(out, "annotated.txt")
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
unimplemented type 'list' in 'EncodeElement'
I guess the error is because I have list objects in my data.frame.
> typeof(out$PROBEID)
[1] "integer"
> typeof(out$ENTREZID)
[1] "integer"
> typeof(out$SYMBOL)
[1] "integer"
> typeof(out$GENENAME)
[1] "integer"
> typeof(out$CHR)
[1] "list"
> typeof(out$CHRLOC)
[1] "list"
and unfortunately the output is truncated and looks like that:
PROBEID
ENTREZID
SYMBOL
GENENAME
CHR
CHRLOC
Xaample1 Xsample2 (...)
AFFX-BkGr-GC03_st
AFFX-BkGr-GC03_st
NA
NA
NA
Hi James,
I have tried
df <- apply(df,5,as.character)
fwrite(out, file ="..")
out$col5 = as.character(out$col5)
but nothing is working ...
Can I also ask if the annotation is build human genome build 37, and if not is it possible to change to 37?
Thanks again
Hi again James,
I have modified the lines and it worked:
out$CHR = as.character(out$CHR)
out$CHRLOC = as.character(out$CHRLOC)
out = as.matrix(out)
write.csv(out, "data.csv")
However, I have 2 question
1. how to add CHRSTOP , I have tried CDSSTOP but had
Error in .testForValidCols(x, cols) :
Invalid columns: CDSSTOP. Please use the columns method to see a listing of valid arguments.
2. Is the annotation based on human genome build 37?
Thanks!
So you got this message:
Which seems pretty clear to me; you asked for something that isn't valid, and here's how to get a list of things that are valid. If you are planning to do much of anything with R/Bioconductor, you will have to learn to figure things out for yourself, particularly those things where you get an error message that clearly spells out where you went wrong and how to fix it.
In addition, the code you show won't return a list, so your example code doesn't match what you must have done:
So if your out$CHR and out$CHRLOC are lists, it is most likely that you are doing
which as you found doesn't work as you might expect. And to answer your final question, note that the Homo.sapiens package has a
show
method: