Importing and Extracting Annotation
@atiqahrahman-10071

Hi all!

I'm new to R and am currently working on data from ZebGene-1_0-st arrays. However, I am having problems with the annotation: first, there is no annotation package in Bioconductor, and second, the sample workflow that I found for the array does not pass the sanity check (identical() does not return TRUE). I realised that the workflow below does not extract and reorder the annotation to match my probes. Any advice on how to overcome this problem would help! Thank you in advance :)

Workflow:

# Import the annotations
dat <- read.csv(file.path(metaDir, "ZebGene-1_0-st-v1.na33.3.zv9.transcript.csv"), comment.char = "#", stringsAsFactors = FALSE, na.strings = "---")
dat <- col2rownames(dat, "probeset_id")
# extract and reorder to match the array features
dat <- dat[row.names(fData(affyNorm.batch)),]
dat <- dat[,c("probeset_id", "seqname", "strand", "start", "stop", "gene_assignment", "mrna_assignment")]
dat <- as.matrix(dat)
# parse mrna_assignments
headercol <- "mrna_assignment"
mrnas <- t(sapply(strsplit(dat[, headercol], " /// "), function(x) {
  dat.probe.df <- do.call(rbind, strsplit(x, " // "))
  bestrna <- dat.probe.df[1,1]
  rnas <- paste(dat.probe.df[,1], collapse=",")
  c(bestrna, rnas)
  }))
mrnas <- as.data.frame(mrnas)
names(mrnas) <- c("best.mrna", "mrnas")
# parse gene assignments
headercol <- "gene_assignment"
genes <- t(sapply(strsplit(dat[, headercol], " /// "), function(x) {
  if(is.na(x[1])){
    out <- rep("NA", 6)
    } else {
      dat.probe.mat <- as.matrix(do.call(rbind, strsplit(x, " // ")))
      bestgene <- as.character(dat.probe.mat[1,1])
      dat.probe.vec <- apply(dat.probe.mat, 2, function(y) {
        paste(unique(y), collapse=",")
        })
      out <- as.character(c(bestgene,dat.probe.vec))
      }
  return(out)
  }))

genes <- as.data.frame(genes[,c(1,2,3,4,6)])
names(genes) <- c("bestgene", "accessions", "symbols", "descriptions", "entrezIDs")
genes <- rownames2col(genes, "probeids")
# combine mRNA and gene assignments
gene.annots <- cbind(genes, mrnas)
@james-w-macdonald-5106

This isn't really a good question for this site, as it is only tangentially related to Bioconductor packages, and has more to do with R coding and whatnot. And that sort of thing is IMO better learned by seeing how others have tackled similar problems and emulating what you think is reasonable.

So please note that I have very similar functionality in the devel version of affycoretools that you can see here (you want to look at .dataFromNetaffx). I would also point out a couple of things. First, the pdInfoPackage already comes with a parsed version of the annotation csv file that you can access using getNetAffx from the oligo package (which will already be loaded and available to you). Second, if you put the results into the featureData slot of your ExpressionSet, you can run validObject to make sure things line up correctly. That's a good validity check, plus the featureData slot will propagate through the limma package and end up in your topTable if you analyze your data using limma (which IMO you should).
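The "make sure things line up" check can be sketched in base R, independently of oligo or affycoretools. This is only an illustration under stated assumptions: the probe IDs, the annot table and its seqname values are made up, and in a real analysis probe_ids would come from the normalised object (for example its feature names) rather than being typed in.

```r
# Hypothetical probe IDs, in the order they appear in the expression data.
probe_ids <- c("12943002", "12943001", "12943004", "12943003")

# Hypothetical annotation table: different row order, plus one extra probeset.
annot <- data.frame(
  probeset_id = c("12943001", "12943002", "12943003", "12943004", "12943005"),
  seqname     = c("chr1", "chr1", "chr2", "chr3", "chr3"),
  stringsAsFactors = FALSE
)

# match() returns, for each probe ID, the row of annot that holds it
# (NA if the annotation has no entry for that probe).
idx <- match(probe_ids, annot$probeset_id)

# Extract and reorder the annotation rows to follow the array order.
annot_ordered <- annot[idx, , drop = FALSE]
rownames(annot_ordered) <- probe_ids

# Sanity checks: every probe was found, and the order now agrees exactly.
stopifnot(!anyNA(idx))
stopifnot(identical(annot_ordered$probeset_id, probe_ids))
```

Keeping probeset_id as an ordinary column, rather than only as row names, is what makes the identical() comparison possible after the reordering step.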

 
