Agi4x44PreProcess: filtering probe names from GeneName


Maria Raeder ▴ 10

@maria-raeder-4550


Dear Mailing List,

I have been struggling for some time with some Agilent single-channel arrays. I believe they were processed with an earlier version of Agilent Feature Extraction (AFE), because the files do not contain the Sequence and chr_coord columns. I have nevertheless tried to use the Agi4x44PreProcess package, with some adjustments (see the code below).

My main problem now is that, for some genes, the Agilent probe names are embedded in the gene-symbol (GeneName) column, and I cannot remove them. I need to do this to prepare input files for GSEA. The Agi4x44PreProcess function intended for this, gsea.files, does not work, probably because of the missing columns, and filter.probes also fails with an error, probably for the same reason.

I would be very grateful for any comments and help.

Thanks,
Maria

Here is the code:

```r
library("Agi4x44PreProcess")
library("hgug4112a.db")
library("vsn")
library("convert")
library("GO.db")

setwd("/mydirectory")

# reading the targets file
targets <- read.targets(infile = "targets_ec3.txt")
targets[1:10, 1:5]
names(targets)
# many columns (output omitted here), including FileName, Treatment and GErep

# read in the files with limma:
dd <- read.maimages(targets$FileName, source = "agilent",
                    columns = list(G = "gMedianSignal", Gb = "gBGUsed",
                                   R = "gProcessedSignal", Rb = "gBGMedianSignal"),
                    annotation = c("Row", "Col", "FeatureNum", "ControlType",
                                   "ProbeName", "ProbeUID", "GeneName",
                                   "SystematicName", "Description",
                                   "gIsWellAboveBG", "gIsFound", "gIsSaturated",
                                   "gIsFeatPopnOL", "gIsFeatNonUnifOL"))
# reads in 146 arrays

########## Quality control (skipped)

########## Background correction, normalization and log2 transformation:
library(vsn)
ddNORM <- BGandNorm(dd, BGmethod = "half", NORMmethod = "quantile",
                    foreground = "MeanSignal", background = "BGMedianSignal",
                    offset = 50, makePLOTpre = FALSE, makePLOTpost = FALSE)

# filtering:
ddFILT <- filter.probes(ddNORM, control = TRUE, wellaboveBG = TRUE,
                        isfound = TRUE, wellaboveNEG = TRUE, sat = TRUE,
                        PopnOL = TRUE, NonUnifOL = TRUE, nas = TRUE,
                        limWellAbove = 75, limISF = 75, limNEG = 75,
                        limSAT = 75, limPopnOL = 75, limNonUnifOL = 75,
                        limNAS = 100, makePLOT = TRUE,
                        annotation.package = "hgug4112a.db",
                        flag.counts = FALSE, targets)
```

Output:

```
FILTERING PROBES BY FLAGS

FILTERING BY ControlType FLAG
Error in data.frame(PROBE_ID, as.character(probe.chr), as.character(probe.seq),  :
  arguments imply differing number of rows: 43376, 0
```
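The missing-column error aside, the probe-name clean-up itself can be done in base R. Below is a minimal sketch, assuming that for unannotated probes the GeneName field holds the Agilent probe ID (assumed to look like `A_23_P000002`) instead of a gene symbol. The data frame `genes`, the matrix `expr` and the helper `write_gct()` are hypothetical stand-ins; `write_gct()` is not part of Agi4x44PreProcess and has not been checked against what gsea.files produces. The sketch flags those rows, drops them from the annotation and the expression matrix with the same index, and writes a GSEA .gct (version 1.2) file.

```r
# ASSUMPTION: 'genes' and 'expr' are made-up stand-ins for the annotation
# data frame and the normalized expression matrix of the limma object.
genes <- data.frame(
  ProbeName = c("A_23_P000001", "A_23_P000002", "A_24_P000003", "A_32_P000004"),
  GeneName  = c("GENE1",        "A_23_P000002", "GENE3",        "A_32_P000004"),
  stringsAsFactors = FALSE
)
expr <- matrix(c(5.1, 6.2, 7.3, 8.4,
                 5.5, 6.0, 7.1, 8.8),
               nrow = 4,
               dimnames = list(genes$ProbeName, c("s1", "s2")))

# ASSUMPTION: Agilent probe IDs match this pattern; adjust if yours differ.
# A row is flagged if GeneName looks like a probe ID or equals the ProbeName.
is_probe_id <- grepl("^A_[0-9]+_P[0-9]+$", genes$GeneName) |
  (!is.na(genes$GeneName) & genes$GeneName == genes$ProbeName)

# Keep rows with a real symbol; subset annotation and expression with the
# same logical index so that their rows stay aligned.
keep <- !is_probe_id & !is.na(genes$GeneName)
genes_clean <- genes[keep, , drop = FALSE]
expr_clean  <- expr[keep, , drop = FALSE]

# Hypothetical helper: write a .gct file, i.e. "#1.2", a line with the
# number of rows and columns, a header row, then one row per feature.
write_gct <- function(expr, ids, descriptions, path) {
  con <- file(path, open = "w")
  on.exit(close(con))
  writeLines("#1.2", con)
  writeLines(paste(nrow(expr), ncol(expr), sep = "\t"), con)
  body <- data.frame(NAME = ids, Description = descriptions, expr,
                     row.names = NULL, check.names = FALSE)
  write.table(body, con, sep = "\t", quote = FALSE, row.names = FALSE)
}

gct_file <- tempfile(fileext = ".gct")
write_gct(expr_clean, genes_clean$GeneName, genes_clean$ProbeName, gct_file)
gct_lines <- readLines(gct_file)
```

Using the probe name as the Description keeps a link back to the array after the symbols have been cleaned. Duplicate symbols are left as separate rows here; whether to collapse them before or inside GSEA is a separate choice.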


Written 13.8 years ago by Maria Raeder ▴ 10 • updated 13.8 years ago by Wolfgang Huber ★ 13k


Wolfgang Huber ★ 13k

@wolfgang-huber-3550


EMBL European Molecular Biology Laborat…

Dear Maria,

I am not sure I understood your question, but would R's `strsplit` function perhaps help you? It splits strings at a separator and lets you extract individual components. For example, the idiom `sapply(strsplit(x, ","), "[", 2)` extracts the text between the first and second comma in each string of `x`.

Best wishes,
Wolfgang

(Reply to Maria Raeder's message of Mar/18/11 2:28 PM.)

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
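Wolfgang's idiom applied to a few made-up strings (the vector `x` and its contents are illustrative assumptions, not Maria's data): `strsplit()` returns a list of character vectors, and `"["` picks one element from each. A string with too few fields yields `NA`. The `vapply()` variant does the same but guarantees a character result.

```r
# Made-up strings with comma-separated fields (illustrative only)
x <- c("GENE1,A_23_P000001,extra", "GENE2,A_24_P000002", "nocomma")

# Split each string at "," and take the 1st and 2nd fields;
# "[" with index 2 returns NA when a string has no second field
first  <- sapply(strsplit(x, ","), "[", 1)
second <- sapply(strsplit(x, ","), "[", 2)

# Type-stable variant: vapply() insists on one character value per string
second_v <- vapply(strsplit(x, ",", fixed = TRUE), `[`, character(1), 2)
```

`fixed = TRUE` treats the separator literally rather than as a regular expression, which is slightly faster and avoids surprises with separators such as `"|"`.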

Answered 13.8 years ago by Wolfgang Huber ★ 13k
