Problem getting the exact ProbeNames
@karsten-voigt-4431
Dear all,

I am currently working on a project where I need the exact IDs of the probes on a custom Affymetrix chip, so that I can merge them with another list containing the probe sequences.

I am using this small R script to create the list:

    library(affy)             # ReadAffy, pm, bg.adjust, probeNames
    library(preprocessCore)   # normalize.quantiles
    mitdata <- ReadAffy()
    stddata <- apply(pm(mitdata), 2, bg.adjust)
    nrmdata <- normalize.quantiles(stddata)
    namedata <- probeNames(mitdata)
    enddata <- cbind(namedata, nrmdata)
    write.table(enddata, file="probesdata.txt", sep="\t")

This is an example of the output:

    ...
    145 TZG_ARR_0001_x_at 135.115780787133 ...
    146 TZG_ARR_0001_x_at 147.346049115501 ...
    147 TZG_ARR_0001_x_at 203.840215898533 ...
    148 TZG_ARR_0003_x_at 48.7635207480323 ...
    ...

As you can see, several probes have the same name but refer to different oligos. The number at the start of each row was added by me, so you can ignore it.

I received a list containing the probe name, some other information AND the sequence. This is a part of it:

    15 ggagattgtttgtaatcaaaatgaa TGZ_ARR_0001_x ! 2398 0 176 200 + 1
    103 gcaaatttacttctaacagctgatc TGZ_ARR_0001_x ! 2398 1 264 288 + 1
    188 ttgatgcaactgtaaacaaaagtgg TGZ_ARR_0001_x ! 2398 2 349 373 + 1
    15 gatagattcttcaagtaacaatact TGZ_ARR_0003_x ! 2400 0 2046 2070 + 1

This should be the same region. In this received list, I can identify the unique probes using the two numbers right after the exclamation mark, which I guess refer to the position on the chip. How can I extract those coordinates for my own list? I tried indices2xy, but I could not get it running because I do not understand how to use this function correctly.

Thanks in advance for all answers,

Karsten Voigt

--
Karsten Voigt, Msc.
Experimentelle Bioinformatik, Hess Group
University of Freiburg, BIO III
t: 0761-2032708
m: 0176-61110420
e: karsten.voigt at biologie.uni-freiburg.de
@james-w-macdonald-5106
Hi Karsten,

On 1/11/2011 12:56 PM, Karsten Voigt wrote:
> [...] How can I extract those coordinates for my own list? I tried it with indices2xy, however I failed to get it running since I don't understand how to use this function correctly.

Using the hgu95av2cdf as an example:

    > library(affy)          # provides indices2xy
    > library(hgu95av2cdf)
    > x <- as.list(hgu95av2cdf)
    > x <- x[order(names(x))]
    > x <- unlist(sapply(x, function(x) x[,1]))
    > xys <- indices2xy(x, cdf="hgu95av2cdf")
    > head(xys)
               x   y
    1000_at1 399 559
    1000_at2 544 185
    1000_at3 530 505
    1000_at4 617 349
    1000_at5 459 489
    1000_at6 408 545

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
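The sorting and unlisting steps in the example above are what produce row labels such as 1000_at1 and 1000_at2. Here is a minimal base-R sketch of just that part, assuming a made-up list `toy_cdf` in place of `as.list(hgu95av2cdf)`; the probe-set names and index values are invented for illustration and do not come from any real chip.

```r
# Hypothetical stand-in for as.list(<cdf environment>): one matrix per
# probe set, PM indices in column 1 and MM indices in column 2.
toy_cdf <- list(
  "B_at" = cbind(pm = c(12L, 7L),      mm = c(22L, 17L)),
  "A_at" = cbind(pm = c(3L, 15L, 9L),  mm = c(13L, 25L, 19L))
)

# Sort the probe sets by name, then keep only the PM column of each.
sorted <- toy_cdf[order(names(toy_cdf))]
pm_idx <- unlist(lapply(sorted, function(m) m[, 1]))

# unlist() appends a running number to each probe-set name, so every
# probe gets its own label even though the set name repeats.
print(names(pm_idx))
# [1] "A_at1" "A_at2" "A_at3" "B_at1" "B_at2"
```

As far as I can tell, pm() and probeNames() in affy walk the probe sets in this same name-sorted order, which would explain why the names come out alphabetically; treat that as an assumption to check on your own AffyBatch.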
Hi all,

On 01/11/2011 07:36 PM, James W. MacDonald wrote:
> Using the hgu95av2cdf as an example: [...]

First of all, many thanks to Jim for the quick and good answer. I ran your script on my own cdf and it extracts exactly what I am looking for.

However, I still cannot identify the probes in my CEL files loaded by the ReadAffy() function. If I run probeNames on the result, the probes are exported alphabetically. I cannot imagine that the probe values from the CEL file are also sorted alphabetically in the way I obtained them.

I think my way of creating this list is wrong, since I have no way to verify that the probe names and the normalized data are listed in the same order, and it seems unlikely that they are.

How can I prove that the probeNames match the probe values? Is it also possible to extract the x y values from the cdf file?

One other question: is there any way to extract the sequence from the cdf file?

Many thanks in advance again,

Karsten
Hi Karsten,

If you created an AffyBatch x with ReadAffy, then exprs(x) is a matrix whose rows correspond to the probes on the array, one after the other, in the order in which they physically lie on the chip. The mapping between the row index in the AffyBatch and the (x,y) coordinates is provided by the functions indices2xy and xy2indices in the 'affy' package (whose code you can see by typing their names). Essentially, it is very simple:

    x = (i - 1) %% nr
    y = (i - 1) %/% nr

and in reverse:

    i = x + 1 + nr * y

where nr is the width of the chip. So one strategy is to compute the (x,y) coordinates of each probe on your array with

    indices2xy(seq_len(nrow(mitdata)), abatch=mitdata)

and use these to merge with your probe-sequence table. This might be easier and more transparent than going through probeNames.

Probe sequences for many Affymetrix chips are available through the 'probe' packages (whose content is complementary to the smaller 'cdf' packages):

    library(hgu95av2probe)
    head(as.data.frame(hgu95av2probe))

Best wishes
Wolfgang

Karsten Voigt wrote on 12/01/11 15:28:
> [...] How can I prove that the probeNames are fitting to the probe values? Is it also possible to extract the x y values out of the cdf file?
> One other question: Is there any possibility to extract the sequence out of the cdf file?

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
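The index arithmetic above, and the suggested merge on (x,y), can be sketched in base R without any Bioconductor package. The function names `index_to_xy` and `xy_to_index`, the chip width of 640, and both small tables are assumptions made up for this illustration; on real data the affy functions indices2xy and xy2indices would take their place.

```r
# Illustrative chip width only; use the real value for your chip.
nr <- 640L

# Row index (1-based) -> zero-based (x, y), following the formulas above.
index_to_xy <- function(i, nr) {
  cbind(x = (i - 1L) %% nr, y = (i - 1L) %/% nr)
}

# Zero-based (x, y) -> row index (1-based).
xy_to_index <- function(x, y, nr) {
  x + 1L + nr * y
}

i  <- c(1L, 640L, 641L, 1000L)
xy <- index_to_xy(i, nr)

# Round trip: converting back must return the original indices.
stopifnot(identical(xy_to_index(xy[, "x"], xy[, "y"], nr), i))

# Hypothetical intensity table keyed by row index, with coordinates added.
intens <- data.frame(index = i, intensity = c(135.1, 147.3, 203.8, 48.8))
intens <- cbind(intens, index_to_xy(intens$index, nr))

# Hypothetical sequence table keyed by (x, y); placeholder sequences.
seqs <- data.frame(
  x = c(0L, 359L),
  y = c(1L, 1L),
  sequence = c("ACGTACGTACGTACGTACGTACGTA", "TTGCATTGCATTGCATTGCATTGCA"),
  stringsAsFactors = FALSE
)

# Join on the coordinates; only probes present in both tables remain.
merged <- merge(intens, seqs, by = c("x", "y"))
print(merged)
```

Merging on the coordinate pair rather than on the probe-set name avoids the ambiguity of several probes sharing one name. Note that the sketch assumes both tables use zero-based coordinates; if the sequence list counts from 1, shift one side before merging.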
Dear all,

Thanks for the great input so far. I now have to test it and understand it. If any problems remain, I will let you know ;-)

Thanks and best wishes,

Karsten