how to find probes' names in probeset
@glazko-galina-1653
Last seen 10.3 years ago
Dear list,

I would appreciate it if someone could clarify this seemingly simple issue for me. I have the probes for one probe set:

    > probes1 <- subset(drosophila2probe, Probe.Set.Name == "1631333_s_at")
    > as.data.frame(probes1)
                            sequence   x   y Probe.Set.Name Probe.Interrogation.Position Target.Strandedness
    119715 CTCACATTCTTCTCCTAATACGATA   2 273   1631333_s_at                         1011           Antisense
    119716 CGGCCATTCTGGACTTCTGGGACAA   4 289   1631333_s_at                          490           Antisense
    119717 GGTCCCGGTGGTATCATCTGCAACA 564 535   1631333_s_at                          525           Antisense
    119718 ATCTGCAACATTGGATCCGTCACTG 656  39   1631333_s_at                          540           Antisense
    119719 GGATTCAATGCCATCTACCAGGTGC 467 543   1631333_s_at                          564           Antisense
    119720 CGGCGTGACGGCTTACACTGTGAAC  40 289   1631333_s_at                          659           Antisense
    119721 TGGTGCACACGTTCAACTCCTGGTT 682 591   1631333_s_at                          706           Antisense
    119722 ACTCCTGGTTGGATGTTGAGCCTCA 573 145   1631333_s_at                          721           Antisense
    119723 TTGAGCCTCAGGTTGCCGAGAAGCT  93 725   1631333_s_at                          736           Antisense
    119724 GAACTTCGTCAAGGCTATCGAGCTG 670 383   1631333_s_at                          800           Antisense
    119725 GGAAACTGGACTTGGGCACCCTGGA 399 559   1631333_s_at                          844           Antisense
    119726 TGGAGGCCATCCAGTGGACCAAGCA 249 589   1631333_s_at                          865           Antisense
    119727 CTGGGACTCCGGCATCTAAGAAGTG 311 285   1631333_s_at                          890           Antisense
    119728 AAGGCTGATTCGATGCACACTCACA 612 225   1631333_s_at                          992           Antisense

On the other hand, if I read the data and extract the PM intensities:

    > dat <- ReadAffy()
    > y <- log2(pm(dat, geneNames(dat)))
    > ind <- grep("^1631333_s_at", rownames(y))
    > sub <- y[ind, ]
    > sub
                   E1E1_DrosophilaGenome2.0.CEL
    1631333_s_at1                      14.11578
    1631333_s_at2                      14.22671
    1631333_s_at3                      14.16891
    1631333_s_at4                      13.29505
    1631333_s_at5                      14.28973
    1631333_s_at6                      13.73725
    1631333_s_at7                      14.33371
    1631333_s_at8                      14.15979
    1631333_s_at9                      14.30442
    1631333_s_at10                     14.70169
    1631333_s_at11                     14.25695
    1631333_s_at12                     14.39359
    1631333_s_at13                     14.51533
    1631333_s_at14                     13.42114

Where do the probe numbers 1-14 come from? I would like to be able to relate them to the information in 'probes1', for example to know that 1631333_s_at1 is actually row 1 of probes1, or something like this.

I thought maybe the numbering of the probes in probes1 (119715:119728) had something to do with the numbering in y, but this is not the case:

    > ind
     [1] 118091 118092 118093 118094 118095 118096 118097 118098 118099 118100
    [11] 118101 118102 118103 118104

Thank you!

Best regards,
Galina
@james-w-macdonald-5106
Last seen 4 days ago
United States
Hi Galina,

On 8/27/2010 11:45 AM, Glazko, Galina wrote:
> Where do the probe numbers 1-14 come from?
These are just sequential numbers that R appends to the probeset name so that each probe in the set gets a distinct row name; otherwise every row for the set would be called 1631333_s_at.

> I would like to be able to relate them to the information in 'probes1'.
> For example, that 1631333_s_at1 is actually N1 in probes1, or something like this.
> I thought maybe the numbering of probes (119715:119728) in probes1 has something to do with the numbering in y, but this is not the case:
>> ind
> [1] 118091 118092 118093 118094 118095 118096 118097 118098 118099 118100 118101 118102 118103 118104

You can line things up using the (x, y) coordinates from the probe package, along with the (x, y) coordinates from the cdf package.

As an example, let's use the hgu95av2 chip, since I already have the required packages installed. On this chip, 10193 of the probesets are ordered the same in both the cdf and probe packages. But there are more than 12k probesets, so you cannot rely on the two orders agreeing.

This one matches:

    > indices2xy(get("100_g_at", hgu95av2cdf)[,1], cdf="hgu95av2cdf")
            x   y
     [1,] 497 273
     [2,] 208 557
     [3,] 495 355
     [4,] 478 371
     [5,] 612 429
     [6,] 563 317
     [7,] 223 559
     [8,] 523 575
     [9,] 551 445
    [10,] 509 475
    [11,] 576 249
    [12,] 568 349
    [13,] 523 441
    [14,] 562 421
    [15,] 622 473
    [16,] 567 607
    > a <- as.data.frame(hgu95av2probe)
    > a[a$Probe.Set.Name == "100_g_at", c("x","y")]
          x   y
    449 497 273
    450 208 557
    451 495 355
    452 478 371
    453 612 429
    454 563 317
    455 223 559
    456 523 575
    457 551 445
    458 509 475
    459 576 249
    460 568 349
    461 523 441
    462 562 421
    463 622 473
    464 567 607

This one does not:

    > indices2xy(get("1002_f_at", hgu95av2cdf)[,1], cdf="hgu95av2cdf")
            x   y
     [1,] 309 555
     [2,] 195 583
     [3,] 375 585
     [4,] 341 403
     [5,] 629 153
     [6,] 619 379
     [7,] 480 471
     [8,] 439 475
     [9,] 410 391
    [10,] 619 491
    [11,] 537 237
    [12,] 510 255
    [13,] 500 275
    [14,] 381 521
    [15,] 366 541
    [16,] 449 357
    > a[a$Probe.Set.Name == "1002_f_at", c("x","y")]
           x   y
    2870 449 357
    2871 309 555
    2872 195 583
    2873 375 585
    2874 341 403
    2875 629 153
    2876 619 379
    2877 480 471
    2878 439 475
    2879 410 391
    2880 619 491
    2881 537 237
    2882 510 255
    2883 500 275
    2884 381 521
    2885 366 541

So lining up the data is just a matter of extracting the data and re-ordering it based on the (x, y) coordinate information.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
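The re-ordering step described in the answer can be sketched in base R. This is only a minimal illustration, not a definitive recipe: the coordinates are copied by hand from the 1002_f_at output in the answer, and the names `cdf_xy`, `probe_xy`, `xy_key` and `probe_in_cdf_order` are made up for this example. With the real packages installed, the two tables would presumably come from `indices2xy()` and `as.data.frame(hgu95av2probe)` as shown in the answer.

```r
# (x, y) of probeset 1002_f_at in cdf order, i.e. the order of the
# numbered rows 1..16. Values copied from the indices2xy() output.
cdf_xy <- data.frame(
  x = c(309, 195, 375, 341, 629, 619, 480, 439, 410, 619, 537, 510, 500, 381, 366, 449),
  y = c(555, 583, 585, 403, 153, 379, 471, 475, 391, 491, 237, 255, 275, 521, 541, 357)
)

# The same probes in probe-package order, keeping the probe package's
# row names (2870..2885). Values copied from the data.frame output.
probe_xy <- data.frame(
  x = c(449, 309, 195, 375, 341, 629, 619, 480, 439, 410, 619, 537, 510, 500, 381, 366),
  y = c(357, 555, 583, 585, 403, 153, 379, 471, 475, 391, 491, 237, 255, 275, 521, 541),
  row.names = 2870:2885
)

# Hypothetical helper: one key per probe. Both coordinates are needed,
# because x alone is not unique (x = 619 occurs twice in this probeset).
xy_key <- function(d) paste(d$x, d$y, sep = ":")

# For each probe in cdf order, find its row in the probe-package table.
idx <- match(xy_key(cdf_xy), xy_key(probe_xy))
stopifnot(!anyNA(idx))  # every cdf probe must be found

# Probe-package rows re-ordered to follow the cdf order.
probe_in_cdf_order <- probe_xy[idx, ]
```

After this, row i of `probe_in_cdf_order` describes the probe whose row name ends in i in the `pm()` output, and `rownames(probe_in_cdf_order)` gives the corresponding row of the probe package. The same idea should carry over to the drosophila2 chip from the question once its cdf and probe packages are substituted.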