All-
I have been comparing probe sequences on the HG-U133 Plus 2.0 chip with
those on the HT HG-U133+ PM chip. I am doing this to compare the values
of the frma vectors between the two arrays.
The HT chip is a subset of the HG-U133 Plus 2.0, with the mismatch probes
removed and some probesets reduced in size.
Looking at the probe package, hthgu133pluspmprobe$sequence has 519370
entries.
However, an AffyBatch object ('dat') made from HT CEL files gives a
larger count:
Index <- pmindex(dat)
tv <- unlist(Index)
length(tv) # 536460
So the AffyBatch reports 536460 PM probes, while the hthgu133pluspmprobe
package reports only 519370 sequences.
What accounts for the difference? Is it possible to find information on
the 17090 sequences that are not in the hthgu133pluspmprobe package?
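A minimal base-R sketch of one way to locate those probes: count the probes per probeset in each source and keep only the probesets whose counts differ. The objects probe_tab and cdf_index and the helper compare_probe_counts() are hypothetical toy stand-ins, for as.data.frame(hthgu133pluspmprobe) (one row per sequence, with a Probe.Set.Name column) and for a named per-probeset index list such as pmindex(dat). With the real objects substituted in, the rows returned are the probesets that account for the extra probes.

```r
# Hypothetical toy stand-ins (not real package data):
# probe_tab mimics as.data.frame(<probe package>): one row per sequence.
probe_tab <- data.frame(
  Probe.Set.Name = c("1007_s_at", "1007_s_at", "AFFX-r2-TagA_at"),
  stringsAsFactors = FALSE
)
# cdf_index mimics a named list of PM indices per probeset (e.g. pmindex()).
cdf_index <- list(
  "1007_s_at"       = c(101L, 102L),
  "AFFX-r2-TagA_at" = 201:211
)

# Hypothetical helper: per-probeset counts from both sources, mismatches only.
compare_probe_counts <- function(probe_tab, cdf_index) {
  # Lower-case names so capitalization differences do not block matching.
  prb_len <- table(tolower(probe_tab$Probe.Set.Name))
  cdf_len <- vapply(cdf_index, length, integer(1))
  names(cdf_len) <- tolower(names(cdf_len))
  ids <- sort(union(names(prb_len), names(cdf_len)))
  out <- data.frame(
    probeset = ids,
    prb_len  = as.integer(prb_len[ids]),
    cdf_len  = as.integer(cdf_len[ids]),
    stringsAsFactors = FALSE
  )
  # A probeset missing from one source gets NA; treat that as 0 probes.
  out$prb_len[is.na(out$prb_len)] <- 0L
  out$cdf_len[is.na(out$cdf_len)] <- 0L
  out[out$prb_len != out$cdf_len, ]
}

mismatch <- compare_probe_counts(probe_tab, cdf_index)
print(mismatch)
# Total number of extra probes on the CDF side.
print(sum(mismatch$cdf_len - mismatch$prb_len))
```

vapply() is used instead of lengths() so the sketch also runs on older R releases such as the 3.0.2 used in this thread.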
Thanks for any information or direction.
Eric Zollars
Session info below: Bioconductor 2.13, R 3.0.2
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] affy_1.40.0                hthgu133pluspmcdf_2.13.0   hgu133plus2frmavecs_1.3.0
[4] hgu133plus2probe_2.13.0    hthgu133pluspmprobe_2.13.0 AnnotationDbi_1.24.0
[7] Biobase_2.22.0             BiocGenerics_0.8.0         BiocInstaller_1.12.0

loaded via a namespace (and not attached):
[1] affyio_1.30.0         DBI_0.2-7             IRanges_1.20.6
[4] preprocessCore_1.24.0 RSQLite_0.11.4        stats4_3.0.2
[7] tools_3.0.2           zlibbioc_1.8.0
--
Eric Zollars MD, PhD
Fellow, Division of Rheumatology
The Johns Hopkins Hospital
Hi Eric,
Most, if not all, of those probes are the oligo-dT probes that surround
the chip (and I believe there are some in the middle as well). The
scanner uses these probes as 'landing lights' to align accurately to
the array before doing the scan.
The scanner does collect data from these probes, which ends up in the
CEL file, but they are ignored when the array is processed further.
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Jim-
Thanks for the response.
However, for the HG-U133 Plus 2.0 there is complete agreement between
the hgu133plus2probe package and what the AffyBatch object reports
(604258 sequences).
Why would that be?
Hi Eric,
Good point. So let's look, shall we?
> library(hthgu133pluspmprobe)
> library(hthgu133pluspmcdf)
> ht <- as.data.frame(hthgu133pluspmprobe)
> prb.lst <- tapply(1:nrow(ht), ht$Probe.Set.Name, function(x) ht[x, 2:3])
> cdf.lst <- mget(ls(hthgu133pluspmcdf), hthgu133pluspmcdf)
> names(prb.lst) <- tolower(names(prb.lst)) ## Affy capitalizes probeset names inconsistently
> names(cdf.lst) <- tolower(names(cdf.lst))
> all.equal(names(prb.lst), names(cdf.lst))
[1] TRUE
> prb.lst.len <- sapply(prb.lst, nrow)
> cdf.lst.len <- sapply(cdf.lst, nrow)
> all.equal(prb.lst.len, cdf.lst.len)
[1] "Mean relative difference: 427.25"
> length(which(prb.lst.len != cdf.lst.len))
[1] 40
> cbind(prb.lst.len, cdf.lst.len)[prb.lst.len != cdf.lst.len,]
                        prb.lst.len cdf.lst.len
affx-nonspecificgc10_at           1         952
affx-nonspecificgc11_at           1         960
affx-nonspecificgc12_at           1         973
affx-nonspecificgc13_at           1         968
affx-nonspecificgc14_at           1         960
affx-nonspecificgc15_at           1         949
affx-nonspecificgc16_at           1         963
affx-nonspecificgc17_at           1         942
affx-nonspecificgc18_at           1         912
affx-nonspecificgc19_at           1         849
affx-nonspecificgc20_at           1         813
affx-nonspecificgc21_at           1         697
affx-nonspecificgc22_at           1         585
affx-nonspecificgc23_at           1         407
affx-nonspecificgc24_at           1         268
affx-nonspecificgc25_at           1           9
affx-nonspecificgc3_at            1          25
affx-nonspecificgc4_at            1         322
affx-nonspecificgc5_at            1         703
affx-nonspecificgc6_at            1         873
affx-nonspecificgc7_at            1         914
affx-nonspecificgc8_at            1         940
affx-nonspecificgc9_at            1         959
affx-r2-taga_at                   1          11
affx-r2-tagb_at                   1          11
affx-r2-tagc_at                   1          11
affx-r2-tagd_at                   1          11
affx-r2-tage_at                   1          11
affx-r2-tagf_at                   1          11
affx-r2-tagg_at                   1          11
affx-r2-tagh_at                   1          11
affx-r2-tagin-3_at                1          11
affx-r2-tagin-5_at                1          11
affx-r2-tagin-m_at                1          11
affx-r2-tagj-3_at                 1          11
affx-r2-tagj-5_at                 1          11
affx-r2-tago-3_at                 1          11
affx-r2-tago-5_at                 1          11
affx-r2-tagq-3_at                 1          11
affx-r2-tagq-5_at                 1          11
So there you go: there are a number of control probesets of different
sorts for which Affy gives us a single sequence, but for which the CDF
lists many probes. NetAffx seems unwilling to say much about the
nonspecificgc probesets, but it does say, for example, that
affx-r2-tagin-3_at has 11 individual probe sequences.
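The table accounts for the entire discrepancy. As a quick arithmetic check, this base-R snippet re-enters the CDF counts from the table by hand (assuming they were transcribed correctly). Since each of these 40 probesets has a single sequence in the probe package, the extra probes should sum to 536460 - 519370 = 17090.

```r
# CDF probe counts for the 23 affx-nonspecificgc*_at probesets,
# transcribed by hand from the table above.
gc_cdf <- c(
  gc3 = 25, gc4 = 322, gc5 = 703, gc6 = 873, gc7 = 914, gc8 = 940,
  gc9 = 959, gc10 = 952, gc11 = 960, gc12 = 973, gc13 = 968, gc14 = 960,
  gc15 = 949, gc16 = 963, gc17 = 942, gc18 = 912, gc19 = 849, gc20 = 813,
  gc21 = 697, gc22 = 585, gc23 = 407, gc24 = 268, gc25 = 9
)
# The 17 affx-r2-tag*_at probesets each have 11 probes in the CDF.
tag_cdf <- rep(11, 17)

cdf_len <- c(gc_cdf, tag_cdf)
# The probe package holds one sequence for each of these probesets.
prb_len <- rep(1, length(cdf_len))

excess <- sum(cdf_len - prb_len)
print(excess)                     # extra PM probes in the CDF
print(536460 - 519370)            # gap between pmindex() and the probe package
print(mean(cdf_len - prb_len))    # mean excess per mismatched probeset
```

The same numbers explain the all.equal() output: all.equal() compares only the 40 differing elements, and because prb.lst.len is 1 for each of them, the mean relative difference is 17090 / 40 = 427.25.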
Best,
Jim