Hi James,
On 8/31/2010 2:04 PM, James Anderson wrote:
> Hi Jim,
>
> Thanks a bunch for your help on this, it works. Sorry to bother you
> again, but is there a function to convert the probe indices into the
> probe names you described? For example, for U133A, the probe
> indices are 1:247965. Is there a function to convert them to
> ProbeSet1_1, ProbeSet1_2, ..., ProbeSet1_11, ProbeSet2_1, ProbeSet2_2,
> ..., ProbeSet2_11, ..., ProbeSet22283_1, ProbeSet22283_2, ...,
> ProbeSet22283_11?
So you want *all* of the probes? Not sure where you are heading with
this, but it isn't difficult to get them. I don't have the hgu133acdf
installed, so for an example I will use the hgu95av2cdf:
> library(hgu95av2cdf)
> x <- as.list(hgu95av2cdf)
> y <- sort(unlist(sapply(x, function(q) q[,1])))
> head(y)
31483_g_at16 33941_at4 33941_at5
646 647 648
31977_at2 32448_at1 38227_at12
649 650 651
> head(names(y))
[1] "31483_g_at16" "33941_at4" "33941_at5"
[4] "31977_at2" "32448_at1" "38227_at12"
> length(y)
[1] 201800
Note that there are actually 403600 usable probe positions on this
particular chip, but half of them (201800) are MM probes, which have
exactly the same names as their PM partners, so we don't need those.
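The toy example below shows where names like `31483_g_at16` come from. It is a base-R sketch that assumes each element of `as.list(hgu95av2cdf)` is a matrix with PM indices in column 1 and MM indices in column 2, which is what Jim's `q[,1]` implies. The probe set IDs and index values are made up. `unlist()` appends each element's position to its list name, so the PM and MM columns would produce identical names.

```r
# Hypothetical stand-in for as.list(hgu95av2cdf): one matrix per probe set,
# column 1 = PM indices, column 2 = MM indices (values are made up).
x <- list(
  "1000_at" = matrix(c(101, 102, 103, 201, 202, 203), ncol = 2,
                     dimnames = list(NULL, c("pm", "mm"))),
  "1001_at" = matrix(c(104, 105, 204, 205), ncol = 2,
                     dimnames = list(NULL, c("pm", "mm")))
)

# lapply() always returns a list, so unlist() builds "<ID><position>" names.
# sapply() would simplify to a matrix (losing these names) if every probe
# set happened to have the same number of probes.
pm_idx <- unlist(lapply(x, function(q) q[, 1]))
mm_idx <- unlist(lapply(x, function(q) q[, 2]))

names(pm_idx)
# "1000_at1" "1000_at2" "1000_at3" "1001_at1" "1001_at2"

# PM and MM probes at the same position get exactly the same name
stopifnot(identical(names(pm_idx), names(mm_idx)))
```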
Also note that there are 6000 probes on this chip that we ignore (there
are actually 409600 rows in the exprs slot of the AffyBatch). These
extra probes are the oligo-B2 probes that are on the outside of the
chip, used by the scanner to align to the chip.
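To get back to the original question of converting indices to names: once `y` exists, you can look the indices up with `match()`. This is a minimal sketch, not a function from affy. The helper `index_to_probe_name` is hypothetical, and the small `y` below is a made-up stand-in for the one built from hgu95av2cdf above. Because `rownames(pm(A))` returns indices as character strings, they are converted to integers first. Any index that is not a PM index in `y` comes back as `NA`.

```r
# Toy version of Jim's `y`: PM indices named "<probeset ID><position>".
# In a real session `y` would come from the hgu95av2cdf code above.
y <- c("1000_at1" = 101, "1000_at2" = 102, "1000_at3" = 103,
       "1001_at1" = 104, "1001_at2" = 105)

# Hypothetical helper: map probe indices (numbers or character strings,
# e.g. from rownames(pm(A))) to probe names; unknown indices give NA.
index_to_probe_name <- function(idx, y) {
  names(y)[match(as.integer(idx), y)]
}

index_to_probe_name(c("103", "101", "999"), y)
# [1] "1000_at3" "1000_at1" NA
```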
Best,
Jim
>
> Thanks again,
>
> -James
>
> --- On Tue, 8/31/10, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>
> From: James W. MacDonald <jmacdon at med.umich.edu>
> Subject: Re: [BioC] question regarding MAS5 normalization with reduced probes
> To: "James Anderson" <janderson_net at yahoo.com>
> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>
> Date: Tuesday, August 31, 2010, 1:15 PM
>
> Hi James,
>
> On 8/31/2010 12:17 PM, James Anderson wrote:
>> Hi Jim,
>>
>> Thanks a lot for the link. I've tried the code in the link, and it
>> works without any problem if I take whole probe sets out. However, I
>> run into a problem when I need to remove not only some probe sets but
>> also some individual probes (not whole probe sets), maybe because I
>> did not provide the probes in the correct format.
>>
>> (I assume you are familiar with the content in the script provided
>> in the link).
>>
>> If I randomly take out 2000 probe sets from U133A:
>>
>> maskedprobeSets = rownames(MAS5_matrix)[sample(1:22283,2000)]
>> RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobeSets,
>>              cleancdf)
>>
>> It works fine, and any AffyBatch object read using the cleancdf has
>> reduced dimensions.
>>
>> However, if I do
>>
>> maskedprobeSets = rownames(MAS5_matrix)[sample(1:22283,2000)]
>> maskedprobes = rownames(pm(A))[1:2000]
>
> Assuming that 'A' is an AffyBatch, what you will get back from that
> call to rownames is a bunch of numbers in character format.
>
> An example using the Dilution dataset:
>
>> rownames(pm(Dilution))[1:10]
>  [1] "175218" "356689" "227696" "237919" "275173" "203444" "357984" "368524"
>  [9] "285352" "304510"
>
> Which you can see is not very useful. What you want are the probeset
> IDs, along with an appended number (which is equal to the position
> of the probe in the probeset).
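The sketch below splits a probe name into its two parts. The helper `split_probe_name` is hypothetical and uses only base R. It assumes that the probe set ID itself never ends in a digit, which holds for IDs such as `100_g_at` or `31483_g_at`. It also makes the point about the numeric rownames: a bare index like `"175218"` contains no probe set ID that could be recovered.

```r
# Hypothetical helper: split "<probeset ID><position>" names such as
# "100_g_at3" into their parts. Assumes every name ends in digits and
# that the probe set ID itself does not end in a digit.
split_probe_name <- function(probe) {
  pos <- regmatches(probe, regexpr("[0-9]+$", probe))
  data.frame(probeset = sub("[0-9]+$", "", probe),
             position = as.integer(pos),
             stringsAsFactors = FALSE)
}

split_probe_name(c("100_g_at3", "100_g_at7", "31483_g_at16"))
# probeset: "100_g_at", "100_g_at", "31483_g_at"; position: 3, 7, 16

# A bare index, as returned by rownames(pm(Dilution)), has no ID part
split_probe_name("175218")$probeset
# [1] ""
```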
>
> Now, say we are concerned about the "100_g_at" probeset in the
> Dilution dataset:
>
>> pm(Dilution, "100_g_at")
>               20A    20B    10A    10B
> 100_g_at1   221.3  146.3  192.0  116.0
> 100_g_at2   685.0  479.0  493.0  328.3
> 100_g_at3  1126.3  724.3  849.0  498.3
> 100_g_at4   205.0  126.5  136.0   97.0
> 100_g_at5   580.8  341.8  374.0  226.0
> 100_g_at6   161.3  109.5  139.0   92.3
> 100_g_at7  1645.3  992.3 1006.8  670.0
> 100_g_at8   624.0  348.0  336.3  224.5
> 100_g_at9   274.0  156.0  203.8  119.0
> 100_g_at10  240.0  156.3  223.0  122.0
> 100_g_at11  438.0  278.3  362.5  198.0
> 100_g_at12  554.0  334.8  421.5  220.0
> 100_g_at13  235.0  148.0  151.0  107.5
> 100_g_at14  571.3  415.0  508.0  271.0
> 100_g_at15  904.0  562.0  689.0  330.0
> 100_g_at16  141.0   93.0  113.5   75.5
>
> And we don't like the third and seventh probes. We could use
>
>> rownames(pm(Dilution, "100_g_at"))[c(3,7)]
> [1] "100_g_at3" "100_g_at7"
>
> And feed that into RemoveProbes(), which will then work.
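When you want to drop probes from many probe sets, the same names can be built in bulk. This is a minimal sketch. The `bad_positions` list is hypothetical: the `100_g_at` entry mirrors Jim's example, and the `1000_at` entry is made up. The result is a character vector of `<probeset ID><position>` names. You would pass it as `listOutProbes=` to `RemoveProbes()` from the linked script, in place of the numeric `rownames(pm(A))`.

```r
# Hypothetical input: for each probe set, the positions of the probes to
# drop ("100_g_at" mirrors Jim's example; "1000_at" is made up).
bad_positions <- list("100_g_at" = c(3, 7), "1000_at" = c(1, 16))

# Paste each probe set ID onto each of its positions, giving names in the
# same form as rownames(pm(Dilution, "100_g_at")).
maskedprobes <- unlist(
  Map(function(id, pos) paste0(id, pos), names(bad_positions), bad_positions),
  use.names = FALSE
)

maskedprobes
# c("100_g_at3", "100_g_at7", "1000_at1", "1000_at16")
```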
>
> Best,
>
> Jim
>
>
>
>> RemoveProbes(listOutProbes=maskedprobes,
>> listOutProbeSets=maskedprobeSets, cleancdf)
>>
>> The error message is:
>>
>> Error in get(pset[i], env = get(cdfpackagename)) :
>>   object '315997at' not found
>>
>> Do you know what is the correct format of the input for the probes
>> (not probe sets) to be taken out?
>>
>>
>>
>> Thanks a lot,
>>
>>
>> -James
>>
>>
>> --- On Mon, 8/30/10, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>>
>> From: James W. MacDonald <jmacdon at med.umich.edu>
>> Subject: Re: [BioC] question regarding MAS5 normalization with reduced probes
>> To: "James Anderson" <janderson_net at yahoo.com>
>> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>
>> Date: Monday, August 30, 2010, 12:25 PM
>>
>> Hi James,
>>
>> I misunderstood your question. I thought you already had a reduced
>> set of probes you wanted to run mas5() on.
>>
>> So yeah, if you want to use a reduced set of probes you could use
>> some code written by Ariel Chernomoretz (and modified by Jenny
>> Drnevitch) that has been posted and referenced many times on this
>> list:
>>
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html
>>
>> Alternatively, you could play with the affxparser package, which (IIRC)
>> has the capability to do the same.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 8/30/2010 10:29 AM, James Anderson wrote:
>>> Hi Jim,
>>>
>>> Thanks for your email. I've run mas5 before, but only with the
>>> default settings. From the help page, there does not seem to be a
>>> way to specify a reduced set of probes. Rather, it looks like it
>>> depends on whether the "object" was read using a reduced set of
>>> probes (I believe that if the "object" is read using only the
>>> reduced set, mas5 will do the job). So the question may have more
>>> to do with the function ReadAffy, but ReadAffy does not seem to
>>> offer an option for specifying a reduced set of probes unless we use
>>> an alternative CDF file. Below is the usage of the mas5 function:
>>>
>>> mas5(object, normalize = TRUE, sc = 500, analysis = "absolute", ...)
>>>
>>> Thanks,
>>>
>>> -James
>>>
>>> --- On Fri, 8/27/10, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>>>
>>> From: James W. MacDonald <jmacdon at med.umich.edu>
>>> Subject: Re: [BioC] question regarding MAS5 normalization with reduced probes
>>> To: "James Anderson" <janderson_net at yahoo.com>
>>> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>
>>> Date: Friday, August 27, 2010, 10:04 AM
>>>
>>> Hi James,
>>>
>>> On 8/26/2010 1:05 PM, James Anderson wrote:
>>>> Hi,
>>>>
>>>> I am trying to use MAS5 to normalize some CEL files with a
>>>> reduced set of probes (probes whose PM is not significantly higher
>>>> than MM are filtered out). Does anyone know how to do this? Does
>>>> that require creating a new CDF file?
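The thread does not say which test James used to decide that PM is "not significantly higher" than MM. The sketch below picks one plausible option: a one-sided t-test on the per-array log2(PM/MM) differences, with a 0.05 cutoff. The helper `pm_mm_filter`, the intensities, and the probe names are all made up for illustration. The output is a list of probe names in the `<probeset ID><position>` form that the later messages in this thread feed to `RemoveProbes()`.

```r
# Toy PM and MM intensities: rows = probes (named as in the CDF),
# columns = arrays. All values are made up.
pm <- rbind("1000_at1" = c(220, 150, 190, 120),
            "1000_at2" = c(680, 480, 490, 330),
            "1000_at3" = c(120,  95, 150,  80))
mm <- rbind("1000_at1" = c( 90,  70,  85,  60),
            "1000_at2" = c(300, 210, 260, 150),
            "1000_at3" = c(130, 100, 140,  85))

# Hypothetical filter: flag probes whose log2(PM/MM) is not significantly
# greater than 0 across arrays (one-sided t-test; the test choice is an
# assumption, not something specified in the thread).
pm_mm_filter <- function(pm, mm, alpha = 0.05) {
  d <- log2(pm) - log2(mm)                       # per-array log ratios
  p <- apply(d, 1, function(v)
    t.test(v, mu = 0, alternative = "greater")$p.value)
  rownames(pm)[p > alpha]                        # probes to remove
}

drop <- pm_mm_filter(pm, mm)
drop
# [1] "1000_at3"
```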
>>>
>>> Have you tried running mas5() from the affy package? Having
>>> never tried, I don't know, but it seems a simple enough test.
>>>
>>> If you do need to create a new cdf, you will want to use the
>>> affxparser package.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>>
>>>> thanks a bunch,
>>>>
>>>> -James
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues