Hi,
I found very interesting the possibility of using reversing the
mapping
by revmap in the XXXX.db annotation databases.
However, I have two problems:
1) if I use:
egs <- c("1", "100", "1000")
unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
I am getting not only the probesets associated to the three EGs:
1 1001 1002 1003 10001
"229819_at" "1556117_at" "204639_at" "216705_s_at" "203440_at"
10002 10003
"203441_s_at" "237305_at"
There is any possibility to avoid this problem?
2) if in the egs vector is present an eg (6333) that is not present in
the annotation database I get the following error:
egs <- c("1", "100", "1000", "6333")
unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
value for "6333" not found
There is any possibility to make a query that simply avoid the
unmapped
keys?
Many thanks
Raffaele
--
----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
Dipartimento di Scienze Cliniche e Biologiche
c/o Az. Ospedaliera S. Luigi
Regione Gonzole 10, Orbassano
10043 Torino
tel. ++39 0116705417
Lab. ++39 0116705408
Fax ++39 0119038639
Mobile ++39 3333827080
email: raffaele.calogero at unito.it
raffaele[dot]calogero[at]gmail[dot]com
www: http://www.bioinformatica.unito.it
Info: http://publicationslist.org/raffaele.calogero
Hi Raffaele,
The first looks like a bug, there are some oddities, but mostly it
is
the rather cute way that R mangles names...
> rv = revmap(hgu133plus2ENTREZID)
> rv[["1"]]
[1] "229819_at"
> rv[["100"]]
[1] "204639_at" "216705_s_at"
> rv[["1000"]]
[1] "203440_at" "203441_s_at"
and you can then do the looping explicitly, this also solves your
second problem.
I don't know how "1556117_at" got picked up, that does look wrong
and
we will need to look in to it.
> hgu133plus2ENTREZID$"1556117_at"
[1] NA
and for "237305_at", I also get an NA :-(
best wishes
Robert
rcaloger wrote:
> Hi,
> I found very interesting the possibility of using reversing the
mapping
> by revmap in the XXXX.db annotation databases.
>
> However, I have two problems:
> 1) if I use:
> egs <- c("1", "100", "1000")
> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>
> I am getting not only the probesets associated to the three EGs:
> 1 1001 1002 1003 10001
> "229819_at" "1556117_at" "204639_at" "216705_s_at" "203440_at"
> 10002 10003
> "203441_s_at" "237305_at"
> There is any possibility to avoid this problem?
>
> 2) if in the egs vector is present an eg (6333) that is not present
in
> the annotation database I get the following error:
> egs <- c("1", "100", "1000", "6333")
> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>
> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
> value for "6333" not found
>
> There is any possibility to make a query that simply avoid the
unmapped
> keys?
>
>
> Many thanks
> Raffaele
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Robert Gentleman wrote:
> Hi Raffaele,
> The first looks like a bug, there are some oddities, but mostly it
is
> the rather cute way that R mangles names...
>
> > rv = revmap(hgu133plus2ENTREZID)
> > rv[["1"]]
> [1] "229819_at"
> > rv[["100"]]
> [1] "204639_at" "216705_s_at"
> > rv[["1000"]]
> [1] "203440_at" "203441_s_at"
>
> and you can then do the looping explicitly, this also solves your
> second problem.
>
> I don't know how "1556117_at" got picked up, that does look wrong
and
> we will need to look in to it.
>
> > hgu133plus2ENTREZID$"1556117_at"
> [1] NA
>
> and for "237305_at", I also get an NA :-(
It's a version issue. The hgu133plus2.db package in the current
release
of BioC has these two probesets mapped, but they no longer appear to
be
mapped in the upcoming release.
Best,
Jim
>
>
> best wishes
> Robert
>
>
> rcaloger wrote:
>> Hi,
>> I found very interesting the possibility of using reversing the
>> mapping by revmap in the XXXX.db annotation databases.
>>
>> However, I have two problems:
>> 1) if I use:
>> egs <- c("1", "100", "1000")
>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>
>> I am getting not only the probesets associated to the three EGs:
>> 1 1001 1002 1003
10001
>> "229819_at" "1556117_at" "204639_at" "216705_s_at"
"203440_at"
>> 10002 10003
>> "203441_s_at" "237305_at"
>> There is any possibility to avoid this problem?
>>
>> 2) if in the egs vector is present an eg (6333) that is not present
in
>> the annotation database I get the following error:
>> egs <- c("1", "100", "1000", "6333")
>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>
>> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
>> value for "6333" not found
>>
>> There is any possibility to make a query that simply avoid the
>> unmapped keys?
>>
>>
>> Many thanks
>> Raffaele
>>
>
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
Hi Raffaele,
rcaloger wrote:
> Hi,
> I found very interesting the possibility of using reversing the
mapping
> by revmap in the XXXX.db annotation databases.
>
> However, I have two problems:
> 1) if I use:
> egs <- c("1", "100", "1000")
> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>
> I am getting not only the probesets associated to the three EGs:
> 1 1001 1002 1003 10001
> "229819_at" "1556117_at" "204639_at" "216705_s_at" "203440_at"
> 10002 10003
> "203441_s_at" "237305_at"
Well, not really. This appears to be so because you are unlisting a
named list. Since the names have to be unique, R adds an additional
integer to the end of duplicate names:
> egs <- c("1", "100", "1000")
> mget(egs, revmap(hgu133plus2ENTREZID))
$`1`
[1] "229819_at"
$`100`
[1] "1556117_at" "204639_at" "216705_s_at"
$`1000`
[1] "203440_at" "203441_s_at" "237305_at"
> There is any possibility to avoid this problem?
>
> 2) if in the egs vector is present an eg (6333) that is not present
in
> the annotation database I get the following error:
> egs <- c("1", "100", "1000", "6333")
> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>
> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
> value for "6333" not found
>
> There is any possibility to make a query that simply avoid the
unmapped
> keys?
Yes. The help for mget is a bit confusing on this point, but you need
to
use the argument ifnotfound = NA.
> egs <- c("1", "100", "1000", "6333")
> mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
$`1`
[1] "229819_at"
$`100`
[1] "1556117_at" "204639_at" "216705_s_at"
$`1000`
[1] "203440_at" "203441_s_at" "237305_at"
$`6333`
[1] NA
Best,
Jim
>
>
> Many thanks
> Raffaele
>
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
James W. MacDonald wrote:
> Hi Raffaele,
>
> rcaloger wrote:
>> Hi,
>> I found very interesting the possibility of using reversing the
>> mapping by revmap in the XXXX.db annotation databases.
>>
>> However, I have two problems:
>> 1) if I use:
>> egs <- c("1", "100", "1000")
>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>
>> I am getting not only the probesets associated to the three EGs:
>> 1 1001 1002 1003
10001
>> "229819_at" "1556117_at" "204639_at" "216705_s_at"
"203440_at"
>> 10002 10003
>> "203441_s_at" "237305_at"
>
> Well, not really. This appears to be so because you are unlisting a
> named list. Since the names have to be unique,
Well, that's were I don't follow the logic behind unlist() and I've
always
found this "feature" pretty strange. unlist() won't even make a good
job at
keeping the names unique:
> unlist(list(AA=letters[1:3], AA2="bb"))
AA1 AA2 AA3 AA2
"a" "b" "c" "bb"
So mangling the names doesn't solve anything but just adds confusion.
IMO it would be better if unlist() was keeping the original names,
even if that
means that they are not unique in the returned vector. At least I can
do something
with it programmatically, and it's easy. With the mangled names, it's
much harder
(there are a couple of serious pitfalls).
H.
> R adds an additional
> integer to the end of duplicate names:
>
> > egs <- c("1", "100", "1000")
> > mget(egs, revmap(hgu133plus2ENTREZID))
> $`1`
> [1] "229819_at"
>
> $`100`
> [1] "1556117_at" "204639_at" "216705_s_at"
>
> $`1000`
> [1] "203440_at" "203441_s_at" "237305_at"
>
>> There is any possibility to avoid this problem?
>>
>> 2) if in the egs vector is present an eg (6333) that is not present
in
>> the annotation database I get the following error:
>> egs <- c("1", "100", "1000", "6333")
>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>
>> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
>> value for "6333" not found
>>
>> There is any possibility to make a query that simply avoid the
>> unmapped keys?
>
> Yes. The help for mget is a bit confusing on this point, but you
need to
> use the argument ifnotfound = NA.
>
> > egs <- c("1", "100", "1000", "6333")
> > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
> $`1`
> [1] "229819_at"
>
> $`100`
> [1] "1556117_at" "204639_at" "216705_s_at"
>
> $`1000`
> [1] "203440_at" "203441_s_at" "237305_at"
>
> $`6333`
> [1] NA
>
> Best,
>
> Jim
>
>
>
>>
>>
>> Many thanks
>> Raffaele
>>
>
> James W. MacDonald wrote:
>> Hi Raffaele,
>>
>> rcaloger wrote:
>>> Hi,
>>> I found very interesting the possibility of using reversing the
>>> mapping by revmap in the XXXX.db annotation databases.
>>>
>>> However, I have two problems:
>>> 1) if I use:
>>> egs <- c("1", "100", "1000")
>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>
>>> I am getting not only the probesets associated to the three EGs:
>>> 1 1001 1002 1003
10001
>>> "229819_at" "1556117_at" "204639_at" "216705_s_at"
"203440_at"
>>> 10002 10003
>>> "203441_s_at" "237305_at"
>>
>> Well, not really. This appears to be so because you are unlisting a
>> named list. Since the names have to be unique,
>
> Well, that's were I don't follow the logic behind unlist() and I've
always
> found this "feature" pretty strange. unlist() won't even make a good
job
> at
> keeping the names unique:
> > unlist(list(AA=letters[1:3], AA2="bb"))
> AA1 AA2 AA3 AA2
> "a" "b" "c" "bb"
> So mangling the names doesn't solve anything but just adds
confusion.
>
> IMO it would be better if unlist() was keeping the original names,
even if
> that
> means that they are not unique in the returned vector. At least I
can do
> something
> with it programmatically, and it's easy. With the mangled names,
it's much
> harder
> (there are a couple of serious pitfalls).
>
The problem might originate in what one could perceive a flaw with
lists
(or any named vectors for that matter) in allowing non-unique names.
Mangled names are shurely a headache, as well as the "get only the
first
element with the given name while it was not known there were several
elements with the same name" behavior in R.
L.
> H.
>
>
>> R adds an additional
>> integer to the end of duplicate names:
>>
>> > egs <- c("1", "100", "1000")
>> > mget(egs, revmap(hgu133plus2ENTREZID))
>> $`1`
>> [1] "229819_at"
>>
>> $`100`
>> [1] "1556117_at" "204639_at" "216705_s_at"
>>
>> $`1000`
>> [1] "203440_at" "203441_s_at" "237305_at"
>>
>>> There is any possibility to avoid this problem?
>>>
>>> 2) if in the egs vector is present an eg (6333) that is not
present in
>>> the annotation database I get the following error:
>>> egs <- c("1", "100", "1000", "6333")
>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>
>>> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
>>> value for "6333" not found
>>>
>>> There is any possibility to make a query that simply avoid the
>>> unmapped keys?
>>
>> Yes. The help for mget is a bit confusing on this point, but you
need to
>> use the argument ifnotfound = NA.
>>
>> > egs <- c("1", "100", "1000", "6333")
>> > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
>> $`1`
>> [1] "229819_at"
>>
>> $`100`
>> [1] "1556117_at" "204639_at" "216705_s_at"
>>
>> $`1000`
>> [1] "203440_at" "203441_s_at" "237305_at"
>>
>> $`6333`
>> [1] NA
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>>
>>> Many thanks
>>> Raffaele
>>>
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
lgautier at altern.org wrote:
>> James W. MacDonald wrote:
>>> Hi Raffaele,
>>>
>>> rcaloger wrote:
>>>> Hi,
>>>> I found very interesting the possibility of using reversing the
>>>> mapping by revmap in the XXXX.db annotation databases.
>>>>
>>>> However, I have two problems:
>>>> 1) if I use:
>>>> egs <- c("1", "100", "1000")
>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>
>>>> I am getting not only the probesets associated to the three EGs:
>>>> 1 1001 1002 1003
10001
>>>> "229819_at" "1556117_at" "204639_at" "216705_s_at"
"203440_at"
>>>> 10002 10003
>>>> "203441_s_at" "237305_at"
>>> Well, not really. This appears to be so because you are unlisting
a
>>> named list. Since the names have to be unique,
>> Well, that's were I don't follow the logic behind unlist() and I've
always
>> found this "feature" pretty strange. unlist() won't even make a
good job
>> at
>> keeping the names unique:
>> > unlist(list(AA=letters[1:3], AA2="bb"))
>> AA1 AA2 AA3 AA2
>> "a" "b" "c" "bb"
>> So mangling the names doesn't solve anything but just adds
confusion.
>>
>> IMO it would be better if unlist() was keeping the original names,
even if
>> that
>> means that they are not unique in the returned vector. At least I
can do
>> something
>> with it programmatically, and it's easy. With the mangled names,
it's much
>> harder
>> (there are a couple of serious pitfalls).
>>
>
> The problem might originate in what one could perceive a flaw with
lists
> (or any named vectors for that matter) in allowing non-unique names.
>
> Mangled names are shurely a headache, as well as the "get only the
first
> element with the given name while it was not known there were
several
> elements with the same name" behavior in R.
I disagree - I think that requiring unique row names in R is/was a
mistake - restrictions are often expensive - as they limit what can be
done. Yes there are issues about dealing with non-unique row names,
but
those can be dealt with, by careful programming. Such methods would
work
in all cases of duplicate row names, but with name-mangling schemes,
one
needs to know what name mangling scheme was used to be able to
disentangle - and that means every solution is different -- not
exactly
the kind of situation I would personally engineer in.
best wishes
Robert
>
>
> L.
>
>> H.
>>
>>
>>> R adds an additional
>>> integer to the end of duplicate names:
>>>
>>> > egs <- c("1", "100", "1000")
>>> > mget(egs, revmap(hgu133plus2ENTREZID))
>>> $`1`
>>> [1] "229819_at"
>>>
>>> $`100`
>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>
>>> $`1000`
>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>
>>>> There is any possibility to avoid this problem?
>>>>
>>>> 2) if in the egs vector is present an eg (6333) that is not
present in
>>>> the annotation database I get the following error:
>>>> egs <- c("1", "100", "1000", "6333")
>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>
>>>> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
>>>> value for "6333" not found
>>>>
>>>> There is any possibility to make a query that simply avoid the
>>>> unmapped keys?
>>> Yes. The help for mget is a bit confusing on this point, but you
need to
>>> use the argument ifnotfound = NA.
>>>
>>> > egs <- c("1", "100", "1000", "6333")
>>> > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
>>> $`1`
>>> [1] "229819_at"
>>>
>>> $`100`
>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>
>>> $`1000`
>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>
>>> $`6333`
>>> [1] NA
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>>
>>>> Many thanks
>>>> Raffaele
>>>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
>
>
> lgautier at altern.org wrote:
>>> James W. MacDonald wrote:
>>>> Hi Raffaele,
>>>>
>>>> rcaloger wrote:
>>>>> Hi,
>>>>> I found very interesting the possibility of using reversing the
>>>>> mapping by revmap in the XXXX.db annotation databases.
>>>>>
>>>>> However, I have two problems:
>>>>> 1) if I use:
>>>>> egs <- c("1", "100", "1000")
>>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>>
>>>>> I am getting not only the probesets associated to the three EGs:
>>>>> 1 1001 1002 1003
10001
>>>>> "229819_at" "1556117_at" "204639_at" "216705_s_at"
"203440_at"
>>>>> 10002 10003
>>>>> "203441_s_at" "237305_at"
>>>> Well, not really. This appears to be so because you are unlisting
a
>>>> named list. Since the names have to be unique,
>>> Well, that's were I don't follow the logic behind unlist() and
I've
>>> always
>>> found this "feature" pretty strange. unlist() won't even make a
good
>>> job
>>> at
>>> keeping the names unique:
>>> > unlist(list(AA=letters[1:3], AA2="bb"))
>>> AA1 AA2 AA3 AA2
>>> "a" "b" "c" "bb"
>>> So mangling the names doesn't solve anything but just adds
confusion.
>>>
>>> IMO it would be better if unlist() was keeping the original names,
even
>>> if
>>> that
>>> means that they are not unique in the returned vector. At least I
can
>>> do
>>> something
>>> with it programmatically, and it's easy. With the mangled names,
it's
>>> much
>>> harder
>>> (there are a couple of serious pitfalls).
>>>
>>
>> The problem might originate in what one could perceive a flaw with
lists
>> (or any named vectors for that matter) in allowing non-unique
names.
>>
>> Mangled names are shurely a headache, as well as the "get only the
first
>> element with the given name while it was not known there were
several
>> elements with the same name" behavior in R.
>
> I disagree - I think that requiring unique row names in R is/was
a
> mistake - restrictions are often expensive - as they limit what can
be
> done.
I am certain that the best design decision was made at the time, and
my
comment is obviously a retrospective one.
I do suspect that computational cost weighted a lot in the decision,
and
uniqueness of names *was* necessarily not a mistake.
However, with the growing presence and of lookup structures in R (as
well
as awareness of them and their benefit by programmers from all
horizons),
these restrictions might be appearing much less expensive, and the
benefit
from having them might be becoming competitive with not having them.
Regarding the freedom, it might not be completely challenged as one
might
use a data.frame (with vector of either mode "factor" or "character"
whenever non-uniqueness is required - and those would just not be
"names"
but "tags").
Best,
Laurent
> Yes there are issues about dealing with non-unique row names, but
> those can be dealt with, by careful programming. Such methods would
work
> in all cases of duplicate row names, but with name-mangling schemes,
one
> needs to know what name mangling scheme was used to be able to
> disentangle - and that means every solution is different -- not
exactly
> the kind of situation I would personally engineer in.
>
> best wishes
> Robert
>
>>
>>
>> L.
>>
>>> H.
>>>
>>>
>>>> R adds an additional
>>>> integer to the end of duplicate names:
>>>>
>>>> > egs <- c("1", "100", "1000")
>>>> > mget(egs, revmap(hgu133plus2ENTREZID))
>>>> $`1`
>>>> [1] "229819_at"
>>>>
>>>> $`100`
>>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>>
>>>> $`1000`
>>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>>
>>>>> There is any possibility to avoid this problem?
>>>>>
>>>>> 2) if in the egs vector is present an eg (6333) that is not
present
>>>>> in
>>>>> the annotation database I get the following error:
>>>>> egs <- c("1", "100", "1000", "6333")
>>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>>
>>>>> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
>>>>> value for "6333" not found
>>>>>
>>>>> There is any possibility to make a query that simply avoid the
>>>>> unmapped keys?
>>>> Yes. The help for mget is a bit confusing on this point, but you
need
>>>> to
>>>> use the argument ifnotfound = NA.
>>>>
>>>> > egs <- c("1", "100", "1000", "6333")
>>>> > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
>>>> $`1`
>>>> [1] "229819_at"
>>>>
>>>> $`100`
>>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>>
>>>> $`1000`
>>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>>
>>>> $`6333`
>>>> [1] NA
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>>
>>>>> Many thanks
>>>>> Raffaele
>>>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> --
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org
>
Hi Laurent, Robert,
Maybe a simple solution for the unlist() problem would be to have an
extra argument
that gives the user more control over what to do with the names.
For example, with the unlist2() function below:
> x <- list(A=c(b=-4, 2, b=7), B=3:-1, c(a=1, a=-2), C=list(c(2:-1,
d=55), e=99))
> unlist2(x, what.names="inherited")
b A b B B B B B a a C C C C d e
-4 2 7 3 2 1 0 -1 1 -2 2 1 0 -1 55 99
names are just "inherited" without being mangled.
By default (what.names="mangled"), unlist2() behaves like
base::unlist()
> unlist2(x)
A.b A2 A.b B1 B2 B3 B4 B5 a a C1 C2 C3 C4 C.d C.e
-4 2 7 3 2 1 0 -1 1 -2 2 1 0 -1 55 99
It supports an extra mode (what.names="full"), not necessary very
useful, to get
the full names of the elements:
> unlist2(x, what.names="full")
A.b A. A.b B. B. B. B. B. .a .a C.. C.. C..
C.. C..d C.e.
-4 2 7 3 2 1 0 -1 1 -2 2 1 0
-1 55 99
The full name of an element describes the path that was followed to
walk from the
top of level of the hierarchy to the element. The names of all the
intermediate
elements that were walked on are concatenated together with the dot as
the separator.
make.name.tree <- function(x, recursive, what.names)
{
if (!is.character(what.names) || length(what.names) != 1)
stop("'what.names' must be a single string")
what.names <- match.arg(what.names, c("inherited" , "full"))
.make.name.tree.rec <- function(x, parent_name, depth)
{
if (length(x) == 0)
return(character(0))
x_names <- names(x)
if (is.null(x_names))
x_names <- rep.int(parent_name, length(x))
else if (what.names == "full")
x_names <- paste(parent_name, x_names, sep="")
else
x_names[x_names == ""] <- parent_name
if (!is.list(x) || (!recursive && depth >= 1L))
return(x_names)
if (what.names == "full")
x_names <- paste(x_names, ".", sep="")
lapply(seq_len(length(x)),
function(i) .make.name.tree.rec(x[[i]], x_names[i],
depth + 1L))
}
.make.name.tree.rec(x, "", 0L)
}
unlist2 <- function(x, recursive = TRUE, use.names = TRUE, what.names
= "mangled")
{
if (!is.character(what.names) || length(what.names) != 1)
stop("'what.names' must be a single string")
what.names <- match.arg(what.names, c("mangled", "inherited" ,
"full"))
if (!use.names || what.names == "mangled")
return(unlist(x, recursive, use.names))
ans <- unlist(x, recursive, FALSE)
names(ans) <- unlist(make.name.tree(x, recursive, what.names),
recursive, FALSE)
ans
}
Cheers,
H.
lgautier at altern.org wrote:
>>
>> lgautier at altern.org wrote:
>>>> James W. MacDonald wrote:
>>>>> Hi Raffaele,
>>>>>
>>>>> rcaloger wrote:
>>>>>> Hi,
>>>>>> I found very interesting the possibility of using reversing
the
>>>>>> mapping by revmap in the XXXX.db annotation databases.
>>>>>>
>>>>>> However, I have two problems:
>>>>>> 1) if I use:
>>>>>> egs <- c("1", "100", "1000")
>>>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>>>
>>>>>> I am getting not only the probesets associated to the three
EGs:
>>>>>> 1 1001 1002 1003
10001
>>>>>> "229819_at" "1556117_at" "204639_at" "216705_s_at"
"203440_at"
>>>>>> 10002 10003
>>>>>> "203441_s_at" "237305_at"
>>>>> Well, not really. This appears to be so because you are
unlisting a
>>>>> named list. Since the names have to be unique,
>>>> Well, that's were I don't follow the logic behind unlist() and
I've
>>>> always
>>>> found this "feature" pretty strange. unlist() won't even make a
good
>>>> job
>>>> at
>>>> keeping the names unique:
>>>> > unlist(list(AA=letters[1:3], AA2="bb"))
>>>> AA1 AA2 AA3 AA2
>>>> "a" "b" "c" "bb"
>>>> So mangling the names doesn't solve anything but just adds
confusion.
>>>>
>>>> IMO it would be better if unlist() was keeping the original
names, even
>>>> if
>>>> that
>>>> means that they are not unique in the returned vector. At least I
can
>>>> do
>>>> something
>>>> with it programmatically, and it's easy. With the mangled names,
it's
>>>> much
>>>> harder
>>>> (there are a couple of serious pitfalls).
>>>>
>>> The problem might originate in what one could perceive a flaw with
lists
>>> (or any named vectors for that matter) in allowing non-unique
names.
>>>
>>> Mangled names are shurely a headache, as well as the "get only the
first
>>> element with the given name while it was not known there were
several
>>> elements with the same name" behavior in R.
>> I disagree - I think that requiring unique row names in R is/was
a
>> mistake - restrictions are often expensive - as they limit what can
be
>> done.
>
> I am certain that the best design decision was made at the time, and
my
> comment is obviously a retrospective one.
>
> I do suspect that computational cost weighted a lot in the decision,
and
> uniqueness of names *was* necessarily not a mistake.
> However, with the growing presence and of lookup structures in R (as
well
> as awareness of them and their benefit by programmers from all
horizons),
> these restrictions might be appearing much less expensive, and the
benefit
> from having them might be becoming competitive with not having them.
>
> Regarding the freedom, it might not be completely challenged as one
might
> use a data.frame (with vector of either mode "factor" or "character"
> whenever non-uniqueness is required - and those would just not be
"names"
> but "tags").
>
>
> Best,
>
>
> Laurent
>
>
>
>> Yes there are issues about dealing with non-unique row names, but
>> those can be dealt with, by careful programming. Such methods would
work
>> in all cases of duplicate row names, but with name-mangling
schemes, one
>> needs to know what name mangling scheme was used to be able to
>> disentangle - and that means every solution is different -- not
exactly
>> the kind of situation I would personally engineer in.
>>
>> best wishes
>> Robert
>>
>>>
>>> L.
>>>
>>>> H.
>>>>
>>>>
>>>>> R adds an additional
>>>>> integer to the end of duplicate names:
>>>>>
>>>>> > egs <- c("1", "100", "1000")
>>>>> > mget(egs, revmap(hgu133plus2ENTREZID))
>>>>> $`1`
>>>>> [1] "229819_at"
>>>>>
>>>>> $`100`
>>>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>>>
>>>>> $`1000`
>>>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>>>
>>>>>> There is any possibility to avoid this problem?
>>>>>>
>>>>>> 2) if in the egs vector is present an eg (6333) that is not
present
>>>>>> in
>>>>>> the annotation database I get the following error:
>>>>>> egs <- c("1", "100", "1000", "6333")
>>>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>>>
>>>>>> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
>>>>>> value for "6333" not found
>>>>>>
>>>>>> There is any possibility to make a query that simply avoid the
>>>>>> unmapped keys?
>>>>> Yes. The help for mget is a bit confusing on this point, but you
need
>>>>> to
>>>>> use the argument ifnotfound = NA.
>>>>>
>>>>> > egs <- c("1", "100", "1000", "6333")
>>>>> > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
>>>>> $`1`
>>>>> [1] "229819_at"
>>>>>
>>>>> $`100`
>>>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>>>
>>>>> $`1000`
>>>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>>>
>>>>> $`6333`
>>>>> [1] NA
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>>>> Many thanks
>>>>>> Raffaele
>>>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>> --
>> Robert Gentleman, PhD
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> PO Box 19024
>> Seattle, Washington 98109-1024
>> 206-667-7700
>> rgentlem at fhcrc.org
>>
>
>
Hi Herve,
That would solve that particular requirement you are having with
unlist.
(I have one comment, inserted in your proposal below).
That could also be the first step toward names-oriented features for
vectors.
Example:
# split on the "names" if no factor specified
split2 <- function(x, f, drop = FALSE, ...)
{
if (missing(f)) {
f <- names(f)
split(x, f, drop = drop, ...)
}
That could me Constructions like:
split2(unlist2(x, what.names="inherited"))
Best,
Laurent
> Hi Laurent, Robert,
>
> Maybe a simple solution for the unlist() problem would be to have an
extra
> argument
> that gives the user more control over what to do with the names.
>
> For example, with the unlist2() function below:
>
> > x <- list(A=c(b=-4, 2, b=7), B=3:-1, c(a=1, a=-2),
C=list(c(2:-1,
> d=55), e=99))
>
> > unlist2(x, what.names="inherited")
> b A b B B B B B a a C C C C d e
> -4 2 7 3 2 1 0 -1 1 -2 2 1 0 -1 55 99
>
> names are just "inherited" without being mangled.
>
> By default (what.names="mangled"), unlist2() behaves like
base::unlist()
>
> > unlist2(x)
> A.b A2 A.b B1 B2 B3 B4 B5 a a C1 C2 C3 C4 C.d C.e
> -4 2 7 3 2 1 0 -1 1 -2 2 1 0 -1 55 99
>
> It supports an extra mode (what.names="full"), not necessary very
useful,
> to get
> the full names of the elements:
>
> > unlist2(x, what.names="full")
> A.b A. A.b B. B. B. B. B. .a .a C.. C.. C..
C..
> C..d C.e.
> -4 2 7 3 2 1 0 -1 1 -2 2 1 0
-1
> 55 99
>
> The full name of an element describes the path that was followed to
walk
> from the
> top of level of the hierarchy to the element. The names of all the
> intermediate
> elements that were walked on are concatenated together with the dot
as the
> separator.
I might have missed something, but what if there are several identical
names at the same level in a branch:
x <- list(A = c(a=1, b=2), A=c(a=3, b=4))
Would it return:
A.a A.b A.a A.b
1 2 3 4
?
I would share Robert's reserve regarding the use of name mangling
schemes
If one wants the path, then the path is in the original nested list
and
computation should be performed on it.
> make.name.tree <- function(x, recursive, what.names)
> {
> if (!is.character(what.names) || length(what.names) != 1)
> stop("'what.names' must be a single string")
> what.names <- match.arg(what.names, c("inherited" , "full"))
> .make.name.tree.rec <- function(x, parent_name, depth)
> {
> if (length(x) == 0)
> return(character(0))
> x_names <- names(x)
> if (is.null(x_names))
> x_names <- rep.int(parent_name, length(x))
> else if (what.names == "full")
> x_names <- paste(parent_name, x_names, sep="")
> else
> x_names[x_names == ""] <- parent_name
> if (!is.list(x) || (!recursive && depth >= 1L))
> return(x_names)
> if (what.names == "full")
> x_names <- paste(x_names, ".", sep="")
> lapply(seq_len(length(x)),
> function(i) .make.name.tree.rec(x[[i]], x_names[i],
depth
> + 1L))
> }
> .make.name.tree.rec(x, "", 0L)
> }
>
> unlist2 <- function(x, recursive = TRUE, use.names = TRUE,
what.names =
> "mangled")
> {
> if (!is.character(what.names) || length(what.names) != 1)
> stop("'what.names' must be a single string")
> what.names <- match.arg(what.names, c("mangled", "inherited" ,
> "full"))
> if (!use.names || what.names == "mangled")
> return(unlist(x, recursive, use.names))
> ans <- unlist(x, recursive, FALSE)
> names(ans) <- unlist(make.name.tree(x, recursive, what.names),
> recursive, FALSE)
> ans
> }
>
> Cheers,
> H.
>
>
> lgautier at altern.org wrote:
>>>
>>> lgautier at altern.org wrote:
>>>>> James W. MacDonald wrote:
>>>>>> Hi Raffaele,
>>>>>>
>>>>>> rcaloger wrote:
>>>>>>> Hi,
>>>>>>> I found very interesting the possibility of using reversing
the
>>>>>>> mapping by revmap in the XXXX.db annotation databases.
>>>>>>>
>>>>>>> However, I have two problems:
>>>>>>> 1) if I use:
>>>>>>> egs <- c("1", "100", "1000")
>>>>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>>>>
>>>>>>> I am getting not only the probesets associated to the three
EGs:
>>>>>>> 1 1001 1002 1003
>>>>>>> 10001
>>>>>>> "229819_at" "1556117_at" "204639_at" "216705_s_at"
>>>>>>> "203440_at"
>>>>>>> 10002 10003
>>>>>>> "203441_s_at" "237305_at"
>>>>>> Well, not really. This appears to be so because you are
unlisting a
>>>>>> named list. Since the names have to be unique,
>>>>> Well, that's were I don't follow the logic behind unlist() and
I've
>>>>> always
>>>>> found this "feature" pretty strange. unlist() won't even make a
good
>>>>> job
>>>>> at
>>>>> keeping the names unique:
>>>>> > unlist(list(AA=letters[1:3], AA2="bb"))
>>>>> AA1 AA2 AA3 AA2
>>>>> "a" "b" "c" "bb"
>>>>> So mangling the names doesn't solve anything but just adds
confusion.
>>>>>
>>>>> IMO it would be better if unlist() was keeping the original
names,
>>>>> even
>>>>> if
>>>>> that
>>>>> means that they are not unique in the returned vector. At least
I can
>>>>> do
>>>>> something
>>>>> with it programmatically, and it's easy. With the mangled names,
it's
>>>>> much
>>>>> harder
>>>>> (there are a couple of serious pitfalls).
>>>>>
>>>> The problem might originate in what one could perceive a flaw
with
>>>> lists
>>>> (or any named vectors for that matter) in allowing non-unique
names.
>>>>
>>>> Mangled names are shurely a headache, as well as the "get only
the
>>>> first
>>>> element with the given name while it was not known there were
several
>>>> elements with the same name" behavior in R.
>>> I disagree - I think that requiring unique row names in R
is/was a
>>> mistake - restrictions are often expensive - as they limit what
can be
>>> done.
>>
>> I am certain that the best design decision was made at the time,
and my
>> comment is obviously a retrospective one.
>>
>> I do suspect that computational cost weighted a lot in the
decision, and
>> uniqueness of names *was* necessarily not a mistake.
>> However, with the growing presence and of lookup structures in R
(as
>> well
>> as awareness of them and their benefit by programmers from all
>> horizons),
>> these restrictions might be appearing much less expensive, and the
>> benefit
>> from having them might be becoming competitive with not having
them.
>>
>> Regarding the freedom, it might not be completely challenged as one
>> might
>> use a data.frame (with vector of either mode "factor" or
"character"
>> whenever non-uniqueness is required - and those would just not be
>> "names"
>> but "tags").
>>
>>
>> Best,
>>
>>
>> Laurent
>>
>>
>>
>>> Yes there are issues about dealing with non-unique row names, but
>>> those can be dealt with, by careful programming. Such methods
would
>>> work
>>> in all cases of duplicate row names, but with name-mangling
schemes,
>>> one
>>> needs to know what name mangling scheme was used to be able to
>>> disentangle - and that means every solution is different -- not
exactly
>>> the kind of situation I would personally engineer in.
>>>
>>> best wishes
>>> Robert
>>>
>>>>
>>>> L.
>>>>
>>>>> H.
>>>>>
>>>>>
>>>>>> R adds an additional
>>>>>> integer to the end of duplicate names:
>>>>>>
>>>>>> > egs <- c("1", "100", "1000")
>>>>>> > mget(egs, revmap(hgu133plus2ENTREZID))
>>>>>> $`1`
>>>>>> [1] "229819_at"
>>>>>>
>>>>>> $`100`
>>>>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>>>>
>>>>>> $`1000`
>>>>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>>>>
>>>>>>> There is any possibility to avoid this problem?
>>>>>>>
>>>>>>> 2) if in the egs vector is present an eg (6333) that is not
present
>>>>>>> in
>>>>>>> the annotation database I get the following error:
>>>>>>> egs <- c("1", "100", "1000", "6333")
>>>>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>>>>
>>>>>>> Error in .checkKeys(value, Rkeys(x), x at ifnotfound) :
>>>>>>> value for "6333" not found
>>>>>>>
>>>>>>> There is any possibility to make a query that simply avoid the
>>>>>>> unmapped keys?
>>>>>> Yes. The help for mget is a bit confusing on this point, but
you
>>>>>> need
>>>>>> to
>>>>>> use the argument ifnotfound = NA.
>>>>>>
>>>>>> > egs <- c("1", "100", "1000", "6333")
>>>>>> > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
>>>>>> $`1`
>>>>>> [1] "229819_at"
>>>>>>
>>>>>> $`100`
>>>>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>>>>
>>>>>> $`1000`
>>>>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>>>>
>>>>>> $`6333`
>>>>>> [1] NA
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Many thanks
>>>>>>> Raffaele
>>>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>> --
>>> Robert Gentleman, PhD
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M2-B876
>>> PO Box 19024
>>> Seattle, Washington 98109-1024
>>> 206-667-7700
>>> rgentlem at fhcrc.org
>>>
>>
>>
>
>