Hi Paolo
It's pretty normal that some probes cannot map to any genes because
Human genome annotation keeps updating but the probe design is
unchanged. So usually you can ignore those NA probes. If you are
really interested in them, you can easily convert the nuID to probe
sequence (use id2seq function) and map them to genome or refseq.
I will update the vignette later to avoid such confusion.
Pan
On Thu, Aug 23, 2012 at 3:26 AM, Paolo Kunderfranco
<paolo.kunderfranco at="" gmail.com=""> wrote:
> Dear All,
> I am working with lumi / limma package to detect differentially
expressed
> genes between two or more samples.
> I was wondering why when I add geneSymbol and geneName to my
Illumina
> probelist most of them (around 10%)are not called and remained NA,
for
> instance (last row):
>
>
> if (require(lumiMouseAll.db) & require(annotate)) {
> geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db')
> geneName <- sapply(lookUp(probeList,
'lumiMouseAll.db',
> 'GENENAME'), function(x) x[1])
> fit1_2$genes <- data.frame(ID= probeList,
> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE)
> }
>
>
> 7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency
> associated 5A 7.828381 9.333743 149.31710 2.773571e-18
6.144846e-14
> 29.80858
> 16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5,
transcription
> factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13
> 27.72832
> 20450 HlUzpCHheswfSZNdQo Trh thyrotropin
> releasing hormone 5.603441 8.761965 103.81774 1.102739e-16
> 8.143726e-13 27.72571
> 7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency
> associated 4 5.300619 8.626239 99.82457 1.640729e-16 9.087587e-13
> 27.45790
> 7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency
> associated 5A 7.663922 9.439668 97.09091 2.173661e-16
9.631491e-13
> 27.26346
> 17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4
> (Drosophila) 4.456642 8.585243 90.39110 4.484584e-16 1.655932e-12
> 26.74469
> 14084 06jqfFxe5_X97NRXuk Myl3 myosin, light
> polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16
1.779384e-12
> 26.57755
> 8757 oii7mSFyrr_AMWODH0 <na>
> <na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12
> 25.66512
>
> any ideas?
> thanks
> paolo
>
Hi,
Thanks for the fast reply,
Just another curiosity, I finally obtained list of genes significantly
differentially expressed between my conditions,
I would like now to build for each condition an heatmap. I constructed
for each one a data matrix based on a lumi.N.Q object (after VST
variance stabilizing transformation and normalization).
>dataMatrix <- exprs(lumi.N.Q)
>head(dataMatrix)
CMe_2 mES_2 CMa_1 CMp_1 CMe_1 mES_1 CMa_3
CMp_3 CMe_3 mES_3 CMa_2 CMp_2
T.kvUVfFyruaCeqZVs 7.306945 7.312214 7.302783 7.302670 7.294789
7.273550 7.381502 7.312916 7.301191 7.268140 7.272041 7.278205
Q96MoiqM_z49FU37tE 7.257576 7.278205 7.300322 7.226146 7.281574
7.281210 7.294743 7.393729 7.340853 7.315099 7.312916 7.270886
>heatmap(dataMatrix[1:5,])
I was wondering on the scale of the heatmap, usually values ranges
from -3 to 3, but data VST transformed in my example ranges from 7 to
14, with a median value of 7.5...
is it correct to plot a heatmap with VST transfmed data? or is
preferable to perfrom variance stabilization using log2 transform?
If is correct to use vst transformed data how do I refer to the scale?
Thanks for any suggestion,
Cheers,
Paolo
2012/8/23 Pan Du <dupan.mail at="" gmail.com="">:
> Hi Paolo
>
> It's pretty normal that some probes cannot map to any genes because
> Human genome annotation keeps updating but the probe design is
> unchanged. So usually you can ignore those NA probes. If you are
> really interested in them, you can easily convert the nuID to probe
> sequence (use id2seq function) and map them to genome or refseq.
> I will update the vignette later to avoid such confusion.
>
> Pan
>
> On Thu, Aug 23, 2012 at 3:26 AM, Paolo Kunderfranco
> <paolo.kunderfranco at="" gmail.com=""> wrote:
>> Dear All,
>> I am working with lumi / limma package to detect differentially
expressed
>> genes between two or more samples.
>> I was wondering why when I add geneSymbol and geneName to my
Illumina
>> probelist most of them (around 10%)are not called and remained NA,
for
>> instance (last row):
>>
>>
>> if (require(lumiMouseAll.db) & require(annotate)) {
>> geneSymbol <- getSYMBOL(probeList,
'lumiMouseAll.db')
>> geneName <- sapply(lookUp(probeList,
'lumiMouseAll.db',
>> 'GENENAME'), function(x) x[1])
>> fit1_2$genes <- data.frame(ID= probeList,
>> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE)
>> }
>>
>>
>> 7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency
>> associated 5A 7.828381 9.333743 149.31710 2.773571e-18
6.144846e-14
>> 29.80858
>> 16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5,
transcription
>> factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13
>> 27.72832
>> 20450 HlUzpCHheswfSZNdQo Trh thyrotropin
>> releasing hormone 5.603441 8.761965 103.81774 1.102739e-16
>> 8.143726e-13 27.72571
>> 7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency
>> associated 4 5.300619 8.626239 99.82457 1.640729e-16
9.087587e-13
>> 27.45790
>> 7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency
>> associated 5A 7.663922 9.439668 97.09091 2.173661e-16
9.631491e-13
>> 27.26346
>> 17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4
>> (Drosophila) 4.456642 8.585243 90.39110 4.484584e-16
1.655932e-12
>> 26.74469
>> 14084 06jqfFxe5_X97NRXuk Myl3 myosin, light
>> polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16
1.779384e-12
>> 26.57755
>> 8757 oii7mSFyrr_AMWODH0 <na>
>> <na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12
>> 25.66512
>>
>> any ideas?
>> thanks
>> paolo
>>
>
> I was wondering on the scale of the heatmap, usually values ranges
> from -3 to 3, but data VST transformed in my example ranges from 7
to
> 14, with a median value of 7.5...
> is it correct to plot a heatmap with VST transfmed data? or is
> preferable to perfrom variance stabilization using log2 transform?
> If is correct to use vst transformed data how do I refer to the
scale?
The scale of heatmap in the range between -3 and 3 is because by
default "heatmap" function will scale each row to 0 mean and 1
standard deviation. The purpose of this is to visualize the relative
changes of the patterns. But in the meantime, the original scale
information is lost. So the heatmap scale has no direct relation with
vst or log2 transform.
Pan
>
>
>
>
> 2012/8/23 Pan Du <dupan.mail at="" gmail.com="">:
>> Hi Paolo
>>
>> It's pretty normal that some probes cannot map to any genes because
>> Human genome annotation keeps updating but the probe design is
>> unchanged. So usually you can ignore those NA probes. If you are
>> really interested in them, you can easily convert the nuID to probe
>> sequence (use id2seq function) and map them to genome or refseq.
>> I will update the vignette later to avoid such confusion.
>>
>> Pan
>>
>> On Thu, Aug 23, 2012 at 3:26 AM, Paolo Kunderfranco
>> <paolo.kunderfranco at="" gmail.com=""> wrote:
>>> Dear All,
>>> I am working with lumi / limma package to detect differentially
expressed
>>> genes between two or more samples.
>>> I was wondering why when I add geneSymbol and geneName to my
Illumina
>>> probelist most of them (around 10%)are not called and remained NA,
for
>>> instance (last row):
>>>
>>>
>>> if (require(lumiMouseAll.db) & require(annotate)) {
>>> geneSymbol <- getSYMBOL(probeList,
'lumiMouseAll.db')
>>> geneName <- sapply(lookUp(probeList,
'lumiMouseAll.db',
>>> 'GENENAME'), function(x) x[1])
>>> fit1_2$genes <- data.frame(ID= probeList,
>>> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE)
>>> }
>>>
>>>
>>> 7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency
>>> associated 5A 7.828381 9.333743 149.31710 2.773571e-18
6.144846e-14
>>> 29.80858
>>> 16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5,
transcription
>>> factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13
>>> 27.72832
>>> 20450 HlUzpCHheswfSZNdQo Trh thyrotropin
>>> releasing hormone 5.603441 8.761965 103.81774 1.102739e-16
>>> 8.143726e-13 27.72571
>>> 7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency
>>> associated 4 5.300619 8.626239 99.82457 1.640729e-16
9.087587e-13
>>> 27.45790
>>> 7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency
>>> associated 5A 7.663922 9.439668 97.09091 2.173661e-16
9.631491e-13
>>> 27.26346
>>> 17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4
>>> (Drosophila) 4.456642 8.585243 90.39110 4.484584e-16
1.655932e-12
>>> 26.74469
>>> 14084 06jqfFxe5_X97NRXuk Myl3 myosin, light
>>> polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16
1.779384e-12
>>> 26.57755
>>> 8757 oii7mSFyrr_AMWODH0 <na>
>>> <na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12
>>> 25.66512
>>>
>>> any ideas?
>>> thanks
>>> paolo
>>>
Dear Paolo,
An alternative is to use the gene symbols from the Illumina manifest
file.
The gene symbols are often included in the probe summary profile files
output by BeadStudio or GenomeStudio, and are read in automatically by
the
read.ilmn() function of the limma package. See the "Mammary
Progenitor
Cell Populations" case study in the limma User's Guide for a fully
worked
example.
Of course not all probes have gene names, no matter what annotation
you
use.
Best wishes
Gordon
--------------- original message --------------
[BioC] #Identify differentially expressed genes
Paolo Kunderfranco paolo.kunderfranco at gmail.com
Thu Aug 23 12:26:51 CEST 2012
Dear All,
I am working with lumi / limma package to detect differentially
expressed
genes between two or more samples.
I was wondering why when I add geneSymbol and geneName to my Illumina
probelist
most of them (around 10%)are not called and remained NA, for instance
(last
row):
if (require(lumiMouseAll.db) & require(annotate)) {
geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db')
geneName <- sapply(lookUp(probeList,
'lumiMouseAll.db',
'GENENAME'), function(x) x[1])
fit1_2$genes <- data.frame(ID= probeList,
geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE)
}
7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency
associated 5A 7.828381 9.333743 149.31710 2.773571e-18 6.144846e-14
29.80858
16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5, transcription
factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13
27.72832
20450 HlUzpCHheswfSZNdQo Trh thyrotropin
releasing hormone 5.603441 8.761965 103.81774 1.102739e-16
8.143726e-13 27.72571
7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency
associated 4 5.300619 8.626239 99.82457 1.640729e-16 9.087587e-13
27.45790
7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency
associated 5A 7.663922 9.439668 97.09091 2.173661e-16 9.631491e-13
27.26346
17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4
(Drosophila) 4.456642 8.585243 90.39110 4.484584e-16 1.655932e-12
26.74469
14084 06jqfFxe5_X97NRXuk Myl3 myosin, light
polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16 1.779384e-12
26.57755
8757 oii7mSFyrr_AMWODH0 <na>
<na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12
25.66512
any ideas?
thanks
paolo
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}