#Identify differentially expressed genes
2
0
Entering edit mode
@paolo-kunderfranco-5158
Last seen 7.4 years ago
Dear All, I am working with lumi / limma package to detect differentially expressed genes between two or more samples. I was wondering why when I add geneSymbol and geneName to my Illumina probelist most of them (around 10%)are not called and remained NA, for instance (last row): if (require(lumiMouseAll.db) & require(annotate)) { geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', 'GENENAME'), function(x) x[1]) fit1_2$genes <- data.frame(ID= probeList, geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) } 7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency associated 5A 7.828381 9.333743 149.31710 2.773571e-18 6.144846e-14 29.80858 16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5, transcription factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13 27.72832 20450 HlUzpCHheswfSZNdQo Trh thyrotropin releasing hormone 5.603441 8.761965 103.81774 1.102739e-16 8.143726e-13 27.72571 7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency associated 4 5.300619 8.626239 99.82457 1.640729e-16 9.087587e-13 27.45790 7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency associated 5A 7.663922 9.439668 97.09091 2.173661e-16 9.631491e-13 27.26346 17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4 (Drosophila) 4.456642 8.585243 90.39110 4.484584e-16 1.655932e-12 26.74469 14084 06jqfFxe5_X97NRXuk Myl3 myosin, light polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16 1.779384e-12 26.57755 8757 oii7mSFyrr_AMWODH0 <na> <na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12 25.66512 any ideas? thanks paolo [[alternative HTML version deleted]]
limma lumi limma lumi • 1.3k views
ADD COMMENT
0
Entering edit mode
Pan Du ▴ 440
@pan-du-4636
Last seen 10.2 years ago
Hi Paolo It's pretty normal that some probes cannot map to any genes because Human genome annotation keeps updating but the probe design is unchanged. So usually you can ignore those NA probes. If you are really interested in them, you can easily convert the nuID to probe sequence (use id2seq function) and map them to genome or refseq. I will update the vignette later to avoid such confusion. Pan On Thu, Aug 23, 2012 at 3:26 AM, Paolo Kunderfranco <paolo.kunderfranco at="" gmail.com=""> wrote: > Dear All, > I am working with lumi / limma package to detect differentially expressed > genes between two or more samples. > I was wondering why when I add geneSymbol and geneName to my Illumina > probelist most of them (around 10%)are not called and remained NA, for > instance (last row): > > > if (require(lumiMouseAll.db) & require(annotate)) { > geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') > geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', > 'GENENAME'), function(x) x[1]) > fit1_2$genes <- data.frame(ID= probeList, > geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) > } > > > 7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency > associated 5A 7.828381 9.333743 149.31710 2.773571e-18 6.144846e-14 > 29.80858 > 16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5, transcription > factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13 > 27.72832 > 20450 HlUzpCHheswfSZNdQo Trh thyrotropin > releasing hormone 5.603441 8.761965 103.81774 1.102739e-16 > 8.143726e-13 27.72571 > 7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency > associated 4 5.300619 8.626239 99.82457 1.640729e-16 9.087587e-13 > 27.45790 > 7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency > associated 5A 7.663922 9.439668 97.09091 2.173661e-16 9.631491e-13 > 27.26346 > 17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4 > (Drosophila) 4.456642 8.585243 90.39110 4.484584e-16 1.655932e-12 > 26.74469 > 14084 06jqfFxe5_X97NRXuk Myl3 myosin, light > polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16 1.779384e-12 > 26.57755 > 8757 oii7mSFyrr_AMWODH0 <na> > <na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12 > 25.66512 > > any ideas? > thanks > paolo >
ADD COMMENT
0
Entering edit mode
Hi, Thanks for the fast reply, Just another curiosity, I finally obtained list of genes significantly differentially expressed between my conditions, I would like now to build for each condition an heatmap. I constructed for each one a data matrix based on a lumi.N.Q object (after VST variance stabilizing transformation and normalization). >dataMatrix <- exprs(lumi.N.Q) >head(dataMatrix) CMe_2 mES_2 CMa_1 CMp_1 CMe_1 mES_1 CMa_3 CMp_3 CMe_3 mES_3 CMa_2 CMp_2 T.kvUVfFyruaCeqZVs 7.306945 7.312214 7.302783 7.302670 7.294789 7.273550 7.381502 7.312916 7.301191 7.268140 7.272041 7.278205 Q96MoiqM_z49FU37tE 7.257576 7.278205 7.300322 7.226146 7.281574 7.281210 7.294743 7.393729 7.340853 7.315099 7.312916 7.270886 >heatmap(dataMatrix[1:5,]) I was wondering on the scale of the heatmap, usually values ranges from -3 to 3, but data VST transformed in my example ranges from 7 to 14, with a median value of 7.5... is it correct to plot a heatmap with VST transfmed data? or is preferable to perfrom variance stabilization using log2 transform? If is correct to use vst transformed data how do I refer to the scale? Thanks for any suggestion, Cheers, Paolo 2012/8/23 Pan Du <dupan.mail at="" gmail.com="">: > Hi Paolo > > It's pretty normal that some probes cannot map to any genes because > Human genome annotation keeps updating but the probe design is > unchanged. So usually you can ignore those NA probes. If you are > really interested in them, you can easily convert the nuID to probe > sequence (use id2seq function) and map them to genome or refseq. > I will update the vignette later to avoid such confusion. > > Pan > > On Thu, Aug 23, 2012 at 3:26 AM, Paolo Kunderfranco > <paolo.kunderfranco at="" gmail.com=""> wrote: >> Dear All, >> I am working with lumi / limma package to detect differentially expressed >> genes between two or more samples. >> I was wondering why when I add geneSymbol and geneName to my Illumina >> probelist most of them (around 10%)are not called and remained NA, for >> instance (last row): >> >> >> if (require(lumiMouseAll.db) & require(annotate)) { >> geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') >> geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', >> 'GENENAME'), function(x) x[1]) >> fit1_2$genes <- data.frame(ID= probeList, >> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) >> } >> >> >> 7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency >> associated 5A 7.828381 9.333743 149.31710 2.773571e-18 6.144846e-14 >> 29.80858 >> 16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5, transcription >> factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13 >> 27.72832 >> 20450 HlUzpCHheswfSZNdQo Trh thyrotropin >> releasing hormone 5.603441 8.761965 103.81774 1.102739e-16 >> 8.143726e-13 27.72571 >> 7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency >> associated 4 5.300619 8.626239 99.82457 1.640729e-16 9.087587e-13 >> 27.45790 >> 7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency >> associated 5A 7.663922 9.439668 97.09091 2.173661e-16 9.631491e-13 >> 27.26346 >> 17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4 >> (Drosophila) 4.456642 8.585243 90.39110 4.484584e-16 1.655932e-12 >> 26.74469 >> 14084 06jqfFxe5_X97NRXuk Myl3 myosin, light >> polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16 1.779384e-12 >> 26.57755 >> 8757 oii7mSFyrr_AMWODH0 <na> >> <na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12 >> 25.66512 >> >> any ideas? >> thanks >> paolo >>
ADD REPLY
0
Entering edit mode
> > I was wondering on the scale of the heatmap, usually values ranges > from -3 to 3, but data VST transformed in my example ranges from 7 to > 14, with a median value of 7.5... > is it correct to plot a heatmap with VST transfmed data? or is > preferable to perfrom variance stabilization using log2 transform? > If is correct to use vst transformed data how do I refer to the scale? The scale of heatmap in the range between -3 and 3 is because by default "heatmap" function will scale each row to 0 mean and 1 standard deviation. The purpose of this is to visualize the relative changes of the patterns. But in the meantime, the original scale information is lost. So the heatmap scale has no direct relation with vst or log2 transform. Pan > > > > > 2012/8/23 Pan Du <dupan.mail at="" gmail.com="">: >> Hi Paolo >> >> It's pretty normal that some probes cannot map to any genes because >> Human genome annotation keeps updating but the probe design is >> unchanged. So usually you can ignore those NA probes. If you are >> really interested in them, you can easily convert the nuID to probe >> sequence (use id2seq function) and map them to genome or refseq. >> I will update the vignette later to avoid such confusion. >> >> Pan >> >> On Thu, Aug 23, 2012 at 3:26 AM, Paolo Kunderfranco >> <paolo.kunderfranco at="" gmail.com=""> wrote: >>> Dear All, >>> I am working with lumi / limma package to detect differentially expressed >>> genes between two or more samples. >>> I was wondering why when I add geneSymbol and geneName to my Illumina >>> probelist most of them (around 10%)are not called and remained NA, for >>> instance (last row): >>> >>> >>> if (require(lumiMouseAll.db) & require(annotate)) { >>> geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') >>> geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', >>> 'GENENAME'), function(x) x[1]) >>> fit1_2$genes <- data.frame(ID= probeList, >>> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) >>> } >>> >>> >>> 7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency >>> associated 5A 7.828381 9.333743 149.31710 2.773571e-18 6.144846e-14 >>> 29.80858 >>> 16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5, transcription >>> factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13 >>> 27.72832 >>> 20450 HlUzpCHheswfSZNdQo Trh thyrotropin >>> releasing hormone 5.603441 8.761965 103.81774 1.102739e-16 >>> 8.143726e-13 27.72571 >>> 7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency >>> associated 4 5.300619 8.626239 99.82457 1.640729e-16 9.087587e-13 >>> 27.45790 >>> 7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency >>> associated 5A 7.663922 9.439668 97.09091 2.173661e-16 9.631491e-13 >>> 27.26346 >>> 17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4 >>> (Drosophila) 4.456642 8.585243 90.39110 4.484584e-16 1.655932e-12 >>> 26.74469 >>> 14084 06jqfFxe5_X97NRXuk Myl3 myosin, light >>> polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16 1.779384e-12 >>> 26.57755 >>> 8757 oii7mSFyrr_AMWODH0 <na> >>> <na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12 >>> 25.66512 >>> >>> any ideas? >>> thanks >>> paolo >>>
ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 15 hours ago
WEHI, Melbourne, Australia
Dear Paolo, An alternative is to use the gene symbols from the Illumina manifest file. The gene symbols are often included in the probe summary profile files output by BeadStudio or GenomeStudio, and are read in automatically by the read.ilmn() function of the limma package. See the "Mammary Progenitor Cell Populations" case study in the limma User's Guide for a fully worked example. Of course not all probes have gene names, no matter what annotation you use. Best wishes Gordon --------------- original message -------------- [BioC] #Identify differentially expressed genes Paolo Kunderfranco paolo.kunderfranco at gmail.com Thu Aug 23 12:26:51 CEST 2012 Dear All, I am working with lumi / limma package to detect differentially expressed genes between two or more samples. I was wondering why when I add geneSymbol and geneName to my Illumina probelist most of them (around 10%)are not called and remained NA, for instance (last row): if (require(lumiMouseAll.db) & require(annotate)) { geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', 'GENENAME'), function(x) x[1]) fit1_2$genes <- data.frame(ID= probeList, geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) } 7671 69fpKOOuFduFbAjNVU Dppa5a developmental pluripotency associated 5A 7.828381 9.333743 149.31710 2.773571e-18 6.144846e-14 29.80858 16014 QpWgiAmByT4gW7iui0 Pou5f1 POU domain, class 5, transcription factor 1 5.305532 8.633706 103.85793 1.098423e-16 8.143726e-13 27.72832 20450 HlUzpCHheswfSZNdQo Trh thyrotropin releasing hormone 5.603441 8.761965 103.81774 1.102739e-16 8.143726e-13 27.72571 7670 o7Ah_nzF7JdZOTtd9U Dppa4 developmental pluripotency associated 4 5.300619 8.626239 99.82457 1.640729e-16 9.087587e-13 27.45790 7672 xjn0tTp4isUXmUkAKI Dppa5a developmental pluripotency associated 5A 7.663922 9.439668 97.09091 2.173661e-16 9.631491e-13 27.26346 17719 ZXvxHuC6s3xogRFJfo Sall4 sal-like 4 (Drosophila) 4.456642 8.585243 90.39110 4.484584e-16 1.655932e-12 26.74469 14084 06jqfFxe5_X97NRXuk Myl3 myosin, light polypeptide 3 -7.736059 13.128014 -88.39591 5.622067e-16 1.779384e-12 26.57755 8757 oii7mSFyrr_AMWODH0 <na> <na> 4.608167 8.438770 78.65631 1.833459e-15 5.077535e-12 25.66512 any ideas? thanks paolo ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT

Login before adding your answer.

Traffic: 644 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6