annotation in microarray analysis
2
0
Entering edit mode
@farahqureshi5-12184
Last seen 7.3 years ago

Greetings!!

 i am sure whether this question is relevant to be aksed here or not. but please try to help me out.

i am following the article " http://bioinformatics.knowledgeblog.org/2011/06/20/analysing-microarray-data-in-bioconductor/

using the command topTable(huvec_ebFit, number=10, coef=1), i am getting an output as:

                logFC  AveExpr         t      P.Value    adj.P.Val        B

214651_s_at  7.240465 4.362789  31.31166 4.772008e-14 1.059268e-09 20.78917

216973_s_at  6.003153 4.568539  30.19540 7.758501e-14 1.059268e-09 20.46574

 

As is clear the ID part is missing, due to which  i am getting an error at 

gene.symbols <- getSYMBOL(probeset.list$ID, "hgu133plus2") command

 Is there any file of command missing and how can I this error be rectified.

 

 

Session info:

> sessioninfo()
Error: could not find function "sessioninfo"
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7600)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] hgu133plus2cdf_2.18.0 hgu133plus2.db_3.2.3  org.Hs.eg.db_3.4.0   
 [4] annotate_1.52.1       XML_3.98-1.5          AnnotationDbi_1.36.2 
 [7] IRanges_2.8.1         S4Vectors_0.12.1      limma_3.30.12        
[10] RColorBrewer_1.1-2    affyPLM_1.50.0        preprocessCore_1.36.0
[13] simpleaffy_2.50.0     gcrma_2.46.0          genefilter_1.56.0    
[16] affy_1.52.0           Biobase_2.34.0        BiocGenerics_0.20.0  

loaded via a namespace (and not attached):
 [1] Biostrings_2.42.1          BiocParallel_1.8.1        
 [3] digest_0.6.12              splines_3.3.2             
 [5] grid_3.3.2                 bitops_1.0-6              
 [7] survival_2.40-1            zlibbioc_1.20.0           
 [9] RSQLite_1.1-2              lattice_0.20-34           
[11] DBI_0.6                    Rsamtools_1.26.1          
[13] Matrix_1.2-8               BiocInstaller_1.24.0      
[15] SummarizedExperiment_1.4.0 tools_3.3.2               
[17] RCurl_1.95-4.8             XVector_0.14.0            
[19] affyio_1.44.0              GenomeInfoDb_1.10.3       
[21] GenomicRanges_1.26.3       xtable_1.8-2              
[23] memoise_1.0.0              Rcpp_0.12.9 

 

microarray • 1.0k views
ADD COMMENT
0
Entering edit mode
Guido Hooiveld ★ 4.1k
@guido-hooiveld-2020
Last seen 14 days ago
Wageningen University, Wageningen, the …

First a general comment: please note that the workflow you used is rather old (June 2011), and it is likely that the functionality of Bioconductor packages has been changed over the years. In that respect maybe also helpful to check out this more current workflow at the F1000 Bioconductor channel here.

To get to your issue: if you would have checked the content / column names of the object probeset.list, you would have noticed that there is no column present (anymore?) labelled "ID". (This could be due to some changes in limma, but I am not sure about that). You rather should use as input the rownames() of your object:

> probeset.list <- topTable(huvec_ebFit, coef=1, number=10000, lfc=4)
> gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2")
> head(gene.symbols)
214651_s_at   209905_at 204779_s_at 209631_s_at 216973_s_at 207016_s_at
         NA          NA     "HOXB7"          NA     "HOXB7"          NA
>

 

Alternatively, you could make use of the annotateEset() function of James MacDonald's package affycoretools. This package uses current approaches to easily query annotation databases, and when added to your normalized data object, the annotation info will be automagically used by e.g. limma.

Please note that using the current database querying (that uses the select() interface) results in the annotation of more probesets using default settings than when using the classic bimap interface. See also e.g. A: Trouble getting probe ID info from mouse4302.db annotation package with a specif.

> library(simpleaffy)
> celfiles <- read.affy(covdesc="phenodata.txt")
> celfiles.gcrma <- gcrma(celfiles)
>
> library(affycoretools)
> library(hgu133plus2.db)
>
> celfiles.gcrma <- annotateEset(celfiles.gcrma, hgu133plus2.db)
'select()' returned 1:many mapping between keys and columns
>
> samples <- celfiles.gcrma$Target
> samples <- as.factor(samples)
> design <- model.matrix(~0 + samples)
> colnames(design) <- c("choroid", "huvec", "iris", "retina")
>
> library(limma)
> fit <- lmFit(celfiles.gcrma, design) #Note: took full, unfiltered dataset
> contrast.matrix <- makeContrasts(huvec_choroid = huvec - choroid, huvec_retina = huvec - retina, huvec_iris <- huvec - iris, levels=design)
>
> huvec_fits <- contrasts.fit(fit, contrast.matrix)
> huvec_ebFit <- eBayes(huvec_fits)
> topTable(huvec_ebFit, number=10, coef=1)
                PROBEID ENTREZID    SYMBOL
214651_s_at 214651_s_at     3205     HOXA9
209905_at     209905_at     3206    HOXA10
204779_s_at 204779_s_at     3217     HOXB7
209631_s_at 209631_s_at     2861     GPR37
216973_s_at 216973_s_at     3217     HOXB7
227377_at     227377_at    10642   IGF2BP1
207016_s_at 207016_s_at     8854   ALDH1A2
213150_at     213150_at     3206    HOXA10
205893_at     205893_at    22871     NLGN1
228564_at     228564_at   375295 LINC01116
                                                       GENENAME    logFC
214651_s_at                                         homeobox A9 8.670862
209905_at                                          homeobox A10 7.847163
204779_s_at                                         homeobox B7 7.367790
209631_s_at                       G protein-coupled receptor 37 5.192949
216973_s_at                                         homeobox B7 7.162121
227377_at   insulin like growth factor 2 mRNA binding protein 1 3.672710
207016_s_at           aldehyde dehydrogenase 1 family member A2 6.936667
213150_at                                          homeobox A10 4.639280
205893_at                                          neuroligin 1 4.480331
228564_at           long intergenic non-protein coding RNA 1116 4.625210
             AveExpr         t      P.Value    adj.P.Val        B
214651_s_at 4.446713 296.84201 3.766913e-18 2.059559e-13 21.29047
209905_at   4.240855 223.07223 4.127110e-17 1.128249e-12 21.12708
204779_s_at 4.171707 183.87424 2.083522e-16 3.797218e-12 20.95620
209631_s_at 4.003992 128.75727 4.122515e-15 5.634962e-11 20.44160
216973_s_at 4.069528 101.96645 2.908552e-14 3.180502e-10 19.90526
227377_at   3.209739  88.16569 9.828493e-14 8.956215e-10 19.46773
207016_s_at 4.027733  83.39775 1.565361e-13 1.222659e-09 19.27707
213150_at   3.438818  72.80957 4.876715e-13 3.332930e-09 18.75372
205893_at   3.543714  71.00587 6.015707e-13 3.654542e-09 18.64783
228564_at   3.435310  69.12891 7.527195e-13 4.115494e-09 18.53153
> 
ADD COMMENT
0
Entering edit mode
@farahqureshi5-12184
Last seen 7.3 years ago

thank u so much..stay blessed

ADD COMMENT

Login before adding your answer.

Traffic: 584 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6