Firstly i get the (.tar) file from geo db then phenotypic data by using R ,I do data normalization with this method but this time is not working and secondly the .cel files are not opening in text format while i work before on GSE14325 and its .cel files was opened in text format.kindly tell me the possible solution for that.
For phenotypic data: library(GEOquery) gse <- getGEO('GSE94358',GSEMatrix=TRUE) pheno.df <- pData(phenoData(gse[[1]])) pData(gse[[1]]) R script for data normalization:library(affy)
library(affycoretools)
d1<-ReadAffy()
pData(d1)<-read.table("PhenoData.txt",row.names=1,header = T,sep="\t");
pData(d1)
Sample_title
GSM2473987 A_set
GSM2473988 A_set
GSM2473989 AS_set
GSM2473990 AS_set
GSM2473991 AS_set
GSM2473992 F_set
GSM2473993 F_set
GSM2473994 S_set
GSM2473995 S_set
GSM2473996 S_set
Sample_source_name_ch1
GSM2473987 A - adherent cells in high attachment flask with 10% FBS (DMEM/F12 base medium)
GSM2473988 A - adherent cells in high attachment flask with 10% FBS (DMEM/F12 base medium)
GSM2473989 AS - Adherent cells in high attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473990 AS - Adherent cells in high attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473991 AS - Adherent cells in high attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473992 F - Floating cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473993 F - Floating cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473994 S - adherent cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473995 S - adherent cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473996 S - adherent cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
Sample_characteristics_ch1
GSM2473987 date rna collect: 2015-02-05
GSM2473988 date rna collect: 2015-02-05
GSM2473989 date rna collect: 2015-02-05
GSM2473990 date rna collect: 2015-02-02
GSM2473991 date rna collect: 2015-01-30
GSM2473992 date rna collect: 2015-02-02
GSM2473993 date rna collect: 2015-01-30
GSM2473994 date rna collect: 2015-02-05
GSM2473995 date rna collect: 2015-02-02
GSM2473996 date rna collect: 2015-01-30
Sample_description
GSM2473987 expression levels for cells expanded in adherent flasks in traditional media
GSM2473988 expression levels for cells expanded in adherent flasks in traditional media
GSM2473989 expression levels for cells expanded in adherent flasks in stem cell media
GSM2473990 expression levels for cells expanded in adherent flasks in stem cell media
GSM2473991 expression levels for cells expanded in adherent flasks in stem cell media
GSM2473992 expression levels for floating cells obtained from adherent cultures that were expanded in ultra-low attachment flasks with stem cell media
GSM2473993 expression levels for floating cells obtained from adherent cultures that were expanded in ultra-low attachment flasks with stem cell media
GSM2473994 expression levels for cells obtained from adherent cultures and expanded in ultra-low attachment flasks with stem cell media
GSM2473995 expression levels for cells obtained from adherent cultures and expanded in ultra-low attachment flasks with stem cell media
GSM2473996 expression levels for cells obtained from adherent cultures and expanded in ultra-low attachment flasks with stem cell media
> d1
AffyBatch object
size of arrays=1164x1164 features (25 kb)
cdf=HG-U133_Plus_2 (54675 affyids)
number of samples=10
number of genes=54675
annotation=hgu133plus2
notes=
Warning messages:
1: replacing previous import 'AnnotationDbi::tail' by 'utils::tail' when loading 'hgu133plus2cdf'
2: replacing previous import 'AnnotationDbi::head' by 'utils::head' when loading 'hgu133plus2cdf'
> eset.rma <- rma(d1)
Background correcting
Normalizing
Calculating Expression
Error in validObject(.Object) :
invalid class "ExpressionSet" object: sampleNames differ between phenoData and protocolData
> boxplot(exprs(eset.rma))
Error in exprs(eset.rma) : object 'eset.rma' not found
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE 13.1 (Bottle) (x86_64)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] hgu133plus2cdf_2.18.0 affycoretools_1.46.5 affy_1.52.0
[4] Biobase_2.34.0 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] Category_2.40.0 bitops_1.0-6
[3] RColorBrewer_1.1-2 httr_1.2.1
[5] GenomeInfoDb_1.10.3 gcrma_2.46.0
[7] tools_3.3.3 backports_1.0.5
[9] R6_2.2.0 affyio_1.44.0
[11] KernSmooth_2.23-15 rpart_4.1-10
[13] Hmisc_4.0-2 DBI_0.6
[15] lazyeval_0.2.0 colorspace_1.3-2
[17] nnet_7.3-12 gridExtra_2.2.1
[19] GGally_1.3.0 DESeq2_1.14.1
[21] bit_1.1-12 preprocessCore_1.36.0
[23] graph_1.52.0 htmlTable_1.9
[25] rtracklayer_1.34.2 ggbio_1.22.4
[27] caTools_1.17.1 scales_0.4.1
[29] checkmate_1.8.2 genefilter_1.56.0
[31] RBGL_1.50.0 stringr_1.2.0
[33] digest_0.6.12 Rsamtools_1.26.1
[35] foreign_0.8-67 R.utils_2.5.0
[37] AnnotationForge_1.16.1 XVector_0.14.1
[39] base64enc_0.1-3 dichromat_2.0-0
[41] htmltools_0.3.5 ensembldb_1.6.2
[43] limma_3.30.13 BSgenome_1.42.0
[45] htmlwidgets_0.8 PFAM.db_3.4.0
[47] RSQLite_1.1-2 BiocInstaller_1.24.0
[49] shiny_1.0.0 GOstats_2.40.0
[51] hwriter_1.3.2 gtools_3.5.0
[53] BiocParallel_1.8.1 R.oo_1.21.0
[55] acepack_1.4.1 VariantAnnotation_1.20.3
[57] RCurl_1.95-4.8 magrittr_1.5
[59] GO.db_3.4.0 Formula_1.2-1
[61] oligoClasses_1.36.0 Matrix_1.2-8
[63] Rcpp_0.12.10 munsell_0.4.3
[65] S4Vectors_0.12.2 R.methodsS3_1.7.1
[67] stringi_1.1.3 yaml_2.1.14
[69] edgeR_3.16.5 SummarizedExperiment_1.4.0
[71] zlibbioc_1.20.0 gplots_3.0.1
[73] plyr_1.8.4 AnnotationHub_2.6.5
[75] grid_3.3.3 gdata_2.17.0
[77] ReportingTools_2.14.0 lattice_0.20-35
[79] Biostrings_2.42.1 splines_3.3.3
[81] GenomicFeatures_1.26.3 annotate_1.52.1
[83] locfit_1.5-9.1 knitr_1.15.1
[85] GenomicRanges_1.26.4 codetools_0.2-15
[87] geneplotter_1.52.0 reshape2_1.4.2
[89] biomaRt_2.30.0 stats4_3.3.3
[91] XML_3.98-1.5 biovizBase_1.22.0
[93] latticeExtra_0.6-28 data.table_1.10.4
[95] foreach_1.4.3 httpuv_1.3.3
[97] gtable_0.2.0 reshape_0.8.6
[99] assertthat_0.1 ggplot2_2.2.1
[101] mime_0.5 xtable_1.8-2
[103] ff_2.2-13 survival_2.41-2
[105] OrganismDbi_1.16.0 tibble_1.2
[107] iterators_1.0.8 GenomicAlignments_1.10.1
[109] AnnotationDbi_1.36.2 memoise_1.0.0
[111] IRanges_2.8.2 cluster_2.0.6
[113] interactiveDisplayBase_1.12.0 GSEABase_1.36.0
so if i do the LIMMA step directly so how to get the entrez id in up and down annotated table.
colnames(annotated_table)[colnames(annotated_table)=="FC"]<-"FoldChange DownS/Norm";
so if i do the LIMMA step directly so how to get the entrez id in up and down annotated table.
colnames(annotated_table)[colnames(annotated_table)=="FC"]<-"FoldChange DownS/Norm";
Thankyou James it works i need one more help.
i want to identify and retrieve all DEGs from the data and set the p value<0.05 and fdr<0.01 how can i do this?
" The limma package in R wasused to identify DEGs. Altered expression of probes was determined using t-tests, and theBenjamini-Hochberg method14 was used for multiple test corrections. Probes with expression changes of p < 0.05 and corresponding False Discovery Rate (FDR) < 0.01 were considered to be statistically significant."
You will get farther, faster if you learn how to figure things out for yourself. For example, the help for topTable has information that would be relevant for your question. I'll leave it up to you to read that help page and figure out how to do this.
I should however note that a p-value of 0.05 and an FDR of 0.01 are mutually exclusive, so whomever you are quoting is saying something nonsensical. In other words, if you have a vector of p-values and then you use p.adjust to compute FDR, the resulting FDR values will be greater than or equal to the original p-values, but NEVER smaller.