I am using the same database in this article
they are using GSE38642 database,that includes 54 non-diabetic and 9 diabetic islets samples, platform ( Affymetrix Human Gene 1.0 ST Array)
section 2.2 explain Data preprocessing by Affy package.
I tried to follow their process but I stick in limma
my code is that
rm(list = ls())
library(GEOquery)
library(affy)
library(gcrma)
library(hugene10stv1cdf)
library(hugene10stv1probe)
library(hugene10stprobeset.db)
library(hugene10sttranscriptcluster.db)
library(oligo)
library(affydata)
library(preprocessCore)
getGEOSuppFiles("GSE38642")
untar("GSE38642/GSE38642_RAW.tar", exdir="data")
cels <- list.files("data/", pattern = "[gz]")
sapply(paste("data", cels, sep="/"), gunzip)
cels = list.files("data/", pattern = "CEL")
setwd("/Users/ansamabdulrahman/data")
celfiles=read.celfiles(cels);
pms = pm(celfiles)
rmadata=bg.adjust (pms)
summary(rmadata)
celfiles.rma=normalize.quantiles(log2(rmadata))
eset=assayDataElement(celfiles,'exprs')
samples <- c("healthy","healthy","healthy","healthy","healthy","healthy", "healthy", "healthy","healthy","healthy","T2D","T2D", "healthy","healthy","healthy", "healthy","healthy","healthy","healthy","healthy", "healthy","healthy","healthy","healthy","healthy","healthy","healthy", "healthy","healthy", "healthy","healthy","healthy","healthy","healthy", "healthy","healthy", "healthy","healthy","T2D","T2D","healthy", "healthy","healthy","healthy", "healthy","T2D", "T2D","T2D","healthy","healthy", "healthy","healthy", "healthy","healthy","healthy","healthy","healthy", "healthy","healthy", "healthy", "T2D","T2D","healthy")
library(convert)
design <- model.matrix(~samples)
design
Could anyone help me to complete the process?
Thanks a lot. That is kind of you
Dear Bernrd,
You are using ram normalisation in your article.
I want to use background adjustment and quantile normalization
Do you have a clue how to use transformed expression matrix with probe level to matrix with gene level then use limma
my code is that:
rm(list = ls())
library(GEOquery)
library(affy)
library(gcrma)
library(hugene10stv1cdf)
library(hugene10stv1probe)
library(hugene10stprobeset.db)
library(hugene10sttranscriptcluster.db)
library(oligo)
library(affydata)
library(preprocessCore)
getGEOSuppFiles("GSE38642")
untar("GSE38642/GSE38642_RAW.tar", exdir="data")
cels <- list.files("data/", pattern = "[gz]")
sapply(paste("data", cels, sep="/"), gunzip)
cels = list.files("data/", pattern = "CEL")
setwd("/Users/ansamabdulrahman/data")
celfiles=read.celfiles(cels);
pms = pm(celfiles)
rma_data=bg.adjust (pms)
summary(rma_data)
celfiles_normalize=normalize.quantiles(log2(rmadata))
object<-new("ExpressionSet", exprs=as.matrix(celfiles_normalize))
eset=assayDataElement(object,'exprs')
rma=format(eset, digits=5)
I would be grateful to you.
Why are you doing these things? The authors state that they just did RMA using Affy's Expression Console, so what you are doing isn't even remotely similar to what they did. Do note that there might be some subtle differences between what Expression Console does and what
oligo::rma
does, but they won't be big enough to matter.Also note that there is no profit whatsoever in doing non-standard things, especially as a new R user. The RMA algorithm has been the de facto standard for summarizing Affy data for well over a decade now, and any deviation from that method will require a pretty good rationale. And to provide that rationale you will have to explain what you did, and why you did it, both of which I think will be difficult for you to do.
If you just do
then you will have the normalized, summarized data that the original authors submitted. For the vast majority of users, this is completely acceptable. If you insist on getting the raw data, you can just do
Note that you don't have to unzip things, nor do you need to do all that extra stuff. In either case you now have an ExpressionSet that you can feed directly to limma to make comparisons. Note that limma knows what to do with an ExpressonSet, so you don't need a matrix.
Dear James,
The authors state (The raw data was preprocessed by Affy package (R/Bioconductor) of R language following the three steps: background adjustment, quantile normalization and finally summarization, and logarithmic transformation )
I am using the same dataset in my project. I am interested in up and down regulated genes. In this article they said they got 147 DEGs (threshold of DEGs was P < 0.001)
I used the same process but I got more than 8000 DEGs
Here is my code:
rm(list = ls())
library(GEOquery)
library(affy)
library(gcrma)
library(hugene10stv1cdf)
library(hugene10stv1probe)
library(hugene10stprobeset.db)
library(hugene10sttranscriptcluster.db)
getGEOSuppFiles("GSE38642")
untar("GSE38642/GSE38642_RAW.tar", exdir="data")
cels <- list.files("data/", pattern = "[gz]")
sapply(paste("data", cels, sep="/"), gunzip)
cels = list.files("data/", pattern = "CEL")
setwd("/Users/ansamabdulrahman/data")
library(simpleaffy)
celfiles=ReadAffy(verbose=TRUE, filenames=cels, cdfname="hugene10stv1")
celfiles
celfiles.rma =rma(celfiles)
You seem not to want to listen, but instead to want to, like, copy and paste stuff. That is really tedious - I don't really care what you have done, as it doesn't make sense, and you obviously don't know what you are doing - not to mention vaguely insulting. So here you go, all wrapped up in a ribbon.
And yes, I understand that 127 != 147. The authors you cite say they 'fit a linear model' which is as uninformative as one could possibly get, so it's entirely possible that they included other confounders in their model that we are not including, or maybe they fit a slightly different model. For example, fitting a weighted model gets us up to 157 genes.
May I misunderstand you in the first comment. You said they used RMA. I used RMA as well in the above code but I did not get the same number of DEGs
Thanks for your help and clarifying, I am a new user in R