Dear All,
I have used the limma package to run a paired analysis of differential expression (DE) between control and cancer samples. Here's my code:
library(limma)
library(hgu133a.db)
conditions <- data.trusted.eset$condition
# Swap the two levels so that the original second level becomes the reference
condition <- factor(conditions, levels = levels(factor(conditions))[c(2, 1)])
# 13 pairs, two samples each; assumes the samples are ordered pair by pair
pairs <- factor(rep(1:13, each = 2))
design <- model.matrix(~ condition + pairs)
fit <- lmFit(data.trusted.eset, design)
fit2 <- eBayes(fit)
# Map probeset IDs to gene symbols
symbols <- unlist(mget(featureNames(data.trusted.eset), envir = hgu133aSYMBOL))
top <- topTable(fit2, coef = "conditionCancer", number = nrow(fit2), adjust.method = "fdr", genelist = symbols)
select <- top[which(abs(top$logFC) > 1 & top$adj.P.Val < 0.05), ]
My main question (as a beginner in R/Bioconductor): because my platform is Affymetrix, some probesets map to the same gene. How can I remove these duplicates (in the select$ID column), i.e. genes that appear two or more times, so that I can extract my DE list without duplicate gene symbols for further analysis?
Thank you in advance
Go read the documentation and accompanying paper for the treat function. It provides a statistically principled way to combine the p-value and fold change cutoff into a single test.
Thank you again for your recommendation. I'm definitely going to read the paper to get a validated approach for my data set analysis! I also used another approach to get unique PROBEID, SYMBOL and ENTREZID, but I'm not sure it is as correct as the above:
# after the topTable step above
selected <- top[which(abs(top$logFC) > 1 & top$adj.P.Val < 0.05), ]
ls("package:hgu133a.db")
columns(hgu133a.db)
keytypes(hgu133a.db)
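# select() can return several rows per probeset when a probeset maps to more than one gene
res <- select(hgu133a.db, keys = rownames(selected), columns = c("ENTREZID", "SYMBOL"),
              keytype = "PROBEID")
# match() keeps only the first annotation row for each probeset
idx <- match(rownames(selected), res$PROBEID)
res2 <- res[idx, ]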
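Note that the match() step gives one row per probeset, not one row per gene, so two probesets for the same gene will still both appear. A minimal sketch of one way to collapse to unique gene symbols is shown here, using only base R. The data frame de and its values are made up for illustration; in practice it would be the filtered table with a symbol column attached. Keeping the probeset with the smallest adjusted p-value is an assumption, not the only valid rule (keeping the largest absolute logFC is another common choice).

```r
# Toy stand-in for the annotated DE table; all values are made up for illustration.
de <- data.frame(
  PROBEID   = c("p1", "p2", "p3", "p4", "p5"),
  SYMBOL    = c("TP53", "TP53", "MYC", NA, "EGFR"),
  logFC     = c(1.2, -2.5, 1.8, 3.0, -1.1),
  adj.P.Val = c(0.010, 0.001, 0.020, 0.004, 0.030),
  stringsAsFactors = FALSE
)

# Drop probesets that have no gene symbol.
de <- de[!is.na(de$SYMBOL), ]

# Sort by adjusted p-value so the most significant probeset per gene comes first.
de <- de[order(de$adj.P.Val), ]

# duplicated() flags the second and later occurrences of each symbol,
# so negating it keeps exactly one row per gene.
de_unique <- de[!duplicated(de$SYMBOL), ]
print(de_unique)
```

Because the table is sorted before duplicated() is applied, the row that survives for each gene is the one ranked first by whatever ordering was chosen, so changing the order() call changes the selection rule.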