Hello,
I was hoping someone could give this script a look-through and determine if I have made an error at some point. This regards the analysis of Affymetrix 1.1 st zebrafish gene arrays.
I have three groups of samples and I am detecting differentially expressed genes between all three contrasts (1: morphant-control 2: rescue-control 3: morphant-rescue). I have 8x control samples, 4x treatment#1 (morphant) samples, and 4x treatment#2 (rescue) samples. I am using limma's lmFit and eBayes for this. When I get my topTable results back, the fold change signs are inverted. The genes that should be downregulated are returned as positive logFC integers and vice versa . I know from analyzing this data previously that, for instance, KEGG pathway enrichment for phototransduction should be downregulated but in this case it will be returned as upregulated.
It seems like a silly inquiry, but I cant figure it out.
-Matt
#### read samples in
library(oligo)
CELlist <- list.celfiles("C:\\Users\\mat149\\Desktop\\GG\\CEL", full.names=TRUE, pattern=NULL, all.file=FALSE)
pdat<-read.AnnotatedDataFrame("C:\\Users\\mat149\\Desktop\\GG\\CEL\\phenoMOD.txt",header=TRUE,row.name="Filename",sep="\t")
CELdat <- read.celfiles(filenames = CELlist,experimentData=TRUE,phenoData=pdat,verbose=TRUE)
### RMA
eset<-rma(CELdat, background=TRUE, normalize=TRUE, subset=NULL, target="core")
library(affycoretools)
eset <- annotateEset(eset, annotation(eset))
library(org.Dr.eg.db)
fd <- fData(eset)
fd$ENTREZID <- mapIds(org.Dr.eg.db, as.character(fd$SYMBOL), "ENTREZID","SYMBOL")
fData(eset) <- fd
### PHENO DATA
ph = CELdat@phenoData
ph@data[ ,1] = c("WT1","WT2","WT3","WT4","WT5","WT6","WT7","WT8","MO1","MO2","MO3","MO4","RS1","RS2","RS3","RS4")
ph@data[ ,2] = c("control","control","control","control","control","control","control","control","morphant","morphant","morphant","morphant","rescue","rescue","rescue","rescue")
colnames(ph@data)[2]="Treatment"
colnames(ph@data)[1]="Sample"
groups = ph@data$Treatment
f = factor(groups,levels=c("control","morphant","rescue"))
#### LIMMA
library(limma)
design = model.matrix(~ 0 + f)
colnames(design)=c("control","morphant","rescue")
design
data.fit = lmFit(eset,design)
contrast.matrix = makeContrasts(morphant-control,rescue-control,morphant-rescue,levels=design)
data.fit.con = contrasts.fit(data.fit,contrast.matrix)
data.fit.eb = eBayes(data.fit.con)
####
topTable(data.fit.eb,coef=1,number=Inf,adjust="BH",p.value=0.01,lfc=2,sort.by="P")
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] limma_3.28.21 org.Dr.eg.db_3.3.0 AnnotationDbi_1.34.4
[4] affycoretools_1.44.3 pd.zebgene.1.1.st_3.12.0 RSQLite_1.0.0
[7] DBI_0.5-1 oligo_1.36.1 Biostrings_2.40.2
[10] XVector_0.12.1 IRanges_2.6.1 S4Vectors_0.10.3
[13] Biobase_2.32.0 oligoClasses_1.34.0 BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 munsell_0.4.3
[3] latticeExtra_0.6-28 BSgenome_1.40.1
[5] dichromat_2.0-0 R.utils_2.5.0
[7] PFAM.db_3.3.0 httpuv_1.3.3
[9] R6_2.2.0 ensembldb_1.4.7
[11] graph_1.50.0 affxparser_1.44.0
[13] BiocInstaller_1.22.3 data.table_1.9.6
[15] reshape_0.8.6 annotate_1.50.1
[17] xtable_1.8-2 gdata_2.17.0
[19] tools_3.3.1 stringr_1.1.0
[21] rtracklayer_1.32.2 mime_0.5
[23] GSEABase_1.34.1 shiny_0.14.2
[25] chron_2.3-47 R.oo_1.21.0
[27] GOstats_2.38.1 foreach_1.4.3
[29] digest_0.6.10 KernSmooth_2.23-15
[31] GO.db_3.3.0 GenomeInfoDb_1.8.7
[33] codetools_0.2-15 GGally_1.2.0
[35] ff_2.2-13 GenomicAlignments_1.8.4
[37] gplots_3.0.1 genefilter_1.54.2
[39] scales_0.4.1 stringi_1.1.2
[41] locfit_1.5-9.1 R.methodsS3_1.7.1
[43] gcrma_2.44.0 lattice_0.20-33
[45] AnnotationForge_1.14.2 interactiveDisplayBase_1.10.3
[47] biovizBase_1.20.0 Rcpp_0.12.7
[49] OrganismDbi_1.14.1 caTools_1.17.1
[51] Hmisc_4.0-0 Formula_1.2-1
[53] ggplot2_2.1.0 htmlTable_1.7
[55] Category_2.38.0 grid_3.3.1
[57] ReportingTools_2.12.2 GenomicRanges_1.24.3
[59] preprocessCore_1.34.0 plyr_1.8.4
[61] RBGL_1.48.1 survival_2.39-4
[63] edgeR_3.14.0 acepack_1.4.1
[65] affy_1.50.0 rpart_4.1-10
[67] magrittr_1.5 SummarizedExperiment_1.2.3
[69] VariantAnnotation_1.18.7 gridExtra_2.2.1
[71] affyio_1.42.0 biomaRt_2.28.0
[73] htmltools_0.3.5 ggbio_1.20.2
[75] knitr_1.15 nnet_7.3-12
[77] gtable_0.2.0 zlibbioc_1.18.0
[79] colorspace_1.2-7 geneplotter_1.50.0
[81] cluster_2.0.4 gtools_3.5.0
[83] RCurl_1.95-4.8 DESeq2_1.12.4
[85] bitops_1.0-6 RColorBrewer_1.1-2
[87] Matrix_1.2-6 foreign_0.8-66
[89] bit_1.1-12 GenomicFeatures_1.24.5
[91] hwriter_1.3.2 reshape2_1.4.2
[93] XML_3.98-1.4 AnnotationHub_2.4.2
[95] iterators_1.0.8 splines_3.3.1
[97] BiocParallel_1.6.6 Rsamtools_1.24.0
Several points:
In addition: take a gene that "you know" the sign is inverted for and plot its expression from your data among the two groups implicated in your contrast.
I agree with Aaron in that your code looks mostly correct, so apart from perhaps a mislabelling of the conditions to samples, positive logFCs in your coef=1 result will give you genes higher expressed in "morphant" compared to their expression levels in "control"