Question

Confusion about Design Matrix ,Contrast Matrix and logFC values in limma package

0

Entering edit mode

sujata • 0

@sujata-22476

Last seen 3.7 years ago

India/Chennai/REC

For my data set 9 control and 7 cases are there The following matrix is the design matrix generated by command design1=model.matrix(~mm) where mm is

mm=pData(eset)$treatment

and eset has been got by following command

m<-as.matrix(read.csv("Alzheimer.csv", sep=",", row.names=1))
tmp <- read.csv("pdata.txt", row.names = 1,sep=",")
pdata <- AnnotatedDataFrame(tmp)
eset <- new("ExpressionSet", exprs = m, phenoData = pdata)
mm=pData(eset)$treatment

design1
   (Intercept) mm
1            1  0
2            1  0
3            1  0
4            1  0
5            1  0
6            1  0
7            1  0
8            1  0
9            1  0
10           1  1
11           1  1
12           1  1
13           1  1
14           1  1
15           1  1
16           1  1
attr(,"assign")
[1] 0 1

I am very new to the area for microarray data analysis I am making contrast matrix by the following commands and getting some warnnings Please help me in clarifying the points. I am confused whether I am doing correctly or not.

> contrastm <- makeContrasts(mm, levels = design1)
Warning message:
In makeContrasts(mm, levels = design1) : Renaming (Intercept) to Intercept
>contrastm
           Contrasts
Levels      mm
  Intercept  0
  mm         1
> fit2 <- contrasts.fit(fit, contrastm)
Warning message:
In contrasts.fit(fit, contrastm) :
  row names of contrasts don't match col names of coefficients
> fit2b <- eBayes(fit2)
> topTable(fit2b,coef=1)
                logFC   AveExpr         t      P.Value  adj.P.Val         B
215306_at    431.0302  376.9312  8.366248 5.120111e-07 0.01140914 -4.457608
218801_at   -281.7603  491.2188 -7.091080 3.782164e-06 0.02504624 -4.467288
203894_at   -413.7508  862.6063 -7.063322 3.959712e-06 0.02504624 -4.467541
219746_at    283.7810  439.5875  6.986785 4.496026e-06 0.02504624 -4.468248
213666_at   -419.2190  702.4250 -6.299508 1.457800e-05 0.05814260 -4.475374
203461_at    222.0429  227.0438  6.255007 1.576693e-05 0.05814260 -4.475888
213929_at    151.5079  209.1625  6.171945 1.826497e-05 0.05814260 -4.476866
209129_at    218.1206  436.3500  6.015035 2.417706e-05 0.06734219 -4.478782
218456_at   -700.8302 1294.4313 -5.822691 3.425263e-05 0.08330774 -4.481258
200980_s_at -334.3381 1216.2938 -5.774792 3.738623e-05 0.08330774 -4.481897

why logFC is value is so high when thresold is only 1.5

-ve values are down regulated gene and prositive values are uprelulated genes?

limma • 863 views

ADD COMMENT • link updated 5.0 years ago by Gordon Smyth 51k • written 5.0 years ago by sujata • 0

score 1 · Answer 1 · 2019-12-02

What do you mean by 'the threshold is only 1.5'? If you had a threshold of 1.5, and the fold change was 431, that seems OK? I mean, 431 is greater than 1.5, so it seems reasonable to me. But you don't have anything here that indicates a threshold, so I have no idea what you are getting at. In addition, you apparently are using raw expression values of some kind, and in general you would probably want to take logs.

Anyway, you are being pretty mysterious about what you are doing and what sort of data these are, so it's pretty difficult to provide any reasonable help.

James W. MacDonald · Answer 2 · 2019-12-02

0

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

The first problem is that you are apparently trying to analyse expression data that is unlogged and (possibly) unnormalized. limma works on log2-expression values, which yours are not.

Apart from that, your analysis might be ok but you are including quite a few unnecessary commands and steps. Why don't you follow the limma User's Guide examples? I wonder what documentation you are following, because comparing two groups is very straightforward. You don't even need a contrast matrix (or an ExpressionSet or an AnnotatedDataFrame).

If you want more help you would have to show us what your data is, for example by:

head(m)
head(tmp)

You would also need to provide code that is consistent. At the moment, you use objects design and fit without showing the code that created them.

ADD COMMENT • link 5.0 years ago Gordon Smyth 51k

0

Entering edit mode

The data and the code I have pasted here

GSM21215 GSM21217 GSM21218 GSM21219 GSM21220 GSM21221 GSM21226 GSM21231 GSM21232 GSM21203 GSM21206
1007_s_at   2735.0   3746.0   3317.0   5689.0   4192.1   3048.6   4723.8   4662.9   5801.2   6141.2   4533.3
1053_at       42.7     26.1    138.9     77.6     86.2    129.4     48.4    111.3     34.3    105.5    145.0
117_at       161.3    153.6    381.2    248.8    894.1     98.2    233.7     86.6    328.0     92.0   6045.3
121_at      1272.3    979.6    917.4   1291.9   1176.9   1242.9   1659.7   1398.0   1631.5   1105.5   1237.8
1255_g_at    312.7    444.6     97.5    206.4    189.8    197.5    137.1    120.2    142.0    221.4     77.4
1294_at      320.8    319.9    297.0    470.8    240.2    285.1    542.2    526.9    707.5    493.6    178.6
          GSM21207 GSM21208 GSM21212 GSM21213 GSM21229
1007_s_at   9341.1   5003.2   8013.1   7905.5   3646.4
1053_at       23.3      6.1     74.9     27.7     66.2
117_at       262.5    240.7    170.3     77.6    172.9
121_at      2538.4   2055.6   1565.0   2258.2   1287.7
1255_g_at    134.0    202.4    188.8    232.7     84.0
1294_at      450.1    419.3    588.2    532.8    303.5
 head(tmp)
         treatment
GSM21215         0
GSM21217         0
GSM21218         0
GSM21219         0
GSM21220         0
GSM21221         0

pdata.txt
id,treatment
GSM21215,0
GSM21217,0
GSM21218,0
GSM21219,0
GSM21220,0  
GSM21221,0
GSM21226,0
GSM21231,0
GSM21232,0
GSM21203,1
GSM21206,1  
GSM21207,1



code:
m<-as.matrix(read.csv("Alzheimer.csv", sep=",", row.names=1))
tmp <- read.csv("pdata.txt", row.names = 1,sep=",")
pdata <- AnnotatedDataFrame(tmp)
eset <- new("ExpressionSet", exprs = m, phenoData = pdata)
mm=pData(eset)$treatment
design1=model.matrix(~mm)
fit=lmFit(eset,design1)
contrastm <- makeContrasts(mm, levels = design1)
fit2 <- contrasts.fit(fit, contrastm)
fit2b <- eBayes(fit2)
topTable(fit2b,coef=1)

ADD REPLY • link updated 5.0 years ago by James W. MacDonald 67k • written 5.0 years ago by sujata • 0

0

Entering edit mode

That's just a part of your code. And looking at that GSE I don't see where there's any treatment? There is also no need to do it the way you have. An alternative is to do

> library(GEOquery)
> eset <- getGEO("GSE1297")[[1]]
> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 31 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM21203 GSM21204 ... GSM21233 (31 total)
  varLabels: title geo_accession ... Sex:ch1 (56 total)
  varMetadata: labelDescription
featureData
  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (22283 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 14769913 
Annotation: GPL96 
## design matrix with Alzheimer severity and sex
> design <- model.matrix(~characteristics_ch1.1 + characteristics_ch1.6, pData(eset))
> colnames(design)[-1] <- c("Incipient","Moderate","Severe","M")
> exprs(eset) <- log2(exprs(eset))
## only keep the RefSeq/GenBank ID, gene name, symbol, and Gene ID
> fData(eset) <- fData(eset)[,c(9:12)]
> fit <- lmFit(eset, design)
> fit2 <- eBayes(fit)
> topTable(fit2, 2, 2)
          Representative.Public.ID                         Gene.Title
222196_at                 AK000470 zinc finger protein 839 pseudogene
207479_at                NM_018613                                   
          Gene.Symbol ENTREZ_GENE_ID    logFC  AveExpr        t      P.Value
222196_at   LOC389906         389906 2.800012 3.879957 4.598821 7.601561e-05
207479_at                            1.788760 3.654219 4.298046 1.748021e-04
          adj.P.Val         B
222196_at 0.9068039 -3.988158
207479_at 0.9068039 -4.040426
> topTable(fit2, 3, 2)
          Representative.Public.ID
209628_at                 AK023289
202133_at                 BF674349
                                               Gene.Title Gene.Symbol
209628_at nuclear transport factor 2-like export factor 2        NXT2
202133_at  WW domain containing transcription regulator 1       WWTR1
          ENTREZ_GENE_ID    logFC  AveExpr        t      P.Value adj.P.Val
209628_at          55916 1.059065 8.579427 5.239891 1.273193e-05 0.1847719
202133_at          25937 1.816572 9.433800 5.145136 1.658411e-05 0.1847719
                 B
209628_at 2.756052
202133_at 2.545644
> topTable(fit2, 4, 5)
            Representative.Public.ID
219073_s_at                NM_017784
215320_at                   AL080179
206384_at                  NM_006539
204743_at                  NM_013259
218456_at                  NM_023925
                                                             Gene.Title
219073_s_at                           oxysterol binding protein-like 10
215320_at                                 SPATA31 subfamily C, member 2
206384_at           calcium channel, voltage-dependent, gamma subunit 3
204743_at                                                  transgelin 3
218456_at                                        caprin family member 2
            Gene.Symbol ENTREZ_GENE_ID      logFC   AveExpr         t
219073_s_at     OSBPL10         114884 -1.6670533  7.910239 -6.687176
215320_at     SPATA31C2         645961  3.1101555  2.235307  6.171726
206384_at        CACNG3          10368 -2.7021032  9.688733 -5.859596
204743_at        TAGLN3          29114 -1.1742816 11.104861 -5.728398
218456_at       CAPRIN2          65981 -1.0452887 10.278001 -5.721286
                 P.Value   adj.P.Val        B
219073_s_at 2.368482e-07 0.005277689 6.721354
215320_at   9.627984e-07 0.010727019 5.486638
206384_at   2.274559e-06 0.012618956 4.724691
204743_at   3.270739e-06 0.012618956 4.401761
218456_at   3.335861e-06 0.012618956 4.384218
> nrow(topTable(fit2, 4, Inf, p.value = 0.05))
[1] 411

So the only changes are between the severe Alzheimer disease and the controls, and there are about 411 genes with an FDR of 5%.

ADD REPLY • link 5.0 years ago James W. MacDonald 67k