Question

ALL dataset, correct syntax using the exprs call?

0

Entering edit mode

detroit.drive • 0

@detroitdrive-16757

Last seen 5.9 years ago

Hello - I'm new to the bioconductor suite and still don't quite understand how to invoke the exprs class. For example, let's say I'm trying to find genes within the ALL dataset with the most and the least significant effect on age. I've been able to write snippets to find the min/max exprs (expression level) within the set...

> min(exprs(ALL))
[1] 1.984919
> max(exprs(ALL))
[1] 14.12657

...but cannot seem to write a snippet that maps that min/max value to the actual gene?

As for assessing how age effects exprs, this is tricky. Each patient obviously has an age (which is a column) and also an expression level (which as I understand is a function) that maps back to a specific gene. As a start - I thought I'd just plot them but again I'm having trouble with the syntax, I've tried the following from what I've read on the web but to no avail...

> plot(exprs,age(ALL))

and...

> plot(exprs~age(ALL))

Anyone see a glaring problem?

A concept that I cannot quite grasp is from the above command I can see all the column names, there are 21 total. However, when I perform the following command...

> dim(exprs(ALL))
[1] 12625   128

...I can see that I have 12625 genes (rows) and 128 patients (columns). Why don't I see 128 column names with colnames(pData(ALL)) command?

ALL bioconductor biobase exprs • 2.9k views

ADD COMMENT • link updated 6.7 years ago by Martin Morgan 25k • written 6.7 years ago by detroit.drive • 0

score 1 · Answer 1 · 2018-08-06

1

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 3 months ago

United States

The ALL data is at it's heart a MATRIX of 12625 features (genes) x 128 sample expression values. The pData describe each sample, so pData(ALL) is a data frame of 128 samples x 21 attributes measures on those samples. If one wanted to plot the relationship between expression of a single gene, and the age of the sample, across all samples, you could get the age

age <- pData(ALL)$age  # shortcut: ALL$age

the expression of a single feature (row), e.g., the gene in row 8435)

expression <- exprs(ALL)[8435,]

and plot these, e.g.,

plot(expression ~ age)

or perhaps

plot(expression ~ ALL$sex)

The feature is

> rownames(ALL)[8435]
[1] "38355_at"

Since

> annotation(ALL)
[1] "hgu95av2"

We can attach this package and map the affymetrix ID to a more familiar gene symbol or other identifier

> library(hgu95av2.db)
> mapIds(hgu95av2.db::hgu95av2.db, rownames(ALL)[8435], "SYMBOL", "PROBEID")
'select()' returned 1:1 mapping between keys and columns
38355_at 
 "DDX3Y" 
> mapIds(hgu95av2.db::hgu95av2.db, rownames(ALL)[8435], "GENENAME", "PROBEID")
'select()' returned 1:1 mapping between keys and columns
                       38355_at 
"DEAD-box helicase 3, Y-linked" 
> mapIds(hgu95av2.db::hgu95av2.db, rownames(ALL)[8435], "ENTREZID", "PROBEID")
'select()' returned 1:1 mapping between keys and columns
38355_at 
  "8653"

Simple mathematical calculations like 'min' and 'max' won't provide information on 'significant effect on age', for which you would need to fit a linear model using, e.g., the limma package. The limma vignette, available on the page just linked) is extremely thorough.

ADD COMMENT • link 6.7 years ago Martin Morgan 25k

0

Entering edit mode

Thanks Martin. I think my issue revolves around writing the proper syntax to call the required pData (data.frame) and exprs (matrix) elements. I think I'll need to develop a loop to help me plot the distribution of p-values for gene effect on age (for all 12625 genes).

ADD REPLY • link 6.7 years ago detroit.drive • 0

1

Entering edit mode

Trying to understand 12000 graphs does not sound like the right thing to do at all. Instead you'll want to regress expression on age for each gene, and develop a 'top table' ranking from most to least significant. Good luck!

ADD REPLY • link 6.7 years ago Martin Morgan 25k

0

Entering edit mode

Hi Martin and other Bioconductors,

I've updated my packages to include limma and developed the following model:

> ourData <- exprs(ALL)
> ourDataFactor <- ourDataFactor <- pData(ALL)$age 

> design <-model.matrix(~ ourDataFactor)
> fit <- lmFit(ourData, design)
> fit <- ebayes(fit)
> topTable(fit)

However, I keep getting the following error for the lmFit step (also not showing as a function with help(lmFit).

> fit <- lmFit(ourData, design)

Error in lmFit(ourData, design) : 

  row dimension of design doesn't match column dimension of data object

Why can't I complete this step?

ADD REPLY • link 6.7 years ago detroit.drive • 0

1

Entering edit mode

lmFit is in the limma package, so you need to say

library(limma)
fit <- lmFit(ourData, design)
...

or

fit <- limma::lmFit(ourData, design)

probably the former is better for interactive use.

ADD REPLY • link 6.7 years ago Martin Morgan 25k

0

Entering edit mode

Martin, I found this webpage you put together a few years ago (https://www.bioconductor.org/help/course-materials/2016/BiocIntro-May/A3_Statistical_Analysis.html). I'm wondering if its possible to adapt your code so it builds a linear model for age effect on gene expression in ALL. You have the following:

(fit <- lm(age ~ sex, pdata))

I'd like to switch sex with exprs(ALL).

ADD REPLY • link 6.7 years ago detroit.drive • 0

1

Entering edit mode

You're asking a different question, so ask a new top-level post 'building a linear model' or some such, after of course reading the excellent limma manual.

ADD REPLY • link 6.7 years ago Martin Morgan 25k