Question

How can I find genes that change in expression using a gene-level expression data matrix?

Jurat Shahidin ▴ 80

@jurat-shahidin-9488

Last seen 4.9 years ago

Chicago, IL, USA

Hi, BioC community:

I am relatively new to microarray analysis and so far know only a few important packages such as limma and affy. I have familiarized myself with the basic microarray workflow, including background correction and normalization. I have a gene-level expression data matrix that was obtained using RMA, and I intend to run PCA to reduce the dimensionality of the features.

Essentially, I have a gene-level expression data matrix (32830 features in rows, 735 genes in columns) and profile data for the target (735 rows and 6 columns). I used data from this source.

My attempt

After going through a few microarray analysis tutorials on Bioconductor, I tried a basic workflow as follows:

# load gene expression data matrix
HTA20_rma <- load("data/HTA20_RMA.RData")
# load sample annotation file (profile data of target variable)
pheno=read.csv("data/anoSC1_v11_nokey.csv",stringsAsFactors = FALSE)

## select top 3 genes
threesymbs=c("ANXA1","IFIT1","RPS24")

#get symbols for the above gene level expression matrix
library(org.Hs.eg.db)
symbol=as.vector(unlist(mget(gsub("_at","",rownames(eset_HTA20)), envir=org.Hs.egSYMBOL, ifnotfound=NA)))
mypreds=rownames(eset_HTA20)[match(threesymbs,symbol)]  #find row names corresponding to the 3 genes

Objective:

I am not sure what the correct procedure would be after finishing the workflow above, so I am seeking guidance.

I want to find out which genes are possibly correlated with the target data profile. How can I find the genes that change in expression? How can I do feature selection on the loaded gene expression data matrix? What would be a logical continuation of my attempt above? Could anyone point out how to conduct feature selection and PCA on a gene-level expression data matrix? Thanks in advance.

limma microarray affy edger • 2.2k views

ADD COMMENT • link 5.7 years ago Jurat Shahidin ▴ 80

score 2 · Accepted Answer · 2019-06-15

Gordon Smyth 52k

@gordon-smyth

Last seen 12 hours ago

WEHI, Melbourne, Australia

You say that you have learned the limma package, and the aims that you state are standard parts of any limma pipeline. So you can simply follow a standard limma case study. For example, the plotMDS() function produces a PCA plot or (more generally) an MDS plot.

BTW, the rows of your data correspond to Affymetrix probe-sets and the columns of your data correspond to RNA samples (not to genes as you say in your question).

ADD COMMENT • link 5.7 years ago Gordon Smyth 52k

Thanks for your reply. You mention that a standard limma case study could answer my question. Which specific one are you referring to? Could you show a few steps of the coding workflow or an example for this? I haven't completely finished limma's user guide, so I am still in the learning process. Thanks.

ADD REPLY • link 5.7 years ago Jurat Shahidin ▴ 80

Your questions are so standard that any case study would be relevant. Your data comes from Affymetrix microarrays, so any case study that uses Affymetrix or single-channel microarray data would be ideal, for example Sections 9.1-9.4 and 17.1-17.2 of the limma User's Guide. You don't need to read about two-color microarrays or RNA-seq. Your data is already background corrected and normalized, so you don't need to read about either of those things.

Making a PCA plot is as easy as

plotMDS(HTA20_rma, gene.selection="common")
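As a rough, self-contained sketch of what that call does, the example below runs it on simulated data. The matrix `expr`, the two-group factor `group`, and the sizes are invented stand-ins for illustration, not the poster's data; the real probe-set by sample matrix would take the place of `expr`.

```r
library(limma)

set.seed(1)
# Simulated stand-in: 1000 probe-sets (rows) x 12 RNA samples (columns)
expr <- matrix(rnorm(1000 * 12, mean = 7), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000),
                               paste0("sample", 1:12)))

# Hypothetical sample grouping, used here only to label and colour the plot
group <- factor(rep(c("A", "B"), each = 6))

# Shift 50 probe-sets in group B so the samples separate on the plot
expr[1:50, group == "B"] <- expr[1:50, group == "B"] + 2

# gene.selection = "common" selects one shared set of top probe-sets for
# all samples, which is the setting that gives a PCA-style plot
mds <- plotMDS(expr, gene.selection = "common",
               labels = as.character(group), col = as.integer(group))

# Plotted coordinates are returned invisibly, one value per sample
head(data.frame(dim1 = mds$x, dim2 = mds$y))
```

With the default `gene.selection="pairwise"`, the top probe-sets are instead chosen separately for each pair of samples, which gives a more general MDS plot.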

Doing a differential expression analysis requires knowledge of what groups your RNA samples belong to and what comparisons are of interest to you. We don't know anything about your data, so we can't design an analysis for you. But every DE analysis involves the same steps: model.matrix to make the design matrix, then lmFit, eBayes and topTable.
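Those four steps can be sketched on simulated data as follows. The two-group layout, the group names and the matrix `expr` are assumptions made up for illustration; the real design matrix has to be built from the actual sample annotation and the comparison of interest.

```r
library(limma)

set.seed(2)
# Simulated stand-in: 1000 probe-sets (rows) x 12 RNA samples (columns)
expr <- matrix(rnorm(1000 * 12, mean = 7), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000),
                               paste0("sample", 1:12)))

# Hypothetical grouping: 6 control and 6 treated samples
group <- factor(rep(c("control", "treated"), each = 6))

# Make the first 50 probe-sets truly differentially expressed
expr[1:50, group == "treated"] <- expr[1:50, group == "treated"] + 2

# 1. Design matrix: intercept plus a treated-vs-control coefficient
design <- model.matrix(~ group)

# 2. Fit a linear model to each probe-set
fit <- lmFit(expr, design)

# 3. Empirical Bayes moderation of the probe-wise variances
fit <- eBayes(fit)

# 4. Rank probe-sets by evidence for differential expression
top <- topTable(fit, coef = "grouptreated", number = 10)
print(top)
```

The `coef` argument picks the column of the design matrix to test; with more than two groups or with additional covariates the design and the choice of coefficient (or contrast) change, but the sequence of calls stays the same.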

You also need to learn more about Affymetrix microarrays, Affymetrix probe-sets and Bioconductor annotation packages for Affymetrix microarrays. At the moment, your code using org.Hs.eg.db is not correct and will not identify any gene symbols. So far, your workflow doesn't use any Bioconductor functionality at all.

If this is your first experience with microarray data, it might be a good idea to start with a small data example to get some experience rather than ploughing straight into a dataset with 735 arrays.

ADD REPLY • link 5.7 years ago Gordon Smyth 52k

Thanks for your help. Could you point me to a simple workflow that covers Affymetrix microarrays, Affymetrix probe-sets and Bioconductor annotation packages for Affymetrix microarrays for the data that I am using? Would it be possible for you to provide simple reproducible data and examples to work through? Thanks again for your help.

ADD REPLY • link 5.7 years ago Jurat Shahidin ▴ 80

I have already referred you to two simple reproducible case studies using Affymetrix microarrays with all the data and complete code provided. The second of the two case studies includes use of the appropriate Bioconductor annotation package.

A Google search for "Affymetrix limma" brings up other example workflows with more bells and whistles.

ADD REPLY • link 5.7 years ago Gordon Smyth 52k

In addition to the very useful case studies in the limma User's Guide, a good read to get started is this workflow at F1000Research: https://f1000research.com/articles/5-1384 Be sure to work through the case studies first, though!

ADD REPLY • link 5.7 years ago Guido Hooiveld ★ 4.1k

@Guido Hooiveld :

Thanks for your response. If the loaded expression matrix has Affymetrix probe-sets in rows and RNA samples in columns, how can I convert it into a gene-level expression matrix instead? The standard Affymetrix microarray workflow starts with raw CEL files, while in my case I used preprocessed Affymetrix expression data.