Hi, BioC community:
I am relatively new to microarray analysis and so far know only a few important packages such as limma and affy. I have familiarized myself with the basic workflow of microarray analysis, such as background correction and normalization. I have a gene-level expression data matrix that was obtained using RMA, and I intend to run PCA for dimension reduction of the features.
Essentially, I have a gene-level expression data matrix (32830 features in rows, 735 samples in columns) and profile data for the target (735 rows and 6 columns). I used data from this source.
My attempt
After going through a few microarray analysis tutorials on Bioconductor, I tried a basic workflow as follows:
# load gene expression data matrix
# note: load() restores the saved object(s) into the workspace and returns only
# their names; the code below assumes the restored matrix is called eset_HTA20
loaded_names <- load("data/HTA20_RMA.RData")
# load sample annotation file (profile data of target variable)
pheno <- read.csv("data/anoSC1_v11_nokey.csv", stringsAsFactors = FALSE)
## select top 3 genes
threesymbs <- c("ANXA1", "IFIT1", "RPS24")
# get symbols for the above gene-level expression matrix
library(org.Hs.eg.db)
symbol <- as.vector(unlist(mget(gsub("_at", "", rownames(eset_HTA20)), envir = org.Hs.egSYMBOL, ifnotfound = NA)))
# find row names corresponding to the 3 genes
mypreds <- rownames(eset_HTA20)[match(threesymbs, symbol)]
Objective:
I am not quite sure what the correct procedure would be after finishing the above workflow, so I am seeking guidance.
I want to find out which genes are correlated with the target data profile. How can I find the genes whose expression changes? How can I do feature selection on the loaded gene expression data matrix? What would be a logical continuation of my attempt above? Could anyone point out how to conduct feature selection and PCA on a gene-level expression data matrix? Thanks in advance.
Thanks for your reply. You mention that a standard limma case study could answer my question; which specific one are you referring to? Could you show a few steps of the coding workflow or some examples for this? I haven't completely finished limma's user guide, so I am still in the learning process. Thanks.
Your questions are so standard that any case study would be relevant. Your data comes from Affymetrix microarrays, so any case study that uses Affymetrix or single-channel microarray data would be ideal, for example Sections 9.1-9.4 and 17.1-17.2 of the limma User's Guide. You don't need to read about two-color microarrays or RNA-seq. Your data is already background corrected and normalized, so you don't need to read about either of those things.
Making a PCA plot is as easy as
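The snippet for this step is not preserved in the thread. As a rough sketch only, and not the code originally posted, one way to get a PCA plot from an already normalized matrix is base R's prcomp() on the transposed matrix. Everything in the example is simulated: the matrix `expr`, the sizes, and the two-level factor `group` are hypothetical stand-ins for the real expression matrix and sample annotation.

```r
# Simulated stand-in for a normalized log2 expression matrix:
# rows are probe-sets, columns are samples (hypothetical sizes and names).
set.seed(1)
n_genes <- 200
n_samples <- 12
expr <- matrix(rnorm(n_genes * n_samples, mean = 8, sd = 1),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_len(n_samples))))
group <- factor(rep(c("A", "B"), each = n_samples / 2))

# Shift the first 20 genes in group B so that the two groups separate
expr[1:20, group == "B"] <- expr[1:20, group == "B"] + 2

# prcomp() treats rows as observations, so transpose to put samples in rows
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Plot the samples on the first two components, coloured by group
plot(pca$x[, 1], pca$x[, 2], col = as.integer(group), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)
```

With real data, `expr` would be the RMA-normalized matrix and `group` a column of the sample annotation. limma's plotMDS() is a related alternative for this kind of exploratory plot.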
Doing a differential expression analysis requires knowledge of what groups your RNA samples belong to and what comparisons are of interest to you. We don't know anything about your data, so we can't design an analysis for you. But every DE analysis involves the same steps: model.matrix to make the design matrix, then lmFit, eBayes and topTable.
You also need to learn more about Affymetrix microarrays, Affymetrix probe-sets and Bioconductor annotation packages for Affymetrix microarrays. At the moment, your code using org.Hs.eg.db is not correct and will not identify any gene symbols. So far, your workflow doesn't use any Bioconductor functionality at all.
If this is your first experience with microarray data, it might be a good idea to start with a small data example to get some experience rather than ploughing straight into a dataset with 735 arrays.
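As a small illustration of the four steps named above (model.matrix, lmFit, eBayes, topTable), here is a minimal sketch on simulated data. It assumes the limma package is installed; the matrix `expr`, the two groups "control" and "case", and the spiked-in effect are all hypothetical and say nothing about the real dataset or the comparison that would be appropriate for it.

```r
library(limma)  # Bioconductor package; assumed to be installed

# Simulated log2 expression matrix: 500 probe-sets, 10 samples (hypothetical)
set.seed(1)
n_genes <- 500
group <- factor(rep(c("control", "case"), each = 5),
                levels = c("control", "case"))
expr <- matrix(rnorm(n_genes * length(group), mean = 8, sd = 0.5),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_along(group))))

# Spike a difference into the first 10 genes for the "case" samples
expr[1:10, group == "case"] <- expr[1:10, group == "case"] + 2

# Step 1: design matrix; columns are "(Intercept)" and "groupcase"
design <- model.matrix(~ group)

# Step 2: fit a linear model to every row of the matrix
fit <- lmFit(expr, design)

# Step 3: empirical Bayes moderation of the row-wise variances
fit <- eBayes(fit)

# Step 4: rank genes for the case-versus-control coefficient
top <- topTable(fit, coef = "groupcase", number = 10)
print(top)
```

For a real analysis the design matrix has to reflect the actual grouping of the samples and the comparisons of interest, which is why the case studies in the limma User's Guide are the better starting point.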
Thanks for your help. Could you point me to a simple workflow that covers Affymetrix microarrays, Affymetrix probe-sets and Bioconductor annotation packages for Affymetrix microarrays for the data that I am using? Would it be possible for you to provide simple reproducible data and examples to go through? Thanks again for your help.
I have already referred you to two simple reproducible case studies using Affymetrix microarrays with all the data and complete code provided. The second of the two case studies includes use of the appropriate Bioconductor annotation package.
A Google search for "Affymetrix limma" brings up other example workflows with more bells and whistles.
Next to the very useful case studies in the limma user guide, a good read to get started is also this workflow at the F1000: https://f1000research.com/articles/5-1384 Be sure to first study the case studies, though!
@Guido Hooiveld:
Thanks for your response. If the loaded expression matrix has Affymetrix probe-sets in rows and RNA samples in columns, how can I convert it into a gene-level expression matrix instead? The standard Affymetrix microarray workflow starts with raw CEL files, while in my case I used preprocessed Affymetrix expression data.
You simply follow the workflow from after rma has been run.