Question

Microarray Workflow - NA error

0

Entering edit mode

ammasakshay • 0

@ammasakshay-23464

Last seen 4.2 years ago

So, the entire code is running and I am getting a table output. The problem is that the numbers from the manufacturer are NOT getting converted into gene symbols— instead they are all labeled NA, which makes me suspect it is some kind of annotation issue. Ive put the code down below, and would be grateful if anyone could look at it and see if there are any errors.

Thanks.

#Preprocessing, Normalisation, Annotation, Differential Expression

#STEP 1 - Installation of Packages
#Checks to see if Bioconductor is on your computer. Installs Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
#Updates Bioconductor if necessary.
BiocManager::install()
#Checks if you have affy, annotate, tkwidgets and limma. If you don't have them, it installs them
if (!requireNamespace("affy", quietly = TRUE))
  BiocManager::install("affy")
if (!requireNamespace("annotate", quietly = TRUE))
  BiocManager::install("annotate")
if (!requireNamespace("tkWidgets", quietly = TRUE))
  BiocManager::install("tkWidgets")
if (!requireNamespace("limma", quietly = TRUE))
  BiocManager::install("limma")
#Install the Platform, Probe and Affymatrix Chippackages for the Platform you used.
#This information can be found at https://bioconductor.org/packages/3.11/data/annotation/
if (!requireNamespace("pd.hg.u95a", quietly = TRUE))
  BiocManager::install("pd.hg.u95a") #Platform for Manufacturer HG_U95A
if (!requireNamespace("hgu95aprobe", quietly = TRUE))
  BiocManager::install("hgu95aprobe") #Probe data for Manufacturer HG_U95A
if (!requireNamespace("hgu95a.db", quietly = TRUE))
  BiocManager::install("hgu95a.db") #Affymatrix Chip data for Manufacturer HG_U95A

#Load Required Packages
library(BiocManager)
library(affy)
library(tkWidgets)
library(pd.hg.u95a)
library(hgu95aprobe)
library(hgu95a.db)
library(annotate)
library(limma)

#STEP 2 - Loading/Importing CEL Files and Creating PHENODATA.
#Make a phenodata document. Seperate it with TAB. Headers are Filename and Target.
#Label each Filename with Target value according to if it is control ot comparison
#After you create the phenodata.txt, move it to the folder with your celfiles. Add the path to pDataFile
#Phenodata is a .txt files with two tab seperated columns. The first one is the filename and the second
#is the target (Control/Experiment in this case A and C)
pDataFile <- "C:/Users/user/Desktop/Cel Files/phenodata.txt"
pData <- read.table(pDataFile,
                    row.names=1, header=TRUE, sep="\t")#Turns the data fram into phenodata.
#Load Celfiles. Add Phenodata to Celfiles in AffyBatch. Select the celfiles in the pop-up window.
Data <- ReadAffy(widget=TRUE,phenoData = pData)

#STEP 3 - PREPROCESSING DATA, 
#Background Correction, Normilisation [Using RMA,  probe specific background correction, summarizing the probe
#set values into one expression measure and, in some cases,a standard error for this summary.
#Strores data in expression set format.
eset <- affy::rma(Data)

#STEP 4 - Differential Expression using Limma
# Read in the sample names to targets
targets <- readTargets("C:/Users/user/Desktop/Cel Files/phenodata.txt")
# Set up the design matrix for running limma
Group <- factor(targets$Target, levels=c("A","C"))
design <- model.matrix(~Group)
colnames(design) <- c("A","AvsC")
# view the design matrix
design
# Extract the probe IDs from eset, which has all the normalized expression data
ID <- featureNames(eset)
# Apply the probe to gene symbol ID for each probe in eset
Symbol <- getSYMBOL(ID,"hgu95a.db")
# Add the gene symbols back into your eset
fData(eset) <- data.frame(Symbol=Symbol)
# Perform the lmfit function in limma to your expression data
fit <- lmFit(eset, design)
# Apply the eBayes method in limma
fit2 <- eBayes(fit)
# Use the topTable function to get the differential expression results
topTable(fit2, coef="AvsC", adjust="BH")
# Name the results to "Results", printing out as many genes as desired 
Results = topTable(fit2, coef="AvsC", adjust="BH", number = 10000)
# Write the Results to a csv file
write.csv(Results, file = "A-Vs-C-Limma_Results.csv", row.names = TRUE)

limma affy • 906 views

ADD COMMENT • link updated 5.0 years ago by Kevin Blighe ★ 4.0k • written 5.0 years ago by ammasakshay • 0

score 1 · Answer 1 · 2020-05-02

It should work okay. A quick integrity check, which should return a value of TRUE, could be::

all(rownames(exprs(eset)) == featureNames(eset))

Also, check that ID and Symbol actually contain what you think.

Here is a reproducible example using another study and array, but following your code's same logic:

require(hgu133a.db)
require(Biobase)
require(GEOquery)
require(limma)

# load series and platform data from GEO
gset <- getGEO('GSE2990', GSEMatrix =TRUE, getGPL=FALSE)
gset <- gset[[1]]
gset <- gset[-grep('^AFFX', rownames(gset)),] # remove Affymetrix control probes

ID <- featureNames(gset)
Symbol <- annotate::getSYMBOL(ID, 'hgu133a.db')
fData(gset) <- data.frame(PROBEID = ID, SYMBOL = Symbol)
head(fData(gset))
            PROBEID SYMBOL
1007_s_at 1007_s_at   <NA>
1053_at     1053_at   RFC2
117_at       117_at  HSPA6
121_at       121_at   PAX8
1255_g_at 1255_g_at GUCA1A
1294_at     1294_at   <NA>

design <- model.matrix( ~ pData(gset)$"characteristics_ch1.1")
fit <- eBayes(lmFit(gset, design))
topTable(fit, coef = 2, adjust = 'BH')

                PROBEID  SYMBOL     logFC  AveExpr        t      P.Value
202954_at     202954_at   UBE2C 1.3897196 8.580698 6.402208 1.144826e-09
208079_s_at 208079_s_at   AURKA 1.2782354 6.485801 5.942086 1.298782e-08
202705_at     202705_at   CCNB2 1.1585107 7.644667 5.928133 1.395546e-08
222077_s_at 222077_s_at RACGAP1 1.1855079 6.807924 5.908297 1.545366e-08
202095_s_at 202095_s_at   BIRC5 1.3652800 7.074810 5.686088 4.770356e-08
203145_at     203145_at   SPAG5 0.8245364 6.880884 5.583982 7.931258e-08
201584_s_at 201584_s_at  DDX39A 0.7920797 8.624266 5.550280 9.367777e-08
211762_s_at 211762_s_at   KPNA2 1.0701913 9.223251 5.437382 1.628042e-07
214710_s_at 214710_s_at   CCNB1 1.2671521 7.153348 5.377040 2.180678e-07
204033_at     204033_at  TRIP13 1.2006301 7.461272 5.370253 2.253241e-07
               adj.P.Val         B
202954_at   2.543232e-05 11.397153
208079_s_at 8.582575e-05  9.194922
202705_at   8.582575e-05  9.129824
222077_s_at 8.582575e-05  9.037451
202095_s_at 2.119469e-04  8.017030
203145_at   2.936548e-04  7.557193
201584_s_at 2.972931e-04  7.406689
211762_s_at 4.520868e-04  6.907234
214710_s_at 5.005575e-04  6.643284
204033_at   5.005575e-04  6.613728

You can see that limma has, indeed, pulled in the data set to fData(gset)

Kevin