So, the entire code is running and I am getting a table output. The problem is that the numbers from the manufacturer are NOT getting converted into gene symbols— instead they are all labeled NA, which makes me suspect it is some kind of annotation issue. Ive put the code down below, and would be grateful if anyone could look at it and see if there are any errors.
Thanks.
#Preprocessing, Normalisation, Annotation, Differential Expression
#STEP 1 - Installation of Packages
#Checks to see if Bioconductor is on your computer. Installs Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
#Updates Bioconductor if necessary.
BiocManager::install()
#Checks if you have affy, annotate, tkwidgets and limma. If you don't have them, it installs them
if (!requireNamespace("affy", quietly = TRUE))
BiocManager::install("affy")
if (!requireNamespace("annotate", quietly = TRUE))
BiocManager::install("annotate")
if (!requireNamespace("tkWidgets", quietly = TRUE))
BiocManager::install("tkWidgets")
if (!requireNamespace("limma", quietly = TRUE))
BiocManager::install("limma")
#Install the Platform, Probe and Affymatrix Chippackages for the Platform you used.
#This information can be found at https://bioconductor.org/packages/3.11/data/annotation/
if (!requireNamespace("pd.hg.u95a", quietly = TRUE))
BiocManager::install("pd.hg.u95a") #Platform for Manufacturer HG_U95A
if (!requireNamespace("hgu95aprobe", quietly = TRUE))
BiocManager::install("hgu95aprobe") #Probe data for Manufacturer HG_U95A
if (!requireNamespace("hgu95a.db", quietly = TRUE))
BiocManager::install("hgu95a.db") #Affymatrix Chip data for Manufacturer HG_U95A
#Load Required Packages
library(BiocManager)
library(affy)
library(tkWidgets)
library(pd.hg.u95a)
library(hgu95aprobe)
library(hgu95a.db)
library(annotate)
library(limma)
#STEP 2 - Loading/Importing CEL Files and Creating PHENODATA.
#Make a phenodata document. Seperate it with TAB. Headers are Filename and Target.
#Label each Filename with Target value according to if it is control ot comparison
#After you create the phenodata.txt, move it to the folder with your celfiles. Add the path to pDataFile
#Phenodata is a .txt files with two tab seperated columns. The first one is the filename and the second
#is the target (Control/Experiment in this case A and C)
pDataFile <- "C:/Users/user/Desktop/Cel Files/phenodata.txt"
pData <- read.table(pDataFile,
row.names=1, header=TRUE, sep="\t")#Turns the data fram into phenodata.
#Load Celfiles. Add Phenodata to Celfiles in AffyBatch. Select the celfiles in the pop-up window.
Data <- ReadAffy(widget=TRUE,phenoData = pData)
#STEP 3 - PREPROCESSING DATA,
#Background Correction, Normilisation [Using RMA, probe specific background correction, summarizing the probe
#set values into one expression measure and, in some cases,a standard error for this summary.
#Strores data in expression set format.
eset <- affy::rma(Data)
#STEP 4 - Differential Expression using Limma
# Read in the sample names to targets
targets <- readTargets("C:/Users/user/Desktop/Cel Files/phenodata.txt")
# Set up the design matrix for running limma
Group <- factor(targets$Target, levels=c("A","C"))
design <- model.matrix(~Group)
colnames(design) <- c("A","AvsC")
# view the design matrix
design
# Extract the probe IDs from eset, which has all the normalized expression data
ID <- featureNames(eset)
# Apply the probe to gene symbol ID for each probe in eset
Symbol <- getSYMBOL(ID,"hgu95a.db")
# Add the gene symbols back into your eset
fData(eset) <- data.frame(Symbol=Symbol)
# Perform the lmfit function in limma to your expression data
fit <- lmFit(eset, design)
# Apply the eBayes method in limma
fit2 <- eBayes(fit)
# Use the topTable function to get the differential expression results
topTable(fit2, coef="AvsC", adjust="BH")
# Name the results to "Results", printing out as many genes as desired
Results = topTable(fit2, coef="AvsC", adjust="BH", number = 10000)
# Write the Results to a csv file
write.csv(Results, file = "A-Vs-C-Limma_Results.csv", row.names = TRUE)