Question

Creating an Expression Set with a csv file


Vani ▴ 20


Hi,

Is it possible to create an expression set from (1) a text file containing the results of an RMA normalization and (2) a csv file containing the annotation for the specific Affy chip that the normalized data came from? If yes, please advise.

Thanks.

eset annotation • 8.8k views

ADD COMMENT • link updated 9.5 years ago by Diego Diez ▴ 760 • written 9.5 years ago by Vani ▴ 20

score 3 · Answer 1 · 2015-06-25

It is possible to construct an ExpressionSet from separate files. Imagine you have three files: one with the expression data (e.g. from RMA, as you mentioned), one with probe annotations (like your csv file), and one with sample annotations. For example:

#===========================
# phenoData - sample annotations.
# pdata.txt (comma separated)
# 
# id,treatment
# sample1,1
# sample2,1
# sample3,0
# sample4,0

#===========================
# featureData - probe annotations.
# fdata.txt (comma separated)
# 
# id,symbol
# probe1,gene1
# probe2,gene2
# probe3,gene3
# probe4,gene4

#===========================
# expression data
# 
# exprs.txt (tab delimited)
# sample1 sample2 sample3 sample4
# probe1 10 9 11 8
# probe2 10 11 2 1
# probe3 2 3 12 10
# probe4 1 3 2 1

I assumed the annotation files to be comma-separated and the expression data to be tab-delimited, but this does not matter; it only changes the R function used to read each file. You can then do something like this:

library(Biobase)

# phenoData:
tmp <- read.csv("pdata.txt", row.names = 1)
pdata <- AnnotatedDataFrame(tmp)

# featureData:
tmp <- read.csv("fdata.txt", row.names = 1)
fdata <- AnnotatedDataFrame(tmp)

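# expression data (read.table infers the header and uses the first column
# as row names because the header line has one fewer field than the data rows):
tmp <- read.table("exprs.txt")
m <- as.matrix(tmp)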

## create ExpressionSet object:
eset <- new("ExpressionSet", exprs = m, phenoData = pdata, featureData = fdata)

pData(eset)
fData(eset)
eset$treatment

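The only requirement (I think) is that the sample names and feature names agree between the different files: the row names of pdata must match the column names of the expression matrix, and the row names of fdata must match its row names.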
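The name agreement can be checked before building the object. The following is a minimal sketch in base R only (no Biobase needed); it recreates the three example files above in a temporary directory purely for illustration, and the variable names (pd, fd, samples_ok, features_ok) are assumptions, not part of the original answer:

```r
# Recreate the three example files in a temporary directory (illustration only).
dir <- tempdir()
writeLines(c("id,treatment", "sample1,1", "sample2,1", "sample3,0", "sample4,0"),
           file.path(dir, "pdata.txt"))
writeLines(c("id,symbol", "probe1,gene1", "probe2,gene2",
             "probe3,gene3", "probe4,gene4"),
           file.path(dir, "fdata.txt"))
writeLines(c("sample1\tsample2\tsample3\tsample4",
             "probe1\t10\t9\t11\t8",
             "probe2\t10\t11\t2\t1",
             "probe3\t2\t3\t12\t10",
             "probe4\t1\t3\t2\t1"),
           file.path(dir, "exprs.txt"))

# Read the files the same way as in the answer above.
pd <- read.csv(file.path(dir, "pdata.txt"), row.names = 1)
fd <- read.csv(file.path(dir, "fdata.txt"), row.names = 1)
m  <- as.matrix(read.table(file.path(dir, "exprs.txt")))

# Samples are the columns of the matrix and the rows of the sample annotation.
samples_ok  <- identical(rownames(pd), colnames(m))
# Features are the rows of the matrix and the rows of the probe annotation.
features_ok <- identical(rownames(fd), rownames(m))

# Stop early with an error if either set of names disagrees.
stopifnot(samples_ok, features_ok)
```

If either check fails, reordering or subsetting the annotation tables to the matrix names (for example `fd[rownames(m), , drop = FALSE]`) is one way to bring them into agreement before creating the ExpressionSet.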

score 1 · Answer 2 · 2015-06-24


svlachavas ▴ 840


Dear Vani,

Suppose that, in the first case, the txt file has a header row with the sample names and the rows are the probesets.

Then, with the file in your current working directory, you can use:

rma.file <- read.table("name of your file.txt", header=TRUE, sep="\t") # the sep argument is not necessary

eset <- new("ExpressionSet", exprs=as.matrix(rma.file))

In a similar way, you can use read.csv for the csv file. You can also look at these functions in more detail in their help pages, including read.delim.
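As a rough illustration of how these readers relate, here is a small base R sketch. The two temporary files and their contents are made up for the example; the point is only that read.csv and read.delim behave like read.table with a header and a preset separator:

```r
# Two hypothetical files with the same content, one comma- and one tab-separated.
csv_file <- tempfile(fileext = ".csv")
tab_file <- tempfile(fileext = ".txt")
writeLines(c("id,symbol", "probe1,gene1", "probe2,gene2"), csv_file)
writeLines(c("id\tsymbol", "probe1\tgene1", "probe2\tgene2"), tab_file)

# read.csv: header = TRUE and sep = "," by default.
from_csv <- read.csv(csv_file, row.names = 1)
# read.delim: header = TRUE and sep = "\t" by default.
from_delim <- read.delim(tab_file, row.names = 1)
# read.table with the header and separator spelled out explicitly.
from_table <- read.table(tab_file, header = TRUE, sep = "\t", row.names = 1)

# All three should describe the same table.
same_csv_delim   <- isTRUE(all.equal(from_csv, from_delim))
same_delim_table <- isTRUE(all.equal(from_delim, from_table))
```

See ?read.table for the full list of arguments and defaults.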

ADD COMMENT • link 9.5 years ago svlachavas ▴ 840


Thank you.

So if I wanted to add the csv file as an argument in the expression eset <- new("ExpressionSet", exprs=as.matrix(rma.file)), I would read in the csv file using read.csv and then add it like this: eset <- new("ExpressionSet", exprs=as.matrix(rma.file), annotation = csv.file)?

ADD REPLY • link 9.5 years ago Vani ▴ 20


Dear Vani,

please excuse me, I misread the second part of your question. By the annotation of the specific Affy chip, do you mean the pheno data object, that is, the phenotype of your samples? If so, you could do the following:

use read.csv to read the csv file (it already returns a data.frame object, so no conversion with as.data.frame is needed), and then, if the object is called for instance dat2:

phenoData(eset) <- new("AnnotatedDataFrame", data=dat2)

ADD REPLY • link 9.5 years ago svlachavas ▴ 840


The csv file contains the annotation of the HuGene-1_0-st-v1 Affymetrix chip, so it has all the info like Entrez ID, gene symbol, etc. Would it still be considered a pheno data object?

ADD REPLY • link 9.5 years ago Vani ▴ 20


No. In my opinion there is no need to load it into R: this is your annotation file, which you could use after your statistical analysis to annotate your results. By "pheno data" object I meant the description of your samples, i.e. disease, healthy, cancer, control, etc. Moreover, although I have never used this specific Affymetrix platform, you may find the Bioconductor package for the HuGene platform useful (http://bioconductor.org/packages/release/data/annotation/html/pd.hugene.1.0.st.v1.html)
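For illustration, annotating results after the analysis could look roughly like the base R sketch below. Both data frames, their column names (probe_id, GeneSymbol, logFC) and all values are hypothetical stand-ins; the real annotation csv will have its own column names:

```r
# Hypothetical annotation table, standing in for the chip's csv file
# (in practice this would come from read.csv on the annotation file).
annot <- data.frame(
  probe_id   = c("probe1", "probe2", "probe3", "probe4"),
  GeneSymbol = c("gene1", "gene2", "gene3", "gene4"),
  stringsAsFactors = FALSE
)

# Hypothetical results table from a statistical analysis (values made up).
results <- data.frame(
  probe_id = c("probe3", "probe1"),
  logFC    = c(2.1, -1.4),
  stringsAsFactors = FALSE
)

# Look up each result's probe in the annotation table.
# match() keeps the row order of the results; unmatched probes get NA.
idx <- match(results$probe_id, annot$probe_id)
results$GeneSymbol <- annot$GeneSymbol[idx]
```

The same lookup works for any other annotation column, such as an Entrez ID column, if the csv file provides one.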