Hello, I have a CSV/Excel file with the following structure: the first 3 columns hold the gene annotation. The first row contains the sample names (starting from the 4th cell), and the second row contains the sample class (cancer/healthy, also starting from the 4th cell). This is an example of the input:
spotid    IDREF  IDENTIFIER  V2           V3           V5           V16          V18          V19
rr        tt     class       c.pre        c.pre        c.pre        c.pre        c.pre        c.pre
NM182762  a0     7A5         216.1879482  242.8363448  266.6281318  291.2514072  208.2681999  216.0300168
NM130786  a1     A1BG        278.0686005  350.6547207  348.8587161  337.9679966  309.8860592  346.6712924
NM130786  a2     A1BG        235.3235982  252.1576559  222.1535341  278.3976445  229.4160807  226.1739529
NM138932  a3     A1CF        218.3375826  255.9218462  283.0537841  221.9676333  245.6430974  208.2932626
NM014576  a4     A1CF        241.1702464  277.4401364  225.9655761  293.0314266  239.8472254  291.5985657
How can I construct an ExpressionSet object and apply a basic filtering approach to remove noisy genes? Also, how can I save the results to a CSV file? I have tried this code:

library("Biobase")
library("genefilter")

edata = read.csv(file="C:\Users\Owner\Dropbox\LouiseShoweProjects\lung cancer mRNA and miRNA\c.prevsn.pre\c.prevsn.pre.csv", header=TRUE, sep=",")

ma.file <- read.table("C:\Users\Owner\Dropbox\LouiseShoweProjects\lung cancer mRNA and miRNA\c.prevsn.pre\c.prevsn.pre.csv", header=TRUE, sep="\t") # the last argument is not necessary
eset <- new("ExpressionSet", exprs=as.matrix(ma.file))
data(eset.ExpressionSet)
Best,
Malik
You would get more input if you actually showed the code you have tried so far. Have you tried any of the steps described at https://bioconductor.org/packages/release/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf ?
Thanks for your reply. I have updated my post. I should say that I'm not an expert in R or Bioconductor, so I need help reading the files correctly so that I can use the different packages for gene expression.
Malik
You should be able to import the data with read.table, using settings more or less like the ones in your code. If it is truly an Excel file, you could use the readxl package and read the Excel file directly.
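As a rough sketch of the import step, assuming a comma-separated file laid out like the example in the question (two header rows, three annotation columns): the file is read as text first, because the second row puts class labels into otherwise numeric columns. The temporary file and the names `csv_text`, `path`, `raw`, `sample_class`, `annotation` and `expr` are made up for illustration; point `path` at the real file instead.

```r
# Small stand-in for the real file, written to a temporary location.
csv_text <- "spotid,IDREF,IDENTIFIER,V2,V3,V5
rr,tt,class,c.pre,c.pre,c.pre
NM182762,a0,7A5,216.1879482,242.8363448,266.6281318
NM130786,a1,A1BG,278.0686005,350.6547207,348.8587161
NM130786,a2,A1BG,235.3235982,252.1576559,222.1535341"
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path)

# Read every column as character so the class row does not break numeric parsing.
raw <- read.csv(path, header = TRUE, colClasses = "character",
                check.names = FALSE)
# For a real .xlsx file, readxl::read_excel(path, col_types = "text")
# should play the same role (untested here, since no Excel file is available).

sample_class <- unlist(raw[1, -(1:3)])   # second row of the file: class per sample
annotation   <- raw[-1, 1:3]             # first three columns: gene annotation
expr         <- as.matrix(raw[-1, -(1:3)])
mode(expr)   <- "numeric"                # convert text to numbers, keeping dimnames
rownames(expr) <- annotation$IDREF       # IDREF is unique in the example; spotid is not
```

The result is a numeric matrix with genes in rows and samples in columns, plus a separate vector of sample classes, which is the shape the ExpressionSet constructor expects.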
You can then convert the data to a matrix following the description in chapter 4.1 of the ExpressionSet introduction that I linked to earlier.
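A minimal sketch of that construction, assuming you already have a numeric matrix, a class label per sample and an annotation table. The toy values and the names `expr`, `sample_class`, `annotation` and `pdata` are placeholders, and the cancer/healthy labels are invented for the example:

```r
library(Biobase)

# Toy stand-ins for the imported data (rounded, illustrative values).
expr <- matrix(c(216.2, 278.1, 235.3,
                 242.8, 350.7, 252.2,
                 266.6, 348.9, 222.2,
                 291.3, 338.0, 278.4),
               nrow = 3,
               dimnames = list(c("a0", "a1", "a2"),
                               c("V2", "V3", "V5", "V16")))
sample_class <- c(V2 = "cancer", V3 = "cancer", V5 = "healthy", V16 = "healthy")
annotation <- data.frame(spotid     = c("NM182762", "NM130786", "NM130786"),
                         IDENTIFIER = c("7A5", "A1BG", "A1BG"),
                         row.names  = rownames(expr))

# Sample-level data: row names must match the column names of the matrix.
pdata <- data.frame(class = factor(sample_class), row.names = colnames(expr))

# Gene-level data: row names must match the row names of the matrix.
eset <- ExpressionSet(assayData   = expr,
                      phenoData   = AnnotatedDataFrame(pdata),
                      featureData = AnnotatedDataFrame(annotation))
```

Afterwards `exprs(eset)`, `pData(eset)` and `fData(eset)` give back the matrix, the sample table and the gene annotation respectively.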
Then, depending on what you want to do with your data, I would use the genefilter vignette at https://bioconductor.org/packages/release/bioc/vignettes/genefilter/inst/doc/howtogenefilter.pdf to filter the data.
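One possible filter, sketched on simulated data: keep a gene only if its intensity exceeds a threshold in a minimum fraction of samples, then write the surviving rows to CSV. The thresholds (above 100 in at least 25% of samples), the simulated matrix and the output file name are assumptions chosen for illustration, not recommendations for your data.

```r
library(Biobase)
library(genefilter)

set.seed(1)
# Simulated stand-in for the real ExpressionSet: 100 genes x 6 samples.
expr <- matrix(runif(600, min = 150, max = 400), nrow = 100,
               dimnames = list(paste0("a", 0:99), paste0("V", 1:6)))
expr[1:10, ] <- expr[1:10, ] / 10   # make the first 10 genes low-intensity
eset <- ExpressionSet(assayData = expr)

# pOverA(p, A): TRUE when at least a fraction p of the values is larger than A.
flt  <- filterfun(pOverA(p = 0.25, A = 100))
keep <- genefilter(exprs(eset), flt)   # one logical value per gene
eset_filtered <- eset[keep, ]

# Save the filtered expression matrix; row names (gene IDs) are written too.
out_file <- file.path(tempdir(), "filtered_expression.csv")
write.csv(exprs(eset_filtered), file = out_file)
```

If the ExpressionSet carries gene annotation, `cbind(fData(eset_filtered), exprs(eset_filtered))` can be written instead so the annotation columns end up in the same file.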
If you are doing differential expression, I would encourage you to have a look at limma, which is part of Bioconductor and has a user guide that covers more or less everything. Chapter 9 of the limma user guide starts an example analysis from an ExpressionSet object and could be worth a read.
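To give an idea of what such an analysis looks like, here is a small sketch on simulated data with two groups. It assumes the values are already on a log scale, which limma's linear models expect; the matrix, the group labels and the spiked-in difference are all invented for the example. `lmFit` also accepts an ExpressionSet in place of the plain matrix used here.

```r
library(limma)

set.seed(42)
# Simulated log-scale expression: 200 genes x 6 samples.
expr <- matrix(rnorm(200 * 6, mean = 8, sd = 0.5), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("S", 1:6)))
expr[1:5, 4:6] <- expr[1:5, 4:6] + 3   # spike in a difference for 5 genes

group <- factor(c("healthy", "healthy", "healthy",
                  "cancer", "cancer", "cancer"),
                levels = c("healthy", "cancer"))

# With healthy as the reference level, the second coefficient is cancer vs healthy.
design <- model.matrix(~ group)
fit <- lmFit(expr, design)
fit <- eBayes(fit)

# Ten top-ranked genes for the cancer vs healthy comparison.
top <- topTable(fit, coef = 2, number = 10)
```

The resulting table can be saved with `write.csv(top, "some_file.csv")` in the same way as the filtered matrix.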