Dear Bioconductor community,
I am trying to analyze a GEO dataset based on a dual-channel (two-color) platform, but I don't know how to separate the expression values of the two channels. The dataset is GSE7339. As you can see, each sample has signals from two channels: LKN and LKT.
I downloaded this dataset with the following commands:
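library("easypackages")
listOfBiocPackages <- c("annotate", "GEOquery")
libraries(listOfBiocPackages)  # attach all listed packages
GSE_code <- "GSE7339"
thisGEOplatform <- "GPL1293"    # platform_id reported in the sample metadata
gset <- getGEO(GSE_code, GSEMatrix = TRUE, getGPL = FALSE)
# getGEO() returns a list of ExpressionSets, one per platform
if (length(gset) > 1) idx <- grep(thisGEOplatform, attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]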
Now gset@phenoData@data contains the following fields:
str(gset@phenoData@data)
'data.frame': 100 obs. of 57 variables:
$ title : chr "Lung cancer 01" "Lung cancer 02" "Lung cancer 03" "Lung cancer 04" ...
$ geo_accession : chr "GSM176905" "GSM176906" "GSM176907" "GSM176908" ...
$ status : chr "Public on Mar 30 2007" "Public on Mar 30 2007" "Public on Mar 30 2007" "Public on Mar 30 2007" ...
$ submission_date : chr "Mar 22 2007" "Mar 22 2007" "Mar 22 2007" "Mar 22 2007" ...
$ last_update_date : chr "Mar 30 2007" "Mar 30 2007" "Mar 30 2007" "Mar 30 2007" ...
$ type : chr "RNA" "RNA" "RNA" "RNA" ...
$ channel_count : chr "2" "2" "2" "2" ...
$ source_name_ch1 : chr "LK01N" "LK02N" "LK03N" "LK04N" ...
$ organism_ch1 : chr "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" ...
$ characteristics_ch1 : chr "Non-tumorus tissue" "Non-tumorus tissue" "Non-tumorus tissue" "Non-tumorus tissue" ...
$ characteristics_ch1.1 : chr "Age:71" "Age:71" "Age:73" "Age:62" ...
$ characteristics_ch1.2 : chr "Gender:Male" "Gender:Male" "Gender:Male" "Gender:Female" ...
$ characteristics_ch1.3 : chr "Right lung" "Right lung" "Left lung" "Right lung" ...
$ characteristics_ch1.4 : chr "Tissue: resected Adenocarcinoma" "Tissue: resected Adenocarcinoma" "Tissue: resected Squamous cell carcinoma" "Tissue: resected Adenocarcinoma" ...
$ molecule_ch1 : chr "total RNA" "total RNA" "total RNA" "total RNA" ...
$ extract_protocol_ch1 : chr "TriZol procedure" "TriZol procedure" "TriZol procedure" "TriZol procedure" ...
$ label_ch1 : chr "Cy3" "Cy3" "Cy3" "Cy3" ...
$ label_protocol_ch1 : chr "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp? aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp? aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp? aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp? aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ ...
$ taxid_ch1 : chr "9606" "9606" "9606" "9606" ...
$ source_name_ch2 : chr "LK01T" "LK02T" "LK03T" "LK04T" ...
$ organism_ch2 : chr "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" ...
$ characteristics_ch2 : chr "tumor tissue" "tumor tissue" "tumor tissue" "tumor tissue" ...
$ characteristics_ch2.1 : chr "Histlogical Type:Adenocarcinoma" "Histlogical Type:Adenocarcinoma" "Histlogical Type:Squamous cell carcinoma" "Histlogical Type:Adenocarcinoma" ...
$ characteristics_ch2.2 : chr "Age:71" "Age:71" "Age:73" "Age:62" ...
$ characteristics_ch2.3 : chr "Gender:Male" "Gender:Male" "Gender:Male" "Gender:Female" ...
$ characteristics_ch2.4 : chr "Right lung cancer" "Right lung cancer" "Left lung cancer" "Right lung cancer" ...
$ characteristics_ch2.5 : chr "Stage:çV" "Stage:çV" "Stage:çU" "Stage:çV" ...
$ characteristics_ch2.6 : chr "LN Metastasis:Positive" "LN Metastasis:Positive" "LN Metastasis:Positive" "LN Metastasis:Positive" ...
$ characteristics_ch2.7 : chr "Postperative Tumor Recurrence:Recurrence" "Postperative Tumor Recurrence:Non-recurrence" "Postperative Tumor Recurrence:Non-recurrence" "Postperative Tumor Recurrence:Recurrence" ...
$ molecule_ch2 : chr "total RNA" "total RNA" "total RNA" "total RNA" ...
$ extract_protocol_ch2 : chr "TriZol procedure" "TriZol procedure" "TriZol procedure" "TriZol procedure" ...
$ label_ch2 : chr "Cy5" "Cy5" "Cy5" "Cy5" ...
$ label_protocol_ch2 : chr "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp? aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp? aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp? aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp? aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ ...
$ taxid_ch2 : chr "9606" "9606" "9606" "9606" ...
$ hyb_protocol : chr "Hybridized for 16 h at 42 C. Hybridization buffer and washing protocol was followed by the protocol supplied by"| __truncated__ "Hybridized for 16 h at 42 C. Hybridization buffer and washing protocol was followed by the protocol supplied by"| __truncated__ "Hybridized for 16 h at 42 C. Hybridization buffer and washing protocol was followed by the protocol supplied by"| __truncated__ "Hybridized for 16 h at 42 C. Hybridization buffer and washing protocol was followed by the protocol supplied by"| __truncated__ ...
$ scan_protocol : chr "ScanArray HT (PerkinElmer Japan Co., Ltd.) was used for scanning. Array images were analyzed with DNASIS Array "| __truncated__ "ScanArray HT (PerkinElmer Japan Co., Ltd.) was used for scanning. Array images were analyzed with DNASIS Array "| __truncated__ "ScanArray HT (PerkinElmer Japan Co., Ltd.) was used for scanning. Array images were analyzed with DNASIS Array "| __truncated__ "ScanArray HT (PerkinElmer Japan Co., Ltd.) was used for scanning. Array images were analyzed with DNASIS Array "| __truncated__ ...
$ description : chr "LK01" "LK02" "LK03" "LK04" ...
$ data_processing : chr "This data were analyzed by DNASIS array software(Hitachi Software Engineering), which converted the signal inte"| __truncated__ "This data were analyzed by DNASIS array software(Hitachi Software Engineering), which converted the signal inte"| __truncated__ "This data were analyzed by DNASIS array software(Hitachi Software Engineering), which converted the signal inte"| __truncated__ "This data were analyzed by DNASIS array software(Hitachi Software Engineering), which converted the signal inte"| __truncated__ ...
$ platform_id : chr "GPL1293" "GPL1293" "GPL1293" "GPL1293" ...
$ contact_name : chr "YASUMITSU,,MORIYA" "YASUMITSU,,MORIYA" "YASUMITSU,,MORIYA" "YASUMITSU,,MORIYA" ...
$ contact_department : chr "Thoracic surgery" "Thoracic surgery" "Thoracic surgery" "Thoracic surgery" ...
$ contact_institute : chr "Chiba University" "Chiba University" "Chiba University" "Chiba University" ...
$ contact_address : chr "1-8-1, Inohana" "1-8-1, Inohana" "1-8-1, Inohana" "1-8-1, Inohana" ...
$ contact_city : chr "Chuo-ku, Chiba" "Chuo-ku, Chiba" "Chuo-ku, Chiba" "Chuo-ku, Chiba" ...
$ contact_zip/postal_code : chr "2608670" "2608670" "2608670" "2608670" ...
$ contact_country : chr "Japan" "Japan" "Japan" "Japan" ...
$ supplementary_file : chr "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM176nnn/GSM176905/suppl/GSM176905.txt.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM176nnn/GSM176906/suppl/GSM176906.txt.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM176nnn/GSM176907/suppl/GSM176907.txt.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM176nnn/GSM176908/suppl/GSM176908.txt.gz" ...
$ data_row_count : chr "10368" "10368" "10368" "10368" ...
$ Age:ch1 : chr "71" "71" "73" "62" ...
$ Age:ch2 : chr "71" "71" "73" "62" ...
$ Gender:ch1 : chr "Male" "Male" "Male" "Female" ...
$ Gender:ch2 : chr "Male" "Male" "Male" "Female" ...
$ Histlogical Type:ch2 : chr "Adenocarcinoma" "Adenocarcinoma" "Squamous cell carcinoma" "Adenocarcinoma" ...
$ LN Metastasis:ch2 : chr "Positive" "Positive" "Positive" "Positive" ...
$ Postperative Tumor Recurrence:ch2: chr "Recurrence" "Non-recurrence" "Non-recurrence" "Recurrence" ...
$ Stage:ch2 : chr "çV" "çV" "çU" "çV" ...
$ Tissue:ch1 : chr "resected Adenocarcinoma" "resected Adenocarcinoma" "resected Squamous cell carcinoma" "resected Adenocarcinoma" ...
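The str() output above already shows the layout: on every array, channel 1 (label_ch1 = Cy3) carries the non-tumor sample LKxxN and channel 2 (label_ch2 = Cy5) carries the tumor sample LKxxT from the same patient. A minimal sketch of turning that into a limma-style targets table (one row per array, with Cy3 and Cy5 columns) follows. It uses the four rows visible above as a stand-in for pData(gset); the helper tissue(), the column names Array and Patient, and the labels "Normal"/"Tumor" are illustrative choices, not part of GEOquery or limma.

```r
# Toy stand-in for pData(gset): the first four samples printed by str() above.
# With the real object, use instead: pd <- pData(gset)
pd <- data.frame(
  geo_accession   = c("GSM176905", "GSM176906", "GSM176907", "GSM176908"),
  source_name_ch1 = c("LK01N", "LK02N", "LK03N", "LK04N"),
  label_ch1       = "Cy3",
  source_name_ch2 = c("LK01T", "LK02T", "LK03T", "LK04T"),
  label_ch2       = "Cy5",
  stringsAsFactors = FALSE
)

# Sanity check: dye assignment is the same on every array
stopifnot(all(pd$label_ch1 == "Cy3"), all(pd$label_ch2 == "Cy5"))

# Hypothetical helper: trailing letter of source_name, N = non-tumor, T = tumor
tissue <- function(x) ifelse(grepl("N$", x), "Normal", "Tumor")

# One row per array (= per patient); Cy3/Cy5 columns follow limma's targets convention
targets <- data.frame(
  Array   = pd$geo_accession,
  Patient = sub("[NT]$", "", pd$source_name_ch1),
  Cy3     = tissue(pd$source_name_ch1),
  Cy5     = tissue(pd$source_name_ch2),
  stringsAsFactors = FALSE
)
print(targets)
```

The Patient values (LK01, LK02, ...) match the description field, which makes the point that each array is a within-patient tumor versus non-tumor comparison rather than two independent groups.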
Do you have any suggestions on which commands I could use to split the dual-channel dataset into two datasets?
Thanks
Hi Gordon Smyth, thanks for your reply and your help. I briefly looked at the limma package but could not work out how to use it for my goal (that is, to obtain the gene expression for the controls and the gene expression for the tumor patients). Maybe you can help me understand.
Given the GSE7339 study design, is there a way for me to do this? Are there functions that would give me one R variable with the gene expression of the controls (probe set IDs in rows and samples in columns, or vice versa) and another R variable with the gene expression of the tumor patients (in the same layout)?
If I run exprs(gset) now, I only see one table. Thank you!
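A single table per array is what a two-color series matrix usually holds: one value per spot, commonly a log ratio of the two channels. Whether the values in exprs(gset) for GSE7339 are exactly log2(Cy5/Cy3) is an assumption here; the (truncated) data_processing field would need to be checked. The toy sketch below uses synthetic intensities, not GSE7339 data, to show why the channels cannot be recovered from the ratio alone: M = log2(R/G) only records the difference between channels, and the average log intensity A = 0.5 * log2(R * G) is also needed. limma's MA.RG() and RG.MA() perform the same conversion on whole RGList/MAList objects.

```r
# Synthetic intensities for three spots (illustrative values, not from GSE7339)
R <- c(1000, 250, 4000)   # Cy5 channel (tumor in GSE7339)
G <- c(500, 250, 1000)    # Cy3 channel (non-tumor in GSE7339)

M <- log2(R / G)          # log ratio: one number per spot
A <- 0.5 * log2(R * G)    # average log2 intensity

# Scaling both channels leaves M unchanged, so M by itself cannot identify R and G
M_scaled <- log2((10 * R) / (10 * G))

# With both M and A available, the two channels can be reconstructed
R_back <- 2^(A + M / 2)
G_back <- 2^(A - M / 2)
print(cbind(M, A, R_back, G_back))
```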
Have you read the limma User's Guide? There is a chapter that explains it.
You seem to be assuming that this dataset contains controls and patients, but actually it doesn't. All the samples are from tumor patients.
Regarding your other questions, what you want to do is quite easy but unwise for the reasons I already explained. No one will provide code that produces the output you have asked for because it is such a bad thing to do. If you know better, then perhaps you can read a bit of documentation and extract out what you want yourself. You will need to use limma to read and process the raw data files. You cannot use getGEO.
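A minimal sketch of a limma analysis that works directly with the paired two-color design, run on synthetic intensities because the column names inside the GSM supplementary files are not shown in this thread. For the real data, the raw files would be read with read.maimages() (they could be downloaded with GEOquery's getGEOSuppFiles()); that step is omitted here rather than guessed. Because each array hybridizes a patient's tumor (Cy5) against that patient's own non-tumor tissue (Cy3), the log ratio M is already a within-patient comparison, so modelMatrix() with ref = "Normal" yields a single all-ones column. Median within-array normalization is used only to keep the toy example small; the limma User's Guide describes loess-based methods that are typically used for real arrays, and a separate chapter covers separate channel analysis (intraspotCorrelation(), lmscFit()).

```r
library(limma)

set.seed(1)
n_genes  <- 200
n_arrays <- 4

# Synthetic stand-in for an RGList from read.maimages() (NOT GSE7339 data)
G <- matrix(2^rnorm(n_genes * n_arrays, mean = 10), n_genes, n_arrays)     # Cy3: non-tumor
R <- G * 2^matrix(rnorm(n_genes * n_arrays, sd = 0.3), n_genes, n_arrays)  # Cy5: tumor
R[1:10, ] <- R[1:10, ] * 4   # genes 1 to 10: simulated 4-fold higher in tumor
RG <- new("RGList", list(R = R, G = G))

# Targets table: one row per array, dye assignment as in GSE7339
targets <- data.frame(Cy3 = rep("Normal", n_arrays), Cy5 = rep("Tumor", n_arrays))

# Converts R/G to M = log2(R/G) and A, then centers each array's M at its median
MA <- normalizeWithinArrays(RG, method = "median")

# Single coefficient: tumor vs non-tumor log fold change
design <- modelMatrix(targets, ref = "Normal")
fit <- eBayes(lmFit(MA, design))
top <- topTable(fit, coef = 1, number = 10)
print(top)
```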