Question

Request for the assistance to use MEDME

0

Entering edit mode

@prashantha-hebbar-kiradi-mu-mlsc-3159

Last seen 10.5 years ago

Dear Dr. Pelizzola, I am able to do MeDIP data analysis using MEDME, But still I have some doubts. I would like to clrify those with you. In our microarray experiment we have differentially labeled the samples which is Cy3 for input/WCE and Cy5 for immunoprecipitated DNA. Now the question is whether it is possible to use all the input across the array as the reference genome (Like your case a reference genome which is fully methylated). Now the second question is, at present we have 18 dual channel data and we have done the single channel normalization, now if we submit normalized average signal of all the G's across 18 array as a single Reference Genome(in your case it is artificially methylated DNA) and the 18 red channel data as test to MEDME whether the analysis will be possible? And will it make any sense? The third question is, just for sake of argument if you have 1000 bp window of a CpG island and 60 mer oligos in 100 bp gap is it sensible to use the average normalized mean of all the oligos belonging to that particular CpG island and then submitting it to MEDME? The fourth question is, after constructing the reference genome do we need to use the normalized log ratio of dual channel or the red channel? The fifth question is, when I perform single or double channel data normalization at the end I find some probes with no genome coordinates. This creates error when I submit the data to MEDME with NA values. So, If I remove those NA data will it create any problem in analysis result? The sixth question is about MEDMEextreme parameter. What are the scores used in MEDMEextreme corresponding to and what is the purpose of using it? The seventh question is whether should we fit the LIMMA model on MeDIP enriched data (i.e, lmFit()) or directly normalized data has to be taken to MEDME? In case you find time please clear my doubts. Thanking you in anticipation. Regards, Prashanta -----Original Message----- From: Mattia Pelizzola [mailto:mattia.pelizzola@yale.edu] Sent: Monday, December 01, 2008 9:12 PM To: Prashantha Hebbar Kiradi [MU-MLSC] Cc: Annette Molinaro; bioconductor at stat.math.ethz.ch; Swagata Halder [MU-MLSC] Subject: Re: FW: Request for the assistance to use MEDME Hi Prashantha, regarding the "pos" slot you have to provide a single number for each probe. This can be usually be the mid position. You can determine it from a matrix where you have start and stop columns using the rowMeans function. Otherwise, you can just use the start position especially if the probes are short in respect to the window size (60bp or so for the probe compared to 1000bp window size). In the specific case of you dataset, since you have the positions in the "chr3:071885975-071886019" format it is a little bit more complicated, and you could do as follow: data = read.table("test.txt", sep="\t", header=TRUE, row.names = 6, stringsAsFactors = FALSE) pos = strsplit(split=":", data$SystematicName, fixed=TRUE) chr = sapply(pos, function(x) x[1]) pos = sapply(pos, function(x) x[2]) pos = strsplit(split='-', pos, fixed=TRUE) pos = sapply(pos, function(x) mean(as.numeric(x))) now you have to build the matrix with the MeDIP enrichment data: logRmat = as.matrix(data$logFC) rownames(logRmat) = rownames(data) if you have more than one array, you can you the cbind or data.frame functions to build logRmat. For example: logRmat = data.frame(array1 = data1$logFC, array2 = data2$logFC, array2 = data2$logFC) rownames(logRmat) = rownames(data) finally you are ready to build the MEDMEset object (has for human, mmu for mouse): Mset = new("MEDMEset", pos=pos, chr=chr, logR = logRmat, organism="hsa") I am sorry but I did not understand the second point, as "weighting of MeDIP enrichment" and determination of probe position are not related. I hope I have clarified this above. Regarding the dataset to be used for fitting the model, there are two possibilities. On one hand you could generate artificially fully methylated DNA, as described in the paper. On the other hand, you could try to use directly the MeDIP data that you already have. The latter is expected to work for samples where high level of methylation is expected. This is for example true for human or mouse genomic DNA hybridized on chromosome-wide tiling arrays. Rather, if you array represent mostly promoter regions this is not a good idea as these regions are usually hypo-methylated. I hope this helps, mattia Prashantha Hebbar Kiradi [MU-MLSC] wrote: > > Dear Mattia, > > I am little confused. Though read.table is working on .txt, I have now > converted normalized .txt output to .gff output using script. We get > chromosomal assignment and probe position information in raw as well as > normalized data. We get position information as range parameter (Ex: > strat - end) but what I see in the example below that you have provided > single parameter for "pos" slot. Now I have again gone through the paper > and understood that you have done some more work (as explained in > "Weighting of MeDIP enrichment" section) to get positon in single > parameter. Can you please explian me little more about "Weighting of > MeDIP enrichment" and the care needs to be taken in this step? Is there > any command to this in R or we have to script for it? To implement MEDME > sucessfully in our lab as a third party, do we need to have refrence > genome where all the "C" are methylated (As explianed in the "Derivation > of fully methylated DNA" section of paper)? > > As you see in attachment files logRatio is not in matrix form. we get > only logFC value. So, Can you please tell me how should I proceed further? > > Thanking you and Dr. Molinaro in anticipation for the support. > > Regards, > > Prashantha > > > > > > -----Original Message----- > From: Mattia Pelizzola [mailto:mattia.pelizzola at yale.edu] > Sent: Wed 11/26/2008 8:25 PM > To: Prashantha Hebbar Kiradi [MU-MLSC] > Cc: Annette Molinaro > Subject: Re: FW: Request for the assistance to use MEDME > > Hi Prashantha, > > thanks for using MEDME. Unfortunately, the Agilent file format that you > are using is not supported and MEDME.readFiles can't be used. In this > case you have to use a little bit basic R functions to "manually" load > the data into R and create a MEDMEset object. > > As you can find in the documentation of the MEDMEset class (type > "class?MEDMEset" in R) there are several data slots (chr, pos, logR, > etc..). Few of these are actually mandatory to create a minimal MEDMEset > (chr, pos, logR and organims). You can find details on all of these in > the documentation page above, but I'll provide here some examples: > > > chr = c("chr1","chr1","chr5","chr6") > > pos = c(1000, 3000, 100, 5000) > > logR = cbind(c(2.2, 4.1, -0.5, 0.1),c(3, 0, 0.2, -1)) # a matrix > with N columns for N samples > > rownames(logR) = letters[1:4] # probes are named a,b,c,d here > > organism = "hsa" # for human or "mmu" for mouse > > finally you use these to initialize a new MEDMEset: > Mset = new('MEDMEset', chr=chr, pos=pos, logR=logR, organism=organism) > > Now, in you case you have to load all these data from your files. I > could not find chromosomal assignment and probe position in the header > of your file (e.g. "test_cis_reg.gff") so I guess these are available > (or can be exported from the Agilent software) in a separate file. Be > careful of matching chr and pos with the data in "test_cis_reg.gff" > respecting the probe order. > > You can load the data from datafile.txt with the function read.table: > > > data = read.table(file = "test_cis_reg.gff", sep="\t", header=TRUE, > row.names = NULL, stringsAsFactors = FALSE) > now you can extract the MedIP data with: > > MeDIPdata = data[,"logFC"] # assuming that this is the column > containing the MeDIP logRatio .. > and the probe names with: > > probeNames = data[,"ProbeName"] > > > In case you have many samples you have to repeat that many times (you > can use a "for" cycle to iterate on file names in case ..) and you can > put them together with the "cbind" function that I used in the example > above (be careful about the order of probeNames being consistent through > files!). Finally you assign probe names and you are set. > > You have to do something similar with another file to get probe chr and > pos. Then you can initialize the MEDMEset object. > > Let me know if you have any problem, > > mattia > > > > Annette Molinaro wrote: > > Hi Mattia - > > Can you follow up on this email? > > > > Many thanks, > > > > Annette > > > > > > > > > > > > *From:* Prashantha Hebbar Kiradi [MU-MLSC] > > [mailto:prashantha.hebbar at manipal.edu] > > *Sent:* Wednesday, November 26, 2008 1:41 AM > > *To:* annette.molinaro at yale.edu > > *Subject:* Request for the assistance to use MEDME > > > > > > > > > > > > Dear Dr. Molinaro, > > > > I am Prashantha from Manipal Life Sciences Center, Manipal University, > > India. Recently, I have gone through MEDME (Pelizzola et.al, Genome Res. > > 2008) and tried to implement the algorithm on our data. I used limma for > > the normalization. But now I am stucked in MEDME. So, I would like to > > discuss the issues with you. Incase you do not find the time, Please put > > me in touch with any of your lab mates. > > > > Following are my doubts: > > > > 1. The normalized data is in .tsv format and has 'Row', 'Col', > > 'ProbeUID', 'ControlType', 'ProbeName', 'GeneName', 'SystematicName', > > 'Description', 'logFC', 'AveExpr', 't', 'P.Value', 'adj.P.Val', 'B' as > > columns. As you know the GFF format has 9 fields (seqname, source, > > feature, start, end, score, strand, frame, group). So, for the "score" > > field in GFF format which field you have chosen from the fileds in > > normalized data. We use here Agilents image extraction software for > > feature extraction and I think, there is no way to get the data in GFF. > > > > 2. In order to test my data now I have created a dummy GFF by taking > > logFc as "score". When I use following command > >> MEDME.readFiles(path = getwd(), files ="test_cis_reg.gff", > > format="gff", organism="hsp") > > > > I will get following error message, > > > ---------------------------------------------------------------------- ------ ----------- > > Error in initialize(value, ...) : > > logR has to be a matrix with probeIds as rownames .. > > In addition: Warning message: > > In MEDME.readFiles(path = getwd(), files = "test_cis_reg.gff", format = > > "gff", : > > no unique probe names provided on column 3; the resulting dataset > > lacks rownames .. > > > ---------------------------------------------------------------------- ------ ------------- > > > > I request you to kindly help me to come out of these problems. > > > > Thank you in anticipation for the support. > > > > Sincerely, > > > > Mr. Prashantha Hebbar > > Bioinformatician > > Manipal Life Sciences Center, > > Manipal University, > > Manipal, India > > PIN:576104 > > Ph:+91-9886359007 > > > > ---------------------------------------------------------------------- -- > > > > This e-mail is privileged and confidential. If you are not the > > intended recipient please delete the message and notify the sender. > > Any views or opinions presented are solely those of the author. > > > > ---------------------------------------------------------------------- -- > > > -------------------------------------------------------------------- ---- > This e-mail is privileged and confidential. If you are not the > intended recipient please delete the message and notify the sender. > Any views or opinions presented are solely those of the author. > -------------------------------------------------------------------- ---- > ###################################################################### Attention:\ This e-mail message is privileged and confid...{{dropped:7}}

Microarray Normalization Organism probe limma MEDME cycle ASSIGN Microarray Normalization • 1.5k views

ADD COMMENT • link updated 16.1 years ago by Mattia Pelizzola ▴ 30 • written 16.1 years ago by Prashantha Hebbar Kiradi [MU-MLSC] ▴ 90

score 0 · Answer 1 · 2009-01-10

Hi Prashanta, in the order: 1, 2 and 4 - you can't use an individual channel to build the model, as an individual channel is not a measure of MeDIP enrichment . This is determined as the ratio between the MeDIP and the input channels. Also, the input channel signal by definition is not dependent on the methylation status of your genomic DNA. 3- I would not recommend any average across probes before using MEDME, I only suggest to use the smooth function to apply smoothing on the normalized data as described in the vignette. This would modify the robe intensity according to the intensity of the surrounding probes but would not reduce the number of data. 5- right, you have to remove the probes without genomic position as they are likely to be control probes of your arrays. 6- these extremes need to be set to avoid to predict infinite absolute methylation score. You can use the defaults or define them after visual inspection of the model-fit graph. They roughly correspond to upper and lower limit of the logistic model, please compare with the example and figure provided in the vignette. 7- The Limma library is usually use for normalization purposes or for determining differential expression (or in principle even for dfferential methylation). For this reason is not related with MEDME. Rather it can be used before MEDME (for normalization issues) and for additional analysis after using MEDME. As I should have mentioned in the previous emails please do not use anymore this address as it is not the official address for contacting me for MEDME issues. I am moving in the next weeks so this email address won't be anymore working. Rather use mattia DOT pelizzola AT gmail DOT com. Thanks, mattia Prashantha Hebbar wrote: > Dear Dr. Pelizzola, > > I am able to do MeDIP data analysis using MEDME, But still I have some > doubts. I would like to clrify those with you. > > In our microarray experiment we have differentially labeled the samples > which is Cy3 for input/WCE and Cy5 for immunoprecipitated DNA. Now the > question is whether it is possible to use all the input across the array as > the reference genome (Like your case a reference genome which is fully > methylated). > > Now the second question is, at present we have 18 dual channel data and we > have done the single channel normalization, now if we submit normalized > average signal of all the G's across 18 array as a single Reference > Genome(in your case it is artificially methylated DNA) and the 18 red > channel data as test to MEDME whether the analysis will be possible? And > will it make any sense? > > The third question is, just for sake of argument if you have 1000 bp window > of a CpG island and 60 mer oligos in 100 bp gap is it sensible to use the > average normalized mean of all the oligos belonging to that particular CpG > island and then submitting it to MEDME? > > The fourth question is, after constructing the reference genome do we need > to use the normalized log ratio of dual channel or the red channel? > > The fifth question is, when I perform single or double channel data > normalization at the end I find some probes with no genome coordinates. This > creates error when I submit the data to MEDME with NA values. So, If I > remove those NA data will it create any problem in analysis result? > > The sixth question is about MEDMEextreme parameter. What are the scores used > in MEDMEextreme corresponding to and what is the purpose of using it? > > The seventh question is whether should we fit the LIMMA model on MeDIP > enriched data (i.e, lmFit()) or directly normalized data has to be taken to > MEDME? > > In case you find time please clear my doubts. > > Thanking you in anticipation. > > Regards, > > Prashanta > > -----Original Message----- > From: Mattia Pelizzola [mailto:mattia.pelizzola at yale.edu] > Sent: Monday, December 01, 2008 9:12 PM > To: Prashantha Hebbar Kiradi [MU-MLSC] > Cc: Annette Molinaro; bioconductor at stat.math.ethz.ch; Swagata Halder > [MU-MLSC] > Subject: Re: FW: Request for the assistance to use MEDME > > Hi Prashantha, > > regarding the "pos" slot you have to provide a single number for each > probe. This can be usually be the mid position. You can determine it > from a matrix where you have start and stop columns using the rowMeans > function. Otherwise, you can just use the start position especially if > the probes are short in respect to the window size (60bp or so for the > probe compared to 1000bp window size). > In the specific case of you dataset, since you have the positions in the > "chr3:071885975-071886019" format it is a little bit more complicated, > and you could do as follow: > > data = read.table("test.txt", sep="\t", header=TRUE, row.names = 6, > stringsAsFactors = FALSE) > pos = strsplit(split=":", data$SystematicName, fixed=TRUE) > chr = sapply(pos, function(x) x[1]) > pos = sapply(pos, function(x) x[2]) > pos = strsplit(split='-', pos, fixed=TRUE) > pos = sapply(pos, function(x) mean(as.numeric(x))) > > now you have to build the matrix with the MeDIP enrichment data: > logRmat = as.matrix(data$logFC) > rownames(logRmat) = rownames(data) > if you have more than one array, you can you the cbind or data.frame > functions to build logRmat. For example: > logRmat = data.frame(array1 = data1$logFC, array2 = data2$logFC, array2 > = data2$logFC) > rownames(logRmat) = rownames(data) > > finally you are ready to build the MEDMEset object (has for human, mmu > for mouse): > Mset = new("MEDMEset", pos=pos, chr=chr, logR = logRmat, organism="hsa") > > > > I am sorry but I did not understand the second point, as "weighting of > MeDIP enrichment" and determination of probe position are not related. I > hope I have clarified this above. > > > Regarding the dataset to be used for fitting the model, there are two > possibilities. On one hand you could generate artificially fully > methylated DNA, as described in the paper. On the other hand, you could > try to use directly the MeDIP data that you already have. The latter is > expected to work for samples where high level of methylation is > expected. This is for example true for human or mouse genomic DNA > hybridized on chromosome-wide tiling arrays. Rather, if you array > represent mostly promoter regions this is not a good idea as these > regions are usually hypo-methylated. > > I hope this helps, > > mattia > > Prashantha Hebbar Kiradi [MU-MLSC] wrote: >> Dear Mattia, >> >> I am little confused. Though read.table is working on .txt, I have now >> converted normalized .txt output to .gff output using script. We get >> chromosomal assignment and probe position information in raw as well as >> normalized data. We get position information as range parameter (Ex: >> strat - end) but what I see in the example below that you have provided >> single parameter for "pos" slot. Now I have again gone through the paper >> and understood that you have done some more work (as explained in >> "Weighting of MeDIP enrichment" section) to get positon in single >> parameter. Can you please explian me little more about "Weighting of >> MeDIP enrichment" and the care needs to be taken in this step? Is there >> any command to this in R or we have to script for it? To implement MEDME >> sucessfully in our lab as a third party, do we need to have refrence >> genome where all the "C" are methylated (As explianed in the "Derivation >> of fully methylated DNA" section of paper)? >> >> As you see in attachment files logRatio is not in matrix form. we get >> only logFC value. So, Can you please tell me how should I proceed further? >> >> Thanking you and Dr. Molinaro in anticipation for the support. >> >> Regards, >> >> Prashantha >> >> >> >> >> >> -----Original Message----- >> From: Mattia Pelizzola [mailto:mattia.pelizzola at yale.edu] >> Sent: Wed 11/26/2008 8:25 PM >> To: Prashantha Hebbar Kiradi [MU-MLSC] >> Cc: Annette Molinaro >> Subject: Re: FW: Request for the assistance to use MEDME >> >> Hi Prashantha, >> >> thanks for using MEDME. Unfortunately, the Agilent file format that you >> are using is not supported and MEDME.readFiles can't be used. In this >> case you have to use a little bit basic R functions to "manually" load >> the data into R and create a MEDMEset object. >> >> As you can find in the documentation of the MEDMEset class (type >> "class?MEDMEset" in R) there are several data slots (chr, pos, logR, >> etc..). Few of these are actually mandatory to create a minimal MEDMEset >> (chr, pos, logR and organims). You can find details on all of these in >> the documentation page above, but I'll provide here some examples: >> >> > chr = c("chr1","chr1","chr5","chr6") >> > pos = c(1000, 3000, 100, 5000) >> > logR = cbind(c(2.2, 4.1, -0.5, 0.1),c(3, 0, 0.2, -1)) # a matrix >> with N columns for N samples >> > rownames(logR) = letters[1:4] # probes are named a,b,c,d here >> > organism = "hsa" # for human or "mmu" for mouse >> >> finally you use these to initialize a new MEDMEset: >> Mset = new('MEDMEset', chr=chr, pos=pos, logR=logR, organism=organism) >> >> Now, in you case you have to load all these data from your files. I >> could not find chromosomal assignment and probe position in the header >> of your file (e.g. "test_cis_reg.gff") so I guess these are available >> (or can be exported from the Agilent software) in a separate file. Be >> careful of matching chr and pos with the data in "test_cis_reg.gff" >> respecting the probe order. >> >> You can load the data from datafile.txt with the function read.table: >> >> > data = read.table(file = "test_cis_reg.gff", sep="\t", header=TRUE, >> row.names = NULL, stringsAsFactors = FALSE) >> now you can extract the MedIP data with: >> > MeDIPdata = data[,"logFC"] # assuming that this is the column >> containing the MeDIP logRatio .. >> and the probe names with: >> > probeNames = data[,"ProbeName"] >> >> >> In case you have many samples you have to repeat that many times (you >> can use a "for" cycle to iterate on file names in case ..) and you can >> put them together with the "cbind" function that I used in the example >> above (be careful about the order of probeNames being consistent through >> files!). Finally you assign probe names and you are set. >> >> You have to do something similar with another file to get probe chr and >> pos. Then you can initialize the MEDMEset object. >> >> Let me know if you have any problem, >> >> mattia >> >> >> >> Annette Molinaro wrote: >> > Hi Mattia - >> > Can you follow up on this email? >> > >> > Many thanks, >> > >> > Annette >> > >> > >> > >> > >> > >> > *From:* Prashantha Hebbar Kiradi [MU-MLSC] >> > [mailto:prashantha.hebbar at manipal.edu] >> > *Sent:* Wednesday, November 26, 2008 1:41 AM >> > *To:* annette.molinaro at yale.edu >> > *Subject:* Request for the assistance to use MEDME >> > >> > >> > >> > >> > >> > Dear Dr. Molinaro, >> > >> > I am Prashantha from Manipal Life Sciences Center, Manipal University, >> > India. Recently, I have gone through MEDME (Pelizzola et.al, Genome > Res. >> > 2008) and tried to implement the algorithm on our data. I used limma > for >> > the normalization. But now I am stucked in MEDME. So, I would like to >> > discuss the issues with you. Incase you do not find the time, Please > put >> > me in touch with any of your lab mates. >> > >> > Following are my doubts: >> > >> > 1. The normalized data is in .tsv format and has 'Row', 'Col', >> > 'ProbeUID', 'ControlType', 'ProbeName', 'GeneName', 'SystematicName', >> > 'Description', 'logFC', 'AveExpr', 't', 'P.Value', 'adj.P.Val', 'B' as >> > columns. As you know the GFF format has 9 fields (seqname, source, >> > feature, start, end, score, strand, frame, group). So, for the "score" >> > field in GFF format which field you have chosen from the fileds in >> > normalized data. We use here Agilents image extraction software for >> > feature extraction and I think, there is no way to get the data in GFF. >> > >> > 2. In order to test my data now I have created a dummy GFF by taking >> > logFc as "score". When I use following command >> >> MEDME.readFiles(path = getwd(), files ="test_cis_reg.gff", >> > format="gff", organism="hsp") >> > >> > I will get following error message, >> > >> > -------------------------------------------------------------------- -------- > ----------- >> > Error in initialize(value, ...) : >> > logR has to be a matrix with probeIds as rownames .. >> > In addition: Warning message: >> > In MEDME.readFiles(path = getwd(), files = "test_cis_reg.gff", format = >> > "gff", : >> > no unique probe names provided on column 3; the resulting dataset >> > lacks rownames .. >> > >> > -------------------------------------------------------------------- -------- > ------------- >> > >> > I request you to kindly help me to come out of these problems. >> > >> > Thank you in anticipation for the support. >> > >> > Sincerely, >> > >> > Mr. Prashantha Hebbar >> > Bioinformatician >> > Manipal Life Sciences Center, >> > Manipal University, >> > Manipal, India >> > PIN:576104 >> > Ph:+91-9886359007 >> > >> > > -------------------------------------------------------------------- ---- >> > >> > This e-mail is privileged and confidential. If you are not the >> > intended recipient please delete the message and notify the sender. >> > Any views or opinions presented are solely those of the author. >> > >> > > -------------------------------------------------------------------- ---- >> >> ------------------------------------------------------------------- ----- >> This e-mail is privileged and confidential. If you are not the >> intended recipient please delete the message and notify the sender. >> Any views or opinions presented are solely those of the author. >> ------------------------------------------------------------------- ----- >> > > > ###################################################################### > Attention: > This e-mail message is privileged and confidential. If you are not the > intended recipient please delete the message and notify the sender. > Any views or opinions presented are solely those of the author. > > ###################################################################### >