Thank you all for your quick responses.
So just to clarify and check if I am doing this right. I can normalize my data (using edgeR in my case) and then provide normalized cpm data to svaseq, right?
I am including my codes below, not sure if I am doing this right. If someone could give me some insight that would be nice. My data include 24 samples (12 of each genotype - QQ and qq - and each sex - F and M, having 6 biological replicates for each group - QQ.F, QQ.M, qq.F an qq.M). Not sure if I included the estimated surrogate variables in the model properly.
Also, I tried using the normal sva function instead of svaseq and it gave me very similar results. I heard however that the svaseq function does not really account for TMM normalization factors, which would be important for my analysis. Could someone clarify if this is true? Thank you!
My codes:
## QC, FILTERING AND NORMALIZATION USING EDGER ##
> d <- readDGE(targets, labels = targets$Label, comment.char = "#", header = TRUE,
columns = c(1,7))
> d <- calcNormFactors(d)
> cpm.values <- cpm(d$counts)
> above1cpm <- rowSums(cpm.values >= 1)
> d.filt <- d[above1cpm >= 6, , keep.lib.sizes = FALSE]
> dim(d.filt)
# [1] 11679 24
> d.filt <- calcNormFactors(d.filt)
> cpm.values.filt < cpm(d.filt$counts)
> head(cpm.values.filt)
(...)
Samples
Tags QQ.M.3 QQ.M.4 QQ.M.5 QQ.M.6
ENSBTAG00000020035 99.33248 182.23360 168.80417 94.63499
ENSBTAG00000011528 48.37818 55.00343 53.30658 62.32472
ENSBTAG00000012594 17.57829 32.66077 20.34709 18.87298
ENSBTAG00000018278 242.30761 244.77718 230.29837 241.66867
ENSBTAG00000021997 13.03218 25.75558 13.58795 19.61574
ENSBTAG00000008490 17.31310 22.38235 19.68511 23.73471
> logCPM <- cpm(d.filt, log = T)
> head(logCPM)
# (...)
# Samples
# Tags QQ.M.5 QQ.M.6
# ENSBTAG00000020035 7.526067 6.557956
# ENSBTAG00000011528 5.863254 5.955458
# ENSBTAG00000012594 4.474122 4.232455
# ENSBTAG00000018278 7.974202 7.910458
# ENSBTAG00000021997 3.891917 4.288119
# ENSBTAG00000008490 4.426424 4.562991
## SVA ##
> Group <- relevel(d.filt$samples$group, ref = "qq.M")
> mod <- model.matrix(~Group)
> colnames(mod)[-1] <- paste0(levels(Group)[-1], "vs", levels(Group)[1])
> head(mod)
# (Intercept) qq.Fvsqq.M QQ.Fvsqq.M QQ.Mvsqq.M
# 1 1 1 0 0
# 2 1 1 0 0
# 3 1 1 0 0
# 4 1 1 0 0
# 5 1 1 0 0
# 6 1 1 0 0
> cont.mod <- makeContrasts(QQ.FvsQQ.M = QQ.Fvsqq.M - QQ.Mvsqq.M,
qq.Fvsqq.M = qq.Fvsqq.M,
QQ.Fvsqq.F = QQ.Fvsqq.M - qq.Fvsqq.M,
QQ.Mvsqq.M = QQ.Mvsqq.M,
Interact = (QQ.Fvsqq.M - QQ.Mvsqq.M) - (qq.Fvsqq.M),
levels = mod)
> cont.mod
# Contrasts
# Levels QQ.FvsQQ.M qq.Fvsqq.M QQ.Fvsqq.F QQ.Mvsqq.M Interact
# Intercept 0 0 0 0 0
# qq.Fvsqq.M 0 1 -1 0 -1
# QQ.Fvsqq.M 1 0 1 0 1
# QQ.Mvsqq.M -1 0 0 1 -1
> svseq = svaseq(cpm.values.filt, mod, mod[,1])
# Number of significant surrogate variables is: 5
# Iteration (out of 5 ):1 2 3 4 5
> head(svseq$sv)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.05741827 -0.06602954 0.20009240 -0.02684960 0.23766642
[2,] 0.26314274 -0.66218564 0.19546081 0.38882480 -0.21056924
[3,] -0.06735629 0.01038846 0.08790005 0.02410775 0.34376335
[4,] 0.10120300 0.10301322 0.24170332 0.15549638 0.06207581
[5,] -0.17679364 -0.09238517 -0.03882105 0.15047943 0.23952769
[6,] 0.07944103 -0.01603642 -0.26944731 -0.16374470 -0.23344724
> mod.sv <- cbind(mod, svseq$sv)
> colnamesmod.sv)[5:9] <- paste0("sv", 1:5)
# (Intercept) qq.Fvsqq. M QQ.Fvsqq.M QQ.Mvsqq.M sv1 sv2 sv3 sv4 sv5
# 1 1 1 0 0 0.05741827 -0.06602954 0.20009240 -0.02684960 0.23766642
# 2 1 1 0 0 0.26314274 -0.66218564 0.19546081 0.38882480 -0.21056924
# 3 1 1 0 0 -0.06735629 0.01038846 0.08790005 0.02410775 0.34376335
# 4 1 1 0 0 0.10120300 0.10301322 0.24170332 0.15549638 0.06207581
# 5 1 1 0 0 -0.17679364 -0.09238517 -0.03882105 0.15047943 0.23952769
# 6 1 1 0 0 0.07944103 -0.01603642 -0.26944731 -0.16374470 -0.23344724
> cont.mod.sv <- makeContrasts(QQ.FvsQQ.M = QQ.Fvsqq.M - QQ.Mvsqq.M,
qq.Fvsqq.M = qq.Fvsqq.M,
QQ.Fvsqq.F = QQ.Fvsqq.M - qq.Fvsqq.M,
QQ.Mvsqq.M = QQ.Mvsqq.M,
Interact = (QQ.Fvsqq.M - QQ.Mvsqq.M) - (qq.Fvsqq.M),
levels = mod.sv)
> cont.mod.sv
# Contrasts
# Levels QQ.FvsQQ.M qq.Fvsqq.M QQ.Fvsqq.F QQ.Mvsqq.M Interact
# Intercept 0 0 0 0 0
# qq.Fvsqq.M 0 1 -1 0 -1
# QQ.Fvsqq.M 1 0 1 0 1
# QQ.Mvsqq.M -1 0 0 1 -1
# sv1 0 0 0 0 0
# sv2 0 0 0 0 0
# sv3 0 0 0 0 0
# sv4 0 0 0 0 0
# sv5 0 0 0 0 0
> d.filt.sv <- estimateGLMCommonDisp(d.filt, mod.sv, verbose = TRUE)
# Disp = 0.0268 , BCV = 0.1637 ###not sure if the BCV value is good. I am working on bovine. edgeR recommends
0.1 for identical strains and 0.4 for humans. Not sure if 0.1637 is okay for bovine.
> d.filt.sv <- estimateGLMTrendedDisp(d.filt, mod.sv)
> d.filt.sv <- estimateGLMTagwiseDisp(d.filt, mod.sv)
> plotBCVd.filt.sv)
> fit.edgeR.sv <- glmFitd.filt.sv, mod.sv)
> eR.QQ.FvsQQ.M.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 1])
> eR.qq.Fvsqq.M.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 2])
> eR.QQ.Fvsqq.F.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 3])
> eR.QQ.Mvsqq.M.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 4])
> eR.Interact.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 5])
# I am not sure on how to check on sv significance from here or if what I did above is right. Should I have the svs in the model another way? Any insight are welcome! Thank you again
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] sva_3.14.0 genefilter_1.50.0 mgcv_1.8-10 nlme_3.1-122 rtracklayer_1.28.10
[6] GenomicRanges_1.20.8 GenomeInfoDb_1.4.3 IRanges_2.2.9 S4Vectors_0.6.6 rgl_0.95.1429
[11] edgeR_3.10.5 limma_3.24.15 affycoretools_1.40.5 affy_1.46.1 Biobase_2.28.0
[16] BiocGenerics_0.14.0 WGCNA_1.48 RSQLite_1.0.0 DBI_0.3.1 fastcluster_1.1.16
[21] dynamicTreeCut_1.62
loaded via a namespace (and not attached):
[1] Category_2.34.2 bitops_1.0-6 matrixStats_0.50.1 doParallel_1.0.10
[5] RColorBrewer_1.1-2 gcrma_2.40.0 tools_3.2.2 affyio_1.36.0
[9] KernSmooth_2.23-15 rpart_4.1-10 Hmisc_3.17-1 colorspace_1.2-6
[13] nnet_7.3-11 gridExtra_2.0.0 GGally_0.5.0 DESeq2_1.8.2
[17] bit_1.1-12 preprocessCore_1.30.0 graph_1.46.0 ggbio_1.16.1
[21] caTools_1.17.1 scales_0.3.0 RBGL_1.44.0 stringr_1.0.0
[25] Rsamtools_1.20.5 foreign_0.8-66 R.utils_2.2.0 AnnotationForge_1.10.1
[29] XVector_0.8.0 dichromat_2.0-0 BSgenome_1.36.3 PFAM.db_3.1.2
[33] impute_1.42.0 BiocInstaller_1.18.5 GOstats_2.34.0 hwriter_1.3.2
[37] gtools_3.5.0 BiocParallel_1.2.22 acepack_1.3-3.3 R.oo_1.19.0
[41] VariantAnnotation_1.14.13 RCurl_1.95-4.7 magrittr_1.5 GO.db_3.1.2
[45] Formula_1.2-1 oligoClasses_1.30.0 futile.logger_1.4.1 Matrix_1.2-3
[49] Rcpp_0.12.2 munsell_0.4.2 R.methodsS3_1.7.0 stringi_1.0-1
[53] zlibbioc_1.14.0 gplots_2.17.0 plyr_1.8.3 grid_3.2.2
[57] gdata_2.17.0 ReportingTools_2.8.0 lattice_0.20-33 Biostrings_2.36.4
[61] splines_3.2.2 GenomicFeatures_1.20.6 annotate_1.46.1 locfit_1.5-9.1
[65] knitr_1.11 geneplotter_1.46.0 reshape2_1.4.1 codetools_0.2-14
[69] biomaRt_2.24.1 futile.options_1.0.0 XML_3.98-1.3 RcppArmadillo_0.6.400.2.2
[73] biovizBase_1.16.0 latticeExtra_0.6-26 lambda.r_1.1.7 foreach_1.4.3
[77] gtable_0.1.2 reshape_0.8.5 ggplot2_2.0.0 xtable_1.8-0
[81] ff_2.2-13 survival_2.38-3 OrganismDbi_1.10.0 iterators_1.0.8
[85] GenomicAlignments_1.4.2 AnnotationDbi_1.30.1 cluster_2.0.3 GSEABase_1.30.2
Would it make more sense to apply the svaseq command to normalized cpm data rather than the raw counts?
In our workflow, we suggest running svaseq on normalized counts, as we estimate size factors on raw counts. If raw counts are provided to svaseq, I think something correlated to sequencing depth should be among the first surrogate variables (I think Jeff confirmed this sometime).
Yes this is what I've observed previously, for example in these analyses:
https://github.com/jtleek/svaseq
That if you don't normalize first the first sv is usually representing library size. This is a potential way to normalize but currently unexplored, so I'd stick with normalize, then svaseq like Mike suggests
Thank you all for your quick responses.
So just to clarify and check if I am doing this right. I can normalize my data (using edgeR in my case) and then provide normalized cpm data to svaseq, right?
I am including my codes below, not sure if I am doing this right. If someone could give me some insight that would be nice. My data include 24 samples (12 of each genotype - QQ and qq - and each sex - F and M, having 6 biological replicates for each group - QQ.F, QQ.M, qq.F an qq.M). Not sure if I included the estimated surrogate variables in the model properly.
Also, I tried using the normal sva function instead of svaseq and it gave me very similar results. I heard however that the svaseq function does not really account for TMM normalization factors, which would be important for my analysis. Could someone clarify if this is true? Thank you!
My codes:
## QC, FILTERING AND NORMALIZATION USING EDGER ##
> d <- readDGE(targets, labels = targets$Label, comment.char = "#", header = TRUE,
columns = c(1,7))
> d <- calcNormFactors(d)
> cpm.values <- cpm(d$counts)
> above1cpm <- rowSums(cpm.values >= 1)
> d.filt <- d[above1cpm >= 6, , keep.lib.sizes = FALSE]
> dim(d.filt)
# [1] 11679 24
> d.filt <- calcNormFactors(d.filt)
> cpm.values.filt < cpm(d.filt$counts)
> head(cpm.values.filt)
(...)
Samples
Tags QQ.M.3 QQ.M.4 QQ.M.5 QQ.M.6
ENSBTAG00000020035 99.33248 182.23360 168.80417 94.63499
ENSBTAG00000011528 48.37818 55.00343 53.30658 62.32472
ENSBTAG00000012594 17.57829 32.66077 20.34709 18.87298
ENSBTAG00000018278 242.30761 244.77718 230.29837 241.66867
ENSBTAG00000021997 13.03218 25.75558 13.58795 19.61574
ENSBTAG00000008490 17.31310 22.38235 19.68511 23.73471
> logCPM <- cpm(d.filt, log = T)
> head(logCPM)
# (...)
# Samples
# Tags QQ.M.5 QQ.M.6
# ENSBTAG00000020035 7.526067 6.557956
# ENSBTAG00000011528 5.863254 5.955458
# ENSBTAG00000012594 4.474122 4.232455
# ENSBTAG00000018278 7.974202 7.910458
# ENSBTAG00000021997 3.891917 4.288119
# ENSBTAG00000008490 4.426424 4.562991
## SVA ##
> Group <- relevel(d.filt$samples$group, ref = "qq.M")
> mod <- model.matrix(~Group)
> colnames(mod)[-1] <- paste0(levels(Group)[-1], "vs", levels(Group)[1])
> head(mod)
# (Intercept) qq.Fvsqq.M QQ.Fvsqq.M QQ.Mvsqq.M
# 1 1 1 0 0
# 2 1 1 0 0
# 3 1 1 0 0
# 4 1 1 0 0
# 5 1 1 0 0
# 6 1 1 0 0
> cont.mod <- makeContrasts(QQ.FvsQQ.M = QQ.Fvsqq.M - QQ.Mvsqq.M,
qq.Fvsqq.M = qq.Fvsqq.M,
QQ.Fvsqq.F = QQ.Fvsqq.M - qq.Fvsqq.M,
QQ.Mvsqq.M = QQ.Mvsqq.M,
Interact = (QQ.Fvsqq.M - QQ.Mvsqq.M) - (qq.Fvsqq.M),
levels = mod)
> cont.mod
# Contrasts
# Levels QQ.FvsQQ.M qq.Fvsqq.M QQ.Fvsqq.F QQ.Mvsqq.M Interact
# Intercept 0 0 0 0 0
# qq.Fvsqq.M 0 1 -1 0 -1
# QQ.Fvsqq.M 1 0 1 0 1
# QQ.Mvsqq.M -1 0 0 1 -1
> svseq = svaseq(cpm.values.filt, mod, mod[,1])
# Number of significant surrogate variables is: 5
# Iteration (out of 5 ):1 2 3 4 5
> head(svseq$sv)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.05741827 -0.06602954 0.20009240 -0.02684960 0.23766642
[2,] 0.26314274 -0.66218564 0.19546081 0.38882480 -0.21056924
[3,] -0.06735629 0.01038846 0.08790005 0.02410775 0.34376335
[4,] 0.10120300 0.10301322 0.24170332 0.15549638 0.06207581
[5,] -0.17679364 -0.09238517 -0.03882105 0.15047943 0.23952769
[6,] 0.07944103 -0.01603642 -0.26944731 -0.16374470 -0.23344724
> mod.sv <- cbind(mod, svseq$sv)
> colnamesmod.sv)[5:9] <- paste0("sv", 1:5)
# (Intercept) qq.Fvsqq. M QQ.Fvsqq.M QQ.Mvsqq.M sv1 sv2 sv3 sv4 sv5
# 1 1 1 0 0 0.05741827 -0.06602954 0.20009240 -0.02684960 0.23766642
# 2 1 1 0 0 0.26314274 -0.66218564 0.19546081 0.38882480 -0.21056924
# 3 1 1 0 0 -0.06735629 0.01038846 0.08790005 0.02410775 0.34376335
# 4 1 1 0 0 0.10120300 0.10301322 0.24170332 0.15549638 0.06207581
# 5 1 1 0 0 -0.17679364 -0.09238517 -0.03882105 0.15047943 0.23952769
# 6 1 1 0 0 0.07944103 -0.01603642 -0.26944731 -0.16374470 -0.23344724
> cont.mod.sv <- makeContrasts(QQ.FvsQQ.M = QQ.Fvsqq.M - QQ.Mvsqq.M,
qq.Fvsqq.M = qq.Fvsqq.M,
QQ.Fvsqq.F = QQ.Fvsqq.M - qq.Fvsqq.M,
QQ.Mvsqq.M = QQ.Mvsqq.M,
Interact = (QQ.Fvsqq.M - QQ.Mvsqq.M) - (qq.Fvsqq.M),
levels = mod.sv)
> cont.mod.sv
# Contrasts
# Levels QQ.FvsQQ.M qq.Fvsqq.M QQ.Fvsqq.F QQ.Mvsqq.M Interact
# Intercept 0 0 0 0 0
# qq.Fvsqq.M 0 1 -1 0 -1
# QQ.Fvsqq.M 1 0 1 0 1
# QQ.Mvsqq.M -1 0 0 1 -1
# sv1 0 0 0 0 0
# sv2 0 0 0 0 0
# sv3 0 0 0 0 0
# sv4 0 0 0 0 0
# sv5 0 0 0 0 0
> d.filt.sv <- estimateGLMCommonDisp(d.filt, mod.sv, verbose = TRUE)
# Disp = 0.0268 , BCV = 0.1637 ###not sure if the BCV value is good. I am working on bovine. edgeR recommends
0.1 for identical strains and 0.4 for humans. Not sure if 0.1637 is okay for bovine.
> d.filt.sv <- estimateGLMTrendedDisp(d.filt, mod.sv)
> d.filt.sv <- estimateGLMTagwiseDisp(d.filt, mod.sv)
> plotBCVd.filt.sv)
> fit.edgeR.sv <- glmFitd.filt.sv, mod.sv)
> eR.QQ.FvsQQ.M.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 1])
> eR.qq.Fvsqq.M.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 2])
> eR.QQ.Fvsqq.F.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 3])
> eR.QQ.Mvsqq.M.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 4])
> eR.Interact.sv <- glmLRTfit.edgeR.sv, contrast = cont.mod.sv[ , 5])
# I am not sure on how to check on sv significance from here or if what I did above is right. Should I have the svs in the model another way? Any insight are welcome! Thank you again
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] sva_3.14.0 genefilter_1.50.0 mgcv_1.8-10 nlme_3.1-122 rtracklayer_1.28.10
[6] GenomicRanges_1.20.8 GenomeInfoDb_1.4.3 IRanges_2.2.9 S4Vectors_0.6.6 rgl_0.95.1429
[11] edgeR_3.10.5 limma_3.24.15 affycoretools_1.40.5 affy_1.46.1 Biobase_2.28.0
[16] BiocGenerics_0.14.0 WGCNA_1.48 RSQLite_1.0.0 DBI_0.3.1 fastcluster_1.1.16
[21] dynamicTreeCut_1.62
loaded via a namespace (and not attached):
[1] Category_2.34.2 bitops_1.0-6 matrixStats_0.50.1 doParallel_1.0.10
[5] RColorBrewer_1.1-2 gcrma_2.40.0 tools_3.2.2 affyio_1.36.0
[9] KernSmooth_2.23-15 rpart_4.1-10 Hmisc_3.17-1 colorspace_1.2-6
[13] nnet_7.3-11 gridExtra_2.0.0 GGally_0.5.0 DESeq2_1.8.2
[17] bit_1.1-12 preprocessCore_1.30.0 graph_1.46.0 ggbio_1.16.1
[21] caTools_1.17.1 scales_0.3.0 RBGL_1.44.0 stringr_1.0.0
[25] Rsamtools_1.20.5 foreign_0.8-66 R.utils_2.2.0 AnnotationForge_1.10.1
[29] XVector_0.8.0 dichromat_2.0-0 BSgenome_1.36.3 PFAM.db_3.1.2
[33] impute_1.42.0 BiocInstaller_1.18.5 GOstats_2.34.0 hwriter_1.3.2
[37] gtools_3.5.0 BiocParallel_1.2.22 acepack_1.3-3.3 R.oo_1.19.0
[41] VariantAnnotation_1.14.13 RCurl_1.95-4.7 magrittr_1.5 GO.db_3.1.2
[45] Formula_1.2-1 oligoClasses_1.30.0 futile.logger_1.4.1 Matrix_1.2-3
[49] Rcpp_0.12.2 munsell_0.4.2 R.methodsS3_1.7.0 stringi_1.0-1
[53] zlibbioc_1.14.0 gplots_2.17.0 plyr_1.8.3 grid_3.2.2
[57] gdata_2.17.0 ReportingTools_2.8.0 lattice_0.20-33 Biostrings_2.36.4
[61] splines_3.2.2 GenomicFeatures_1.20.6 annotate_1.46.1 locfit_1.5-9.1
[65] knitr_1.11 geneplotter_1.46.0 reshape2_1.4.1 codetools_0.2-14
[69] biomaRt_2.24.1 futile.options_1.0.0 XML_3.98-1.3 RcppArmadillo_0.6.400.2.2
[73] biovizBase_1.16.0 latticeExtra_0.6-26 lambda.r_1.1.7 foreach_1.4.3
[77] gtable_0.1.2 reshape_0.8.5 ggplot2_2.0.0 xtable_1.8-0
[81] ff_2.2-13 survival_2.38-3 OrganismDbi_1.10.0 iterators_1.0.8
[85] GenomicAlignments_1.4.2 AnnotationDbi_1.30.1 cluster_2.0.3 GSEABase_1.30.2
Point of clarification: when you say use normalized values in svaseq, are you meaning DESeq2's normalized counts but not edgeR's cpm() function? DESeq2's normalized counts are not integers, but they are unlogged and on the same scale as the original counts, where as normalization via cpm results in counts per million values, which could be logged or unlogged. We use edgeR in our pipelines, so I'm trying to figure out how to integrate svaseq as opposed to regular sva on voom or logCPM values. Would it be best to get unlogged CPM values, then transform to normalized counts using the median library size? Or would using unlogged CPM values be fine?
(I'm helping frodrgs2 here locally, but I've also been using this on other data)
Thanks,
Jenny