Question

cellHTS2: importing and analysing data from a chemical compound screen


Andreia Fonseca

Dear Forum,

I am analysing data from a chemical screen (~3000 compounds spread over 96-well plates) that was performed in different cell lines, and within each cell line I am studying 3 conditions: WT, Mut1 and Mut2. No replicates are available for most of the plates. Each plate is always run across all cell lines and conditions.

So I came across cellHTS2, but I am encountering some problems.

1- The first step is importing the data. My plate list file is:

Filename Plate Replicate
SC1_D1_WT.txt 1 1
SC1_D1_1.txt 1 1
SC1_D1_7.txt 1 1
SC2_D1_WT.txt 2 1
SC2_D1_1.txt 2 1
SC2_D1_7.txt 2 1

But I am getting an error; it doesn't seem to read the first column:

x <- readPlateList("Platelist.txt", name = experimentName, path = "/home/andreia/data_HTS")
…IL7: found data in 8 x 12 (96 well) format.
Error in readPlateList("Platelist.txt", name = experimentName, path = "/home/andreia/data_HTS") :
The following rows are duplicated in the plateList table:
Plate Replicate Channel
2 1 1
3 1 1
5 2 1
6 2 1

So, judging from this message, does the plate list need a different structure from the one explained in the manual?

2- Regarding normalization: as we have positive and negative controls located at the edges and in the middle of the plate, I was thinking of using the NPI method, and then scoring with scoreReplicates. However, my estimates in Excel are different from the ones from cellHTS2. I tried using the mean and standard deviation of the normalised plate (samples only), and I even tried using the non-normalised data, but the results are still different. I also estimated the score using the mean and standard deviation including the wells classified as "other", and the values are still very different. Can someone explain this? How are the mean and standard deviation estimated for the z-score in scoreReplicates?

3- Regarding Figure 5 of the paper: how do I access the data of the controls so that I can plot their distribution?

Thanks in advance for your attention and your help.

sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] cellHTS2_2.38.0 locfit_1.5-9.1 hwriter_1.3.2
[4] vsn_3.42.3 splots_1.40.0 genefilter_1.56.0
[7] Biobase_2.34.0 BiocGenerics_0.20.0 RColorBrewer_1.1-2

loaded via a namespace (and not attached):
[1] pcaPP_1.9-61 Rcpp_0.12.7 prada_1.50.0
[4] DEoptimR_1.0-6 BiocInstaller_1.24.0 plyr_1.8.4
[7] bitops_1.0-6 tools_3.3.1 zlibbioc_1.20.0
[10] annotate_1.52.0 RSQLite_1.0.0 tibble_1.2
[13] preprocessCore_1.36.0 gtable_0.2.0 lattice_0.20-34
[16] graph_1.52.0 Matrix_1.2-7.1 Category_2.40.0
[19] DBI_0.5-1 mvtnorm_1.0-5 cluster_2.0.5
[22] S4Vectors_0.12.0 IRanges_2.8.1 stats4_3.3.1
[25] GSEABase_1.36.0 robustbase_0.92-6 rrcov_1.4-3
[28] AnnotationDbi_1.36.0 RBGL_1.50.0 XML_3.98-1.5
[31] survival_2.40-1 limma_3.30.3 ggplot2_2.2.0
[34] MASS_7.3-45 scales_0.4.1 splines_3.3.1
[37] assertthat_0.1 xtable_1.8-2 colorspace_1.3-0
[40] affy_1.52.0 RCurl_1.95-4.8 lazyeval_0.2.0
[43] munsell_0.4.3 affyio_1.44.0

With kind regards,

Andreia


score 1 · Answer 1 · 2016-11-28

Dear Andreia,

Thanks for your questions.

1) First, please check that the plate list file is tab-delimited, i.e. that fields are separated by \t rather than by spaces. This is described in the help page of the readPlateList function, and it might explain your read-in issue.
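To see why the delimiter matters, here is a minimal base-R sketch that does not call cellHTS2 itself. It writes the first rows of the plate list from the question once with spaces and once with tabs, then reads both back splitting on tabs only. The assumption is that readPlateList splits fields on tabs, as its help page describes; the temporary file names are purely illustrative.

```r
# Two versions of the same plate list: space-separated and tab-separated
space_lines <- c("Filename Plate Replicate",
                 "SC1_D1_WT.txt 1 1",
                 "SC1_D1_1.txt 1 1")
tab_lines <- gsub(" ", "\t", space_lines, fixed = TRUE)

space_file <- tempfile(fileext = ".txt")
tab_file <- tempfile(fileext = ".txt")
writeLines(space_lines, space_file)
writeLines(tab_lines, tab_file)

# Read both files back, splitting fields on tabs only
from_space <- read.table(space_file, sep = "\t", header = TRUE, as.is = TRUE)
from_tab <- read.table(tab_file, sep = "\t", header = TRUE, as.is = TRUE)

ncol(from_space)   # the whole line lands in a single column
ncol(from_tab)     # Filename, Plate and Replicate are separate columns
names(from_space)  # one mangled header instead of three column names
```

To check your own file, `grepl("\t", readLines("Platelist.txt", n = 1))` should return TRUE if the header line contains tabs.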

Second, it's unclear whether you specify a "Condition" column in your plate list file. If you don't, cellHTS2 will likely consider the plates corresponding to the extra conditions as duplicates (since, in your example above, they have the same plate number across conditions). Alternatively, you could treat each condition as a separate screen and have 3 separate cellHTS2 objects. It's up to you how you'd like to structure the data.
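As a rough illustration of the duplicate problem, the sketch below rebuilds the plate list from the question with a hypothetical "Condition" column added as the fourth column (the labels "WT", "1" and "7" are simply taken from the file name suffixes and are an assumption). It uses base R's duplicated() to show that Plate/Replicate pairs repeat across conditions unless Condition is part of the key, and then writes the table tab-delimited. This only mimics the kind of check readPlateList reports; it is not cellHTS2's own code.

```r
# Plate list from the question plus a hypothetical Condition column
plate_list <- data.frame(
  Filename  = c("SC1_D1_WT.txt", "SC1_D1_1.txt", "SC1_D1_7.txt",
                "SC2_D1_WT.txt", "SC2_D1_1.txt", "SC2_D1_7.txt"),
  Plate     = c(1, 1, 1, 2, 2, 2),
  Replicate = c(1, 1, 1, 1, 1, 1),
  Condition = c("WT", "1", "7", "WT", "1", "7"),
  stringsAsFactors = FALSE
)

# Without Condition, each Plate/Replicate pair appears three times
dup_without <- duplicated(plate_list[, c("Plate", "Replicate")])
# With Condition in the key, every row is unique
dup_with <- duplicated(plate_list[, c("Plate", "Replicate", "Condition")])

sum(dup_without)  # rows flagged as repeats
sum(dup_with)     # no repeats

# Write the table tab-delimited, without quotes or row names
out_file <- file.path(tempdir(), "Platelist.txt")
write.table(plate_list, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```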

2) Your NPI calculations may be affected by the issues in 1); I'm not sure which screening data you eventually managed to read in. Another thing that may affect your Excel-based comparison is that cellHTS2 removes empty or NA wells before calculating the means of the positive and negative controls (as it should). You can look at the cellHTS2 code in more detail with cellHTS2:::NPI to make sure you are calculating the scores in the same way.
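For a spreadsheet comparison, a per-plate NPI can be sketched as (mean(pos) - x) / (mean(pos) - mean(neg)), which is how the cellHTS2 documentation describes NPI. Treat this as an approximation to check against cellHTS2:::NPI rather than the package's implementation. The function name npi_plate, the status labels "pos", "neg" and "sample", and the numbers are all made up for illustration; the NA stands in for an empty control well and is dropped from the control means.

```r
# Hypothetical per-plate NPI: (mean(pos) - x) / (mean(pos) - mean(neg))
npi_plate <- function(values, status) {
  # NA (e.g. empty) wells are excluded from the control means
  mu_pos <- mean(values[status == "pos"], na.rm = TRUE)
  mu_neg <- mean(values[status == "neg"], na.rm = TRUE)
  (mu_pos - values) / (mu_pos - mu_neg)
}

# Toy plate: three positive controls (one NA), two negatives, two samples
values <- c(10, 12, NA, 100, 98, 55, 80)
status <- c("pos", "pos", "pos", "neg", "neg", "sample", "sample")

npi <- npi_plate(values, status)
round(npi, 3)
```

With these numbers the positive controls come out near 0 and the negative controls near 1; if Excel's AVERAGE is fed the empty well as 0 instead of skipping it, the control means and therefore every score will shift.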

For an explanation of how the z-score works in scoreReplicates, see the 'Details' section of ?scoreReplicates. In short, a robust z-score is used (median and MAD rather than mean and standard deviation).
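A minimal sketch of a robust z-score next to the classical one, assuming (per our reading of ?scoreReplicates, which should be checked) that the median and MAD are taken over the wells annotated as "sample". The helper robust_z and the values are hypothetical. Note that R's mad() already includes the 1.4826 consistency constant, so a spreadsheet MAD needs the same factor to match.

```r
# Hypothetical robust z-score: (x - median) / MAD over sample wells only
robust_z <- function(values, status) {
  s <- values[status == "sample"]
  m <- median(s, na.rm = TRUE)
  d <- mad(s, na.rm = TRUE)  # default constant = 1.4826
  (values - m) / d
}

# Made-up values: six samples (one outlier at 2.00) and two controls
values <- c(0.10, 0.20, 0.25, 0.30, 0.35, 2.00, 0.95, 0.05)
status <- c(rep("sample", 6), "neg", "pos")

z_robust <- robust_z(values, status)

# Classical z-score with mean and sd over the same sample wells
s <- values[status == "sample"]
z_classic <- (values - mean(s)) / sd(s)

round(z_robust[6], 2)
round(z_classic[6], 2)
```

For the outlier well the classical score is roughly 2.0 while the robust one is roughly 15.5, because the outlier inflates the standard deviation but barely moves the MAD. That alone is enough to make mean/SD calculations in Excel disagree with scoreReplicates.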

3) Control status is stored in fData(x), where x is the cellHTS2 object. For your own data crunching and plotting, it might be helpful to convert the data to a table and use external tools such as ggplot2 and dplyr. One way to coerce the screen information to a table is to cbind the feature data and the assay data, e.g.

y <- cbind(fData(x), value=Data(x)[, 1, 1])

Note here that the assay data is stored in a 3d array of dimensions Features (wells) x Samples (replicates) x Channels, so the above code pulls out replicate 1 and channel 1.
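The same idea can be tried without a real screen by using mock stand-ins for fData(x) and Data(x). In this sketch the column names "plate", "well" and "controlStatus" are assumptions about what fData() returns, and the values are invented; the array has the Features x Samples x Channels layout described above, so [, 1, 1] selects replicate 1, channel 1.

```r
# Mock stand-ins for fData(x) and Data(x); column names are assumptions
n_wells <- 6
feature_data <- data.frame(
  plate = rep(1, n_wells),
  well = c("A01", "A02", "A03", "A04", "A05", "A06"),
  controlStatus = c("pos", "pos", "neg", "neg", "sample", "sample"),
  stringsAsFactors = FALSE
)
# Features (wells) x Samples (replicates) x Channels
assay <- array(c(1.1, 0.9, 5.2, 4.8, 3.0, 2.5), dim = c(n_wells, 1, 1))

# Replicate 1, channel 1, joined to the well annotation
y <- cbind(feature_data, value = assay[, 1, 1])

# Per-group medians and a quick look at the control distributions
med <- tapply(y$value, y$controlStatus, median)
boxplot(value ~ controlStatus, data = y)
```

With a real object, replacing feature_data and assay by fData(x) and Data(x) should give the same kind of table, which can then be passed to ggplot2.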

Alternatively you can stick to the plots produced by writeReport, which will show the distributions of values of your controls.

score 1 · Answer 2 · 2016-11-30


Joseph Barry

Dana-Farber Cancer Institute, Boston, U…

As long as the columns 'Filename', 'Plate' and 'Replicate' are there, I don't think the column order matters. I'd probably make 'Condition' the fourth column. See the Details section of ?readPlateList for more information. 'Condition' is an optional column, which is why it's not included in the provided example.


score 0 · Answer 3 · 2016-11-30


Andreia Fonseca

Dear Joseph,

Thanks so much for your reply. The plate list file is tab-delimited, but I will check whether there is any bug related to that. Should the Condition column be the first one? It is not in the example provided.

With kind regards,

Andreia
