Sex estimation problems in [estimateCellCounts]: methylation arrays
Grace • 0
@609f1b8d

Hello, I was wondering if you could suggest how to approach the warning message below (the last few lines of the console output).

I would massively appreciate any help, thank you!


counts <- estimateCellCounts(rgSet, compositeCellType = "Blood", processMethod = "auto", probeSelect = "auto", cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran"), referencePlatform = c("IlluminaHumanMethylation450k"), returnAll = FALSE, meanPlot = FALSE, verbose = TRUE)
[estimateCellCounts] Combining user data with reference (flow sorted) data.

[estimateCellCounts] Processing user and reference data together.

[preprocessQuantile] Mapping to genome.
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.
[estimateCellCounts] Picking probes for composition estimation.

[estimateCellCounts] Estimating composition.

Warning messages:
1: In DataFrame(sampleNames = c(colnames(rgSet), colnames(referenceRGset)),  :
  'stringsAsFactors' is ignored
2: In .getSex(CN = CN, xIndex = xIndex, yIndex = yIndex, cutoff = cutoff) :
  An inconsistency was encountered while determining sex. One possibility is that only one sex is present. We recommend further checks, for example with the plotSex function.
methylationArrayAnalysis minfiData minfi MethylationArrayData MethylationArray • 1.5k views
@james-w-macdonald-5106

The very last line of the warning says

An inconsistency was encountered while determining sex. One possibility is that only one sex is present. We recommend further checks, for example with the plotSex function.

That is itself the suggestion for how you might approach the warning. What happens if you follow the recommendation? Do you only have one sex?


Sorry, I'm a student so some of these questions may seem trivial.

I am putting into the console the following code:

plotSex(rgSet, id = NULL)

and, as expected, it returns the error message below. I understand that my input is wrong, but I don't know what to pass as id: the relevant column in my sample sheet is titled 'gender', with each patient marked as 'male' or 'female'.

Error in plotSex(rgSet, id = NULL) : all(c("predictedSex", "xMed", "yMed") %in% colnames(colData(object))) is not TRUE

If you have any guidance I would be very thankful. My apologies.


I understand that id is the text used as plotting symbols for the xMed/yMed points.


There are two issues here. First, you get a warning about the sex check that is part of preprocessQuantile, and second, when you try to plot the predicted sex data, you get an error because you are missing one or more of the required items in your colData object.

The first question to ask yourself is whether you really care about this. Maybe all of your subjects are female, or maybe there is a mixture, but in the end that might not be something you need to know. If it does matter to you, the next step is to figure out what the problem is.
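As for the plotSex error: it says that predictedSex, xMed and yMed are not yet in the colData of the object. Here is a minimal sketch of one way to add them first. It assumes minfi is installed and that rgSet is the RGChannelSet you passed to estimateCellCounts. The helper name plot_predicted_sex is made up for illustration and is not part of minfi.

```r
# Hypothetical helper (not part of minfi): prepares an RGChannelSet for plotSex().
plot_predicted_sex <- function(rgSet, id = NULL) {
  # getSex()/addSex() expect a genomic object, so convert the raw data first
  gmSet <- minfi::mapToGenome(minfi::preprocessRaw(rgSet))
  # addSex() runs getSex() and stores its columns in colData(gmSet)
  gmSet <- minfi::addSex(gmSet)
  # id is only used as the text label drawn for each sample
  minfi::plotSex(gmSet, id = id)
  invisible(gmSet)
}

# Usage, assuming the sample sheet column is called 'gender':
# gmSet <- plot_predicted_sex(rgSet, id = as.character(rgSet$gender))
```

Passing the reported gender as id should make any sample whose label disagrees with its position on the plot easy to spot.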

If you do care, note that estimating the sex is a very simple process. Here is the code:

> getSex
function (object = NULL, cutoff = -2) 
{
    .isGenomicOrStop(object)
    if (is(object, "GenomicMethylSet")) 
        CN <- getCN(object)
    if (is(object, "GenomicRatioSet")) 
        CN <- getCN(object)
    xIndex <- which(seqnames(object) == "chrX")
    yIndex <- which(seqnames(object) == "chrY")
    .getSex(CN = CN, xIndex = xIndex, yIndex = yIndex, cutoff = cutoff)
}
<bytecode: 0x000001b852c01b60>
<environment: namespace:minfi>

## the .getSex function is a hidden function that you can access using the getAnywhere function

> getAnywhere(.getSex)
A single object matching '.getSex' was found
It was found in the following places
  namespace:minfi
with value

function (CN = NULL, xIndex = NULL, yIndex = NULL, cutoff = -2) 
{
    if (is.null(CN) || is.null(xIndex) || is.null(yIndex)) {
        stop("must provide CN, xIndex, and yIndex")
    }
    xMed <- colMedians(CN, rows = xIndex, na.rm = TRUE)
    yMed <- colMedians(CN, rows = yIndex, na.rm = TRUE)
    dd <- yMed - xMed
    k <- kmeans(dd, centers = c(min(dd), max(dd)))
    sex0 <- ifelse(dd < cutoff, "F", "M")
    sex0 <- .checkSex(sex0)
    sex1 <- ifelse(k$cluster == which.min(k$centers), "F", "M")
    sex1 <- .checkSex(sex1)
    if (!identical(sex0, sex1)) {
        warning("An inconsistency was encountered while determining sex. One ", 
            "possibility is that only one sex is present. We recommend ", 
            "further checks, for example with the plotSex function.")
    }
    df <- DataFrame(xMed = xMed, yMed = yMed, predictedSex = sex0)
    rownames(df) <- colnames(CN)
    df
}
<bytecode: 0x000001b852c2e540>
<environment: namespace:minfi>

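The basic idea is to get the copy number data, take each sample's median over the chrX and chrY probes, and classify the difference yMed - xMed in two ways: with a fixed cutoff (-2 by default) and with kmeans using two clusters. The warning is raised when the two classifications disagree. To see what is happening, you could do: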

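debug(minfi:::.getSex)
## getSex needs a genomic object (see .isGenomicOrStop above), so convert the RGChannelSet first
getSex(mapToGenome(preprocessRaw(rgSet)))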

## notice that you have to specify the fully qualified function name using the 
## package name followed by three colon characters plus the pre-pended dot
## to debug a function that is hidden in the namespace

Then step through the .getSex function and inspect the kmeans cluster values, as well as sex0 and sex1. But again, is this critical for your analysis, and do you have only one sex?
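To see why a single-sex data set triggers the warning, the two rules in the .getSex source above can be reproduced in base R on simulated values. This is only a rough sketch: the medians are made-up numbers rather than real array data, classify_sex is a hypothetical helper, and the .checkSex step is left out.

```r
set.seed(1)

# Hypothetical helper mirroring the cutoff rule and the kmeans rule in .getSex
classify_sex <- function(xMed, yMed, cutoff = -2) {
  dd <- yMed - xMed
  # two clusters, seeded at the smallest and largest difference
  k <- kmeans(dd, centers = c(min(dd), max(dd)))
  by_cutoff <- ifelse(dd < cutoff, "F", "M")
  by_kmeans <- unname(ifelse(k$cluster == which.min(k$centers), "F", "M"))
  list(by_cutoff = by_cutoff, by_kmeans = by_kmeans,
       consistent = identical(by_cutoff, by_kmeans))
}

# Simulated per-sample medians (invented values): low chrY signal for females
xMed_f <- rnorm(10, mean = 13.0, sd = 0.05)
yMed_f <- rnorm(10, mean = 9.5, sd = 0.05)
xMed_m <- rnorm(10, mean = 12.5, sd = 0.05)
yMed_m <- rnorm(10, mean = 12.0, sd = 0.05)

# Only females: the cutoff calls everyone "F", but kmeans still forms two groups
res_female <- classify_sex(xMed_f, yMed_f)

# Both sexes: the two rules agree
res_mixed <- classify_sex(c(xMed_f, xMed_m), c(yMed_f, yMed_m))

res_female$consistent
res_mixed$consistent
```

With only one sex present, kmeans is still forced to split the samples into two clusters, so it labels some of them "M" while the cutoff labels all of them "F". That disagreement is exactly the condition that produces the warning.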
