Question

Minfi error in reading Basename column

parap • 0

Hello all,

I am new to methylation data analysis and am having a problem importing data with the minfi package.

After running targets <- read.450k.sheet(baseDir, pattern = "csv$"), which reads the sample sheet (it has the basic columns: Sample_name, Slide, etc.), I get an error involving a Basename column, which is not present in my sample sheet.

The sample sheets of my other datasets did not have a "Basename" column either and ran fine, but when importing this dataset the Basename column comes back with missing paths:

Basename Array Slide
1 E:/10003885002/10003885002_R01C01 R01C01 10003885002
2 character(0) R02C01 10003885002

Because of these character(0) entries, creating the RgSet also fails with an error:

The following specified files do not exist:character(0)_Grn.idat

Can anyone please tell me why I am getting character(0) under Basename?

Can I manually add a Basename column, with the path for each sample, to my sample sheet? Is that required? I have not been able to find any information on this.

Please help!

methylation minfi • 10k views

ADD COMMENT • link updated 8.5 years ago by SplittingInfinity ▴ 60 • written 9.7 years ago by parap • 0

ankita.chatterjee88 • 0

Thanks, James, for your reply. I am also stuck at this point. I ran 4 chips in 2 sets. Reading the chip data for the first set works fine, but when I try to read data from all four chips, the Basename shows "character(0)".

As you suggested, I checked all the IDAT files individually and not a single one was missing. How should I proceed? Should I prepare a targets file in .txt format and read it into R?
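On the idea of preparing the targets by hand: the sketch below builds a Basename column directly in base R and confirms that both channel files exist for each sample. The folder layout (base_dir/Slide/Slide_Array_Grn.idat), the sample names, and the barcode are assumptions made up for illustration, not taken from anyone's data in this thread.

```r
# Hypothetical layout: <base_dir>/<Slide>/<Slide>_<Array>_{Grn,Red}.idat
base_dir <- file.path(tempdir(), "manual_basename_demo")
slide <- "10003885002"
dir.create(file.path(base_dir, slide), recursive = TRUE, showWarnings = FALSE)
for (pos in c("R01C01", "R02C01")) {
  # Create empty placeholder files for both colour channels.
  file.create(file.path(base_dir, slide,
                        sprintf("%s_%s_%s.idat", slide, pos, c("Grn", "Red"))))
}

targets <- data.frame(Sample_Name = c("s1", "s2"),
                      Slide = slide,
                      Array = c("R01C01", "R02C01"),
                      stringsAsFactors = FALSE)

# Basename is the IDAT path without the "_Grn.idat" / "_Red.idat" suffix.
targets$Basename <- file.path(base_dir, targets$Slide,
                              paste(targets$Slide, targets$Array, sep = "_"))

# Confirm that both channels exist for every sample before loading.
ok <- file.exists(paste0(targets$Basename, "_Grn.idat")) &
      file.exists(paste0(targets$Basename, "_Red.idat"))
print(targets[!ok, ])

# With minfi installed and real IDAT files, the data could then be loaded with:
# rgSet <- minfi::read.metharray.exp(targets = targets)
```

Any row printed by the last print() call points at a sample whose files are not where the constructed path says they should be.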

ADD COMMENT • link 9.4 years ago ankita.chatterjee88 • 0

Hi ankita - I am running into the same error, and I am pretty sure I have all the IDAT files. Were you able to solve this by bypassing Excel?

thanks!

ADD REPLY • link 9.4 years ago cristinalanata • 0

I resolved this by rechecking that the IDAT files match the sample sheet, since the Basename column is populated automatically from the IDAT files and the sample sheet. I would suggest creating a fresh sample sheet and testing it by reading in a smaller set of samples. That eventually worked for me.

ADD REPLY • link 9.4 years ago parap • 0

Shicheng Guo • 0

I eventually found the reason. Please see the following function from minfi, which is also used by ChAMP or some other packages, to read SampleSheet.csv and build the Basename column:

read.metharray.sheet()

function (base, pattern = "csv$", ignore.case = TRUE, recursive = TRUE,
    verbose = TRUE)
{
    readSheet <- function(file) {
        dataheader <- grep("^\\[DATA\\]", readLines(file), ignore.case = TRUE)
        if (length(dataheader) == 0)
            dataheader <- 0
        df <- read.csv(file, stringsAsFactor = FALSE, skip = dataheader)
        if (length(nam <- grep("Sentrix_Position", names(df),
            ignore.case = TRUE, value = TRUE)) == 1) {
            df$Array <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        if (length(nam <- grep("Array[\\._]ID", names(df), ignore.case = TRUE,
            value = TRUE)) == 1) {
            df$Array <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        if (!"Array" %in% names(df))
            warning(sprintf("Could not infer array name for file: %s",
                file))
        if (length(nam <- grep("Sentrix_ID", names(df), ignore.case = TRUE,
            value = TRUE)) == 1) {
            df$Slide <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        if (length(nam <- grep("Slide[\\._]ID", names(df), ignore.case = TRUE,
            value = TRUE)) == 1) {
            df$Slide <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        if (!"Slide" %in% names(df))
            warning(sprintf("Could not infer slide name for file: %s",
                file))
        else df[, "Slide"] <- as.character(df[, "Slide"])
        if (length(nam <- grep("Plate[\\._]ID", names(df), ignore.case = TRUE,
            value = TRUE)) == 1) {
            df$Plate <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        for (nam in c("Pool_ID", "Sample_Plate", "Sample_Well")) {
            if (nam %in% names(df)) {
                df[[nam]] <- as.character(df[[nam]])
            }
        }
        if (!is.null(df$Array)) {
            patterns <- sprintf("%s_%s_Grn.idat", df$Slide, df$Array)
            allfiles <- list.files(dirname(file), recursive = recursive,
                full.names = TRUE)
            basenames <- sapply(patterns, function(xx) grep(xx,
                allfiles, value = TRUE))
            names(basenames) <- NULL
            basenames <- sub("_Grn\\.idat", "", basenames, ignore.case = TRUE)
            df$Basename <- basenames
        }
        df
    }
    if (!all(file.exists(base)))
        stop("'base' does not exists")
    info <- file.info(base)
    if (!all(info$isdir) && !all(!info$isdir))
        stop("'base needs to be either directories or files")
    if (all(info$isdir)) {
        csvfiles <- list.files(base, recursive = recursive, pattern = pattern,
            ignore.case = ignore.case, full.names = TRUE)
        if (verbose) {
            message("[read.metharray.sheet] Found the following CSV files:\n")
            print(csvfiles)
        }
    }
    else csvfiles <- list.files(base, full.names = TRUE)
    dfs <- lapply(csvfiles, readSheet)
    namesUnion <- Reduce(union, lapply(dfs, names))
    df <- do.call(rbind, lapply(dfs, function(df) {
        newnames <- setdiff(namesUnion, names(df))
        newdf <- matrix(NA, ncol = length(newnames), nrow = nrow(df),
            dimnames = list(NULL, newnames))
        cbind(df, as.data.frame(newdf))
    }))
    df
}
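To make the mechanism concrete, here is a minimal base R sketch that repeats the pattern, list.files(), sapply()/grep() and sub() steps from readSheet() on a toy directory where one IDAT file is absent. The directory and file names are made up for illustration.

```r
# Toy directory: one IDAT present, one missing (file names are made up).
dir <- file.path(tempdir(), "basename_demo")
dir.create(dir, showWarnings = FALSE)
file.create(file.path(dir, "10003885002_R01C01_Grn.idat"))

df <- data.frame(Slide = c("10003885002", "10003885002"),
                 Array = c("R01C01", "R02C01"),
                 stringsAsFactors = FALSE)

# Same steps as in readSheet() above.
patterns <- sprintf("%s_%s_Grn.idat", df$Slide, df$Array)
allfiles <- list.files(dir, recursive = TRUE, full.names = TRUE)
matches <- sapply(patterns, function(xx) grep(xx, allfiles, value = TRUE))
names(matches) <- NULL

# One pattern matched nothing, so sapply() cannot simplify and returns a list.
print(is.list(matches))

# sub() coerces the list with as.character(), which turns the empty match
# into the literal string "character(0)".
basenames <- sub("_Grn\\.idat", "", matches, ignore.case = TRUE)
print(basename(basenames))
```

So a "character(0)" entry in Basename means that grep() found no file under the sample sheet's folder whose name matches Slide_Array_Grn.idat for that row.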

ADD COMMENT • link 8.6 years ago Shicheng Guo • 0

SplittingInfinity ▴ 60

read.450k() throws the character(0)_Grn.idat error because it couldn't find the file specified in the spreadsheet.

One of the most common causes is the sample sheet format. Look for trailing spaces or illegal characters in your CSV file.
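A quick way to look for stray whitespace is sketched below in base R. The helper trim_sheet() and the toy sample sheet are hypothetical, written only to illustrate the check.

```r
# Hypothetical helper: trim leading/trailing whitespace from every
# character column of a sample sheet data frame.
trim_sheet <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], trimws)
  df
}

# Toy sample sheet with a trailing space and a leading space (made-up values).
sheet <- data.frame(Sentrix_ID = c("10003885002", "10003885002 "),
                    Sentrix_Position = c("R01C01", " R02C01"),
                    stringsAsFactors = FALSE)

# Cells that change when trimmed are the ones carrying stray whitespace.
has_space <- sheet$Sentrix_ID != trimws(sheet$Sentrix_ID)
print(which(has_space))

clean <- trim_sheet(sheet)
```

Because the sheet-reading function reads the CSV from disk itself, the cleaned data frame would need to be written back out (for example with write.csv(clean, path, row.names = FALSE)) or the file fixed in a text editor before reading it again.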

ADD COMMENT • link 8.5 years ago SplittingInfinity ▴ 60

I am not sure how others solved the character(0) problem for the Basename, but I have been stuck on it for some time now. I checked the CSV file and confirmed that all the IDATs are present, and I think there is no problem with either. Could you please help me fix this? A snippet of my code:

library(minfi)
baseDir <- "/home/idats"
targets <- read.metharray.sheet(baseDir)
print(targets)

Output with last few columns:

  sex status   Array  Slide     Basename
1   M Normal 7420085 R06C02 character(0)
2   M Cancer 7420085 R06C02 character(0)
3   M Normal 7420117 R06C02 character(0)
4   M Cancer 7420117 R02C02 character(0)

ADD REPLY • link 8.4 years ago Bioinformatician_R ▴ 20

I moved the CSV file into the same folder as the IDAT files, made that my working directory, and badabing! It worked. So just move the CSV file.

Also, I made my CSV file in Excel and saved it as CSV. If you open it in a text editor, you can hit return after the last item on the last row to add a carriage return and fix the "end of line" error.
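One possible explanation for why moving the CSV helps, based on the function source posted earlier in this thread: the IDAT search is done with list.files(dirname(file), recursive = TRUE), so only files in or below the folder holding the CSV are visible. The base R sketch below illustrates this with a made-up folder layout.

```r
# Made-up layout: the CSV and the IDATs sit in sibling folders.
root <- file.path(tempdir(), "csv_location_demo")
dir.create(file.path(root, "sheets"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "idats"), showWarnings = FALSE)
csv <- file.path(root, "sheets", "SampleSheet.csv")
file.create(csv)
file.create(file.path(root, "idats", "10003885002_R01C01_Grn.idat"))

# Searching from the CSV's own folder does not see the sibling folder.
near_csv <- list.files(dirname(csv), recursive = TRUE)
found_near_csv <- any(grepl("_Grn\\.idat$", near_csv))

# Searching from the common parent does.
from_root <- list.files(root, recursive = TRUE)
found_from_root <- any(grepl("_Grn\\.idat$", from_root))

print(c(near_csv = found_near_csv, from_root = found_from_root))
```

Placing the CSV in the IDAT folder, or in any folder above it, keeps the IDATs inside the search path.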

ADD REPLY • link 6.3 years ago michelle.wedemeyer • 0

score 1 · Accepted Answer · 2015-08-28

James W. MacDonald 68k

When you post a question, please use the 'Question' type, rather than 'Tutorial'. As the name might suggest, a Tutorial post is intended to provide a tutorial, rather than ask a question.

The Basename column is generated programmatically, by looking at information in your SampleSheet.csv and then inferring the file name for the corresponding Grn.idat file. In your case, the expectation is that there will be a file

E:/10003885002/10003885002_R02C01_Grn.idat

and when it isn't found, character(0) is returned. So you are missing at least one IDAT file, and you need to figure out why the raw data files are missing.
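A minimal base R sketch of this check is shown here: it builds the expected Grn.idat file names from the Slide and Array columns and reports those not found on disk. The helper missing_idats() and the toy files are hypothetical, and the exact-name comparison is a simplification of the grep-based matching that minfi performs.

```r
# Hypothetical helper: list the Grn.idat files that the sample sheet implies
# but that are not found anywhere under idat_dir.
missing_idats <- function(targets, idat_dir) {
  expected <- sprintf("%s_%s_Grn.idat", targets$Slide, targets$Array)
  present <- basename(list.files(idat_dir, recursive = TRUE))
  expected[!expected %in% present]
}

# Toy example with made-up file names: only R01C01 exists on disk.
idat_dir <- file.path(tempdir(), "missing_idat_demo")
dir.create(idat_dir, showWarnings = FALSE)
file.create(file.path(idat_dir, "10003885002_R01C01_Grn.idat"))

targets <- data.frame(Slide = "10003885002",
                      Array = c("R01C01", "R02C01"),
                      stringsAsFactors = FALSE)

not_found <- missing_idats(targets, idat_dir)
print(not_found)
```

Each name it reports corresponds to a row that would get character(0) as its Basename.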

ADD COMMENT • link 9.7 years ago James W. MacDonald 68k

Just to add my two bits in case someone comes across this error: I kept getting a similar error, and when I looked I realized that the file names were rendered incorrectly because I had originally used Excel to make my sample sheet. Excel treats the barcode as a number and automatically displays it in scientific notation. It is a barcode, not a number, so make sure to change the cell format to a number with no decimals. It works perfectly now after I saved it to CSV again.
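A small base R sketch for spotting this in an existing CSV is given below. The file contents are made up, and the check assumes that a valid barcode consists of digits only.

```r
# Made-up sample sheet in which Excel has rewritten one barcode.
csv <- tempfile(fileext = ".csv")
writeLines(c("Sample_Name,Sentrix_ID,Sentrix_Position",
             "s1,10003885002,R01C01",
             "s2,1.00039E+10,R02C01"), csv)

# Read every column as text so barcodes are not converted to numbers.
sheet <- read.csv(csv, colClasses = "character")

# Assumption: a barcode is digits only; anything else was probably reformatted.
bad <- !grepl("^[0-9]+$", sheet$Sentrix_ID)
print(sheet$Sample_Name[bad])
```

Once a barcode has been saved in scientific notation the original digits are gone from the CSV, so the fix has to happen in the spreadsheet (or by retyping the barcode) rather than in R.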

ADD REPLY • link 8.1 years ago Ahdee ▴ 60

Thanks for the prompt response James!

Sure, I will choose the right category when posting next time; thanks for the correction.

Thanks for pointing out the error. I checked the data and it seems I received an incomplete dataset, so a wrong Basename column was generated. I removed a few samples and it runs fine now.

ADD REPLY • link 9.7 years ago parap • 0

Hi James,

I am also encountering the same problem, but all the IDAT files are available and for some reason it is not recognizing the paired IDAT files. Any thoughts?

thanks!

Cristina

ADD REPLY • link 9.4 years ago cristinalanata • 0