Trying to generate a pd file for ChAMP from idat files with no array or slide information in file names
minardsmitha ▴ 10
@minardsmitha-24162

I am trying to analyze data from GEO series GSE148000 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE148000). The IDAT file names from this project look like GSM4452048_Asthma_9_Grn.idat, GSM4452048_Asthma_9_Red.idat, GSM4452049_COPD_1_Grn.idat, GSM4452049_COPD_1_Red.idat.

I used illuminaio to get what I thought were the Array and Slide values for each file I am interested in, then ran champ.load(arraytype="450K"). However, it produced this error:

Find CSV Success
  Reading CSV File
  Your pd file contains NO Array(Sentrix_Position) information.
  Your pd file contains NO Slide(Sentrix_ID) information.
Error in champ.import(directory, arraytype = arraytype) : 
    Error Match between pd file and Green Channel IDAT file.

My Sample_Sheet.csv file is

Sample_Name,Sample_Plate,Sample_Group,Pool_ID,Project,Sample_Well,Array,Slide,Basename,filenames
COPD_1,NA,COPD_Former_Smoker,NA,Sputum,,R04C02,200770460001,GSM4452049_COPD_1,GSM4452049_COPD_1
COPD_2,NA,COPD_Former_Smoker,NA,Sputum,,R05C02,200770460001,GSM4452050_COPD_2,GSM4452050_COPD_2
COPD_3,NA,COPD_Former_Smoker,NA,Sputum,,R06C02,200770460001,GSM4452051_COPD_3,GSM4452051_COPD_3
COPD_4,NA,COPD_Former_Smoker,NA,Sputum,,R01C01,200770460005,GSM4452052_COPD_4,GSM4452052_COPD_4
COPD_5,NA,COPD_Former_Smoker,NA,Sputum,,R02C01,200770460005,GSM4452053_COPD_5,GSM4452053_COPD_5
COPD_6,NA,COPD_Former_Smoker,NA,Sputum,,R06C01,200770460005,GSM4452054_COPD_6,GSM4452054_COPD_6
COPD_7,NA,COPD_Former_Smoker,NA,Sputum,,R04C01,200770460005,GSM4452055_COPD_7,GSM4452055_COPD_7
COPD_8,NA,COPD_Former_Smoker,NA,Sputum,,R05C01,200770460005,GSM4452056_COPD_8,GSM4452056_COPD_8
COPD_9,NA,COPD_Former_Smoker,NA,Sputum,,R06C01,200770460005,GSM4452057_COPD_9,GSM4452057_COPD_9
COPD_10,NA,COPD_Former_Smoker,NA,Sputum,,R01C02,200770460005,GSM4452058_COPD_10,GSM4452058_COPD_10
Healthy_1,NA,Non-Smoker,NA,Sputum,,R02C02,200770460005,GSM4452059_Healthy_1,GSM4452059_Healthy_1
Healthy_2,NA,Non-Smoker,NA,Sputum,,R03C02,200770460005,GSM4452060_Healthy_2,GSM4452060_Healthy_2
Healthy_3,NA,Non-Smoker,NA,Sputum,,R04C02,200770460005,GSM4452061_Healthy_3,GSM4452061_Healthy_3
Healthy_4,NA,Non-Smoker,NA,Sputum,,R05C02,200770460005,GSM4452062_Healthy_4,GSM4452062_Healthy_4
Healthy_5,NA,Non-Smoker,NA,Sputum,,R06C02,200770460005,GSM4452063_Healthy_5,GSM4452063_Healthy_5
Healthy_9,NA,Non-Smoker,NA,Sputum,,R05C01,200770460006,GSM4452064_Healthy_9,GSM4452064_Healthy_9
Healthy_10,NA,Non-Smoker,NA,Sputum,,R06C01,200770460006,GSM4452065_Healthy_10,GSM4452065_Healthy_10

What is wrong with my file? Do the red and green idat files have to have the Array and slide values in the file names?

@james-w-macdonald-5106

Here's what is in the example pd file that comes with ChAMP

> read.csv(paste0(system.file("extdata",package="ChAMPdata"), "/lung_test_set.csv"), skip = 7, header = TRUE)
  Sample_Name Sample_Plate Sample_Group Pool_ID Project Sample_Well Sentrix_ID
1          C1           NA            C      NA      NA         E09 7990895118
2          C2           NA            C      NA      NA         G09 7990895118
3          C3           NA            C      NA      NA         E02 9247377086
4          C4           NA            C      NA      NA         F02 9247377086
5          T1           NA            T      NA      NA         B09 7766130112
6          T2           NA            T      NA      NA         C09 7766130112
7          T3           NA            T      NA      NA         E08 7990895118
8          T4           NA            T      NA      NA         C09 7990895118
  Sentrix_Position
1           R03C02
2           R05C02
3           R01C01
4           R02C01
5           R06C01
6           R01C02
7           R01C01

And you can see that (A) your error says you have neither Sentrix_ID nor Sentrix_Position in your pd file, and (B) the example file has both. You do have those data, in the Array and Slide columns (columns 7 and 8 of your sample sheet), so if you simply do

names(pd)[7:8] <- c("Sentrix_Position","Sentrix_ID")

It should work.
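
A minimal sketch of that renaming, assuming the sample sheet is read with read.csv before being handed to ChAMP. The inline two-row CSV and the names csv_text and pd are illustrative only; the rows are copied from the sample sheet in the question. Looking the columns up by name rather than by position avoids depending on the column order, and reading Slide as character keeps the 12-digit ID from being converted to a number.

```r
# Two rows copied from the sample sheet in the question, inlined for illustration.
csv_text <- "Sample_Name,Sample_Plate,Sample_Group,Pool_ID,Project,Sample_Well,Array,Slide,Basename,filenames
COPD_1,NA,COPD_Former_Smoker,NA,Sputum,,R04C02,200770460001,GSM4452049_COPD_1,GSM4452049_COPD_1
COPD_2,NA,COPD_Former_Smoker,NA,Sputum,,R05C02,200770460001,GSM4452050_COPD_2,GSM4452050_COPD_2"

# Read the Slide column as character so the ID stays a plain string.
pd <- read.csv(text = csv_text, stringsAsFactors = FALSE,
               colClasses = c(Slide = "character"))

# Rename by column name instead of by index.
names(pd)[names(pd) == "Array"] <- "Sentrix_Position"
names(pd)[names(pd) == "Slide"] <- "Sentrix_ID"

print(names(pd))
```

With a real sheet, the renamed data frame would still have to be written back (for example with write.csv(pd, "Sample_Sheet.csv", row.names = FALSE)) to the directory that champ.load reads, since champ.load reads the CSV itself.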
