Problem in subtracting ChIP Seq experiment with the input using ChIPSeqSpike
0
0
Entering edit mode
Ainul • 0
@Ainul-24569
Last seen 3.8 years ago

Hello everyone,

So I have ChIP seq data regarding to the two key TFs in mESC namely Oct4 and Sox2. For quantitative purposes, exogenous ChIP seq data (Dm3) is also added and I want to carry out the normalization using ChIPSeqSpike package. However, subtracting the experiment and the input data using inputSubtraction function returns to error. It consistently said that the chromosomes differ between my experiment and input data. The code and error are as follow:

cs <- inputSubtraction(cs) 
Reading the input file
Subtracting input to experiment
     Processing Oct4_KO
         Reading bigWig file
Error in FUN(X[[i]], ...) : 
  Chromosomes differ between input and experiment

In addition, in the traceback it is stated that the error might occur due to difference procedure in generating the bigwig files. Thing is, all the bigwig files are generated using the exactly same way from bedgraph. the example codes are as follow.

nice -n 5 genomeCoverageBed -bg -ibam ~/15912_mm9.q1.rmM.rmdup.s.bam -g ~/mm9.sizes > ~/mm9_KO3_bg_Oct4.bedgraph
nice -n 5 genomeCoverageBed -bg -ibam ~/15914_mm9.q1.rmM.rmdup.s.bam -g ~/mm9.sizes > ~/mm9_KO3_INPUT.bedgraph

for i in "mm9_KO3_bg_Oct4" "mm9_KO3_INPUT" ; do
echo ${i};
nice -n 5 bedGraphToBigWig ~/${i}.bedgraph ~/mm9.noHeader.sizes ~/${i}.bw;
done;

So can anyone please pinpoint where did I do wrong or how do I correct this error. all input is appreciated. thank you

chipseq ChIPSeqSpike • 2.3k views
ADD COMMENT
0
Entering edit mode

Are your files in fixed steps?

ADD REPLY
0
Entering edit mode

Hello Nicolas,

I am quite new with this, could you please elaborate what do you mean by fixed steps?

ADD REPLY
0
Entering edit mode

In order to perform the input subtraction you need that your wig files are in fixed steps format, check this page: https://genome.ucsc.edu/goldenPath/help/wiggle.html

ADD REPLY
0
Entering edit mode

no I don't think so. okay I will try to fix the data first

ADD REPLY
0
Entering edit mode

so I tried to generate the fixedstep bigwig file using

samtools mpileup -BQ0 ~/${i}_mm9.q1.rmM.rmdup.s.bam | perl -pe '($c, $start, undef, $depth) = split;if ($c ne $lastC || $start != $lastStart+1) {print "fixedStep chrom=$c start=$start step=1 span=1\n";}$_ = $depth."\n";($lastC, $lastStart) = ($c, $start);' | gzip -c > ~/${i}.wig.gz;

then

wigToBigWig ~/${i}.wig.gz /scratch3/yaqin/Zfp64_GFP_Chip/Marker/mm9.noHeader.sizes ~/${i}.bw;

but it still reproduce the error. May I know if the code is wrong or how can I properly generate fixedstep bw files?

ADD REPLY
0
Entering edit mode

The way I do it is to use bigWigToWig from UCSC and then you can try this R script:


wigFolder <- "" ## Folder containing the wig file to convert
wigFile <- "" ## Wig file to convert
nameWig <- ""  ##Name to write in the future converted wig file
binning <- ##Size of bins of the future converted wig file"
refile <- ##File path containg genome information (nb of chrom, names of chrom, length of chrom)



################
# FUNCTION
################


extract_start_end_vector <- function(last_position_value, length_current_chrom, position_vec, binding_values_vec)
{
    if(last_position_value <= length_current_chrom)
    {
        if(last_position_value == length_current_chrom)
        {
            position_star_vec <- position_vec[-length(position_vec)];
            position_end_vec <- position_vec[-1];
            corrected_bindingValues <- binding_values_vec[-length(position_vec)];
        }
        else
        {
            position_star_vec <- position_vec;
            position_end_vec <- c(position_vec[-1], length_current_chrom);
            corrected_bindingValues <- binding_values_vec;
        }

    }
    else
    {
        stop("Some values are referenced after the end of the chromosome, problem in the construction of your wig file\n\n");
    }

    return(list(position_star_vec, position_end_vec, corrected_bindingValues));
}


################



##########
# MAIN
##########

#Reading the wig file
cat("Reading wig... \n")
wigFile <- readLines(paste(wigFolder, wigFile, sep=""));
count <- 1;
fixedList <- list();


#Reading the reference file

ref_file <- readLines(refile);
chrom_names <- unlist(strsplit(ref_file[3], " "));
chrom_length <- as.numeric(unlist(strsplit(ref_file[2], " ")));

cat("Retrieving the track line and variableStep lines...\n");

indexTrack <- grep("track", wigFile);
indexVariableStepLines <-  grep("variableStep", wigFile);

insert.at <- function(a, pos, values){
    dots <- values
    stopifnot(length(dots)==length(pos))
    result <- vector("list",2*length(pos))
    result[c(TRUE,FALSE)] <- split(a, cumsum(seq_along(a) %in% pos))
    result[c(FALSE,TRUE)] <- dots
    unlist(result)
}



if(length(indexTrack) == 0) #If the line "track" is absent before the line "VariableStep", it is inserted. Both indexes are computed again after this operation 
{
    cat("Adding a track line\n");
    pos <- indexVariableStepLines;
    values <- rep("track type=wiggle_0", length(pos));
    wigFile <- insert.at(wigFile, pos,values);
    wigFile <- c("track type=wiggle_0", wigFile);
    wigFile <- wigFile[-length(wigFile)];

    indexTrack <- grep("track", wigFile);
    indexVariableStepLines <-  grep("variableStep", wigFile);
}

#changing the track character to adapt it to fixed format
trackLine <- paste("track type=wiggle_0 name=\"",  nameWig, "\"", sep="");

#changing the variableStep line to fit the fixedStep format
fixedStepLines <- sub("variableStep", "fixedStep", wigFile[indexVariableStepLines]);
fixedStepLines <- paste(fixedStepLines, " start=1 step=", binning, sep="");

#Retrieving the names of chrom in wig file
chrom_names_wig <- unlist(lapply(strsplit(unlist(lapply(strsplit(fixedStepLines,"chrom="),"[[",2))," start"),"[[",1));

cat("Retrieving the values and positions and transforming into fixed step...\n");

##Create interval of values
indexStartInterval <- indexVariableStepLines[1:(length(indexVariableStepLines)-1)] +1;
indexEndInterval <- indexTrack[2:length(indexTrack)] - 1;

# Adding the last interval
indexStartInterval <- c(indexStartInterval, indexVariableStepLines[length(indexVariableStepLines)]+1);
indexEndInterval <- c(indexEndInterval, length(wigFile));

matrixInterval <- cbind(indexStartInterval, indexEndInterval);

##Retrieve values in these intervals ie the values between the variableStep lines

#valuesList <- list()
valuesList <- apply(matrixInterval, MARGIN=1, function(x){ return(wigFile[x[1]:x[2]]);});

values_binned <- list();

##changing the values in variable steps into fixed steps

for(i in 1:length(valuesList)) 
{
    #split each element of x into a list. each element of the new list contain an index and a value
    current_chrom_name <- chrom_names_wig[i];
    cat("Treating chromosome: ", current_chrom_name, "\n");
    splittedIndexAndValue <- strsplit(valuesList[[i]], "\t");
    position_vec <- as.numeric(unlist(lapply(splittedIndexAndValue, "[[", 1)));
    binding_values_vec <- as.numeric(unlist(lapply(splittedIndexAndValue, "[[", 2)));
    index_ref_chrom_name <- which(chrom_names == current_chrom_name);
    length_current_chrom <- chrom_length[index_ref_chrom_name];
    last_position_value <- position_vec[length(position_vec)];

    if(position_vec[1] > 1)
    {
        position_vec <- c(1, position_vec);
        binding_values_vec <- c(0, binding_values_vec);
    }
    else if(position_vec[1] == 0)
    {
        stop("\n The first position is equal to zero, pb in the wig file\n");   
    }


    result <- extract_start_end_vector(last_position_value, length_current_chrom, position_vec, binding_values_vec);
    position_star_vec <- result[[1]];
    position_end_vec <- result[[2]];
    binding_values_vec <- result[[3]];

    length_vec <- position_end_vec - position_star_vec;

    if(length(length_vec) != length(binding_values_vec))
    {
        stop("\n The number of intervals should be equal to the number of values, pb in script, contact the developper\n");
    }

    valuesRepeated_vector <- unlist(mapply(function(x,y){return(rep(x,y));}, binding_values_vec, length_vec));

    #performing the bining of the values
    steps <- seq(1,length(valuesRepeated_vector),binning);

    values_binned[[i]] <- sapply(steps, function(step,orig)
            {
                nextStep <- step+binning-1;
                if(nextStep > length(orig))
                {
                    nextStep <- length(orig)
                }

                return(mean(orig[step:nextStep]))
            },valuesRepeated_vector)

}


## Computing the list to write in the output file
fixedVector <- vector();

if(length(values_binned) != length(fixedStepLines))
{
    stop("\n The number of blocks of values should be equal to the number of lines containing the key word 'variableStep', problem in your wig file\n");
}

cat("Organizing all information before writting the file...\n");

for(i in 1:length(values_binned))
{
    fixedVector <- append(fixedVector, trackLine);
    fixedVector <- append(fixedVector, fixedStepLines[i]);
    fixedVector <- append(fixedVector, unlist(values_binned[[i]]));
}


outputFile <- paste(wigFolder, "WIGfs_", nameWig, "_bin", binning, ".wig", sep="")

cat("Writting wig... \n");
write(fixedVector, file= outputFile, ncolumns=1);
ADD REPLY
0
Entering edit mode

Thanks Nicolas, I will try it first

ADD REPLY
0
Entering edit mode

Hi Nicolas, anyway for

refile <- ##File path containg genome information (nb of chrom, names of chrom, length of chrom)

can I use the chrome sizes file for that? because when I run it with my chrom sizes file as input, the chrom_length return to NA

ADD REPLY
0
Entering edit mode

what organism are you working with?

ADD REPLY
0
Entering edit mode

Hello, sorry for the late reply since I also working on another datasets

I am working with mouse genome and use mm9 to align my sequences data. so should it be the chromosome length of the mm9?

ADD REPLY
0
Entering edit mode

Keep the two first columns of the chromInfo table from ucsc: https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=1020948187_TTkypQ5F8q84nAyf551tPR3wiXs9

ADD REPLY

Login before adding your answer.

Traffic: 751 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6