Vinicius Henrique da Silva
I would like to concatenate individual genomic intervals into common regions.
My input:
dfin <- "chr start end sample type
1 10 20 NE1 loss
1 5 15 NE2 gain
1 25 30 NE1 gain
2 40 50 NE1 loss
2 40 60 NE2 loss
3 20 30 NE1 gain"
dfin <- read.table(text=dfin, header=T)
My expected output:
dfout <- "chr start end samples type
1 5 20 NE1-NE2 both
1 25 30 NE1 gain
2 40 60 NE1-NE2 loss
3 20 30 NE1 gain"
dfout <- read.table(text=dfout, header=T)
The intervals in dfin never overlap within the same animal, only between animals (column sample in dfin, samples in dfout). The type column has two levels in dfin (loss and gain) and is expected to have three in dfout: loss, gain, and both, where both marks a merged region that was built from both loss and gain intervals.
Any ideas on how to do this?
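For reference, the merge described above could be sketched in base R roughly as follows. This is only one possible approach, not a definitive solution. The function name merge_regions is a hypothetical helper (it does not come from any package), and the sketch assumes that intervals which merely touch (start equal to a previous end) count as overlapping.

```r
# Input from the question
dfin <- read.table(text = "chr start end sample type
1 10 20 NE1 loss
1 5 15 NE2 gain
1 25 30 NE1 gain
2 40 50 NE1 loss
2 40 60 NE2 loss
3 20 30 NE1 gain", header = TRUE, stringsAsFactors = FALSE)

# merge_regions() is a hypothetical helper name, not part of any package
merge_regions <- function(df) {
  # Sort so that overlapping intervals on a chromosome become adjacent rows
  df <- df[order(df$chr, df$start, df$end), ]

  # Running maximum of the end coordinate within each chromosome
  run_end <- ave(df$end, df$chr, FUN = cummax)
  prev_end <- c(NA, head(run_end, -1))

  # A new region starts on the first row, on a chromosome change, or when
  # the start lies beyond every end seen so far on that chromosome.
  # Assumption: start == previous end is treated as an overlap.
  new_chr <- c(TRUE, df$chr[-1] != df$chr[-nrow(df)])
  new_region <- new_chr | df$start > prev_end
  region <- cumsum(new_region)

  # Collapse each region into a single row
  pieces <- lapply(split(df, region), function(g) {
    types <- unique(g$type)
    data.frame(
      chr = g$chr[1],
      start = min(g$start),
      end = max(g$end),
      samples = paste(sort(unique(g$sample)), collapse = "-"),
      type = if (length(types) > 1) "both" else types,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

dfout <- merge_regions(dfin)
print(dfout)
```

On the example input this should give the four rows of the expected dfout. If adjacent but non-touching intervals should also be merged, the comparison against prev_end would need a gap allowance; that rule is not stated in the question, so it is left out here.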