Hi all,
I think I may have an impossible question here, but if by any chance any of you has used the EMBL-EBI tool 'Garfield' (https://www.ebi.ac.uk/birney-srv/GARFIELD/), could you please advise? Garfield is available as a bunch of shell scripts or as an R package, but I am having an issue specifically with the shell script to create a GARFIELD annotation. Thus, I am aware maybe this is not the best forum to ask for help, but since the authors don't really provide a support channel for this pretty amazing tool, I thought maybe some of you may have come across this issue?
To reiterate, I want to create a functional category for GARFIELD based on a list of SNPs I am interested. The pipeline comes with a script 'garfield_annotate_uk10k.sh' which should do the job, but this is not working. This script is a series of shell commands (and use of a binary file for which code is not available unfortunately). This script will convert an INPUT bed annotation file that informs chr and SNP coordinates, like:
head bed
chr start end
1 833927 833927
1 1344686 1344686
1 17006932 17006932
1 20498952 20498952
1 21775805 21775805
... to OUTPUT files per chromosome, which look like:
head chr1
833528 0
833927 1
1344686 0
1344892 0
17006932 1
20498952 0
This output files contains a basal list of SNPs (based on SNP data that comes with the pipeline, on garfield-data.tar.gz), with the second column containing 1/0 to inform if the SNP is present in my input/annotation/bed file, according to the genomic coordinates for that SNP (chr + position).
There are a few errors in the published script (e.g. it's missing "mkdir -p $OUTPUT_DIR" at the beginning; there's a lost "fi" at the end of the script), so the original script runs when I correct these. But it's still producing the wrong output: it's essentially just outputting the SNPs from the "basal list of SNPs" from garfield-data, and not attributing "1"s to the SNPs that are present in my input bed file. Current output looks like this:
head chr1
1 28590 ./garfield-data/tmp.chr1.bed
1 31719 0
1 52152 0
1 52167 0
1 52185 0
1 54392 0
1 54421 0
It's possible my input bed file is not in the right format (unfortunately the authors just say that they require a bed file, but don't say how it should be formatted, so I am using the typical bed file structure (chr start end). It's also possible that there's something wrong with the binary file that comes with the script "garfield_annotate_uk10k_helper". But given the presence of typos in the script, it's possible that there's a mistake in some shell commands which are causing the wrong output to be produced?
I would really appreciate if anyone could have a look at this!
Download files:
garfield-v2.tar.gz
garfield-data.tar.gz (83Gb when uncompressed, sorry!)
Example bed file (not sure this is exactly the format the software takes as input, because authors did not specify in the manual)
Thanks a lot!
Thanks so much. I am running into similar problems where all SNPs in my annotation have 0. I think the script does contain many bugs including one with regards to an awk command on line 50. It is really confusing to me. I guess I will just try your script and it might be easier.