Hi,
I am entirely new to bioinformatics, so I apologize if this is a naive question. I would like to count the number of genes per chromosome in the human genome. For this purpose I downloaded the latest release of the 'gff' file from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/latest_assembly_versions/GCF_000001405.34_GRCh38.p8/ and sorted it using IGVtools from the Integrative Genomics Viewer (IGV). IGV also allows exporting the features as a 'bed' file, although there is a single-base difference between the start position of a gene in the 'gff' file and in the 'bed' file generated this way. While scanning through the 'bed' file I noticed many repeated and overlapping entries. My questions are:
1) Is there any available tool for extracting non-overlapping genes from the bed file?
2) Is there any way to automate the selection of only one of the several overlapping genes?
3) Is the 'bed' file converted using the 'Export Features' tool in IGV reliable enough for further processing? What are the preferred alternatives?
I am sure such a basic topic has already been discussed many times on this forum. I would appreciate it if you could direct me to some of those discussions. Thank you, and apologies once again.
Your questions are hard to interpret as written.
1) Is there any available tool for extracting non-overlapping genes from the bed file?
What do you mean by 'non-overlapping genes'? There are any number of genes that overlap; they may be on different strands, or even on the same strand. Do you really want to remove genes for this arbitrary reason? Or are you confusing transcripts and genes? Do you instead want a single gene that represents all possible transcripts?
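If the goal is the last option (one span per gene covering all of its transcripts), the idea can be sketched roughly as below. This is only an illustration in plain Python, not a Bioconductor method: the function name `collapse_transcripts` and the tuple layout `(gene_id, chrom, start, end, strand)` are assumptions made up for the example, and the coordinates are toy values.

```python
def collapse_transcripts(transcripts):
    """Return one (chrom, start, end, strand) span per gene_id.

    `transcripts` is an iterable of (gene_id, chrom, start, end, strand)
    tuples. This record layout is hypothetical, chosen for illustration.
    """
    spans = {}
    for gene_id, chrom, start, end, strand in transcripts:
        if gene_id not in spans:
            # First transcript seen for this gene: start from its span.
            spans[gene_id] = (chrom, start, end, strand)
        else:
            # Widen the gene span so it covers this transcript too.
            c, s, e, st = spans[gene_id]
            spans[gene_id] = (c, min(s, start), max(e, end), st)
    return spans


# Toy input: two transcripts of geneA, one transcript of geneB.
example = [
    ("geneA", "chr1", 100, 500, "+"),
    ("geneA", "chr1", 150, 800, "+"),
    ("geneB", "chr1", 700, 900, "-"),
]
gene_spans = collapse_transcripts(example)
```

Note that geneA and geneB still overlap after collapsing (on opposite strands), which is exactly the situation where "remove overlapping genes" needs a more precise definition.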
2) Is there any way to automate the selection of only one of the several overlapping genes?
Probably, but it depends on what you are after.
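As one possible reading of "select only one", the sketch below groups genes into clusters of mutually chained overlaps and keeps the longest gene in each cluster. The selection rule (longest wins), the function name `pick_longest_per_cluster`, and the tuple layout are all assumptions for illustration; a different rule may suit your purpose better. Coordinates are treated as 1-based and inclusive, as in GFF.

```python
from itertools import groupby


def pick_longest_per_cluster(genes, same_strand_only=True):
    """Keep the longest gene from each cluster of overlapping genes.

    `genes` is a list of (name, chrom, start, end, strand) tuples with
    1-based inclusive coordinates (hypothetical layout for this sketch).
    If same_strand_only is True, genes on opposite strands never cluster.
    """
    def group_key(g):
        return (g[1], g[4]) if same_strand_only else (g[1],)

    def length(g):
        return g[3] - g[2] + 1

    kept = []
    ordered = sorted(genes, key=lambda g: (group_key(g), g[2], g[3]))
    for _, group in groupby(ordered, key=group_key):
        cluster, cluster_end = [], None
        for g in group:
            # A gene starting past the current cluster end opens a new cluster.
            if cluster and g[2] > cluster_end:
                kept.append(max(cluster, key=length))
                cluster, cluster_end = [], None
            cluster.append(g)
            cluster_end = g[3] if cluster_end is None else max(cluster_end, g[3])
        if cluster:
            kept.append(max(cluster, key=length))
    return kept


# Toy input: A and B overlap on '+', C overlaps both but is on '-'.
genes = [
    ("A", "chr1", 100, 500, "+"),
    ("B", "chr1", 400, 1200, "+"),
    ("C", "chr1", 450, 600, "-"),
    ("D", "chr1", 2000, 2100, "+"),
]
stranded = [g[0] for g in pick_longest_per_cluster(genes)]
unstranded = [g[0] for g in pick_longest_per_cluster(genes, same_strand_only=False)]
```

Whether strand should matter is a biological decision rather than a technical one, which is why it is exposed as a parameter here.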
3) Is the 'bed' file converted using the 'Export Features' tool in IGV reliable enough for further processing? What are the preferred alternatives?
This is a Bioconductor support site, so questions about IGV aren't really on topic. However, you probably don't need to download a GFF file from NCBI to do what you want, as Bioconductor has many resources that provide the genomic locations of human genes.
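That said, if you do want to work from the GFF file directly, counting genes per chromosome does not require a BED conversion at all. The sketch below is plain Python rather than Bioconductor. It assumes the standard nine tab-separated GFF3 columns and that gene records carry the type `gene` in column 3. The function names and the sample lines are invented for illustration. In the NCBI file, column 1 holds RefSeq sequence accessions rather than names like `chr1`, and other feature types (for example pseudogenes) may need to be included depending on what you count as a gene. The helper at the end shows the usual reason for the single-base difference you noticed: GFF starts are 1-based and inclusive, while BED starts are 0-based.

```python
from collections import Counter


def count_genes_per_seqid(gff_lines, feature_types=("gene",)):
    """Count records of the given types per sequence ID (GFF3 column 1)."""
    counts = Counter()
    for line in gff_lines:
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, feature_type = fields[0], fields[2]
        if feature_type in feature_types:
            counts[seqid] += 1
    return counts


def gff_to_bed_interval(start, end):
    """Convert a 1-based inclusive GFF interval to a 0-based half-open BED one."""
    return start - 1, end


# Toy lines, invented for this example (not taken from the NCBI file).
sample = [
    "##gff-version 3\n",
    "chr1\ttoy\tgene\t100\t500\t.\t+\t.\tID=gene0\n",
    "chr1\ttoy\ttranscript\t100\t500\t.\t+\t.\tID=rna0;Parent=gene0\n",
    "chr1\ttoy\tgene\t400\t1200\t.\t-\t.\tID=gene1\n",
    "chr2\ttoy\tgene\t50\t900\t.\t-\t.\tID=gene2\n",
]
counts = count_genes_per_seqid(sample)
bed_interval = gff_to_bed_interval(100, 500)
```

For a real file you could pass an open text handle instead of a list, for example one obtained from `gzip.open(path, "rt")` if the download is gzip-compressed (`path` here stands for wherever you saved the file).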