The following are a few lines from a large text file. I need to extract some genes from it based on my input; could someone please write a small script? I have specified the input and output at the bottom of the post. The problem may look tough, but in short I just want to map each gene ID from one version of the annotation to its associated ID in the other version, because online sites do not have a repository for this species.
SDRB02000004.1 Genbank gene 6018 10396 . + . gene_id "TEA_012962"; transcript_id ""; gbkey "Gene"; gene_biotype "protein_coding"; locus_tag "TEA_012962";
SDRB02000004.1 Genbank transcript 6018 10396 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; gbkey "mRNA"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA";
SDRB02000004.1 Genbank exon 6018 6864 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "1";
SDRB02000232.1 Genbank stop_codon 994202 994204 . + 0 gene_id "TEA_014895"; transcript_id "gnl|WGS:SDRB|TEA016705.1"; gbkey "CDS"; locus_tag "TEA_014895"; orig_transcript_id "gnl|WGS:SDRB|TEA016705.1"; product "hypothetical protein"; protein_id "THG23623.1"; exon_number "19";
My input and desired output look like this:
Input (common gene name) Output (special gene name)
TEA_012962 TEA014503.1
TEA_014895 TEA016705.1
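Here is a minimal Python sketch of such a script. It assumes the annotation is a GTF file shaped like the lines above, and that the query IDs sit one per line in a plain text file. The script name `gene_map.py`, the query file, and the `NA` placeholder for unmatched IDs are hypothetical choices of mine. For every line with a non-empty `transcript_id`, it records the part after the last `|` (e.g. `gnl|WGS:SDRB|TEA014503.1` becomes `TEA014503.1`) under that line's `gene_id`. A gene can have more than one transcript, so the sketch keeps every distinct one and joins them with commas.

```python
import re
import sys

# Matches GTF attribute pairs such as: gene_id "TEA_012962"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def short_id(transcript_id):
    """Keep only the part after the last '|': 'gnl|WGS:SDRB|TEA014503.1' -> 'TEA014503.1'."""
    return transcript_id.rsplit("|", 1)[-1]


def build_gene_map(lines):
    """Map each gene_id to its distinct short transcript IDs, in first-seen order."""
    gene_map = {}
    for line in lines:
        attrs = dict(ATTR_RE.findall(line))
        gene_id = attrs.get("gene_id")
        tx = attrs.get("transcript_id", "")
        if not gene_id or not tx:  # skips 'gene' lines, which have transcript_id ""
            continue
        ids = gene_map.setdefault(gene_id, [])
        sid = short_id(tx)
        if sid not in ids:
            ids.append(sid)
    return gene_map


def main(argv):
    # Hypothetical usage: python gene_map.py annotation.gtf query_ids.txt
    if len(argv) != 3:
        print("usage: python gene_map.py annotation.gtf query_ids.txt", file=sys.stderr)
        return
    with open(argv[1]) as fh:
        gene_map = build_gene_map(fh)
    with open(argv[2]) as fh:
        for line in fh:
            gene_id = line.strip()
            if gene_id:
                # Print the input ID and its special ID(s), tab-separated; "NA" if not found
                print(gene_id, ",".join(gene_map.get(gene_id, ["NA"])), sep="\t")


if __name__ == "__main__":
    main(sys.argv)
```

The attribute regex runs over the whole line rather than splitting on tabs. That way it still works if the columns were pasted with spaces, as in the example above. The first eight GTF columns never contain quotes, so only attribute pairs match.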
You don't need these lines:
Instead, you can import directly. I don't think that used to work, but either way, this does work.
Thanks for helping me out, and sorry: at the time I was in a hurry, so I asked on several platforms and eventually did it manually in Excel. I am a beginner and hope you understand.