How to automate a process on the command line for 1500 genes?
ChIP-Tease • 0
@chip-tease-8339
Last seen 8.0 years ago
Germany

Hello everybody,

 

I have a problem that I need some help with.

When I run a program on the command line for a single gene, it works fine:

program.sh genename_1 ../../../unchanged_file.txt ../../aa_bb_cc_dd_genename_1.gff genename_1.bed

 

But I need to run it about 1500 times on the command line and I don't know how to automate it.

 

I have a folder. Within this folder, there are different files with different endings. I only want to analyze the files which end with .gff.

All the .gff files are the same at the beginning aa_bb_cc_dd_ then the genename comes and finally an underscore and a number like here:

aa_bb_cc_dd_Genename_1.gff

aa_bb_cc_dd_Genename_2.gff

aa_bb_cc_dd_Genename_3.gff

aa_bb_cc_dd_otherGenename_1.gff

aa_bb_cc_dd_otherGenename_2.gff

 

There are more than 1500 combinations.

The code which does the job for one file looks like this:

 

program.sh genename_1 ../../../unchanged_file.txt ../../aa_bb_cc_dd_genename_1.gff genename_1.bed

 

Is there any way to do this for all 1500 .gff files in a few steps? I'm very sorry I cannot suggest anything myself, but I don't have much experience with the command line. I could do something like this in R, but that doesn't help much here.

 

Thanks a lot, Alex

command line automation loop • 1.9k views
Jim Hester ▴ 40
@jim-hester-7319
Last seen 4.5 years ago
United States

Note that you can do this in R as well: the system() function can call any command-line program. Remove the echo from the examples to actually call program.sh.

gffs <- list.files(pattern = "\\.gff$", full.names = TRUE)
lapply(gffs, function(file) {
  # extract the gene name between the fixed prefix and the .gff extension
  gene <- gsub(".*aa_bb_cc_dd_(.*)\\.gff$", "\\1", file)
  system(sprintf("echo program.sh %s ../../../unchanged_file.txt %s %s.bed", gene, file, gene))
})

But you can of course do a similar thing with bash:

for file in *.gff; do
  temp=${file##aa_bb_cc_dd_}
  gene=${temp%.gff}
  echo program.sh "$gene" ../../../unchanged_file.txt "$file" "$gene.bed"
done
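
The single-gene command in the question reads the .gff file from ../../, while the loop above globs the current directory. Here is a minimal sketch of a path-aware variant. It assumes the .gff files sit in a directory held in a hypothetical variable gff_dir (defaulting to ../..), and the helper name build_cmd is made up for illustration. It only prints the commands, so nothing runs until the output looks right.

```shell
#!/usr/bin/env bash
# Assumption: gff_dir is the directory holding the .gff files (../.. in the question).
gff_dir="${gff_dir:-../..}"

# Print the program.sh command line for one .gff path.
# build_cmd is a hypothetical helper name, used for illustration only.
build_cmd() {
  local path="$1"
  local base="${path##*/}"            # strip the directory part
  local gene="${base#aa_bb_cc_dd_}"   # strip the fixed prefix
  gene="${gene%.gff}"                 # strip the extension
  printf 'program.sh %s ../../../unchanged_file.txt %s %s.bed\n' "$gene" "$path" "$gene"
}

for path in "$gff_dir"/aa_bb_cc_dd_*.gff; do
  [ -e "$path" ] || continue   # skip the literal pattern if nothing matches
  build_cmd "$path"
done
```

Once the printed lines look correct, the printf line could be replaced by the real call, for example program.sh "$gene" ../../../unchanged_file.txt "$path" "$gene.bed".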

Thanks a lot, I didn't know that R can call command-line programs. This will be very useful for me.

I guess I will try both ways.

Thanks a lot again!

tangming2005 ▴ 200
@tangming2005-6754
Last seen 8 weeks ago
United States

Something like this, where your_command stands for the program you want to run:

for file in *.gff
do
  your_command "$file"
done

 Thank you!

ChIP-Tease • 0
@chip-tease-8339
Last seen 8.0 years ago
Germany

Hello everybody,

I cannot really make this suggestion work.

for file in *gff;do

  gene=${${file##aa_bb_cc_dd_}%.gff}

  echo program.sh $file ../../../unchanged_file.txt $gene $file $gene.bed

done 

The problem seems to be this part:

gene=${${file##aa_bb_cc_dd_}%.gff}

I understand that the $ sign excludes what is written in the brackets from the output.

Meaning

${file##aa_bb_cc_dd_} on aa_bb_cc_dd_example_gene.gff will give me example_gene.gff

and

gene=${${file##aa_bb_cc_dd_}%.gff} on aa_bb_cc_dd_example_gene.gff should give me example_gene

But it tells me "bad substitution".

This probably means that some character is wrong, but I cannot figure out which one, and I don't really know what to google for to find the rules for defining variables. Maybe someone has a link or knows what is wrong.

Thanks a lot, Alex


I forgot that bash, unlike zsh, doesn't support nested substitutions. In bash you have to do it in two steps:

for file in *.gff; do
  temp=${file##aa_bb_cc_dd_}
  gene=${temp%.gff}
  echo program.sh "$gene" ../../../unchanged_file.txt "$file" "$gene.bed"
done
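
As a quick check of the two steps, here is a small sketch using the example file name from the question. It relies only on standard shell parameter expansion: ${var##pattern} removes the longest match of pattern from the start of the value, and ${var%pattern} removes the shortest match from the end.

```shell
# Example file name taken from the question above.
file=aa_bb_cc_dd_example_gene.gff

temp=${file##aa_bb_cc_dd_}   # drop the fixed prefix
gene=${temp%.gff}            # drop the .gff extension

echo "$temp"   # prints example_gene.gff
echo "$gene"   # prints example_gene
```

The search term for this feature is "shell parameter expansion", which is what the bash manual calls it.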

I have updated my answer accordingly. If it answers your question, please mark it as accepted.

 


Hello Jim,

Thanks a lot!

I accepted it. I didn't know until now that I could accept answers. Thanks for that hint as well.
