
Need Help With A Shell Script


Darren Kitchen


I'm constantly processing files. I'll run a tool on one file, wait until it finishes, and run another tool on the same file. I can't start the second task until the first ends, and the only way to know it has finished is when the file size stops increasing. This is what I've got; it checks that the date modified hasn't changed in the last 5 minutes.

while [ $(( $(stat -f "%m" "/tmp/$FILE") + 300 )) -gt "$(date +%s)" ]; do sleep 2; done; /tool2.sh "/tmp/$FILE"

This works great on a single file. Now the problem is I have a batch of files all being processed by the first tool. Each has a unique variable appended to the string. For example:

file0001--a

file0001--b

file0001--c

I don't want to start tool2.sh /tmp/file0001--* until the last file in the batch has finished processing. Again the only way to know is to check the date modified. Unfortunately they don't complete sequentially. Sometimes file0001--a will finish last. Sometimes file0001--c will finish last.

So my question is, how would I go about adapting the code above to check that all files in this series have completed processing and haven't been touched for the last 5 minutes?

I have tried wildcards with the stat command and it doesn't seem to work.

Also, note that the stat command above uses "%m", which on the version of stat that I have is the last-modified time in seconds since the epoch. It's a BSD box.

Thanks. I appreciate any help you can offer.


Did you write these tools, or are they something you installed? Shouldn't the job be able to give you a return code when it's done? Say, return code 0 if it completed, and return code X if it failed, where X maps to some predefined error code. Most systems have this sort of thing, and it might be possible to modify the code to tell you when it is done. Then check the value: if it finished OK, do step B; otherwise report the error code.

edit: I might not be explaining myself very well, but something similar to this:

http://stackoverflow.com/questions/393845/...r-code-strategy

I know that when we worked on the mainframe at my last job, we checked each job's return code, and anything over 04 meant there was an abend or some other problem. How that works in Linux, I'm not sure though.

edit: Found this

To check the exit status in a script, you may use the following pattern:

somecommand argument1 argument2

RETVAL=$?

[ $RETVAL -eq 0 ] && echo Success

[ $RETVAL -ne 0 ] && echo Failure

http://linuxcommando.blogspot.com/2008/03/...tatus-code.html
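The same exit-status check can be written in Python, the language used later in this thread. This is only a minimal sketch: `run_step` is a hypothetical helper, the two commands are stand-ins for the real tools, and it assumes the tool follows the usual convention of exiting with 0 on success.

```python
import subprocess
import sys


def run_step(command):
    """Run one command and return its exit status (0 conventionally means success)."""
    completed = subprocess.run(command)
    return completed.returncode


# Stand-in commands (assumptions); the real tools would replace these.
step_ok = [sys.executable, "-c", "raise SystemExit(0)"]
step_bad = [sys.executable, "-c", "raise SystemExit(4)"]

if run_step(step_ok) == 0:
    print("Success")
if run_step(step_bad) != 0:
    print("Failure")
```

If the first tool really never sets a meaningful exit status, this check tells you nothing and a timestamp-based wait is still needed.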

Edited by digip

If you pass the script the number of files you will be producing, you can treat the batch as "complete" when a counter gets that high.

So make the script take an integer into a variable, for example 5 if you have 5 files.

Do your timestamp check on each file, and if a file's timestamp hasn't changed in 5 minutes, mark that file complete in an array, file[x].

When file[1] through file[5] all hold a 1 (meaning 1 = complete), continue on to your other script.

So to recap, because I suck at explaining: take the number of files as input and make an array of that size (be careful, because arrays are 0-based, so the last index is the input number minus 1). Check the files, and when one is complete set its array entry to 1. Make a function that checks whether every entry in the array is 1, then run the second script.
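The flag-array idea above might look roughly like this in Python. It is a sketch under assumptions: `completion_flags` and `batch_complete` are hypothetical names, and the list of paths stands in for the file count, since a Python list already knows its own length.

```python
import os
import time


def completion_flags(paths, quiet_seconds, now=None):
    """Return one flag per file: 1 if untouched for quiet_seconds, else 0."""
    if now is None:
        now = time.time()
    return [1 if now - os.stat(p).st_mtime >= quiet_seconds else 0 for p in paths]


def batch_complete(paths, quiet_seconds, now=None):
    """True only when there is at least one file and every flag is 1."""
    flags = completion_flags(paths, quiet_seconds, now)
    return bool(flags) and all(flags)
```

A caller would poll `batch_complete(paths, 300)` in a loop with a short sleep, and start the second script once it returns True.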

Edited by Mr-Protocol

Use your timestamp checking code to make it change the variables in the array.

I'm not exactly sure what your whole process is. If you want to discuss this in a more live environment, we can use Skype, IRC, or any other means of communication to talk through possible solutions. I can explain better verbally than in text.

Edit: Just found a command you might be able to use instead of timestamp.

lsof

Combine it with grep to determine from the results whether the files are open and in use...

http://www.netadmintools.com/html/lsof.man.html
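A rough sketch of the lsof idea in Python, using lsof's `-t` option (PIDs only) so no grep is needed. The helper names are hypothetical. It assumes lsof is installed, that the current user is allowed to see the process writing the files, and that the first tool keeps each file open for as long as it is working on it, which may not be true of every tool.

```python
import subprocess


def is_open(path, run=subprocess.run):
    """Return True if lsof reports at least one process holding path open.

    `run` is injectable only so the function can be exercised without lsof.
    """
    result = run(["lsof", "-t", path], capture_output=True, text=True)
    # With -t, lsof prints only PIDs, one per line; no output means no holder.
    return bool(result.stdout.strip())


def any_open(paths, run=subprocess.run):
    """Return True while any file in the batch is still held open."""
    return any(is_open(p, run) for p in paths)
```

If the tool opens and closes the file between writes, this check can report "idle" too early, so the timestamp check is the safer fallback.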

Edited by Mr-Protocol

The job does not return any codes sadly.

Maybe not by default, but have you tried wrapping the steps in a shell script and adding a check for the return codes at the end, or are you just typing big long strings of commands at the terminal? I would try using a script to automate the process, with checks for each section or against each file. When all files are done and every return code is 0, move to the next step; otherwise alert that the job is borked and show the error level.

If you could show us your process/code or what you are doing, we might be able to figure out a way to fix the problem without having to use timers to check for file activity.
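For illustration, a wrapper along those lines could start the first tool on every file, wait for all of them, and only then decide whether to run the second step. This is a sketch, not the poster's actual process: `run_batch` and `batch_succeeded` are hypothetical names, and it assumes the wrapper itself launches the first tool, so it can wait on the processes directly instead of polling timestamps.

```python
import subprocess
import sys


def run_batch(commands):
    """Start every command at once, wait for all, and return their exit codes."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in processes]


def batch_succeeded(exit_codes):
    """True only when there was at least one job and every job returned 0."""
    return bool(exit_codes) and all(code == 0 for code in exit_codes)


# Stand-in for the real first tool (an assumption): one job per file.
jobs = [[sys.executable, "-c", "raise SystemExit(0)"] for _ in range(3)]
if batch_succeeded(run_batch(jobs)):
    print("all jobs finished, safe to start the second tool")
```

Because `wait()` blocks until each process exits, the order in which the files finish does not matter.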

Edited by digip

Would this work?

start.sh:

for file in /tmp/file0001--*; do ./wait.sh "$file" & done

wait.sh: (almost exactly what you had above, except that $1 is already the full path)

while [ $(( $(stat -f "%m" "$1") + 300 )) -gt "$(date +%s)" ]; do sleep 2; done; /tool2.sh "$1"


I find shell scripts arcane and difficult to understand, especially the one liners! I would use a delightful python script, for example:

import glob
import os
import sys
import time

if len(sys.argv) != 3:
    print('Usage: %s wildcard time_to_wait' % sys.argv[0])
    raise SystemExit(1)

stub = sys.argv[1]
wait = sys.argv[2]

try:
    wait = int(wait)
except ValueError:
    print('Invalid wait time, use an integer number of seconds')
    raise SystemExit(1)

done = False
while not done:
    now = time.time()
    # Age in seconds of every file matched by the wildcard
    ages = [now - os.stat(filename).st_mtime for filename in glob.glob(stub)]
    if not ages:
        print('No files match %s' % stub)
        raise SystemExit(1)

    youngest = min(ages)  # age of the most recently modified file
    if youngest > wait:
        done = True
    else:
        # Don't kill the CPU by spinning in this loop. The soonest we could
        # exit is wait - youngest seconds from now, so sleep for that long
        # (at least one second, so we never busy-loop on a zero sleep).
        time.sleep(max(wait - youngest, 1))

It waits until all of the files matched by a wildcard (e.g. file*) are older than a given number of seconds. You need to quote the wildcard so it doesn't get expanded by the shell, though: ./wait.py "file*" 300, for example.

