title | layout |
---|---|
Shell programming |
default |
Shell scripts are files containing shell commands (commonly with the
extension .sh
) To run a shell script called file.sh
, you would type
:
$ source ./file.sh
or :
$ . ./file.sh
Note that if you just typed file.sh
, the operating system will
generally have trouble finding the script (if file.sh
is not in a
directory included in the PATH
environment variable) and recognizing that it is
executable (if the -x
flag is not set for file.sh
).
To be sure that the operating system knows what shell to use
to interpret the script, the first line of the script should be
#!/bin/bash
(in the case that you're using the bash shell).
The best thing to do is to set file.sh
to be executable (i.e., to have the 'x' flag set), and you can can execute it simply with:
$ ./file.sh
You can define your own utilities by creating a shell function. This
allows you to automate things that are more complicated than you can do
with an
alias.
One nice thing about shell functions is that the shell
automatically takes care of function arguments for you. It places the
arguments given by the user into local variables in the function called
(in order): $1 $2 $3
etc. It also fills $#
with the number of
arguments given by the user. Here's an example of using arguments in a
function that saves me some typing when I want to copy a file to the SCF
filesystem:
function putscf() {
scp $1 [email protected]:$2
}
To use this function, I just do the following to copy unit1.pdf
from
the current directory on whatever non-SCF machine I'm on to the
directory ~/teaching/243
on SCF:
$ putscf unit1.pdf teaching/243/.
Often you'd want to put such functions in your .bashrc
file.
We can use if-then-else type syntax to control the flow of a shell
script. For an example, here is a shell function niceR()
that can be
used for nicing R jobs:
# niceR shortcut for nicing R jobs
# usage: niceR inputRfile outputRfile
# Author: Brian Caffo
# Date: 10/01/03
function niceR(){
# submits nice'd R jobs
if [ $# != "2" ]; then
echo "usage: niceR inputRfile outputfile"
elif [ -e "$2" ]; then
echo "$2 exists, I won't overwrite"
elif [ ! -e "$1" ]; then
echo "inputRfile $1 does not exist"
else
echo "running R on $1"
nice -n 19 R --no-save < $1 &> $2
fi
}
If the then
is on a separate line from the if
, you won't need the semicolon.
for loops in shell scripting are primarily designed for iterating through a set of files or directories. Here's an example:
$ for FILE in $(ls *.txt); do
> mv $file ${FILE/.txt/.R}
> # this syntax replaces .txt with .R in $FILE``
> done
Note that the >
prompt above occurs when the shell is expecting
further input.
Another use of for loops is automating file downloads:
# example of bash for loop and wget for downloading a collection of files on the web
# usage: ./forloopDownload.sh
# Author: Chris Paciorek
# Date: July 28, 2011
url='ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/grid/years'
types="tmin tmax"
for ((yr=1950; yr<=2017; yr++))
do
for type in ${types}
do
wget ${url}/${yr}.${type}
done
done
If the do
is on a separate line from the for
, you don't need the
semicolon seen in the previous example.
for loops are very useful for starting a series of jobs:
# example of bash for loop for starting jobs
# usage: ./forloopJobs.sh
# Author: Chris Paciorek
# Date: July 28, 2011
n=100
for(( it=1; it<=100; it++ ))
do
echo "n=$n; it=$it; source('base.R')" > tmp-$n-$it.R # create customized R file
R CMD BATCH --no-save tmp-$n-$it.R sim-n$n-it$it.Rout
done
# note that base.R should NOT set either 'n' or 'it'
That's just an illustration. In reality, in the case above you'd be better off passing arguments into the R code using commandArgs
or by setting environment variables that are read in the R code.
Note by default the separator when you're looping through elements of a variable will be a space (as above), but you can set it differently, for example:
$ IFS=:
$ types=tmin:tmax:pmin:pmax
$ for type in $types
> do
> echo $type
> done
tmin
tmax
pmin
pmax
We've covered most of what you are likely to need to know about the
shell. I tend to only use bash scripts for simple tasks that require
only a few lines of bash commands and limited control flow (i.e.,
conditional statements, loops). For more complicated OS tasks, it is
often preferable to use Python. (You can also do a fair amount of what
you need from within R using the system()
function.) This will enable
you to avoid dealing with a lot of shell programming syntax. But you'll
still need to know how to use standard UNIX commands/utilities, wildcards, and pipes to be
effective.