Introduction to Bash scripting

How to automate your computing (and life) with all the commands you already know.

Marek Šuppa
Ondrej Jariabka
Adrián Matejov

Why UNIX (scripting) for Data Science?

  • Much of the UNIX philosophy we've seen so far was focused on composability of small programs doing one thing, and doing it well

  • Creating larger compositions out of these "small primitives" is the obvious next step


Bash scripting

The easiest way of starting with automation

Bash scripts

  • Apart from being a terminal shell, shell/Bash is also a full-fledged programming language

  • Bash scripts often start as one-liners in the shell and are later moved into a dedicated script file

  • Scripts can become programs like any other we've met so far

Bash scripts

Usually text files with the .sh (or .bash) file extension

Let's start with a simple one, called grouplist.sh:

#!/bin/bash
cat /etc/group | cut -d: -f1 | sort

The script can be executed by passing it to bash:

$ bash grouplist.sh
adm
adman
audio
backup
bifadm
bin
cdrom
crontab
daemon
... [ 55 lines omitted ] ...

Bash scripts: comments

Anything after the hash mark (#) is considered a comment:

$ echo "Hi there!" # This text will be ignored
Hi there!

The first line of grouplist.sh is actually a special kind of comment called "shebang".

Shebangs start with #!, followed by the absolute path to the file's interpreter.

#!/bin/bash
cat /etc/group | cut -d: -f1 | sort

If a file starts with a shebang and has the executable (x) permission set, it can be run directly: the system uses the interpreter named in the shebang to execute it.

$ chmod +x ./grouplist.sh
$ ./grouplist.sh | head -n 3
adm
adman
audio

Bash scripting: variables

In Bash (in scripts and in the interactive shell alike), variables are defined using =, with no spaces around it:

$ VAR=something

And interpolated (expanded to the value they hold) by adding the $ prefix to their names:

$ VAR=something
$ echo $VAR
something

Many useful variables are pre-set out of the box:

# The name of the current user
$ echo $USER
mrshu
# The home (~) directory of the
# current user
$ echo $HOME
/home/mrshu
# The current working directory
$ echo $PWD
/tmp

You can see all of them by running env or printenv.

Bash scripting: variables II

By default, variables are local to the shell process they are defined in; child processes (such as scripts) do not see them.

$ cat varprinter.sh
#!/bin/bash
LOCALVAR=localized
echo LOCALVAR=$LOCALVAR
echo VAR=$VAR
$ chmod +x varprinter.sh
$ VAR=something
$ ./varprinter.sh
LOCALVAR=localized
VAR=

(Undefined variables default to empty strings.)

Using export, they can be exported to (or "inherited" by) child processes:

$ export VAR=exported
$ ./varprinter.sh
LOCALVAR=localized
VAR=exported
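
The difference between a plain and an exported variable can be sketched without a separate script file. This is a minimal illustration, assuming bash is on the PATH; the names DEMO_PLAIN, DEMO_SHARED and DEMO_ONEOFF are made up for the example.

```shell
#!/bin/bash
# DEMO_PLAIN, DEMO_SHARED and DEMO_ONEOFF are made-up names for this illustration.

# A plain assignment stays inside the current shell.
DEMO_PLAIN=hidden
plain_seen=$(bash -c 'echo "DEMO_PLAIN=$DEMO_PLAIN"')

# export places the variable into the environment handed to child processes.
export DEMO_SHARED=visible
shared_seen=$(bash -c 'echo "DEMO_SHARED=$DEMO_SHARED"')

# A variable can also be set for one command only by prefixing the command.
oneoff_seen=$(DEMO_ONEOFF=temporary bash -c 'echo "DEMO_ONEOFF=$DEMO_ONEOFF"')

echo "$plain_seen"     # DEMO_PLAIN=
echo "$shared_seen"    # DEMO_SHARED=visible
echo "$oneoff_seen"    # DEMO_ONEOFF=temporary
```

The single quotes around the bash -c argument matter: they keep the parent shell from expanding the variables before the child starts.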

Bash scripting: variables III

Just like other programs, Bash scripts can receive arguments.

  • $#

    • the number of arguments
  • $0

    • the script that is being executed
  • $1, $2, ..., $9

    • first, second, up to ninth argument
$ cat argvprinter.sh
#!/bin/bash
echo "Running cmd: $0"
echo "Number of arguments: $#"
echo "First argument: $1"
$ ./argvprinter.sh
Running cmd: ./argvprinter.sh
Number of arguments: 0
First argument:
$ ./argvprinter.sh show
Running cmd: ./argvprinter.sh
Number of arguments: 1
First argument: show
$ bash argvprinter.sh show
Running cmd: argvprinter.sh
Number of arguments: 1
First argument: show
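
Shell functions receive arguments through the same variables, which makes it possible to try $# and $1 without creating a file. A small sketch; describe_args is a hypothetical helper, not part of the slides.

```shell
#!/bin/bash
# describe_args is a hypothetical helper. Inside a function, $# and $1, $2, ...
# refer to the function's own arguments, just as they do for a script.
describe_args() {
    echo "count=$# first=$1 second=$2"
}

describe_args                  # count=0 first= second=
describe_args show             # count=1 first=show second=
describe_args "two words" x    # count=2 first=two words second=x
```

Note that a quoted string containing a space still counts as a single argument.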

Bash: variable interpolation and quotes

Strings in apostrophes or single-quotes (such as 'some string') are printed verbatim.

$ echo 'I am logged in as $USER'
I am logged in as $USER

In double-quoted strings (like "some string"), variables are interpolated first.

$ echo "I am logged in as $USER"
I am logged in as mrshu

This can be used very nicely for string concatenation:

$ echo "I am $USER@$HOSTNAME"
I am mrshu@davos
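
The quoting rules can be compared side by side. A minimal sketch; the variable name topic is illustrative, and the ${...} form is an addition not shown on the slides.

```shell
#!/bin/bash
# `topic` is an illustrative variable name.
topic="bash"

single='Learning $topic'          # single quotes: no interpolation
double="Learning $topic"          # double quotes: $topic is replaced
braces="Learning ${topic}ing"     # braces mark where the variable name ends
joined="$topic"'-$topic'          # adjacent quoted pieces form one string

echo "$single"    # Learning $topic
echo "$double"    # Learning bash
echo "$braces"    # Learning bashing
echo "$joined"    # bash-$topic
```

Without the braces, "$topicing" would look up a variable called topicing, which is most likely undefined and therefore empty.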

Bash: quotes and parameters

When executing a shell command, processing proceeds roughly as follows:

  1. Shell operators (>, <, |, ...)
  2. Variables (like $VAR)
  3. Wildcards (*, ?, ...)
$ echo I like *
I like argvprinter.sh grouplist.sh
$ echo 'I like *'
I like *
$ echo Nice $VAR emoji! >*
-bash: *: ambiguous redirect
$ echo 'Nice $VAR emoji! >*'
Nice $VAR emoji! >*
$ VAR=smiling
$ echo "Nice $VAR emoji! >*"
Nice smiling emoji! >*
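
What happens when a variable itself holds a wildcard can be checked in a scratch directory. A sketch under the assumption that mktemp is available; the file names a.sh and b.sh are made up.

```shell
#!/bin/bash
# Scratch directory with two made-up files.
workdir=$(mktemp -d)
touch "$workdir/a.sh" "$workdir/b.sh"

pattern='*.sh'
# Each echo runs inside the directory, in a subshell, so the caller's
# working directory stays unchanged.
unquoted=$(cd "$workdir" && echo $pattern)     # variable expanded, result matched against files
double=$(cd "$workdir" && echo "$pattern")     # variable expanded, wildcard left alone
single=$(cd "$workdir" && echo '$pattern')     # nothing expanded
rm -r "$workdir"

echo "$unquoted"    # a.sh b.sh
echo "$double"      # *.sh
echo "$single"      # $pattern
```

The unquoted case shows that the value of the variable is substituted before the wildcard is matched against file names.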

Bash scripting: command substitution

The output of any command (what it writes to stdout; stderr is not captured) can be saved to a variable.

This concept is called "command substitution" and can be done either via $() or backticks (`cmd`):

$ a=$(echo 'hello' | tr '[:lower:]' '[:upper:]')
$ b=$(echo 'WORLD' | tr '[:upper:]' '[:lower:]')
$ echo "$a, $b"
HELLO, world

Example from http://www.compciv.org/topics/bash/variables-and-substitution/
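
Which output stream ends up in the variable can be verified with a command that writes to both. A small sketch; noisy is a hypothetical function standing in for any real command.

```shell
#!/bin/bash
# noisy is a hypothetical command that writes one line to each output stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

only_stdout=$(noisy 2>/dev/null)   # stderr discarded, stdout captured
both=$(noisy 2>&1)                 # stderr redirected into stdout, both captured
legacy=`noisy 2>/dev/null`         # backticks behave the same but are harder to nest

echo "$only_stdout"
echo "$both"
```

Only what reaches stdout is stored; to keep error messages as well, stderr has to be redirected into stdout with 2>&1.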

Bash scripting: exit code

When it finishes, each program returns a so-called "exit code".

In Bash, it is stored in the $? variable.

$ grep cmd argvprinter.sh
echo "Running cmd: $0"
$ echo $?
0

Exit code 0 generally denotes EXIT_SUCCESS: the program finished successfully.


A non-zero exit code generally means that some error happened.

The most general one is EXIT_FAILURE, which is set to 1 on Unix systems.

$ grep non-existent-word argvprinter.sh
$ echo $?
1
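
For grep specifically, 1 means "no line matched", while a real error (such as an unreadable file) yields a larger value. A sketch on a temporary sample file; exit_code_of is a hypothetical helper and the file content is made up.

```shell
#!/bin/bash
# exit_code_of is a hypothetical helper: it runs a command quietly and prints
# the exit code the command returned.
exit_code_of() {
    local code=0
    # `|| code=$?` records the exit code only when the command fails.
    "$@" > /dev/null 2>&1 || code=$?
    echo "$code"
}

sample=$(mktemp)
printf 'apple\nbanana\n' > "$sample"

found=$(exit_code_of grep banana "$sample")           # a line matched
missing=$(exit_code_of grep cherry "$sample")         # no line matched
broken=$(exit_code_of grep banana "$sample.absent")   # file could not be read
rm "$sample"

echo "found=$found missing=$missing broken=$broken"
```

Because $? is overwritten by every command, it has to be read (or saved) immediately after the command of interest.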

Bash scripting: exit code II

In Bash scripts, the exit code can be set in two ways:

  1. Implicitly, as the exit code of the last executed command

  2. Explicitly via the exit command (which also stops the script's execution)

$ cat greeter.sh
#!/bin/bash
echo "Hello $1!"
$ bash greeter.sh there
Hello there!
$ echo $?
0
$ cat exiter.sh
#!/bin/bash
exit 47
echo "Hello $1!"
$ bash exiter.sh there
$ echo $?
47
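
Both ways of setting the exit code can be tried in one place by letting a subshell play the role of the script. A minimal sketch, not taken from the slides.

```shell
#!/bin/bash
# A subshell ( ... ) stands in for a separate script here: `exit` ends it at once.
# `|| status=$?` stores the exit code when the subshell reports a failure.
status=0
output=$(echo "before exit"; exit 47; echo "after exit") || status=$?

# Without an explicit exit, the exit code is that of the last command run.
implicit=0
(true; false) || implicit=$?

echo "output=$output status=$status implicit=$implicit"
# output=before exit status=47 implicit=1
```

Everything after exit 47 is skipped, which is why "after exit" never appears in the output.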

The concept of exit codes is also very useful when evaluating conditions.

Bash scripting: if conditions

The basic syntax is as follows:

if cmd ; then
another_command;
further_command;
fi

If cmd finishes with exit code 0, the commands following the then clause are executed.

A simple example to demonstrate it practically:

#!/bin/bash
if grep -q "data" /etc/group; then
echo 'Group data seems to exist.'
fi

The execution procedure goes as follows:

  1. grep -q "data" /etc/group is executed
  2. If data is found in /etc/group, grep's exit code is 0; otherwise it is 1.
  3. If the exit code is 0, the echo part will be executed.
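
The steps above can be tried on a sample file. This sketch also shows why the message says "seems to exist": a plain search for data matches any line that merely contains the word. The file content is made up; a real script would read /etc/group.

```shell
#!/bin/bash
# The sample content is made up; a real script would read /etc/group.
groupfile=$(mktemp)
printf 'adm:x:4:\ndatabase:x:1001:\n' > "$groupfile"

loose="no"
if grep -q "data" "$groupfile"; then
    loose="yes"      # reached, because "database" contains "data"
fi

exact="no"
if grep -q "^data:" "$groupfile"; then
    exact="yes"      # not reached: no group is named exactly "data"
fi
rm "$groupfile"

echo "loose=$loose exact=$exact"    # loose=yes exact=no
```

Anchoring the pattern to the start of the line and the following colon restricts the match to the group name field.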

Bash scripting: if with test

In principle, any command can be used with if.

In practice, the test command is used quite often.

if test -d /tmp; then
echo "The /tmp directory exists"
fi
  • STRING1 = STRING2 (Bash also accepts ==)
    • true (0) if the strings STRING1 and STRING2 are identical
  • STRING1 != STRING2
    • true (0) if the strings STRING1 and STRING2 are not identical
  • -e FILE
    • true (0) if FILE exists
  • -f FILE
    • true (0) if FILE is a regular file
  • -d FILE
    • true (0) if FILE is a directory
  • NUM1 -eq NUM2
    • true (0) if NUM1 and NUM2 are numerically equal
  • NUM1 -ne NUM2
    • true (0) if NUM1 and NUM2 are not numerically equal
  • NUM1 -gt NUM2
    • true (0) if NUM1 is greater than NUM2
  • NUM1 -lt NUM2
    • true (0) if NUM1 is less than NUM2

For much more, see man 1 test.
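
A few of these checks can be exercised together in a scratch directory. A sketch with made-up file names; the variable passed simply collects the operators whose test succeeded.

```shell
#!/bin/bash
# Scratch directory with one made-up file.
workdir=$(mktemp -d)
touch "$workdir/notes.txt"

passed=""
if test -d "$workdir"; then passed="$passed -d"; fi            # it is a directory
if test -f "$workdir/notes.txt"; then passed="$passed -f"; fi  # it is a regular file
if test -e "$workdir/missing"; then passed="$passed -e"; fi    # does not exist, skipped
if test 3 -lt 10; then passed="$passed -lt"; fi                # numeric comparison
if test "abc" != "abd"; then passed="$passed !="; fi           # string comparison
rm -r "$workdir"

echo "passed:$passed"    # passed: -d -f -lt !=
```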

Bash scripting: if with test II

#!/bin/bash
if test -d .tmp; then
echo "The directory .tmp exists; proceeding."
fi
if test -f .config; then
echo "Config file .config exists; proceeding."
fi
cp -R .tmp .config /backup
if test $? -eq 0; then
echo "Directory .tmp and file .config copied successfully."
fi

Writing test so often is a bit tedious, so POSIX also provides a shortcut, [ ... ]:

#!/bin/bash
if [ -d .tmp ]; then
echo "The directory .tmp exists; proceeding."
fi

Note the spaces. They are mandatory: [ is actually a command (a Bash builtin, usually also available as /usr/bin/[), and ] is simply its last argument.
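
Since [ is a command, everything between the brackets is an ordinary argument list, and an unquoted empty variable silently removes an argument. A sketch of the effect; the variable who is illustrative, and the exact error status is left unspecified on purpose.

```shell
#!/bin/bash
# `who` is an illustrative variable that happens to be empty.
who=""

# Quoted: [ receives three arguments ("", "=", "root") and returns false.
quoted_status=0
[ "$who" = "root" ] || quoted_status=$?

# Unquoted: the empty value disappears, [ receives only "=" and "root",
# and complains about its arguments instead of returning a plain "false".
unquoted_status=0
[ $who = "root" ] 2>/dev/null || unquoted_status=$?

echo "quoted=$quoted_status unquoted=$unquoted_status"
```

Quoting variables inside [ ... ] avoids this class of error.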

Bash scripting: if, elif, else

Bash also supports the standard if/elif/else conditions.

if cmd1 ; then
other_if_command;
elif cmd2 ; then
other_elif_command;
else
else_command;
fi

A quick example:

#!/bin/bash
if [ "$USER" == "root" ]; then
echo "You may proceed";
elif groups | grep -q sudo; then
echo "Please become root to run this"
else
echo "Sorry, only root is allowed to run this";
fi
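
The same structure works with numeric tests. A small sketch; classify is a hypothetical function used only to make the branches easy to try.

```shell
#!/bin/bash
# classify is a hypothetical helper that names the sign of an integer.
classify() {
    if [ "$1" -lt 0 ]; then
        echo "negative"
    elif [ "$1" -eq 0 ]; then
        echo "zero"
    else
        echo "positive"
    fi
}

classify -5    # negative
classify 0     # zero
classify 47    # positive
```

Only the first branch whose condition succeeds is executed; the else branch runs when none of them does.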

Bash scripting: multiple commands

  • cmd1 | cmd2
    • the exit code is set to that of the last command
  • cmd1 && cmd2
    • cmd2 will be executed only if cmd1 returns 0
  • cmd1 || cmd2
    • cmd2 will be executed only if cmd1 does not return 0

There are also two commands that act as true/false constants:

true

  • "do nothing, successfully" (exit code 0)

false

  • "do nothing, unsuccessfully" (exit code 1)
$ true && echo "We will see this"
We will see this
$ false && echo "We will not see this"
$ false || echo "We will see this"
We will see this
$ grep -q aZn31A /etc/passwd | true
$ echo $?
0

If we do not care about the exit code but would like to put multiple commands on the same line, we can separate them with ;.

$ echo "First"; echo "Second"
First
Second
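
In scripts, && and || are often used as compact guards. A sketch of typical uses in a scratch directory; the file names result.txt and owner.txt are made up.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# &&: touch runs only because mkdir succeeded.
mkdir "$workdir/out" && touch "$workdir/out/result.txt"
created="no"
[ -f "$workdir/out/result.txt" ] && created="yes"

# ||: the fallback runs only because cat failed (owner.txt does not exist).
owner=$(cat "$workdir/owner.txt" 2>/dev/null || echo "unknown")

# A pipeline reports the exit code of its last command only.
false | true
pipeline_status=$?
rm -r "$workdir"

echo "created=$created owner=$owner pipeline_status=$pipeline_status"
# created=yes owner=unknown pipeline_status=0
```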

Bash scripting: multiple commands II

A quick example

#!/bin/bash
if [ "$USER" == "root" ] || [ "$USER" == "mrshu" ]; then
echo "You may proceed";
elif groups | grep -q sudo; then
echo "Please become root to run this"
else
echo "Sorry, only root or mrshu are allowed to run this";
fi

Bash scripting: while loop

Check the exit code of cmd1. If it is zero, execute cmd2 and check again; the loop ends once cmd1 returns a non-zero exit code.

while cmd1; do
cmd2;
done

The script below waits until firefox is running.

#!/bin/bash
# Keep looping for as long as no firefox process can be found (note the !).
while ! ps -ef | grep -v grep | grep -q firefox; do
echo "Firefox not running, will check in 10 seconds"
sleep 10
done
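
A polling loop in the same spirit can be tried without firefox by waiting for a file to appear. This is a sketch that takes about two seconds to run; the marker file ready is a stand-in for whatever a real script waits for, and the $(( )) arithmetic is not covered in the slides.

```shell
#!/bin/bash
# The "ready" marker file is a stand-in for whatever the script waits for.
workdir=$(mktemp -d)

# In the background, create the marker after one second.
(sleep 1; touch "$workdir/ready") &

checks=0
while ! [ -e "$workdir/ready" ]; do
    checks=$((checks + 1))     # arithmetic expansion, not covered in the slides
    sleep 1
done
wait                           # let the background job finish before cleaning up
rm -r "$workdir"

echo "Marker appeared after $checks check(s)"
```

The ! negates the exit code of the test, so the loop body runs while the file is still missing.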

Bash scripting: for loop

Iterates over the list of values, setting i to val1, val2 and val3 in turn.

for i in val1 val2 val3; do
echo $i
done

If we want to quickly generate a sequence of numbers, the seq command can come in handy:

$ seq 1 5
1
2
3
4
5
$ cat iterator.sh
#!/bin/bash
for i in $(seq 1 5); do
echo "Checking number $i"
done
$ bash iterator.sh
Checking number 1
Checking number 2
Checking number 3
Checking number 4
Checking number 5
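
The list of values can also come from a wildcard, which is a common way of processing a set of files. A sketch with made-up file names; basename strips the directory and the .csv suffix.

```shell
#!/bin/bash
# Scratch directory with made-up data files.
workdir=$(mktemp -d)
touch "$workdir/jan.csv" "$workdir/feb.csv" "$workdir/notes.txt"

names=""
for path in "$workdir"/*.csv; do
    # basename removes the directory part and the given suffix.
    names="$names $(basename "$path" .csv)"
done
rm -r "$workdir"

echo "CSV files:$names"    # CSV files: feb jan
```

The wildcard expands in sorted order, and notes.txt is skipped because it does not match *.csv.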

Bash scripting: case

A shortcut, so that one does not have to write out so many ifs.

Note the double semicolons (;;): they terminate each branch and are required.

case string in
str1) cmd1;;
str2) cmd2;;
*) catchall-cmd;;
esac
$ cat login.sh
#!/bin/bash
case "$1" in
root) echo "Welcome, you can come in" ;;
mrshu) echo "Please provide password" ;;
*) echo "Name not recognized";;
esac
$ bash login.sh mrshu
Please provide password
$ bash login.sh root
Welcome, you can come in
$ bash login.sh vidriduch
Name not recognized
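
The branch labels are patterns, so they may contain wildcards, and several alternatives can be joined with |. A sketch using a hypothetical file_kind function; the first matching branch wins, which is why the more specific patterns come first.

```shell
#!/bin/bash
# file_kind is a hypothetical helper that guesses a file type from its name.
file_kind() {
    case "$1" in
        *.tar.gz|*.tgz) echo "gzip-compressed tar archive" ;;
        *.tar)          echo "tar archive" ;;
        *.gz)           echo "gzip-compressed file" ;;
        *)              echo "unknown" ;;
    esac
}

file_kind package.tgz    # gzip-compressed tar archive
file_kind backup.tar     # tar archive
file_kind tmp.gz         # gzip-compressed file
file_kind notes.txt      # unknown
```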

Useful commands


tar

  • creates a "package" from a filesystem path

  • concatenates files and directories without compression

$ tar -cf backup.tar /home/mrshu
  • extract the tar package to the current directory
$ tar -xf backup.tar
  • -O writes the extracted file contents to stdout instead of to disk
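
A full round trip can be tried safely in a scratch directory. A sketch with made-up names (project, readme.txt, restore); it assumes a tar that supports -O, as GNU tar and bsdtar do.

```shell
#!/bin/bash
# Scratch directory with a made-up project folder.
workdir=$(mktemp -d)
mkdir "$workdir/project"
echo "hello" > "$workdir/project/readme.txt"

# Create the archive from inside workdir so that it stores relative paths.
(cd "$workdir" && tar -cf backup.tar project)

# Extract into a separate directory and read the restored file.
mkdir "$workdir/restore"
(cd "$workdir/restore" && tar -xf ../backup.tar)
restored=$(cat "$workdir/restore/project/readme.txt")

# Print a single archive member to stdout without writing it to disk.
streamed=$(tar -xOf "$workdir/backup.tar" project/readme.txt)
rm -r "$workdir"

echo "restored=$restored streamed=$streamed"    # restored=hello streamed=hello
```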

gzip compression

  • gzip file

    • creates compressed file file.gz out of file
  • gunzip file.gz

    • uncompresses file.gz and creates file
    • -c sends the output to stdout

gzip is just one of the available compression methods; others include bzip2 and xz.
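
The commands above can be combined into a short round trip. A sketch on a made-up file data.txt in a scratch directory.

```shell
#!/bin/bash
# Scratch directory with a made-up two-line file.
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/data.txt"

gzip "$workdir/data.txt"          # data.txt is replaced by data.txt.gz
compressed_only="no"
[ ! -e "$workdir/data.txt" ] && [ -e "$workdir/data.txt.gz" ] && compressed_only="yes"

# -c leaves the .gz file in place and writes the content to stdout.
first_line=$(gunzip -c "$workdir/data.txt.gz" | head -n 1)

gunzip "$workdir/data.txt.gz"     # data.txt.gz is replaced by data.txt
restored=$(cat "$workdir/data.txt")
rm -r "$workdir"

echo "compressed_only=$compressed_only first_line=$first_line"
```

Both gzip and gunzip replace their input file by default; -c is the way to keep it untouched.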


tar + compression

  • gzip
$ tar -czf package.tar.gz path
$ tar -xzf package.tar.gz
  • bzip2
$ tar -cjf package.tar.bz2 path
$ tar -xjf package.tar.bz2
  • xz
$ tar -cJf package.tar.xz path
$ tar -xJf package.tar.xz

tar + compression through pipe

With gzip:

$ cat file | gzip > tmp.gz
$ cat tmp.gz | gunzip > unzipped_file
$ gunzip -c tmp.gz | head

With tar (-f - tells tar explicitly to read the archive from stdin):

$ cat package.tgz | gunzip | tar -x -O -f - | less
$ cat package.tar.gz | tar -x -z -O -f - | less
$ cat package.tar.bz2 | bunzip2 | tar -xO -f - | tail
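
The two gzip variants can be checked against each other on a small archive. A sketch with made-up content; it passes -f - so that tar reads from stdin regardless of its built-in default.

```shell
#!/bin/bash
# Scratch directory with a made-up one-file tree.
workdir=$(mktemp -d)
mkdir "$workdir/path"
echo "first file" > "$workdir/path/a.txt"

# Create a gzip-compressed archive with relative paths.
(cd "$workdir" && tar -czf package.tar.gz path)

# Decompress with gunzip first, then let tar print the file contents.
via_gunzip=$(cd "$workdir" && cat package.tar.gz | gunzip | tar -x -O -f -)

# Let tar do the decompression itself via -z.
via_tar_z=$(cd "$workdir" && cat package.tar.gz | tar -x -z -O -f -)
rm -r "$workdir"

echo "via_gunzip=$via_gunzip"    # via_gunzip=first file
echo "via_tar_z=$via_tar_z"      # via_tar_z=first file
```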