name: inverse
layout: true
class: center, middle, inverse
---
# Introduction to Bash scripting

How to automate your computing (and life) with all the commands you already know.

.footnote[Marek Šuppa
Ondrej Jariabka
Adrián Matejov]

---
layout: false

# Why UNIX (scripting) for Data Science?

- Much of the UNIX philosophy we've seen so far was focused on composability of small programs doing one thing, and doing it well

--

- Creating larger compositions out of these "small primitives" is the obvious next step

---
class: middle, center, inverse

# Bash scripting

The easiest way of starting with automation

---

# Bash scripts

- Apart from being a terminal shell, shell/Bash is also a full-fledged programming language

--

- Bash scripts often start as oneliners in the shell and get moved to a specific "script"

--

- Scripts can become programs like any other we've met so far

---

# Bash scripts

Usually text files with the `.sh` (or `.bash`) file extension

--

Let's start with a simple one, called `grouplist.sh`:

```bash
#!/bin/bash
cat /etc/group | cut -d: -f1 | sort
```

--

The script can be executed by passing it to `bash`:

```bash
$ bash grouplist.sh
adm
adman
audio
backup
bifadm
bin
cdrom
crontab
daemon
... [ 55 lines omitted ] ...
```

---

# Bash scripts: comments

Anything after the hash mark (`#`) is considered a comment:

```bash
$ echo "Hi there!" # This text will be ignored
Hi there!
```

--

The first line of `grouplist.sh` is actually a special kind of comment called "[shebang](https://en.wikipedia.org/wiki/Shebang_%28Unix%29)".

--

Shebangs start with `#!`, followed by the absolute path to the file's interpreter.

```bash
#!/bin/bash
cat /etc/group | cut -d: -f1 | sort
```

If a file contains a shebang and has the executable (`x`) permission set, it can be run directly: the interpreter named in the shebang is used to execute it.
```bash
$ chmod +x ./grouplist.sh
$ ./grouplist.sh | head -n 3
adm
adman
audio
```

---

# Bash scripting: variables

In Bash (scripts or just shell), variables are defined using `=`

```bash
$ VAR=something
```

--

And interpolated (expanded to the value they hold) by adding the `$` prefix to their names:

```bash
$ VAR=something
$ echo $VAR
something
```

--

Many useful variables are pre-set out of the box:

.left-eq-column[
```bash
# The name of the current user
$ echo $USER
mrshu

# The home (~) directory of the
# current user
$ echo $HOME
/home/mrshu
```
]

.right-eq-column[
```bash
# The current working directory
$ echo $PWD
/tmp
```

You can see all of them by running `env` or `printenv`.
]

---

# Bash scripting: variables II

By default, the variables are local to the process they are defined in.

```bash
$ cat varprinter.sh
#!/bin/bash
LOCALVAR=localized
echo LOCALVAR=$LOCALVAR
echo VAR=$VAR

$ chmod +x varprinter.sh
$ VAR=something
$ ./varprinter.sh
LOCALVAR=localized
VAR=
```

.center[
.font-small[
(Undefined variables default to empty strings.)
]
]

--

Using `export`, they can be exported to (or "inherited" by) child processes:

```bash
$ export VAR=`exported`
$ ./varprinter.sh
LOCALVAR=localized
VAR=`exported`
```

---

# Bash scripting: variables III

Just as other programs, Bash scripts can receive arguments.
.left-eq-column[
- `$#` - the number of arguments
- `$0` - the script that is being executed
- `$1`, `$2`, ..., `$9` - first, second, up to ninth argument

```bash
$ cat argvprinter.sh
#!/bin/bash
echo "Running cmd: `$0`"
echo "Number of arguments: `$#`"
echo "First argument: `$1`"
```
]

.right-eq-column[
```bash
$ ./argvprinter.sh
Running cmd: ./argvprinter.sh
Number of arguments: 0
First argument:
```

```bash
$ ./argvprinter.sh show
Running cmd: ./argvprinter.sh
Number of arguments: 1
First argument: show
```

```bash
$ bash argvprinter.sh show
Running cmd: argvprinter.sh
Number of arguments: 1
First argument: show
```
]

---

# Bash: variable interpolation and quotes

Strings in apostrophes or single-quotes (such as `'some string'`) are printed verbatim.

```
$ echo 'I am logged in as $USER'
I am logged in as `$USER`
```

--

In double-quoted strings (like `"some string"`), the variables are interpolated first.

```
$ echo "I am logged in as $USER"
I am logged in as `mrshu`
```

--

This can be used very nicely for string concatenation:

```
$ echo "I am `$USER`@`$HOSTNAME`"
I am `mrshu`@`davos`
```

---

# Bash: quotes and parameters

When executing a command, the shell processes the line in the following order:

1. Shell operators (`>`, `<`, `|`, ...)
2. Variables (like `$VAR`)
3. Wildcards (`*`, `?`, ...)

```bash
$ echo I like *
I like argvprinter.sh grouplist.sh
$ echo 'I like *'
I like *
$ echo Nice $VAR emoji! >*
-bash: *: ambiguous redirect
$ echo 'Nice $VAR emoji! >*'
Nice $VAR emoji! >*
$ VAR=smiling
$ echo "Nice $VAR emoji! >*"
Nice smiling emoji! >*
```

---

# Bash scripting: command expansion

The output of any command (whatever it writes to stdout) can be saved to a variable.
This concept is called "command expansion" (the Bash manual calls it "command substitution") and can be done either via `$()` or backticks

```
$ a=$(echo 'hello' | tr '[:lower:]' '[:upper:]')
$ b=$(echo 'WORLD' | tr '[:upper:]' '[:lower:]')
$ echo "$a, $b"
HELLO, world
```

.font-small[Example from http://www.compciv.org/topics/bash/variables-and-substitution/]

---

# Bash scripting: exit code

When it finishes, each program returns a so-called "exit code".

In Bash, it is stored in the `$?` variable.

```
$ grep `cmd` argvprinter.sh
echo "Running `cmd`: $0"
$ echo $?
0
```

Exit code `0` generally denotes `EXIT_SUCCESS`: the program finished successfully.

--

---------------

A non-zero exit code generally means that some error happened.

The most general one is `EXIT_FAILURE`, which is set to `1` on Unix systems.

```
$ grep `non-existent-word` argvprinter.sh
$ echo $?
1
```

---

# Bash scripting: exit code II

In Bash scripts, the exit code can be set in two ways:

1. Implicitly, as the exit code of the last executed command
2. Explicitly via the `exit` command (which also stops the script)

.left-eq-column[
```bash
$ cat greeter.sh
#!/bin/bash
echo "Hello $1!"
```

```bash
$ bash greeter.sh there
Hello there!
$ echo $?
0
```
]

.right-eq-column[
```bash
$ cat exiter.sh
#!/bin/bash
exit 47
echo "Hello $1!"
```

```bash
$ bash exiter.sh there
$ echo $?
47
```
]

--

The concept of exit codes is also very useful when evaluating conditions.

---

# Bash scripting: `if` conditions

The basic syntax is as follows:

.left-eq-column[
```bash
if cmd ; then
    another_command;
    further_command;
fi
```
]

.right-eq-column[
If `cmd` finishes with exit code `0`, the commands following the `then` clause are executed.
]

--

.clear-both[
A simple example to demonstrate it practically:

.left-eq-column[
```bash
#!/bin/bash
if grep -q "data" /etc/group; then
    echo 'Group data seems to exist.'
fi
```
]

.right-eq-column[
The execution procedure goes as follows:

1. `grep -q "data" /etc/group` is executed
2. If `data` can be found in `/etc/group`, the exit code is `0`; otherwise it is `1`.
3. If the exit code is `0`, the `echo` part will be executed.
]
]

---

# Bash scripting: `if` with `test`

.left-eq-column[
In principle, any command can be used with `if`.

In practice, the `test` command is used quite often.
]

.right-eq-column[
```bash
if test -d /tmp; then
    echo "The /tmp directory exists"
fi
```
]

--

.clear-both[
.left-eq-column[
- `STRING1 = STRING2` - true (0) if the strings STRING1 and STRING2 are identical (Bash also accepts `==`)
- `STRING1 != STRING2` - true (0) if the strings STRING1 and STRING2 are not identical
- `-e FILE` - true (0) if `FILE` exists
- `-f FILE` - true (0) if `FILE` is a regular file
- `-d FILE` - true (0) if `FILE` is a directory
]

.right-eq-column[
- `NUM1 -eq NUM2` - true (0) if `NUM1` and `NUM2` are numerically equal
- `NUM1 -ne NUM2` - true (0) if `NUM1` and `NUM2` are not numerically equal
- `NUM1 -gt NUM2` - true (0) if `NUM1` is greater than `NUM2`
- `NUM1 -lt NUM2` - true (0) if `NUM1` is less than `NUM2`

For much more, see `man 1 test`.
]
]

---

# Bash scripting: `if` with `test` II

```bash
#!/bin/bash

if test -d .tmp; then
    echo "The directory .tmp exists; proceeding."
fi

if test -f .config; then
    echo "Config file .config exists; proceeding."
fi

cp -R .tmp .config /backup

if test $? -eq 0; then
    echo "Directory .tmp and file .config copied successfully."
fi
```

--

Writing `test` so often is a bit obnoxious, so POSIX also has a shortcut: `[` and `]`:

```bash
#!/bin/bash

if [ -d .tmp ]; then
    echo "The directory .tmp exists; proceeding."
fi
```

Note the spaces. They are mandatory -- `[` is actually a command (normally stored in `/usr/bin/[`).

---

# Bash scripting: `if`, `elif`, `else`

Bash also supports the standard `if`/`elif`/`else` conditions.
```bash
if cmd1 ; then
    other_if_command;
elif cmd2 ; then
    other_elif_command;
else
    else_command;
fi
```

--

A quick example:

```bash
#!/bin/bash

if [ "$USER" == "root" ]; then
    echo "You may proceed";
elif groups | grep -q sudo; then
    echo "Please become root to run this"
else
    echo "Sorry, only root is allowed to run this";
fi
```

---

# Bash scripting: multiple commands

.left-eq-column[
- `cmd1 | cmd2` - the exit code of the pipeline is that of the last command (`cmd2`)
- `cmd1 && cmd2` - `cmd2` will be executed only if `cmd1` returns 0
- `cmd1 || cmd2` - `cmd2` will be executed only if `cmd1` does not return 0

There are also the `true`/`false` commands:

`true` - _"do nothing, successfully"_ (exit code 0)

`false` - _"do nothing, unsuccessfully"_ (exit code 1)
]

.right-eq-column[
```bash
$ true && echo "We will see this"
We will see this
```

```bash
$ false && echo "We will not see this"
```

```bash
$ false || echo "We will see this"
We will see this
```

```bash
$ grep -q aZn31A /etc/passwd | true
$ echo $?
0
```

If we do not care about the exit code but would like to put multiple commands on the same line, we can separate them with `;`.

```bash
$ echo "First"; echo "Second"
First
Second
```
]

---

# Bash scripting: multiple commands II

A quick example:

```bash
#!/bin/bash

if [ "$USER" == "root" ] || [ "$USER" == "mrshu" ]; then
    echo "You may proceed";
elif groups | grep -q sudo; then
    echo "Please become root to run this"
else
    echo "Sorry, only root or mrshu are allowed to run this";
fi
```

---

# Bash scripting: `while` loop

.left-eq-column[
Check the exit code of `cmd1`. If it is zero, execute `cmd2` and check again; the loop ends once `cmd1` returns a non-zero exit code.
]

.right-eq-column[
```bash
while cmd1; do
    cmd2;
done
```
]

--

.clear-both[
The script below waits until `firefox` is running.

```bash
#!/bin/bash

# `!` negates the exit code: keep looping while firefox is NOT found
while ! ps -ef | grep -v grep | grep -q firefox; do
    echo "Firefox not running, will check in 10 seconds"
    sleep 10
done
```
]

---

# Bash scripting: `for` loop

.left-eq-column[
Iterates over the list of values such that `i` is set to `val1`, `val2` and `val3` in turn.
]

.right-eq-column[
```bash
for i in val1 val2 val3; do
    echo $i
done
```
]

.clear-both[
.left-eq-column[
If we want to quickly generate a sequence of numbers, the `seq` command can come in handy:

```bash
$ seq 1 5
1
2
3
4
5
```
]

.right-eq-column[
```bash
$ cat iterator.sh
#!/bin/bash
for i in $(seq 1 5); do
    echo "Checking number $i"
done
```

```
$ bash iterator.sh
Checking number 1
Checking number 2
Checking number 3
Checking number 4
Checking number 5
```
]
]

---

# Bash scripting: `case`

.left-eq-column[
A shortcut, so that one does not have to write out so many `if`s.

Note the double semicolons (`;;`) -- they end each branch.
]

.right-eq-column[
```bash
case string in
    str1)
        cmd1;;
    str2)
        cmd2;;
    *)
        catchall-cmd;;
esac
```
]

.clear-both[
```bash
$ cat login.sh
#!/bin/bash
case "$1" in
    root)
        echo "Welcome, you can come in" ;;
    mrshu)
        echo "Please provide password" ;;
    *)
        echo "Name not recognized";;
esac
```

```
$ bash login.sh mrshu
Please provide password
$ bash login.sh root
Welcome, you can come in
$ bash login.sh vidriduch
Name not recognized
```
]

---
class: middle, center, inverse

# Useful commands

---

# `tar`

- creates a "package" from a filesystem path
- concatenates files and directories without compression

```bash
$ tar -cf `backup.tar` /home/mrshu
```

- extract the `tar` package to the current directory

```bash
$ tar -xf `backup.tar`
```

- `-O` (used with `-x`) sends the extracted file contents to stdout

---

# `gzip` compression

- `gzip file` - creates compressed file `file.gz` out of `file` (replacing `file`)
- `gunzip file.gz` - uncompresses `file.gz` and creates `file`
- `-c` sends the output to stdout

`gzip` is just one of the available compression methods; there are also `bzip2` and `xz`.

---

# `tar` + compression

- gzip

```bash
$ tar -c`z`f package.tar.gz path
$ tar -x`z`f package.tar.gz
```

- bzip2

```bash
$ tar -c`j`f package.tar.bz2 path
$ tar -x`j`f package.tar.bz2
```

- xz

```bash
$ tar -c`J`f package.tar.xz path
$ tar -x`J`f package.tar.xz
```

---

# `tar` + compression through pipe

With `gzip`:
```bash
$ cat file | gzip > tmp.gz
$ cat tmp.gz | gunzip > unzipped_file
$ gunzip -c tmp.gz | head
```

With `tar`:

```bash
$ cat package.tgz | gunzip | tar -x -O | less
$ cat package.tar.gz | tar -x -z -O | less
$ cat package.tar.bz2 | bunzip2 | tar -xO | tail
```
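
---

# `tar` + compression through pipe II

A minimal round-trip sketch of the pipelines above, assuming GNU or BSD `tar`. The `data` directory, `note.txt` and `package.tar.gz` are hypothetical names created only for this demo, and `-f -` is spelled out so that `tar` explicitly writes to stdout and reads from stdin rather than relying on its default archive.

```bash
#!/bin/bash
set -e                                   # stop at the first failing command

workdir=$(mktemp -d)                     # scratch directory for the demo
mkdir "$workdir/data"
echo "hello from tar" > "$workdir/data/note.txt"

# Pack: tar writes the archive to stdout (-f -), gzip compresses the stream
tar -c -f - -C "$workdir" data | gzip > "$workdir/package.tar.gz"

# Unpack: gunzip decompresses to stdout, tar reads stdin (-f -)
# and prints the extracted file contents to stdout (-O)
restored=$(gunzip -c "$workdir/package.tar.gz" | tar -x -O -f -)
echo "$restored"

rm -r "$workdir"                         # clean up the scratch directory
```

Because nothing is extracted to disk, this is a convenient way to peek into an archive from a script; the same pattern should work with `bzip2`/`bunzip2` in place of `gzip`/`gunzip`.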