name: inverse
layout: true
class: center, middle, inverse
---
# Processes and Signals

How work gets organized, directed and communicated about

.footnote[Marek Šuppa
Ondrej Jariabka
Adrián Matejov]

???

- https://cs.nyu.edu/~gottlieb/courses/2000s/2000-01-spring/os/chapters/chapter-2.html
- https://unix.bpowers.net/
- https://browsix.org/
- https://win95.ajf.me/
- https://jvns.ca/blog/2016/06/13/should-you-be-scared-of-signals/
- https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/
- https://chrispenner.ca/posts/wc
- https://kukuruku.co/post/processes-paralleling-to-speed-up-computing-and-tasks-execution-in-linux/
- https://idea.popcount.org/2012-12-11-linux-process-states/
- https://homepages.uc.edu/~thomam/Intro_Unix_Text/Process2.html
- https://tldp.org/LDP/intro-linux/html/sect_04_01.html#sect_04_01_02

---
layout: false

# Why UNIX-like for Data Science?

- There is a very good chance you'll work with significant data loads

--

- Often larger than the RAM available on your boxes

--

- Being able to understand, control and especially stop _what is going on_ (i.e. the process) will be critical

---
class: middle, inverse

# Processes

---

# History of Processes in Operating Systems

- batch operating systems

--

- single-process operating systems

--

- multi-process/time-sharing operating systems
  - this is where modern OSs fit in

---

# Processes in Linux

- identified by a unique **P**rocess **ID** (**PID**)
- and a ton of attributes:
  - the user the process belongs to (the user who runs it)
  - `PPID` - the **PID** of its parent process
  - start time (`STIME`)
  - the terminal it is associated with (`TTY`)
  - the amount of "CPU time" it consumed (`TIME`)
  - the actual shell command that started it (`CMD`)
  - state

---

# Process states

- `R`: running or runnable (on the run queue)
- `D`: uninterruptible sleep (usually I/O)
- `S`: interruptible sleep (waiting for an event to complete)
- `T`: stopped by a job control signal
- `Z`: defunct ("zombie") process, terminated but not reaped by its parent

---

# Process state diagram

.center[![:scale 80%](images/process-state-diagram.png)]

.font-small[Image from
https://cloudchef.medium.com/linux-process-states-and-signals-a967d18fab64]

---

# Detailed process information

The best source is probably the `status` file inside the process's `/proc` directory (named after its PID):

```
$ cat /proc/540038/status | head
Name:   ssh
Umask:  0022
State:  S (sleeping)
Tgid:   540038
Ngid:   0
Pid:    540038
PPid:   37564
TracerPid:      0
Uid:    1000    1000    1000    1000
Gid:    1000    1000    1000    1000
```

---

# Listing processes

- `ps`
  - by default lists the processes the user is running in the current terminal

```
$ ps
    PID TTY          TIME CMD
3583901 pts/39   00:00:00 bash
3583948 pts/39   00:00:00 ps
```

---

# Listing all processes

- `-e` (or `-A`) lists all processes
- `-f` produces a full-format listing

```
$ ps -e -f
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 Oct12 ?        00:00:05 /usr/lib/systemd/systemd --switched-root --system --deserialize 34
root           2       0  0 Oct12 ?        00:00:00 [kthreadd]
root           3       2  0 Oct12 ?        00:00:00 [rcu_gp]
root           4       2  0 Oct12 ?        00:00:00 [rcu_par_gp]
root           6       2  0 Oct12 ?        00:00:00 [kworker/0:0H-kblockd]
root           9       2  0 Oct12 ?        00:00:00 [mm_percpu_wq]
root          10       2  0 Oct12 ?        00:00:10 [ksoftirqd/0]
root          11       2  0 Oct12 ?        00:02:10 [rcu_sched]
root          12       2  0 Oct12 ?        00:00:00 [migration/0]
[ ... output omitted ... ]
mrshu     550723  550241  0 13:32 pts/9    00:00:00 ps -e -f
```

---

# Listing specific information

- `-o` lists specific fields, such as
  - `cmd`
  - `pid`
  - `ppid`
  - `state`
  - `user`
  - ... and many more you can find in the `man` page

```
$ ps -o pid,state,cmd
    PID S CMD
 550241 S -fish
 551637 S bash
 551709 R ps -o pid,state,cmd
```

---

# Listing the process hierarchy

- `-H` makes `ps` show the processes in a "tree" view.

```
$ ps -H
  47754 pts/13   00:00:00 zsh
  47827 pts/13   00:00:00   bash
 553122 pts/13   00:00:00     bash
 553182 pts/13   00:00:00       ps
```

--

In the listing above, `zsh` runs `bash`, which runs another `bash`, which in turn runs `ps`.

The same hierarchy can also be visualized using the `pstree` command.

---

# Listing processes by `PID`s

- `-p pidlist`
  - lists processes whose `PID`s are in the comma-separated `pidlist`

```
$ ps -p 552921,549013,547031
    PID TTY          TIME CMD
 547031 ?        00:00:05 python3
 549013 ?        00:00:08 firefox
 552921 ?        00:00:00 gnome-calendar
```

---

# Listing processes of specific users

- `-u userlist`
  - lists processes whose users are in the comma-separated `userlist`

```
$ ps -u joe123
    PID TTY          TIME CMD
 138565 pts/4    00:04:23 vimx
 138580 ?        00:00:51 python3
 138594 ?        00:00:38 python3
 138595 ?        00:00:16 python3
 138596 ?        00:00:25 python3
 138597 ?        00:00:18 python3
```

---
class: middle, inverse

# Signals

---

# Signals as a concept

- A way for processes to communicate
- Takes place at the kernel level (i.e. it's very fast)
- The bandwidth is limited though (you don't send a video this way)

---

# Signals Overview

- `SIGSTOP` (19)
  - suspend the process until it receives `SIGCONT` (18)
- `SIGHUP` (1) -- "**sig**nal **h**ang **up**"
  - historically it signaled a literal hang-up of the terminal's modem line
  - often used to make a long-running process re-initialize (e.g. the Apache HTTP Server)
  - in modern usage it means "the controlling terminal has closed"
- `SIGTERM` (15)
  - terminate a process gracefully
  - the process gets a chance to clean up before it terminates
- `SIGKILL` (9)
  - terminate a process
  - this signal cannot be caught or ignored, so the process just dies right away

--

Except for `SIGSTOP` and `SIGKILL`, programs can catch these signals and handle them in their own way.

---

# Sending Signals

On most Linux distributions, this is done via the `kill` command.

- `kill -s signal pid`
  - `signal` is the name of the signal (like `SIGKILL`)
  - `pid` is the PID of the process to send the signal to

```
$ kill -s SIGKILL 3215
```

There are also shorter versions:

```
$ kill -SIGKILL 3215
$ kill -KILL 3215
```

--

- Each signal also has its own numeric ID (in parentheses on the previous slide).
- These can be listed via `kill -L`

```
$ kill -L
 1 HUP      2 INT      3 QUIT     4 ILL      5 TRAP     6 ABRT
 6 IOT      7 BUS      8 FPE      9 KILL    10 USR1    11 SEGV
12 USR2    13 PIPE    14 ALRM    15 TERM    16 STKFLT  17 CHLD
17 CLD     18 CONT    19 STOP    20 TSTP    21 TTIN    22 TTOU
23 URG     24 XCPU    25 XFSZ    26 VTALRM  27 PROF    28 WINCH
29 IO      29 POLL    30 PWR     31 SYS     34 RTMIN   64 RTMAX
```

???
Once again, this is well within the UNIX/POSIX philosophy. Shorter yet expressive is better than verbose and redundant, mostly because typing used to be rather expensive.

---

# Process Termination with Signals

- The standard approach is to first send `SIGTERM` (15) to the process we want to terminate
  - this gives the process a chance to finish up cleanly

```
$ kill -15 3215
```

--

- If the process still does not terminate, bring out the big guns by sending `SIGKILL` (9)

```
$ kill -9 3215
```

--

- `killall process`
  - kills processes by name (`process`)
  - sends `SIGTERM` by default
  - a different signal can be specified just as with `kill`

```
$ killall firefox
```

--

And if that does not help...

```
$ killall -9 firefox
```

---
class: middle, inverse

# Job control in Bash

In other words, how to use signals to control processes from Bash

---

# Stop and suspend a running process

- A "normal" process in `bash` is said to be started in the foreground
  - that is, it reads from and writes to the terminal
- `Ctrl+C`
  - sends the `SIGINT` (2) signal (similar to `SIGTERM`)
  - interrupts the running process and generally makes it stop
- `Ctrl+Z`
  - sends the `SIGTSTP` (20) signal (similar to `SIGSTOP`)
  - suspends the program and returns to the shell

---

# Foreground and background processes

Let's consider a long-running command like `cp movie.mp4 ~/Movies`

- `cp movie.mp4 ~/Movies`
  - runs the command in the _foreground_
  - until it finishes, it is not possible to run (or even type) any further command
  - `Ctrl+C` will terminate it
  - `Ctrl+Z` will "suspend" the process (it will be stopped)
    - once stopped, `bg` will resume its execution in the background
    - conversely, `fg` will resume its execution in the foreground

--

- `cp movie.mp4 ~/Movies &`
  - runs the command in the background from the start
  - the shell is available straight away
  - `Ctrl+C` won't work on it (it is not in the foreground)
  - `fg` will bring it to the foreground

---

# Job control with `jobs`

- `jobs`
  - a Bash built-in command (not a separate program of the operating system)
  - lists the jobs started from the current shell
  - each job has its own ID (in brackets)
    - these can be used to reference it in `fg`, `bg` or `kill`, e.g. `fg %1`
  - by default, `fg` and `bg` act on the current job, marked with `+` in the `jobs` output

--

```
$ man ps
[1]+  Stopped                 man ps
$ eog &
[2] 32165
$ jobs
[1]+  Stopped                 man ps
[2]-  Running                 eog &
$ kill -15 %2
[2]-  Terminated              eog
$ fg
man ps
```

---
class: middle, inverse

# Useful commands

---

# `wc`

- Stands for "word count" (despite what the name may suggest...).
- Shows the number of lines, words and bytes in a file
  - `-l`, `-w` and `-c` print only one of these counts; `-m` counts characters instead of bytes

```
$ wc /etc/passwd
54 134 3062 /etc/passwd
$ wc -l /etc/passwd
54 /etc/passwd
$ wc -w /etc/passwd
134 /etc/passwd
$ wc -m /etc/passwd
3062 /etc/passwd
```

--

Works with data piped in from other commands as well:

```
$ cat /etc/passwd | wc -m
3062
```
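---

# What `wc` actually counts

A small sketch of two details that matter when counting records in large files, assuming `bash` and either the GNU or the BSD `wc`: `-l` counts newline characters (so a last line without a trailing newline is not counted), and `-c` counts bytes. The variable names are purely illustrative; the `$(( ))` wrapper only strips the padding that some `wc` versions print.

```shell
#!/usr/bin/env bash
# Illustrative sketch: what does `wc` count?
# $(( ... )) turns wc's output into a plain number, dropping the leading
# spaces that some wc implementations (e.g. BSD/macOS) print.

# Two lines, each terminated by a newline character: wc -l reports 2.
with_newline=$(( $(printf 'alpha\nbeta\n' | wc -l) ))

# Same text without the final newline: wc -l counts newlines, so it reports 1.
without_newline=$(( $(printf 'alpha\nbeta' | wc -l) ))

# -c counts bytes: "alpha" (5) + "\n" (1) + "beta" (4) + "\n" (1) = 11.
bytes=$(( $(printf 'alpha\nbeta\n' | wc -c) ))

echo "lines (trailing newline):    $with_newline"
echo "lines (no trailing newline): $without_newline"
echo "bytes:                       $bytes"
```

Running it should print 2, 1 and 11. The first two numbers explain why `wc -l` on a file whose last line lacks a trailing newline reports one line fewer than a text editor would show.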