name: inverse
layout: true
class: center, middle, inverse
---
# Users, Groups and Regular Expressions

User management and the most useful tool UNIX can give you

.footnote[Marek Šuppa
Ondrej Jariabka
Adrián Matejov]

---
layout: false

# Why UNIX-like for Data Science?

If for nothing else, it's worth it for **regular expressions**.

> Knowing [regular expressions] can mean the difference between solving a problem in 3 steps and solving it in 3,000 steps. When you’re a nerd, you forget that the problems you solve with a couple keystrokes can take other people days of tedious, error-prone work to slog through.

-- Cory Doctorow
https://www.theguardian.com/technology/2012/dec/04/ict-teach-kids-regular-expressions

---
class: middle, inverse

# Users and Groups

---

# Users

- UNIX was devised with "collaboration in mind"
- The concept of users plays a central role

--

- Same thing with Linux: it is a multi-user OS
- Each user is identified with a `UID`
- Their actions (e.g. started processes or created files) are associated with this `UID`

???

- You know, sharing is caring and all that.
- In principle, UNIX has been built so that people could collaborate on documents, something basically unheard of in the 1970s

---

# How do I become a user?

Via logging in. Two things need to happen:

1. Identification
   - By passing in the username
2. Authentication
   - By providing a password
   - Or other methods like SSH/HW crypto keys

---

# Where is info about users stored?

In general, two files:

- `/etc/passwd`
  - Can be read by everyone
- `/etc/shadow`
  - Can only be read by root (or "special users")
  - Actually contains the hashed passwords

???

The concept of shadowing came from the need to make the password hashes a bit more secure -- so that they could not be bruteforced by a random user capable of logging in.

Linux was kind of lucky: shadowing was ported there very early and has basically stayed ever since.

---

# `/etc/passwd`

A file full of colon (`:`) delimited fields like

```bash
jsmith`:`x`:`1001`:`1000`:`Joe Smith,Room 7,(234)555-8910,j@smi.th`:`/home/jsmith`:`/bin/sh
```

--

Each field has a specific meaning:

1. `jsmith`: the username (generally lowercase)

--

2. `x`: password (the `x` here means the password is in `/etc/shadow`)

--

3. `1001`: the user's `UID`

--

4. `1000`: the user's primary `GID` (Group ID)

--

5. `Joe Smith,Room 7,(234)555-8910,j@smi.th`: some further (contact) details about the user

--

6. `/home/jsmith`: home directory path

--

7. `/bin/sh`: user's default shell

???

The 5th field is actually the GECOS field (https://en.wikipedia.org/wiki/Gecos_field) -- a historical curiosity

---

# `/etc/shadow`

Similar to `/etc/passwd` in format, for example

```bash
jsmith:`$6$rTDC8QprwvDu`.:15377:0:99999:7:::
daemon:`*`:17206:0:99999:7:::
```

--

Once again, each field has a specific meaning:

1. `jsmith`: the username
2. the hashed password
   - empty: empty password
   - `!` or `*`: account is password locked, login only possible via other means (SSH)
   - `!!`: password not set yet
3. `15377`: day of last password change
4. `0`: days until change allowed
5. `99999`: days until change required
6. `7`: days warning for expiration

--

The day of the last password change is counted in days since the "beginning of the UNIX epoch": **1 January 1970**. The remaining numbers are durations in days.

---

# Groups

- A useful concept for allowing groups of users to access a set of resources
  - Could be files, special devices (printers, GPUs ...)
or programs

--

- Uniquely identified by a `GID`
- Can have an access password (quite uncommon these days)
- From its point of view there are
  - **users**: those that are associated with / part of it
  - **others**: everyone else
- Information about them is stored in `/etc/group` and `/etc/gshadow`

---

# `/etc/group` and `/etc/gshadow`

- `/etc/group`

```
sudo:x:3:mrshu,vidriduch,adman
lp:x:7:daemon,lp,mrshu
```

  - name
  - password (or `x`, in which case it is shadowed)
  - `GID`
  - comma separated list of usernames

--

- `/etc/gshadow`

```
sudo:!::
lp:!!::
```

  - name
  - password (or `!`, `!!`, `*`)
  - list of administrators
  - list of users

---

# User groups

- Each user can be in multiple groups
- Just one of them is primary (its `GID` is right after `UID` in `/etc/passwd`)

--

- We can get the list of groups we are in by running the `groups` command:

```
$ groups
mrshu sudo lp
```

- To get the groups of other users, pass their username as a parameter

```
$ groups adman
adman : adman sudo
```

---

# `root` user

- an account for the system administrator
- in the UNIX security model, the `root` user is considered "all-powerful"
- this user traditionally has `UID` 0 and home directory `/root`
- it is also associated with a specific `root` group (`GID` is also 0)

--

## `sudo`

- stands for "superuser do" or "substitute user do"
- allows "normal" users to run commands as `root`
- only for users specified in its configuration (`/etc/sudoers`)
- sometimes it is enough to be part of a special group (like `sudo`)

---

# Useful commands

- `id` - find out what your current identity is (along with `UID` and `GID`s)

```
$ id
uid=1001(mrshu) gid=1001(mrshu) groups=1001(mrshu),27(sudo)
```

--

- `su USER` - change to some other `USER` (abbreviation of "substitute user")
  - if called without arguments, assumes that `USER` is `root`
  - if you know `root`'s password, this is how you can get `root` privileges
  - `su -` is effectively the same thing as logging in as a different user

--

- `passwd` - change your UNIX
password
  - `root` can also use it to change passwords of other users (`passwd USER`)

---
class: middle, inverse

# Regular Expressions

---

# Regular Expressions

- aka "regex" or "regexp"
- a quick way of describing a particular pattern of characters in text
- allows for extremely effective search and replace

--

- can be found everywhere on *NIX systems, but especially in text editors
- comes from the `ed` editor but you'll mostly encounter the `grep` program

--

- in general `grep` outputs lines which match a given regex pattern

???

The name grep itself comes from the `ed` command:

> “One afternoon I asked Ken Thompson if he could lift the regular expression recognizer out of the editor and make a one-pass program to do it. He said yes. The next morning I found a note in my mail announcing a program named grep. It worked like a charm. When asked what that funny name meant, Ken said it was obvious. It stood for the editor command that it simulated, g/re/p (global regular expression print).”

-- [Chapter 9, On the Early History and Impact of Unix Tools to Build the Tools for a New Millenium](http://www.columbia.edu/~rh120/ch001j.c11)

https://medium.com/@rualthanzauva/grep-was-a-private-command-of-mine-for-quite-a-while-before-i-made-it-public-ken-thompson-a40e24a5ef48

---

# Using the `grep` command

**Task**: show lines in `file.txt` that match the regular expression `regexp`.
--

There are various ways of doing it:

- file as an argument
  - `grep "regexp" file.txt`
- file redirected to standard input
  - `grep "regexp" < file.txt`
- data passed from pipe
  - `cat file.txt | grep "regexp"`

---

# RegExp Patterns

```bash
$ cat file.txt
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8
```

- character(s)

```bash
$ cat file.txt | grep o
2 j`o`e2
3 m`o`lly13
4 nem`o`7
5 r`o`b5
6 r`o`y8
```

- strings of characters

```bash
$ cat file.txt | grep mo
3 `mo`lly13
4 ne`mo`7
```

---

# RegExp Patterns: Dot

```bash
$ cat file.txt
a.smith1
joe2
molly13
nemo7
rob5
roy8
```

- any character (denoted by a dot `.`)

```bash
$ cat file.txt | grep "o.."
j`oe2`
m`oll`y13
r`ob5`
r`oy8`
```

- an explicit dot can be expressed as `\.`

```bash
$ cat file.txt | grep "\."
a`.`smith1
```

---

# RegExp Patterns: Character Classes

```bash
$ cat file.txt
a.smith1
joe2
molly13
nemo7
rob5
roy8
```

- a class of characters (denoted `[]`)
  - "find all lines which contain `2`, `3` or `5`"

```bash
$ cat file.txt | grep "[235]"
joe`2`
molly1`3`
rob`5`
```

--

  - "find all lines where `o` is followed by either `e` or `y`"

```bash
$ cat file.txt | grep "o[ey]"
j`oe`2
r`oy`8
```

---

# RegExp Patterns: Ranges I

```bash
$ cat file.txt
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8
```

- character classes can also be specified as ranges (e.g.
`[a-z]` or `[0-9]`)
  - "find all lines with three lowercase letters (`[a-z]`) followed by a digit from `4` to `9`"

```bash
$ cat file.txt | grep "[a-z][a-z][a-z][4-9]"
4 n`emo7`
5 `rob5`
6 `roy8`
```

- the repetition can be denoted with a number in escaped curly braces `\{\}` (plain `{}` needs `grep -E`)

```bash
$ cat file.txt | grep "[a-z]\{3\}[4-9]"
4 n`emo7`
5 `rob5`
6 `roy8`
```

---

# RegExp Patterns: Ranges II

```bash
$ cat file.txt
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8
```

- invert the class by putting `^` at the beginning of the definition (`[^ ]`)
  - "find all lines with three lowercase letters (`[a-z]`) followed by a character that is **not** a digit from `4` to `9`"

```bash
$ cat file.txt | grep "[a-z][a-z][a-z][^4-9]"
1 a.`smit`h1
2 `joe2`
3 `moll`y13
4 `nemo`7
```

---

# RegExp Patterns: Repetitions

.left-eq-column[
```bash
$ cat text.txt
So, looking at the lock or the silk?
```

Repetitions can be applied to any character or character class.
]

.right-eq-column[
Three basic repetition operators:

- `\?`: match once or not at all
- `\+`: match **one or more** times
- `*`: match **zero or more** times
]

--

.clear-both[

---------

Match all `l`s followed by zero or one `o`:

```bash
$ cat text.txt | grep "lo\?"
So, `lo`oking at the `lo`ck or the si`l`k?
```

Match all `l`s followed by one or more `o`s:

```bash
$ cat text.txt | grep "lo\+"
So, `loo`king at the `lo`ck or the silk?
```

Match all `l`s followed by zero or more `o`s:

```bash
$ cat text.txt | grep "lo*"
So, `loo`king at the `lo`ck or the si`l`k?
```
]

---

# RegExp Patterns: Anchors

.left-eq-column[
```bash
$ cat file.txt
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8
```
]

.right-eq-column[
Anchors are two very important "special characters":

- `^`: match the beginning of the line
- `$`: match the end of the line
]

--

.clear-both[
.left-eq-column[
Find numbers at the beginning:

```bash
$ cat file.txt | grep "^[0-9]\+"
`1` a.smith1
`2` joe2
`3` molly13
`4` nemo7
`5` rob5
`6` roy8
```
]

.right-eq-column[
Find numbers at the end:

```bash
$ cat file.txt | grep "[0-9]\+$"
1 a.smith`1`
2 joe`2`
3 molly`13`
4 nemo`7`
5 rob`5`
6 roy`8`
```
]
]

---

# Using the `grep` command II

.left-eq-column[
- `grep PATTERNS FILE` - prints lines that match patterns
  - `-i`: make the search case-insensitive (**i**gnore-case)
  - `-v`: print lines that do not match the pattern (in**v**ert)
  - `-o`: output only the matched part of the line (**o**nly)
  - `-n`: include the line number in the output (**n**umber)
]

.right-eq-column[
```bash
$ cat file.txt
a.smith1
joe2
molly13
nemo7
rob5
roy8
$ cat file.txt | grep "[0-5]\$" -n
1:a.smith`1`
2:joe`2`
3:molly1`3`
5:rob`5`
$ cat file.txt | grep "[0-5]\$" -n -v
4:nemo7
6:roy8
$ echo "Hello World!" | grep -i world
Hello `World`!
$ echo "Hello World!" | grep -i world -o
World
```
]

---
class: middle, inverse

# Useful Commands

`cut` and `paste`

---

# `cut`

- cut out a field from a text file, based on some separator
  - `-d DELIM` - set a specific delimiter (TAB by default)
  - `-f FIELDS` - specify fields (starting from 1) to cut out
    - can be a number (like `-f 2`) or a list (like `-f 2,5`)
    - or a range (like `-f 2-4`)

```bash
$ cut /etc/group -f 3 -d: | tail -n 5
972
84
971
970
969
$ cut /etc/group -f 1,3 -d: | tail -n 5
flatpak:972
screen:84
firebird:971
nm-fortisslvpn:970
docker:969
```

---

# `paste`

.left-eq-column[
- join files horizontally (like horizontal `cat`)
  - `-d` sets the delimiter (TAB by default)
  - `-s` appends data in **s**erial rather than in parallel
]

.right-eq-column[
```bash
$ cat names.txt
Mark Smith
Bobby Brown
Sue Miller
Jenny Igotit
$ cat numbers.txt
555-1234
555-9876
555-6743
867-5309
```
]

.clear-both[
.left-eq-column[
```bash
$ paste names.txt numbers.txt
Mark Smith      555-1234
Bobby Brown     555-9876
Sue Miller      555-6743
Jenny Igotit    867-5309
```
]

.right-eq-column[
```bash
$ paste -d, names.txt numbers.txt
Mark Smith,555-1234
Bobby Brown,555-9876
Sue Miller,555-6743
Jenny Igotit,867-5309
```
]
]

.clear-both[
```bash
$ paste -s names.txt numbers.txt
Mark Smith      Bobby Brown     Sue Miller      Jenny Igotit
555-1234        555-9876        555-6743        867-5309
```
]

???

Example taken straight from the great Wikipedia: https://en.wikipedia.org/wiki/Paste_(Unix)
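---

# `cut` and `paste` together

A minimal sketch of how the two commands can be combined, assuming scratch copies of the `names.txt` and `numbers.txt` files from the `paste` slide. The temporary directory, the `joined.csv` file name and the variable names are made up for illustration.

```bash
# Recreate the two sample files in a scratch directory (hypothetical location).
tmp=$(mktemp -d)
printf 'Mark Smith\nBobby Brown\nSue Miller\nJenny Igotit\n' > "$tmp/names.txt"
printf '555-1234\n555-9876\n555-6743\n867-5309\n' > "$tmp/numbers.txt"

# paste: glue the files together line by line, using a comma as the delimiter.
paste -d, "$tmp/names.txt" "$tmp/numbers.txt" > "$tmp/joined.csv"

# cut: pull the second comma-separated field (the numbers) back out.
numbers=$(cut -d, -f2 "$tmp/joined.csv")

# paste -s: put all lines of a single file on one line, separated by commas.
names_row=$(paste -s -d, "$tmp/names.txt")

printf '%s\n' "$numbers" "$names_row"

# Clean up the scratch directory.
rm -r "$tmp"
```

Here `cut -d, -f2` undoes what `paste -d,` did, which is a quick way to check that the delimiter was chosen sensibly: it only works as long as the delimiter does not occur inside the data itself.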