
Users, Groups and Regular Expressions

User management and the most useful tool UNIX can give you

Marek Šuppa
Ondrej Jariabka
Adrián Matejov

1 / 49

Why UNIX-like for Data Science?

If for nothing else, it's worth it for regular expressions.

Knowing [regular expressions] can mean the difference between solving a problem in 3 steps and solving it in 3,000 steps. When you’re a nerd, you forget that the problems you solve with a couple keystrokes can take other people days of tedious, error-prone work to slog through.

-- Cory Doctorow

https://www.theguardian.com/technology/2012/dec/04/ict-teach-kids-regular-expressions

2 / 49

Users and Groups

3 / 49

Users

  • UNIX was devised with "collaboration in mind"

  • The concept of users plays a central role

  • Same thing with Linux: it is a multi-user OS

  • Each user is identified by a UID

    • Their actions (i.e. started processes or created files) are associated with this UID
5 / 49
  • You know, sharing is caring and all that.

  • In principle, UNIX was built so that people could collaborate on documents, something basically unheard of in the 1970s

How do I become a user?

Via logging in. Two things need to happen:

  1. Identification

    • By passing in the username
  2. Authentication

    • By providing a password
    • Or other methods like SSH/HW crypto keys
6 / 49

Where is info about users stored?

In general, two files:

  • /etc/passwd

    • Can be read by everyone
  • /etc/shadow

    • Can only be read by root (or "special users")
    • Actually contains the hashed passwords
7 / 49

The concept of shadowing came from the need to make the password hashes a bit more secure -- so that they could not be bruteforced by a random user capable of logging in.

Linux was kind of lucky: shadowing was ported there very early and has basically stayed ever since.

/etc/passwd

A file full of colon (:) delimited fields like

jsmith:x:1001:1000:Joe Smith,Room 7,(234)555-8910,j@smi.th:/home/jsmith:/bin/sh

Each field has a specific meaning:

  1. jsmith: the username (generally lowercase)

  2. x: password (the x here means the hashed password is stored in /etc/shadow)

  3. 1001: the user's UID

  4. 1000: the user's primary GID (Group ID)

  5. Joe Smith,Room 7,(234)555-8910,j@smi.th: some further (contact) details about the user

  6. /home/jsmith: home directory path

  7. /bin/sh: user's default shell
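
The field layout above can be tried out directly in the shell. This is a minimal sketch, assuming a POSIX sh; the line is the sample record from the slide (not read from a real /etc/passwd) and the variable names p_user, p_uid and so on are made up for illustration.

```shell
# Sample record copied from the slide, not read from a real /etc/passwd.
line='jsmith:x:1001:1000:Joe Smith,Room 7,(234)555-8910,j@smi.th:/home/jsmith:/bin/sh'

# Split the record on ':' into its seven fields (illustrative variable names).
IFS=: read -r p_user p_pass p_uid p_gid p_gecos p_home p_shell <<EOF
$line
EOF

echo "user=$p_user uid=$p_uid gid=$p_gid"
echo "home=$p_home shell=$p_shell"
echo "gecos=$p_gecos"
```

Setting IFS only for the read command keeps the change local to that one line, so the rest of the script still splits on whitespace.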

15 / 49

The 5th field is actually the GECOS field (https://en.wikipedia.org/wiki/Gecos_field) -- a historical curiosity

/etc/shadow

Similar to /etc/passwd in format, for example

jsmith:$6$rTDC8QprwvDu.:15377:0:99999:7:::
daemon:*:17206:0:99999:7:::

Once again, each field has a specific meaning:

  1. jsmith: the username
  2. the hashed password
    • empty: empty password
    • ! or *: account is password locked, login only possible via other means (SSH)
    • !!: password not set yet
  3. 15377: day of last password change
  4. 0: days until change allowed
  5. 99999: days until change required
  6. 7: days warning for expiration

Only the day of the last password change (field 3) is counted from the "beginning of the UNIX epoch", 1 January 1970; fields 4 to 6 are lengths of time in days.
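
To make the day count from the jsmith example more tangible, it can be turned into a calendar date. This is a rough sketch: the variable names are illustrative, and the last step assumes GNU date, which accepts "@SECONDS" after -d (other date implementations use different options).

```shell
# Field 3 from the jsmith example line: days since 1 January 1970.
last_change_days=15377

# One day has 86400 seconds, so this is the matching UNIX timestamp.
last_change_secs=$((last_change_days * 86400))
echo "$last_change_secs"

# Assumption: GNU date. Fall back to a message where -d @SECONDS is unsupported.
date -u -d "@$last_change_secs" +%Y-%m-%d 2>/dev/null \
    || echo "this date implementation does not support -d @SECONDS"
```

With GNU date this prints 2012-02-07, the day on which the example password was last changed.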

18 / 49

Groups

  • A useful concept for allowing groups of users to access a set of resources

    • Could be files, special devices (printers, GPUs ...) or programs
  • Uniquely identified by a GID

  • Can have an access password (quite uncommon these days)

  • From its point of view there are

    • users: those that are associated with / part of it
    • others: everyone else
  • Information about them is stored in /etc/group and /etc/gshadow

20 / 49

/etc/group and /etc/gshadow

  • /etc/group

    sudo:x:3:mrshu,vidriduch,adman
    lp:x:7:daemon,lp,mrshu
    • name
    • password (or x, in which case it is shadowed)
    • GID
    • comma separated list of usernames
  • /etc/gshadow

    sudo:!::
    lp:!!::
    • name
    • password (or !, !!, *)
    • list of administrators
    • list of users
22 / 49

User groups

  • Each user can be in multiple groups

  • Just one of them is primary (its GID is right after UID in /etc/passwd)

  • We can get the list of groups we are in by running the groups command:

$ groups
mrshu sudo lp
  • To get the groups of other users, pass their username as a parameter
$ groups adman
adman : adman sudo
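
Part of what groups reports can be reconstructed by hand from data in the /etc/group format. This is only a sketch under assumptions: the data is the two sample lines from the earlier slide, and supplementary_groups is a made-up helper name, not a standard command.

```shell
# Sample /etc/group content taken from the earlier slide.
group_data='sudo:x:3:mrshu,vidriduch,adman
lp:x:7:daemon,lp,mrshu'

# Hypothetical helper: print the names of groups whose member list mentions
# the given user. The user name must be preceded by ':' or ',' and followed
# by ',' or the end of the line.
supplementary_groups() {
    printf '%s\n' "$group_data" | grep -E "[:,]$1(,|\$)" | cut -d: -f1
}

supplementary_groups mrshu
supplementary_groups adman
```

Unlike the real groups command, this does not list the primary group (adman for the user adman), since that one comes from the GID field of /etc/passwd rather than from the member lists.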
24 / 49

root user

  • an account for system administrator

  • in the UNIX security model, the root user is considered "all-powerful"

  • this user traditionally has UID 0 and home directory /root

  • it is also associated with a specific root group (GID is also 0)

sudo

  • stands for "superuser do" or "substitute user do"

  • allows "normal" users to run commands as root

  • only for users specified in its configuration (/etc/sudoers)

    • sometimes it is enough to be part of a special group (like sudo)
26 / 49

Useful commands

  • id

    • find out what your current identity is (along with UID and GIDs)
$ id
uid=1001(mrshu) gid=1001(mrshu) groups=1001(mrshu),27(sudo)
  • su USER

    • change to some other USER (abbreviation of "substitute user")
    • if called without arguments, assumes that USER is root
    • if you know the root's password, this is how you can get root privileges
    • su - is effectively the same thing as logging in as a different user
  • passwd

    • change your UNIX password
    • root can also use it to change passwords of other users (passwd USER)
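
In scripts, id is usually called with options that print a single piece of the identity. A small sketch, assuming a POSIX id; the variable names are illustrative.

```shell
# id -u prints only the numeric UID, id -un only the user name,
# and id -Gn the names of all groups the user belongs to.
my_uid=$(id -u)
my_name=$(id -un)
my_groups=$(id -Gn)

echo "running as $my_name (UID $my_uid), groups: $my_groups"

# UID 0 is root, so comparing against 0 is a common test for root privileges.
if [ "$my_uid" -eq 0 ]; then
    echo "this shell has root privileges"
else
    echo "this shell runs as a regular user"
fi
```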
29 / 49

Regular Expressions

30 / 49

Regular Expressions

  • aka "regex" or "regexp"

  • a quick way of describing a particular pattern of characters in text

  • allows for extremely effective search and replace

  • can be found everywhere on *NIX systems, but especially in text editors

  • comes from the ed editor, but you'll mostly encounter it through the grep program

  • in general grep outputs lines which match a given regex pattern

33 / 49

The name grep itself comes from the ed command:

“One afternoon I asked Ken Thompson if he could lift the regular expression recognizer out of the editor and make a one-pass program to do it. He said yes. The next morning I found a note in my mail announcing a program named grep. It worked like a charm. When asked what that funny name meant, Ken said it was obvious. It stood for the editor command that it simulated, g/re/p (global regular expression print).”

-- Chapter 9, On the Early History and Impact of Unix Tools to Build the Tools for a New Millenium

https://medium.com/@rualthanzauva/grep-was-a-private-command-of-mine-for-quite-a-while-before-i-made-it-public-ken-thompson-a40e24a5ef48

Using the grep command

Task: show lines in file.txt that match the regular expression regexp.

There are various ways of doing it:

  • file as an argument

    • grep "regexp" file.txt
  • input redirected from the file to standard input

    • grep "regexp" < file.txt
  • data passed through a pipe

    • cat file.txt | grep "regexp"
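
A quick way to convince yourself that the three forms are interchangeable is to run them side by side. This is a sketch with a throwaway file in a temporary directory; the file content and variable names are only illustrative.

```shell
workdir=$(mktemp -d)
printf 'a.smith1\njoe2\nmolly13\nnemo7\n' > "$workdir/file.txt"

# 1. file as an argument
as_argument=$(grep "mo" "$workdir/file.txt")
# 2. file redirected to standard input
via_redirect=$(grep "mo" < "$workdir/file.txt")
# 3. data passed through a pipe
via_pipe=$(cat "$workdir/file.txt" | grep "mo")

echo "$as_argument"
[ "$as_argument" = "$via_redirect" ] && [ "$via_redirect" = "$via_pipe" ] \
    && echo "all three forms print the same lines"

rm -r "$workdir"
```

One practical difference: only in the first form does grep know the file name, which matters once several files are searched at once.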
35 / 49

RegExp Patterns

$ cat file.txt
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8
  • character(s)

    $ cat file.txt | grep o
    2 joe2
    3 molly13
    4 nemo7
    5 rob5
    6 roy8
  • strings of characters

    $ cat file.txt | grep mo
    3 molly13
    4 nemo7
36 / 49

RegExp Patterns: Dot

$ cat file.txt
a.smith1
joe2
molly13
nemo7
rob5
roy8
  • any character (denoted by a dot .)
    $ cat file.txt | grep "o.."
    joe2
    molly13
    rob5
    roy8
  • an explicit dot can be expressed as \.
    $ cat file.txt | grep "\."
    a.smith1
37 / 49

RegExp Patterns: Character Classes

$ cat file.txt
a.smith1
joe2
molly13
nemo7
rob5
roy8
  • a class of characters (denoted [])

    • "find all lines which contain 2, 3 or 5"

      $ cat file.txt | grep "[235]"
      joe2
      molly13
      rob5
    • "find all lines where o is followed by either e or y"

      $ cat file.txt | grep "o[ey]"
      joe2
      roy8
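
Patterns with square brackets are safest inside quotes, because the shell treats an unquoted [235] as a file name pattern of its own. The following sketch shows the effect in a temporary directory, assuming sh or bash; the file named 2 is created purely for the demonstration.

```shell
workdir=$(mktemp -d)
printf 'a.smith1\njoe2\nmolly13\nnemo7\nrob5\nroy8\n' > "$workdir/file.txt"

# A file whose name happens to match the shell pattern [235].
touch "$workdir/2"

# Unquoted: the shell replaces [235] with the file name 2 before grep runs,
# so grep only searches for the digit 2.
unquoted=$(cd "$workdir" && grep [235] file.txt)

# Quoted: grep receives the character class itself.
quoted=$(cd "$workdir" && grep "[235]" file.txt)

echo "unquoted: $unquoted"
echo "quoted:"
echo "$quoted"

rm -r "$workdir"
```

Without the stray file both commands behave the same in sh and bash, which is why the problem tends to show up only occasionally.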
39 / 49

RegExp Patterns: Ranges I

$ cat file.txt
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8
  • character classes can also be specified as ranges (e.g. [a-z] or [0-9])

    • "find all lines with three lowercase letters ([a-z]) followed by a digit from 4 to 9"
      $ cat file.txt | grep "[a-z][a-z][a-z][4-9]"
      4 nemo7
      5 rob5
      6 roy8
    • the repetition can be denoted more compactly with a count in curly braces, written \{ \} in grep's default (basic) syntax
      $ cat file.txt | grep "[a-z]\{3\}[4-9]"
      4 nemo7
      5 rob5
      6 roy8
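
How the curly braces have to be written depends on the regex flavour grep is using. A short sketch comparing the variants on the sample data from this slide; the variable names are illustrative.

```shell
sample='1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8'

# Basic syntax (grep's default): the braces are escaped.
basic=$(printf '%s\n' "$sample" | grep "[a-z]\{3\}[4-9]")

# Extended syntax (grep -E): plain braces.
extended=$(printf '%s\n' "$sample" | grep -E "[a-z]{3}[4-9]")

# Plain braces in basic syntax are ordinary characters, so nothing matches.
literal=$(printf '%s\n' "$sample" | grep "[a-z]{3}[4-9]" || true)

echo "$basic"
[ "$basic" = "$extended" ] && echo "basic and extended forms agree"
[ -z "$literal" ] && echo "unescaped braces matched nothing in basic syntax"
```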
40 / 49

RegExp Patterns: Ranges II

$ cat file.txt
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8
  • invert the class by putting ^ at the beginning of the definition ([^ ])

    • "find all lines with three lowercase letters ([a-z]) followed by a character that is not a digit from 4 to 9"
      $ cat file.txt | grep "[a-z][a-z][a-z][^4-9]"
      1 a.smith1
      2 joe2
      3 molly13
      4 nemo7
    • 4 nemo7 still matches, because nem is followed by o
41 / 49

RegExp Patterns: Repetitions

$ cat text.txt
So, looking at the lock or the silk?

Repetitions can be applied to any character or character class.

Three basic repetition operators:

  • \?: match once or not at all
  • \+: match one or more times
  • *: match zero or more times

Match every l followed by zero or one o:

$ cat text.txt | grep "lo\?"
So, looking at the lock or the silk?

Match every l followed by one or more o:

$ cat text.txt | grep "lo\+"
So, looking at the lock or the silk?

Match every l followed by zero or more o:

$ cat text.txt | grep "lo*"
So, looking at the lock or the silk?
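
Since all three commands print the same line, the difference between the operators is easier to see with -o, which prints only the matched parts. A sketch assuming GNU grep, where \? and \+ are available in the default syntax; the variable names are illustrative.

```shell
text='So, looking at the lock or the silk?'

# l followed by zero or one o: prints lo, lo and l (one per line)
zero_or_one=$(printf '%s\n' "$text" | grep -o "lo\?")

# l followed by one or more o: prints loo and lo
one_or_more=$(printf '%s\n' "$text" | grep -o "lo\+")

# l followed by zero or more o: prints loo, lo and l
zero_or_more=$(printf '%s\n' "$text" | grep -o "lo*")

echo "zero or one:";  echo "$zero_or_one"
echo "one or more:";  echo "$one_or_more"
echo "zero or more:"; echo "$zero_or_more"
```

The l in silk has no o after it, so it only shows up for the two operators that allow zero repetitions.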
43 / 49

RegExp Patterns: Anchors

$ cat file.txt
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8

Anchors are two very important "special characters":

  • ^: match the beginning of the line
  • $: match the end of the line

Find numbers at the beginning:

$ cat file.txt | grep "^[0-9]\+"
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8

Find numbers at the end:

$ cat file.txt | grep "[0-9]\+$"
1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8
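
Both commands print every line here, so the anchors are easier to tell apart when only the matched text is shown with -o. A sketch on the same sample data, assuming GNU grep for \+ and -o; the variable names are illustrative.

```shell
sample='1 a.smith1
2 joe2
3 molly13
4 nemo7
5 rob5
6 roy8'

# Digits anchored to the beginning of each line.
leading=$(printf '%s\n' "$sample" | grep -o '^[0-9]\+')

# Digits anchored to the end of each line.
trailing=$(printf '%s\n' "$sample" | grep -o '[0-9]\+$')

echo "at the beginning:"; echo "$leading"
echo "at the end:";       echo "$trailing"
```

The first command picks out the leading numbers 1 to 6, the second the trailing 1, 2, 13, 7, 5 and 8.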
45 / 49

Using the grep command II

  • grep PATTERNS FILE

    • prints lines that match patterns

    • -i: make the search case-insensitive (ignore-case)

    • -v: print lines that do not match the pattern (invert)

    • -o: output only the matched part of the line (only)

    • -n: include the line number in the output (number)

$ cat file.txt
a.smith1
joe2
molly13
nemo7
rob5
roy8
$ cat file.txt | grep "[0-5]\$" -n
1:a.smith1
2:joe2
3:molly13
5:rob5
$ cat file.txt | grep "[0-5]\$" -n -v
4:nemo7
6:roy8
$ echo "Hello World!" | grep -i world
Hello World!
$ echo "Hello World!" | grep -i world -o
World
46 / 49

Useful Commands

cut and paste

47 / 49

cut

  • cut out a field from a text file, based on some separator

  • -d DELIM sets a specific delimiter (TAB by default)

  • -f FIELDS

    • specify fields (starting from 1) to cut out
    • can be a number (like -f 2) or a list (like -f 2,5)
    • or a <from>-<to> format (like -f 2-4)
$ cut /etc/group -f 3 -d: | tail -n 5
972
84
971
970
969
$ cut /etc/group -f 1,3 -d: | tail -n 5
flatpak:972
screen:84
firebird:971
nm-fortisslvpn:970
docker:969
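
cut also works on piped input and can be chained when a field has its own inner separator. A sketch using the sample /etc/passwd record from the earlier slides rather than a real system file; the variable names are illustrative.

```shell
# Sample /etc/passwd record from the earlier slides.
line='jsmith:x:1001:1000:Joe Smith,Room 7,(234)555-8910,j@smi.th:/home/jsmith:/bin/sh'

# Fields 1 and 7: user name and default shell.
name_and_shell=$(printf '%s\n' "$line" | cut -d: -f1,7)

# Field 5 is itself comma separated; a second cut picks its first part.
full_name=$(printf '%s\n' "$line" | cut -d: -f5 | cut -d, -f1)

echo "$name_and_shell"
echo "$full_name"
```

When several fields are selected, cut joins them with the same delimiter it split on, which is why the first result still contains a colon.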
48 / 49

paste

  • join files horizontally (like horizontal cat)

  • -d sets the delimiter (TAB by default)

  • -s appends data in serial rather than in parallel

$ cat names.txt
Mark Smith
Bobby Brown
Sue Miller
Jenny Igotit
$ cat numbers.txt
555-1234
555-9876
555-6743
867-5309
$ paste names.txt numbers.txt
Mark Smith 555-1234
Bobby Brown 555-9876
Sue Miller 555-6743
Jenny Igotit 867-5309
$ paste -d, names.txt numbers.txt
Mark Smith,555-1234
Bobby Brown,555-9876
Sue Miller,555-6743
Jenny Igotit,867-5309
$ paste -s names.txt numbers.txt
Mark Smith Bobby Brown Sue Miller Jenny Igotit
555-1234 555-9876 555-6743 867-5309
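
paste and cut are roughly each other's inverse: one glues columns together, the other takes them apart again. A sketch with shortened versions of the two files in a temporary directory; the variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf 'Mark Smith\nBobby Brown\n' > "$workdir/names.txt"
printf '555-1234\n555-9876\n' > "$workdir/numbers.txt"

# Glue the two files together line by line, using ':' as the delimiter.
combined=$(paste -d: "$workdir/names.txt" "$workdir/numbers.txt")

# Take the second column back out with cut.
numbers_again=$(printf '%s\n' "$combined" | cut -d: -f2)

# -s with a delimiter turns all lines of one file into a single line.
one_line=$(paste -s -d, "$workdir/numbers.txt")

echo "$combined"
echo "$numbers_again"
echo "$one_line"

rm -r "$workdir"
```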
49 / 49

Example taken straight from the great Wikipedia:

https://en.wikipedia.org/wiki/Paste_(Unix)
