
File and directory attributes

On who has access to what and a nice hack for dealing with duplicate content

Marek Šuppa
Ondrej Jariabka
Adrián Matejov

1 / 28

Why UNIX for Data Science?

  • You will almost inevitably need to set up your own server environment

  • Understanding how permissions work will be crucial to using it effectively

  • Setting up links here and there will give you a bit of a superpower unheard of on other systems

2 / 28

UNIX directory structure

  • On the system level, the directory structure is implemented as a tree

  • There is one more level of structure underneath.
3 / 28

inodes

  • The basic data structure is called an inode, uniquely identified by an ID

  • Each inode contains information about a specific file

  • A directory is actually a special file: it contains a list of filenames and pointers to their inodes

Image from http://faculty.salina.k-state.edu/tim/unix_sg/advanced/links.html
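
The mapping from filenames to inodes can be observed from the shell. What follows is a minimal sketch, assuming GNU coreutils (for `stat -c`) and a scratch directory created with `mktemp`; the file names `notes.txt` and `renamed.txt` are made up for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "hello" > notes.txt

# ls -i prints the inode ID in front of each name stored in the directory.
ls -i

# stat -c %i prints only the inode ID (GNU coreutils syntax).
inode_by_stat=$(stat -c %i notes.txt)
inode_by_ls=$(ls -i notes.txt | awk '{print $1}')
echo "stat: $inode_by_stat, ls: $inode_by_ls"

# Renaming only rewrites the directory entry; the inode stays the same.
mv notes.txt renamed.txt
inode_after_mv=$(stat -c %i renamed.txt)
echo "after mv: $inode_after_mv"
```

The rename step is the interesting part: the name lives in the directory, not in the inode, so changing it leaves the inode ID untouched.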

6 / 28

File in UNIX systems

  • name (and an optional extension, e.g. .txt)
  • attributes (metadata)
    • owner (UID) and group (GID)
    • access rights for owner, group and others
    • timestamps: last attribute change (ctime), last content modification (mtime) and last content access (atime)
  • content
7 / 28

File metadata

  • Not shown by default in ls listing

  • Part of the "long listing format" (i.e. via ls -l)

$ ls -l /labs/lab06_dist.sh
-rwxr-xr-x 1 mrshu mrshu 3402021 Oct 26 15:36 /labs/lab06_dist.sh
  • To do the same on a directory, add the -d flag (ls lists directory contents by default)
$ ls -l /labs
total 20264
-rwxr-xr-x 1 root root 3389123 Sep 21 13:21 lab01_dist.sh
-rwxr-xr-x 1 root root 3428779 Sep 28 14:09 lab02_dist.sh
-rwxr-xr-x 1 root root 3644753 Oct 5 14:23 lab03_dist.sh
$ ls -l -d /labs
drwxr-xr-x 2 root root 4096 Oct 26 17:01 /labs
10 / 28

File metadata: File Types

File types:

  • -
    • "normal" file
  • d
    • directory
  • c / b
    • character / block device
  • p
    • named pipe
  • l
    • symbolic link
  • s
    • socket

Image from https://test-www.ics.uci.edu/computing/linux/file-security.php

11 / 28

File metadata: Permissions

  • Permissions are set and evaluated on three levels:
    1. owner
    2. group
    3. others
  • Each level can have permission to
    • read (r)
    • write (w)
    • execute (x)
  • Can also be represented as octal (base 8) numbers

Conversion from rwxr-xr-- to octal (r = 4, w = 2, x = 1, summed per level): rwx = 7, r-x = 5, r-- = 4, which gives 754.

Image from https://danielmiessler.com/study/unixlinux_permissions/
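
The conversion can also be written down as a few lines of shell. This is only a sketch: `to_octal` is a hypothetical helper written for this example, not a standard utility, and it assumes a plain 9-character permission string without special bits.

```shell
#!/bin/sh
# Convert a 9-character permission string (e.g. rwxr-xr--) to octal.
# to_octal is a hypothetical helper, not a standard command.
to_octal() {
    perms=$1
    result=""
    # Process the string in three groups of three characters:
    # owner, group, others.
    while [ -n "$perms" ]; do
        triple=$(printf '%s' "$perms" | cut -c1-3)
        perms=$(printf '%s' "$perms" | cut -c4-)
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac
        case $triple in ?w?) digit=$((digit + 2)) ;; esac
        case $triple in ??x) digit=$((digit + 1)) ;; esac
        result="$result$digit"
    done
    echo "$result"
}

to_octal rwxr-xr--   # prints 754
to_octal rw-r--r--   # prints 644
```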

12 / 28
  • Permissions are evaluated in the same order as they appear in the listing, and only the first matching level applies
    1. Am I the owner?
    2. If not, am I part of the group?
    3. If not, then I am part of others.

File metadata: Permissions II

Permissions can be changed with the chmod command:

chmod {u,g,o,a}{+,-,=}{r,w,x} file

  • user (owner), group, other and all
  • + adds, - removes and = sets
  • read, write and execute permissions
$ chmod g+rx file.txt
$ chmod a-r file.txt
$ chmod o=rw file.txt

It also works with numerical (octal) representation:

  • chmod 750 file
    • set the file permissions of file to rwxr-x---
# (111 111 111) in binary
$ chmod 777 file.txt
# (111 100 101) in binary
$ chmod 745 file.txt
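
To check what a chmod call actually did, the resulting mode can be read back. A small sketch, assuming GNU `stat` (the `%a` and `%A` format codes print the mode in octal and in symbolic form) and a scratch file created only for this example:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file.txt

# Start from a known state: rw-r--r--
chmod 644 file.txt
mode_start=$(stat -c '%a %A' file.txt)

# Symbolic form: add read and execute for the group -> rw-r-xr--
chmod g+rx file.txt
mode_symbolic=$(stat -c '%a %A' file.txt)

# Octal form: set all three levels at once -> rwxr-x---
chmod 750 file.txt
mode_octal=$(stat -c '%a %A' file.txt)

printf '%s\n' "$mode_start" "$mode_symbolic" "$mode_octal"
```

The symbolic form changes only the bits it names, while the octal form always replaces the whole mode.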
13 / 28

Maybe also setUID, setGID and umask (in the future)?

File metadata: Directory Permissions

Permissions work a bit differently on directories:

  • read (r)

    • allow listing of contents
  • write (w)

    • create, rename and remove files in the directory (requires x as well)
  • execute (x)

    • ability to go through the directory
    • in other words, to include it in the path
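
The difference between r and x on a directory is easiest to see by taking one of them away. A sketch under a few assumptions: it is run as a regular user (root bypasses these checks, so every command would succeed), GNU `stat` is available, and the names `box` and `note.txt` are invented for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir box
echo "secret" > box/note.txt

# Execute only (--x): the directory can be traversed but not listed.
chmod 100 box
ls box 2>/dev/null || echo "cannot list box"
cat box/note.txt    # still works: the path can be traversed

# Read only (r--): names can be listed, but files inside are unreachable.
chmod 400 box
ls box 2>/dev/null
cat box/note.txt 2>/dev/null || echo "cannot reach box/note.txt"

# Restore full access for the owner so the directory can be cleaned up.
chmod 700 box
mode_restored=$(stat -c %a box)
echo "restored mode: $mode_restored"
```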
14 / 28

File metadata: Owner and Group

  • New files/directories are created with the creating user's UID and their active/primary group's GID

  • Only root can change the owner (via the chown command)

# Change the owner of file.txt to jane
$ chown jane file.txt
# Works on directories as well
$ chown jane ~/Downloads
  • The group can be changed by the file's owner (via the chgrp command)

  • The owner needs to be a member of the new group

# Change the group of file.txt to root
$ chgrp root file.txt
  • If we want to change both the owner and the group, we can use chown with the owner:group syntax
$ chown jane:users /home/jane
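
Since chown needs root, the part that can be tried as a regular user is checking who owns a freshly created file. A minimal sketch, assuming GNU `stat` and a scratch directory; it uses numeric IDs so it does not depend on any particular user or group name.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file.txt

# Numeric owner and group of the new file (GNU stat syntax).
file_uid=$(stat -c %u file.txt)
file_gid=$(stat -c %g file.txt)

# UID and primary GID of the user running the script.
my_uid=$(id -u)
my_gid=$(id -g)
echo "file: $file_uid:$file_gid, me: $my_uid:$my_gid"

# chgrp to a group we already belong to is allowed without root.
chgrp "$my_gid" file.txt
gid_after_chgrp=$(stat -c %g file.txt)
echo "group after chgrp: $gid_after_chgrp"
```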
15 / 28

File metadata: Timestamps

  • ls -l by default shows the mtime -- last modified time

  • The other timestamps can be viewed with the stat utility

$ stat /tmp
File: /tmp
Size: 3800 Blocks: 0 IO Block: 4096 directory
Device: 23h/35d Inode: 21596 Links: 129
Access: (1777/drwxrwxrwt) Uid: ( 0/ root) Gid: ( 0/ root)
Context: system_u:object_r:tmp_t:s0
Access: 2020-10-30 13:48:18.249475358 +0100
Modify: 2020-11-02 11:05:48.417238277 +0100
Change: 2020-11-02 11:05:48.417238277 +0100
Birth: -
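
The difference between mtime and ctime shows up when only the attributes change. A sketch, assuming GNU `stat`, where `%Y` and `%Z` print mtime and ctime as seconds since the epoch; the file name `data.txt` is made up.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "first" > data.txt

mtime_before=$(stat -c %Y data.txt)
ctime_before=$(stat -c %Z data.txt)

# Wait so that a change is visible at one-second resolution.
sleep 1

# Changing permissions touches only the attributes:
# ctime moves forward, mtime stays where it was.
chmod 600 data.txt
mtime_after=$(stat -c %Y data.txt)
ctime_after=$(stat -c %Z data.txt)

echo "mtime: $mtime_before -> $mtime_after"
echo "ctime: $ctime_before -> $ctime_after"
```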
16 / 28

File metadata: Timestamps II

The timestamps can be changed with the touch utility

  • touch
    • sets atime and mtime to the current time (by default)
    • creates an empty file if it does not exist
    • -a: set atime only
    • -m: set mtime only
    • -t STAMP: use STAMP instead of current date

STAMP is formatted as [[CC]YY]MMDDhhmm[.ss]

$ touch newfile.txt
$ touch -a newfile.txt
# STAMP format: [[CC]YY]MMDDhhmm[.ss]
# Everything in [] is optional
#
# MM == 11
# DD == 02
# hh == 10
# mm == 10
$ touch -m -t 11021010 newfile.txt
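
The optional parts of STAMP can be filled in as well. The sketch below sets a full timestamp including century, year and seconds, then reads it back. It assumes GNU `date`, whose `-r FILE` option formats the file's mtime; other implementations of `date` treat `-r` differently.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Full STAMP: CCYY=2020, MM=11, DD=02, hh=10, mm=10, ss=30
touch -t 202011021010.30 newfile.txt

# Read the mtime back in the same layout as STAMP.
mtime_stamp=$(date -r newfile.txt +%Y%m%d%H%M.%S)
echo "$mtime_stamp"

# -a alone moves only atime; mtime keeps the value set above.
touch -a newfile.txt
mtime_after_a=$(date -r newfile.txt +%Y%m%d%H%M.%S)
echo "$mtime_after_a"
```

Both touch -t and date -r work in the local time zone, so the value read back matches the value that was set.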
17 / 28

Hardlinks and Symlinks

18 / 28

Links as a concept

  • We often want a file/directory with the same content to be available in several places in the filesystem.

  • It makes little sense to copy the same contents multiple times over (what if it changes?)

  • The solution is called links

Image from https://www.computerhope.com/unix/uln.htm

21 / 28

Directory as a special file

  • Notice how the "directory contents" file has entries for both itself (.) and its parent directory (..)

  • Since it can contain arbitrary filenames which point to arbitrary inode IDs, this can be used for a simple implementation of links

22 / 28

Hard link

  • Different filenames (or even paths) link to the same inode ID

  • The attributes as well as contents stay exactly the same

  • Only possible within a single filesystem (e.g. one disk partition), because inode IDs are only unique there

  • inode ID of a file can be viewed with ls -i

  • The inode keeps a count of the hard links that point to it

  • When the count drops to zero (and no process has the file open), the inode and its data get "physically" removed as well

$ echo "This is a file." > file1.txt
$ ls -l -i file1.txt
136158 -rw-rw-r-- 1 mrshu mrshu 16 Nov 2 11:20 file1.txt
# Create a hardlink called file2.txt from file1.txt
$ ln file1.txt file2.txt
# The hardlink count has increased
$ ls -l -i file2.txt
136158 -rw-rw-r-- 2 mrshu mrshu 16 Nov 2 11:20 file2.txt
$ cat file1.txt
This is a file.
$ cat file2.txt
This is a file.
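
What happens to the link count when one of the names is removed can be checked the same way. A sketch, assuming GNU `stat` (`%h` is the hard link count) and a shell whose `[` supports the `-ef` test, which is widespread but not required by POSIX:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "This is a file." > file1.txt
ln file1.txt file2.txt

links_with_both=$(stat -c %h file1.txt)

# [ a -ef b ] is true when both names refer to the same inode.
[ file1.txt -ef file2.txt ] && echo "same inode"

# Removing one name only decrements the link count; the data stays.
rm file1.txt
links_after_rm=$(stat -c %h file2.txt)
content_after_rm=$(cat file2.txt)
echo "$links_with_both -> $links_after_rm: $content_after_rm"
```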
23 / 28

Symbolic link (symlink)

  • A "standard shortcut"

    • A new "text" file gets created; its contents point to the path where the source file is located
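    • A relative path is resolved from the directory that contains the link, not from the current working directory
  • When the source file gets removed, the symbolic link continues to live, but it becomes dangling (it points to a path that no longer exists)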

  • It is not counted among the hardlinks (e.g. in ls -l output)

  • Can point across filesystems (not just within one data device)

Hardlink

Symlink

Images from https://www.computerhope.com/unix/uln.htm

24 / 28

Symbolic link (symlink)

$ echo "This is a file." > file1.txt
$ ls -l -i file1.txt
136158 -rw-rw-r-- 1 mrshu mrshu 16 Nov 2 11:44 file1.txt
# Create a symlink called file2.txt pointing to file1.txt
$ ln -s file1.txt file2.txt
# The hardlink count has not increased
$ ls -l -i file2.txt
136159 lrwxrwxrwx 1 mrshu mrshu 9 Nov 2 11:45 file2.txt -> file1.txt
$ ls -l -i file1.txt
136158 -rw-rw-r-- 1 mrshu mrshu 16 Nov 2 11:44 file1.txt
$ cat file1.txt
This is a file.
$ cat file2.txt
This is a file.
# Link to a file in the parent directory
$ mkdir dir
$ cd dir
$ ln -s ../file1.txt file3.txt
$ ls -l -i file3.txt
393489 lrwxrwxrwx 1 mrshu mrshu 12 Nov 2 11:48 file3.txt -> ../file1.txt
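
A symlink that outlives its source can be detected with standard shell tests. A minimal sketch in a scratch directory: `-L` checks that the name itself is a symlink, while `-e` follows the link and fails when the target is gone.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "This is a file." > file1.txt
ln -s file1.txt file2.txt

# readlink prints the path stored inside the symlink.
target=$(readlink file2.txt)
echo "file2.txt -> $target"

# Remove the source: the link itself survives but now points nowhere.
rm file1.txt
if [ -L file2.txt ] && [ ! -e file2.txt ]; then
    link_state="dangling"
else
    link_state="ok"
fi
echo "link is $link_state"
```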
25 / 28

Useful commands

For checking and changing file encoding

26 / 28

file

A useful command for checking the file type (e.g. is it an image, video or a text file).

  • file filename
    • -b brief output (the filename is not printed)
    • -i output a MIME type and charset instead of a description
$ file image.jpg
image.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72,
segment length 16, progressive, precision 8, 317x191, components 3
$ file -bi image.jpg
image/jpeg; charset=binary
$ file index.html
index.html: HTML document, UTF-8 Unicode text, with very long lines
$ file -i index.html
index.html: text/html; charset=utf-8
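
In scripts, the MIME output is convenient for branching on the detected type rather than on the file extension. A sketch, assuming the `file` utility is installed and supports the `--mime-type` long option (the common implementation does); `note.txt` is a made-up sample file.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'plain ASCII text\n' > note.txt

# --mime-type prints only the type; -b drops the "filename:" prefix.
mime=$(file -b --mime-type note.txt)
echo "$mime"

# Branch on the detected type instead of trusting the extension.
case $mime in
    text/*)  kind="text" ;;
    image/*) kind="image" ;;
    *)       kind="other" ;;
esac
echo "note.txt looks like $kind"
```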
27 / 28

iconv

Convert text from one character encoding to another

  • iconv -f [encoding] -t [encoding] -o [outputfile] [inputfile]
    • -f [encoding]: convert encoding from this charset
    • -t [encoding]: convert encoding to this charset
    • -o [outputfile]: save output to this file (stdout by default)
$ file -bi file_iso.txt
text/plain; charset=iso-8859-1
$ iconv -f iso-8859-1 -t utf-8 -o file_utf8.txt file_iso.txt
$ file -bi file_utf8.txt
text/plain; charset=utf-8
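
The effect of the conversion is visible at the byte level. The sketch below writes one ISO-8859-1 encoded word, converts it, and dumps both files as hex with `od`. It writes to stdout and redirects rather than using -o, and the sample word is an assumption chosen for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# \351 (octal) is the single byte 0xE9, which is "é" in ISO-8859-1.
printf 'caf\351\n' > file_iso.txt

iconv -f iso-8859-1 -t utf-8 file_iso.txt > file_utf8.txt

# Dump the bytes as hex to compare the two encodings.
hex_iso=$(od -An -tx1 file_iso.txt | tr -d ' \n')
hex_utf8=$(od -An -tx1 file_utf8.txt | tr -d ' \n')
echo "iso-8859-1: $hex_iso"
echo "utf-8:      $hex_utf8"
```

The single byte e9 becomes the two-byte sequence c3 a9, while the ASCII characters keep their values.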
28 / 28
