Advanced text editing

With sed and awk

Marek Šuppa
Ondrej Jariabka
Adrián Matejov

Why UNIX for Data Science?

The tools we'll learn about today may look strange and obsolete (their syntax almost certainly will).
But the reason we learn about them is simple: they are present virtually everywhere.

A language that doesn't affect the way you think about programming, is not worth knowing.

-- Alan Perlis, Epigrams on Programming

That's because they are required for POSIX compliance: https://pubs.opengroup.org/onlinepubs/9699919799/

https://en.wikiquote.org/wiki/Alan_Perlis#Epigrams_on_Programming,_1982

`sed`

Aka "stream editor"

Takes in a stream of text line by line and transforms it in one go.

The syntax of sed commands is

[addr]X[options]

where X is a single-letter sed command (s in the example below).

sed [cmd] [filename] or cat [filename] | sed [cmd]

$ cat text.txt 
sed is a Unix utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on the scripting features of the interactive editor ed.
$ cat text.txt | sed 's/Unix/UNIX/'
sed is a UNIX utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on the scripting features of the interactive editor ed.
$ sed 's/Unix/UNIX/' text.txt
sed is a UNIX utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on the scripting features of the interactive editor ed.

`sed`: substitution

The most common use case of sed, denoted by s

The syntax of the s command is s/[regex]/[replacement]/[flags]

sed 's/[regex]/[replacement]/'

replace text matched by [regex] with [replacement]

$ cat text.txt 
sed is a Unix utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on the scripting features of the interactive editor ed.
$ cat text.txt | sed 's/the/THE/'
sed is a Unix utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on THE scripting features of the interactive editor ed.

by default, only the first match on the line gets replaced
this can be changed with the g flag


`sed`: (global) substitution

sed 's/[regex]/[replacement]/g'

replace text matched by [regex] with [replacement] globally (every occurrence on the line)

$ cat text.txt 
sed is a Unix utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on the scripting features of the interactive editor ed.
$ cat text.txt | sed 's/the/THE/g'
sed is a Unix utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on THE scripting features of THE interactive editor ed.
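
The difference is easiest to see on a single line with several matches. A minimal sketch; the helper names `first_only` and `every_match` and the sample sentence are made up for illustration:

```shell
# Hypothetical helpers; the sample input is made up for illustration.
first_only()  { sed 's/the/THE/'; }    # replaces only the first match on each line
every_match() { sed 's/the/THE/g'; }   # replaces every match on each line

echo 'the cat saw the dog' | first_only    # THE cat saw the dog
echo 'the cat saw the dog' | every_match   # THE cat saw THE dog
```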


`sed`: (extended) regular expressions

expr	description
`.`	any character
`[ ]`	character class (negated with `[^ ]`)
`^`	beginning of the line
`$`	end of the line
`?`	match once or not at all
`+`	match 1+ times
`*`	match 0+ times
`{2,7}`	two to seven matches
`[r]|[e]`	match regex `[r]` or `[e]`
`([r])`	group regex `[r]` so it can be referenced later

Extended regular expressions can be turned on with -E.

$ echo hello | sed -E 's/[a-m]+/XXXX/'
XXXXo
$ echo hello | sed -E 's/[lia]{2}/ZZ/'
heZZo
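
Without -E, sed falls back to basic regular expressions, in which interval braces have to be written as \{ and \}. A small sketch of the same substitution in both dialects (the helper names are illustrative):

```shell
# Same substitution in both regex dialects; helper names are illustrative.
with_ere() { sed -E 's/[lia]{2}/ZZ/'; }    # extended syntax: bare braces
with_bre() { sed 's/[lia]\{2\}/ZZ/'; }     # basic syntax: escaped braces

echo hello | with_ere   # heZZo
echo hello | with_bre   # heZZo
```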

`sed`: regex references and alternatives

Once part of a regex is enclosed in parentheses (), it can be referenced later.

The m-th parenthesized group can be referenced via \m

$ cat tenses.txt 
I was there.
He will be here.
It is everywhere.
$ cat tenses.txt  | sed -E 's/([her]+)/[\1]/'
I was t[here].
H[e] will be here.
It is [e]verywhere.

Using |, alternatives can be provided inside the parentheses.

$ cat tenses.txt  | sed -E 's/.*(is|was).*/# Found \1 on this line/'
# Found was on this line
He will be here.
# Found is on this line


`sed`: regex references and alternatives

$ cat repetition.txt 
abcabc
djaejk
asdhrj
bbccdd
xxsxxs

The references can also be used directly in the regular expression:

$ cat repetition.txt  | sed -E 's/^(.*)\1$/\1/'
abc
djaejk
asdhrj
bbccdd
xxs
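
References are probably used most often in the replacement, to rearrange pieces of the match. A minimal sketch that swaps two captured groups; the helper name `swap_names` and the input are illustrative:

```shell
# Swap "Surname, Name" into "Name Surname" using two capture groups.
# The helper name and sample input are illustrative.
swap_names() { sed -E 's/^([^,]+), (.+)$/\2 \1/'; }

echo 'Perlis, Alan' | swap_names   # Alan Perlis
```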

`sed`: referencing the whole match

If we want to reference the whole match, we can use &.

Suppose we have the following text:

$ cat text.txt 
sed is a Unix utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on the scripting features of the interactive editor ed.

The sed command below will put all numbers into square brackets:

$ cat text.txt  | sed -E 's/[0-9]+/[&]/g'
sed is a Unix utility that transforms text.
sed was developed from [1973] to [1974] by Lee E. McMahon of Bell Labs.
sed was based on the scripting features of the interactive editor ed.
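
Because & is special in the replacement, a literal ampersand has to be escaped as \&. A short sketch showing both uses; the helper names and inputs are illustrative:

```shell
# & stands for the whole match; \& inserts a literal ampersand.
# Helper names and sample inputs are illustrative.
mark_numbers() { sed -E 's/[0-9]+/<&>/g'; }
literal_amp()  { sed 's/and/\&/'; }

echo 'from 1973 to 1974' | mark_numbers   # from <1973> to <1974>
echo 'sed and awk' | literal_amp          # sed & awk
```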

`sed`: `[addr]`

Recall that sed commands have the following structure: [addr]X[options]

Let's discuss [addr] a bit.

sed "[cmd]"

apply [cmd] on all lines

sed "5 [cmd]"

apply [cmd] on line 5

sed "$ [cmd]"

apply [cmd] on the last line

$ cat tenses.txt | grep here
I was there.
He will be here.
It is everywhere.
$ cat tenses.txt | sed "2 s/here/home/"
I was there.
He will be home.
It is everywhere.
$ cat tenses.txt | sed "$ s/where/one/"
I was there.
He will be here.
It is everyone.
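
Going one small step beyond the addresses listed above: two addresses separated by a comma select a range of lines. A minimal sketch, with an illustrative helper name and the sample text passed in via printf:

```shell
# Apply a substitution only to lines 2 through 3 (an address range).
# The helper name is illustrative.
shout_middle() { sed '2,3 s/here/HERE/'; }

printf 'I was there.\nHe will be here.\nIt is everywhere.\n' | shout_middle
# I was there.
# He will be HERE.
# It is everywHERE.
```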

`sed`: `[addr]` via regex

Regular expressions can also be used as an address.

sed "/was/ s/here/orn/"

on lines that match was, replace here with orn

$ cat tenses.txt
I was there.
He will be here.
It is everywhere.
$ cat tenses.txt | sed "/was/ s/here/orn/"
I was torn.
He will be here.
It is everywhere.

`sed`: other commands

sed "[addr] d"

delete lines described by [addr]

sed "[addr] p"

print lines described by [addr]

Note that the space between [addr] and the command is optional.

$ cat text.txt 
sed is a Unix utility that transforms text.
sed was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs.
sed was based on the scripting features of the interactive editor ed.

The following deletes the second line:

$ cat text.txt | sed 2d
sed is a Unix utility that transforms text.
sed was based on the scripting features of the interactive editor ed.
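
The d command combines naturally with a regex address. A minimal sketch that drops comment lines from a config-like input; the helper name and the input are made up for illustration:

```shell
# Drop comment lines (those starting with #) from a config-like input.
# The helper name and sample input are illustrative.
strip_comments() { sed '/^#/ d'; }

printf '# settings\nname=sed\n# end\nmode=stream\n' | strip_comments
# name=sed
# mode=stream
```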

`sed`: useful options

-i

edit file "in place"

$ cat tenses.txt
I was there.
He will be here.
It is everywhere.
$ sed -i "/was/ s/here/orn/" tenses.txt
$ cat tenses.txt
I was torn.
He will be here.
It is everywhere.
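
A more cautious variant keeps a backup of the original file by attaching a suffix to -i. This sketch assumes a sed that accepts -i with an attached suffix (GNU and BSD sed both do, although -i itself is not part of POSIX); the temporary directory is only there so that no real file is touched:

```shell
# Edit a file in place but keep the original as tenses.txt.bak.
# Works in a throwaway directory so no real files are modified.
workdir=$(mktemp -d)
printf 'I was there.\nHe will be here.\nIt is everywhere.\n' > "$workdir/tenses.txt"

sed -i.bak '/was/ s/here/orn/' "$workdir/tenses.txt"

head -n 1 "$workdir/tenses.txt"       # edited file: I was torn.
head -n 1 "$workdir/tenses.txt.bak"   # untouched backup: I was there.
```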

-n

do not automatically print every line
works nicely in combination with the p command

# Print specific (third) line of a file
$ sed -n 3p tenses.txt
It is everywhere.
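
Combined with a regex address, -n and p turn sed into a rough grep. A minimal sketch with an illustrative helper name:

```shell
# With -n, only lines explicitly printed by p are shown, so this acts like grep.
# The helper name is illustrative.
grep_like() { sed -n '/is/ p'; }

printf 'I was there.\nHe will be here.\nIt is everywhere.\n' | grep_like
# It is everywhere.
```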

`sed`: custom separator

sed is well known for its / separator (s/foo/bar/ has become somewhat commonplace).

But suppose we want to get rid of http:// in http://data.science.com.

Thankfully, basically any other character can be used as a separator, most commonly #:

$ echo "http://data.science.com" | sed 's#http://##'
data.science.com
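
The same trick helps with file paths. A small sketch comparing an escaped / separator with | as the separator; the helper names and the paths are illustrative:

```shell
# Both helpers do the same thing; the second avoids escaping every slash.
# Helper names and paths are illustrative.
relocate_escaped() { sed 's/\/usr\/local/\/opt/'; }
relocate()         { sed 's|/usr/local|/opt|'; }

echo '/usr/local/bin/tool' | relocate_escaped   # /opt/bin/tool
echo '/usr/local/bin/tool' | relocate           # /opt/bin/tool
```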


`awk`

The simplest and most effective programming language you'll learn in 20 minutes

A language that doesn't affect the way you think about programming, is not worth knowing.

-- Alan Perlis, Epigrams on Programming

The name is formed from the initials of its authors: Aho, Weinberger and Kernighan.

It follows the pattern-action paradigm.

pattern1 { action1 }
pattern2 { action2; action3 }
...

pattern:

regular expression, numerical expression, string expression or a combination of these
by default each line matches

action:

executable code (the default action is to print the line)

`awk`: quick example

$ cat people.txt
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Becky        555-7685     becky.algebrarum@gmail.com      A
Bill         555-1675     bill.drowning@hotmail.com       A
Broderick    555-0542     broderick.aliquotiens@yahoo.com R
Camilla      555-2912     camilla.infusarum@skynet.be     R
Fabius       555-1234     fabius.undevicesimus@ucb.edu    F

Show phone numbers only:

$ cat people.txt | awk '{ print $2 }'
555-5553
555-3412
555-7685
555-1675
555-0542
555-2912
555-1234

Show emails only:

$ cat people.txt | awk '{ print $3 }'
amelia.zodiacusque@gmail.com
anthony.asserturo@hotmail.com
becky.algebrarum@gmail.com
bill.drowning@hotmail.com
broderick.aliquotiens@yahoo.com
camilla.infusarum@skynet.be
fabius.undevicesimus@ucb.edu

`awk`: patterns

empty
- action(s) executed for each input line (default if pattern not specified)
/[regex]/
- action(s) will be executed if the regular expression matches the line
BEGIN
- action(s) executed before the input gets processed
END
- action(s) executed after the input gets processed

$ cat awktext.txt 
AWK was created at Bell Labs in the 1970s.
Its name is derived from the surnames of its authors.
The acronym is pronounced the same as the bird auk.
$ cat awktext.txt | awk '/is/'
Its name is derived from the surnames of its authors.
The acronym is pronounced the same as the bird auk.
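
The three kinds of patterns can be combined in one program. A minimal sketch that counts matching lines; the helper name `count_is` and the printed messages are made up for illustration:

```shell
# BEGIN runs before any input, /is/ runs on matching lines, END runs at the end.
# The helper name and messages are illustrative.
count_is() {
  awk 'BEGIN { print "scanning..." }
       /is/  { n++ }
       END   { print n, "matching lines" }'
}

count_is <<'EOF'
AWK was created at Bell Labs in the 1970s.
Its name is derived from the surnames of its authors.
The acronym is pronounced the same as the bird auk.
EOF
# scanning...
# 2 matching lines
```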

Regexes allow us to use awk much like grep.

`awk`: patterns & pre-filled variables

Internally, awk works along two dimensions: lines (called records) and "columns" (called fields)

RS
- internal variable that contains the "record separator"
- newline (\n) by default
FS
- internal variable that contains the "field separator"
- space (' ') by default, which makes awk split on any run of spaces or tabs
- can be set via the -F flag (e.g. awk -F:)

awk pre-fills quite a few other variables:

NR
- number of records (lines) awk has processed so far
NF
- number of fields (columns) in the current record (line)

Each field (column) has its own "special" variable:

$1: the first field
$N: the N-th field
$0: the whole record (line)

$ echo 'foo:123:bar:789' | awk -F: '{ print $3, $2, $0 }'
bar 123 foo:123:bar:789
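
NR and NF can be used like any other variable, and $NF is a handy way to reach the last field. A minimal sketch; the helper name `describe` and the input are illustrative:

```shell
# Print the record number, the field count, and the last field of each line.
# The helper name and sample input are illustrative.
describe() { awk '{ print NR, NF, $NF }'; }

printf 'a b c\nd e\n' | describe
# 1 3 c
# 2 2 e
```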


`awk`: patterns & pre-filled variables II

Print everything from the third line onwards

$ cat people.txt | awk 'NR>2'
Becky        555-7685     becky.algebrarum@gmail.com      A
Bill         555-1675     bill.drowning@hotmail.com       A
Broderick    555-0542     broderick.aliquotiens@yahoo.com R
Camilla      555-2912     camilla.infusarum@skynet.be     R
Fabius       555-1234     fabius.undevicesimus@ucb.edu    F

Print all names of friends (F in the last column)

$ cat people.txt | awk '$4 == "F" {print $1}'
Amelia
Fabius

Print all phone numbers of relatives (R in the last column)

$ cat people.txt | awk '$4 == "R" {print $2}'
555-0542
555-2912
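
Conditions can be joined with && and ||, and a field can be tested against a regex with the ~ operator. A minimal sketch on a shortened copy of the data; the helper name is illustrative:

```shell
# Names of acquaintances (A) with a Gmail address: two conditions joined by &&.
# The helper name is illustrative.
gmail_acquaintances() { awk '$4 == "A" && $3 ~ /@gmail\.com$/ { print $1 }'; }

gmail_acquaintances <<'EOF'
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Becky        555-7685     becky.algebrarum@gmail.com      A
EOF
# Becky
```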

`awk`: operators and variables

All standard comparison operators work out of the box
- That is, >, <, >=, <=, == and != work as you'd expect them to
Custom variables are initialized to zero (or to an empty string or empty array, depending on how they are used).

$ ls -l *.txt
-rw-rw-r--. 1 mrshu mrshu 149 Nov 14 23:18 awktext.txt
-rw-r--r--. 1 mrshu mrshu   0 Nov  2 11:29 newfile.txt
-rw-rw-r--. 1 mrshu mrshu 420 Nov 18 21:18 people.txt
-rw-rw-r--. 1 mrshu mrshu  35 Nov 14 13:56 repetition.txt
-rw-rw-r--. 1 mrshu mrshu  59 Nov 14 16:52 tenses_new.txt
-rw-rw-r--. 1 mrshu mrshu  48 Nov 14 15:57 tenses.txt
-rw-rw-r--. 1 mrshu mrshu 182 Nov 14 12:18 text.txt

Sum the sizes of all files of at least 100 bytes:

$ ls -l *.txt | awk '$5 >= 100 {sum += $5} END { print sum }'
751

What's the average file size (rounded to two decimal places)?

$ ls -l *.txt | awk '{sum += $5} END { printf "avg=%.2f\n", sum/NR }'
avg=127.57
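
When only some lines should count towards the average, NR is no longer the right divisor and a custom counter is needed. A minimal sketch over the same seven file sizes; the helper name `avg_large` is illustrative:

```shell
# Average of the values that pass the filter, using our own counter n.
# The helper name is illustrative; the check on n avoids dividing by zero.
avg_large() {
  awk '$1 >= 100 { sum += $1; n++ }
       END { if (n > 0) printf "avg=%.2f\n", sum / n }'
}

printf '149\n0\n420\n35\n59\n48\n182\n' | avg_large
# avg=250.33
```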


`awk`: operators and variables II

Increment (++, +=) and decrement (--, -=) operators work out of the box
Associative arrays are automatically initialized

$ cat people.txt 
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Becky        555-7685     becky.algebrarum@gmail.com      A
Bill         555-1675     bill.drowning@hotmail.com       A
Broderick    555-0542     broderick.aliquotiens@yahoo.com R
Camilla      555-2912     camilla.infusarum@skynet.be     R
Fabius       555-1234     fabius.undevicesimus@ucb.edu    F

How many acquaintances (A) and relatives (R) do we have in our dataset?

$ cat people.txt | awk '{ p[$4]++ } END { print "A:", p["A"], "| R:", p["R"] }'
A: 3 | R: 2


`awk`: control statements

All the standard control statements (if/else, while, for, break, continue) work as you would expect them to, with C/Python-like syntax

$ cat people.txt 
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Becky        555-7685     becky.algebrarum@gmail.com      A
Bill         555-1675     bill.drowning@hotmail.com       A
Broderick    555-0542     broderick.aliquotiens@yahoo.com R
Camilla      555-2912     camilla.infusarum@skynet.be     R
Fabius       555-1234     fabius.undevicesimus@ucb.edu    F

How many acquaintances (A), friends (F) and relatives (R) do we have in our dataset?

$ cat people.txt | awk '{ p[$4]++ } END { for(i in p) print i, ":", p[i] }' 
A : 3
R : 2
F : 2
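
The order in which for (i in p) visits the keys is not specified, so the lines may come out in a different order on another awk. A minimal sketch that pipes the result through sort to make it stable; the helper name and the shortened, made-up records are illustrative:

```shell
# Count rows per category, then sort because for-in order is unspecified.
# The helper name and the example.com records are illustrative.
count_by_category() {
  awk '{ p[$4]++ } END { for (k in p) print k, p[k] }' | sort
}

count_by_category <<'EOF'
Amelia 555-5553 amelia@example.com F
Anthony 555-3412 anthony@example.com A
Becky 555-7685 becky@example.com A
Camilla 555-2912 camilla@example.com R
EOF
# A 2
# F 1
# R 1
```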


`awk`: actions (built-in functions)

print
- the default action if not specified
- prints the string out to the standard output

# awk concatenates strings automatically
# this basically generates a CSV
$ cat people.txt | awk '{ print $1 "," $2 "," $4 }'
Amelia,555-5553,F
Anthony,555-3412,A
Becky,555-7685,A
Bill,555-1675,A
Broderick,555-0542,R
Camilla,555-2912,R
Fabius,555-1234,F

printf "[formatstr]", variable
- prints out the variable according to [formatstr]
- [formatstr] can contain
  - %s: string
  - %d: integer
  - %f: float
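
Format specifiers also accept a width, which is useful for aligned columns. A minimal sketch; the helper name `format_row` and the | delimiter are illustrative choices:

```shell
# Left-align the name in a 10-character column, then print the phone number.
# The helper name and the | delimiter are illustrative.
format_row() { awk '{ printf "%-10s|%s\n", $1, $2 }'; }

echo 'Amelia 555-5553' | format_row   # Amelia    |555-5553
```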


`awk`: actions (built-in functions) II

length(s)
- return the length of string s

tolower(s)
- lowercase the string s

toupper(s)
- uppercase the string s

gsub(r, s, t)
- replace the regular expression r with the substitution s in the t string ($0 if not provided)

system(c)
- run the command c
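
A few of these functions in action, as a minimal sketch (the helper name `demo_builtins` and the input are illustrative). Note that gsub without a third argument modifies $0 in place:

```shell
# length, toupper and gsub on a single line of input.
# The helper name and sample input are illustrative.
demo_builtins() {
  awk '{ print length($0); print toupper($1); gsub(/o/, "0"); print }'
}

echo 'foo boo' | demo_builtins
# 7
# FOO
# f00 b00
```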

`awk`: sample implementation

AWK's secret weapon is the pattern-action paradigm:

pattern1 { action1 }
pattern2 { action2; action3 }
...

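It allows not just for short (at most 2 lines) and simple-yet-powerful programs but also for a simple implementation of awk itself, sketched here in Python:

import re, sys

# Illustrative rule table: (compiled regex, action function) pairs.
patterns_actions = [(re.compile(r"is"), lambda line: print(line, end=""))]
for line in sys.stdin:
    for pattern, action in patterns_actions:
        if pattern.search(line):  # like awk, match anywhere in the line
            action(line)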

Useful commands

`wget`

"web get" -- a tool for downloading files from the internet
supports HTTP, HTTPS and FTP protocols
wget [URL] -O [filename]
- saves [URL] to [filename]
- setting filename to - makes the output go to standard output

$ wget uniba.sk
--2020-11-18 22:56:41--  http://uniba.sk/
Resolving uniba.sk (uniba.sk)... 158.195.6.138
Connecting to uniba.sk (uniba.sk)|158.195.6.138|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://uniba.sk/ [following]
--2020-11-18 22:56:41--  https://uniba.sk/
Connecting to uniba.sk (uniba.sk)|158.195.6.138|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’
index.html                             [ <=>  ]  37.07K  --.-KB/s    in 0.04s   
2020-11-18 22:56:41 (869 KB/s) - ‘index.html’ saved [37964]

-q makes the output "quiet" (doesn't print extended info)


`curl`

the name stands for "Client URL" but "cat URL" is a great mnemonic
curl outputs the file it reads from the network to stdout by default

$ curl uniba.sk
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://uniba.sk/">here</a>.</p>
<hr>
<address>Apache/2.2.22 (Debian) Server at uniba.sk Port 80</address>
</body></html>

curl -o [filename] [url] saves [url] to [filename] (also works with)
redirecting stdout to a file does the same thing

$ curl uniba.sk > index.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   299  100   299    0     0   4462      0 --:--:-- --:--:-- --:--:--  4462

-s makes the output "silent" (doesn't print extended info)


https://ec.haxx.se/curl/curl-name

`wget` vs `curl`

Much of their functionality is the same. There are a few important differences, though:

`wget`:

is a bit older and available on more devices (due to being part of GNU)
capable of doing recursive downloads (as in "save all you find on this URL to disk")
can be found on busybox (albeit as a stripped-down clone)
can be typed in using only the left hand on a qwerty keyboard!

`curl`:

works much better with pipes and Unix scripts in general
has upload capabilities
supports more protocols (even ones like TELNET, IMAP or SMTP)
comes pre-installed on macOS and Windows 10 (!)


https://daniel.haxx.se/docs/curl-vs-wget.html

`diff`

Show differences between two files, line by line.

$ cat tenses.txt 
I was there.
He will be here.
It is everywhere.

$ cat tenses_new.txt 
I was where you were not.
He will be here.
It is in there.

$ diff tenses.txt tenses_new.txt 
1c1
< I was there.
---
> I was where you were not.
3c3
< It is everywhere.
---
> It is in there.

If you'd like to see the diff side-by-side, you can use diff -y or (even) vimdiff.
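
Another common format is the unified diff (diff -u), which is what most patch and version-control tools display. A minimal sketch on two throwaway files; the file names are made up, and the exit status is captured because diff returns 1 when the files differ:

```shell
# Compare two small files in unified format inside a throwaway directory.
# The file names old.txt and new.txt are illustrative.
workdir=$(mktemp -d)
printf 'I was there.\nHe will be here.\n' > "$workdir/old.txt"
printf 'I was here.\nHe will be here.\n'  > "$workdir/new.txt"

# diff exits with 0 for identical files and 1 when they differ.
status=0
diff -u "$workdir/old.txt" "$workdir/new.txt" || status=$?
echo "exit status: $status"
```

Lines starting with - come from the first file and lines starting with + from the second; unchanged context lines start with a space.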
