
Sequence Processing

Recurrent Neural Networks and more

Marek Šuppa
ESS 2020, Bratislava


Types of Neural Networks

Image from https://karpathy.github.io/2015/05/21/rnn-effectiveness/


Unfolded RNN
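
To make "unfolding" concrete, here is a minimal numpy sketch (all names are illustrative): the same weights are applied at every time step, so the unfolded network is as deep in time as the input sequence is long.

    import numpy as np

    def rnn_forward(xs, h0, Wxh, Whh, bh):
        """Unfold a vanilla RNN: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + bh)."""
        h, hs = h0, []
        for x in xs:                      # one copy of the cell per time step
            h = np.tanh(Wxh @ x + Whh @ h + bh)
            hs.append(h)                  # keep every hidden state for BPTT later
        return hs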

Training unfolded RNN

This concept is called Backpropagation Through Time (BPTT)

Training unfolded RNN

Note how various parts of the unfolded RNN impact h₂
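
A minimal sketch of BPTT for the vanilla cell above: gradients are pushed backwards through the unfolded graph, so the shared weights accumulate a contribution from every time step. This is exactly why so many parts of the graph impact a single state like h₂. Names match the forward-pass sketch; dhs is assumed to hold the per-step gradients of the loss.

    import numpy as np

    def bptt(xs, hs, h0, Wxh, Whh, dhs):
        """Backpropagation Through Time for the vanilla RNN sketched above."""
        dWxh, dWhh = np.zeros_like(Wxh), np.zeros_like(Whh)
        dh_next = np.zeros_like(hs[0])        # gradient arriving from step t+1
        for t in reversed(range(len(xs))):    # walk the unfolded graph backwards
            dh = dhs[t] + dh_next             # local gradient + everything from the future
            dz = dh * (1.0 - hs[t] ** 2)      # backprop through tanh
            h_prev = hs[t - 1] if t > 0 else h0
            dWxh += np.outer(dz, xs[t])       # shared weights: contributions
            dWhh += np.outer(dz, h_prev)      # accumulate across all time steps
            dh_next = Whh.T @ dz              # send gradient one step further back
        return dWxh, dWhh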

Problems with long-term dependencies


LSTM: what to forget and what to remember


LSTM: Conveyor belt

  • The cell state is sort of a "conveyor belt"

  • Allows information to stay unchanged or get slightly updated

All the following nice images are from http://colah.github.io/posts/2015-08-Understanding-LSTMs/, which I highly recommend

LSTM: Conveyor belt

Step 1: Decide what to forget


LSTM: Conveyor belt

Step 2: Decide

  • which values to update (iₜ)
  • what the new values should be (C̃ₜ)

LSTM: Conveyor belt

Step 2.5: Perform the forgetting and the update

LSTM: Conveyor belt

Step 3: Produce the output (hₜ)

LSTM: Conveyor belt

A conveyor belt that can choose

  • what to remember
  • what to forget
  • what to output
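
Putting the steps together, a minimal numpy sketch of one LSTM step (the single stacked weight matrix W and all names are choices of this sketch, not notation from the slides):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, C_prev, W, b):
        """One LSTM step; W maps [h_{t-1}; x_t] to all four gate pre-activations."""
        z = W @ np.concatenate([h_prev, x_t]) + b
        f, i, g, o = np.split(z, 4)
        f_t = sigmoid(f)                      # Step 1: decide what to forget
        i_t = sigmoid(i)                      # Step 2: which values to update
        C_tilde = np.tanh(g)                  # Step 2: candidate new values C̃_t
        C_t = f_t * C_prev + i_t * C_tilde    # Step 2.5: update the conveyor belt
        h_t = sigmoid(o) * np.tanh(C_t)       # Step 3: produce the output h_t
        return h_t, C_t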

GRU: Simplified conveyor belt

  • Forget and input combined into a single "update gate" (zₜ)

  • Cell state (Cₜ in LSTM) merged with the hidden state (hₜ)
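
For comparison, the same kind of sketch for one GRU step: there is no separate cell state, and zₜ alone arbitrates between keeping the old state and writing the new candidate (exact gate conventions vary slightly across papers).

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
        """One GRU step: z_t does the job of the LSTM's forget and input gates."""
        hx = np.concatenate([h_prev, x_t])
        z_t = sigmoid(Wz @ hx + bz)                  # update gate
        r_t = sigmoid(Wr @ hx + br)                  # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]) + bh)
        return (1.0 - z_t) * h_prev + z_t * h_tilde  # hidden state doubles as cell state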

GRU vs LSTM

  • GRU is smaller and hence requires less compute
  • But it turns out it cannot count (especially over longer sequences)

On the Practical Computational Power of Finite Precision RNNs for Language Recognition (Weiss et al., 2018)

Application: Machine Translation


Application: Handwriting from Text

Input: "He dismissed the idea"

Output:

Generating Sequences With Recurrent Neural Networks, Alex Graves, 2013

Demo at https://www.cs.toronto.edu/~graves/handwriting.html

Application: Character-Level Text Generation

"The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, 2015


Application: Image Question Answering

Exploring Models and Data for Image Question Answering, 2015

Application: Image Caption Generation


Application: Video Caption Generation

Sequence to Sequence - Video to Text, Venugopalan et al., 2015

Application: Adding Audio to Silent Film

Visually Indicated Sounds, Owens et al., 2015

Application: Medical Diagnosis


Application: End-to-End Driving *

Input: features extracted from a CNN
Output: predicted steering angle

* On relatively straight roads

Application: Stock Market Prediction


Application: Sentiment Analysis

Try it yourself at https://demo.allennlp.org/sentiment-analysis/


Application: Named Entity Recognition (NER)

Try it yourself at https://demo.allennlp.org/named-entity-recognition/


Application: Trump2Cash

  • A combination of Sentiment Analysis and Named Entity Recognition

How it works:

  1. Monitor tweets of Donald Trump
  2. Use NER to see if any of them mention a publicly traded company
  3. Apply sentiment analysis
  4. Profit?

See predictions live at https://twitter.com/Trump2Cash
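
A hypothetical sketch of that loop. Everything below is illustrative: the three callables stand in for an NER model, a sentiment model, and a brokerage API, none of which are specified on the slides.

    from typing import Callable, Iterable, List

    def trump2cash_step(
        tweets: Iterable[str],
        extract_tickers: Callable[[str], List[str]],  # NER: text -> tickers mentioned
        sentiment: Callable[[str], float],            # sentiment: text -> score in [-1, 1]
        place_order: Callable[[str, str], None],      # broker: (ticker, "buy"/"sell")
    ) -> None:
        """One pass of the (hypothetical) pipeline outlined above."""
        for tweet in tweets:                          # 1. monitor tweets
            for ticker in extract_tickers(tweet):     # 2. mentions a public company?
                score = sentiment(tweet)              # 3. apply sentiment analysis
                side = "buy" if score > 0 else "sell"
                place_order(ticker, side)             # 4. profit?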

Attention and Transformers


History of Deep Learning Milestones

From Deep Learning State of the Art (2020) by Lex Fridman at MIT


The perils of seq2seq modeling

A classic seq2seq model hands the decoder only the encoder's final hidden state. Aren't we throwing out a bit too much?

Videos from https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/

The fix

Let's use the full encoder output!

But how do we combine all the hidden states together?

The mechanics of Attention


Getting alignment with attention

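
A minimal numpy sketch of the idea: score every encoder hidden state against the current decoder state, softmax the scores into alignment weights, and return the weighted sum as the context vector. Plain dot-product scoring is an assumption of this sketch (Luong-style); Bahdanau-style attention scores with a small feed-forward net instead.

    import numpy as np

    def attend(decoder_h, encoder_hs):
        """Combine all encoder hidden states (T, d) into one context vector (d,)."""
        scores = encoder_hs @ decoder_h     # one alignment score per source position
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()            # softmax -> attention distribution
        return weights @ encoder_hs         # weighted sum = context vector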

Attention visualized

See a nice demo at https://distill.pub/2016/augmented-rnns/


Attention also helps with the explainability of stock prediction

What if we only used attention?


The Transformer architecture

Images from https://jalammar.github.io/illustrated-transformer/


The Transformer's Encoder


What's Self Attention?

The animal didn't cross the street because it was too tired.

What does "it" refer to?

Self Attention mechanics

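
A minimal numpy sketch of single-head scaled dot-product self-attention: every position scores every other position, so the whole sequence is handled in a few matrix multiplies with no recurrence.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Scaled dot-product self-attention over a sequence X of shape (T, d)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) position affinities
        scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # each row mixes all positions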

The full Transformer seq2seq process


Transformer recap

  • Encoder-decoder architecture
  • No time dependency, thanks to self-attention
  • Easy to parallelize
  • Very helpful in many tasks

Big Transformers Wins: GPT-2

Try it yourself at https://transformer.huggingface.co/doc/gpt2-large


Big Transformers Wins: BERT


BERT for Forex Movement Prediction
