name: inverse
layout: true
class: center, middle, inverse

---

# Sequence Processing

Recurrent Neural Networks and more

.footnote[Marek Šuppa
ESS 2020, Bratislava]

---
layout: false
class: center

# Types of Neural Networks

![:scale 100%](images/rnns.png)

.footnote[.font-small[Image from https://karpathy.github.io/2015/05/21/rnn-effectiveness/]]

---
layout: false
class: center

# Unfolded RNN

![:scale 100%](images/unfolded-rnn.png)

---
layout: false
class: center

# Unfolded RNN

![:scale 100%](images/unfolded-rnn-2.png)

---
layout: false
class: center

# Training unfolded RNN

![:scale 90%](images/rnn-bptt.png)

This concept is called Backpropagation Through Time (**BPTT**)

---
layout: false
class: center

# Training unfolded RNN

![:scale 95%](images/rnn-bptt-2.png)

Note how various parts of the unfolded RNN impact $h_2$

---
layout: false
class: center

# Problems with long-term dependencies

![:scale 95%](images/long-term-dep.png)

---
layout: false
class: center

## LSTM: what to forget and what to remember

![:scale 100%](images/lstm-intro.png)

---
layout: false

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-C-line.png)

--
- The cell state acts as a kind of "conveyor belt"
--
- It allows information to stay unchanged or get slightly updated

.footnote[.font-small[All the following nice images are from http://colah.github.io/posts/2015-08-Understanding-LSTMs/ which I highly recommend]]

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-focus-f.png)

**Step 1**: Decide what to forget

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-focus-i.png)

**Step 2**: Decide
- which values to update ($i_t$)
- what the new values should be ($\hat{C}_t$)

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-focus-C.png)

**Step 2.5**: Perform the forgetting and the update

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-focus-o.png)
**Step 3**: Produce the output ($h_t$)

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-chain.png)

A conveyor belt that can pick
- what to remember
- what to forget
- what to output

---
layout: false
class: center, middle

## GRU: Simplified conveyor belt

![:scale 100%](images/LSTM3-var-GRU.png)

--
- Forget and input gates combined into a single "update gate" ($z_t$)
--
- Cell state ($C_t$ in the LSTM) merged with the hidden state ($h_t$)

---
layout: false

## GRU vs LSTM

- GRU is smaller and hence requires less compute
- But it turns out it cannot count (especially over longer sequences)

--
.center[![:scale 80%](images/lstm-gru-counting.png)]

[On the Practical Computational Power of Finite Precision RNNs for Language Recognition (2018)](https://arxiv.org/abs/1805.04908)

---
class: middle

## Application: Machine Translation

.center[![:scale 80%](images/machine-translation.png)]

---

## Application: Handwriting from Text

**Input:** "He dismissed the idea"

--
**Output:**

.center[![:scale 50%](images/handwriting.png)]

--
[Generating Sequences With Recurrent Neural Networks, Alex Graves, 2013](https://arxiv.org/abs/1308.0850)

Demo at https://www.cs.toronto.edu/~graves/handwriting.html

---

## Application: Character-Level Text Generation

.center[![:scale 100%](images/char-rnn.png)]

.footnote[.font-small[["The Unreasonable Effectiveness of Recurrent Neural Networks"](https://karpathy.github.io/2015/05/21/rnn-effectiveness/), Andrej Karpathy, 2015]]

---

## Application: Image Question Answering

.center[![:scale 100%](images/vqa.png)]

--
.left-eq-column[
.center[![:scale 100%](images/vqa-arch.png)]
.font-small[Exploring Models and Data for Image Question Answering, 2015]
]
.right-eq-column[
Live Demo at https://vqa.cloudcv.org/
]

---

## Application: Image Caption Generation

.center[![:scale 100%](images/captioning.png)]

---

## Application: Video Caption Generation

.center[![:scale 100%](images/video-caption-generation.png)]

--
.left-eq-column[
.center[![:scale 100%](images/S2VTarchitecture.png)]
.font-small[Sequence to Sequence - Video to Text, Venugopalan et al., 2015]
]
.right-eq-column[
More at https://vsubhashini.github.io/s2vt.html
]

---

## Application: Adding Audio to Silent Films

.center[![:scale 100%](images/silent-audio.png)]

--
.left-eq-column[
.center[![:scale 60%](images/pipeline.jpg)]
.font-small[Visually Indicated Sounds, Owens et al., 2015]
]
.right-eq-column[
More at http://andrewowens.com/vis/
]

---

## Application: Medical Diagnosis

.center[![:scale 100%](images/medical-diagnosis.png)]

---

## Application: End-to-End Driving .red[*]
.left-eq-column[![:scale 100%](images/rnn-steering.gif)]
.right-eq-column[![:scale 100%](images/LSTM3-chain.png)]

--
**Input**: features extracted by a CNN

**Output**: predicted steering angle

.footnote[.red[*] On relatively straight roads]

---

## Application: Stock Market Prediction

.center[![:scale 100%](images/stock-market.png)]

---

## Application: Sentiment Analysis

.center[![:scale 80%](images/sentiment.analysis.png)]

Try it yourself at https://demo.allennlp.org/sentiment-analysis/

---

## Application: Named Entity Recognition (NER)

.center[![:scale 100%](images/NER.png)]

Try it yourself at https://demo.allennlp.org/named-entity-recognition/

---

## Application: Trump2Cash

.center[![:scale 60%](images/trump2cash.png)]

---

## Application: Trump2Cash

- A combination of Sentiment Analysis and Named Entity Recognition

--
How it works:

1. Monitor Donald Trump's tweets
2. Use NER to see whether any of them mention a publicly traded company
3. Apply sentiment analysis
4. Profit?

---

## Application: Trump2Cash

.center[![:scale 50%](images/simulated-twitter-fund.png)]

See predictions live at https://twitter.com/Trump2Cash

---
class: middle, inverse

# Attention and Transformers

---

## History of Deep Learning Milestones

![:scale 70%](images/timeline.png)

.footnote[From [Deep Learning State of the Art (2020)](https://www.youtube.com/watch?v=0VH1Lim8gL8) by Lex Fridman at MIT]

---
class: middle

## The perils of seq2seq modeling
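
--
A minimal sketch of where the problem comes from: a vanilla seq2seq encoder squeezes the whole source sequence into one fixed-size vector, which is all the decoder ever sees. (Toy sizes and random weights below, nothing from the actual models in the video.)

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(inputs, W_xh, W_hh):
    """Toy RNN encoder: reads the source left to right and keeps
    only the final hidden state as the 'context' for the decoder."""
    h = np.zeros(W_hh.shape[0])
    for x in inputs:                      # one step per source token
        h = np.tanh(W_xh @ x + W_hh @ h)  # state is overwritten every step
    return h

src = [rng.normal(size=4) for _ in range(10)]  # a 10-token source "sentence"
W_xh = rng.normal(size=(8, 4))
W_hh = rng.normal(size=(8, 8))

context = encode(src, W_xh, W_hh)
print(context.shape)  # (8,): ten tokens squeezed into eight numbers
```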
--
Aren't we throwing out a bit too much?

.footnote[.font-small[Videos from https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/]]

---
class: middle

## The fix

Let's use the full encoder output!
--
But how do we combine all the hidden states together?

---
class: middle

## The mechanics of Attention
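
--
The steps in the animation boil down to three operations: score, softmax, weighted sum. A minimal NumPy sketch (dot-product scoring is one common choice; the toy sizes and variable names here are mine, not from the video):

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the
    current decoder state, softmax the scores, and return the
    weighted sum of encoder states (the context vector)."""
    scores = encoder_states @ decoder_state          # one score per time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax: sums to 1
    context = weights @ encoder_states               # weighted sum of states
    return context, weights

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))   # 5 encoder hidden states of size 8
s = rng.normal(size=8)        # current decoder hidden state

context, weights = attend(s, H)
print(weights.round(2), context.shape)
```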
---
class: middle

## Getting alignment with attention
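
--
The "alignment" in the animation falls out of the attention weights themselves: reading off the most-attended source position for each target word gives a soft word alignment for free. A toy illustration (the weight matrix below is made up, not taken from any trained model):

```python
import numpy as np

# Hypothetical attention weights: one row per target word,
# one column per source word, each row summing to 1.
attn = np.array([
    [0.85, 0.10, 0.05],   # target word 0 attends mostly to source word 0
    [0.05, 0.05, 0.90],   # target word 1 attends mostly to source word 2
    [0.10, 0.80, 0.10],   # target word 2 attends mostly to source word 1
])

# The argmax of each row turns the soft attention distribution
# into a hard source-target word alignment.
alignment = attn.argmax(axis=1)
print(alignment)  # [0 2 1]: a reordering, as between languages with different word order
```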
---

## Attention visualized

.center[![:scale 60%](images/attention_sentence.png)]

See a nice demo at https://distill.pub/2016/augmented-rnns/

---

## Attention also helps the explainability of stock prediction

.center[![:scale 95%](images/attention-stocs-prediction.png)]

.footnote[.font-small[[News-Driven Stock Prediction With Attention-Based Noisy Recurrent State Transition, 2020](https://arxiv.org/abs/2004.01878)]]

---
class: middle

# What if we only used attention?

---
class: middle

## The Transformer architecture

.center[![:scale 90%](images/The_transformer_encoder_decoder_stack.png)]

.footnote[.font-small[Images from https://jalammar.github.io/illustrated-transformer/]]

---
class: middle

## The Transformer's Encoder

.center[![:scale 100%](images/encoder_with_tensors_2.png)]

---

## What's Self-Attention?

.center[
*The animal didn't cross the street because it was too tired.*
]

What does "it" refer to?

--
.center[![:scale 50%](images/transformer_self-attention_visualization.png)]

---

## Self-Attention mechanics

.center[![:scale 70%](images/self-attention-output.png)]

---

## The full Transformer seq2seq process

.center[![:scale 100%](images/transformer_decoding_2.gif)]

---

## Transformer recap

- Encoder-decoder architecture
- No time dependency, thanks to self-attention
- Easy to parallelize
- Very helpful in many tasks

---

## Big Transformer Wins: GPT-2

.center[![:scale 100%](images/gpt2-sizes.png)]

Try it yourself at https://transformer.huggingface.co/doc/gpt2-large

---

## Big Transformer Wins: BERT

.center[![:scale 100%](images/bert.png)]

---

## BERT for Forex Movement Prediction

.center[![:scale 100%](images/bert-forex.png)]

---

## BERT for Forex Movement Prediction

.center[![:scale 100%](images/bert-forex-results.png)]

[Group, Extract and Aggregate: Summarizing a Large Amount of Finance News for Forex Movement Prediction](https://www.aclweb.org/anthology/D19-5106.pdf)
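
---

## Self-Attention mechanics: a sketch

The self-attention step shown earlier can be written in a few lines. A minimal single-head version without masking or multi-head splitting (toy sizes, random weights, and my own naming):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence X,
    where each row of X is a token embedding."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # every token scores every token
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V                             # each output mixes all tokens

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 16))                 # 6 tokens, embedding size 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 16): same sequence length back out
```

Because every token attends to every other token in one matrix multiply, there is no step-by-step recurrence to unroll, which is exactly what makes the Transformer easy to parallelize.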