name: inverse
layout: true
class: center, middle, inverse

---

# Sequence Processing

Recurrent Neural Networks and more

.footnote[Marek Šuppa
ESS 2020, Bratislava]

---
layout: false
class: center

# Types of Neural Networks

![:scale 100%](images/rnns.png)

.footnote[.font-small[Image from https://karpathy.github.io/2015/05/21/rnn-effectiveness/]]

---
layout: false
class: center

# Unfolded RNN

![:scale 100%](images/unfolded-rnn.png)

---
layout: false
class: center

# Unfolded RNN

![:scale 100%](images/unfolded-rnn-2.png)

---
layout: false
class: center

# Training unfolded RNN

![:scale 90%](images/rnn-bptt.png)

This concept is called Backpropagation Through Time (**BPTT**)

---
layout: false
class: center

# Training unfolded RNN

![:scale 95%](images/rnn-bptt-2.png)

Note how various parts of the unfolded RNN impact $h_2$

---
layout: false
class: center

# Problems with long-term dependencies

![:scale 95%](images/long-term-dep.png)

---
layout: false
class: center

## LSTM: what to forget and what to remember

![:scale 100%](images/lstm-intro.png)

---
layout: false

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-C-line.png)

--
- The cell state acts as a kind of "conveyor belt"
--
- It allows information to stay unchanged or get slightly updated

.footnote[.font-small[All the following nice images are from http://colah.github.io/posts/2015-08-Understanding-LSTMs/ which I highly recommend]]

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-focus-f.png)

**Step 1**: Decide what to forget

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-focus-i.png)

**Step 2**: Decide
- which values to update ($i_t$)
- what the new values should be ($\hat{C}_t$)

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-focus-C.png)

**Step 2.5**: Perform the forgetting and the update

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-focus-o.png)
**Step 3**: Produce the output ($h_t$)

---
layout: false
class: center, middle

## LSTM: Conveyor belt

![:scale 100%](images/LSTM3-chain.png)

A conveyor belt that can pick
- what to remember
- what to forget
- what to output

---
layout: false
class: center, middle

## GRU: Simplified conveyor belt

![:scale 100%](images/LSTM3-var-GRU.png)

--
- Forget and input gates combined into a single "update gate" ($z_t$)
--
- Cell state ($C_t$ in the LSTM) merged with the hidden state ($h_t$)

---
layout: false

## GRU vs LSTM

- GRU is smaller and hence requires less compute
- But it turns out it cannot count (especially over longer sequences)

--
.center[![:scale 80%](images/lstm-gru-counting.png)]

[On the Practical Computational Power of Finite Precision RNNs for Language Recognition (2018)](https://arxiv.org/abs/1805.04908)

---
class: middle

## Application: Machine Translation

.center[![:scale 80%](images/machine-translation.png)]

---

## Application: Handwriting from Text

**Input:** "He dismissed the idea"

--
**Output:**

.center[![:scale 50%](images/handwriting.png)]

--
[Generating Sequences With Recurrent Neural Networks, Alex Graves, 2013](https://arxiv.org/abs/1308.0850)

Demo at https://www.cs.toronto.edu/~graves/handwriting.html

---

## Application: Character-Level Text Generation

.center[![:scale 100%](images/char-rnn.png)]

.footnote[.font-small[["The Unreasonable Effectiveness of Recurrent Neural Networks"](https://karpathy.github.io/2015/05/21/rnn-effectiveness/), Andrej Karpathy, 2015]]

---

## Application: Image Question Answering

.center[![:scale 100%](images/vqa.png)]

--
.left-eq-column[
.center[![:scale 100%](images/vqa-arch.png)]
.font-small[Exploring Models and Data for Image Question Answering, 2015]
]
.right-eq-column[
Live Demo at https://vqa.cloudcv.org/
]

---

## Application: Image Caption Generation

.center[![:scale 100%](images/captioning.png)]

---

## Application: Video Caption Generation

.center[![:scale 100%](images/video-caption-generation.png)]

--
.left-eq-column[
.center[![:scale 100%](images/S2VTarchitecture.png)]
.font-small[Sequence to Sequence - Video to Text, Venugopalan et al., 2015]
]
.right-eq-column[
More at https://vsubhashini.github.io/s2vt.html
]

---

## Application: Adding Audio to Silent Films

.center[![:scale 100%](images/silent-audio.png)]

--
.left-eq-column[
.center[![:scale 60%](images/pipeline.jpg)]
.font-small[Visually Indicated Sounds, Owens et al., 2015]
]
.right-eq-column[
More at http://andrewowens.com/vis/
]

---

## Application: Medical Diagnosis

.center[![:scale 100%](images/medical-diagnosis.png)]

---

## Application: End-to-End Driving .red[*]
.left-eq-column[![:scale 100%](images/rnn-steering.gif)]
.right-eq-column[![:scale 100%](images/LSTM3-chain.png)]

--
**Input**: features extracted by a CNN

**Output**: predicted steering angle

.footnote[.red[*] On relatively straight roads]

---

## Application: Stock Market Prediction

.center[![:scale 100%](images/stock-market.png)]

---

## Application: Sentiment Analysis

.center[![:scale 80%](images/sentiment.analysis.png)]

Try it yourself at https://demo.allennlp.org/sentiment-analysis/

---

## Application: Named Entity Recognition (NER)

.center[![:scale 100%](images/NER.png)]

Try it yourself at https://demo.allennlp.org/named-entity-recognition/

---

## Application: Trump2Cash

.center[![:scale 60%](images/trump2cash.png)]

---

## Application: Trump2Cash

- A combination of Sentiment Analysis and Named Entity Recognition

--
How it works:

1. Monitor Donald Trump's tweets
2. Use NER to see whether any of them mention a publicly traded company
3. Apply sentiment analysis
4. Profit?

---

## Application: Trump2Cash

.center[![:scale 50%](images/simulated-twitter-fund.png)]

See predictions live at https://twitter.com/Trump2Cash

---
class: middle, inverse

# Attention and Transformers

---

## History of Deep Learning Milestones

![:scale 70%](images/timeline.png)

.footnote[From [Deep Learning State of the Art (2020)](https://www.youtube.com/watch?v=0VH1Lim8gL8) by Lex Fridman at MIT]

---
class: middle

## The perils of seq2seq modeling
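
--
A minimal sketch of where the problem comes from: a vanilla seq2seq encoder squeezes the whole source sequence into one fixed-size vector, which is all the decoder ever sees. (Toy sizes and random weights below, nothing from the actual models in the video.)

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(inputs, W_xh, W_hh):
    """Toy RNN encoder: reads the source left to right and keeps
    only the final hidden state as the 'context' for the decoder."""
    h = np.zeros(W_hh.shape[0])
    for x in inputs:                      # one step per source token
        h = np.tanh(W_xh @ x + W_hh @ h)  # state is overwritten every step
    return h

src = [rng.normal(size=4) for _ in range(10)]  # a 10-token source "sentence"
W_xh = rng.normal(size=(8, 4))
W_hh = rng.normal(size=(8, 8))

context = encode(src, W_xh, W_hh)
print(context.shape)  # (8,): ten tokens squeezed into eight numbers
```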
--
Aren't we throwing out a bit too much?

.footnote[.font-small[Videos from https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/]]

---
class: middle

## The fix

Let's use the full encoder output!
--
But how do we combine all the hidden states together?

---
class: middle

## The mechanics of Attention
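
--
The steps in the animation boil down to three operations: score, softmax, weighted sum. A minimal NumPy sketch (dot-product scoring is one common choice; the toy sizes and variable names here are mine, not from the video):

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the
    current decoder state, softmax the scores, and return the
    weighted sum of encoder states (the context vector)."""
    scores = encoder_states @ decoder_state          # one score per time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax: sums to 1
    context = weights @ encoder_states               # weighted sum of states
    return context, weights

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))   # 5 encoder hidden states of size 8
s = rng.normal(size=8)        # current decoder hidden state

context, weights = attend(s, H)
print(weights.round(2), context.shape)
```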
---
class: middle

## Getting alignment with attention
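
--
The "alignment" in the animation falls out of the attention weights themselves: reading off the most-attended source position for each target word gives a soft word alignment for free. A toy illustration (the weight matrix below is made up, not taken from any trained model):

```python
import numpy as np

# Hypothetical attention weights: one row per target word,
# one column per source word, each row summing to 1.
attn = np.array([
    [0.85, 0.10, 0.05],   # target word 0 attends mostly to source word 0
    [0.05, 0.05, 0.90],   # target word 1 attends mostly to source word 2
    [0.10, 0.80, 0.10],   # target word 2 attends mostly to source word 1
])

# The argmax of each row turns the soft attention distribution
# into a hard source-target word alignment.
alignment = attn.argmax(axis=1)
print(alignment)  # [0 2 1]: a reordering, as between languages with different word order
```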
---

## Attention visualized

.center[![:scale 60%](images/attention_sentence.png)]

See a nice demo at https://distill.pub/2016/augmented-rnns/

---

## Attention also helps the explainability of stock prediction

.center[![:scale 95%](images/attention-stocs-prediction.png)]

.footnote[.font-small[[News-Driven Stock Prediction With Attention-Based Noisy Recurrent State Transition, 2020](https://arxiv.org/abs/2004.01878)]]

---
class: middle

# What if we only used attention?

---
class: middle

## The Transformer architecture

.center[![:scale 90%](images/The_transformer_encoder_decoder_stack.png)]

.footnote[.font-small[Images from https://jalammar.github.io/illustrated-transformer/]]

---
class: middle

## The Transformer's Encoder

.center[![:scale 100%](images/encoder_with_tensors_2.png)]

---

## What's Self-Attention?

.center[
*The animal didn't cross the street because it was too tired.*
]

What does "it" refer to?

--
.center[![:scale 50%](images/transformer_self-attention_visualization.png)]

---

## Self-Attention mechanics

.center[![:scale 70%](images/self-attention-output.png)]

---

## The full Transformer seq2seq process

.center[![:scale 100%](images/transformer_decoding_2.gif)]

---

## Transformer recap

- Encoder-decoder architecture
- No time dependency, thanks to self-attention
- Easy to parallelize
- Very helpful in many tasks

---

## Big Transformer Wins: GPT-2

.center[![:scale 100%](images/gpt2-sizes.png)]

Try it yourself at https://transformer.huggingface.co/doc/gpt2-large

---

## Big Transformer Wins: BERT

.center[![:scale 100%](images/bert.png)]

---

## BERT for Forex Movement Prediction

.center[![:scale 100%](images/bert-forex.png)]

---

## BERT for Forex Movement Prediction

.center[![:scale 100%](images/bert-forex-results.png)]

[Group, Extract and Aggregate: Summarizing a Large Amount of Finance News for Forex Movement Prediction](https://www.aclweb.org/anthology/D19-5106.pdf)
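
---

## Self-Attention mechanics: a sketch

The self-attention step shown earlier can be written in a few lines. A minimal single-head version without masking or multi-head splitting (toy sizes, random weights, and my own naming):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence X,
    where each row of X is a token embedding."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # every token scores every token
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V                             # each output mixes all tokens

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 16))                 # 6 tokens, embedding size 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 16): same sequence length back out
```

Because every token attends to every other token in one matrix multiply, there is no step-by-step recurrence to unroll, which is exactly what makes the Transformer easy to parallelize.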