And what does it mean for robotics/roboticists?
Marek Šuppa
Fablab 2023
$ whoami
"Principal Data Scientist/Engineer" at Slido (now part of Cisco)
Lecturer at Matfyz (ML, NLP)
RoboCupJunior Exec
* Our (loose) agenda for today
From Deep Learning State of the Art (2020) by Lex Fridman at MIT
Aren't we throwing out a bit too much?
Videos from https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
Let's use the full encoder output!
But how do we combine all the hidden states together?
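One common answer, sketched minimally below (my addition, not from the slides; NumPy stands in for whatever framework you use): score each encoder hidden state against the current decoder state, turn the scores into softmax weights, and take the weighted sum as the context vector.

import numpy as np

def attend(decoder_state, encoder_states):
    # One score per encoder time step (dot-product attention)
    scores = encoder_states @ decoder_state
    # Softmax turns the scores into weights that sum to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The context vector is a weighted sum of all encoder hidden states
    return weights @ encoder_states, weights

encoder_states = np.random.randn(5, 4)   # 5 time steps, hidden size 4
decoder_state = np.random.randn(4)
context, weights = attend(decoder_state, encoder_states)
print(weights.round(2), context.shape)   # weights sum to 1, context has shape (4,)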
The animal didn't cross the street because it was too tired.
What does "it" refer to?
To actually understand what's going on, there is no better approach than playing with the model yourself.
Try it yourself at https://transformer.huggingface.co/doc/gpt2-large
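The same model can also be tried locally; a minimal sketch (my addition, not from the slides), assuming the gpt2-large checkpoint fits in memory:

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-large")
# Sampling is stochastic, so every run completes the prompt differently
print(generator("Robots will soon be able to",
                max_new_tokens=30, do_sample=True)[0]["generated_text"])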
transformers
from transformers import pipeline

# Load a ready-made English -> Slovak translation pipeline
en_sk_translator = pipeline("translation_en_to_sk")
print(en_sk_translator("When will this presentation end?"))
Works with many other languages as well -- the full list is here
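If the pipeline has no default checkpoint registered for English to Slovak, a model can be passed explicitly. A hedged variant (my addition), assuming the Helsinki-NLP/opus-mt-en-sk checkpoint is available on the Hugging Face Hub:

from transformers import pipeline

# Explicit checkpoint for en -> sk (assumed to exist on the Hub)
en_sk_translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-sk")
print(en_sk_translator("When will this presentation end?"))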
Attention was a fix for sequence models that did not really work too well
It turned out it was all that was needed for (bounded) sequence processing
The Transformer is an encoder-decoder architecture that is "all the rage" now
It has no time dependency thanks to self-attention and is therefore easy to parallelize (see the sketch below)
Well-known models like BERT and GPT-* took the world of NLP by storm
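Why the parallelism claim holds, as a minimal sketch (my addition, not from the slides): scaled dot-product self-attention touches every position in one batched matrix product, with no loop over time steps.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values for all positions at once
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # all pairwise scores in one matrix product
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # every output position computed in parallel

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (6, 8): one vector per input position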
transformers library
Basically the same architecture as GPT-2
The sheer size is astounding (power-law scaling with model/dataset/compute size)
It would take 355 years of Tesla V100 GPU time to train
Training it would cost about $4.6M at retail prices (a rough check follows below)
It was so expensive to train that they didn't even fix the bugs they themselves found:
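A rough sanity check of that cost figure (my numbers; the roughly $1.5 per V100 GPU-hour rate is an assumption, not from the slides):

gpu_years = 355
usd_per_gpu_hour = 1.5                      # assumed retail cloud rate for a V100
cost = gpu_years * 365 * 24 * usd_per_gpu_hour
print(f"${cost / 1e6:.1f}M")                # ≈ $4.7M, in the same ballpark as $4.6M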
Very short A: It depends
Short A: It depends on who you ask
A: It depends on who you ask. OpenAI's Docs probably wouldn't agree.
Actual A: We don't really know. It's behind an API, so we have no real way of proving it one way or the other.
InstructGPT: Training language models to follow instructions with human feedback (2022)
The outputs generated by a small (1.3B) InstructGPT model were preferred to those of GPT-3
The reward model was also "rather small" (6B); its pairwise preference training is sketched below
We don't know how large the model behind ChatGPT is, but chances are it is similarly "small"
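The reward model in InstructGPT is trained on pairwise human preferences (which of two answers a labeler liked better). A minimal sketch of that ranking loss (my addition), with toy scalars standing in for the reward model's outputs:

import math

def preference_loss(reward_chosen, reward_rejected):
    # InstructGPT-style pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

print(preference_loss(1.2, 0.3))   # small loss: the preferred answer already scores higher
print(preference_loss(0.3, 1.2))   # larger loss: the ranking is the wrong way round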
Let it do things you would manually check anyway
Have it draft things you'll rewrite anyway
Assume the first response will be far from final
Inspired by https://vickiboykis.com/2023/02/26/what-should-you-use-chatgpt-for/