Week 2/21/2022 – 2/27/2022 – transformer, BERT, attention
References
- BERT
- Text Extraction with BERT in Keras – it uses a transformer model from HuggingFace
- Text Classification with Transformer in Keras – it implements a transformer block as a Keras layer and then uses it for text classification
- Time Series Classification with a Transformer model in Keras
Week 2/7/2022 – 2/13/2022 – adding attention
- Tips and tricks for training
- Adding attention to models
References
- Tips for training ML models
- Adding an attention layer
- How to do attention over LSTM sequences with masking?
- https://www.youtube.com/watch?v=oaV_Fv5DwUM
- How to add Attention on top of a Recurrent Layer (Text Classification) #4962
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention – paper, not yet read
- Keras attention mechanism
- A Comprehensive Guide to Attention Mechanism in Deep Learning for Everyone – a really good read, also includes coding examples to define an attention layer in Keras
- A Beginner’s Guide to Using Attention Layer in Neural Networks – another article that shows how to use Keras attention layer
- Keras attention layer – Dot-product attention layer, a.k.a. Luong-style attention
- What is the difference between Luong attention and Bahdanau attention? – Stack Overflow
- The Luong Attention Mechanism
- Effective Approaches to Attention-based Neural Machine Translation – Luong attention original paper
- Craft your own Attention layer in 6 lines — Story of how the code evolved – not yet read, a detailed article though
- Practical PyTorch: Translation with a Sequence to Sequence Network and Attention
- Getting started with Attention for Classification
- Keras Self Attention
- Source code for attention layers implemented in Keras
- Transformers
- Attention Is All You Need – introduces the transformer architecture – original paper
- Time Series Classification with a Transformer Model in Keras – full example that uses a transformer
- The Transformer neural network architecture EXPLAINED. “Attention is all you need” (NLP) – YouTube video
- Multi Head Attention layer – Keras
- F-Net
- Text Generation using FNet – Keras complete example
- FNet: Mixing Tokens with Fourier Transforms – original paper
- F-Net explained – YouTube video
- F-Net code from Google, in TensorFlow
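Several of the references above (the Keras attention layer, the Luong papers) center on dot-product attention. As a minimal NumPy sketch of the idea — the function name, shapes, and variable names are my own, not from any of the linked articles:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(query, keys, values):
    """Luong-style (multiplicative) attention over one query.

    query:  (d,)   e.g. a decoder state
    keys:   (T, d) e.g. encoder states
    values: (T, d) usually the same encoder states
    """
    scores = keys @ query       # (T,) alignment scores
    weights = softmax(scores)   # (T,) attention distribution
    context = weights @ values  # (d,) weighted sum of values
    return context, weights

rng = np.random.default_rng(0)
q = rng.normal(size=4)
k = rng.normal(size=(6, 4))
ctx, w = dot_product_attention(q, k, k)
print(w)  # non-negative weights that sum to 1
```

The Keras `Attention` layer implements this same score-then-softmax-then-weighted-sum pattern batched over queries; scaled dot-product attention in the transformer additionally divides the scores by sqrt(d).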
Week 1/31/2022 – 2/6/2022
- Masking in Keras LSTM: there are three ways to do masking in Keras. See snippet below.
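The three ways (per the TF masking guide referenced below): a `Masking` layer, `Embedding(..., mask_zero=True)`, or passing a `mask` argument directly to a mask-consuming layer such as `LSTM`. The Keras calls are sketched in comments to keep the snippet dependency-free; the NumPy line shows the boolean mask that all three boil down to (padding id 0 is an assumption):

```python
import numpy as np

# Padded batch of token ids; 0 is assumed to be the padding id.
batch = np.array([
    [7, 3, 5, 0, 0],
    [2, 8, 0, 0, 0],
])

# The boolean mask the three Keras mechanisms produce/consume:
# True for real timesteps, False for padding.
mask = batch != 0

# Sketch of the three Keras options (layer arguments per the TF guide):
#   1. keras.layers.Masking(mask_value=0.0)                  # dense features
#   2. keras.layers.Embedding(vocab, dim, mask_zero=True)    # token ids
#   3. keras.layers.LSTM(units)(x, mask=tf.constant(mask))   # explicit mask
print(mask)
```

With options 1 and 2 the mask is propagated automatically to downstream layers, so the LSTM skips the padded timesteps without any extra wiring.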
References
- LSTM
- Keras LSTM with masking layer for variable-length inputs – how to use masking with LSTM – Stack Overflow
- Masking and Padding with Keras – Tensorflow documentation
- Masking layer in Keras – Keras documentation
- How does masking work in RNN?
- Lambda layer in Keras
- What is masking in RNN – Quora