Attention and Transformers

Test your understanding



Question 1 of 14
What is the primary limitation of vanilla RNNs in tasks like machine translation?
  • They cannot process input sequences from left to right
  • They require input and output sequences to be of different lengths
  • They cannot look at the entire input sequence to make predictions (correct)
  • They use a complex activation function that slows down training
Explanation: The primary limitation of vanilla RNNs in tasks like machine translation is that they cannot look at the entire input sequence to make predictions. In a traditional RNN architecture, the model processes the sequence sequentially and only has access to information from previous time steps when making a prediction at the current step. This creates a bottleneck because:
  • **Sequential processing**: RNNs must process tokens one by one in order, with no ability to "look ahead" to future tokens in the sequence.
  • **Information bottleneck**: all information from the input sequence must be compressed into a fixed-size hidden state, which becomes problematic for long sequences.
  • **Context limitation**: understanding the full context of a sentence is crucial for translation, but a vanilla RNN has access to only partial context at each step.
  • **Long-term dependencies**: important information from early in the sequence may be forgotten by the time it is needed for a translation decision.
This limitation led to the development of attention mechanisms and, eventually, Transformer architectures, which allow the model to attend to any part of the input sequence when making a prediction. The other options are incorrect: RNNs do process sequences from left to right (that is precisely how they work), they do not inherently require input and output sequences of different lengths, and their activation functions are typically simple (such as tanh or ReLU).
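To make the contrast concrete, here is a minimal NumPy sketch (not part of the course material; the toy sequence length, embedding size, and random weights are assumptions chosen only for illustration). It shows how a vanilla RNN squeezes the whole input into one fixed-size hidden vector, while scaled dot-product self-attention lets every position weight every input position directly.

```python
# Minimal sketch, assuming a toy 6-token sequence with 8-dimensional embeddings.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
x = rng.normal(size=(seq_len, d_model))          # toy input embeddings

# --- Vanilla RNN view: the whole input is squeezed into one fixed-size state ---
W_h = rng.normal(size=(d_model, d_model)) * 0.1  # hypothetical recurrent weights
W_x = rng.normal(size=(d_model, d_model)) * 0.1  # hypothetical input weights
h = np.zeros(d_model)
for t in range(seq_len):                         # strictly left-to-right
    h = np.tanh(h @ W_h + x[t] @ W_x)            # all prior context lives in h alone
print("RNN summary vector shape:", h.shape)      # (d_model,) no matter how long the input is

# --- Attention view: every position can weight every input position ---
def scaled_dot_product_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # (seq_len, seq_len) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over all keys
    return weights @ V, weights

out, attn = scaled_dot_product_attention(x, x, x)         # self-attention over the full input
print("Attention output shape:", out.shape)               # (seq_len, d_model)
print("One row of attention weights sums to:", attn[0].sum())  # ~1.0, spread over ALL tokens
```

The point of the sketch is the shape of the information flow, not the numbers: the RNN's `h` has a fixed size regardless of `seq_len`, whereas the attention weights form a full `seq_len x seq_len` matrix, so each output position can draw on the entire input sequence.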