After processing text with an LSTM, what common layer is used to squish all the information from the sequence into a single, fixed-size set of numbers before the final output layer?



After processing text with an LSTM, the common layer used to squish all the information from the sequence into a single, fixed-size set of numbers before the final output layer is a global pooling layer, most often Global Average Pooling or Global Max Pooling.

An LSTM (Long Short-Term Memory network) is a type of recurrent neural network designed to process sequences of data. When an LSTM processes an input sequence, such as a sentence or a document, it produces one hidden state vector per time step, each summarizing the input seen up to that point. For tasks that require a single output for the entire sequence, such as classifying the sentiment of a review or categorizing a document, this sequence of hidden states must be condensed into a single, fixed-size representation. A global pooling layer achieves this by aggregating information across the entire time dimension (the sequence length).

Global Average Pooling computes the mean of all the hidden state vectors across all time steps, yielding a single vector whose size equals the LSTM's hidden state dimension. This vector acts as an averaged summary of the whole input sequence. Global Max Pooling instead takes the maximum value for each feature dimension across all time steps, producing a single vector that highlights the most prominent features found anywhere in the sequence.

Either way, the variable-length sequence output of the LSTM becomes a single, fixed-size vector regardless of the original input length, making it suitable input for a subsequent standard feedforward layer (often called a Dense or Fully Connected layer) that leads to the final output layer.
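
To make the two pooling operations concrete, here is a minimal NumPy sketch. The hidden-state values are made up purely for illustration; in practice they would come from an LSTM run over a real sequence:

# Toy sequence of LSTM hidden states: one vector per time step.
import numpy as np

hidden_states = np.array([
    [0.2, -1.0,  0.5],   # hidden state at time step 1
    [0.8,  0.3, -0.2],   # hidden state at time step 2
    [0.1,  0.9,  0.4],   # hidden state at time step 3
])                       # shape (3, 3): seq_len = 3, hidden_dim = 3

avg_pooled = hidden_states.mean(axis=0)   # average over the time axis
max_pooled = hidden_states.max(axis=0)    # per-feature max over the time axis

print(avg_pooled)  # ~ [0.367, 0.067, 0.233] -- an averaged summary vector
print(max_pooled)  # [0.8, 0.9, 0.5] -- the strongest activation per feature

Both results have shape (hidden_dim,), no matter how many time steps the sequence contained, which is exactly the "fixed-size set of numbers" the question asks about.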
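
And here is a minimal sketch of where the pooling layer sits in a full model, written with the Keras API. The vocabulary size, layer widths, and binary sentiment-style output are illustrative assumptions, not part of the original question:

import numpy as np
import tensorflow as tf

VOCAB_SIZE = 10_000   # assumed vocabulary size (illustrative)
EMBED_DIM = 64        # assumed embedding width (illustrative)
HIDDEN_DIM = 128      # assumed LSTM hidden state size (illustrative)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # return_sequences=True makes the LSTM emit one hidden state per time
    # step, giving output shape (batch, seq_len, HIDDEN_DIM).
    tf.keras.layers.LSTM(HIDDEN_DIM, return_sequences=True),
    # Collapse the time axis into one fixed-size vector: (batch, HIDDEN_DIM).
    # Swap in tf.keras.layers.GlobalMaxPooling1D() for max pooling instead.
    tf.keras.layers.GlobalAveragePooling1D(),
    # Standard Dense layer producing the final output (binary classification).
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# The pooled representation has the same size for any input length:
short_batch = np.random.randint(0, VOCAB_SIZE, size=(2, 10))
long_batch = np.random.randint(0, VOCAB_SIZE, size=(2, 50))
print(model(short_batch).shape)  # (2, 1)
print(model(long_batch).shape)   # (2, 1)

Global Average Pooling is used here, but the choice between average and max pooling is task-dependent: averaging smooths over the whole sequence, while max pooling picks out the single strongest signal per feature.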