Weight tying is a technique used in neural networks to reduce the number of parameters by forcing certain weights to be shared across different parts of the model. In the context of Transformers, a common form of weight tying involves sharing the input word embedding matrix with the output softmax layer's weight matrix. This means that the same matrix is used to map words to vectors in the input and to map vectors back to words in the output. This weight sharing has a significant impact on model size and performance. By tying the input and output embeddings, the number of parameters in the model is reduced....
Log in to view the answer