How does the Skip-gram model in Word2Vec differ fundamentally from the CBOW model?
The Skip-gram and CBOW (Continuous Bag of Words) models are the two architectures Word2Vec uses to learn word embeddings, and they differ in the direction of prediction. CBOW predicts a target word from its surrounding context words: it takes multiple context words as input and predicts the single word in the middle. Skip-gram does the opposite: it takes a single target word as input and predicts each of its surrounding context words.

For example, given the sentence 'The cat sat on the mat', CBOW would try to predict 'sat' from the context words 'The', 'cat', 'on', and 'the'. Skip-gram would instead try to predict 'The', 'cat', 'on', and 'the' given the target word 'sat'.

Because Skip-gram turns each occurrence of a word into several training pairs (one per context word), infrequent words receive more training updates, so it tends to perform better than CBOW on small datasets and is better at capturing the semantics of rare words. CBOW averages the embeddings of the context words into a single input, which makes it faster to train and gives it an edge on frequent words, but dilutes the signal from rare words.

In summary, CBOW predicts the target word given the context, while Skip-gram predicts the context given the target.
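To make the difference concrete, here is a minimal Python sketch that slices the example sentence into training pairs the way each model would. The helper names cbow_pairs and skipgram_pairs and the window size of 2 are illustrative assumptions for this sketch, not part of any specific library.

```python
def cbow_pairs(tokens, window=2):
    """CBOW-style pairs: (list of context words) -> target word."""
    pairs = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-gram-style pairs: target word -> one context word per pair."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


sentence = "The cat sat on the mat".split()

# CBOW: one training example per position, several context words -> one target
# prints (['The', 'cat', 'on', 'the'], 'sat')
print(cbow_pairs(sentence)[2])

# Skip-gram: several training examples per position, one target -> one context word
# prints [('sat', 'The'), ('sat', 'cat'), ('sat', 'on'), ('sat', 'the')]
print([p for p in skipgram_pairs(sentence) if p[0] == "sat"])
```

In practice, libraries such as gensim expose this choice as a single flag on the same Word2Vec class (for example, sg=1 trains Skip-gram and sg=0 trains CBOW), so switching between the two architectures does not require changing the rest of the training pipeline.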