
When training a Skip-gram model with negative sampling, what is the mathematical purpose of the noise distribution used to select non-target words?



In a Skip-gram model, the goal is to maximize the probability of the observed context words given a target word. Computing this probability with a full softmax is expensive, because the normalization term sums the scores of every word in the vocabulary at every training step. Negative sampling avoids that sum by converting the problem into a binary classification task in which the model learns to distinguish between...
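The binary classification setup described above can be illustrated with a minimal Python sketch. It is not taken from the original answer: the function names, the toy word counts, and the 0.75 exponent on unigram counts (the value commonly used in word2vec implementations) are all assumptions made for illustration. The sketch builds a noise distribution, draws k non-target words from it, and computes a logistic loss over one observed pair and k sampled pairs, so no sum over the whole vocabulary is needed.

```python
import math
import random


def noise_distribution(counts, power=0.75):
    """Raise unigram counts to `power` and renormalize to sum to 1.

    The exponent 0.75 is an assumed, commonly used value; it flattens the
    distribution so rare words are sampled more often than their raw frequency.
    """
    weights = [c ** power for c in counts]
    total = sum(weights)
    return [w / total for w in weights]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(v_target, u_context, u_negatives):
    """Logistic loss for one observed pair plus k sampled noise pairs."""
    # Observed (target, context) pair: push its score up.
    loss = -math.log(sigmoid(dot(u_context, v_target)))
    # Sampled (target, noise word) pairs: push their scores down.
    for u_neg in u_negatives:
        loss -= math.log(sigmoid(-dot(u_neg, v_target)))
    return loss


# Toy vocabulary of four words with hypothetical corpus counts.
counts = [100, 10, 1, 1]
p_noise = noise_distribution(counts)

rng = random.Random(0)
dim, k = 8, 3
in_vecs = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in counts]
out_vecs = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in counts]

target_id, context_id = 0, 1
# Draw k non-target words from the noise distribution. For simplicity this
# sketch does not resample if a draw happens to equal the true context word.
negative_ids = rng.choices(range(len(counts)), weights=p_noise, k=k)

loss = negative_sampling_loss(
    in_vecs[target_id],
    out_vecs[context_id],
    [out_vecs[i] for i in negative_ids],
)
```

Each training step in this sketch touches only k + 1 output vectors rather than the whole vocabulary. The exponent controls how strongly frequent words dominate the sampled negatives: a value of 1 reproduces raw unigram frequencies, and a value of 0 gives uniform sampling.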



