A model gets 99% accuracy on its training data but only 60% accuracy on new, unseen data. What common problem is this model showing?
The model is overfitting. Overfitting occurs when a machine learning model learns the training data too well, including its noise and idiosyncratic patterns, and as a result fails to perform accurately on new, unseen data.

The 99% accuracy on the training data, the dataset used to teach the model how to make predictions, means the model has captured the patterns in its teaching material almost perfectly. The 60% accuracy on new, unseen data, a separate dataset the model never encountered during training, indicates a sharp drop in performance: the model has essentially memorized the specific training examples rather than learning robust, underlying rules that apply broadly. Generalization is the model's capacity to apply what it has learned to examples it has not seen before, and accuracy, the percentage of correct predictions, is the metric used here to measure it.

The large gap between high training accuracy and low unseen-data accuracy (39 percentage points) is the direct indicator that the model has overfit.
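A minimal sketch of how this diagnosis is made in practice, assuming scikit-learn is available: hold out a test set, fit a model that is free to memorize (here an unconstrained decision tree on a hypothetical synthetic dataset, neither of which comes from the original question), and compare train vs. test accuracy.

```python
# Illustrative sketch: detecting overfitting via the train/test accuracy gap.
# The dataset, model, and numbers below are assumptions for demonstration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic data with some label noise (flip_y) so a deep tree can memorize it.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# No max_depth limit: the tree can grow until it fits the training set almost perfectly.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.2f}")  # typically near 1.00
print(f"test accuracy:  {test_acc:.2f}")   # noticeably lower
```

A near-perfect training score paired with a much lower test score reproduces the pattern in the question; the held-out test set plays the role of the "new, unseen data" used to measure generalization.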