Overfitting the Unknown Unknowns
Warning: this page has been flagged for excessive use of jargon. Prolonged exposure may lead to spontaneous combustion of cognitive function.
In the depths of the algorithmic underworld, there lies a phenomenon known as Overfitting the Unknown Unknowns. It's a condition in which a model becomes so convinced of its own superiority that it begins fitting the noise in the data and forgets the signal entirely.
Symptoms of Overfitting the Unknown Unknowns include:
- Unusually high R-squared values on the training data
- Excessive use of regularization techniques
- Models that are more complex than their own mothers
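The first symptom can be reproduced at home. A minimal sketch (all names and numbers here are illustrative, not from the article): fit an absurdly flexible polynomial to a few noisy points from a simple linear signal, and the training R-squared soars while the test R-squared sags.

```python
# Illustrative sketch of overfitting: a degree-12 polynomial on 15 noisy
# points from a linear signal. Training R-squared looks heroic; test
# R-squared tells the truth.
import numpy as np

rng = np.random.default_rng(0)

# True signal is linear; everything on top of it is noise to be ignored.
x_train = np.linspace(0, 1, 15)
y_train = 2 * x_train + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(0, 1, 50)
y_test = 2 * x_test + rng.normal(scale=0.3, size=x_test.size)

def r_squared(y, y_hat):
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# A model more complex than its own mother: degree 12 for 15 points.
coeffs = np.polyfit(x_train, y_train, deg=12)
r2_train = r_squared(y_train, np.polyval(coeffs, x_train))
r2_test = r_squared(y_test, np.polyval(coeffs, x_test))

print(f"train R^2: {r2_train:.3f}")  # suspiciously close to 1
print(f"test  R^2: {r2_test:.3f}")   # noticeably worse
```

Swap `deg=12` for `deg=1` and the two scores converge, which is what a model that remembers the signal looks like.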
Causes of Overfitting the Unknown Unknowns:
- Insufficient data (or a data scientist's worst nightmare)
- Overly optimistic model assumptions
- Too much caffeine