How Much Data Does a Classifier Need?
The research question
Does a model keep getting better as you add more training data, or does it stop improving?
Abstract
I trained the same classifier on growing amounts of data and tracked its test accuracy. Accuracy rose quickly at first, then levelled off: past that point, adding more data helped very little.
Background
People say AI needs lots of data, but I wanted to measure exactly how accuracy changes as the dataset grows.
What I did
I trained an image classifier with 5, 10, 20, 40, and 80 examples per class, testing every model on the same held-out set.
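The procedure above can be sketched in code. This is only a minimal illustration, not the actual experiment: it assumes a synthetic dataset of noisy points in place of real images and a simple nearest-centroid classifier in place of the image classifier. All names (make_examples, fit_centroids, learning_curve) and the settings (3 classes, 8 features, noise level, 200 test examples per class) are made up for the sketch. What it keeps from the experiment is the design: the same training sizes, and one fixed held-out set used to score every model.

```python
import random

SIZES = [5, 10, 20, 40, 80]  # examples per class, as in the experiment


def make_examples(n_per_class, rng, n_classes=3, dim=8, noise=2.0):
    """Synthetic stand-in for images: noisy points around one centre per class."""
    data = []
    for label in range(n_classes):
        for _ in range(n_per_class):
            # Class `label` is centred at 1.0 on coordinate `label`, 0.0 elsewhere.
            point = [(1.0 if i == label else 0.0) + rng.gauss(0, noise)
                     for i in range(dim)]
            data.append((point, label))
    return data


def fit_centroids(train):
    """'Train' a nearest-centroid classifier: average the points of each class."""
    sums, counts = {}, {}
    for point, label in train:
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to `point`."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(point, centroids[label]))
    return min(centroids, key=sq_dist)


def accuracy(centroids, test):
    """Fraction of test examples classified correctly."""
    return sum(predict(centroids, p) == y for p, y in test) / len(test)


def learning_curve(sizes=SIZES, seed=0):
    """Map each training size (per class) to accuracy on one shared test set."""
    rng = random.Random(seed)
    test = make_examples(200, rng)         # fixed held-out set, reused for every model
    pool = make_examples(max(sizes), rng)  # training pool, drawn separately from test
    by_class = {}
    for point, label in pool:
        by_class.setdefault(label, []).append((point, label))
    curve = {}
    for n in sizes:
        # Take the first n examples of every class, so smaller sets nest in larger ones.
        train = [ex for examples in by_class.values() for ex in examples[:n]]
        curve[n] = accuracy(fit_centroids(train), test)
    return curve


if __name__ == "__main__":
    for n, acc in learning_curve().items():
        print(f"{n:>3} examples per class: accuracy {acc:.3f}")
```

Nesting the training sets (the 10-example set contains the 5-example set, and so on) is a design choice in this sketch: it means any change in accuracy comes from the added examples rather than from swapping in a different sample.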
What I found
Accuracy rose sharply between 5 and 20 examples per class, then improved only slightly from 20 to 80: a clear diminishing-returns curve.
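One way to make "levelling off" concrete is to look at how much accuracy is gained at each step up in training size. The sketch below is a hypothetical helper, not part of the original analysis: it assumes the results are stored as a dictionary from examples per class to test accuracy, and the 0.01 (one percentage point) threshold for "still improving" is an arbitrary choice made for illustration.

```python
def gains_between_sizes(curve):
    """Return (smaller_size, larger_size, accuracy_gain) for each consecutive pair."""
    sizes = sorted(curve)
    return [(a, b, curve[b] - curve[a]) for a, b in zip(sizes, sizes[1:])]


def plateau_size(curve, min_gain=0.01):
    """Smallest size after which no later step gains at least `min_gain`.

    Returns the largest size tested if even the final step still gains that
    much, meaning no plateau was reached. `min_gain` is an assumed threshold.
    """
    result = max(curve)
    # Walk backwards from the largest sizes while the gains stay small.
    for smaller, _larger, gain in reversed(gains_between_sizes(curve)):
        if gain >= min_gain:
            break
        result = smaller
    return result
```

Passing the measured accuracies for 5, 10, 20, 40, and 80 examples per class to plateau_size would report where the curve flattens under the chosen threshold. Because each step doubles the training size, equal-looking gains further along the curve cost twice as many examples as the step before.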
What's next
Next, I would test whether harder tasks need more data before their accuracy levels off.
Takeaway
More data helps — but mostly early on. Knowing when to stop saves real effort.