Bad Labels vs. Less Data: The ML Training Quiz
Quiz: Which one is better for training an ML model?
- Dataset with 100 records with 0 bad labels
- Dataset with 300 records with 100 bad labels
The answer is the first one in most cases :). The extra good labels usually don't make up for the damage done by the bad ones. The second dataset has 200 good labels alongside its 100 bad ones, and that 2:1 ratio is often not enough. Many times even two or three good labels per bad label yield no advantage. More here: https://www.sciencedirect.com/science/article/pii/S1077314217300814
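One way to try the quiz yourself is a small simulation. The sketch below is only an illustration under assumptions that are not from the linked paper: synthetic 2-D Gaussian data, bad labels modeled as random flips, and a 1-nearest-neighbor classifier written with the standard library. The helper names (`make_data`, `flip_labels`, `run_quiz`) are made up for this example. The numbers it prints depend on the model, the kind of label noise, and the seed, so treat it as a starting point for your own experiments, not as proof of the claim.

```python
import random

def make_data(n, rng):
    """Two overlapping 2-D Gaussian blobs with labels 0 and 1 (synthetic, assumed setup)."""
    points, labels = [], []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        points.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        labels.append(label)
    return points, labels

def flip_labels(labels, n_bad, rng):
    """Return a copy of labels with n_bad randomly chosen entries flipped."""
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), n_bad):
        noisy[i] = 1 - noisy[i]
    return noisy

def predict_1nn(train_x, train_y, point):
    """Label of the closest training point (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: (train_x[i][0] - point[0]) ** 2 + (train_x[i][1] - point[1]) ** 2,
    )
    return train_y[best]

def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_x, train_y, p) == y for p, y in zip(test_x, test_y))
    return hits / len(test_x)

def run_quiz(seed=0):
    """Compare 100 clean records against 300 records with 100 bad labels."""
    rng = random.Random(seed)
    test_x, test_y = make_data(500, rng)      # clean held-out test set
    clean_x, clean_y = make_data(100, rng)    # option 1: 100 records, 0 bad labels
    big_x, big_y = make_data(300, rng)        # option 2: 300 records ...
    noisy_y = flip_labels(big_y, 100, rng)    # ... 100 of them with bad labels
    return {
        "clean_100": accuracy(clean_x, clean_y, test_x, test_y),
        "noisy_300": accuracy(big_x, noisy_y, test_x, test_y),
    }

if __name__ == "__main__":
    print(run_quiz())
```

Calling `run_quiz` with several different seeds, or swapping in your own model and a more realistic noise pattern, gives a better picture than any single run.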