This article needs additional citations for verification. (May 2017) (Learn how and when to remove this template message)
Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of that unlabeled data with meaningful tags that are informative. For example, labels might be indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, whether the dot in an x-ray is a tumor, etc.
Labels can be obtained by asking humans to make judgments about a given piece of unlabeled data (e.g., "Does this photo contain a horse or a cow?"), and are significantly more expensive to obtain than the raw unlabeled data.
After obtaining a labeled dataset, machine learning models can be applied to the data so that new unlabeled data can be presented to the model and a likely label can be guessed or predicted for that piece of unlabeled data.
Dataset could be of different types depending on use case, look at some open human labeled datasets :
- Johnson, Leif. "What is the difference between labeled and unlabeled data?", Stack Overflow, 4 October 2013. Retrieved on 13 May 2017. This article incorporates text by lmjohns3 available under the CC BY-SA 3.0 license.
- "https://dataturks.com/projects/trending". dataturks.com. Retrieved 2018-05-22. External link in