
Active Learning

Active learning is a machine learning approach where the algorithm selectively queries a human expert to label new data points with the goal of improving the model's performance with minimal labeled data.

In-depth explanation

Active learning is a specialized form of machine learning that addresses the challenge of obtaining labeled data, which is often expensive and time-consuming. The core idea is to let the learning algorithm choose the data it learns from. By strategically selecting the most informative data points, active learning can reach a given level of model performance with fewer labels than traditional passive learning, where the examples to label are chosen at random. Historically, active learning grew out of statistics, where the closely related problem is known as optimal experimental design, and it has been developed extensively over the past few decades to address practical challenges in machine learning.

There are three primary scenarios for active learning: pool-based sampling, stream-based selective sampling, and membership query synthesis. In pool-based sampling, the model selects the most informative examples from a large pool of unlabeled data. In stream-based selective sampling, unlabeled instances arrive one at a time, and the model decides for each one whether to request its label or discard it. In membership query synthesis, the model generates new instances itself and asks for them to be labeled.

Within these scenarios, a query strategy scores which data points are expected to improve the model the most. Three common strategies are uncertainty sampling, query-by-committee, and expected model change. Uncertainty sampling selects the data points for which the model is least certain of its predictions. Query-by-committee maintains a set of models (the committee) and selects the data points on which the models disagree the most. Expected model change estimates how much labeling a data point would change the model's parameters and selects the points with the largest estimated change.

Active learning matters most in domains where labeled data is scarce but unlabeled data is abundant. In fields like medical diagnosis, where obtaining labels requires expertise and is costly, active learning enables more efficient use of limited resources. It is also valuable in natural language processing and computer vision, for tasks such as sentiment analysis and object recognition, where the datasets involved are large and annotating them in full is impractical.

A common misconception is that active learning always yields better models, regardless of the data. In practice, its effectiveness depends heavily on the domain and the data distribution. In some cases, the savings from selective sampling are too small to justify its added cost over simply labeling a random sample.
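The query strategies described above can be made concrete with a few scoring functions. The following is a minimal sketch, not a reference implementation: it assumes the model exposes a list of class probabilities for each unlabeled point, and that each committee member casts a single class vote. All function names are illustrative.

```python
import math

def least_confidence(probs):
    """Uncertainty as 1 minus the probability of the top class."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely classes (smaller = more uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def entropy(probs):
    """Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def vote_entropy(votes, n_classes):
    """Query-by-committee disagreement: entropy of the committee's votes."""
    n = len(votes)
    counts = [votes.count(c) for c in range(n_classes)]
    return -sum((k / n) * math.log(k / n) for k in counts if k > 0)

def select_most_uncertain(pool_probs, k, score=entropy):
    """Return indices of the k pool items with the highest score."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical predictions for three unlabeled points (two classes).
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
chosen = select_most_uncertain(pool_probs, k=1)  # -> [1]
```

Because a small margin means high uncertainty, margin-based selection would pass a negated score, for example `score=lambda p: -margin(p)`. Expected model change is omitted here because it depends on the specific model's gradients.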

Examples

In medical imaging, active learning is used to identify the most ambiguous images of tumors that radiologists need to label, minimizing the number of images required for model training.
A text classification task in NLP uses active learning to select sentences with uncertain sentiment predictions for manual labeling, improving the sentiment analysis model with fewer labeled examples.
In autonomous driving, active learning helps select the most challenging driving scenarios from video data for annotation, enhancing the performance of object detection algorithms.
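All three examples follow the same pool-based loop: train on the labels available, pick the most ambiguous unlabeled item, ask an expert to label it, and repeat. The sketch below illustrates that loop on a deliberately simple toy problem, a one-dimensional threshold classifier. The `oracle` function stands in for the human annotator, and the pool, seed points, and true boundary of 0.6 are made-up values for illustration. The seed is assumed to contain at least one example of each class.

```python
def fit_threshold(labeled):
    """Toy model: midpoint between the largest x labeled 0
    and the smallest x labeled 1."""
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    return (lo + hi) / 2.0

def active_learning_loop(pool, oracle, seed, budget):
    """Pool-based uncertainty sampling with a fixed labeling budget."""
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # The point closest to the decision boundary is the one
        # the threshold model is least certain about.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # ask the "expert"
    return fit_threshold(labeled), len(labeled)

# Hypothetical setup: 101 evenly spaced points, true boundary at 0.6.
pool = [i / 100 for i in range(101)]
oracle = lambda x: int(x >= 0.6)
threshold, n_labeled = active_learning_loop(
    pool, oracle, seed=[0.0, 1.0], budget=10)
```

In this toy setting the loop behaves like a binary search: with 12 labels in total (2 seed points plus 10 queries) the learned threshold falls between 0.59 and 0.60, the two pool points that bracket the true boundary. Real tasks are rarely this clean, so the label savings seen here should not be expected in general.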
