
Automation: The antidote to overcoming data labeling inefficiencies

Posted March 4, 2022
[Image: hundreds of brightly colored lines, tangled together on the left, gradually sorting themselves into groups by color on the right.]

High-quality labeled data is the centerpiece of functional artificial intelligence (AI). As more and more technologies turn towards machine learning (ML) and deep learning to improve AI systems, the global datasphere continues to grow exponentially. In fact, Statista estimates that between 2018 and 2025 the size of real-time data will grow from five zettabytes to 51 zettabytes.

Although this increased data accessibility is an advantage for larger companies developing cutting-edge AI systems, they may still encounter problems like data imbalance or a lack of diversity. At the other end of the spectrum, many small-scale players and innovators face a scarcity of accurate, labeled data. Additionally, research indicates that models perform better when they are continuously fed new examples across multiple iterations per day. It is therefore fair to contend that, along with the volume, quality and diversity of data, the continuity of the data supply chain also shapes the growth of AI.

Faster labeling and iteration on training datasets are critical for developing highly performant models. However, labeling large volumes of data quickly, efficiently and accurately is a massive challenge for ML teams. Data preparation activities such as data identification and collection, cleaning, aggregation, augmentation and, especially, labeling account for almost 80% of the time spent on ML initiatives. Along with annotation infrastructure, the labeling process often requires enormous human effort and expertise, making it time-consuming and expensive. Heavy reliance on human intelligence for tedious, complex annotation tasks often disrupts the data supply chain and model iteration processes, which can slow AI advancement.

The critical role of automation

Automation is the answer to bypassing the inefficiencies of fully manual operations. It offers a number of advantages over manual labeling techniques, including:

  1. Speed: ML-assisted annotation tools expedite data labeling by 40-60%, vastly reducing project timelines for companies building complex ML models. With each iteration, the human work required for labeling decreases even as training data requirements increase.
  2. Cost-effectiveness: With pre-trained models, annotators spend far less time executing annotations. The tool eliminates unnecessary labeling tasks, shifting focus to low-confidence labels and saving the time and annotator costs involved in data labeling. In use cases that require specialized experts to label data, teams can cut costs by 50-70%.
  3. Accuracy: The time saved by automation lets annotators fix labeling errors and improve label accuracy. Human annotators further enhance models by correcting annotations and retraining models for better performance.
  4. AI performance: ML-assisted tools speed up labeling over time and improve accuracy, enabling faster iterations and leading to more accurate and performant AI models.

With improvements in the time, cost, quality and frequency of the data supply chain, businesses can scale their AI projects from tens to thousands to millions of data points without any added hassle.

The essential guide to AI training data

Discover best practices for the sourcing, labeling and analyzing of training data from TELUS International, a leading provider of AI data solutions.

Download the guide

Manual annotation vs. automated annotation

TELUS International offers a variety of labeling automation features via its proprietary AI training data platform. Let’s compare and contrast the effectiveness of a few automation features.

Manual video annotation vs. interpolation for video object detection and tracking

The automated interpolation feature can reduce repetitive labeling when objects of interest span a sequence of frames in a video. The annotator creates annotations in one frame, and the tool automatically duplicates them across the corresponding frames. The annotator then only adjusts the position in every fifth frame of the sequence. This feature drastically speeds up labeling for large video/sensor fusion sequences.
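The article does not describe the platform's internals, but the core idea behind interpolation can be sketched as linear blending of a box's geometry between two annotated keyframes. The `Box` class and five-frame keyframe spacing below are illustrative assumptions, not the platform's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """A 2D bounding box: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float

def interpolate(key_a: Box, key_b: Box, frame_a: int, frame_b: int, frame: int) -> Box:
    """Linearly interpolate a box for an in-between frame from two keyframes."""
    t = (frame - frame_a) / (frame_b - frame_a)
    return Box(
        key_a.x + t * (key_b.x - key_a.x),
        key_a.y + t * (key_b.y - key_a.y),
        key_a.w + t * (key_b.w - key_a.w),
        key_a.h + t * (key_b.h - key_a.h),
    )

# The annotator labels frames 0 and 5; frames 1-4 are filled in automatically.
start, end = Box(10, 20, 50, 30), Box(60, 20, 50, 30)
filled = [interpolate(start, end, 0, 5, f) for f in range(1, 5)]
```

Production tools typically combine interpolation like this with object tracking, so the annotator corrects drift at keyframes instead of drawing every box from scratch.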

In the following video, the annotator manually draws cuboids in each frame, annotating a total of three cars.

However, using interpolation, the tool automatically detects objects in multiple frames, increasing the number of annotated cars from three to six.

Manual segmentation vs. interactive instance segmentation

With automation, annotators can execute semantic segmentation in just a few clicks. By marking the extreme points of an object, the tool automatically generates a semantic mask, drastically speeding up the segmentation of objects.

In manual segmentation, annotators plot numerous points to create a semantic mask.

In contrast, the automation feature requires the annotator to plot only the four extreme points to complete the semantic mask.
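As a rough sketch of how four extreme points seed an automatic mask: the clicks define a tight bounding box, inside which a segmentation model predicts the object mask. The function name and point format below are hypothetical, and the model call itself (in published work, a DEXTR-style network) is omitted:

```python
def box_from_extreme_points(points):
    """Derive the tight bounding box from four user-clicked extreme points.

    points: [(x, y), ...] for the left-, right-, top- and bottom-most pixels
    of the object. A segmentation model would then predict the semantic mask
    within this box; that step is not shown here.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

# Four clicks on a car's extremities yield its bounding region in one step.
region = box_from_extreme_points([(5, 50), (100, 60), (40, 10), (55, 90)])
```

The four clicks carry more information than a plain box: they also pin down points guaranteed to lie on the object boundary, which is what lets the model produce a full mask rather than just a rectangle.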

Manual cuboid annotation vs. one-click cuboids for object detection

As the name suggests, one-click cuboids help annotators execute cuboid annotations in 3D point clouds with just a single click. The pre-trained model automatically creates an accurate cuboid amid a cluster of points forming an object, reducing the execution time by 25%.

When executed manually, the annotators draw and drag boxes to create cuboids, annotating three cars in the given frame.

Using automation, the annotators complete six annotations in the same time frame, doubling the number of annotated cars in the given scenario.
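The platform uses a pre-trained model for this; purely for illustration, a naive geometric version of "one click to cuboid" can be sketched as gathering the point-cloud cluster around the click and fitting an axis-aligned box to it. The radius parameter and tuple-based point format are assumptions:

```python
import math

def one_click_cuboid(cloud, seed, radius=2.0):
    """Fit an axis-aligned cuboid to the points near a single click.

    cloud: list of (x, y, z) points; seed: the clicked (x, y, z) position.
    A production tool would use a pre-trained detector to orient and size
    the cuboid; this naive version just bounds the nearby cluster.
    """
    cluster = [p for p in cloud if math.dist(p, seed) <= radius]
    mins = tuple(min(p[i] for p in cluster) for i in range(3))
    maxs = tuple(max(p[i] for p in cluster) for i in range(3))
    center = tuple((mins[i] + maxs[i]) / 2 for i in range(3))
    size = tuple(maxs[i] - mins[i] for i in range(3))
    return center, size

# One click near a small cluster yields its bounding cuboid; the distant
# point (10, 10, 10) is ignored.
center, size = one_click_cuboid([(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10)], (0, 0, 0))
```

A learned model improves on this by estimating the object's full extent and heading even when the point cloud is sparse or partially occluded.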

Pre-labeling via ML proposals

Pre-labeling algorithms enable faster and more accurate labeling with fewer clicks. The algorithms, trained on a wide range of datasets, can successfully identify up to 80 general classes for 2D object detection and tracking. Confidence scores allow annotators to review annotations at different thresholds to select the accurate proposals and reject or edit the remaining annotations. This automation feature has improved labeling speeds by 5X for many in-house projects.
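The review workflow described above amounts to triaging proposals by confidence score. A minimal sketch, with illustrative thresholds that are not the platform's actual defaults:

```python
def triage_proposals(proposals, accept_at=0.9, review_at=0.5):
    """Split ML label proposals into three buckets by confidence score.

    proposals: list of dicts with a "score" key in [0, 1].
    Returns (auto-accepted, needs human review, rejected).
    Thresholds are hypothetical defaults for illustration only.
    """
    accepted = [p for p in proposals if p["score"] >= accept_at]
    review = [p for p in proposals if review_at <= p["score"] < accept_at]
    rejected = [p for p in proposals if p["score"] < review_at]
    return accepted, review, rejected

# High-confidence proposals skip human review entirely; annotators spend
# their time only on the uncertain middle band.
batch = [{"label": "car", "score": 0.95},
         {"label": "car", "score": 0.70},
         {"label": "car", "score": 0.20}]
accepted, review, rejected = triage_proposals(batch)
```

Tuning the two thresholds trades annotator time against label quality: lowering `accept_at` speeds up labeling but lets more model errors through.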

Reach out to our experts to learn how we can help you create high-quality labeled data at speed and scale.

Check out our solutions

Test and improve your machine learning models via our global AI Community of 1 million+ annotators and linguists.

Learn more