Computer Vision Annotation: Tools, Types, and Resources
Behind the AI systems that give sight to machines, you’ll find a computer vision annotation tool. These tools are the key to turning raw image data into training data for machine learning models. Annotation tools help autonomous vehicles recognize traffic conditions, warehouse robots differentiate stock, and delivery drones navigate to addresses.
Within computer vision, annotation tools are used for a variety of different applications. Although facial recognition, object detection and medical imaging all fit under the umbrella of computer vision, each requires a different kind of annotation to achieve its goals. Knowing the type of annotation for the job is key to picking the right tool.
In this article, we’ll look at the common types of image annotation for computer vision AI, along with tools and resources for starting your own projects.
2D Bounding Boxes
Bounding boxes are rectangles drawn around an object, shape or text in an image, defining its location by the X and Y coordinates of its corners. This is the start of training a machine to recognize distinct types of objects. For example, bounding boxes can help autonomous vehicles differentiate pedestrians from vehicles. They are also essential for tasks like object identification and collision detection.
When training for computer vision, annotation tools allow human annotators to move, transform, rotate and scale bounding boxes. They also allow for category classification. High-quality annotation tools should be simple to use with a high degree of flexibility. A good annotation tool will include functions like zooming into images and crosshairs for defining box position. These quality-of-life details allow annotators to work more quickly without sacrificing accuracy.
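The operations above can be sketched as a minimal data structure. This is an illustrative schema, not the format of any particular annotation tool: a labeled box stored by its corner coordinates, with the move and scale operations annotators apply.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """An axis-aligned 2D box with a category label (illustrative schema)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def translate(self, dx: float, dy: float) -> "BoundingBox":
        # Move the box without changing its size.
        return BoundingBox(self.label,
                           self.x_min + dx, self.y_min + dy,
                           self.x_max + dx, self.y_max + dy)

    def scale(self, factor: float) -> "BoundingBox":
        # Grow or shrink the box around its centre point.
        cx = (self.x_min + self.x_max) / 2
        cy = (self.y_min + self.y_max) / 2
        half_w = (self.x_max - self.x_min) * factor / 2
        half_h = (self.y_max - self.y_min) * factor / 2
        return BoundingBox(self.label,
                           cx - half_w, cy - half_h,
                           cx + half_w, cy + half_h)

box = BoundingBox("pedestrian", 10, 20, 50, 100)
moved = box.translate(5, 0)    # nudge 5 pixels right
bigger = box.scale(2.0)        # double width and height
```

Storing the category label alongside the coordinates is what turns a drawn rectangle into usable training data.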
As mentioned above, bounding boxes are common for autonomous vehicles. They also help drones locate landmarks and help industrial warehouse robotics recognize a variety of different objects.
3D Bounding Boxes/Cuboids
3D bounding boxes, also known as cuboids, add the extra dimension of depth to traditional bounding boxes. Creating a 3D representation of an object for computer vision means giving machines the ability to distinguish an object’s position in 3D space, as well as its volume.
Bounding boxes usually start with anchor points, placed at the edges of an object. By filling the space between these anchor points with lines, you create a 3D box, or cuboid, around an object. The resulting 3D representation then shows depth along with location.
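The step from anchor points to a full cuboid can be sketched as follows. This is a minimal example, assuming a box described by its centre, dimensions and a rotation around the vertical axis; the parameter names are illustrative, not taken from any specific tool or dataset format.

```python
import math

def cuboid_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corner points of a 3D box centred at (cx, cy, cz),
    rotated by `yaw` radians in the ground plane (illustrative sketch)."""
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                x, y, z = sx * length, sy * width, sz * height
                # Rotate in the ground plane, then translate to the centre.
                xr = x * math.cos(yaw) - y * math.sin(yaw)
                yr = x * math.sin(yaw) + y * math.cos(yaw)
                corners.append((cx + xr, cy + yr, cz + z))
    return corners

# A car-sized cuboid: 4 m long, 2 m wide, 1.5 m tall, facing forward.
corners = cuboid_corners(0, 0, 1, 4.0, 2.0, 1.5, 0.0)
```

Connecting these eight corners with lines yields the cuboid drawn around the object, capturing its volume and position in 3D space rather than just its 2D outline.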
3D bounding boxes are common in mobile robotics and autonomous vehicles, where it is not enough to simply know that an object exists. When a machine needs to understand the location and size of an object, 3D bounding boxes offer higher levels of accuracy than traditional 2D bounding boxes.
Landmark Annotation
Landmark annotation works by placing points across an image to label objects within that image. This kind of labeling ranges from a single point annotating a small object to multiple points outlining particular details. Images for landmark annotation can include maps, faces, bodies, and objects.
In computer vision projects, landmark annotation is most common for accurate facial recognition. By allowing multiple points to capture the shape and details of unique faces, machines can learn to more accurately differentiate one face from another. Applications include unlocking cellphones, identifying faces in social media apps, and more.
Outside of facial recognition, landmark annotation can also help with video analysis. For example, TELUS International worked with a client on tracking the movements of certain body parts across multiple frames of video. In this project, it was important that the tool allow for multi-tier classification, such as “elbow - left” and “ankle - right.” This flexibility allowed for higher-quality analysis.
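A multi-tier label like “elbow - left” can be sketched as structured data. This is an illustrative schema, assuming a simple two-tier label plus a frame index; it is not the format of the TELUS International tool or any other product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Keypoint:
    """One landmark in one video frame, with a two-tier label
    such as part='elbow', side='left' (illustrative fields)."""
    part: str
    side: str
    frame: int
    x: float
    y: float

annotations = [
    Keypoint("elbow", "left", 0, 120.0, 88.0),
    Keypoint("elbow", "left", 1, 124.5, 90.2),
    Keypoint("ankle", "right", 0, 60.0, 300.0),
]

def track(points, part, side):
    # Return the (frame, x, y) trajectory of one labelled landmark,
    # ordered by frame, so its movement can be analysed across video.
    return sorted((p.frame, p.x, p.y) for p in points
                  if p.part == part and p.side == side)

trajectory = track(annotations, "elbow", "left")
```

Because each point carries both tiers of its label, the same landmark can be grouped across frames, which is what makes cross-frame movement analysis possible.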
Polygon Annotation
Though bounding boxes are fine for many computer vision AI tasks, they sometimes lack the accuracy necessary for objects with irregular shapes. Think street signs or building shapes, for example. In these cases, polygon annotation is a more accurate solution. Unlike bounding boxes, which have a set rectangular shape, polygon annotation allows for multiple angles and lines. This means that instead of drawing a box over a building, annotators can click at certain points and change direction to best adhere to the shape of the object.
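The accuracy gain is easy to quantify: a polygon’s area can be computed from its clicked vertices with the shoelace formula and compared against the rectangle that would otherwise enclose the object. The L-shaped footprint below is a made-up example.

```python
def polygon_area(points):
    """Area of a simple polygon given as (x, y) vertices in order,
    via the shoelace formula."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2

# A hypothetical L-shaped building footprint.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

poly_area = polygon_area(l_shape)            # 12 units
xs = [p[0] for p in l_shape]
ys = [p[1] for p in l_shape]
box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))  # 16 units
```

Here the bounding box includes 4 extra units of background that are not part of the building, pixels a polygon annotation correctly excludes from the training signal.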
Polygon annotation is helpful for aerial imaging, where it is often important for drones or satellites to locate particular objects from up high. In terms of autonomous vehicles, polygon annotation helps when you require high levels of detail. An example of this is differentiating a variety of objects among heavy traffic.
When working on polygon annotation for computer vision, a good annotation tool will offer ways to make work easier. Look for features such as zooming and panning controls to support annotator accuracy, or multi-pass options for inter-annotator agreement to ensure quality. If you need to record text within an annotation, such as street signs or advertising signboards, look for the ability to set optional or mandatory comments per annotation.
For machine learning models, the accuracy of your computer vision comes down to the accuracy of your dataset. That means making sure you have a trustworthy annotation tool, and a reliable group of annotators to work with it.
The TELUS International annotation platform is designed to address this challenge. It’s a flexible toolkit that offers users the freedom and control to ensure their project needs are met. We’ve designed the platform for ease of use. Projects are simple to create and customize, and can be iterated as they evolve. It’s also carefully set up so project managers can oversee a team of annotators across a range of project types on a single platform.
We offer full project management support and quality assurance for each of your individual projects. We can also recommend annotators for specific jobs if you require extra help. If you’re looking for image annotation solutions for machine learning or just want to get a better understanding of the field, get in touch.