What is computer vision?
Humans and animals use their eyes to see the world around them; computer vision is the science that aims to give machines a similar skill. The goal of computer vision is to automate tasks that the human visual system can do, such as image acquisition and image analysis.
For example, in computer science, colors are represented by HEX numerical values. This is how machines are programmed to understand which pixel values correspond to which colors. Humans, on the other hand, have an intuitive, shared understanding of how to distinguish between different shades of color.
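As a minimal sketch of what this means in practice, the snippet below converts a HEX color string into the red, green and blue channel values a machine actually works with. The function name and the example colors are illustrative, not part of any particular library.

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Convert a HEX color string like '#FF8800' into an (R, G, B) tuple."""
    hex_color = hex_color.lstrip("#")
    # Each pair of HEX digits encodes one channel as a number from 0 to 255.
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#FF0000"))  # pure red -> (255, 0, 0)
print(hex_to_rgb("#00FF00"))  # pure green -> (0, 255, 0)
```

To the machine, "red" is nothing more than the triple (255, 0, 0); the meaning humans attach to it has to be learned from labeled data.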
The image data for computer vision can come in different forms, such as video sequences, views from multiple cameras at different angles or multidimensional data from a medical scanner.
AI systems that process visual information rely on computer vision. Let’s break down the complex process of how data scientists teach a computer to “see.”
In computer vision, the most common way to locate an object within an image is to use bounding boxes. These are imaginary rectangles drawn around an object, shape, or piece of text in an image, defined by their x and y coordinates. Human annotators label the contents of bounding boxes to help a model recognize them as distinct types of objects. The annotators can work with bounding boxes by moving, transforming, rotating and scaling them. This way, the annotators can make sure that each object has a precise bounding box around it.
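The idea above can be sketched as a small data structure. This is a hedged illustration, not a real annotation tool's API: the class name, fields and methods are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """A rectangle defined by its top-left corner, size, and a text label."""
    x: float
    y: float
    width: float
    height: float
    label: str = ""

    def move(self, dx: float, dy: float) -> "BoundingBox":
        # Shift the box without changing its size.
        return BoundingBox(self.x + dx, self.y + dy,
                           self.width, self.height, self.label)

    def scale(self, factor: float) -> "BoundingBox":
        # Grow or shrink the box around its top-left corner.
        return BoundingBox(self.x, self.y,
                           self.width * factor, self.height * factor, self.label)

box = BoundingBox(10, 20, 100, 50, label="pedestrian")
print(box.move(5, 5))   # nudge the box 5 pixels right and down
print(box.scale(2.0))   # double its width and height
```

Real annotation platforms store essentially this information, often with rotation and image identifiers added, for every labeled object in the training set.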
Neural networks, also called neural nets, try to simulate the way human brains reason: they are built from layers of simple computational units, where the output of each unit depends on the outputs of the units connected to it.
Convolutional neural networks (CNNs) are a type of neural network used for computer vision. Computers use CNNs to break images down into numbers and represent them mathematically. The network applies convolution, an operation that combines two functions to produce a third, to merge multiple sets of information about an image. The computer then pools that information together into a more compact representation of the image. After pooling, the computer describes the image in numerical terms so that the neural network can make a prediction about its content. This is how autonomous vehicles will be able to tell apart pedestrians, traffic lights and other cars on the road.
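The two steps named above, convolution and pooling, can be demonstrated in a few lines of NumPy. This is a deliberately simplified sketch on a toy 6x6 "image"; real CNN layers work the same way but learn their kernels from data rather than using a hand-written one.

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image, summing elementwise products (valid mode)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    """Downsample by keeping only the maximum value in each size x size block."""
    h, w = feature_map.shape
    h, w = h // size * size, w // size * size  # trim edges that don't fit a block
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 grayscale "image"
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])               # crude vertical-edge detector
features = convolve2d(image, edge_kernel)           # 5x5 feature map
pooled = max_pool(features)                         # 2x2 pooled summary
print(features.shape, pooled.shape)                 # (5, 5) (2, 2)
```

The pooled output is the "accurate but compact representation" the article describes: a small grid of numbers summarizing where the filter responded most strongly.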
Over time, the neural network is trained to improve the accuracy of its predictions. Computers don't start off knowing how to classify objects; they require a lot of training data before their predictions become accurate.
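To make the training idea concrete, here is a hedged sketch of the simplest possible learner: a one-layer logistic model separating two synthetic clusters of points. The data, learning rate and iteration count are all assumptions for the example; real computer vision models train the same way in principle, just with millions of parameters and labeled images instead of points.

```python
import numpy as np

# Synthetic training data: two clusters of 2-D points with labels 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w = np.zeros(2)  # the model starts knowing nothing
b = 0.0
lr = 0.1         # learning rate

def predict(points: np.ndarray) -> np.ndarray:
    """Sigmoid of a linear score: probability that each point has label 1."""
    return 1 / (1 + np.exp(-(points @ w + b)))

# Repeatedly nudge the parameters to reduce the prediction error.
for epoch in range(200):
    p = predict(X)
    grad_w = X.T @ (p - y) / len(y)  # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((predict(X) > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

The parameters start at zero, so early predictions are no better than chance; accuracy only climbs because each pass over the labeled data adjusts the model a little, which is exactly the role training data plays at scale.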
Once you’ve trained your model, it can apply its predictions to end use cases like unlocking your smartphone with face recognition or suggesting a friend to tag on Facebook.
Recent progress in computer vision allows the healthcare industry to make extensive use of medical imaging data to provide better diagnosis, treatment and prediction of diseases. For example, Medivis built the SurgicalAR platform, a visualization tool that guides surgical navigation, and can decrease complications and improve patient outcomes, while lowering surgical costs.
Computer vision is the technology behind imaging for autonomous vehicles. In fact, the automobile industry often refers to computer vision as “perception.” This is because cameras are the primary tools that the vehicles use to perceive their environment and surrounding objects.
Many smartphones have introduced a face recognition feature as an alternative to a password or using your thumbprint. The feature uses computer vision and machine learning to adapt to changes in your expression and appearance. It can still recognize you if you gain weight, get a haircut or put on fancy accessories. Even if you wear a scarf or grow a beard, your smartphone should still be able to recognize you.
In order to help your next computer vision project approach 20/20 vision, you’re going to need a team of annotators and a lot of training data. We can help. Reach out to our AI Data Solutions team to get started.