30 intermediate AI terms you should know

Posted January 1, 2021

Have you already mastered our list of 50 beginner AI terms you should know? If so, it’s time to progress through this glossary of 30 intermediate AI and machine learning terms.

Anomaly detection

Anomaly detection is the task of identifying rare or suspicious items in a dataset or data stream, based on how much they differ from the rest of the data on relevant criteria.
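
One of the simplest approaches is to flag values whose z-score (their distance from the mean, measured in standard deviations) exceeds a threshold. The sketch below is illustrative rather than a production method; the function name, the sample readings and the threshold of 2.0 are assumptions chosen for the example.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs at least two values
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one obvious outlier at index 7
readings = [10, 11, 10, 12, 11, 10, 11, 50]
print(zscore_anomalies(readings, threshold=2.0))  # [7]
```

A single large outlier also inflates the standard deviation, which is why a lower threshold is used on this small sample. More robust methods, such as ones based on the median, are less affected by this.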

Binary classification

Binary classification is the task of classifying elements into two groups, based on a classification rule (defined below).

Classification rule (Classifier)

Given a population where elements belong to different categories, a classification rule is a procedure to predict which elements belong to which categories.
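
As a minimal sketch of a classification rule applied to a binary classification problem, the code below uses a nearest-centroid rule: it computes the average point of each class from labeled examples, then assigns a new point to the class with the nearest average. This is just one simple rule among many, and the points and labels are made up for illustration.

```python
from math import dist

def fit_centroids(points, labels):
    """Compute the mean point (centroid) of each class from labeled training data."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        total = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            total[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in total) for label, total in sums.items()}

def classify(point, centroids):
    """Classification rule: predict the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))

# Two made-up groups of 2-D points: a binary classification problem
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["A", "A", "A", "B", "B", "B"]
centroids = fit_centroids(points, labels)
print(classify((2, 2), centroids))  # A
print(classify((7, 9), centroids))  # B
```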

Complex system

A complex system is a system made up of many components that interact with each other, so that the behavior of the whole is difficult to predict from the behavior of its individual parts.

Computational intelligence

Computational intelligence is the ability of a computer to learn a specific task from training data or experimental observation.

Computer vision

Computer vision enables machines to understand the content of images and videos. The goal of computer vision is to automate tasks that the human visual system can do.

Data cleansing

The process of improving data quality, which usually involves detecting and then removing or correcting incorrect, incomplete or duplicate data values. Data cleansing is an important step before beginning a machine learning project.
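
A small sketch of what data cleansing can look like on a list of records. The field names and the cleaning rules (title-casing names, treating ages outside 0 to 120 as invalid) are assumptions made for this example; real projects define their own rules.

```python
def clean_records(records):
    """Normalize names, drop invalid or incomplete rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        name = (record.get("name") or "").strip().title()  # fix stray spaces and casing
        age = record.get("age")
        # Drop rows with a missing name or a missing/implausible age (assumed valid range 0-120)
        if not name or not isinstance(age, int) or not 0 <= age <= 120:
            continue
        key = (name, age)
        if key in seen:  # duplicate once formatting differences are removed
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

raw = [
    {"name": " alice ", "age": 34},
    {"name": "Alice", "age": 34},   # duplicate of the first row
    {"name": "bob", "age": -5},     # impossible age
    {"name": None, "age": 40},      # missing name
    {"name": "carol", "age": 29},
]
print(clean_records(raw))  # [{'name': 'Alice', 'age': 34}, {'name': 'Carol', 'age': 29}]
```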

Edge AI computing

In edge AI computing, algorithms process data locally on the hardware device where it is collected, such as a phone, camera or sensor, so they do not require a network connection to a remote server or the cloud.

Game theory

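Game theory is the study of mathematical models of strategic interaction between rational decision makers. In simple terms, it is the study of how and why people make decisions when the outcome also depends on what others choose. Game theory is applied in economics, political science, biology and computer science, including AI systems in which multiple agents interact.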
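
A classic example is the prisoner's dilemma. The sketch below finds its pure-strategy Nash equilibria: outcomes where neither player can do better by changing only their own choice. It assumes a two-player game given as a complete table of payoffs; the function name is hypothetical.

```python
from itertools import product

def pure_nash_equilibria(payoffs):
    """Find pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_strategy, col_strategy) -> (row_payoff, col_payoff)
    and must contain every combination of strategies.
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        # Row player cannot gain by switching rows; column player cannot gain by switching columns
        row_ok = all(payoffs[(r, c)][0] >= payoffs[(other, c)][0] for other in rows)
        col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, other)][1] for other in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

# Prisoner's dilemma: payoffs are negative years in prison
dilemma = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}
print(pure_nash_equilibria(dilemma))  # [('defect', 'defect')]
```

The result shows why the dilemma is famous: two rational players both defect, even though both would be better off if they cooperated.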

Grid search

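An exhaustive approach to hyperparameter tuning: the model is trained and evaluated for every combination of values in a predefined grid of hyperparameters, and the best-performing combination (typically measured on validation data) is kept. This is significant, as a model's performance can depend heavily on its hyperparameter values.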
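
A minimal sketch of grid search. In a real project the scoring function would train a model and return a validation score; here `toy_validation_score` is a hypothetical stand-in that simply peaks at a known point, and the hyperparameter names are only examples.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination of values in param_grid and keep the best-scoring one.

    score_fn receives the hyperparameters as keyword arguments and returns a
    validation score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for "train the model and measure validation accuracy";
# it simply peaks at learning_rate=0.1, depth=4.
def toy_validation_score(learning_rate, depth):
    return -((learning_rate - 0.1) ** 2) - (depth - 4) ** 2

grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 6]}
best, score = grid_search(grid, toy_validation_score)
print(best)  # {'learning_rate': 0.1, 'depth': 4}
```

Because every combination is evaluated, the cost grows multiplicatively: this 3 x 3 grid needs 9 evaluations, and adding a third hyperparameter with 3 values would need 27.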

Ground truth

In machine learning, ground truth refers to the correct labels or values for a dataset, typically established by direct observation or by human annotators, that supervised learning models are trained on and evaluated against. The ground truth is used in statistical models to prove or disprove research hypotheses.

Heuristic search techniques

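Heuristic search techniques use a heuristic, a rule of thumb that estimates how close a given option is to the goal, to decide which options to explore first. This narrows down the search by setting aside options that look unpromising, which usually finds a good solution much faster than checking every option, though not always the optimal one.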
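
A* search is a well-known heuristic search technique. The sketch below finds the length of a shortest path through a small grid maze, using Manhattan distance as the heuristic. Because that heuristic never overestimates the remaining distance on a grid where you can only move up, down, left or right, A* still returns an optimal path here. The maze format ('#' for walls) and the function name are assumptions for the example.

```python
import heapq

def astar(grid, start, goal):
    """A* search on a 2-D grid of strings; '#' cells are walls.

    Returns the number of steps on a shortest path, or None if the goal is unreachable.
    """
    def h(cell):  # Manhattan distance heuristic: estimated steps left to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    frontier = [(h(start), 0, start)]  # entries are (g + h, g, cell); g = steps taken so far
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)  # expand the most promising cell first
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry: a cheaper path to this cell was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

maze = [
    "....",
    ".##.",
    "....",
]
print(astar(maze, (0, 0), (2, 3)))  # 5
```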

Logarithmic loss

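Logarithmic loss (log loss, also called cross-entropy loss) is a function that measures the performance of a classification model whose output is a probability value between zero and one. It heavily penalizes predictions that are confident but wrong. Lower values are better, and training a classifier typically means minimizing this value.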
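
For N examples with true labels y_i (0 or 1) and predicted probabilities p_i, binary log loss is -(1/N) * sum of [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]. A minimal sketch of that formula follows; clipping probabilities away from exactly 0 and 1 is a common safeguard against log(0), and the epsilon value used is an assumption.

```python
from math import log

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: average negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * log(p) + (1 - y) * log(1 - p)
    return -total / len(y_true)

print(round(log_loss([1, 0], [0.9, 0.1]), 4))  # 0.1054: confident and correct
print(round(log_loss([1, 0], [0.1, 0.9]), 4))  # 2.3026: confident and wrong
```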

Logic programming

Logic programming is a programming paradigm in which computation is carried out by reasoning over a knowledge base of facts and rules. Prolog is the best-known logic programming language; it and LISP (a functional rather than logic programming language) were both widely used in early AI research.
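
In Prolog, you would state facts such as parent(tom, bob) and rules such as ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z) directly, and ask queries against them. As a rough sketch of the idea in Python, the code below applies rules to a set of facts until nothing new can be derived. Note that this uses forward chaining, while Prolog itself answers queries by backward chaining; the function names are hypothetical.

```python
def forward_chain(facts, rules):
    """Repeatedly apply every rule until no new facts can be derived (a fixed point)."""
    known = set(facts)
    while True:
        new_facts = set()
        for rule in rules:
            new_facts |= rule(known) - known
        if not new_facts:
            return known
        known |= new_facts

# ancestor(X, Y) :- parent(X, Y).
def ancestor_base(facts):
    return {("ancestor", x, y) for (rel, x, y) in facts if rel == "parent"}

# ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
def ancestor_step(facts):
    return {
        ("ancestor", x, z)
        for (rel1, x, y) in facts if rel1 == "parent"
        for (rel2, y2, z) in facts if rel2 == "ancestor" and y2 == y
    }

facts = {("parent", "tom", "bob"), ("parent", "bob", "ann"), ("parent", "ann", "joe")}
derived = forward_chain(facts, [ancestor_base, ancestor_step])
print(("ancestor", "tom", "joe") in derived)  # True
```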

Long short-term memory

Long short-term memory (LSTM) is an artificial recurrent neural network architecture used in deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections, so it can process not only single data points but also entire sequences of data, such as speech or video. Its memory cell is controlled by input, output and forget gates, which let it retain information over long sequences.
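
To show how the gates work, here is a sketch of a single LSTM cell with one input and one unit of memory. Real LSTMs use vectors and weight matrices, and their weights are learned during training; the weights below are hypothetical values chosen only so the code runs.

```python
from math import exp, tanh

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def lstm_step(x, h_prev, c_prev, weights):
    """One time step of a single-unit LSTM cell with scalar input and state.

    weights maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, bias = weights[name]
        return activation(w_x * x + w_h * h_prev + bias)

    forget = gate("forget", sigmoid)     # 0..1: how much old memory to keep
    write = gate("input", sigmoid)       # 0..1: how much new candidate to store
    expose = gate("output", sigmoid)     # 0..1: how much memory to output
    candidate = gate("candidate", tanh)  # -1..1: proposed new memory content
    c = forget * c_prev + write * candidate  # updated cell state (the memory)
    h = expose * tanh(c)                     # updated hidden state (the output)
    return h, c

# Hypothetical weights; in practice they are learned by backpropagation through time
weights = {
    "forget": (0.5, 0.5, 1.0),
    "input": (1.0, 0.5, 0.0),
    "output": (1.0, 0.5, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [0.5, -1.0, 2.0]:  # feed a sequence one element at a time
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```

The feedback connection is visible in the loop: each step receives the previous hidden state and cell state, so the final output depends on the whole sequence.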

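Naive Bayes

Naive Bayes is a probabilistic machine learning classifier that makes classifications using the maximum a posteriori (MAP) decision rule in a Bayesian setting. It is called "naive" because it assumes that features (for example, the words in an email) are independent of each other given the class. Naive Bayes classifiers are commonly used for text classification and are a traditional solution for spam detection.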
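
A minimal sketch of a spam filter using multinomial naive Bayes, one common variant, with add-one (Laplace) smoothing. The tiny training set is made up, and the function names are hypothetical. Scores are added as logarithms to avoid multiplying many small probabilities.

```python
from collections import Counter
from math import log

def train_naive_bayes(docs, labels):
    """Count classes and per-class word frequencies from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab

def predict_naive_bayes(tokens, model):
    """Return the class with the highest posterior score (the MAP decision rule)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        total_words = sum(word_counts[label].values())
        # log P(class) + sum of log P(word | class); summing per word is the "naive"
        # independence assumption. Add-one (Laplace) smoothing avoids log(0).
        score = log(count / n_docs)
        for word in tokens:
            score += log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    ["win", "money", "now"],
    ["cheap", "money", "offer"],
    ["meeting", "at", "noon"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(["win", "cheap", "offer"], model))  # spam
print(predict_naive_bayes(["meeting", "notes"], model))       # ham
```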

Named entity recognition

Named entity recognition (NER) is the task of locating named entities in a body of text and classifying them into predefined categories such as person, organization or place.

Natural intelligence

Natural intelligence refers to how humans and animals think, as opposed to artificial intelligence.

Optical character recognition

Optical character recognition (OCR) technology enables computers to extract text data from images. Once a document (typed, handwritten or printed) undergoes OCR processing, the text data can easily be edited, searched, indexed and retrieved.

Optimization problem

In mathematics and computer science, an optimization problem is the task of finding the best solution among all feasible solutions, as measured by an objective function to be minimized or maximized, rather than just any solution that works.
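
For a continuous optimization problem, a common approach (and the way many machine learning models are trained) is gradient descent: repeatedly step in the direction that decreases the objective function. The sketch below minimizes a simple objective chosen for illustration, f(x) = (x - 3)^2 + 1, whose best solution is x = 3. The learning rate and number of steps are assumptions.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a differentiable function of one variable by stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Objective: f(x) = (x - 3)**2 + 1, minimized at x = 3; its derivative is 2 * (x - 3)
x_best = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_best, 4))  # 3.0
```

On objectives with several valleys, gradient descent can settle in a local minimum rather than the global best solution.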

Phrase chunking

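Phrase chunking is the process of grouping words into short phrases, such as noun phrases and verb phrases, based on their parts of speech. It builds on part-of-speech tagging, which labels each word with its grammatical role.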
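
Libraries such as NLTK provide rule-based chunkers (for example, `RegexpParser`). As a dependency-free sketch, the code below groups words that have already been tagged with Penn Treebank part-of-speech tags into noun phrases using one simple pattern: an optional determiner, any number of adjectives, then one or more nouns. The pattern and function name are assumptions for the example.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into noun phrases matching DT? JJ* NN+."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":  # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # nouns (NN, NNS, NNP, ...)
            k += 1
        if k > j:  # at least one noun: emit the chunk and continue after it
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

sentence = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
print(chunk_noun_phrases(sentence))  # ['The quick brown fox', 'the lazy dog']
```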

Search relevance

Search relevance measures how well a search engine’s results match what the user is looking for. The more relevant the fetched results, the more quickly and easily users can find the information they need.

Soft computing

Soft computing, sometimes referred to as computational intelligence, refers to the use of inexact but usable solutions to solve complex computational problems.

Stemming

Stemming is the process of reducing words to their root form. For example, the word robotics would be reduced to the stem robot. The stem does not have to be a valid word itself: the Porter stemmer, a widely used algorithm for removing common suffixes from English words, reduces the words universal, university and universe to the stem univers.
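
The sketch below shows the basic idea with a deliberately simplified suffix-stripping rule. It is not the Porter algorithm, which applies several ordered steps with conditions on the remaining stem; the suffix list and minimum stem length are assumptions chosen so that the examples above come out the same.

```python
# Simplified suffix list for illustration; NOT the Porter algorithm
SUFFIXES = ["ities", "ity", "ics", "ing", "al", "ed", "es", "s", "e"]

def simple_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["robotics", "universal", "university", "universe"]:
    print(w, "->", simple_stem(w))
```

A toy rule like this breaks quickly: it turns running into runn, which the Porter stemmer correctly reduces to run. For real work, a full implementation such as NLTK's `PorterStemmer` is a better choice.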

Support vector machines

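Support vector machines (SVMs) are supervised learning models used for classification and regression analysis. For classification, an SVM finds the boundary (hyperplane) that separates the classes with the widest possible margin; the training points closest to that boundary are called support vectors.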
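
A minimal sketch of a linear SVM classifier, assuming two classes labeled +1 and -1. It minimizes the standard soft-margin objective (hinge loss plus a penalty on the weights) with plain subgradient descent. SVM libraries typically use specialized solvers and support kernels for non-linear boundaries; the data, function names and hyperparameter values here are illustrative.

```python
def train_linear_svm(X, y, lam=0.01, learning_rate=0.1, epochs=200):
    """Fit a linear soft-margin SVM by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    Labels in y must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def predict_svm(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

X = [(2, 3), (3, 3), (3, 4), (-2, -1), (-3, -2), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict_svm(x, w, b) for x in X])  # [1, 1, 1, -1, -1, -1]
```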

Swarm behavior

From the perspective of the mathematical modeler, swarm behavior is an emergent behavior arising from simple rules that are followed by individuals and does not involve any central coordination.

Systems engineering

Systems engineering is a sub-field of engineering that focuses on how to design and manage complex systems throughout their life cycles.

Term frequency

Term frequency, used in text mining, natural language processing and information retrieval, tells you how frequently a term (word or phrase) occurs in a document. Since documents differ in length, a term is likely to appear more times in longer documents than in shorter ones. Thus, as a way of normalization, the number of times the term appears is divided by the total number of terms in the document.

Term Frequency = [Number of times the term appears in the document] / [Total number of terms in the document].

Tf-idf

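Tf-idf (term frequency-inverse document frequency) is a numerical statistic that reflects how important a word is to a document in a corpus. The word's term frequency in the document is multiplied by its inverse document frequency, which is typically the logarithm of the total number of documents divided by the number of documents that contain the word. As a result, words that appear in nearly every document (such as "the") receive low weights, while words concentrated in a few documents receive high weights.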
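
A minimal sketch of term frequency and tf-idf on a tiny, already-tokenized corpus. Term frequency follows the formula given above. Several variants of inverse document frequency exist (for example, with smoothing added); this uses the plain log(N / document count) form, and returning 0 for a term that appears in no document is a choice made for the example.

```python
from collections import Counter
from math import log

def term_frequency(term, doc):
    """Times the term appears in the document / total terms in the document."""
    return Counter(doc)[term] / len(doc)

def inverse_document_frequency(term, corpus):
    """log(number of documents / number of documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    return log(len(corpus) / containing) if containing else 0.0  # unseen term: weight 0

def tf_idf(term, doc, corpus):
    return term_frequency(term, doc) * inverse_document_frequency(term, corpus)

# A tiny corpus of already-tokenized documents
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
print(tf_idf("the", corpus[0], corpus))            # 0.0: "the" is in every document
print(round(tf_idf("cat", corpus[0], corpus), 4))  # 0.1352
```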

Unstructured data

Unstructured data is data that is not organized according to a predefined data model or schema, so it does not have easily searchable patterns. Examples include audio, video and social media content.

Word vectors

This is the concept of transforming a word into a vector, giving it a position in multi-dimensional space. By representing words as vectors, you can use them in mathematical operations. Words with related meanings end up close together, so calculating the distance (or similarity) between two word vectors gives a mathematical measure of how related the words are.
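
Cosine similarity, which compares the directions of two vectors, is a common way to measure how related two word vectors are. The sketch below uses made-up 3-dimensional vectors; real word vectors are learned from large amounts of text and typically have hundreds of dimensions.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means same direction, 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors):
    """Return the other word whose vector points in the most similar direction."""
    target = vectors[word]
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(target, vectors[other]),
    )

# Hypothetical 3-dimensional vectors for illustration only
vectors = {
    "king": (0.8, 0.65, 0.1),
    "queen": (0.78, 0.7, 0.15),
    "banana": (0.05, 0.1, 0.9),
}
print(most_similar("king", vectors))  # queen
```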

Looking to apply what you’ve learned in an AI or machine learning project? Our AI Data Solutions team has the expertise and ability to apply these intermediate concepts and more. Reach out today.
