Natural language processing: The power behind today's large language models
From recognizing speech to generating text, images, computer code and more, today's large language models (LLMs) are nothing short of amazing.
In terms of building intelligent machines, LLMs represent a giant technological leap forward and serve as the power behind generative AI (GenAI) technology like ChatGPT, Google Bard and DALL-E. GenAI-enabled tools are becoming increasingly pervasive in everyday life. In fact, a recent TELUS International survey of 1,000 Americans familiar with GenAI found that the majority of respondents (86%) had started to use the technology to help with various aspects of their daily lives, including writing personal documents (50%), automating personal or work tasks (47%), generating social media content (43%) and recommending personal health initiatives (38.5%).
Natural language processing (NLP), which refers to a machine's ability to process natural language (any language used by humans) and respond in a way that supports the task the user is trying to accomplish, serves as the foundation for large language models.
Considering the surprising capabilities and enormous potential of large language models, it's interesting to trace how NLP evolved from the relatively simple (by today's standards) tasks carried out beginning in the 1950s to the tasks LLMs handle today.
Evolution of natural language processing
One of the first NLP research endeavors, the Georgetown-IBM experiment, conducted in 1954, used a computer to automatically translate more than 60 Russian sentences into English. While the task could be considered relatively simple by today's standards, this experiment, and others like it, showed the incredible potential of natural language processing as a field.
Through the years, three main approaches to NLP developed, culminating in the methods used to train today's large language models.
From the 1950s to the early 1990s, symbolic approaches were used. These provided computers with collections of handcrafted rules, enabling them to carry out NLP tasks by applying those rules to any data they came across.
Symbolic systems relied on complex, handcrafted sets of rules that encoded language concepts and the relationships between those concepts. The system applied these rules to interpret words using conditional logic of the form "If A, then B": when a linguistic "if" condition was met, a particular "then" output was produced.
The advantage of this approach is transparency: because developers write the rules that process language, it is clear how the system arrives at its outputs. Any issues that occur can also be traced to specific rules, which can then be revised to produce outputs that are more aligned with what is desired. A downside is that handcrafting, organizing and managing a vast and complex set of rules is difficult and time-consuming.
With the rapid growth of the internet in the 1990s, large amounts of text data became available for widely spoken languages, fueling the development of machine learning (ML) methods for NLP tasks.
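The "If A, then B" idea can be illustrated with a toy example. The sketch below is a hypothetical rule-based sentiment tagger in Python; the rules, labels and the apply_rules function are invented for illustration, and real symbolic NLP systems relied on far larger, linguistically informed rule sets.

```python
import re

# Hypothetical handcrafted rules: each pairs an "if" condition (a regex
# pattern) with a "then" output (a label). Order matters: the first
# matching rule wins, so the negation rule is checked before the plain one.
RULES = [
    (re.compile(r"\bnot\s+(good|great|happy)\b", re.IGNORECASE), "negative"),
    (re.compile(r"\b(good|great|happy)\b", re.IGNORECASE), "positive"),
    (re.compile(r"\b(bad|terrible|sad)\b", re.IGNORECASE), "negative"),
]


def apply_rules(sentence: str) -> str:
    """Return the output of the first rule whose condition matches."""
    for condition, output in RULES:
        if condition.search(sentence):  # "If A..."
            return output               # "...then B."
    return "unknown"  # no rule fired


print(apply_rules("The weather is great today"))  # positive
print(apply_rules("I am not happy with this"))     # negative
```

Every label can be traced back to the exact rule that produced it, and fixing a wrong label means editing or reordering entries in RULES.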
Machine learning approaches
The first ML approach, called statistical NLP, uses mathematical methods to model the probability distribution of word sequences in natural language. Trained on large datasets, these models learn the statistical patterns that are characteristic of a language, capturing the relationships between words and predicting the probability of particular word sequences. Ultimately, they estimate the probability of a word appearing next in a sentence based on the words that came before it.
The second group of ML approaches involves deep learning and is referred to as neural network NLP. This framework paved the way for significant improvements in natural language processing tasks and has become the predominant method used today.
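A bigram model, one of the simplest statistical language models, shows the idea in miniature: it estimates the probability of each word given only the single word before it, using counts from a training corpus. The sketch below is a minimal Python illustration; the three-sentence corpus and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Estimate P(next word | previous word) from raw bigram counts."""
    pair_counts = defaultdict(Counter)
    for sentence in corpus:
        # <s> and </s> mark the start and end of a sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[prev][nxt] += 1
    model = {}
    for prev, counts in pair_counts.items():
        total = sum(counts.values())
        # Normalize each row of counts into a probability distribution.
        model[prev] = {word: count / total for word, count in counts.items()}
    return model


def sentence_probability(model, sentence):
    """Chain rule under the bigram assumption: multiply P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)  # unseen pairs get probability 0
    return prob


corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
next_word = max(model["the"], key=model["the"].get)
print(next_word)  # cat
print(sentence_probability(model, "the cat sat on the mat"))
```

In this corpus "the" is followed by "cat" two times out of six, so "cat" is the most likely next word. Practical statistical models typically used longer contexts (n-grams) and smoothing so that unseen word sequences do not receive zero probability.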
Like the statistical approach, neural network NLP models the probability of each word in a sentence given the preceding words in the input. However, it uses word embeddings (representations of words, typically in the form of real-valued vectors) to capture the semantic properties of words. Words whose embeddings lie closer together in the vector space are expected to be similar in meaning.
One of the most commonly used neural network-based language models is the recurrent neural network (RNN). An RNN processes sequential data, such as the words in a sentence, one element at a time. It maintains an internal memory (a hidden state) that carries salient features of earlier inputs forward, and it uses that memory to predict what should come next.
A transformer is a newer type of neural network-based language model that can process input sequences in parallel. Familiar models that use this architecture include OpenAI's ChatGPT and Google's BERT. In addition to parallelism (the ability to process several inputs simultaneously), these models use a mechanism called "attention" that lets them learn which parts of the input are most relevant in a particular context.
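To make "closer in the vector space" concrete, the sketch below compares hypothetical, hand-picked three-dimensional embeddings using cosine similarity, one common measure of closeness. Real embeddings are learned from data and usually have hundreds of dimensions; the words and vectors here are invented for illustration only.

```python
import math

# Hypothetical 3-dimensional embeddings, chosen by hand for illustration.
EMBEDDINGS = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.78, 0.70, 0.12],
    "apple": [0.10, 0.05, 0.90],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


for other in ("queen", "apple"):
    score = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS[other])
    print(f"king vs {other}: {score:.2f}")
```

With these made-up vectors, "king" scores far closer to "queen" than to "apple", which is the property a trained embedding space is expected to show for related words.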
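The "internal memory" of an RNN is its hidden state, which is updated at every step from the current input and the previous hidden state. The sketch below implements the standard vanilla (Elman) RNN update, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), in plain Python. The weights are random and untrained, the one-hot "word" vectors are hypothetical, and a real language model would also learn these weights and add an output layer that turns the hidden state into next-word probabilities.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    hidden = [0.0] * len(b_h)  # h_0: the memory starts empty
    states = []
    for x_t in inputs:
        # Combine the current input with the memory of everything seen so far.
        pre = [a + b + c for a, b, c in zip(matvec(w_xh, x_t), matvec(w_hh, hidden), b_h)]
        hidden = [math.tanh(v) for v in pre]
        states.append(hidden)
    return states


random.seed(0)
input_size, hidden_size = 4, 3
# Random, untrained weights for illustration only.
w_xh = [[random.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
w_hh = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
b_h = [0.0] * hidden_size

# A toy sequence of three one-hot "word" vectors.
sequence = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
states = rnn_forward(sequence, w_xh, w_hh, b_h)
print(states[-1])  # final hidden state summarizing the whole sequence
```

Because each state feeds into the next, changing the first word of the sequence also changes the final hidden state, which is how earlier inputs can influence later predictions.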
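The attention mechanism used in transformers is usually scaled dot-product attention: each query is compared with every key, the scores are divided by the square root of the key dimension, a softmax turns them into weights, and the output is the weighted average of the values. The sketch below is a minimal plain-Python version; the token vectors are hypothetical, and real transformers derive queries, keys and values from learned projections and run several attention heads at once.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights); weights[i][j] is how much query i attends to position j."""
    d_k = len(keys[0])
    outputs, weights = [], []
    # Each query is handled independently of the others, which is what
    # allows transformers to compute all positions in parallel on hardware.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # Output = attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[d] for wj, v in zip(w, values)) for d in range(len(values[0]))])
        weights.append(w)
    return outputs, weights


# Self-attention over three hypothetical 2-d token vectors (Q = K = V = tokens).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
for row in weights:
    print([round(x, 2) for x in row])
```

Here the first token attends equally to itself and to the third token (both share its first component) and less to the second, illustrating how attention weights emphasize the inputs most relevant to each position.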
Machine learning approaches to natural language processing have revolutionized AI. These methods have enabled the creation of significantly larger models, paving the way for generative AI technology. Their internal memory and ability to analyze how words relate to one another grammatically make these models more robust and more accurate. They are also better able to deal with unfamiliar input, such as words or structures they encounter for the first time, as well as erroneous input, such as misspelled or omitted words.
There are, however, some disadvantages. One is model drift, which happens when the data a machine learning model was trained on becomes outdated or no longer represents current real-world conditions. Another is the black box problem: the intricate nature of the algorithms makes it difficult to understand how the model reaches its decisions. There is also the challenge of sourcing adequate resources. A typical large language model has at least one billion parameters and can demand hundreds (if not thousands) of gigabytes of graphics processing unit (GPU) memory to handle the massive datasets it learns from. Not surprisingly, these models are incredibly expensive to both train and run.
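The memory figures can be approximated with simple arithmetic: parameter count times bytes per parameter. The sketch below assumes 4 bytes per parameter for 32-bit floats, 2 for 16-bit formats and 1 for 8-bit quantization when storing weights, and uses the commonly cited rule of thumb of about 16 bytes per parameter for mixed-precision training with the Adam optimizer (weights, gradients and optimizer state, excluding activations). The 70-billion-parameter size is an illustrative assumption, not a figure from this article.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

# Assumed rule of thumb for mixed-precision Adam training: fp16 weights (2)
# + fp16 gradients (2) + fp32 master weights (4) + two fp32 Adam moments (8).
# Activations and framework overhead are NOT included.
TRAINING_BYTES_PER_PARAM = 16


def memory_gb(num_params, bytes_per_param):
    """Memory in gigabytes (1 GB = 10**9 bytes) for num_params parameters."""
    return num_params * bytes_per_param / 1e9


for num_params in (1e9, 70e9):  # 1B and a hypothetical 70B model
    label = f"{num_params / 1e9:.0f}B params"
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{label}, {fmt} weights: {memory_gb(num_params, nbytes):,.0f} GB")
    print(f"{label}, training (approx.): {memory_gb(num_params, TRAINING_BYTES_PER_PARAM):,.0f} GB")
```

Under these assumptions, a 1-billion-parameter model needs about 2 GB for 16-bit weights and roughly 16 GB of training state, while a 70-billion-parameter model needs about 140 GB and 1,120 GB respectively, before counting activations.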
Future of large language models
It's exciting to think about the incredible potential of LLMs. With further technological advancements and ever-evolving approaches to natural language processing, these models are likely to become more efficient and accurate. At the same time, concerns around generative AI, such as misinformation, model transparency and misuse, are warranted. One thing is certain: keeping humans in the loop during both the training and testing stages of building these models is essential.
TELUS International provides services to implementers of generative AI technologies, with extensive capabilities for application development through the consultancy, design, build, deployment and maintenance phases. We have nearly two decades of AI experience in natural language processing, and our global community can review, translate, annotate and curate data in 500+ languages and dialects.