How to select a large language model
Businesses are eager to leverage generative AI (GenAI), and they want it implemented yesterday.
Who can blame them? GenAI gives enterprises a significant edge in today's competitive market by advancing automation and personalization, streamlining the marketing and sales journeys, increasing customer retention, improving worker efficiency and more.
Few companies have the resources to build their own large language model (LLM) from scratch, so most will create apps based on an existing foundation model such as OpenAI's GPT series or Google's PaLM technology. For those tasked with building applications on top of these foundation LLMs, deciding which one to use can be tricky, to say the least. With so many diverse LLMs to choose from, you need a defined framework to help you select the one that will work best for your business needs.
Read on to get a sense of what's out there to choose from, as well as a list of key considerations when deciding on an LLM.
Available large language models
The first step in your generative AI journey is familiarizing yourself with available foundational models.
There are two types of LLMs: proprietary and open source. Proprietary LLMs require a license that may restrict how the LLM can be used, while open-source models can be used and modified by anyone, for any purpose.
With a seemingly endless list of models to choose from, the following provides just a taste of what's available.
- Llama 2: The next generation of Meta's LLM, Llama 2 has double the context length of Llama 1 and is available in three sizes: 7 billion, 13 billion and 70 billion parameters. Open source.
- Falcon: Developed by the Technology Innovation Institute, Falcon is available in 1.3 billion, 7.5 billion, 40 billion and 180 billion parameter models. These models are multilingual and serve as bases that can be fine-tuned for specific requirements. Open source.
- PaLM 2: Trained on more than 100 languages, Google's LLM is available in four sizes (parameter counts not disclosed) that can be fine-tuned to support specific use cases. This next-generation model was developed using a larger range of datasets than PaLM and also features model architecture improvements. A technique called compute-optimal scaling, in which model size and training dataset size are scaled proportionately, resulted in a model that is smaller than PaLM, yet more efficient, with better overall performance. Proprietary.
- GPT models: OpenAI's range of models includes GPT-4, GPT-4 Turbo, GPT-3.5, GPT-3 (legacy model) and more, each with different capabilities and price points. Models can be customized with fine-tuning for your specific use case. GPT-3 has about 175 billion parameters. The number of parameters in GPT-4 has not been confirmed, but it is estimated to be 1.76 trillion. Proprietary.
- Claude: To reduce the potential of an LLM to be harmful rather than helpful, the developers at Anthropic used a process they call "constitutional AI" to guide this LLM to adhere to a "constitution" of desired behavior. This family of models includes Claude, Claude Instant and Claude 2. They were mostly trained in English but are also said to work well in other common languages. Proprietary.
A popular and up-to-the-minute resource is Hugging Face, which has an open-source LLM leaderboard where models are ranked based on general knowledge tests, multitasking capabilities, propensity to hallucinate, ability to make common sense inferences and more.
Key considerations when selecting an LLM
The number of models to choose from is immense, so it's important to find the one that most effectively meets your specific business needs at the right price point. Here are some key factors to consider when making your decision.
- Use case: One of the most important considerations is what exactly you want to achieve by investing in an LLM and implementing generative AI processes into your business. Whether you want to improve automation, create content, summarize, analyze or translate text, implement a GenAI chatbot or generate code, different LLMs excel at different tasks. For example, Llama 2 Chat is optimized for assistant-like chat uses, while Claude 2 excels at generating code in popular programming languages.
- Budget: As previously mentioned, some models are freely available, while others require licenses or have associated usage costs. Be sure to fully understand the cost structure of the model you are selecting so you aren't surprised by "hidden" fees after the fact.
- Size and capabilities: Depending on the LLM, there can be a tradeoff between model size and capabilities. A model's size is measured by its number of parameters, the learned weights that encode what the model has absorbed from its training data. (These are distinct from inference-time settings such as temperature and top-k, often called hyperparameters, which control how the model selects the next token, a basic unit of meaning such as a word or word fragment, in a sequence.) Generally speaking, increasing the parameter count creates a model that can better detect details and nuances in the data, which leads to better performance in tasks like natural language understanding, text generation and question answering. While bigger might be better, the size of the model you choose will be dictated by your specific objectives and, of course, budget. Larger models also carry more risk and expense; if your application doesn't need the extra power, a smaller model is often the safer choice.
- Fine-tuning: When you fine-tune a pre-trained large language model, you retrain it on new data, adjusting its parameters for a specific task or tasks. To take a general-purpose model and create a custom model for your domain, you'll need to ensure the LLM you select supports fine-tuning, as not all do.
- Latency: This refers to the time it takes for a model to process a request and return a response, which can vary from milliseconds to more than 15 seconds in some cases. If your use case necessitates real-time responses (for example, a GenAI chatbot assisting a human agent in replying to a customer query), you'll require a low-latency model. Smaller models with fewer parameters tend to be faster than large models, but may not match their performance. Keep in mind that techniques such as prompt engineering and fine-tuning can help improve model response time.
- Performance: Evaluating an LLM's performance before making your selection is paramount. Factors that should be considered are the model's accuracy (the rate at which the model's output reflects the correct or expected results), fluency (ability to generate natural-sounding responses), relevancy (the amount of alignment between a user query and the model's response), context awareness (ability to maintain contextual understanding during a conversation) and specificity (ability to generate specific rather than generic responses).
- Risk assessment: Large language models' occasional propensity to hallucinate (output inaccurate or nonsensical responses) or produce biased output can destroy customer trust and loyalty. When assessing models, be sure to investigate their explainability. Where did the training data come from? Was high-quality data used? Did the training include transparent processes and incorporate human feedback? Also, keep in mind that some LLMs have undergone far more testing and fine-tuning and, as a result, are better at risk mitigation.
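To make the decoding settings mentioned under "Size and capabilities" concrete, here is a minimal sketch, in plain Python with made-up logits, of how temperature and top-k shape next-token selection. It is an illustration of the general technique, not any particular vendor's implementation:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Pick a next-token id from raw model scores (logits).

    Lower temperature sharpens the distribution (more deterministic output);
    top_k restricts sampling to the k highest-scoring tokens.
    """
    rng = rng or random.Random()
    # Rank tokens by score, highest first
    scored = sorted(enumerate(logits), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        scored = scored[:top_k]  # keep only the k most likely tokens
    # Softmax with temperature scaling (subtract max for numerical stability)
    scaled = [score / temperature for _, score in scored]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token according to the resulting probabilities
    r, cumulative = rng.random(), 0.0
    for (token_id, _), p in zip(scored, probs):
        cumulative += p
        if r < cumulative:
            return token_id
    return scored[-1][0]

# Hypothetical four-token vocabulary with raw scores
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(logits, temperature=0.1))  # near-greedy: almost always token 0
```

With a high temperature (say, 1.5) the same call spreads probability across all four tokens, which is why chat products expose temperature as a "creativity" dial.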
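The considerations above can be turned into a simple selection framework: score each candidate model against weighted criteria and rank the results. The model names, scores and weights below are hypothetical placeholders, not real benchmark numbers; substitute your own criteria and measurements:

```python
# Weights reflect how much each criterion matters to your business (sum to 1.0)
weights = {
    "use_case_fit": 0.30,
    "cost": 0.20,
    "latency": 0.15,
    "performance": 0.25,
    "risk_mitigation": 0.10,
}

# Hypothetical 1-5 scores per candidate (higher is better for every criterion,
# so a cheap model gets a high "cost" score and a slow one a low "latency" score)
candidates = {
    "model_a": {"use_case_fit": 5, "cost": 2, "latency": 3, "performance": 5, "risk_mitigation": 4},
    "model_b": {"use_case_fit": 4, "cost": 4, "latency": 4, "performance": 3, "risk_mitigation": 3},
    "model_c": {"use_case_fit": 3, "cost": 5, "latency": 5, "performance": 2, "risk_mitigation": 3},
}

def weighted_score(scores, weights):
    """Collapse per-criterion scores into one comparable number."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

ranking = sorted(candidates, key=lambda m: weighted_score(candidates[m], weights), reverse=True)
for name in ranking:
    print(f"{name}: {weighted_score(candidates[name], weights):.2f}")
```

The value of the exercise is less the final number than the conversation it forces: agreeing on weights makes stakeholders state explicitly whether, say, latency matters more than cost for this use case.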
Implementing GenAI is a journey made easier with the right partner
With so many LLMs to choose from, the road to generative AI implementation can be a bumpy one. Working with a trusted partner can help smooth the path. TELUS International offers comprehensive solutions to advance your GenAI initiatives.
Starting with our consultation services, we'll help you articulate an AI strategy by exploring market opportunities, identifying generative AI use cases for immediate impact and establishing responsible governance and risk mitigation strategies. While we know that speed is crucial, we advocate for an intentional, strategic approach to GenAI implementation.
When it comes to fine-tuning your LLM, data readiness is crucial. Our knowledgeable team of data sourcing specialists, computational linguists and global community members (capable of collecting and annotating data in hundreds of languages and locales) sources, creates and selects complex datasets across content types and domains. Through our data engineering services, we help ensure your proprietary datasets are complete, accurate and ready for retraining your custom LLM.
An LLM that outputs inaccurate, biased or toxic responses can seriously harm your brand equity. As a result, ensuring that the performance of your model is optimized is paramount. We can help do so via model testing, prompt generation and enhancement, and model output evaluation. Through reinforcement learning from human feedback processes, we help improve model output in terms of accuracy and bias reduction.
With nearly two decades of AI experience, our team of experts is here to help solve "last-mile" generative AI challenges. Our multidisciplinary teams seamlessly blend data science, engineering, strategy and experience to accelerate AI roadmaps responsibly and securely.
Meet your business needs faster, and more strategically, than ever before with our complete set of integrated GenAI services. Reach out today to speak with an expert.