Tech box illustration

10 (more) AI terms everyone should know

by Susanna Ray

Since generative AI catapulted into the mainstream at the end of 2022, most of us have gained a basic understanding of the technology and how it uses natural language to help us interact more easily with computers. Some of us can even throw around buzzwords like “prompts” and “machine learning” over coffee with friends. (If you’re not there yet, you can start with this introductory piece: 10 AI terms everyone should know.) But as AI continues to evolve, so does its lexicon. Do you know the difference between large and small language models? Or what the “GPT” stands for in ChatGPT? Or what a RAG has to do with cleaning up fabrications? We’re here to help with a next-level breakdown of AI terms to get you up to speed.

Tech maze illustration


Computers using AI now can solve problems and accomplish tasks by employing patterns they’ve learned from historical data to make sense of information, something akin to reasoning. The most advanced systems are showing the ability to go a step further, tackling increasingly complex problems by creating plans, devising a sequence of actions to reach an objective. Imagine asking an AI program for help organizing a trip to a theme park. The system can take that goal — a visit where you go on six different rides, including making sure the water adventure is during the hottest part of the day — and can break it up into steps for a schedule while using reasoning to make sure you’re not doubling back anywhere and that you’ll be on the splash coaster between noon and 3 p.m.
Opaque weight illustration


To create and use an AI system, there are two steps: training and inference. Training is sort of like an AI system’s education, when it is fed a dataset and learns to perform tasks or make predictions based on that data. For example, it might be given a list of prices for homes recently sold in a neighborhood, along with the number of bedrooms and bathrooms in each and a multitude of other variables. During training, the system adjusts its internal parameters, which are values that determine how much weight to give each of those factors in influencing pricing. Inference is when it uses those learned patterns and parameters to come up with a price prediction for a new home about to go on the market.
Tech box illustration

SLM/small language model

Small language models, or SLMs, are pocket-sized versions of large language models, or LLMs. They both use machine learning techniques to help them recognize patterns and relationships so they can produce realistic, natural language responses. But while LLMs are enormous and need a hefty dose of computational power and memory, SLMs such as Phi-3 are trained on smaller, curated datasets and have fewer parameters, so are more compact and can even be used offline, without an internet connection. That makes them great for apps on devices like a laptop or phone, where you might want to ask basic questions about pet care but don’t need to dive into the detailed, multi-step reasoning of how to train seeing-eye dogs.
Tech box illustration


Generative AI systems can compose stories, poems and jokes, as well as answer research questions. But sometimes they face challenges separating fact from fiction, or their training data is outdated, and then they can give inaccurate responses referred to as hallucinations. Developers work to help AI interact with the real world accurately through the process of grounding, which is when they connect and anchor their model with data and tangible examples to improve accuracy and produce more contextually relevant and personalized outputs.
Tech blocks illustration

Retrieval Augmented Generation (RAG)

When developers give an AI system access to a grounding source to help it be more accurate and current, they use a method called Retrieval Augmented Generation, or RAG. The RAG pattern saves time and resources by adding extra knowledge without having to retrain the AI program. It’s as if you’re Sherlock Holmes and you’ve read every book in the library but haven’t cracked the case yet, so you go up to the attic, unroll some ancient scrolls, and voilà — you find the missing piece of the puzzle. Similarly, if you’ve got a clothing company and want to create a chatbot that can answer questions specific to your merchandise, you can use the RAG pattern over your product catalog to help customers find the perfect green sweater from your store.
Illustration showing various shapes


AI programs have a lot on their plate as they process people’s requests. The orchestration layer is what steers them through all their tasks in the right order to get to the best response. If you ask Microsoft Copilot who Ada Lovelace is, for example, and then ask it when she was born, the AI’s orchestrator is what stores the chat history to see that the “she” in your follow-up query refers to Lovelace. The orchestration layer can also follow a RAG pattern by searching the internet for fresh information to add into the context and help the model come up with a better answer. It’s like a maestro cueing the violins and then the flutes and oboes as they all follow the sheet music to produce the sound the composer had in mind.
Illustration showing a ball


Today’s AI models don’t technically have memory. But AI programs can have orchestrated instructions that help them “remember” information by following specific steps with every single transaction — such as temporarily storing previous questions and answers in a chat and then including that context in the current request of the model, or using grounding data from the RAG pattern to make sure the response has the most current information. Developers are experimenting with the orchestration layer to help AI systems know if they need to temporarily remember a breakdown of steps, for example — short-term memory, like jotting a reminder on a sticky note — or if it would be useful to remember something for a longer period of time by storing it in a more permanent location.
Illustration showing various shapes

Transformer models and diffusion models

People have been teaching AI systems to understand and generate language for decades, but one of the breakthroughs that accelerated recent progress was the transformer model. Among generative AI models, transformers are the ones that understand context and nuance the best and fastest. They’re eloquent storytellers, paying attention to patterns in data and weighing the importance of different inputs to help them quickly predict what comes next, which enables them to generate text. A transformer’s claim to fame is that it’s the T in ChatGPT — Generative Pre-trained Transformer. Diffusion models, generally used for image creation, add a twist by taking a more gradual and methodical journey, diffusing pixels from random positions until they’re distributed in a way that forms a picture asked for in a prompt. Diffusion models keep making small changes until they create something that works.
Illustration showing blocks and circles

Frontier models

Frontier models are large-scale systems that push the boundaries of AI and can perform a wide variety of tasks with new, broader capabilities. They can be so advanced that they sometimes surprise us with what they’re able to accomplish. Tech companies including Microsoft formed a Frontier Model Forum to share knowledge, set safety standards and help everyone understand these powerful AI programs to ensure safe and responsible development.
Rectangular block illustration


A GPU, which stands for Graphics Processing Unit, is basically a turbocharged calculator. GPUs were originally designed to smooth out fancy graphics in video games, and now they’re the muscle cars of computing. The chips have lots of tiny cores, or networks of circuits and transistors, that tackle math problems together, called parallel processing. Since that’s basically what AI is — solving tons of calculations at massive scale to be able to communicate in human language and recognize images or sounds — GPUs are indispensable for AI tools for both training and inference. In fact, today’s most advanced models are trained using enormous clusters of interconnected GPUs — sometimes numbering tens of thousands spread across giant data centers — like those Microsoft has in Azure, which are among the most powerful computers ever built.
Colorful boxes illustration

Images for this story were created by Makeshift Studios. Story published on May 13, 2024.