AI Glossary

Tom van Wees · 5 min read

A practical guide to AI terminology, covering key concepts from basic machine learning to advanced language models. Written for product teams and developers, this glossary explains 70+ essential AI terms and concepts, from agents and benchmarking to vector databases and zero-shot learning.


Terms to understand when it comes to AI

Agent: An AI model designed to autonomously interact with its environment to perform tasks, often adapting to new information.

Agentic Workflow: A method of task automation where agents work in a structured sequence to complete complex tasks independently.

AGI (Artificial General Intelligence): An advanced form of AI that can understand, learn, and apply knowledge across a wide range of tasks like a human.

AI Copilot: An AI assistant designed to collaborate with humans, often in real-time, to aid in tasks or decision-making.

Alignment: The process of ensuring an AI system's goals and actions align with human values and intentions.

ASI (Artificial Superintelligence): A hypothetical AI that surpasses human intelligence across all fields, including creativity, problem-solving, and emotional intelligence.

Benchmarking: The process of measuring an AI model's performance against set standards or other models.

Bias: Systematic errors in AI that can lead to unfair or inaccurate outcomes, often rooted in biased data.

Chain of Thought: A reasoning technique where AI models break down complex problems into intermediate steps for improved answers.
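
As an illustration, a chain-of-thought prompt often differs from a direct prompt only by an instruction to show intermediate reasoning. The question below is invented for the sketch:

```python
# A direct prompt vs. a chain-of-thought prompt for the same question.
question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

# Direct: asks for the answer immediately.
direct_prompt = f"{question}\nAnswer:"

# Chain of thought: the added instruction nudges the model to emit
# intermediate steps (12 pens = 4 groups of 3, 4 x $2 = $8) before
# the final answer, which often improves accuracy on multi-step problems.
cot_prompt = f"{question}\nLet's think step by step."
```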

Chatbot: An AI-powered conversational agent that can communicate with users in text or voice formats to answer questions or provide assistance.

ChatGPT: A conversational AI model developed by OpenAI, based on the GPT architecture, for natural language interactions.

Classification: The process of categorizing data points into predefined classes, such as spam vs. non-spam emails.

Claude: An advanced AI chatbot created by Anthropic with an emphasis on ethical and safe interactions.

Completions: Responses generated by AI models based on the input prompt, typically used in text-based interactions.

Compute: The computational resources (e.g., processors, GPUs) required to train and run AI models.

Content Enrichment (or simply Enrichment): Improving raw data by adding context, such as tags, metadata, or categorizations, to enhance usability.

Conversational AI: AI designed specifically for understanding and generating human language in a conversational context.

Data Augmentation: The process of artificially creating new training data from existing data to enhance model performance.

Data Extraction: The process of pulling specific data or insights from unstructured sources, like text or images.

Data Ingestion: The initial step in the data pipeline where data is collected from various sources and processed for use.

Data Sets: Collections of data used to train, validate, or test AI models.

Deep Learning: A subset of machine learning using neural networks with multiple layers to learn complex patterns in data.

Determinism: When an AI model produces the same output each time it receives the same input.

Diffusion: A process used in generative models to create or modify data, often seen in image generation techniques.

Embedding: A representation of data, often words or sentences, in a continuous vector space to capture its meaning or relationships.

Evaluations: Tests or assessments to measure the effectiveness or accuracy of AI models.

Explainable AI (XAI): AI systems designed with transparency to allow humans to understand how they reach their conclusions.

Few-shot Learning: A technique where AI models learn tasks with minimal training examples.
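
In practice this is often done directly in the prompt: a few labeled examples, then the new input. The reviews below are invented for the sketch:

```python
# A few-shot sentiment prompt: two labeled examples teach the model the
# task and output format, then the unlabeled input asks it to continue.
few_shot_prompt = """\
Review: "Great battery life, highly recommend." -> positive
Review: "Stopped working after a week." -> negative
Review: "Setup was painless and support was quick." ->"""
```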

Fine-tuning: The process of adapting a pre-trained model to a specific task with additional data.

Foundation Model: A large-scale AI model pre-trained on vast data that can be adapted to various downstream tasks.

Gemini: A family of AI models by Google focused on both conversational and multimodal tasks.

Generative AI: AI that can produce new content, such as text, images, or music, rather than simply analyzing existing data.

GPT (Generative Pre-trained Transformer): A transformer-based model that generates text by predicting the next word in a sequence.

GPU (Graphics Processing Unit): Hardware optimized for parallel processing, commonly used to accelerate AI computations.

Hallucination: When an AI model generates information that is not based on real data or facts.

Human-in-the-loop: A setup where human input guides or corrects AI decisions to improve performance or accuracy.

Inference: The process of making predictions or generating responses based on a trained AI model.

Knowledge Graph: A structured representation of interconnected facts that helps AI understand relationships between entities.

Large Language Model (LLM): A powerful type of AI trained on massive text data to understand and generate human language.

Latency: The time delay between a user's input and the AI's response.

Llama: Meta's family of openly available large language models, designed for various text generation and understanding tasks.

Machine Learning: A field of AI where algorithms learn from data to make predictions or decisions without explicit programming.

Metadata: Data that provides information about other data, often used to organize and retrieve data efficiently.

Mistral: A family of open-weight language models from the French company Mistral AI, known for efficient performance at smaller model sizes.

Model Configs: The settings and hyperparameters that define an AI model's structure and behavior.

Multimodal: AI models that can process and combine multiple types of input, such as text, images, and audio.

Multitask Prompt Tuning (MPT): A parameter-efficient technique that learns a single shared prompt across multiple tasks, which can then be adapted to new ones.

Natural Language Processing (NLP): The field of AI focused on enabling computers to understand and process human language.

Neural Network: A network of interconnected nodes, loosely inspired by the brain, used to detect patterns and make decisions in AI.

Node: Building blocks within Workflows in Lleverage.

Parameters: The values in a model that are adjusted during training to fit the data, such as weights in a neural network.

Parsing: The process of analyzing text to extract structured information, such as pulling structured fields out of a document like a CV.

Pre-training: The initial phase of training a model on large datasets to develop foundational knowledge before fine-tuning.

Prompt: The input given to an AI model to generate a response, often structured to guide the model's output.

Prompt Chaining: The practice of linking multiple prompts to guide the AI through a sequence of responses.

Prompt Engineering: Crafting and optimizing prompts to achieve the best responses from AI models.

Prompt IDE: An interface to design, test, and refine prompts for better model interactions.

Prompt Massaging: Adjusting prompts to refine or correct model responses without major modifications.

RAG (Retrieval Augmented Generation): A technique in which a model retrieves data from external sources at generation time to ground its responses and improve their accuracy.
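
A minimal sketch of the retrieve-then-generate flow. The documents are invented, and keyword overlap stands in for the vector search a real system would use; the assembled prompt would then be sent to a language model:

```python
# Minimal RAG sketch: retrieve the most relevant snippet, then build a
# prompt that grounds the model's answer in that snippet.
documents = [
    "Refunds are processed within 14 days of the return.",
    "Shipping to the EU takes 3-5 business days.",
]

def retrieve(query, docs):
    """Return the document sharing the most words with the query.

    A stand-in for embedding-based vector search in a real pipeline.
    """
    query_words = set(query.lower().split())
    return max(docs, key=lambda d: len(query_words & set(d.lower().split())))

question = "How long do refunds take?"
context = retrieve(question, documents)

# The retrieved context is prepended so the model answers from it
# rather than from its (possibly stale or hallucinated) training data.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```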

Reinforcement Learning: A type of machine learning where models learn by receiving rewards or penalties for their actions.

RLHF (Reinforcement Learning from Human Feedback): Training models by optimizing based on human feedback on responses.

Semantic Search: A search that uses the meaning of words rather than exact matches to retrieve relevant information.

Sentiment Analysis: The process of identifying the emotional tone in text, often used in social media monitoring.

Similarity Search: Finding data points similar to a query by comparing their vector embeddings.
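
The comparison is typically cosine similarity between embeddings. A self-contained sketch with toy 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": doc_a points in nearly the same direction as the
# query, so it should rank first.
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [0.9, 0.1, 1.1],
    "doc_b": [-1.0, 0.5, 0.0],
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
```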

Singularity: A theoretical point where AI surpasses human intelligence, leading to rapid and possibly unpredictable advances.

Structured Data: Data that is organized in a clear, defined format, such as tables or databases.

Structured Output: AI-generated data presented in an organized format like lists, tables, or fields.

Temperature: A parameter controlling the randomness of a model's output, where higher values lead to more varied responses.
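
Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. A small sketch with invented logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores into a probability distribution.

    Lower temperature sharpens the distribution toward the top token;
    higher temperature flattens it, so samples are more varied.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform
```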

TensorFlow: An open-source framework by Google for building and deploying machine learning models.

Token: A unit of text, such as a word or character, that a model processes to generate responses.
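
Real tokenizers split text into subwords (e.g. byte-pair encoding), so exact counts require the model's own tokenizer. For English text, roughly four characters per token is a common rule of thumb, sketched here as a crude estimator:

```python
def rough_token_count(text):
    """Crude token estimate: ~4 characters per token for English text.

    A heuristic only; real tokenizers (BPE and similar) split text into
    subwords and give different counts per model.
    """
    return max(1, len(text) // 4)

estimate = rough_token_count("Tokens are the units a language model reads.")
```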

Token Limit: The maximum number of tokens a model can handle in a single input or output sequence.

Top-P (Nucleus Sampling): A decoding method that samples only from the smallest set of tokens whose cumulative probability reaches the threshold p, cutting off the unlikely tail.
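
A sketch of the filtering step with invented token probabilities; with p = 0.9, the low-probability tail is excluded before sampling:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p; sampling then happens only within that set."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

# Invented next-token distribution: "zzz" falls outside the nucleus.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zzz": 0.05}
nucleus = top_p_filter(probs, 0.9)
```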

Training Data: Data used to train an AI model, helping it learn patterns and make predictions.

Transformer: A type of model architecture that excels in handling sequential data, particularly for NLP tasks.

Unstructured Data: Data not organized in a pre-defined way, like raw text, audio, or images.

Variable: A storage element in programming or machine learning that can hold data values for processing.

Vector Database: A specialized database optimized for storing and retrieving vector embeddings (e.g., Weaviate, Pinecone).

Vectorizing: The process of converting text or other data into numerical vectors to enable similarity comparisons.

Zero-shot Learning: When a model performs a task it wasn't explicitly trained for by leveraging general knowledge.

Turn your manual decisions into intelligent operations

See how we capture your decision intelligence and put it to work inside the systems you already have. Start with one workflow. See results in days.