Artificial Intelligence Glossary

A complete reference guide covering artificial intelligence terminology from foundational concepts to cutting-edge developments.

A

Activation Function: A mathematical function applied to the output of a neuron that introduces non-linearity, enabling neural networks to learn complex patterns. Common examples include ReLU, sigmoid, and tanh.
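
As an illustrative sketch (plain Python, not tied to any particular framework), the three functions named above can be written for a single scalar input as follows:

```python
import math

def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    """Squashes any real number into the range (-1, 1)."""
    return math.tanh(x)

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  sigmoid={sigmoid(x):.3f}  tanh={tanh(x):.3f}")
```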

Agent: A software program that can autonomously perform tasks on behalf of a user, often capable of using tools, accessing external data, and making sequential decisions to achieve a defined goal.

Agentic AI: Refers to AI systems that can autonomously pursue complex goals by planning, using tools, and making decisions across multiple steps without constant human guidance.

AGI (Artificial General Intelligence): A hypothetical AI capable of performing any intellectual task a human can. It remains a future goal of AI research.

AI (Artificial Intelligence): The field of computer science focused on creating machines that simulate human intelligence, such as learning, decision-making, and language understanding.

AI Alignment: The process of ensuring AI systems behave in accordance with human intentions and values.

AI Ethics: The study of ethical issues surrounding AI, including bias, transparency, and fairness.

AI Safety: The field concerned with ensuring AI systems do not pose risks to humans.

AI Watermarking / Content Provenance: Techniques for embedding invisible or visible markers in AI-generated content to identify its origin. Examples include Google’s SynthID and the C2PA content provenance standard.

Algorithm: A step-by-step process or set of rules used for problem-solving and decision-making in AI and machine learning.

ANI (Artificial Narrow Intelligence): Also known as weak AI, it is designed for specific tasks, such as image recognition or language translation.

Annotation: The process of labeling data (text, images, audio) with metadata to create training datasets for supervised learning.

Anthropomorphism: The attribution of human traits, emotions, or intentions to AI systems.

ASI (Artificial Superintelligence): A hypothetical level of AI that surpasses human intelligence across all domains, including creativity, problem-solving, and social understanding. Considered a theoretical concept beyond AGI.

Attention Mechanism: A technique that allows models to focus on relevant parts of input data when making predictions, weighing the importance of different elements dynamically.

Autoencoder: A neural network that learns to compress data into a compact representation and then reconstruct it, useful for dimensionality reduction and anomaly detection.

Autoregressive Model: A model that generates output sequentially, with each new element conditioned on previously generated elements. GPT models work this way.

Autonomous: Describes AI systems or robots that operate without human intervention, such as self-driving cars.

B

Backpropagation: The fundamental algorithm for training neural networks, calculating gradients of the loss function with respect to weights by propagating errors backward through the network.

Backward Chaining: A reasoning method that starts with a goal and works backward to determine necessary conditions.

Batch Size: The number of training examples processed together before updating model weights. Larger batches provide more stable gradients but require more memory.
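
A minimal sketch of how a dataset might be split into batches; the helper name `iter_batches` is hypothetical, and real training loops usually shuffle the data first:

```python
from typing import Iterator, List, Sequence

def iter_batches(examples: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size examples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(examples), batch_size):
        yield list(examples[start:start + batch_size])

# Ten examples with a batch size of 4 give batches of 4, 4, and 2 items,
# so the weights would be updated three times per pass over the data.
for batch in iter_batches(list(range(10)), batch_size=4):
    print(batch)
```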

Benchmark: A standardized test or dataset used to evaluate and compare the performance of different AI models on specific tasks.

BERT (Bidirectional Encoder Representations from Transformers): A transformer-based language model that reads text in both directions simultaneously, excelling at understanding context for tasks like question answering and sentiment analysis.

Bias: Unintended preferences or prejudices in AI systems, often due to imbalanced training data.

Big Data: Datasets so large or complex that they require specialized storage, processing, and analytics tools, often including AI, to extract insights.

Black Box (AI): A term for AI models, especially deep learning networks, whose internal decision-making process is difficult to interpret.

Bounding Box: A rectangular box used in computer vision to define the location of an object in an image.

C

Catastrophic Forgetting: A phenomenon where a neural network loses previously learned knowledge when trained on new data, particularly during fine-tuning. Techniques like LoRA and elastic weight consolidation help mitigate this problem.

Chain-of-Thought (CoT) Prompting: A prompting technique that encourages AI models to break down complex reasoning into intermediate steps, improving accuracy on logical and mathematical tasks.

Chatbot: An AI system designed to simulate conversation with users through text or voice.

Classification: A machine learning task that assigns data to predefined categories, such as spam detection in emails.

Clustering: An unsupervised learning technique that groups similar data points together.

Cognitive Computing: AI systems designed to mimic human thinking processes.

Computer Use / Browser Use: The ability of AI systems to interact with desktop applications and web browsers by interpreting screenshots, clicking buttons, typing text, and navigating interfaces, effectively using a computer the way a human would.

Computer Vision: A field of AI focused on enabling computers to interpret visual data, such as images and videos.

Constitutional AI: An approach to AI safety where models are trained to follow a set of principles or “constitution” that guides their behavior toward helpfulness and harmlessness.

Context Window: The maximum amount of text (measured in tokens) that a language model can process at once, including both input and output. Context windows have grown dramatically, with some models now supporting over one million tokens, enabling analysis of entire codebases or book-length documents in a single session.

Convolutional Neural Network (CNN): A deep learning architecture, inspired by the visual cortex, that excels at recognizing patterns in grid-like data such as images, video, and audio. Convolutional layers apply filters (kernels) to detect features such as edges, shapes, and textures and progressively build more complex representations; pooling layers reduce the size of the data; and fully connected layers typically perform the final classification.

Corpus: A large collection of text used for training AI language models.

Cross-Validation: A technique for evaluating model performance by splitting data into multiple subsets, training on some and testing on others, then averaging results.
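
One common form is k-fold cross-validation. The sketch below only produces the index splits; the function name `k_fold_indices` is hypothetical, and training and scoring a model on each split is left out:

```python
from typing import List, Tuple

def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering every sample once as test data."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

The per-fold scores would then be averaged to give the overall estimate.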

D

Data Augmentation: Techniques for artificially expanding training datasets by creating modified versions of existing data, such as rotating images or adding noise to audio.

Data Mining: The process of discovering patterns and insights in large datasets using AI and statistical techniques.

Data Poisoning: The intentional corruption of training data to manipulate a model’s behavior, causing it to learn incorrect patterns or introduce vulnerabilities. A growing concern in AI safety and security.

Data Science: A field that combines AI, statistics, and domain expertise to analyze and interpret data.

Dataset: A structured collection of data used to train AI models.

Decision Tree: A model that splits data into branches based on decision rules, used for classification and regression tasks.

Deep Learning: A subset of machine learning that uses multi-layered neural networks to learn from large amounts of data.

Deep Research: An emerging AI capability where a model autonomously performs extended, multi-step research by searching, reading, and synthesizing information from multiple sources over minutes or hours to produce comprehensive reports.

Deepfake: AI-generated synthetic media, including video, audio, and images, designed to realistically depict events that never occurred or impersonate real people.

Diffusion Model: A generative AI architecture that creates images by learning to gradually remove noise from random static, used in systems like Stable Diffusion and DALL-E 3.

Dimensionality Reduction: Techniques for reducing the number of features in data while preserving important information, making computation more efficient.

Discriminator: In a GAN, the neural network that tries to distinguish between real data and fake data generated by the generator.

DPO (Direct Preference Optimization): A training method that simplifies AI alignment by directly optimizing a model on pairs of preferred and non-preferred outputs, without requiring a separate reward model. An increasingly popular alternative to RLHF.

Dropout: A regularization technique that randomly deactivates neurons during training to prevent overfitting and improve generalization.
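
A minimal sketch of the idea on a plain list of activations. It assumes the widely used "inverted" variant, in which surviving activations are rescaled during training so that nothing needs to change at inference time; the function name and signature are illustrative only:

```python
import random
from typing import List, Optional

def dropout(activations: List[float], drop_prob: float,
            training: bool = True, rng: Optional[random.Random] = None) -> List[float]:
    """Zero each activation with probability drop_prob; rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)  # dropout is disabled outside of training
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=random.Random(0)))
```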

E

Edge AI: AI processing performed on local devices rather than cloud servers to improve efficiency and privacy.

Embedding: A numerical representation of data (words, images, or other content) as dense vectors in a continuous space where similar items are positioned closer together.
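
"Closer together" is often measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings are produced by a trained model and usually have hundreds or thousands of dimensions:

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, chosen by hand for this example.
toy = {"cat": [0.9, 0.8, 0.1], "dog": [0.8, 0.9, 0.2], "car": [0.1, 0.2, 0.9]}

print("cat vs dog:", round(cosine_similarity(toy["cat"], toy["dog"]), 3))
print("cat vs car:", round(cosine_similarity(toy["cat"], toy["car"]), 3))
```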

Emergent Behavior: Unexpected or unintended capabilities that arise as AI systems scale in complexity.

Encoder-Decoder: An architecture where an encoder compresses input into a representation and a decoder generates output from it, commonly used in translation and summarization.

Ensemble Learning: Combining multiple models to make predictions, often achieving better accuracy than any single model alone. Random forests are a common example.

Epoch: One complete pass through the entire training dataset during model training.

Evaluation Metrics: Quantitative measures used to assess model performance, including accuracy, precision, recall, F1 score, and AUC-ROC.
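
As a sketch of how three of these metrics relate for binary classification, the function below (a hypothetical helper, not a library API) computes precision, recall, and F1 from true and predicted labels:

```python
from typing import Sequence, Tuple

def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```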

Expert System: An AI that applies predefined rules to make decisions in a specific domain.

Explainable AI (XAI): AI models designed to provide clear, human-understandable explanations for their decisions.

F

Feature: An individual attribute or measurable property used as input in machine learning models.

Feature Engineering: The process of transforming raw data into useful features for machine learning.

Federated Learning: A decentralized learning technique where models are trained across multiple devices without sharing raw data.

Few-Shot Learning: The ability of AI models to learn new tasks from only a small number of examples, often by leveraging prior knowledge.

Fine-Tuning: Adapting a pre-trained model to a specific task or domain by training it further on a smaller, specialized dataset.

Forward Chaining: A reasoning approach that starts with known data and applies rules to derive conclusions.

Foundation Model: A large AI model trained on broad data that can be adapted to many different downstream tasks. Examples include models from OpenAI (GPT series), Anthropic (Claude), Google (Gemini), and Meta (Llama).

Frontier Model: The most capable and advanced AI models available at any given time, typically developed by leading research labs. The term is commonly used in policy discussions and safety research to distinguish cutting-edge systems from older or smaller models.

Function Calling / Tool Use: The ability of a large language model to invoke external tools, APIs, or functions during a conversation, allowing it to perform actions like searching the web, running code, querying databases, or interacting with third-party services.

G

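GAN (Generative Adversarial Network): A type of generative model that produces realistic data by pitting two neural networks against each other: a generator that creates synthetic samples and a discriminator that tries to tell them apart from real data.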

Generative AI: AI systems that create new content, such as text, images, code, music, and video.

Generator: In a GAN, the neural network that creates synthetic data attempting to fool the discriminator.

GPT (Generative Pre-trained Transformer): A family of AI language models capable of generating human-like text.

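Gradient Descent: An optimization algorithm that iteratively adjusts model parameters in the direction opposite to the gradient of the loss function, with the step size set by the learning rate, in order to minimize error.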
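
A minimal one-dimensional sketch, assuming a toy objective f(x) = (x - 3)^2 chosen only for illustration. Each step moves x against the gradient, scaled by the learning rate:

```python
from typing import Callable

def gradient_descent(grad: Callable[[float], float], start: float,
                     learning_rate: float = 0.1, steps: int = 100) -> float:
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# The derivative of (x - 3)^2 is 2 * (x - 3), so the minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 6))
```

In model training the same update is applied to every weight at once, with gradients supplied by backpropagation.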

Graph Neural Network (GNN): A type of neural network designed to process data structured as graphs, where information flows between connected nodes. Used in drug discovery, social network analysis, recommendation systems, and fraud detection.

Grounding: Connecting AI outputs to verifiable sources or real-world data to improve accuracy and reduce hallucinations.

Guardrails (AI Safeguards): Constraints designed to prevent AI systems from producing harmful or unintended outputs.

H

Hallucination (AI): When an AI system generates false or misleading information while appearing confident.

Heuristic: A rule-of-thumb approach used to solve problems quickly when an optimal solution is impractical.

Hidden Layer: Layers in a neural network between the input and output layers where intermediate computations occur and features are learned.

Human-in-the-Loop (HITL): A design approach where humans are integrated into AI workflows to provide oversight, corrections, or final decisions.

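Hyperparameter: A setting that is chosen before training rather than learned from data, such as the learning rate, batch size, or number of layers. Hyperparameters are tuned, manually or through automated search, to improve performance.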

Hyperparameter Tuning: The process of systematically searching for optimal hyperparameter values to maximize model performance.

I

Image Recognition: The ability of AI to identify and classify objects in images.

In-Context Learning: The ability of large language models to perform new tasks based solely on examples provided in the prompt, without any weight updates.

Inference: The process of using a trained model to make predictions on new, unseen data.

Inference-Time Compute (Test-Time Compute): The practice of allocating additional computational resources during inference, rather than during training, to improve a model’s reasoning quality. This is the core mechanism behind reasoning models that “think” through problems step by step before responding.

Instruction Tuning: A training process where a pre-trained model is further trained on datasets of instruction-response pairs, teaching it to follow natural language directions. A key step between pre-training and alignment techniques like RLHF.

Intent (NLP): The purpose behind a user’s input in a chatbot or voice assistant.

J

Jailbreaking: Techniques or prompts designed to bypass an AI model’s built-in safety guardrails and content restrictions, causing it to produce outputs it was trained to refuse. Distinct from red teaming, which is conducted by developers to improve safety.

K

Knowledge Distillation: A technique for training a smaller, faster model to mimic the behavior of a larger, more capable model.

Knowledge Graph: A structured representation of facts and their relationships, used to enhance AI’s ability to reason and retrieve information.

L

Label (Data Label): The correct output assigned to training examples in supervised learning.

Large Language Model (LLM): A powerful AI model trained on vast text data to generate human-like responses.

Latent Space: The compressed, abstract representation of data learned by a model, where similar concepts cluster together.

Layer (Neural Network): A collection of neurons at the same depth in a network that process information together.

Learning Rate: A hyperparameter controlling how much model weights are adjusted during each training step. Too high causes instability; too low causes slow learning.

Linguistic Annotation: The process of marking text with grammatical or semantic information for AI training.

LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning technique that adds small, trainable low-rank matrices alongside the frozen weights of a pre-trained model instead of updating all weights. Widely used in the open-source community to customize large models with limited hardware.
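
In a common formulation, a frozen weight matrix W is combined with a learned low-rank update B·A, where B and A are much smaller than W. The sketch below uses plain nested lists; the helper names are hypothetical, and `scale` stands in for the scaling factor that implementations typically apply to the update:

```python
from typing import List, Tuple

Matrix = List[List[float]]

def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Trainable parameters for full fine-tuning vs. a LoRA update of one matrix."""
    return d_out * d_in, rank * (d_out + d_in)

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def effective_weight(w: Matrix, b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    """W stays frozen; only B (d_out x r) and A (r x d_in) would be trained."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```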

Loss Function: A mathematical function that measures how far a model’s predictions are from the correct answers, guiding the training process.
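
Two widely used examples, sketched in plain Python: mean squared error for regression and cross-entropy for classification. The function names and the `eps` guard are illustrative choices, not a specific library's API:

```python
import math
from typing import Sequence

def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_index: int, probs: Sequence[float], eps: float = 1e-12) -> float:
    """Negative log of the probability assigned to the correct class."""
    return -math.log(max(probs[true_index], eps))  # eps avoids log(0)

print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(cross_entropy(0, [0.7, 0.2, 0.1]))  # confident and correct: low loss
print(cross_entropy(2, [0.7, 0.2, 0.1]))  # correct class given low probability: high loss
```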

LSTM (Long Short-Term Memory): A type of recurrent neural network designed to learn long-term dependencies, addressing the vanishing gradient problem.

M

Machine Learning (ML): A subset of AI where algorithms learn patterns from data and make predictions or decisions.

Machine Translation: AI-driven translation of text or speech between languages.

Masked Language Model: A training approach where random words in text are hidden and the model learns to predict them, used by BERT and similar models.

MCP (Model Context Protocol): An open standard that enables AI models to securely connect with external data sources and tools.

Mixture of Experts (MoE): An architecture where multiple specialized sub-networks (experts) handle different aspects of a task, with a gating mechanism routing inputs to the most relevant experts.

Model: The result of training a machine learning algorithm, used to make predictions or classifications.

Model Card: A standardized documentation format that describes a model’s intended use, training data, performance benchmarks, limitations, and ethical considerations, promoting transparency and responsible deployment.

Model Collapse: A degradation in AI model quality that can occur when models are trained on AI-generated content rather than human-created data.

Multi-Agent Systems: Architectures where multiple AI agents collaborate, coordinate, or compete to complete complex tasks, with each agent potentially specializing in a different role or capability.

Multimodal: Refers to systems or models capable of processing, understanding, and generating information from multiple types of data modalities, such as text, images, audio, video, or other sensory inputs.

N

Named Entity Recognition (NER): An NLP task that identifies and classifies named entities in text, such as people, organizations, locations, and dates.

Natural Language Generation (NLG): AI technology that generates human-like text from structured data.

Natural Language Processing (NLP): The field of AI that enables machines to understand and process human language.

Neural Network: A computational model inspired by the human brain, used in deep learning.

Normalization: Techniques for scaling data to a standard range or distribution, improving training stability and convergence.
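
Two common variants, sketched with the standard library: min-max scaling to the range [0, 1] and z-score standardization to zero mean and unit variance. The function names are hypothetical:

```python
import statistics
from typing import List, Sequence

def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

print(min_max_scale([2.0, 4.0, 6.0]))
print(z_score([2.0, 4.0, 6.0]))
```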

O

One-Shot Learning: The ability to learn a new concept or task from just a single example.

Open Weights vs. Open Source: An important distinction in AI model distribution. Open weights means a model’s trained parameters are publicly released for use and fine-tuning (e.g., Meta’s Llama), while true open source additionally includes the training data, code, and full methodology needed to reproduce the model from scratch.

Optimizer: The algorithm that adjusts model weights during training to minimize the loss function. Common optimizers include SGD, Adam, and AdaGrad.

Overfitting: A machine learning problem where a model learns noise in the training data instead of general patterns.

P

Parameter: A value learned by an AI model during training that affects its predictions.

Pattern Recognition: AI’s ability to detect patterns in data for classification or prediction.

Perceptron: The simplest form of neural network, consisting of a single neuron that makes binary classifications.
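
A minimal sketch of the classic perceptron learning rule, trained here on the logical AND function as a toy dataset. The function names, learning rate, and epoch count are illustrative assumptions:

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], int]

def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Output 1 if the weighted sum exceeds zero, otherwise 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples: Sequence[Sample], epochs: int = 20,
                     learning_rate: float = 1.0) -> Tuple[List[float], float]:
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Nudge the weights toward the target only when the prediction is wrong.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

and_gate: List[Sample] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is why this works for AND but would not for XOR.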

Perplexity: A metric measuring how well a language model predicts text, with lower values indicating better performance.
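
Perplexity is commonly computed as the exponential of the average negative log-probability the model assigned to each actual token. A minimal sketch, assuming those per-token probabilities are already available as a list:

```python
import math
from typing import Sequence

def perplexity(token_probs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that always spreads its probability evenly over 4 choices
# is "as confused as" a uniform guess among 4 options.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(perplexity([0.9, 0.8, 0.95]))  # confident predictions: closer to 1
```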

Pre-training: The initial phase of training a model on a large, general dataset before fine-tuning for specific tasks.

Predictive Analytics: Using AI to forecast future outcomes based on past data.

Prompt (AI Prompt): An input given to AI models, like ChatGPT, to generate a response.

Prompt Engineering: The practice of crafting effective prompts to elicit desired behaviors and outputs from AI models.

Pruning (Model): Reducing model size by removing less important weights or neurons while maintaining performance.

Q

Quantization: Reducing the precision of model weights (e.g., from 32-bit to 8-bit) to decrease memory usage and speed up inference.
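
One simple scheme is symmetric 8-bit quantization with a single scale per tensor, sketched below. This is only one of several approaches (others use zero points, per-channel scales, or lower bit widths), and the function names are hypothetical:

```python
from typing import List, Sequence, Tuple

def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print([round(r, 4) for r in restored])
```

The restored values differ slightly from the originals; that rounding error is the accuracy cost paid for the smaller memory footprint.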

Quantum Computing: A field of computing that uses quantum mechanics to process data in ways classical computers cannot.

R

RAG (Retrieval-Augmented Generation): A technique that enhances AI responses by first retrieving relevant information from external sources before generating output.

Reactive AI: AI that responds to inputs without memory or learning from past experiences.

Reasoning Model: An AI model designed to perform explicit, step-by-step reasoning before producing a final answer, often using additional inference-time compute to work through complex problems. Examples include OpenAI’s o-series models and extended thinking features in other platforms.

Recurrent Neural Network (RNN): A neural network architecture designed for sequential data, where outputs from previous steps feed back into the network.

Red Teaming: The practice of deliberately testing AI systems for vulnerabilities, biases, and failure modes by attempting to elicit harmful or unintended behaviors.

Regression: A type of machine learning used for predicting continuous values, such as stock prices.

Regularization: Techniques that prevent overfitting by adding constraints to the learning process, such as L1/L2 penalties or dropout.

Reinforcement Learning (RL): A machine learning approach where an agent learns by interacting with an environment and receiving rewards.

Reward Model: A model trained on human preference data that provides a score or signal indicating how well an AI output aligns with desired behavior. Used as the feedback mechanism in RLHF training.

RLHF (Reinforcement Learning from Human Feedback): A training method where models learn from human preferences and rankings rather than explicit labels, commonly used to align language models.

Robotics: The design and application of machines that can perform tasks autonomously, often using AI.

S

Sampling: In text generation, the process of selecting the next token based on probability distributions, with methods like temperature, top-k, and top-p controlling randomness.

Scaffold / Orchestration: The surrounding code, logic, and infrastructure that connects an AI model to tools, memory, data sources, and workflows, enabling it to function as part of a larger agentic system.

Scaling Laws: Empirical relationships describing how model performance improves with increases in model size, training data, and compute.

Self-Attention: A mechanism allowing each element in a sequence to attend to all other elements, enabling models to capture long-range dependencies. Core to transformer architecture.
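
A deliberately simplified sketch of scaled dot-product self-attention in plain Python. It assumes the queries, keys, and values are the input vectors themselves; real transformer layers first apply learned projection matrices and usually run several attention heads in parallel:

```python
import math
from typing import List

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: List[List[float]]) -> List[List[float]]:
    """Each output row is a weighted average of all input rows."""
    d = len(x[0])
    outputs = []
    for query in x:
        # Similarity of this element to every element, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        outputs.append([sum(w * value[j] for w, value in zip(weights, x))
                        for j in range(d)])
    return outputs

for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
    print([round(v, 3) for v in row])
```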

Self-Supervised Learning: A training paradigm where models learn from unlabeled data by creating their own supervisory signals, such as predicting masked words.

Semantic Search: Search that understands the meaning and intent behind queries rather than just matching keywords.

Sentiment Analysis: AI’s ability to determine the emotional tone of text, such as positive or negative sentiment.

Small Language Model (SLM): A compact language model optimized for efficiency, speed, and on-device deployment, typically with fewer parameters than large language models. Examples include Microsoft’s Phi series and Google’s Gemma family.

Softmax: A function that converts a vector of numbers into a probability distribution, commonly used in classification output layers.
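
A minimal sketch in plain Python. Subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Exponentiate each value and normalize so the outputs sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```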

Speech Recognition: AI’s ability to convert spoken words into text.

State Space Model (SSM): An alternative to transformer architecture that processes sequences using principles from control theory, offering more efficient handling of very long sequences. Mamba is a notable example gaining traction as a potential complement or alternative to transformers.

Structured Output: When an AI model returns responses in a predefined format such as JSON, XML, or a specific schema, rather than free-form text. Increasingly important for integrating AI into software workflows and automated pipelines.

Supervised Learning: A type of machine learning where models learn from labeled training data.

Synthetic Data: Artificially generated data used to train AI models, often when real data is scarce, sensitive, or biased.

System Prompt: Instructions provided to an AI model that define its behavior, personality, or constraints for a given session.

T

Temperature: A parameter controlling the randomness of AI-generated text. Higher values produce more creative but less predictable outputs; lower values are more focused and deterministic.
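
In practice, temperature is usually applied by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. A minimal sketch with made-up logits:

```python
import math
from typing import List, Sequence

def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures flatten the distribution.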

Test Data: A dataset used to evaluate a trained model’s performance.

Text-to-Image: AI systems that generate images from text descriptions, such as DALL-E, Midjourney, and Stable Diffusion.

Token: A unit of text processed by a language model, such as a word, subword, or character.

Tokenization: The process of breaking text into tokens for processing by language models.

Top-k Sampling: A text generation method that limits choices to the k most probable next tokens.
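
A minimal sketch of the filtering step, assuming the next-token probabilities are given as a list indexed by token ID. The function name is hypothetical, and the actual random draw from the filtered distribution is left out:

```python
from typing import Dict, Sequence

def top_k_filter(probs: Sequence[float], k: int) -> Dict[int, float]:
    """Keep the k most probable token IDs and renormalize their probabilities."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and the vocabulary size")
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

print(top_k_filter([0.5, 0.3, 0.15, 0.05], k=2))
```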

Top-p (Nucleus) Sampling: A text generation method that considers the smallest set of tokens whose cumulative probability exceeds p.
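
A minimal sketch of the filtering step, assuming the next-token probabilities are given as a list indexed by token ID. This version stops once the cumulative probability reaches or exceeds p; implementations differ on that boundary detail, and the function name is hypothetical:

```python
from typing import Dict, List, Sequence

def top_p_filter(probs: Sequence[float], p: float) -> Dict[int, float]:
    """Keep the smallest high-probability set of token IDs covering mass p, renormalized."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep: List[int] = []
    cumulative = 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return {i: probs[i] / cumulative for i in keep}

print(top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.7))
print(top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9))
```

Unlike top-k, the number of candidate tokens varies from step to step: it shrinks when the model is confident and grows when the distribution is flat.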

Training Data: The dataset used to teach a machine learning model.

Transfer Learning: Reusing a pre-trained model on a new, related task to speed up learning.

Transformer: A deep learning architecture built on self-attention mechanisms that process all tokens in parallel. It revolutionized NLP and underlies models like GPT and BERT.

Turing Test: A test to determine if an AI can mimic human intelligence well enough to fool a human evaluator.

U

Underfitting: When a model is too simple to capture patterns in the data, resulting in poor performance on both training and test sets.

Unstructured Data: Data that lacks a predefined format, such as images, videos, or raw text.

Unsupervised Learning: A type of machine learning where AI discovers patterns in data without labels.

V

Validation Data: A held-out portion of the data used during development to tune hyperparameters and compare candidate models before final evaluation on test data.

Vanishing Gradient Problem: A difficulty in training deep networks where gradients become extremely small, preventing early layers from learning effectively.

Variance (ML): A measure of how much a model’s predictions change when it is trained on different samples of data. High variance is typically a sign of overfitting.

Vector Database: A database optimized for storing and searching high-dimensional vectors (embeddings), enabling fast similarity searches for AI applications.

Vibe Coding: The practice of building software by describing what you want in natural language and letting AI generate the code, with the developer guiding direction rather than writing code line by line. Coined by Andrej Karpathy in early 2025.

Vision-Language Model (VLM): A model specifically designed to jointly process and reason over both visual (image or video) and textual inputs, enabling tasks like image captioning, visual question answering, and document analysis.

Vision Transformer (ViT): A transformer architecture adapted for image processing, treating images as sequences of patches.

W

Weak AI (Narrow AI): AI that is specialized for a specific task rather than general intelligence.

Weight: A numerical parameter in a neural network that determines the strength of connections between neurons, adjusted during training.

Word Embedding: Dense vector representations of words where semantically similar words have similar vectors. Word2Vec and GloVe are classic examples.

Word2Vec: A pioneering technique for learning word embeddings from large text corpora using shallow neural networks.

X

XAI: See Explainable AI (XAI).

Y

YOLO (You Only Look Once): A family of real-time object detection models that process an entire image in a single pass through the network, making them extremely fast. Widely used in computer vision applications like autonomous driving, surveillance, and robotics.

Z

Zero-Shot Learning: AI’s ability to make predictions for categories it has never seen before by relying on related knowledge.