Artificial Intelligence Glossary 📑
A complete reference guide covering artificial intelligence terminology from foundational concepts to cutting-edge developments.
A
Activation Function: A mathematical function applied to the output of a neuron that introduces non-linearity, enabling neural networks to learn complex patterns. Common examples include ReLU, sigmoid, and tanh.
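A minimal NumPy sketch of the three activation functions named above; the input values are illustrative only.

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid: squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Tanh: squashes values into (-1, 1), centered at zero
    return np.tanh(x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))     # [0.  0.  0.  1.5]
print(sigmoid(x))  # values between 0 and 1
print(tanh(x))     # values between -1 and 1
```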
Agent: A software system that perceives its environment and takes actions on behalf of a user to achieve specified goals, typically operating within predefined boundaries.
Agentic AI: Refers to AI systems that can autonomously pursue complex goals by planning, using tools, and making decisions across multiple steps without constant human guidance.
AGI (Artificial General Intelligence): A hypothetical AI capable of performing any intellectual task a human can. It remains a future goal of AI research.
AI (Artificial Intelligence): The field of computer science focused on creating machines that simulate human intelligence, such as learning, decision-making, and language understanding.
AI Alignment: The process of ensuring AI systems behave in accordance with human intentions and values.
AI Ethics: The study of ethical issues surrounding AI, including bias, transparency, and fairness.
AI Safety: The field concerned with ensuring AI systems do not pose risks to humans.
Algorithm: A step-by-step process or set of rules used for problem-solving and decision-making in AI and machine learning.
ANI (Artificial Narrow Intelligence): Also known as weak AI, it is designed for specific tasks, such as image recognition or language translation.
Annotation: The process of labeling data (text, images, audio) with metadata to create training datasets for supervised learning.
Anthropomorphism: The attribution of human traits, emotions, or intentions to AI systems.
Attention Mechanism: A technique that allows models to focus on relevant parts of input data when making predictions, weighing the importance of different elements dynamically.
Autoencoder: A neural network that learns to compress data into a compact representation and then reconstruct it, useful for dimensionality reduction and anomaly detection.
Autoregressive Model: A model that generates output sequentially, with each new element conditioned on previously generated elements. GPT models work this way.
Autonomous: Describes AI systems or robots that operate without human intervention, such as self-driving cars.
B
Backpropagation: The fundamental algorithm for training neural networks, calculating gradients of the loss function with respect to weights by propagating errors backward through the network.
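A minimal worked example of the chain rule behind backpropagation, for a single sigmoid neuron with a squared-error loss; the input, target, and starting parameters are made up for illustration.

```python
import numpy as np

x, target = 1.5, 1.0
w, b = 0.2, 0.0            # parameters to be trained

# Forward pass
z = w * x + b              # pre-activation
y = 1 / (1 + np.exp(-z))   # sigmoid activation
loss = 0.5 * (y - target) ** 2

# Backward pass: apply the chain rule to propagate the error back to each parameter
dloss_dy = y - target
dy_dz = y * (1 - y)        # derivative of the sigmoid
dz_dw, dz_db = x, 1.0

grad_w = dloss_dy * dy_dz * dz_dw
grad_b = dloss_dy * dy_dz * dz_db

# One gradient-descent step using those gradients
w -= 0.1 * grad_w
b -= 0.1 * grad_b
```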
Backward Chaining: A reasoning method that starts with a goal and works backward to determine necessary conditions.
Batch Size: The number of training examples processed together before updating model weights. Larger batches provide more stable gradients but require more memory.
Benchmark: A standardized test or dataset used to evaluate and compare the performance of different AI models on specific tasks.
BERT (Bidirectional Encoder Representations from Transformers): A transformer-based language model that reads text in both directions simultaneously, excelling at understanding context for tasks like question answering and sentiment analysis.
Bias: Unintended preferences or prejudices in AI systems, often due to imbalanced training data.
Big Data: Large and complex datasets that require AI and advanced analytics to process and extract insights.
Black Box (AI): A term for AI models, especially deep learning networks, whose internal decision-making process is difficult to interpret.
Bounding Box: A rectangular box used in computer vision to define the location of an object in an image.
C
Chain-of-Thought (CoT) Prompting: A prompting technique that encourages AI models to break down complex reasoning into intermediate steps, improving accuracy on logical and mathematical tasks.
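A small illustration of the idea using a hypothetical prompt; the wording and the model's imagined response are assumptions, not a prescribed template.

```python
# A hypothetical prompt contrasting a direct question with a
# chain-of-thought version that asks for intermediate steps.
direct_prompt = "What is 17 * 24?"

cot_prompt = (
    "What is 17 * 24? Think step by step: "
    "break the multiplication into easier parts, "
    "then combine the partial results before giving the final answer."
)
# A model might respond: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408
```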
Chatbot: An AI system designed to simulate conversation with users through text or voice.
Classification: A machine learning task that assigns data to predefined categories, such as spam detection in emails.
Clustering: An unsupervised learning technique that groups similar data points together.
Cognitive Computing: AI systems designed to mimic human thinking processes.
Computer Vision: A field of AI focused on enabling computers to interpret visual data, such as images and videos.
Constitutional AI: An approach to AI safety where models are trained to follow a set of principles or “constitution” that guides their behavior toward helpfulness and harmlessness.
Context Window: The maximum amount of text (measured in tokens) that a language model can process at once, including both input and output.
Convolutional Neural Network (CNN): A deep learning architecture, inspired by the visual cortex, that excels at recognizing patterns in grid-like data such as images, audio, and video. It applies layers of filters (kernels) to detect features such as edges, shapes, and textures, progressively builds more complex representations, shrinks the data with pooling layers, and typically classifies the result through fully connected layers.
Corpus: A large collection of text used for training AI language models.
Cross-Validation: A technique for evaluating model performance by splitting data into multiple subsets, training on some and testing on others, then averaging results.
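A short sketch using scikit-learn's cross_val_score helper; the dataset and model are placeholders chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4 folds, test on the 5th, repeat, then average
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```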
D
Data Augmentation: Techniques for artificially expanding training datasets by creating modified versions of existing data, such as rotating images or adding noise to audio.
Data Mining: The process of discovering patterns and insights in large datasets using AI and statistical techniques.
Data Science: A field that combines AI, statistics, and domain expertise to analyze and interpret data.
Dataset: A structured collection of data used to train AI models.
Decision Tree: A model that splits data into branches based on decision rules, used for classification and regression tasks.
Deep Learning: A subset of machine learning that uses multi-layered neural networks to learn from large amounts of data.
Diffusion Model: A generative AI architecture that creates images by learning to gradually remove noise from random static, used in systems like Stable Diffusion and DALL-E 3.
Dimensionality Reduction: Techniques for reducing the number of features in data while preserving important information, making computation more efficient.
Discriminator: In a GAN, the neural network that tries to distinguish between real data and fake data generated by the generator.
Dropout: A regularization technique that randomly deactivates neurons during training to prevent overfitting and improve generalization.
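A minimal NumPy sketch of how a dropout mask works during training, assuming a drop probability of 0.5; real frameworks apply this inside their layer implementations.

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    if not training:
        return activations  # dropout is disabled at inference time
    # Randomly zero out each unit with probability p_drop, and scale the
    # survivors so the expected activation stays the same
    mask = (np.random.rand(*activations.shape) > p_drop).astype(activations.dtype)
    return activations * mask / (1.0 - p_drop)

h = np.ones((2, 6))
print(dropout(h))                   # roughly half the entries zeroed, the rest scaled up
print(dropout(h, training=False))   # unchanged at inference
```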
E
Edge AI: AI processing performed on local devices rather than cloud servers to improve efficiency and privacy.
Embedding: A numerical representation of data (words, images, or other content) as dense vectors in a continuous space where similar items are positioned closer together.
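A small NumPy sketch of the "positioned closer together" idea using cosine similarity; the vectors are made up for illustration rather than produced by a real embedding model.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction, 0.0 unrelated, -1.0 opposite
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical 4-dimensional embeddings (real ones have hundreds of dimensions)
king  = np.array([0.9, 0.8, 0.1, 0.3])
queen = np.array([0.8, 0.9, 0.2, 0.3])
apple = np.array([0.1, 0.2, 0.9, 0.7])

print(cosine_similarity(king, queen))  # high: related concepts sit close together
print(cosine_similarity(king, apple))  # lower: unrelated concepts sit farther apart
```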
Emergent Behavior: Unexpected or unintended capabilities that arise as AI systems scale in complexity.
Encoder-Decoder: An architecture where an encoder compresses input into a representation and a decoder generates output from it, commonly used in translation and summarization.
Ensemble Learning: Combining multiple models to make predictions, often achieving better accuracy than any single model alone. Random forests are a common example.
Epoch: One complete pass through the entire training dataset during model training.
Evaluation Metrics: Quantitative measures used to assess model performance, including accuracy, precision, recall, F1 score, and AUC-ROC.
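A quick sketch of accuracy, precision, recall, and F1 computed from raw counts; the prediction arrays are illustrative.

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives

precision = tp / (tp + fp)   # of everything predicted positive, how much was right
recall = tp / (tp + fn)      # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = np.mean(y_true == y_pred)

print(precision, recall, f1, accuracy)  # 0.75 for each on this toy example
```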
Expert System: An AI that applies predefined rules to make decisions in a specific domain.
Explainable AI (XAI): AI models designed to provide clear, human-understandable explanations for their decisions.
F
Feature: An individual attribute or measurable property used as input in machine learning models.
Feature Engineering: The process of transforming raw data into useful features for machine learning.
Federated Learning: A decentralized learning technique where models are trained across multiple devices without sharing raw data.
Few-Shot Learning: The ability of AI models to learn new tasks from only a small number of examples, often by leveraging prior knowledge.
Fine-Tuning: Adapting a pre-trained model to a specific task or domain by training it further on a smaller, specialized dataset.
Forward Chaining: A reasoning approach that starts with known data and applies rules to derive conclusions.
Foundation Model: A large AI model trained on broad data that can be adapted to many different downstream tasks, such as GPT-4 or Claude.
G
GAN (Generative Adversarial Network): A type of AI model that generates realistic data by pitting two neural networks, a generator and a discriminator, against each other.
Generative AI: AI systems that create new content, such as text, images, and music.
Generator: In a GAN, the neural network that creates synthetic data attempting to fool the discriminator.
GPT (Generative Pre-trained Transformer): A family of AI language models capable of generating human-like text.
Gradient Descent: An optimization algorithm used to adjust model parameters to minimize error.
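A tiny worked example minimizing a one-parameter quadratic loss; the loss function and learning rate are chosen purely for illustration.

```python
# Minimize loss(w) = (w - 3)**2; its gradient is 2 * (w - 3)
w = 0.0
learning_rate = 0.1

for step in range(50):
    grad = 2 * (w - 3)            # slope of the loss at the current w
    w = w - learning_rate * grad  # step downhill, scaled by the learning rate

print(w)  # converges toward 3, the value that minimizes the loss
```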
Grounding: Connecting AI outputs to verifiable sources or real-world data to improve accuracy and reduce hallucinations.
Guardrails (AI Safeguards): Constraints designed to prevent AI systems from producing harmful or unintended outputs.
H
Hallucination (AI): When an AI system generates false or misleading information while appearing confident.
Heuristic: A rule-of-thumb approach used to solve problems quickly when an optimal solution is impractical.
Hidden Layer: Layers in a neural network between the input and output layers where intermediate computations occur and features are learned.
Human-in-the-Loop (HITL): A design approach where humans are integrated into AI workflows to provide oversight, corrections, or final decisions.
Hyperparameter: A configuration value, such as the learning rate or batch size, that is set before training rather than learned from data and is tuned to improve performance.
Hyperparameter Tuning: The process of systematically searching for optimal hyperparameter values to maximize model performance.
I
Image Recognition: The ability of AI to identify and classify objects in images.
In-Context Learning: The ability of large language models to perform new tasks based solely on examples provided in the prompt, without any weight updates.
Inference: The process of using a trained model to make predictions on new, unseen data.
Intent (NLP): The purpose behind a user’s input in a chatbot or voice assistant.
K
Knowledge Distillation: A technique for training a smaller, faster model to mimic the behavior of a larger, more capable model.
Knowledge Graph: A structured representation of facts and their relationships, used to enhance AI’s ability to reason and retrieve information.
L
Label (Data Label): The correct output assigned to training examples in supervised learning.
Large Language Model (LLM): A powerful AI model trained on vast text data to generate human-like responses.
Latent Space: The compressed, abstract representation of data learned by a model, where similar concepts cluster together.
Layer (Neural Network): A collection of neurons at the same depth in a network that process information together.
Learning Rate: A hyperparameter controlling how much model weights are adjusted during each training step. Too high causes instability; too low causes slow learning.
Linguistic Annotation: The process of marking text with grammatical or semantic information for AI training.
Loss Function: A mathematical function that measures how far a model’s predictions are from the correct answers, guiding the training process.
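Minimal NumPy versions of two common loss functions, mean squared error for regression and binary cross-entropy for classification; the example arrays are illustrative.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average squared difference between predictions and targets
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Penalizes confident wrong predictions heavily; eps avoids log(0)
    p_pred = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print(mean_squared_error(np.array([3.0, 5.0]), np.array([2.5, 5.5])))   # 0.25
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))     # small, predictions are good
```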
LSTM (Long Short-Term Memory): A type of recurrent neural network designed to learn long-term dependencies, addressing the vanishing gradient problem.
M
Machine Learning (ML): A subset of AI where algorithms learn patterns from data and make predictions or decisions.
Machine Translation: AI-driven translation of text or speech between languages.
Masked Language Model: A training approach where random words in text are hidden and the model learns to predict them, used by BERT and similar models.
MCP (Model Context Protocol): An open standard that enables AI models to securely connect with external data sources and tools.
Mixture of Experts (MoE): An architecture where multiple specialized sub-networks (experts) handle different aspects of a task, with a gating mechanism routing inputs to the most relevant experts.
Model: The result of training a machine learning algorithm, used to make predictions or classifications.
Model Collapse: A degradation in AI model quality that can occur when models are trained on AI-generated content rather than human-created data.
Multimodal: Refers to systems or models capable of processing, understanding, and generating information from multiple types of data modalities, such as text, images, audio, video, or other sensory inputs.
N
Named Entity Recognition (NER): An NLP task that identifies and classifies named entities in text, such as people, organizations, locations, and dates.
Natural Language Generation (NLG): AI technology that generates human-like text from structured data.
Natural Language Processing (NLP): The field of AI that enables machines to understand and process human language.
Neural Network: A computational model inspired by the human brain, used in deep learning.
Normalization: Techniques for scaling data to a standard range or distribution, improving training stability and convergence.
O
One-Shot Learning: The ability to learn a new concept or task from just a single example.
Optimizer: The algorithm that adjusts model weights during training to minimize the loss function. Common optimizers include SGD, Adam, and AdaGrad.
Overfitting: A machine learning problem where a model learns noise in the training data instead of general patterns.
P
Parameter: A value learned by an AI model during training that affects its predictions.
Pattern Recognition: AI’s ability to detect patterns in data for classification or prediction.
Perceptron: The simplest form of neural network, consisting of a single neuron that makes binary classifications.
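A minimal perceptron learning the logical AND function with the classic update rule; the learning rate and epoch count are illustrative.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
y = np.array([0, 0, 0, 1])                      # AND targets

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0  # step activation
        error = target - pred
        w += lr * error * xi                      # perceptron update rule
        b += lr * error

print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```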
Perplexity: A metric measuring how well a language model predicts text, with lower values indicating better performance.
Pre-training: The initial phase of training a model on a large, general dataset before fine-tuning for specific tasks.
Predictive Analytics: Using AI to forecast future outcomes based on past data.
Prompt (AI Prompt): An input given to AI models, like ChatGPT, to generate a response.
Prompt Engineering: The practice of crafting effective prompts to elicit desired behaviors and outputs from AI models.
Pruning (Model): Reducing model size by removing less important weights or neurons while maintaining performance.
Q
Quantization: Reducing the precision of model weights (e.g., from 32-bit to 8-bit) to decrease memory usage and speed up inference.
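A rough sketch of symmetric 8-bit quantization of a small weight array; real inference libraries use more sophisticated scaling schemes, so treat this as a simplified illustration.

```python
import numpy as np

weights = np.array([-0.62, 0.08, 0.31, -0.05, 0.47], dtype=np.float32)

# Symmetric int8 quantization: map the largest magnitude to 127
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to see how much precision was lost
recovered = q.astype(np.float32) * scale
print(q)                    # small integers stored in 1 byte each
print(recovered - weights)  # small rounding error, traded for 4x less memory
```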
Quantum Computing: A field of computing that uses quantum mechanics to process data in ways classical computers cannot.
R
RAG (Retrieval-Augmented Generation): A technique that enhances AI responses by first retrieving relevant information from external sources before generating output.
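A high-level sketch of the retrieve-augment-generate pattern. The names embed, vector_store, and llm are hypothetical stand-ins for whatever embedding model, vector database, and language model a real system would use.

```python
def answer_with_rag(question, vector_store, embed, llm, k=3):
    # 1. Retrieve: find the k stored documents most similar to the question
    query_vector = embed(question)
    documents = vector_store.search(query_vector, top_k=k)

    # 2. Augment: place the retrieved text into the prompt as grounding context
    context = "\n\n".join(doc.text for doc in documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate: the model answers conditioned on the retrieved context
    return llm(prompt)
```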
Reactive AI: AI that responds to inputs without memory or learning from past experiences.
Recurrent Neural Network (RNN): A neural network architecture designed for sequential data, where outputs from previous steps feed back into the network.
Red Teaming: The practice of deliberately testing AI systems for vulnerabilities, biases, and failure modes by attempting to elicit harmful or unintended behaviors.
Regression: A type of machine learning used for predicting continuous values, such as stock prices.
Regularization: Techniques that prevent overfitting by adding constraints to the learning process, such as L1/L2 penalties or dropout.
Reinforcement Learning (RL): A machine learning approach where an agent learns by interacting with an environment and receiving rewards.
RLHF (Reinforcement Learning from Human Feedback): A training method where models learn from human preferences and rankings rather than explicit labels, commonly used to align language models.
Robotics: The design and application of machines that can perform tasks autonomously, often using AI.
S
Sampling: In text generation, the process of selecting the next token based on probability distributions, with methods like temperature, top-k, and top-p controlling randomness.
Scaling Laws: Empirical relationships describing how model performance improves with increases in model size, training data, and compute.
Self-Attention: A mechanism allowing each element in a sequence to attend to all other elements, enabling models to capture long-range dependencies. Core to transformer architecture.
Self-Supervised Learning: A training paradigm where models learn from unlabeled data by creating their own supervisory signals, such as predicting masked words.
Semantic Search: Search that understands the meaning and intent behind queries rather than just matching keywords.
Sentiment Analysis: AI’s ability to determine the emotional tone of text, such as positive or negative sentiment.
Softmax: A function that converts a vector of numbers into a probability distribution, commonly used in classification output layers.
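A minimal NumPy implementation of softmax; the input scores are illustrative.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps the exponentials from overflowing
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))        # roughly [0.66, 0.24, 0.10]
print(softmax(scores).sum())  # sums to 1.0, a valid probability distribution
```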
Speech Recognition: AI’s ability to convert spoken words into text.
Supervised Learning: A type of machine learning where models learn from labeled training data.
Synthetic Data: Artificially generated data used to train AI models, often when real data is scarce, sensitive, or biased.
System Prompt: Instructions provided to an AI model that define its behavior, personality, or constraints for a given session.
T
Temperature: A parameter controlling the randomness of AI-generated text. Higher values produce more creative but less predictable outputs; lower values are more focused and deterministic.
Test Data: A dataset used to evaluate a trained model’s performance.
Text-to-Image: AI systems that generate images from text descriptions, such as DALL-E, Midjourney, and Stable Diffusion.
Token: A unit of text processed by AI, such as words or subwords.
Tokenization: The process of breaking text into tokens for processing by language models.
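A deliberately simplified illustration of tokenization; real tokenizers (BPE, WordPiece, and similar) learn subword vocabularies from data rather than splitting on whitespace.

```python
text = "Tokenization splits text into smaller units."

# Naive whitespace tokenization (much coarser than what LLMs actually use)
tokens = text.lower().replace(".", " .").split()
print(tokens)
# ['tokenization', 'splits', 'text', 'into', 'smaller', 'units', '.']

# Map each token to an integer id, the form models actually consume
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[tok] for tok in tokens]
print(ids)
```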
Top-k Sampling: A text generation method that limits choices to the k most probable next tokens.
Top-p (Nucleus) Sampling: A text generation method that considers the smallest set of tokens whose cumulative probability exceeds p.
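A combined NumPy sketch of the three sampling controls defined above (temperature, top-k, and top-p), applied to a made-up next-token distribution.

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

logits = np.array([3.0, 2.0, 1.0, 0.5, 0.1])  # hypothetical next-token scores

# Temperature: divide logits before softmax; values < 1 sharpen, > 1 flatten
probs = softmax(logits / 0.7)

# Top-k: keep only the k most probable tokens, then renormalize
k = 3
top_k_idx = np.argsort(probs)[::-1][:k]
top_k_probs = probs[top_k_idx] / probs[top_k_idx].sum()

# Top-p (nucleus): keep the smallest set whose cumulative probability exceeds p
p = 0.9
order = np.argsort(probs)[::-1]
cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
nucleus_idx = order[:cutoff]
nucleus_probs = probs[nucleus_idx] / probs[nucleus_idx].sum()

token = np.random.choice(nucleus_idx, p=nucleus_probs)  # sample the next token id
print(token)
```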
Training Data: The dataset used to teach a machine learning model.
Transfer Learning: Reusing a pre-trained model on a new, related task to speed up learning.
Transformer: A deep learning architecture that revolutionized NLP, used in models like GPT. Built on self-attention mechanisms that process all tokens in parallel.
Turing Test: A test to determine if an AI can mimic human intelligence well enough to fool a human evaluator.
U
Underfitting: When a model is too simple to capture patterns in the data, resulting in poor performance on both training and test sets.
Unstructured Data: Data that lacks a predefined format, such as images, videos, or raw text.
Unsupervised Learning: A type of machine learning where AI discovers patterns in data without labels.
V
Validation Data: A held-out dataset used during training to tune hyperparameters and monitor generalization before final evaluation on the test set.
Vanishing Gradient Problem: A difficulty in training deep networks where gradients become extremely small, preventing early layers from learning effectively.
Variance (ML): A measure of how much a model’s predictions change when it is trained on different samples of data; high variance is a sign of overfitting.
Vector Database: A database optimized for storing and searching high-dimensional vectors (embeddings), enabling fast similarity searches for AI applications.
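A brute-force NumPy sketch of the similarity search a vector database performs; real systems use approximate nearest-neighbor indexes (e.g. HNSW) to scale to millions of vectors. The stored vectors and ids here are made up for illustration.

```python
import numpy as np

# A tiny "index": rows are stored document embeddings
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.7, 0.3, 0.1],
])
doc_ids = ["refund policy", "shipping times", "return process"]

def search(query_vector, top_k=2):
    # Cosine similarity between the query and every stored vector
    sims = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    best = np.argsort(sims)[::-1][:top_k]
    return [(doc_ids[i], float(sims[i])) for i in best]

print(search(np.array([0.8, 0.2, 0.0])))  # most similar stored documents first
```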
Vision Transformer (ViT): A transformer architecture adapted for image processing, treating images as sequences of patches.
W
Weak AI (Narrow AI): AI that is specialized for a specific task rather than general intelligence.
Weight: A numerical parameter in a neural network that determines the strength of connections between neurons, adjusted during training.
Word Embedding: Dense vector representations of words where semantically similar words have similar vectors. Word2Vec and GloVe are classic examples.
Word2Vec: A pioneering technique for learning word embeddings from large text corpora using shallow neural networks.
Z
Zero-Shot Learning: AI’s ability to make predictions for categories it has never seen before by relying on related knowledge.



