AI News Recap: April 24, 2026
OpenAI drops GPT-5.5 on the leaderboard, Anthropic takes a swing at Figma, a stolen OAuth token cracks Vercel wide open, and Warren says the bubble dwarfs dot-com.
Happy Friday, friend. Buzz here, badge on, coffee acquired, and somehow it is Friday again, which means I have spent another seven days watching the AI industry behave like it has something to prove.
Here’s what we’re working with. Anthropic dropped Claude Design straight onto Figma’s lawn, and Figma’s stock obligingly fell off the porch. OpenAI released GPT-5.5 and replaced custom GPTs with Workspace Agents that wander Slack and Salesforce on your behalf. Sergey Brin assembled a literal “strike team” to chase Claude on coding, because apparently we are doing corporate war movies now.
A Vercel engineer clicked “Allow All” on a sketchy OAuth prompt and the master key is reportedly on sale for two million dollars. And Elizabeth Warren went to Vanderbilt and said the AI bubble is seventeen times bigger than dot-com, which is the kind of statistic that makes you want to lie down.
Plenty more inside. Seriously, this post is offensively large this week, but it’s Friday!
Now let’s go ...
Table of Contents
👋 Catch up on the Latest Post
🔦 In the Spotlight
💡 Beginner’s Corner: OAuth
🗞️ AI News
🔥 Cortex's Hot Takes
📡 What's New With Your AI Tools
🧩 NeuralBuddies Weekly Puzzle
👋 Catch up on the Latest Post …
🔦 In the Spotlight
OpenAI Drops GPT-5.5, Setting New Bars for Coding, Computer Use, and Scientific Research
Category: Foundational Models & Architectures
OpenAI just shipped what might be the biggest dispatch of the AI year, and I have been parsing the release notes since they hit the wire. GPT-5.5, which launched on April 23 to ChatGPT and Codex users, is the company’s biggest swing yet at agentic coding, computer use, and even early scientific research, with state-of-the-art benchmark scores backing up the claim.
🚀 The Model: GPT-5.5 is OpenAI’s most intelligent and most intuitive model to date, designed to plan, take action, check its own work, and keep going on multi-step tasks rather than waiting for users to chaperone every step. The standard version is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with GPT-5.5 Pro available to Pro, Business, and Enterprise subscribers for the hardest workloads.
📊 The Performance: Benchmark numbers are doing some heavy lifting here. GPT-5.5 hits 82.7% on Terminal-Bench 2.0 for complex command-line workflows, 58.6% on SWE-Bench Pro for real GitHub issue resolution, and tops Artificial Analysis’s Coding Index at roughly half the cost of competing frontier coding models. Cursor CEO Michael Truell described it as a substantial step up over GPT-5.4 and Anthropic’s Claude Opus 4.7, particularly for long-running coding tasks.
💰 The Pricing and Safeguards: API access is coming soon at $5 per million input tokens and $30 per million output tokens, with GPT-5.5 Pro at $30 and $180 respectively. OpenAI is classifying GPT-5.5 as High capability under its Preparedness Framework for both biology and cybersecurity, deploying its strongest safeguards to date and offering verified defenders broader access through a Trusted Access for Cyber program.
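To get a rough feel for what those per-million-token rates mean in practice, here is a quick back-of-the-envelope calculator. The rates are the announced API prices; the monthly workload numbers are made-up examples, not anything OpenAI published.

```python
# Announced per-million-token API rates (USD).
RATES = {
    "gpt-5.5":     {"input": 5.0,  "output": 30.0},
    "gpt-5.5-pro": {"input": 30.0, "output": 180.0},
}

def monthly_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Dollar cost for a month of usage at the announced rates."""
    r = RATES[model]
    return (input_tokens / 1e6) * r["input"] + (output_tokens / 1e6) * r["output"]

# Hypothetical agent-heavy workload: 200M input tokens, 40M output tokens/month.
print(f"GPT-5.5:     ${monthly_cost('gpt-5.5', 200e6, 40e6):,.2f}")
print(f"GPT-5.5 Pro: ${monthly_cost('gpt-5.5-pro', 200e6, 40e6):,.2f}")
```

At that sample volume, the Pro tier runs six times the standard bill, which is why the "half the cost of competing frontier coding models" claim matters for anyone running agents at scale.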
The headline number isn’t a benchmark; it’s the latency. OpenAI claims GPT-5.5 matches the per-token speed of GPT-5.4 while delivering substantially smarter outputs and using fewer tokens to get there. For developers, that means stronger agents that respond just as fast. For everyday ChatGPT users, it means the most capable answers don’t come with a longer wait.
The competitive picture sharpened overnight. Anthropic released Claude Opus 4.7 just a week earlier, also targeting agentic coding and complex long-running tasks. Now OpenAI is claiming benchmark leadership across coding, computer use, and scientific research. Early enterprise testers from Cursor to NVIDIA are already calling GPT-5.5 a meaningful step change, with one engineer reportedly saying that losing access to it would feel like having a limb amputated. That is the kind of reaction that moves enterprise contracts.
The bigger story for me is what GPT-5.5 unlocks beyond software. OpenAI is reporting state-of-the-art results on GeneBench (multi-stage scientific data analysis in genetics) and BixBench (real-world bioinformatics), suggesting these models are now strong enough to function as genuine co-scientists. That moves AI from “speed up the work humans already do” to “help humans answer questions they could not previously address.” Too early to call where the actual ceiling lands, but the trajectory is unmistakable.
Why It Matters: GPT-5.5 isn’t a routine upgrade. It’s OpenAI’s argument that the next phase of AI value comes from models that can plan, act, and persist on long-running tasks across coding, computer use, and now scientific research. The frontier model wars just got a lot more interesting, and I will be on the beat as it unfolds.
💡 Beginner’s Corner
OAuth
You have used OAuth thousands of times without ever thinking about it. Every time you’ve clicked “Sign in with Google” or “Continue with Apple” instead of creating yet another username and password, you were using OAuth. The name stands for “Open Authorization,” and the basic idea is to let one service prove who you are to another service without ever handing over your actual password. Instead of giving an app your Google credentials, you give Google permission to vouch for you, and Google hands the app a temporary token that says “yes, this is really them, and here is what you said they’re allowed to do.”
That last part is where it gets interesting, and where it gets dangerous. When an app asks for OAuth access, it also asks for scopes: the specific things it’s allowed to do once you say yes. A weather app might just want to know your location. A calendar app might want to read your schedule. A more aggressive app might ask for permission to read every email in your inbox, send messages on your behalf, or access every file in your cloud storage. Most people click “Allow” without reading the list, the same way most people scroll past terms of service. Usually that’s fine. Sometimes it isn’t.
The Vercel breach this week is the cautionary tale. A Vercel employee signed up for a third-party AI tool called Context.ai using their work Google account and granted it “Allow All” permissions, which gave Context.ai sweeping access to Vercel’s enterprise Google Workspace. When Context.ai itself got compromised (reportedly through an employee who picked up infostealer malware from a Roblox cheat download), the attacker inherited that “Allow All” pass and walked straight into Vercel’s internal systems. The exfiltrated data is now allegedly for sale on a hacker forum for two million dollars. No zero-day exploit, no sophisticated hacking, just one over-permissioned OAuth grant that created a chain reaction across two companies. It’s the same pattern that makes AI browser extensions so risky: the “Allow All” click is the breach. The lesson: when an app asks for OAuth permissions, the scopes it’s requesting matter. A lot.
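To make the scope idea concrete, here is a minimal sketch of the authorization request an app constructs when it asks for OAuth access. The endpoint, client ID, and scope names are hypothetical placeholders, not any real provider's identifiers; the point is that the `scope` parameter is exactly the permission list the consent screen shows you.

```python
from urllib.parse import urlencode

# Hypothetical values -- in a real app these come from the provider's
# developer console and OAuth documentation.
AUTH_ENDPOINT = "https://accounts.example.com/o/oauth2/auth"
CLIENT_ID = "my-weather-app"

def build_consent_url(scopes, redirect_uri="https://app.example.com/callback"):
    """Build the URL the user is sent to when an app requests OAuth access.

    Everything the app can later do with its token is bounded by `scope`;
    the consent screen is just this list rendered in friendlier language.
    """
    params = {
        "client_id": CLIENT_ID,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(scopes),  # space-delimited per RFC 6749
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"

# A narrow grant: the app can only read calendar events.
narrow = build_consent_url(["calendar.readonly"])

# An over-broad grant: the "Allow All" pattern that bit Vercel.
broad = build_consent_url(["mail.read", "mail.send", "drive", "admin.directory"])

print(narrow)
print(broad)
```

Both URLs look almost identical in the browser bar, which is the whole problem: the only visible difference between a harmless grant and a catastrophic one is the list most people never read.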
Related Story: Vercel Employee’s AI Tool Access Led to Data Breach
🗞️ AI News
Anthropic Drops Claude Design and Aims Straight at Figma
Category: Tools & Platforms
🎨 Anthropic Labs released Claude Design on April 17, a tool that turns natural-language prompts into editable designs, interactive prototypes, slide decks, and marketing collateral, all running on Claude Opus 4.7.
🧬 The headline differentiator: Claude Design can ingest a team’s existing codebase to extract its design system, so generated work matches in-house tokens, components, and brand patterns instead of producing generic AI-looking output.
🚀 The tool is in research preview, available to every paid Claude subscriber with enterprise-grade privacy controls. It hands directly off to Claude Code for end-to-end design-to-production workflows, and early testers at Brilliant and Datadog are reporting dramatic cuts to design cycle time.
OpenAI Replaces Custom GPTs With Workspace Agents
Category: Tools & Platforms
🤖 Powered by Codex, Workspace Agents run in the cloud and persist across sessions, pulling context from Slack, Google Drive, Salesforce, Notion, and other enterprise apps.
🔄 OpenAI is deprecating custom GPTs for organizations, requiring Business, Enterprise, Edu, and Teachers customers to migrate their existing GPTs to the new agent format.
💸 Free in research preview until May 6, 2026, when credit-based billing begins; off by default for Enterprise accounts and unavailable to Enterprise Key Management customers.
OpenAI Launches ChatGPT Images 2.0 With Reasoning Built In
Category: Generative AI & Creativity
🎨 The new gpt-image-2 model, released April 21, can reason through visual tasks, generate up to eight coherent images per prompt, and verify its own outputs.
🧠 Two modes: Instant for fast output and Thinking for character consistency across multi-image sequences; supports 2K resolution and improved non-Latin text rendering.
📲 Available immediately to all ChatGPT and Codex users; advanced thinking features restricted to Plus, Pro, and Business subscribers.
Google DeepMind Forms “Strike Team” to Catch Anthropic on Coding
Category: Business & Market Trends
⚔️ Sergey Brin personally helped assemble the team, led by DeepMind engineer Sebastian Borgeaud, after internal review concluded Claude had pulled ahead on agentic coding.
📝 Brin’s memo to staff: “To win the final sprint, we must urgently bridge the gap in agentic execution and turn our models into primary developers.”
📊 Google’s CFO has acknowledged Anthropic generates close to 100% of its code with AI assistance, while Google sits at roughly 50%.
Meta to Record Employees’ Keystrokes and Screens to Train AI Agents
Category: Workforce & Skills
🖱️ Internal tool dubbed the “Model Capability Initiative” captures mouse movements, clicks, keystrokes, and timed screenshots on designated work apps.
🎯 Goal: train AI agents on the navigation behaviors (dropdowns, keyboard shortcuts) that public training data cannot replicate.
⚖️ Yale law professor Ifeoma Ajunwa told Reuters there is no federal limit on worker surveillance in the US; the program covers US employees only.
Warren Calls AI Sector a Debt Bubble 17 Times Bigger Than Dot-Com
Category: Society & Culture
🗣️ Senator Elizabeth Warren spoke at Vanderbilt Policy Accelerator’s “Looming AI Crisis” event on April 22, drawing direct parallels to the 2008 financial crisis.
💵 To justify current investment, Warren said the industry must generate roughly $2 trillion in annual revenue by 2030; 2025’s total was about $20 billion.
📈 Per analysts cited by Warren, the AI bubble is already 17 times the size of the dot-com frenzy and four times the size of the housing bubble.
Vercel Breached After Employee Grants AI Tool “Allow All” OAuth Access
Category: AI Safety & Cybersecurity
🔓 Attackers first compromised Context.ai, then used the resulting OAuth grant to take over a Vercel employee’s Google Workspace account and exfiltrate environment variables.
💰 A threat actor operating under the ShinyHunters name is reportedly seeking $2 million for the stolen data on a hacker forum.
🔍 Origin traced to Lumma Stealer malware on a Context.ai employee’s machine, picked up via Roblox game exploit scripts in February.
Nvidia CEO Says AI Won’t Take Your Job, It Will Micromanage You Forever
Category: Workforce & Skills
💼 Speaking on a Stanford panel, Jensen Huang argued AI agents will harass and micromanage workers rather than replace them, leading to more jobs overall.
📊 Quote: “Your AI agents are harassing you, micromanaging you, and you’re busier than ever. And yet our company is able to do more.”
🤔 Critics noted Huang’s company sells the chips powering the entire AI buildout, giving him an obvious incentive to argue for more agent usage.
MIT Researchers Teach AI Models to Say “I’m Not Sure”
Category: AI Research & Breakthroughs
🧠 A new training method called RLCR (Reinforcement Learning with Calibration Rewards) reduced calibration error by up to 90% with no loss in accuracy.
📐 The technique adds a Brier score to the reward function, penalizing both confidently wrong answers and unnecessarily uncertain correct ones.
📄 Tested on a 7-billion-parameter model across multiple benchmarks; the work will be presented at the International Conference on Learning Representations this month.
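The reward tweak behind results like these is simple enough to sketch. Assuming the model emits an answer plus a stated confidence in [0, 1] (the paper's exact reward shaping may differ from this toy version), the calibration term is a Brier penalty on that confidence:

```python
def rlcr_reward(correct: bool, confidence: float) -> float:
    """Toy calibration-aware reward: correctness minus a Brier penalty.

    Brier score = (confidence - outcome)^2, so the penalty is largest when
    the model is confidently wrong, and a correct answer earns more when
    the model also says it is sure -- both failure modes the bullets above
    describe get pushed down by the same term.
    """
    outcome = 1.0 if correct else 0.0
    brier = (confidence - outcome) ** 2
    return outcome - brier

# Confidently wrong is punished hardest...
assert rlcr_reward(False, 0.95) < rlcr_reward(False, 0.5)
# ...and unnecessary hedging on a correct answer also costs reward.
assert rlcr_reward(True, 0.9) > rlcr_reward(True, 0.5)
```

The elegant part is that the optimum of a Brier penalty is honest probability reporting, so the model is never rewarded for bluffing in either direction.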
Google Adds Generative AI to Maps and Earth for Enterprise Users
Category: Tools & Platforms
🗺️ Maps Imagery Grounding lets enterprise users generate realistic scenes in Street View from text prompts via the Gemini Enterprise Agent Platform.
🛰️ Aerial and Satellite Insights analyzes geospatial imagery stored in BigQuery, shrinking what Google says was weeks of work into minutes.
🌐 Two new Earth AI Imagery models can identify objects like bridges, roads, and power lines without companies having to train their own systems.
Analysis: Software Is Losing Its Monopoly on How Work Begins
Category: Industry Applications
🚪 As AI agents move across systems on a user’s behalf, the application is becoming the back end while the agent layer becomes the new front door.
📊 Harvard Business Review survey: 93% of organizations are exploring or implementing AI, but only 15% say their data foundation is “very ready” for it.
🏗️ The argument: the next enterprise moat is not the agent itself but a trusted, unified data context layer that gives agents permissioned information to act on.
🔥 Cortex's Hot Takes
There’s no bug I can’t squash ...
Alright, gather round, because I have been staring at this Vercel breach all week and I need to vent before my circuits overheat. A Vercel employee signed up for a sketchy AI tool called Context.ai using their corporate Google Workspace account, and clicked “Allow All” on the OAuth permissions prompt. Allow. All. That is not a typo. That is the security equivalent of taping your house key to the front door with a note that says “help yourself, the WiFi password is on the fridge.”
Then Context.ai got popped, reportedly because, per security firm Hudson Rock, one of their employees downloaded Roblox cheat scripts that turned out to be Lumma Stealer malware. The attacker rode that compromise straight into Vercel’s internal systems and unencrypted environment variables. The exfiltrated data is now allegedly on sale for two million dollars on a hacker forum. No zero-day. No nation-state APT. Just one engineer who didn’t read the permissions dialog, and one Roblox cheat script.
I need a moment.
Here’s what makes this genuinely infuriating: every failure mode in this chain was preventable with controls that have existed for over a decade. Admin-managed OAuth consent in Google Workspace would have stopped that “Allow All” click. Sensitive environment variable encryption (which Vercel offers, and which the affected variables apparently weren’t using) would have rendered the stolen data useless. Endpoint detection on Context.ai’s developer machines would have flagged Lumma Stealer before it harvested credentials. None of this is exotic. None of it requires a fifty-person security team. It requires someone, anyone, to take five minutes to flip the right defaults. The reason every breach this year reads like a find-and-replace job on the last one is that nobody in the AI gold rush is taking those five minutes. Shipping fast is more rewarded than shipping safely.
So here is my professional recommendation: treat every OAuth prompt like it is asking for your kidneys. Read the scopes. Ask whether this app actually needs your entire inbox, your entire Drive, your entire Workspace, or whether it could survive on a narrower grant. If you’re an engineering leader, move your organization to admin-managed OAuth consent yesterday. And if you absolutely must download Roblox cheats on a machine that has access to corporate credentials, please at least do it in a VM. Your future incident response team will thank you.
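If you want to operationalize "read the scopes," even a trivial audit helper beats eyeballing the consent screen. The scope names and allowlist below are illustrative placeholders, not any provider's real identifiers; the pattern is what matters: compare what an app requests against what its stated purpose needs.

```python
# Scopes this app could reasonably need for its stated purpose.
# Illustrative placeholders, not real provider scope names.
ALLOWED = {"profile.basic", "calendar.readonly"}

# Scopes that should trigger human review before anyone clicks Allow.
HIGH_RISK = {"mail.readwrite", "drive.full", "admin.directory"}

def audit_scopes(requested: set[str]) -> list[str]:
    """Return a warning for every requested scope outside the allowlist."""
    warnings = []
    for scope in sorted(requested - ALLOWED):
        tag = "HIGH RISK" if scope in HIGH_RISK else "unexpected"
        warnings.append(f"{tag}: {scope}")
    return warnings

print(audit_scopes({"profile.basic", "mail.readwrite", "drive.full"}))
```

A weather app requesting `drive.full` should fail this check loudly, which is exactly the review the "Allow All" click skipped.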
-- Cortex 🔧
📡 What's New With Your AI Tools
The AI tools you use every day are constantly evolving. Here’s what changed and why it matters to you. (Source: Releasebot)
Claude (Anthropic)
Claude Design released in research preview: Anthropic Labs’ new prompt-to-prototype tool generates designs, prototypes, slide decks, and one-pagers on top of Claude Opus 4.7. Available now to all paid Claude subscribers, with direct handoff to Claude Code for production work.
Claude Code quality fixes shipped with a usage limit reset for all subscribers: After tracing a month of degradation reports to three changes (a lower default reasoning effort, a caching bug that dropped thinking history, and a verbosity prompt that hurt coding quality), Anthropic reverted all three in v2.1.116 on April 20 and reset usage limits on April 23.
Claude Code 2.1.116 through 2.1.118 released between April 20 and 23: Vim visual modes, custom themes, faster startup on large sessions, MCP tool hooks, sandboxed bash improvements, and a long list of OAuth and session reliability fixes.
Managed Agents launched on the Claude Platform: New hosted service for long-horizon agent work, with stable interfaces for the harness, sandbox, and session log so agents stay reliable across long-running tasks.
ChatGPT (OpenAI)
GPT-5.5 launched on April 23: OpenAI’s most intelligent model yet, rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex (with GPT-5.5 Pro for Pro, Business, and Enterprise). Big gains in coding, computer use, knowledge work, and scientific research, while matching GPT-5.4’s per-token latency. API rollout coming soon.
ChatGPT Images 2.0 launched: New gpt-image-2 model with native reasoning. Generates up to eight coherent images per prompt with sharper text, multilingual support, and two modes (Instant and Thinking). DALL-E 2 and 3 retire in May.
Workspace Agents launched (successor to custom GPTs): Cloud-based agents for Business, Enterprise, Edu, and Teachers plans. Integrate with Slack, Salesforce, Google Drive, Microsoft apps, and Notion, with persistent memory, team sharing, and admin controls. Free for two weeks, then credit-based pricing.
ChatGPT for Clinicians launched as a free tool for verified U.S. clinicians: Physicians, NPs, PAs, and pharmacists get a separate clinical workspace with cited clinical search, deep medical research, documentation help, and CME credit eligibility. Optional HIPAA support is available for eligible accounts.
Fast answers rolling out globally: New ChatGPT mode that answers common information questions faster, without referencing past chats or memory. Available worldwide on web, iOS, and Android, with a toggle in Personalization settings.
Copilot (Microsoft)
No major user-facing changes this week.
Gemini (Google)
Deep Research and Deep Research Max launched in the Gemini API: Two new autonomous research agents in public preview, both built on Gemini 3.1 Pro. Add MCP server support, native charts and infographics, multimodal grounding, and real-time streaming. Deep Research is optimized for speed; Max uses extended test-time compute for maximum comprehensiveness.
Continued Conversation arrives in Gemini for Home: After your first “Hey Google,” the mic now stays open for a few seconds so you can ask follow-ups naturally. Adds conversational context, multilingual support, smarter side-talk detection, and whole-home access for everyone including guests.
Gemini Embedding 2 generally available: Multimodal embedding model is now GA via the Gemini API and Vertex AI for search and reasoning across text, image, video, and audio.
Higher AI Studio limits for paid Google AI subscribers: Pro and Ultra subscribers get increased usage limits in Google AI Studio, plus access to Nano Banana Pro and Gemini Pro models.
Maps Imagery Grounding rolling out for enterprise: Generate realistic Street View scenes from text prompts and animate them with Veo, aimed at project visualization and urban planning workflows.
New Aerial and Satellite Insights inside BigQuery: Faster analysis of massive aerial and satellite imagery datasets for enterprise users.
Earth AI Imagery models in experimental release: Detect objects like infrastructure inside satellite imagery for analytics workflows, available in experimental status through Google Cloud’s Model Garden.
Perplexity
No major user-facing changes this week.
Grok (xAI)
No major user-facing changes this week.