
New AI models in May 2026: Kimi K2.6 and GLM 5.1 now EU-hosted

Published on May 1, 2026 · 7 min read · by Lurus Team

The past two weeks have seen more movement in frontier AI coding models than the three months before. OpenAI shipped GPT-5.5, Moonshot AI released Kimi K2.6 as an open-weight model, Z.ai followed with GLM 5.1, and DeepSeek uploaded V4 Pro — 1.6 trillion parameters — to Hugging Face. All four beat Claude Opus 4.6 on at least one major coding benchmark, some by several percentage points.

Starting today, every one of these models is available in Lurus Code. The headline: Kimi K2.6 and GLM 5.1 are hosted in Europe with Zero Data Retention guarantees. For anything involving GDPR, regulated workloads, or corporate IT security, this means open-source frontier models with documented data processing and hosting inside the EU.

What’s new

Six new models landing today:

| Model | Tier | Hosting | Highlight |
| --- | --- | --- | --- |
| GPT-5.5 | Powerful | OpenAI (EU region) | 82.7% Terminal-Bench 2.0 (SOTA) |
| Kimi K2.6 | Balanced | EU | 80.2% SWE-Bench Verified, 256K context |
| GLM 5.1 | Balanced | EU | 58.4% SWE-Bench Pro, 8h autonomy |
| DeepSeek V4 Pro | Powerful | Global (US, ZDR) | 1.6T parameters, 1M context |
| MiniMax M2.7 | Balanced | Global (US, ZDR) | Fast & cheap |
| Qwen 3.6 Plus | Balanced | Global (US, ZDR) | 1M context with vision |

In parallel, GPT-5.4 Pro and the older Kimi, GLM and DeepSeek variants have been retired from the default selection: they scored lower than their successors on every benchmark and cost more.

GPT-5.5: state of the art in agentic coding

GPT-5.5 launched April 23 and is OpenAI’s first fully retrained base model since GPT-4.5 — not an incremental update. The numbers that matter:

  • Terminal-Bench 2.0: 82.7% — new state of the art. For comparison: Claude Opus 4.7 at 69.4%, Gemini 3.1 Pro at 68.5%.
  • SWE-Bench Pro: 58.6% — tied with Kimi K2.6, behind Claude Opus 4.7 (64.3%).
  • MRCR v2 at 512K–1M tokens: 74.0% — a 37-point jump over GPT-5.4. At 128K–256K: 87.5% vs. Claude’s 59.2%.
  • 1M context in the API (400K in Codex).

In practice: long-running terminal tasks — DevOps pipelines, full-repo debugging, multi-hour refactors — now run dramatically more reliably. If you were using GPT-5.4 Pro, GPT-5.5 gives you better quality at a sixth of the price (€5 / €30 vs. €30 / €180 per 1M input/output tokens).
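To make the price difference concrete, here is a minimal sketch that applies the per-token prices quoted above to a single session. The workload size (2M input tokens, 200K output tokens) and the function name `session_cost_eur` are assumptions chosen for illustration, not measurements from Lurus Code.

```python
# Prices in EUR per 1M tokens as (input, output), taken from the paragraph above.
PRICES_EUR_PER_MTOK = {
    "GPT-5.5": (5.0, 30.0),
    "GPT-5.4 Pro": (30.0, 180.0),
}


def session_cost_eur(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in EUR for the given input and output token counts."""
    price_in, price_out = PRICES_EUR_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2M input tokens and 200K output tokens.
old = session_cost_eur("GPT-5.4 Pro", 2_000_000, 200_000)
new = session_cost_eur("GPT-5.5", 2_000_000, 200_000)
print(f"GPT-5.4 Pro: €{old:.2f}, GPT-5.5: €{new:.2f}, ratio: {old / new:.1f}x")
```

Because both the input and the output price drop by the same factor, the ratio stays at 6x regardless of how the workload splits between input and output tokens.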

In Lurus Code, GPT-5.5 replaces GPT-5.4 Pro as the default Powerful option in the OpenAI tier.

Kimi K2.6 — EU-hosted, open-weight, top-tier

Moonshot AI released Kimi K2.6 on April 20, shipping it directly to Hugging Face under an open-weight license. The benchmarks read like a frontier model:

  • SWE-Bench Verified: 80.2%, 0.4 points behind DeepSeek V4 Pro (80.6%) and close to Claude Opus 4.6.
  • SWE-Bench Pro: 58.6% — tied with GPT-5.5.
  • Terminal-Bench 2.0: 66.7%, just behind Gemini 3.1 Pro (68.5%) and Claude Opus 4.7 (69.4%).
  • DeepSearchQA F1: 92.5%.
  • 256K context, native tool calls, agent-swarm architecture supporting up to 300 parallel sub-agents.

This is the first serious Chinese frontier coding model that is simultaneously open-weight. But if you use Moonshot’s API directly, your prompts flow to servers in China — a non-starter for most European teams.

In Lurus Code, Kimi K2.6 is hosted in Europe with Zero Data Retention and no training. Your prompts and code are not retained; see our security overview for details. Pricing: €0.80 / €3.50 per 1M input/output tokens. On output, that is roughly a seventh of the €25 per 1M tokens quoted below for Claude Opus 4.6.

GLM 5.1 — 8 hours of autonomy, EU-hosted

Z.ai (formerly Zhipu AI) shipped GLM 5.1 a few days before Kimi K2.6. Both claimed the #1 spot on SWE-Bench Pro within two weeks of each other — Z.ai at 58.4%, narrowly behind Kimi.

What makes GLM 5.1 remarkable isn’t the raw benchmark number. It’s the long-horizon autonomy:

  • Terminal-Bench 2.0: 63.5%
  • AIME 2026: 95.3%, GPQA-Diamond: 86.2%
  • Up to 8 hours of autonomous execution on a single task — planning, executing, iterating, optimizing, shipping. No other open-source model has been evaluated at that duration.
  • Trained on 100,000 Huawei Ascend 910B chips — zero NVIDIA hardware in the training stack.

From Z.ai’s Lou on launch day: “Agents could do about 20 steps by the end of last year. GLM 5.1 can do 1,700 right now.” That jump from 20 to 1,700 steps in four months is probably the most interesting curve in the industry right now.

GLM 5.1 is also hosted in Europe in Lurus Code with Zero Data Retention and no training. Pricing: €1.40 / €4.40 per 1M tokens.

DeepSeek V4 Pro, MiniMax M2.7, Qwen 3.6 Plus: the new global providers

Not everything is EU-hostable. For three new models we offer a Global Provider (US) tier — with one hard condition: Zero Data Retention and no training, contractually enforced through our gateway provider.

DeepSeek V4 Pro (released April 24): 1.6 trillion parameters, 49B active per token, 1M context at only 27% of the inference FLOPs and 10% of the KV-cache memory of V3.2. That’s the real breakthrough — 1M context becomes a production default, not a premium add-on. SWE-Bench Verified: 80.6%, within 0.2 points of Claude Opus 4.6 — at 7× lower cost (€3.48 vs. €25 per 1M output tokens).

MiniMax M2.7: fast all-rounder for Balanced-tier tasks. €0.30 / €1.20 per 1M tokens — among the cheapest models on the menu.

Qwen 3.6 Plus (Alibaba, March 2026): native 1M context, vision support, strong performance on frontend code and web generation. The most interesting option for UI-heavy workloads.

All three are hosted in the US through our gateway provider, but run under a contractually enforced ZDR policy. That isn’t the same as EU hosting — but it’s materially stronger than what you get from the default terms of most US APIs, and it’s explicitly called out in the docs.

Which model for which case?

Quick heuristic from our internal testing:

  • GPT-5.5 for long terminal sessions and long-context refactors.
  • Claude Opus 4.7 remains unbeatable for pure code quality on short, focused tasks.
  • Kimi K2.6 for agent swarms and long tool-call chains — at a fraction of Opus’s cost.
  • GLM 5.1 for multi-hour autonomous tasks (CI fixes, migrations, large refactors).
  • DeepSeek V4 Pro for 1M-context analysis of entire codebases.
  • Qwen 3.6 Plus for frontend and vision tasks.
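
If you want to encode the heuristic above in your own tooling, a plain lookup table is enough. This is a minimal sketch: the task labels and the `pick_model` helper are hypothetical names invented for this example and are not part of Lurus Code. Only the model choices come from the list above.

```python
from typing import Optional

# Task labels are hypothetical; the recommended models mirror the list above.
RECOMMENDED_MODEL = {
    "long_terminal_session": "GPT-5.5",
    "long_context_refactor": "GPT-5.5",
    "short_focused_task": "Claude Opus 4.7",
    "agent_swarm": "Kimi K2.6",
    "long_tool_call_chain": "Kimi K2.6",
    "multi_hour_autonomous": "GLM 5.1",
    "full_codebase_analysis": "DeepSeek V4 Pro",
    "frontend_or_vision": "Qwen 3.6 Plus",
}


def pick_model(task_kind: str) -> Optional[str]:
    """Return the recommended model for a task label, or None if unknown."""
    return RECOMMENDED_MODEL.get(task_kind)


print(pick_model("multi_hour_autonomous"))
```

Returning `None` for unknown labels leaves the fallback decision to the caller instead of silently picking a default.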

The revamped Models page shows every model with its tier (Powerful / Balanced / Fast), provider group and exact pricing. The Global Provider section carries an explicit ZDR notice right next to the heading — so it’s immediately obvious which models qualify for regulated workloads.
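
The same tier and hosting information can be kept as data on your side, for example to restrict a team to EU-hosted models. The sketch below copies the rows from the table in this post; the `Model` class and the `by_hosting` helper are assumptions for illustration and do not describe a Lurus Code API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: str
    hosting: str


# Rows copied from the "What's new" table in this post.
CATALOG = [
    Model("GPT-5.5", "Powerful", "OpenAI (EU region)"),
    Model("Kimi K2.6", "Balanced", "EU"),
    Model("GLM 5.1", "Balanced", "EU"),
    Model("DeepSeek V4 Pro", "Powerful", "Global (US, ZDR)"),
    Model("MiniMax M2.7", "Balanced", "Global (US, ZDR)"),
    Model("Qwen 3.6 Plus", "Balanced", "Global (US, ZDR)"),
]


def by_hosting(models: list[Model]) -> dict[str, list[str]]:
    """Group model names by their hosting label, preserving catalog order."""
    groups: dict[str, list[str]] = {}
    for model in models:
        groups.setdefault(model.hosting, []).append(model.name)
    return groups


for hosting, names in by_hosting(CATALOG).items():
    print(f"{hosting}: {', '.join(names)}")
```

Grouping by the exact hosting label keeps "OpenAI (EU region)" separate from "EU", so whether that label satisfies your compliance requirements remains an explicit decision rather than a side effect of string matching.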

What this means for you

If you’re already on Lurus Code: the new models are live in the model picker (CLI and VS Code) as of today. Existing sessions keep running on whatever model you had selected.

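If you’ve been defaulting to Claude Opus because “it just works best”: try GPT-5.5 on terminal-style tasks and Kimi K2.6 on long agent chains. Kimi K2.6 gives you comparable quality at a fraction of Opus’s cost, with EU-hosted, GDPR-compliant processing on top. GPT-5.5 is priced close to Opus on output (€30 vs. the €25 per 1M tokens quoted above for Claude Opus 4.6), so the reason to switch there is its Terminal-Bench 2.0 lead (82.7% vs. 69.4% for Claude Opus 4.7) rather than cost.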

Find the updated model overview and all pricing on the Models page and in the pricing table in our docs.