Since my last post (https://beththetester.com/2026/01/26/agent-skills-how-to-create-and-test-them/) in January, a lot has been going on in the world of agents, skills, MCP and more. It’s increasingly hard to keep up with the acronyms, never mind the actual tech!
To get a sense of the speed of change in both Playwright and Vibium, I thought I’d revisit the code repo I updated for that post back in January and see what the releases since then have brought to the table.
The repo I have used in this blog is public and can be accessed here: https://github.com/askherconsulting/AI_SDLC_MCP
I intend to repeat this experiment in the future because, as we know, nothing in tech stands still!
Versions originally used:
Playwright – 1.40.0
Vibium – 0.1.2 (which has since moved to a calendar-based release naming convention)
Latest Releases (as of June 2026)
Playwright – 1.61.0 (21 releases)
Vibium – v26.5.31 (a whopping 33 releases)
Here’s a rundown of the key changes in that time (courtesy of AI, of course, but reviewed by me):
🧭 Playwright: The one‑sentence summary
Playwright evolved from a browser automation tool into a full-stack, agent‑friendly, highly observable test platform.
🎭 Part 1 — Playwright (v1.40.0 → v1.61.0)
Major themes across these 21 releases
Playwright’s evolution in this period can be summarised into five big arcs:
1. Massive investment in the Test Runner
This was the biggest area of change. Key shifts:
- More stable parallelisation
- Better sharding + retries
- New annotations, fixtures, and hooks
- Better reporting (HTML, trace viewer, attachments)
- More deterministic test isolation
New things you can do now:
- Build fully parallelised, hermetic test suites with fewer race conditions
- Use built‑in reporters that previously required plugins
- Run tests with more granular control over retries, timeouts, and fixtures
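To make the retry, timeout and reporter controls above concrete, here is a minimal sketch of what the body of a playwright.config.ts could look like. The option names (fullyParallel, workers, retries, timeout, expect, reporter, use.trace) are standard Playwright Test config keys; the values, the ./tests directory and the buildConfig helper are my own illustrative assumptions, not settings taken from the repo.

```typescript
// Hypothetical helper: builds a Playwright Test config as a plain object,
// so this sketch needs no imports. Values are illustrative assumptions.
function buildConfig(isCI: boolean) {
  return {
    testDir: "./tests",            // assumed test folder
    fullyParallel: true,           // run tests within a file in parallel too
    workers: isCI ? 2 : undefined, // undefined lets Playwright pick a default
    retries: isCI ? 2 : 0,         // retry only on CI
    timeout: 30000,                // per-test timeout in ms
    expect: { timeout: 5000 },     // per-assertion timeout in ms
    reporter: [["html", { open: "never" }], ["list"]],
    use: {
      baseURL: "http://localhost:3000",
      trace: "on-first-retry",     // record a trace when a test is retried
    },
  };
}

const ciConfig = buildConfig(true);
const localConfig = buildConfig(false);
```

In a real config file you would export the result as the default export. Sharding is chosen on the command line rather than in the config, for example npx playwright test --shard=1/3.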
2. Tracing, debugging, and HAR improvements
Playwright doubled down on observability:
- Trace viewer became richer and faster
- HAR replay became more reliable
- Better snapshots, network logs, and step‑level metadata
New things you can do now:
- Reproduce failures with near‑perfect fidelity
- Use trace files as debugging artefacts in CI
- Replay network conditions without hitting real servers
3. Browser engine upgrades (Chromium, WebKit, Firefox)
Every release included engine bumps:
- Better WebKit support on Linux
- More stable Firefox automation
- Chromium upgrades enabling new APIs (WebAuthn, WebGPU, etc.)
New things you can do now:
- Test features that didn’t exist in 1.40 (e.g., newer WebAuthn flows)
- Run more stable cross‑browser tests
- Use modern web APIs without polyfills
4. CLI + tooling improvements
The CLI became more powerful:
- playwright codegen improvements
- Better scaffolding
- More consistent install behaviour
- Cleaner error messages
New things you can do now:
- Generate tests with more accurate selectors
- Bootstrap projects faster
- Debug failures with clearer diagnostics
5. Playwright MCP server + agent‑friendly APIs
This is the part that matters most for this blog.
Playwright introduced:
- The Playwright MCP server
- More stable BiDi support
- Better isolation for agent‑driven workflows
New things you can do now:
- Drive Playwright from an AI agent using MCP
- Build agentic test runners
- Use BiDi for more deterministic browser control
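As a rough illustration of how an agent gets wired up to the Playwright MCP server, the sketch below builds the kind of mcpServers entry that many MCP clients read from their settings file. The npx @playwright/mcp@latest command is the published way to start the server; the "playwright" key is just a label, and the exact settings file name and location depend on your client, so treat the surrounding shape as an assumption.

```typescript
// Shape of one entry in an MCP client's "mcpServers" settings block.
interface McpServerEntry {
  command: string;
  args: string[];
}

// The key "playwright" is an arbitrary label chosen for this sketch.
const mcpConfig: { mcpServers: Record<string, McpServerEntry> } = {
  mcpServers: {
    playwright: { command: "npx", args: ["@playwright/mcp@latest"] },
  },
};

// Serialise to the JSON you would paste into the client's MCP settings file.
const mcpConfigJson = JSON.stringify(mcpConfig, null, 2);
console.log(mcpConfigJson);
```

Once the client picks this up, the agent sees the browser tools the server exposes and can drive a page without any hand-written test code.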
🎭 Part 2 — Vibium (v0.1.2 → v26.5.31)
Vibium’s evolution is even more dramatic because it went from “early prototype” to “serious browser automation engine” in a very short time.
Major themes across these 33 releases
1. Stabilisation + packaging fixes (0.1.x → early 26.x)
The early releases were mostly:
- Packaging fixes
- Dependency corrections
- Basic CLI improvements
- Early browser automation primitives
New things you can do now:
- Install Vibium reliably
- Use it across more environments
- Run basic automation without crashes
2. The big architectural shift (early 26.x)
This was the turning point:
- Move to a more stable internal architecture
- Better session management
- More deterministic browser control
- Early BiDi alignment
New things you can do now:
- Run longer sessions without memory leaks
- Use more consistent automation primitives
- Integrate Vibium into CI without flakiness
3. BiDi-first automation (26.3.x series)
This is where Vibium became interesting:
- Stronger BiDi support
- Better event handling
- More consistent navigation + DOM APIs
- Improved error messages
New things you can do now:
- Use modern browser automation without CDP
- Build more deterministic agent workflows
- Capture events with higher fidelity
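For anyone wondering what “BiDi instead of CDP” looks like on the wire, here is a small sketch of two WebDriver BiDi commands as JSON messages. The method names and parameters (session.subscribe, browsingContext.navigate, log.entryAdded) come from the W3C WebDriver BiDi protocol; the bidiCommand helper and the placeholder context id are my own assumptions, and this is not Vibium’s client API, which sits on top of messages like these.

```typescript
// A WebDriver BiDi command is a JSON object with an id, a method and params.
interface BidiCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

let nextId = 0;

// Hypothetical helper: assigns an incrementing id to each command.
function bidiCommand(method: string, params: Record<string, unknown>): BidiCommand {
  nextId += 1;
  return { id: nextId, method, params };
}

// Ask the browser to push console/log entries to us as events.
const subscribe = bidiCommand("session.subscribe", { events: ["log.entryAdded"] });

// Navigate a browsing context and wait for the load to complete.
// The context id is a placeholder; a real one comes from the browser.
const navigate = bidiCommand("browsingContext.navigate", {
  context: "CONTEXT_ID_PLACEHOLDER",
  url: "http://localhost:3000",
  wait: "complete",
});

// These strings are what would be sent over the BiDi WebSocket connection.
const wireMessages = [subscribe, navigate].map((c) => JSON.stringify(c));
```

Because events such as log.entryAdded are pushed by the browser rather than polled, an agent can react to what the page is doing as it happens.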
4. Cross‑client stability + MCP alignment (26.3.x → 26.5.x)
This is the era where Vibium became agent‑ready:
- Better compatibility with MCP servers
- More predictable command execution
- Improved error handling
- More robust ESM resolution
New things you can do now:
- Use Vibium as a backend for agentic testing
- Run Vibium from MCP‑driven workflows
- Build multi‑agent browser automation flows
5. Quality-of-life improvements (26.5.x)
The latest releases focus on:
- Fixing edge cases
- Improving developer experience
- Reducing crashes
- Making the CLI more predictable
New things you can do now:
- Use Vibium in production‑like workflows
- Build stable test harnesses
- Integrate Vibium with Playwright‑style patterns
🧭 Vibium: The one‑sentence summary
Vibium evolved from a fragile prototype into a BiDi‑native, agent‑friendly browser automation engine designed for modern AI‑driven workflows.
🧩 Combined insight
Across both tools, the big meta‑trend is:
Browser automation is shifting from “test runner + scripts” to “agent‑driven, BiDi‑native, observable automation platforms.”
Updating my code base with a new change: Playwright’s local and session storage
- Context: Playwright introduced the WebStorage API (page.localStorage / page.sessionStorage) in recent releases (post v1.57). This example shows how to set, read and inspect localStorage/sessionStorage from a Playwright test and is runnable in this repo.
- Files: Test: webstorage.spec.ts
- Navigate: opens http://localhost:3000.
- Write: stores a key/value in localStorage and sessionStorage.
- Read: reads them back to assert correctness.
- Inspect: demonstrates robust checks for the return value of items() (handles arrays, objects, or string shapes).
- Why useful: directly manipulating storage from tests is handy for seeding auth tokens, feature flags, or session state without relying on UI flows.
- Key operations used:
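The “Inspect” step above is the fiddly one, so here is a minimal sketch of the kind of defensive check it describes. It is not the code from webstorage.spec.ts: normaliseStorageItems is a hypothetical helper, and the three input shapes it accepts (an array of pairs or name/value objects, a plain object, or a JSON string) are assumptions about what an items() call might hand back.

```typescript
type StorageEntries = Record<string, string>;

// Hypothetical helper: flattens several possible items() shapes into one map.
function normaliseStorageItems(raw: unknown): StorageEntries {
  // String shape: assume it is JSON and normalise whatever it parses to.
  if (typeof raw === "string") {
    try {
      return normaliseStorageItems(JSON.parse(raw));
    } catch {
      return {}; // not JSON, so nothing usable
    }
  }
  const out: StorageEntries = {};
  // Array shape: [key, value] pairs or { name | key, value } objects.
  if (Array.isArray(raw)) {
    for (const item of raw) {
      if (Array.isArray(item) && item.length >= 2) {
        out[String(item[0])] = String(item[1]);
      } else if (item !== null && typeof item === "object") {
        const rec = item as Record<string, unknown>;
        const key = rec.name ?? rec.key;
        if (key !== undefined) out[String(key)] = String(rec.value);
      }
    }
    return out;
  }
  // Object shape: treat each property as a storage key.
  if (raw !== null && typeof raw === "object") {
    for (const [k, v] of Object.entries(raw as Record<string, unknown>)) {
      out[k] = String(v);
    }
  }
  return out;
}
```

In a test you would pass the raw return value through this helper first, then assert on the resulting map, so the assertion does not break if the shape differs between versions.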




























