Our mobile dev process wasn’t broken; we just started automating the boring parts and couldn’t stop. What began as scripting repetitive tasks turned into a full rethink of how we build, test, and ship native mobile apps.
If you’re running a mobile team and wondering where AI actually helps (beyond autocomplete), here’s the blueprint we’ve landed on. It has four layers, each building on the last: Context, Build, Test, and Review.
Layer 1: Context - Teaching AI how your team works
AI becomes a hindrance rather than a help the moment it writes code that doesn’t look or feel like your codebase. So instead of just plugging in an LLM, we gave it strict boundaries.
We created dedicated agents.md files that explain our architecture, directory structures, network layers, and database models. We even documented why we use certain older technologies in specific areas, so the AI respects our intentions rather than “modernising” things that work fine.
These aren’t generic templates or files we downloaded from a repository. They’re hand-crafted from years of working in this codebase: accumulated decisions, trade-offs, and conventions that only our team knows. That’s what makes them effective. Giving the AI our actual coding preferences prevented the models from going off track and ensured the generated code was easy to review.
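As a rough illustration only (the module names, rules, and conventions below are hypothetical, not excerpts from our real files), an agents.md entry might look like:

```markdown
# agents.md (hypothetical excerpt)

## Architecture
- MVVM with coordinators; view models never import UIKit directly.
- All networking goes through the shared network layer — never call URLSession ad hoc.

## Conventions
- New iOS UI is SwiftUI; existing UIKit screens stay UIKit unless the ticket says otherwise.
- The legacy module deliberately stays on its older persistence stack. Do not "modernise" it.
```

The point is less the specific rules than that they encode deliberate decisions the model would otherwise second-guess.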
If you want to go deeper into building reliable AI agent workflows, we wrote about practical approaches to improving AI code agent reliability.
Layer 2: Build - Why native is faster than ever
This contextual foundation led to perhaps our biggest shift: changing how we view cross-platform development entirely.
We work across multiple native apps simultaneously: two iOS apps and one Android app. Historically, maintaining parity across these meant either doing the work three times or adopting cross-platform solutions like React Native.
Today, by feeding the same product requirements to multiple AI agents, we implement the same features natively on iOS and Android simultaneously. LLMs used to skew heavily toward JavaScript, but in our experience, today’s models are far more capable in Kotlin and Swift. We’re producing pure native code with significantly less effort than the traditional approach.
We’re also using this capability to tackle technical debt, refactoring outdated UIs into modern SwiftUI and Jetpack Compose with high confidence. With AI doing the heavy lifting, cross-platform solutions in existing projects are feeling much less mandatory.
This deserves its own post (it’s that big a deal for how you think about platform strategy). We’ll go deeper on the cross-platform angle soon.
Layer 3: Test - AI agents that QA your app
Moving fast natively means your test suite has to keep up. We use Maestro for UI tests, with GitHub Actions automatically running them on every PR merge. In the past, keeping these updated was a chore. Now we use AI agents to write new Maestro tests.
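For reference, the CI wiring for this setup can be quite small. The sketch below is a hypothetical workflow, not our exact configuration (paths, the simulator/app setup steps, and secrets are all assumptions):

```yaml
# .github/workflows/ui-tests.yml — hypothetical sketch
name: UI tests
on:
  push:
    branches: [main]   # runs after every PR merge to main
jobs:
  maestro:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Maestro
        run: curl -fsSL "https://get.maestro.mobile.dev" | bash
      # (building the app and booting a simulator are omitted here)
      - name: Run flows
        run: maestro test .maestro/   # assumes flows live in .maestro/
```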
We feed the agent our existing test files for style reference and context, along with screenshots of the new features, and the agent writes the updated UI tests. Most of the time, we still need to adjust the generated tests because the model misses details we didn’t explicitly provide. Even so, it saves significant time compared to writing them from scratch.
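For a flavour of what the agent produces: Maestro flows are plain YAML, which is part of why an LLM handles them well. This sketch uses a hypothetical app ID, element IDs, and copy, not one of our real tests:

```yaml
# login-flow.yaml — hypothetical example
appId: com.example.myapp
---
- launchApp
- tapOn: "Log in"
- tapOn:
    id: "email_field"
- inputText: "test@example.com"
- tapOn:
    id: "password_field"
- inputText: "correct-horse-battery"
- tapOn: "Continue"
- assertVisible: "Welcome back"
```

Because the format is declarative and screenshot-adjacent, pairing existing flows with screenshots of the new feature gives the model most of what it needs.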
But generating tests was just the start. We wanted to go further and empower our QA team directly. We’re currently developing a tool called ios-simulator-mcp. It’s an MCP server: an interface that gives AI agents direct, programmable access to the iOS Simulator. The agents can take screenshots, inspect the UI hierarchy, tap, swipe, type, and set GPS locations.
This is still early, but we believe it will change how we review our work. The goal is to help the team review multiple tickets at once by deploying agents that interact with the app in the simulator and follow QA scripts. They run through the actual flows and verify whether the requirements were met.
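Conceptually, the agent drives the simulator through a sequence of MCP tool calls while following the QA script. The tool names and arguments below are illustrative, not the server’s actual interface:

```json
[
  { "name": "screenshot", "arguments": { "path": "checkout-step-1.png" } },
  { "name": "tap", "arguments": { "x": 160, "y": 480 } },
  { "name": "type_text", "arguments": { "text": "4242 4242 4242 4242" } },
  { "name": "set_location", "arguments": { "latitude": 37.33, "longitude": -122.03 } }
]
```

After each step, the agent can re-inspect the UI hierarchy or take another screenshot to verify the app reached the expected state.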
Layer 4: Review - AI as your first reviewer
We also redefined our review and release workflows by creating specialised sub-agents (jira-planner.md, prd-creator.md, release-notes-writer.md) for our biggest bottlenecks. The AI gathers context, drafts PRDs, and extracts the key information for release notes.
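Each sub-agent is essentially a focused prompt file with a narrow job. As a hypothetical sketch (not our actual release-notes-writer.md):

```markdown
# release-notes-writer.md (hypothetical sketch)

You write user-facing release notes for our mobile apps.

Input: the list of merged PRs since the last release tag.

Rules:
- Group changes under New, Improved, and Fixed.
- One plain-language sentence per item; no ticket numbers, no internal jargon.
- Skip pure refactors, dependency bumps, and CI changes entirely.
```

Keeping each file this narrow is what makes the output predictable enough to ship with light editing.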
Another daily win: code review. Reviewing massive PRs used to take multiple days. We set up a dedicated Claude agent via GitHub Actions to do the first pass. It catches early mistakes and flags where the implementation deviates from the requirements. Better yet, we can ask the agent, “What parts of this PR actually need my attention?” and skip the boilerplate entirely.
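One way to wire this up is Anthropic’s GitHub Action for Claude Code. Treat the sketch below as an assumption-laden starting point: the action reference and input names are from memory and may differ from the current release, so check the action’s own docs before using it.

```yaml
# .github/workflows/claude-review.yml — hypothetical sketch
name: First-pass review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: anthropics/claude-code-action@beta   # assumed ref/inputs; verify against docs
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          direct_prompt: >
            Review this PR against the linked requirements. Flag deviations,
            likely bugs, and list the files that need human attention.
```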
If you’re curious about structuring AI into your review workflow, we wrote about giving your AI coder a product manager, a practical agile approach to managing AI-generated code changes.
What this adds up to
AI hasn’t removed our ownership of the codebase; it has augmented it. By letting agents handle the boilerplate, write release notes, draft UI tests, and do first-pass PR reviews, our engineers have won back the time to do what they do best: thinking deeply about architecture and building great features.
The blueprint is straightforward: give AI your context, use it to build natively across platforms, automate your testing, and let it take the first pass on reviews. Each layer compounds on the one before it.
If you’re exploring how to embed AI into your team’s mobile workflow, or any software development workflow, we’d love to talk about what’s working for us.
Webinar: March 18
Claude for Leaders: turning AI into strategic advantage