back to thoughts

    AI Coding Assistants Are Almost Right: My Workflow for Shipping Real Code with Cursor, Claude Code, and Codex

    My practical workflow for using AI coding assistants like Cursor, Claude Code, Codex, and GitHub Copilot without shipping fragile, unreviewed code.

    17 min read
    AI Coding AssistantsCursorClaude CodeCodexGitHub CopilotVibe CodingDeveloper WorkflowSoftware Engineering

    AI coding assistants are useful in a very specific way: they make the first draft of almost anything cheaper.

    They can move quickly. They can unblock you. They can do the boring parts without complaining. They can also confidently misunderstand the architecture, invent an API that does not exist, delete a guard clause that mattered, and hand you a diff that looks clean until you actually read it with your brain turned on.

    That is the part people keep skipping.

    The best way to use Cursor, Claude Code, Codex, GitHub Copilot, or any other coding agent is not to pretend the model is a senior engineer. It is to build a workflow where the assistant can be fast, but the human still owns the shape, risk, and final judgment of the code.

    I use coding assistants a lot. I use them for repo exploration, refactors, debugging, test scaffolding, boring UI states, API route cleanup, documentation, and sometimes whole features. I use them when building Next.js apps, Supabase-heavy dashboards, RAG tools, and the small product details that usually eat an afternoon.

    But I do not use them as a replacement for engineering taste.

    The real skill is not "prompt engineering," at least not in the way people usually mean it. The real skill is knowing where the assistant is allowed to be loose, where it needs constraints, and where it should not touch anything until I understand the risk myself.

    The uncomfortable part: developers use AI, but they do not trust it

    The contradiction around AI coding tools is pretty obvious now.

    Stack Overflow's 2025 Developer Survey reported that 84% of developers either use or plan to use AI tools in their development process, while 46% said they do not trust the accuracy of AI tool output. That feels exactly right to me. The tools are too useful to ignore and too unreliable to trust blindly.

    That gap is where most bad AI-assisted development happens.

    The bad workflow looks like this:

    1. Ask the agent to build a feature.
    2. Accept a giant diff.
    3. Run the app.
    4. Feed errors back into the chat.
    5. Repeat until the page stops crashing.
    6. Ship.

    It feels productive because there is constant motion. Files are changing. Errors are disappearing. The assistant sounds calm. The app eventually loads.

    But "the app loads" is not the same as "the code is correct."

    The worst AI-generated bugs are not syntax errors. Syntax errors are easy. TypeScript catches them. The terminal screams. The agent can usually fix them.

    The dangerous bugs are quieter:

    • It bypasses an authorization check because it saw a simpler path.
    • It moves logic from the server to the client.
    • It changes a shared type to satisfy one page and breaks another.
    • It duplicates a helper instead of using the existing one.
    • It catches an error and returns an empty array, which hides the real failure.
    • It adds a dependency for something already solved in the codebase.
    • It writes a test that confirms the implementation, not the behavior.

    That is the "almost right" problem. The code is not obviously wrong. It is worse than that. It is plausible enough to survive a lazy review.

    My rule: the assistant can write code, but it cannot own the decision

    When I use an AI coding assistant, I try to keep one boundary clear:

    The assistant can produce options. I own the decision.

    That sounds simple, but it changes the workflow completely. I do not start with "build this." I start with context, constraints, and verification.

    Here is the loop I use most often:

    1. Read the codebase first.
    2. Write a small spec.
    3. Ask for a plan before edits.
    4. Implement in small batches.
    5. Review the diff like a pull request.
    6. Run the real checks.
    7. Ask the assistant to critique its own work.
    8. Commit only what I understand.

    The slower-looking workflow is usually faster because I do not spend the next hour untangling a giant plausible mess.

    Step 1: use ask/read mode before edit mode

    Before I let an assistant edit a repo, I want it to understand the repo.

    In Cursor, that often means using Ask mode before Agent mode. Cursor describes Ask as a read-only mode for learning and planning, while Agent mode is for autonomous exploration and multi-file edits. That separation is useful. I want the assistant to search and explain before it starts changing files.

    In Claude Code or Codex, I do the same thing conversationally:

    Before making changes, inspect the relevant files and explain:
    - where this behavior currently lives
    - what files are likely involved
    - what existing patterns you would follow
    - what risks you see
    Do not edit anything yet.

    This is not ceremony. It catches bad assumptions early.

    If the assistant cannot explain the current shape of the code, I do not want it editing the code.

    For example, if I am adding a new analytics event to a Next.js app, I want to know:

    • Is analytics handled through a shared helper?
    • Are events sent from client components only?
    • Is there already a naming convention?
    • Are events typed?
    • Does the app use Google Analytics, Vercel Analytics, Supabase, or something custom?

    An assistant that skips this step will often create a new helper, invent an event format, and call it done. That may work locally, but now the codebase has two analytics patterns.

    That is how projects get messy. Not from one dramatic mistake. From a hundred small "this is fine" diffs.

    Step 2: write a tiny spec

    I do not mean a 12-page product requirements document. I mean a short markdown note that makes the task hard to misread.

    For a small feature, my spec might look like this:

    Goal:
    Track project card clicks from the homepage and projects page.
    
    Behavior:
    - When a user clicks a project card, send event `project_card_click`.
    - Include project slug, source page, and whether the project is featured.
    - Do not block navigation if analytics fails.
    
    Constraints:
    - Reuse the existing analytics helper.
    - Do not add a new analytics library.
    - Do not change ProjectCard visual styling.
    - Keep the component API backward-compatible.
    
    Verification:
    - TypeScript passes.
    - Existing build passes.
    - Manual test: click cards on `/` and `/projects`.

    This gives the assistant a box to work inside.

    Without the constraints, the agent may decide to "improve" the component while it is there. Maybe it adds a loading state. Maybe it changes styling. Maybe it rewires links. Maybe the new code is fine, but now the diff is bigger, the review is slower, and the original task is hiding inside extra movement.

    Small specs protect your attention.

    Step 3: ask for a plan before edits

    Once the assistant has read the code and seen the spec, I ask for a plan.

    Not a grand architecture document. Just the intended files and steps.

    Give me the implementation plan before editing.
    List the files you expect to change and why.
    Keep the change as small as possible.

    This is where you catch scope creep.

    If the plan says it wants to touch seven files for a two-line analytics change, I stop it. If it wants to add a dependency, I ask why. If it wants to create a new abstraction, I ask what duplication it removes.

    The point is not to make the model timid. The point is to make it legible.

    I want to know what kind of diff I am about to receive.

    Step 4: small batches beat giant diffs

    Coding agents are getting better at multi-file work. That does not mean every task should become a multi-file diff.

    I prefer this pattern:

    Implement only the data/model changes first. Stop after that.

    Then:

    Now wire the UI to the new data shape. Do not change styling.

    Then:

    Now add or update tests for the changed behavior.

    This keeps review possible.

    When an assistant changes 600 lines across 14 files, the review becomes vague. You skim. You trust. You tell yourself the tests passed. That is exactly when subtle issues get through.

    Small batches force the assistant to keep its mental stack clean, and they force me to keep mine clean too.

    Step 5: review the diff like a pull request

    This is the part that separates AI-assisted engineering from "please generate my app."

    I review AI diffs with a slightly suspicious checklist:

    • Did it follow existing patterns?
    • Did it change unrelated code?
    • Did it weaken auth, validation, or error handling?
    • Did it move server-only logic into client code?
    • Did it add a dependency unnecessarily?
    • Did it duplicate a helper that already exists?
    • Did it make types looser to silence errors?
    • Did it add comments that explain obvious code instead of clarifying tricky code?
    • Did it write tests that would fail if the behavior broke?

    That last one matters.

    AI assistants are very good at writing tests that pass. They are not always good at writing tests that protect behavior. A test that mirrors the implementation too closely is just a second copy of the same assumption.

    For example, if a function should reject users from another organization, I want a test that creates two organizations and proves cross-tenant access fails. I do not want a test that mocks the helper and asserts it was called.

    The more important the feature, the more adversarial the review.

    Step 6: run boring checks every time

    The assistant should not be the only thing telling you the code works.

    At minimum, I want:

    npm run build

    If the project has them:

    npm run lint
    npm test
    npx tsc --noEmit

    For database or auth work, I also want manual verification with real accounts. This is especially true in Supabase projects. RLS, auth cookies, server components, and client-side state can all look fine in code and still fail in the real app.

    For AI features, I want a few ugly inputs:

    • empty input
    • very long input
    • malicious-looking input
    • ambiguous input
    • a case where the answer should be "I do not know"

    Coding assistants tend to test the happy path unless you force them not to.

    Step 7: make the assistant critique itself

    After implementation, I often ask the assistant to review its own diff.

    Review the changes you just made as if this were a pull request.
    Focus on bugs, security issues, behavior changes, and missing tests.
    Do not summarize the code. Look for problems.

    This works surprisingly well, especially when the assistant has access to the diff and test output. It will not catch everything, but it often catches something.

    I do not treat that review as truth. I treat it as a second pass.

    The useful part is that the assistant switches modes. During implementation it is trying to complete the task. During review it is looking for ways the task might be wrong.

    Those are different mental postures, even for a model.

    Cursor, Claude Code, Codex, and Copilot are not the same tool

    I do not think there is one best coding assistant. The better question is: what kind of work are you doing?

    Cursor

    Cursor is strongest when I want IDE-native iteration. It is good for moving through files quickly, making targeted edits, asking questions about the codebase, and staying close to the actual editor.

    I like Cursor for:

    • UI changes
    • component refactors
    • local codebase questions
    • fast iteration
    • "read this file and change this part" tasks

    The mode separation matters. Ask mode is good when I want understanding. Agent mode is better when I am ready to let it act.

    Claude Code

    Claude Code started as a terminal tool and has since expanded to VS Code, JetBrains, a desktop app, and the browser. Anthropic describes it as an agentic coding tool that reads your codebase, edits files, runs commands, and integrates with your development tools.

    I like that style for deeper repo work:

    • debugging failing builds
    • understanding unfamiliar codebases
    • broad refactors
    • generating tests after reading implementation
    • command-line-heavy workflows

    It feels most useful when the task needs exploration before edits.

    Codex

    Codex is useful when I want to delegate a chunk of work and have it navigate the repo, edit files, run commands, and verify. OpenAI describes Codex as a coding agent you can use locally or delegate to in the cloud.

    The interesting part is not just code generation. It is the workflow around agents: assign a task, let it work, review the diff, redirect when needed, and keep the human in charge of acceptance.

    I like Codex for:

    • scoped implementation tasks
    • code review passes
    • test fixes
    • parallel exploration
    • "go make this boring but careful change" work

    It is strongest when the task has a clear boundary and a clear verification step.

    GitHub Copilot cloud agent

    GitHub's cloud agent (previously called the coding agent) can work from issues and open pull requests in the background. That makes it interesting for issue-to-PR workflows.

    I would use it for:

    • small bugs
    • incremental features
    • chores with clear acceptance criteria
    • repository maintenance

    I would be more cautious with vague product work. If the issue is fuzzy, the PR will be fuzzy.

    Where AI assistants are genuinely good

    I do not want this to sound like fear dressed up as prudence. These tools are useful. I would not keep using them if they were not.

    They are excellent at:

    • explaining unfamiliar code
    • finding where behavior lives
    • drafting repetitive code
    • writing first-pass tests
    • translating errors into likely causes
    • converting one pattern into another
    • generating migration checklists
    • documenting decisions
    • catching obvious mistakes in review

    They are also good at momentum. That matters more than engineers like to admit.

    Sometimes the value is not that the assistant writes perfect code. Sometimes the value is that it gets you from a blank page to something concrete enough to criticize. That is underrated.

    I trust my judgment more when I am editing than when I am staring at an empty file.

    Where I do not trust them

    I slow down around anything involving:

    • authentication
    • authorization
    • payments
    • database migrations
    • data deletion
    • multi-tenant access
    • security-sensitive API routes
    • background jobs
    • caching
    • race conditions
    • generated SQL
    • dependency upgrades

    The assistant can still help. It can explain the current auth flow. It can draft a migration. It can write a test matrix. It can find every call site.

    But I do not let it blast through those areas without review.

    If a bug can expose user data, charge someone incorrectly, delete records, or silently corrupt state, the assistant is no longer "saving time" unless the verification gets stricter too.

    A practical example: adding a feature without letting the assistant wander

    Let's say I am working on a Next.js and Supabase app and want to add saved searches to an AI document search product. This is the kind of feature that looks small until you remember auth, RLS, UI state, empty states, and deletion.

    A loose prompt would be:

    Add saved searches.

    That is how you get a huge diff.

    A better prompt is:

    We need saved searches for authenticated users.
    
    Read the codebase first. Do not edit yet.
    
    Goal:
    - Users can save a search query from the search page.
    - Users can see their saved searches on the same page.
    - Users can delete their own saved searches.
    
    Constraints:
    - Use existing Supabase client patterns.
    - Preserve RLS assumptions.
    - Do not add a new state management library.
    - Do not change unrelated search behavior.
    
    First, explain the current search flow, auth flow, and data access pattern.
    Then propose the smallest implementation plan.

    After that, I want the assistant to propose something like:

    • Add a saved_searches table.
    • Add RLS policies scoped to auth.uid().
    • Add server/client functions following existing Supabase patterns.
    • Add UI controls to save and delete.
    • Add tests or manual verification steps.

    Then I split the work:

    Create only the SQL migration and RLS policies first.
    Stop after that.

    Then I review the SQL.

    Then:

    Now add the data access functions using the existing Supabase helper.
    Stop after that.

    Then:

    Now wire the UI. Keep styling consistent and do not refactor the search component.

    This is slower than one prompt in the same way measuring twice is slower than cutting once. It is also the difference between supervising a tool and letting it redesign the app by accident.

    The checklist I use before accepting AI-generated code

    Before I commit AI-assisted work, I want to be able to answer yes to these:

    • I understand every file that changed.
    • The diff is smaller than the problem deserves, not larger.
    • Existing architecture is respected.
    • Security-sensitive logic got extra review.
    • Types were improved or preserved, not weakened.
    • Tests or manual checks cover the risky paths.
    • The assistant did not hide errors to make the build pass.
    • The final behavior matches the original spec.
    • I could explain the implementation to another developer.

    That last one is the real test.

    If I cannot explain the code, I do not own it yet.

    What this means for "vibe coding"

    I do not hate the phrase "vibe coding." It captures something real: the interface to software is changing. More work starts in language now. More implementation is delegated. More developers are becoming reviewers, editors, and system designers, not just line-by-line typists.

    But production software cannot run on vibes alone.

    The better version is structured:

    • clear intent
    • small specs
    • repo-aware agents
    • constrained edits
    • real tests
    • human review
    • measured trust

    That is less flashy than "I built an app in one prompt." It is also how you avoid waking up to a codebase you no longer understand.

    The future skill is supervision

    The developers who benefit most from AI coding assistants will not be the ones who accept the most code.

    They will be the ones who can:

    • describe the problem clearly
    • give the assistant the right context
    • constrain the implementation
    • spot plausible nonsense
    • verify behavior
    • preserve architecture
    • decide when to stop and do it manually

    That is not less engineering. It is engineering with a faster, stranger collaborator sitting very close to the steering wheel.

    AI coding assistants are almost right. That is why they are useful.

    They are almost right. That is why they are dangerous.

    The workflow is what decides which side you get.


    Sources and further reading

    👨‍💻

    Ryan Katayi

    Full-stack developer who turns coffee into code. Building things that make the web a better place, one commit at a time.

    more about me