Artificial Intelligence in software development has been framed around speed and efficiency. From automated code completion co-pilots to repositories generated via "vibe coding," AI coding assistants have altered the mechanics of writing software. As these tools transition from novelty plugins to integrated engineering tools, a stark reality is coming to light: the gains made by generating a hundred lines of code in seconds can quickly be erased if that code introduces security flaws and architectural drift.
How do the rules of engagement change when shifting from modern "greenfield" projects to legacy systems? At what point does the cost of auditing AI-generated "slop" outweigh the time saved by not writing the code manually? To map these boundaries, we spoke with technology leaders and engineering executives to unpack the hidden liabilities, the shifting responsibilities and the exact tipping points where AI coding assistants become a liability rather than an asset.
At what point does a project’s complexity or security requirements make using an AI coding assistant a liability rather than an asset?
“I don't see complexity as an issue with an AI coding assistant,” says Justin Handley, Director of Technology at Monroe Institute. “Complexity can be difficult with multi-threaded remote AI where you're trying to do lots of things at once, but the more complex tasks are better suited for what you would maybe call vibe coding, where a developer works with an AI and moves through complex issues. So I think that AI is generally helpful with deep complexity. Security requirements is a different thing. And there's two types of security requirements. One is, is your IP so valuable that you can't possibly risk leaking it? In which case, AI needs to be run locally. And using any non-local AI is a liability flat out. If your code is not your real moat and you're just generally writing normal web app kind of code and you don't care if AI knows about it, then I don't see other security requirements as an issue. I think that again with proper checking you can meet HIPAA or SOC2 or whatever requirements you need to, using an AI assistant, you just have to make sure that you're reviewing code as it comes in.”
Is there a significant difference in the reliability of AI assistants when working on modern "greenfield" projects versus refactoring or maintaining legacy codebases?
“This sounds counterintuitive but there's a real chance AI assistants do better on legacy codebases than on greenfield ones,” says Sylvain Kalache, Head of AI Labs at Rootly. “There are two reasons why: First, legacy code is more likely to be written in a language with a large training corpus, which generally means stronger model performance. Second, and more importantly, legacy systems tend to come with years of accumulated context: documentation, runbooks, postmortems, internal wikis describing how the code actually behaves in production, where the traffic spikes are, how the system has failed before. Greenfield projects don't have any of that yet. And context is king. The more an AI assistant can see about how and where the code will run, the more thorough job it can do.”
“Not necessarily,” Handley adds. “Unless you're talking out of the box. So out of the box, absolutely. Starting a project from scratch, AI will do better because AI has written the whole thing. AI understands why it's built to think the way that it wrote. And so it's built to extend what it wrote. It's fine. Also, if you don't guide it clearly from day one, it can get very messy. But with good guidance from day one, starting a pure AI project is definitely easier than introducing AI to complex legacy. With complex legacy, the most important thing is just deep training. You know, lots of context files, lots of descriptions, rules, to make sure that the AI fully understands the structure and standards that your code base is used to.”
Should the "rules of engagement" for AI assistants differ based on an engineer's seniority?
“Agents do what they're told. They don't push back on bad direction, and they don't catch their own omissions,” notes Kalache. “To use one well you need to know, at a high level, what good looks like, and you need enough experience to spot when the agent is drifting or skipping over something important. That's why senior and junior engineers using the same tool will produce very different outputs. AWS recently put a policy in place requiring senior engineers to review AI-generated code written by juniors.”
“Definitely,” Handley adds assuredly. “For a young developer, I mean vibe coding is a bad idea. People who don't know code and are vibe coding, you can get some things done. You can write like little scripts that do some things and it can feel neat, but for production code that is going to be used by hundreds or thousands of people in real life, vibe coding is almost never going to cut it. When I say vibe coding, I'm talking about the idea that anyone who doesn't know code can code. And really, I think the rules of engagement are a little tricky to define, but you should never have AI write code above your pay grade. When you have AI writing things that you do not personally understand, you enter very dangerous territory. You could be introducing security flaws, you could be introducing bugs and you have no idea because you don't understand what's being written. I think that a senior engineer who has the ability to comprehend anything that an AI writes should and can have much more freedom and success than an earlier stage developer. I also worry about early-stage developers, if they lean on AI too hard, they'll never actually become a senior engineer because they won't develop the skills necessary to do that. In some ways AI represents an existential threat because if all young developers lean on AI at some point we will have zero senior engineers left.”
What are the specific "red flags" a developer should look for in AI-generated code that suggest the assistant is hallucinating logic?
Kalache says, “the hardest one to catch, and increasingly common, is slopsquatting. It's a software supply chain attack in which bad actors register package names that AI assistants are known to hallucinate. The model confidently imports a package that should exist, the developer doesn't double-check, and now they've pulled malicious code into the build. It's a failure mode that didn't really exist before AI coding assistants and is one that most developers still aren't watching for.”
“The most worrying ‘red flag’ comes in the form of ‘confident incorrectness,’ where AI makes use of obsolete libraries and/or hallucinates parameters that might seem syntactically correct but fail when loaded under certain production circumstances,” adds Sriramprabhu Rajendran, Senior Manager, Software Engineering at Capital One.
“There's no way to define this on a code basis,” says Handley. “The best way to know this is to write good tests for all your code, so that if the AI code doesn’t pass your test, you know it didn’t do the right thing. Short of that, human QA would uncover this, but I’m not sure there’s any way to. Logic bugs can be very hard to spot. So really, testing is the way to catch them, not in the code review kind of reading process.”
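Handley's point about tests can be illustrated with a small, entirely hypothetical example: two refund functions that look equally plausible in a code review, only one of which survives a set of human-written expectations. The function names, the refund rule and the test cases below are assumptions made up for this sketch, not anything drawn from the interviews.

```python
def prorated_refund(price_cents: int, days_used: int, days_total: int) -> int:
    """Refund the unused share of a subscription, never less than zero."""
    if days_total <= 0:
        raise ValueError("days_total must be positive")
    days_used = min(max(days_used, 0), days_total)  # clamp to the billing period
    return price_cents * (days_total - days_used) // days_total


def prorated_refund_unclamped(price_cents: int, days_used: int, days_total: int) -> int:
    """Reads fine at a glance, but goes negative once days_used exceeds days_total."""
    return price_cents * (days_total - days_used) // days_total


def contract_failures(fn) -> list[str]:
    """Run human-written expectations against fn and describe each mismatch."""
    cases = [
        ((3000, 0, 30), 3000),   # nothing used: full refund
        ((3000, 15, 30), 1500),  # half used: half refund
        ((3000, 30, 30), 0),     # fully used: no refund
        ((3000, 45, 30), 0),     # overrun must not produce a negative refund
    ]
    failures = []
    for args, expected in cases:
        got = fn(*args)
        if got != expected:
            failures.append(f"{fn.__name__}{args}: expected {expected}, got {got}")
    return failures


print(contract_failures(prorated_refund))            # []
print(len(contract_failures(prorated_refund_unclamped)))  # 1
```

The second function fails only on the overrun case, which is exactly the kind of edge a reviewer skimming the diff is likely to miss and a test written before the code is likely to catch.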
Beyond the code itself, what are the often-overlooked legal or IP risks that should force a developer to keep the AI assistant turned off?
“I'm pretty liberal with this. I mean, I don't see any legal or IP risks associated with using an AI assistant,” Handley posits. “Again, the only case where that's not true is where your code is actually so unique and so valuable that the IP of the code itself is worth protecting. For example, maybe the training algorithms that train these AIs is worth so much money that you wouldn't want to expose them to AI in case somebody else could reverse engineer data out of AI. I would say that any code that you feel has real monetary value, the reality is that 90% of code and 90% of production projects don’t have any monetary value. There's nothing unique, nothing deeply unique. The idea is unique, the data structure is unique, but really the way that the code is written is just how code is written. The IP risk of leak is usually very low.”
When does the time spent auditing AI-generated code begin to outweigh the speed of writing it manually? How do you measure that ROI?
“In high-risk industries where every line has to be reviewed by a human before it ships, AI doesn't scale the way the marketing implies,” Kalache points out. “If the assistant is sending slop into a process that already requires line-by-line review, you've just moved the bottleneck. You haven't removed it. The speed-up only materializes when the review can be lightweight, and review can only be lightweight when the stakes are low.”
Rajendran adds, “At the point where it takes longer to audit the code written by the AI than to write it yourself, then you are losing the ROI in terms of time. The moment AI-created code becomes involved in the ‘Critical Path’ logic, it becomes cost-prohibitive.”
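The break-even Rajendran describes can be written down as simple arithmetic: the time saved is the manual estimate minus everything spent getting the AI version to a shippable state. The sketch below is one possible way to track it; the function name, the cost categories and all of the minute figures are illustrative assumptions, not measurements from any of the teams quoted here.

```python
def net_minutes_saved(manual: float, prompting: float, audit: float, rework: float = 0.0) -> float:
    """Minutes saved by using the assistant; a negative value means writing by hand was faster.

    manual    -- estimated time to write the change by hand
    prompting -- time spent prompting and waiting for the assistant
    audit     -- time spent reviewing the generated code
    rework    -- time spent fixing what the review found
    """
    return manual - (prompting + audit + rework)


# Low-stakes change with a lightweight review (made-up numbers).
print(net_minutes_saved(manual=60, prompting=5, audit=20, rework=10))  # 25

# Critical-path change that needs line-by-line review (made-up numbers).
print(net_minutes_saved(manual=60, prompting=5, audit=50, rework=15))  # -10
```

Logged per task, a figure like this shows where the audit cost has overtaken the generation speed-up, which in this toy example happens once review and rework together exceed 55 minutes.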
“That's an interesting question. I actually wrote an entire product to address this; a CI Automator, so it actually takes AI-generated code and does an AI automated review process on it using multiple different tools and perspectives and tests to ensure that the code is reasonable,” Handley adds. “When we implemented this, we saw an almost immediate 400% increase in team productivity because that was the major bottleneck. When our team started doing tons of stuff with AI, the CI process and getting rid of all the little bugs and making sure that it met our coding standards, etc., was taking developers sometimes hours per task. Automating that part of it, so that when a developer goes to do code review, they're never having to check for any bugs that could be caught by linting or by other AIs or by automated checks.”