docs: add Borg et al. 2025 RCT on AI code maintainability (v3.27.7)
- Resource eval: arXiv:2507.00788 "Echoes of AI" (151 devs, 95% pros, 2-phase blind RCT): 30.7% faster (median), habitual users ~55.9% faster, no significant downstream maintainability impact
- guide/learning-with-ai.md: citation + "On maintainability fear" note
- guide/ultimate-guide.md: nuance blockquote in §1.7 Trust Calibration
- machine-readable/reference.yaml: 4 new RCT/maintainability entries
- docs/resource-evaluations/: evaluation file with technical-writer audit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 4c42151151
commit 895ace49f7
6 changed files with 197 additions and 1 deletions
@@ -150,6 +150,8 @@ The pattern: **AI excels at well-defined, repeatable tasks**. It struggles with
The difference isn't the tool — it's the organizational discipline around it.
**On maintainability fear**: The concern that AI-generated code creates unmaintainable codebases is not supported by the first RCT to target it explicitly: downstream developers showed no significant difference in evolution time or code quality when extending AI-assisted versus human-written code (Borg et al., 2025, n=151). The real risks are skill atrophy and over-delegation, not inherent quality degradation for the next developer. ([arXiv:2507.00788](https://arxiv.org/abs/2507.00788))
### Implications for Learning
This research shapes the rest of this guide:
@@ -923,6 +925,7 @@ Sources for [§3 The Reality of AI Productivity](#the-reality-of-ai-productivity
- **Stack Overflow 2024: AI Sentiment** — [stackoverflow.co](https://stackoverflow.co/labs/developer-sentiment-ai-ml/) — Developer attitudes toward AI tools, productivity perceptions
- **Uplevel Engineering Intelligence (2024)** — Burnout and productivity metrics with AI coding tools
- **METR Experienced Developer RCT (2025)** — [arXiv:2507.09089](https://arxiv.org/abs/2507.09089) — Randomized controlled trial (16 experienced devs, 246 issues, repos 1M+ lines): AI tools made developers 19% slower on familiar codebases, despite perceiving themselves 20% faster (39-point perception gap). Strongest evidence for skill atrophy risk in experienced developers.
- **Borg et al. "Echoes of AI" RCT (2025)** — [arXiv:2507.00788](https://arxiv.org/abs/2507.00788) — 2-phase blind RCT (151 participants, 95% professional developers): AI users 30.7% faster (median), habitual users ~55.9% faster. Phase 2: downstream developers evolving AI-generated code showed no significant difference in evolution time or code quality vs. human-generated code. First RCT to explicitly target maintainability of AI-assisted code. Co-authored by Dave Farley ("Continuous Delivery"). Note: arXiv preprint (v2 Dec 2025), not yet published in peer-reviewed proceedings.
- **DORA/Google DevOps Research (2024)** — AI tool adoption impact on team performance
### Practitioner Perspectives
@@ -1091,6 +1091,8 @@ Research consistently shows AI code has higher defect rates than human-written c
**Key insight**: AI produces code faster but verification becomes the bottleneck. The question isn't "does it work?" but "how do I know it works?"
> **Nuance on downstream maintainability**: A 2-phase blind RCT (Borg et al., 2025; n=151, 95% professional developers) found no significant difference in the time downstream developers needed to evolve AI-generated vs. human-generated code. The defect rates above are real, but in this study they did not translate into a higher maintenance burden for the next developer. The risk is more narrowly scoped than commonly assumed. ([arXiv:2507.00788](https://arxiv.org/abs/2507.00788))
### The Verification Spectrum
Not all code needs the same scrutiny. Match verification effort to risk: