Reference Tables
Code Quality Benchmarks 2026
Definitive threshold tables for every major code quality metric. No vague advice. Ratings banded from Excellent to Critical, with context by project maturity, language, and team size.
Updated 16 April 2026
Master Benchmark Table
| Metric | Excellent | Acceptable | Warning | Critical |
|---|---|---|---|---|
| Technical Debt Ratio | < 5% | 5 - 10% | 10 - 20% | > 20% |
| Code Coverage | > 80% | 60 - 80% | 40 - 60% | < 40% |
| Cognitive Complexity (per function) | < 8 | 8 - 15 | 15 - 25 | > 25 |
| Cyclomatic Complexity (per function) | < 10 | 10 - 20 | 20 - 40 | > 40 |
| Code Duplication | < 3% | 3 - 5% | 5 - 10% | > 10% |
| Dependency Freshness (major versions behind) | 0 | 1 | 2 - 3 | > 3 |
| Security Vulnerabilities | 0 | Low only | Any medium | Any high/critical |
| Code Smell Density (per 1K LoC) | < 5 | 5 - 15 | 15 - 30 | > 30 |
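These bands are easy to encode directly in a CI check. Below is a minimal Python sketch that grades a measured value against the thresholds in the table above; the metric keys and function names are our own illustration, not any particular tool's API.

```python
# Band thresholds for lower-is-better metrics, mirroring the master table:
# Excellent below the first bound, Acceptable up to the second, Warning up
# to the third, Critical beyond. All names here are illustrative.
THRESHOLDS = {
    "technical_debt_ratio": (5, 10, 20),     # percent
    "cognitive_complexity": (8, 15, 25),     # per function
    "cyclomatic_complexity": (10, 20, 40),   # per function
    "duplication": (3, 5, 10),               # percent
    "majors_behind": (1, 1, 3),              # 0 / 1 / 2-3 / 4+
    "smells_per_kloc": (5, 15, 30),
}

def rate(metric: str, value: float) -> str:
    excellent, acceptable, warning = THRESHOLDS[metric]
    if value < excellent:
        return "Excellent"
    if value <= acceptable:
        return "Acceptable"
    if value <= warning:
        return "Warning"
    return "Critical"

def rate_coverage(pct: float) -> str:
    # Coverage is higher-is-better, so the comparisons flip.
    if pct > 80: return "Excellent"
    if pct >= 60: return "Acceptable"
    if pct >= 40: return "Warning"
    return "Critical"

print(rate("cyclomatic_complexity", 23))  # -> Warning
print(rate("majors_behind", 0))           # -> Excellent
print(rate_coverage(72))                  # -> Acceptable
```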
Code Coverage Benchmarks
The 80% coverage rule is a default, not a law. What matters is what you cover, not just the percentage: business logic should be near 100%, UI rendering can run lower, and taking legacy code from 20% to 40% is a bigger win than going from 80% to 85%.
The 80% myth
80% is not a magic number. It became the industry default because quality gates needed a threshold, and 80% was a reasonable compromise. For business logic, 80% is too low. For generated code, UI components, and boilerplate, 80% may be unnecessarily expensive. Target by module, not by project.
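One way to act on "target by module, not by project" is a small per-module gate. The sketch below assumes a Cobertura-format coverage.xml (what coverage.py writes via `coverage xml`); the module names and targets are hypothetical examples.

```python
# Per-module coverage gating against a Cobertura-format coverage.xml.
# Module names and targets below are hypothetical examples of setting
# a coverage floor per module rather than one project-wide number.
import sys
import xml.etree.ElementTree as ET

TARGETS = {
    "app.billing": 95,   # business logic: near 100%
    "app.api":     80,
    "app.ui":      50,   # rendering code: a lower bar is fine
}

def check(report: str = "coverage.xml") -> int:
    root = ET.parse(report).getroot()
    failures = 0
    for pkg in root.iter("package"):
        name = pkg.get("name", "")
        pct = float(pkg.get("line-rate", "0")) * 100
        target = TARGETS.get(name)
        if target is not None and pct < target:
            print(f"FAIL {name}: {pct:.1f}% < {target}%")
            failures += 1
    return failures

if __name__ == "__main__":
    sys.exit(1 if check() else 0)
```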
Complexity Benchmarks
Complexity correlates directly with bug density. Research shows that functions with cyclomatic complexity above 20 have 4-5x the bug rate of functions below 10. Cognitive complexity (used by SonarQube) is a better predictor because it penalises nesting depth.
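To make the nesting penalty concrete, here is a simplified cognitive-complexity estimator for Python functions. It loosely follows SonarSource's published rules (each branch or loop costs 1 plus its current nesting depth) but is an approximation, not SonarQube's exact algorithm.

```python
# Simplified cognitive-complexity scoring: each control structure costs
# 1 plus its nesting depth, so nested logic is penalised harder than an
# equal amount of flat logic. An approximation, not SonarQube's algorithm.
import ast

NESTING = (ast.If, ast.For, ast.While, ast.ExceptHandler)

def cognitive_complexity(func: ast.FunctionDef) -> int:
    score = 0

    def walk(node: ast.AST, depth: int) -> None:
        nonlocal score
        for child in ast.iter_child_nodes(node):
            if isinstance(child, NESTING):
                score += 1 + depth      # structure cost + nesting penalty
                walk(child, depth + 1)
            elif isinstance(child, ast.BoolOp):
                score += 1              # each and/or chain adds 1
                walk(child, depth)
            else:
                walk(child, depth)

    walk(func, 0)
    return score

src = """
def f(x):
    if x:                      # +1
        for i in range(x):     # +2  (1 + nesting depth 1)
            if i % 2:          # +3  (1 + nesting depth 2)
                print(i)
"""
print(cognitive_complexity(ast.parse(src).body[0]))  # -> 6
```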
Per-Function Thresholds
| Metric | Excellent | Acceptable | Warning | Critical |
|---|---|---|---|---|
| Cognitive Complexity | < 8 | 8 - 15 | 15 - 25 | > 25 |
| Cyclomatic Complexity | < 10 | 10 - 20 | 20 - 40 | > 40 |
Per-File Thresholds
Duplication Benchmarks
Zero duplication is not the goal. Some duplication is preferable to premature abstraction. The cost of duplication is not the extra lines, but the bugs that appear when one copy is updated and the others are not. Target reduction of high-duplication clusters, not elimination of all repeated code.
| Codebase Age | Normal Range | Context |
|---|---|---|
| New (< 1 year) | 1 - 3% | Small codebase, fresh patterns. Above 3% suggests copy-paste development habits. |
| Growing (1 - 3 years) | 3 - 6% | Some organic duplication from rapid feature development. Normal if trending stable. |
| Mature (3 - 7 years) | 5 - 10% | Multiple teams, feature branches, legacy modules. Active cleanup keeps it under 10%. |
| Legacy (7+ years) | 8 - 15% | Accumulated from team turnover and evolving requirements. Above 15% signals a systematic problem. |
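For intuition on how these percentages are measured, here is a toy line-window duplicate detector. Production tools such as CPD and SonarQube compare token streams rather than raw lines, but the principle is the same; the window size and file paths are illustrative.

```python
# Toy duplicate-block detection: normalise lines, hash fixed-size
# windows, and report any window that appears in more than one place.
from collections import defaultdict

WINDOW = 6  # minimum duplicate block size, in lines

def normalised_lines(path: str) -> list[str]:
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def find_duplicates(paths: list[str]) -> dict[str, list[tuple[str, int]]]:
    seen: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path in paths:
        lines = normalised_lines(path)
        for i in range(len(lines) - WINDOW + 1):
            block = "\n".join(lines[i : i + WINDOW])
            seen[block].append((path, i + 1))  # (file, 1-based start line)
    return {b: locs for b, locs in seen.items() if len(locs) > 1}

for block, locations in find_duplicates(["a.py", "b.py"]).items():
    print(f"{len(locations)} copies at {locations}")
```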
Dependency Freshness Benchmarks
The cost of deferred upgrades grows exponentially with the version gap. Upgrading one major version is typically a half-day task. Upgrading three major versions can be a multi-week project with breaking changes compounding across versions.
| Major Versions Behind | Status | Upgrade Effort |
|---|---|---|
| 0 | Current: on the latest major version | Trivial |
| 1 | Previous major | Half-day |
| 2 - 3 | Multiple majors | 1 - 2 weeks |
| 4+ | Migration crisis | Weeks to months |
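For Python projects, a rough major-version gap report can be derived from pip itself. The sketch below parses `pip list --outdated --format=json` (which exposes `name`, `version`, and `latest_version` fields); taking the leading integer of each version string is a simplifying assumption that breaks on exotic version schemes.

```python
# Count how many major versions behind each outdated package is,
# using pip's own JSON output as the data source.
import json
import subprocess
import sys

def majors_behind() -> dict[str, int]:
    out = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    gaps = {}
    for pkg in json.loads(out):
        current = int(pkg["version"].split(".")[0])
        latest = int(pkg["latest_version"].split(".")[0])
        if latest > current:
            gaps[pkg["name"]] = latest - current
    return gaps

for name, gap in sorted(majors_behind().items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gap} major version(s) behind")
```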
CVE Response Time Targets
DORA Metrics Mapped to Code Quality
DORA (DevOps Research and Assessment) metrics measure software delivery performance. Code quality directly influences all four metrics; teams with high code quality scores are consistently more likely to achieve elite DORA performance.
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | Multiple/day | Weekly | Monthly | < Monthly |
| Lead Time for Changes | < 1 hour | < 1 day | < 1 week | > 1 month |
| Change Failure Rate | < 5% | 5 - 10% | 10 - 15% | > 15% |
| Time to Restore (MTTR) | < 1 hour | < 1 day | < 1 week | > 1 week |
- **Deployment Frequency:** high coverage and clean quality gates enable confident, frequent deploys.
- **Lead Time for Changes:** low complexity and good test coverage reduce review and validation time.
- **Change Failure Rate:** coverage gaps and high complexity directly increase the proportion of changes that fail.
- **Time to Restore (MTTR):** low cognitive complexity and good coverage make incident diagnosis faster.
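As a sketch of how two of these metrics can be derived and graded from raw deploy records, assuming a hypothetical record shape rather than any real CI/CD API:

```python
# Derive deployment frequency and change failure rate from deploy records
# and grade them against the DORA table above. Lead time and MTTR would
# additionally need commit and incident timestamps.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Deploy:
    at: datetime
    failed: bool   # did this change cause a production failure?

def dora_summary(deploys: list[Deploy]) -> dict[str, str]:
    days = max((deploys[-1].at - deploys[0].at).days, 1)
    per_day = len(deploys) / days
    cfr = 100 * sum(d.failed for d in deploys) / len(deploys)
    freq = ("Elite" if per_day >= 1 else
            "High" if per_day >= 1 / 7 else
            "Medium" if per_day >= 1 / 30 else "Low")
    fail = ("Elite" if cfr < 5 else
            "High" if cfr <= 10 else
            "Medium" if cfr <= 15 else "Low")
    return {"deployment_frequency": freq, "change_failure_rate": fail}

deploys = [Deploy(datetime(2026, 4, d), failed=(d == 3)) for d in range(1, 15)]
print(dora_summary(deploys))
# {'deployment_frequency': 'Elite', 'change_failure_rate': 'High'}
```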
Benchmarks by Language
Different languages have different normal ranges. A cyclomatic complexity of 15 in Go is unusual; in Java enterprise code it is common. Apply language-specific context to your benchmarks.
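One pragmatic way to encode that context is per-language overrides layered on the master thresholds, as sketched below. The override values are illustrative readings of the notes that follow, not published standards.

```python
# Language-aware thresholds: start from the master table and layer on
# per-language overrides (stricter complexity for Go, looser for Java,
# a higher duplication tolerance where explicit error handling inflates it).
BASE = {"cyclomatic_warning": 20, "duplication_warning_pct": 5}

OVERRIDES = {
    "go":     {"cyclomatic_warning": 12, "duplication_warning_pct": 8},
    "java":   {"cyclomatic_warning": 25},
    "python": {"cyclomatic_warning": 15},
}

def thresholds_for(lang: str) -> dict[str, float]:
    merged = dict(BASE)
    merged.update(OVERRIDES.get(lang, {}))
    return merged

print(thresholds_for("go"))
# {'cyclomatic_warning': 12, 'duplication_warning_pct': 8}
```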
Java / Kotlin
Mature tooling ecosystem. SonarQube strongest. Enterprise codebases tend toward higher complexity due to framework overhead.
Python
Dynamic typing means coverage matters more. Missing type hints compound maintenance cost. Lower complexity norm due to language expressiveness.
TypeScript / JavaScript
Frontend code tends to be undertested. React component complexity is often higher than it appears. Type coverage matters alongside line coverage.
Go
Language design enforces simplicity. Error handling inflates line counts. Coverage tooling is built in. Duplication tends to be higher due to explicit error handling.
Rust
Compiler catches many issues that other languages need tests for. Ownership model reduces certain bug classes. Coverage tooling is less mature.
C# / .NET
Similar profile to Java. Enterprise codebases can have high complexity from framework abstractions. Strong Roslyn analyser ecosystem.