From Cyclomatic Complexity to Code Quality: Building an Online Code Analyzer
During code reviews, I often encounter “spaghetti code” — functions with hundreds of lines and seven or eight nested if-else blocks. To quantify code quality, I found cyclomatic complexity to be a solid metric. So I built a tool and documented the implementation approach.
What is Cyclomatic Complexity?#
Cyclomatic complexity was introduced by Thomas McCabe in 1976 to measure code complexity. Simply put: the more branches in your code, the higher the complexity.
The formula:
Cyclomatic Complexity = Number of Branch Nodes + 1
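Branch nodes include: if, while, for, case, catch, ? : (ternary operator), &&, ||. A switch is counted through its case labels rather than as a branch of its own, otherwise every switch would be counted twice.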
Examples:
// Cyclomatic complexity = 1 (no branches)
function add(a, b) {
  return a + b
}
// Cyclomatic complexity = 2 (1 if)
function abs(n) {
  if (n < 0) return -n
  return n
}
// Cyclomatic complexity = 4 (3 branches: if + for + if)
function sumEven(arr) {
  let sum = 0
  if (arr.length === 0) return 0
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] % 2 === 0) {
      sum += arr[i]
    }
  }
  return sum
}
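The examples above only use statements. As a further illustration, here is a sketch where a logical operator and a ternary add branch nodes as well. The `classify` function is made up for this purpose and is not part of the analyzer.

```typescript
// Cyclomatic complexity = 4 (3 branch nodes: if + && + ternary)
function classify(n: number | null): string {
  if (n !== null && n >= 0) {
    return n === 0 ? 'zero' : 'positive'
  }
  return 'negative or missing'
}
```

Four inputs are enough to exercise every path here: `0`, a positive number, a negative number, and `null`.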
Why Does Cyclomatic Complexity Matter?#
Experience-based thresholds:
| Complexity | Risk Level | Recommendation |
|---|---|---|
| 1-5 | Low | Clear code, easy to maintain |
| 6-10 | Medium | Acceptable, but monitor |
| 11-20 | High | Refactor recommended, split functions |
| 21+ | Very High | Must refactor, hard to test |
High complexity code has several issues:
- High testing cost: N is the number of linearly independent paths through the function, so exercising each of them takes N test cases (N = cyclomatic complexity)
- Difficult maintenance: More branches mean higher cognitive load
- High bug risk: Each branch is a potential hiding spot for bugs
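The threshold table above maps directly onto a lookup function. This is a minimal sketch. The names `RiskAssessment` and `assessRisk` are assumptions of this example, and the cutoffs are simply the ones from the table.

```typescript
type RiskLevel = 'Low' | 'Medium' | 'High' | 'Very High'

interface RiskAssessment {
  level: RiskLevel
  recommendation: string
}

// Translate a cyclomatic complexity score into the table's risk bands
function assessRisk(complexity: number): RiskAssessment {
  if (complexity <= 5) {
    return { level: 'Low', recommendation: 'Clear code, easy to maintain' }
  }
  if (complexity <= 10) {
    return { level: 'Medium', recommendation: 'Acceptable, but monitor' }
  }
  if (complexity <= 20) {
    return { level: 'High', recommendation: 'Refactor recommended, split functions' }
  }
  return { level: 'Very High', recommendation: 'Must refactor, hard to test' }
}
```

For example, `assessRisk(4).level` is `'Low'` and `assessRisk(21).level` is `'Very High'`.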
Building a Code Analyzer#
1. Basic Statistics#
The simplest part:
interface CodeStats {
  lines: number       // Total lines
  characters: number  // Character count
  words: number       // Word count
  functions: number   // Function count
  complexity: number  // Cyclomatic complexity
}
function analyzeCode(code: string): CodeStats {
  const lines = code.split('\n').length
  const characters = code.length
  const words = code.split(/\s+/).filter(w => w.length > 0).length
  // Count functions (simple regex match)
  const functions = (code.match(/function\s+\w+|=>/g) || []).length
  // Calculate cyclomatic complexity
  const complexity = calculateComplexity(code)
  return { lines, characters, words, functions, complexity }
}
2. Calculating Cyclomatic Complexity#
The core algorithm:
function calculateComplexity(code: string): number {
  // Match branch nodes
  const branchPatterns = [
    /\bif\b/g,             // if statement (also covers else if)
    /\bwhile\b/g,          // while and do...while loops
    /\bfor\b/g,            // for, for...in and for...of loops
    /\bcase\b/g,           // each case label; switch itself is not counted
    /\bcatch\b/g,          // catch clause
    /(?<!\?)\?(?![?.:])/g, // ternary: a lone ?, skipping ?., ?? and TypeScript's ?:
    /&&/g,                 // logical AND
    /\|\|/g,               // logical OR
  ]
  let branchCount = 0
  for (const pattern of branchPatterns) {
    const matches = code.match(pattern)
    if (matches) {
      branchCount += matches.length
    }
  }
  // Cyclomatic complexity = branch count + 1
  return branchCount + 1
}
This implementation has a problem: regex matching will misidentify keywords in strings.
For example:
const message = "if you see this, it's not a branch"
// Regex matches "if", but it's just a string
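Strings are not the only source of miscounts. The `?` character also appears in optional chaining (`?.`), nullish coalescing (`??`, `??=`) and TypeScript optional members (`?:`), so a ternary pattern has to tell these apart. The sketch below is one heuristic way to do it. `countTernaries` is a name invented for this example, and the approach can still be fooled by unusual spacing such as `a?.5:1` or by a `?` inside a regex literal.

```typescript
// Count ternary operators by tokenizing every construct that starts with `?`.
// Longer alternatives come first, so `??=`, `??`, `?.` and `?:` consume their
// question marks before the bare `?` alternative gets a chance to match.
function countTernaries(code: string): number {
  const tokens = code.match(/\?\?=?|\?\.|\?:|\?/g) || []
  // Only a bare `?` is treated as the start of a ternary
  return tokens.filter(token => token === '?').length
}
```

With this, `countTernaries('a ? b : c')` counts one ternary, while `user?.name`, `a ?? b` and `name?: string` contribute nothing.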
3. More Accurate Approach: AST Parsing#
Using Babel to parse the AST (Abstract Syntax Tree) allows precise identification of actual branch nodes:
import { parse } from '@babel/parser'
import traverse from '@babel/traverse'
function calculateComplexityAST(code: string): number {
  const ast = parse(code, {
    sourceType: 'module',
    plugins: ['jsx', 'typescript']
  })
  let complexity = 1 // Base complexity
  traverse(ast, {
    // Conditional statements
    IfStatement() { complexity++ },
    ConditionalExpression() { complexity++ }, // Ternary operator
    // Loop statements
    WhileStatement() { complexity++ },
    DoWhileStatement() { complexity++ },
    ForStatement() { complexity++ },
    ForInStatement() { complexity++ },
    ForOfStatement() { complexity++ },
    // Switch: count each case label, but not default (its test is null)
    SwitchCase(path) {
      if (path.node.test !== null) {
        complexity++
      }
    },
    // Exception handling
    CatchClause() { complexity++ },
    // Logical operators
    LogicalExpression(path) {
      if (path.node.operator === '&&' || path.node.operator === '||') {
        complexity++
      }
    }
  })
  return complexity
}
The AST approach is more accurate but requires importing Babel in the browser (~1.5MB). For a simple tool, regex is sufficient.
Performance Optimization: Handling Large Files#
When users paste thousands of lines of code, real-time analysis can cause lag. Here are some optimizations:
1. Debounce Input#
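import { useEffect, useMemo, useState } from 'react'
import { debounce } from 'lodash-es'
function CodeAnalyzer() {
  const [code, setCode] = useState('')
  const [analysis, setAnalysis] = useState<CodeStats | null>(null)
  // Debounce: analyze only after the user stops typing for 300ms
  const debouncedAnalyze = useMemo(
    () => debounce((value: string) => {
      const result = analyzeCode(value)
      setAnalysis(result)
    }, 300),
    []
  )
  // Cancel any pending call when the component unmounts
  useEffect(() => () => debouncedAnalyze.cancel(), [debouncedAnalyze])
  const handleChange = (value: string) => {
    setCode(value)
    debouncedAnalyze(value)
  }
  // Rendering of `analysis` is omitted for brevity
  return <textarea value={code} onChange={e => handleChange(e.target.value)} />
}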
2. Web Worker for Async Computation#
Move analysis logic to a Web Worker to avoid UI blocking:
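// analyzer.worker.ts
// Assumes analyzeCode is exported from a local module; the path is illustrative
import { analyzeCode } from './analyzeCode'
self.onmessage = (e: MessageEvent<string>) => {
  const code = e.data
  const result = analyzeCode(code)
  self.postMessage(result)
}
// main.tsx
import { useEffect, useState } from 'react'
// Browsers cannot load a .ts file directly; bundlers such as Vite and webpack 5
// resolve and compile the worker when it is referenced through this URL form
const worker = new Worker(
  new URL('./analyzer.worker.ts', import.meta.url),
  { type: 'module' }
)
function CodeAnalyzer() {
  const [analysis, setAnalysis] = useState<CodeStats | null>(null)
  useEffect(() => {
    const onMessage = (e: MessageEvent<CodeStats>) => setAnalysis(e.data)
    worker.addEventListener('message', onMessage)
    // Detach the listener on unmount
    return () => worker.removeEventListener('message', onMessage)
  }, [])
  const handleChange = (code: string) => {
    worker.postMessage(code) // Analysis runs off the main thread
  }
  return <textarea onChange={e => handleChange(e.target.value)} />
}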
3. Incremental Analysis#
For very large files (100k+ lines), analyze only the recently modified parts:
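// Shape of the diff this sketch expects. computeDiff is a placeholder for
// whatever line-based diff routine you use; it is not defined here.
interface CodeDiff {
  removedText: string   // Text of the lines that were removed
  addedText: string     // Text of the lines that were added
  removedLines: number
  addedLines: number
}
declare function computeDiff(oldCode: string, newCode: string): CodeDiff
function incrementalAnalyze(
  oldCode: string,
  newCode: string,
  oldAnalysis: CodeStats
): CodeStats {
  // Find the diff
  const diff = computeDiff(oldCode, newCode)
  // Only recalculate changed parts; both calls include the +1 base, so it cancels out
  const removedComplexity = calculateComplexity(diff.removedText)
  const addedComplexity = calculateComplexity(diff.addedText)
  // characters, words and functions are carried over unchanged in this sketch
  return {
    ...oldAnalysis,
    complexity: oldAnalysis.complexity - removedComplexity + addedComplexity,
    lines: oldAnalysis.lines - diff.removedLines + diff.addedLines
  }
}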
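One simple way to produce such a diff is to trim the lines that the old and new text share at the start and at the end, and treat everything in between as changed. The sketch below is self-contained and makes several assumptions: `LineDiff`, `diffByLines`, `countBranches` and `updateComplexity` are names invented here, the branch pattern set is reduced for brevity, and the approach only works because no pattern spans a line break.

```typescript
interface LineDiff {
  removedText: string
  addedText: string
  removedLines: number
  addedLines: number
}

// Count branch keywords and operators in a chunk of text (without the +1 base)
function countBranches(text: string): number {
  const matches = text.match(/\b(?:if|while|for|case|catch)\b|&&|\|\|/g)
  return matches ? matches.length : 0
}

// Diff by trimming the common leading and trailing lines
function diffByLines(oldCode: string, newCode: string): LineDiff {
  const oldLines = oldCode.split('\n')
  const newLines = newCode.split('\n')
  // Skip the common prefix
  let start = 0
  const maxStart = Math.min(oldLines.length, newLines.length)
  while (start < maxStart && oldLines[start] === newLines[start]) start++
  // Skip the common suffix without running into the prefix
  let oldEnd = oldLines.length
  let newEnd = newLines.length
  while (oldEnd > start && newEnd > start && oldLines[oldEnd - 1] === newLines[newEnd - 1]) {
    oldEnd--
    newEnd--
  }
  const removed = oldLines.slice(start, oldEnd)
  const added = newLines.slice(start, newEnd)
  return {
    removedText: removed.join('\n'),
    addedText: added.join('\n'),
    removedLines: removed.length,
    addedLines: added.length
  }
}

// Adjust a previously computed complexity using only the changed lines
function updateComplexity(oldComplexity: number, oldCode: string, newCode: string): number {
  const diff = diffByLines(oldCode, newCode)
  return oldComplexity - countBranches(diff.removedText) + countBranches(diff.addedText)
}
```

This handles a single contiguous edit cheaply, which is the common case while typing. Edits in several places at once make the changed region span everything between them, so the saving shrinks accordingly.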
Edge Cases I Encountered#
1. Keywords in Comments#
// This is a comment with if and while
/*
for loop example
*/
The regex also matches keywords inside comments. One fix is to strip comments before analysis:
function removeComments(code: string): string {
  // Remove single-line comments
  // Caveat: this also cuts at a // inside a string, such as a URL
  code = code.replace(/\/\/.*$/gm, '')
  // Remove multi-line comments
  code = code.replace(/\/\*[\s\S]*?\*\//g, '')
  return code
}
2. Keywords in Strings#
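const query = "SELECT * FROM users WHERE if = true"
const message = 'Press Enter for the next case'
Same issue: the regex counts the if, for and case inside these string literals. The AST approach avoids the problem because string contents are never parsed as statements.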
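If pulling in a parser is not an option, a middle ground is a single-pass scanner that blanks out comments and string contents before the keyword patterns run. The following is a sketch under stated limitations: `maskNonCode` is a name invented here, code inside template literal `${...}` expressions is blanked along with the rest of the template, and regex literals containing quotes or `//` will confuse it.

```typescript
// Replace comments and string contents with spaces so that keyword patterns
// only see real code. Newlines are preserved, so line numbers stay the same.
function maskNonCode(code: string): string {
  let out = ''
  let i = 0
  while (i < code.length) {
    const ch = code[i]
    const next = code[i + 1]
    if (ch === '/' && next === '/') {
      // Line comment: blank everything up to (not including) the newline
      while (i < code.length && code[i] !== '\n') {
        out += ' '
        i++
      }
    } else if (ch === '/' && next === '*') {
      // Block comment: blank through the closing */ or to end of input
      const end = code.indexOf('*/', i + 2)
      const stop = end === -1 ? code.length : end + 2
      for (; i < stop; i++) {
        out += code[i] === '\n' ? '\n' : ' '
      }
    } else if (ch === '"' || ch === "'" || ch === '`') {
      // String literal: keep the quotes, blank the contents
      const quote = ch
      out += quote
      i++
      while (i < code.length && code[i] !== quote) {
        if (code[i] === '\\' && i + 1 < code.length) {
          // Blank the backslash; the escaped character is blanked just below
          out += ' '
          i++
        }
        out += code[i] === '\n' ? '\n' : ' '
        i++
      }
      if (i < code.length) {
        out += quote
        i++
      }
    } else {
      out += ch
      i++
    }
  }
  return out
}
```

Running the keyword patterns on `maskNonCode(code)` instead of `code` removes both false positives discussed above, and a `//` inside a string no longer truncates the line.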
3. Nested Functions#
function outer() {
  function inner() {
    if (condition) { // Which function does this if belong to?
      // ...
    }
  }
}
If you need to calculate complexity per function, maintain a scope stack during AST traversal:
function calculateFunctionComplexities(code: string): Map<string, number> {
  const ast = parse(code, {
    sourceType: 'module',
    plugins: ['jsx', 'typescript']
  })
  const complexities = new Map<string, number>()
  const scopeStack: string[] = []
  traverse(ast, {
    // Babel takes enter and exit handlers as an object per node type
    FunctionDeclaration: {
      enter(path) {
        const name = path.node.id?.name || 'anonymous'
        scopeStack.push(name)
        complexities.set(name, 1) // Base complexity
      },
      exit() {
        scopeStack.pop()
      }
    },
    // Only if statements are shown; other branch nodes follow the same pattern
    IfStatement() {
      // The innermost function on the stack owns this branch
      const currentFunction = scopeStack[scopeStack.length - 1]
      if (currentFunction) {
        complexities.set(
          currentFunction,
          (complexities.get(currentFunction) || 1) + 1
        )
      }
    }
  })
  return complexities
}
The Result#
Based on these ideas, I built: Code Analyzer
Features:
- Real-time code statistics (lines, characters, functions)
- Cyclomatic complexity calculation with recommendations
- Estimated code reading time
- Support for multiple programming languages
Code analysis isn’t complex, but getting the details right takes effort. Hope this helps.