From Cyclomatic Complexity to Code Quality: Building an Online Code Analyzer

During code reviews, I often encounter “spaghetti code” — functions with hundreds of lines and seven or eight nested if-else blocks. To quantify code quality, I found cyclomatic complexity to be a solid metric. So I built a tool and documented the implementation approach.

What is Cyclomatic Complexity?

Cyclomatic complexity was introduced by Thomas McCabe in 1976 to measure code complexity. Simply put: the more branches in your code, the higher the complexity.

The formula:

Cyclomatic Complexity = Number of Branch Nodes + 1

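Branch nodes include: if, while, for, case, catch, ? : (ternary operator), &&, ||. A switch adds nothing by itself; each case label inside it counts as one branch, and default is not counted.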

Examples:

// Cyclomatic complexity = 1 (no branches)
function add(a, b) {
  return a + b
}

// Cyclomatic complexity = 2 (1 if)
function abs(n) {
  if (n < 0) return -n
  return n
}

// Cyclomatic complexity = 4 (3 branches: if + for + if)
function sumEven(arr) {
  let sum = 0
  if (arr.length === 0) return 0
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] % 2 === 0) {
      sum += arr[i]
    }
  }
  return sum
}
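The "branches + 1" rule is a shortcut for McCabe's graph-based definition, which is stated on a function's control-flow graph. As a worked check against the abs example above, assuming the graph is drawn with one node each for the condition, the two return statements, and a shared exit:

```latex
M = E - N + 2P
% E = number of edges, N = number of nodes,
% P = number of connected components (1 for a single function)

% abs: N = 4 (if, return -n, return n, exit)
%      E = 4 (if -> return -n, if -> return n, and each return -> exit)
M_{\mathrm{abs}} = 4 - 4 + 2 \cdot 1 = 2
```

Both ways of counting give 2 for abs, which is why counting branch keywords is usually enough in practice.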

Why Does Cyclomatic Complexity Matter?

Experience-based thresholds:

Complexity	Risk Level	Recommendation
1-5	Low	Clear code, easy to maintain
6-10	Medium	Acceptable, but monitor
11-20	High	Refactor recommended, split functions
21+	Very High	Must refactor, hard to test
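The table translates directly into a lookup. A minimal sketch, assuming the thresholds above are used as-is (the assessRisk name and the RiskAssessment shape are made up for this example):

```typescript
// Hypothetical helper: maps a complexity score to the risk bands in the table.
type Risk = 'Low' | 'Medium' | 'High' | 'Very High'

interface RiskAssessment {
  risk: Risk
  recommendation: string
}

function assessRisk(complexity: number): RiskAssessment {
  // Cyclomatic complexity starts at 1 (a function with no branches)
  if (!Number.isInteger(complexity) || complexity < 1) {
    throw new RangeError('Cyclomatic complexity must be an integer >= 1')
  }
  if (complexity <= 5) {
    return { risk: 'Low', recommendation: 'Clear code, easy to maintain' }
  }
  if (complexity <= 10) {
    return { risk: 'Medium', recommendation: 'Acceptable, but monitor' }
  }
  if (complexity <= 20) {
    return { risk: 'High', recommendation: 'Refactor recommended, split functions' }
  }
  return { risk: 'Very High', recommendation: 'Must refactor, hard to test' }
}
```

The thresholds are rules of thumb, so a real tool would probably make them configurable.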

High-complexity code has several issues:

High testing cost: a function with cyclomatic complexity N has N linearly independent paths, so exercising each of them takes N test cases
Difficult maintenance: More branches mean higher cognitive load
High bug risk: Each branch is a potential hiding spot for bugs
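To make the testing cost concrete, here is a small illustration. The classify function is made up for this example: it has two if statements, so its complexity is 3, and the table of cases below has one input per path.

```typescript
// Hypothetical example function: complexity = 3 (2 ifs + 1)
function classify(n: number): 'negative' | 'zero' | 'positive' {
  if (n < 0) return 'negative'
  if (n === 0) return 'zero'
  return 'positive'
}

// One test input per independent path through classify
const pathCases: Array<[number, string]> = [
  [-5, 'negative'], // first if taken
  [0, 'zero'],      // first if skipped, second if taken
  [7, 'positive'],  // both ifs skipped
]

for (const [input, expected] of pathCases) {
  const actual = classify(input)
  if (actual !== expected) {
    throw new Error(`classify(${input}) returned ${actual}, expected ${expected}`)
  }
}
```

Every extra branch adds another row to that table, which is where the cost comes from.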

Building a Code Analyzer

1. Basic Statistics

The simplest part:

interface CodeStats {
  lines: number          // Total lines
  characters: number     // Character count
  words: number          // Word count
  functions: number      // Function count
  complexity: number     // Cyclomatic complexity
}

function analyzeCode(code: string): CodeStats {
  const lines = code.split('\n').length
  const characters = code.length
  const words = code.split(/\s+/).filter(w => w.length > 0).length
  
  // Count functions (rough: named declarations and arrow functions only;
  // methods and anonymous function expressions are missed)
  const functions = (code.match(/function\s+\w+|=>/g) || []).length
  
  // Calculate cyclomatic complexity
  const complexity = calculateComplexity(code)
  
  return { lines, characters, words, functions, complexity }
}

2. Calculating Cyclomatic Complexity

The core algorithm:

function calculateComplexity(code: string): number {
  // Match branch nodes
  const branchPatterns = [
    /\bif\b/g,                 // if statement (also covers "else if")
    /\bwhile\b/g,              // while and do...while loops
    /\bfor\b/g,                // for, for...in, for...of loops
    /\bcase\b/g,               // each case label (switch itself adds no path)
    /\bcatch\b/g,              // catch clause
    /(?<!\?)\?(?![?.:])/g,     // ternary "?", skipping "??", "?." and TS "x?:"
    /&&/g,                     // logical AND
    /\|\|/g,                   // logical OR
  ]
  
  let branchCount = 0
  
  for (const pattern of branchPatterns) {
    const matches = code.match(pattern)
    if (matches) {
      branchCount += matches.length
    }
  }
  
  // Cyclomatic complexity = branch count + 1
  return branchCount + 1
}

This implementation has a problem: regex matching will misidentify keywords in strings.

For example:

const message = "if you see this, it's not a branch"
// Regex matches "if", but it's just a string

3. More Accurate Approach: AST Parsing

Using Babel to parse the AST (Abstract Syntax Tree) allows precise identification of actual branch nodes:

import { parse } from '@babel/parser'
import traverse from '@babel/traverse'

function calculateComplexityAST(code: string): number {
  const ast = parse(code, {
    sourceType: 'module',
    plugins: ['jsx', 'typescript']
  })
  
  let complexity = 1  // Base complexity
  
  traverse(ast, {
    // Conditional statements
    IfStatement() { complexity++ },
    ConditionalExpression() { complexity++ },  // Ternary operator
    
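    // Loop statements
    WhileStatement() { complexity++ },
    DoWhileStatement() { complexity++ },
    ForStatement() { complexity++ },
    ForInStatement() { complexity++ },
    ForOfStatement() { complexity++ },
    
    // Switch: count each case label, but not default (it has no test)
    SwitchCase(path) {
      if (path.node.test) complexity++
    },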
    
    // Exception handling
    CatchClause() { complexity++ },
    
    // Logical operators
    LogicalExpression(path) {
      if (path.node.operator === '&&' || path.node.operator === '||') {
        complexity++
      }
    }
  })
  
  return complexity
}

The AST approach is more accurate but requires importing Babel in the browser (~1.5MB). For a simple tool, regex is sufficient.
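One practical wrinkle with the AST route is that a parser such as @babel/parser throws on code that is not syntactically valid, and pasted snippets are often incomplete. A minimal sketch of combining both approaches, assuming the regex version is acceptable as a fallback (withFallback and the result shape are made up for this example):

```typescript
type ComplexityFn = (code: string) => number

interface ComplexityResult {
  complexity: number
  method: 'ast' | 'regex' // which analyzer produced the number
}

// Hypothetical wrapper: try the precise analyzer first and fall back to the
// approximate one if it throws (for example on a syntax error).
function withFallback(precise: ComplexityFn, fallback: ComplexityFn) {
  return (code: string): ComplexityResult => {
    try {
      return { complexity: precise(code), method: 'ast' }
    } catch {
      return { complexity: fallback(code), method: 'regex' }
    }
  }
}
```

With the functions from this post it would be wired up as withFallback(calculateComplexityAST, calculateComplexity). Reporting the method lets the UI flag when the number is only an estimate.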

Performance Optimization: Handling Large Files

When users paste thousands of lines of code, real-time analysis can cause lag. Here are some optimizations:

1. Debounce Input

import { useEffect, useMemo, useState } from 'react'
import { debounce } from 'lodash-es'

function CodeAnalyzer() {
  const [code, setCode] = useState('')
  const [analysis, setAnalysis] = useState<CodeStats | null>(null)
  
  // Debounce: analyze only after user stops typing for 300ms
  const debouncedAnalyze = useMemo(
    () => debounce((value: string) => {
      setAnalysis(analyzeCode(value))
    }, 300),
    []
  )
  
  // Cancel any pending call when the component unmounts
  useEffect(() => () => debouncedAnalyze.cancel(), [debouncedAnalyze])
  
  const handleChange = (value: string) => {
    setCode(value)
    debouncedAnalyze(value)
  }
  
  return <textarea value={code} onChange={e => handleChange(e.target.value)} />
}

2. Web Worker for Async Computation

Move analysis logic to a Web Worker to avoid UI blocking:

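// analyzer.worker.ts
import { analyzeCode } from './analyzeCode'  // assumed module path

self.onmessage = (e: MessageEvent<string>) => {
  self.postMessage(analyzeCode(e.data))
}

// main.tsx
import { useEffect, useState } from 'react'

// A browser cannot load a raw .ts file; bundlers such as Vite and webpack 5
// recognize this URL form and compile the worker
const worker = new Worker(
  new URL('./analyzer.worker.ts', import.meta.url),
  { type: 'module' }
)

function CodeAnalyzer() {
  const [analysis, setAnalysis] = useState<CodeStats | null>(null)
  
  useEffect(() => {
    worker.onmessage = (e: MessageEvent<CodeStats>) => {
      setAnalysis(e.data)
    }
    // Detach the handler on unmount
    return () => { worker.onmessage = null }
  }, [])
  
  const handleChange = (code: string) => {
    worker.postMessage(code)  // Async computation
  }
  
  return <textarea onChange={e => handleChange(e.target.value)} />
}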
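A single worker answers messages in order, so the final state ends up correct, but every intermediate result still gets rendered even when newer input is already queued. A minimal sketch of a "latest request wins" guard, assuming each message carries a request id that the worker echoes back (createLatestOnly and the id field are made up for this example):

```typescript
// Hypothetical helper: hands out increasing request ids and reports whether
// a given id still belongs to the most recent request.
function createLatestOnly() {
  let latestId = 0
  return {
    // Call before posting a message, and send the returned id along with it
    next(): number {
      latestId += 1
      return latestId
    },
    // Call when a response arrives; false means newer input is already pending
    isLatest(id: number): boolean {
      return id === latestId
    },
  }
}
```

In the component, handleChange would post { id: tracker.next(), code }, the worker would return the same id with its result, and onmessage would call setAnalysis only when tracker.isLatest(id) is true.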

3. Incremental Analysis

For very large files (100k+ lines), analyze only the recently modified parts:

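// computeDiff is an assumed helper (not shown here) that returns the text and
// line counts of the removed and added regions
function incrementalAnalyze(
  oldCode: string,
  newCode: string,
  oldAnalysis: CodeStats
): CodeStats {
  // Find the diff
  const diff = computeDiff(oldCode, newCode)
  
  // Only recalculate changed parts; the "+ 1" in each result cancels out
  const removedComplexity = calculateComplexity(diff.removedText)
  const addedComplexity = calculateComplexity(diff.addedText)
  
  return {
    ...oldAnalysis,  // words and functions are not updated in this sketch
    characters: newCode.length,
    complexity: oldAnalysis.complexity - removedComplexity + addedComplexity,
    lines: oldAnalysis.lines - diff.removedLines + diff.addedLines
  }
}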
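The computeDiff helper is not shown in this post. One simple way it might look, assuming a line-based diff that only trims the common leading and trailing lines and treats everything in between as changed (the field names here are illustrative):

```typescript
interface LineDiff {
  removedText: string   // lines that exist only in the old code
  addedText: string     // lines that exist only in the new code
  removedLines: number
  addedLines: number
}

// Sketch of a cheap diff: skip the shared prefix and suffix, line by line.
function computeDiff(oldCode: string, newCode: string): LineDiff {
  const a = oldCode.split('\n')
  const b = newCode.split('\n')

  // Length of the common prefix
  let start = 0
  while (start < a.length && start < b.length && a[start] === b[start]) {
    start++
  }

  // Length of the common suffix, never overlapping the prefix
  let end = 0
  while (
    end < a.length - start &&
    end < b.length - start &&
    a[a.length - 1 - end] === b[b.length - 1 - end]
  ) {
    end++
  }

  const removed = a.slice(start, a.length - end)
  const added = b.slice(start, b.length - end)
  return {
    removedText: removed.join('\n'),
    addedText: added.join('\n'),
    removedLines: removed.length,
    addedLines: added.length,
  }
}
```

This works well for the common case of typing in one place. Edits in two distant spots make the changed region span everything between them, so a real diff library would be the better choice there.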

Edge Cases I Encountered

1. Keywords in Comments

// This is a comment with if and while
/* 
  for loop example
*/

Regex matches keywords in comments. Solution: remove comments before analysis.

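function removeComments(code: string): string {
  // One pass: string literals are matched first and kept, so a "//" inside
  // a string (e.g. "http://example.com") is not mistaken for a comment.
  // Regex literals and template interpolations are not handled.
  const pattern =
    /("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\[\s\S]|[^`\\])*`)|\/\/.*$|\/\*[\s\S]*?\*\//gm
  return code.replace(pattern, (match, str) => (str !== undefined ? match : ''))
}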

2. Keywords in Strings

const query = "SELECT * FROM users WHERE if = true"
const message = 'Press Enter to continue'

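Same issue: the regex counts the if inside the SQL string as a branch. The AST approach avoids the problem because string contents are never parsed as statements.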
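If pulling in a parser is not an option, a rough mitigation for the regex version is to blank out string contents before counting. A minimal sketch, assuming only single-line quoted strings matter and comments have already been removed (blankStrings is made up for this example; template literals and regex literals are not handled):

```typescript
// Hypothetical helper: replaces the contents of '...' and "..." literals with
// spaces, keeping the quotes and the overall length so positions do not shift.
function blankStrings(code: string): string {
  // Each alternative allows escaped characters and stops at a line break
  const literal = /"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'/g
  return code.replace(literal, (m) => {
    const quote = m.charAt(0)
    return quote + ' '.repeat(m.length - 2) + quote
  })
}

const sample = 'const query = "SELECT * FROM users WHERE if = true"'
const cleaned = blankStrings(sample)
// The keyword inside the string is gone, the code around it is untouched
const stillHasIf = /\bif\b/.test(cleaned)
```

Running this before the branch-counting regexes removes the false positives from strings while leaving real keywords in place.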

3. Nested Functions

function outer() {
  function inner() {
    if (condition) {  // Which function does this if belong to?
      // ...
    }
  }
}

If you need to calculate complexity per function, maintain a scope stack during AST traversal:

function calculateFunctionComplexities(code: string): Map<string, number> {
  const ast = parse(code, {
    sourceType: 'module',
    plugins: ['jsx', 'typescript']
  })
  const complexities = new Map<string, number>()
  const scopeStack: string[] = []
  
  traverse(ast, {
    // Babel visitors take an { enter, exit } pair per node type
    FunctionDeclaration: {
      enter(path) {
        const name = path.node.id?.name || 'anonymous'
        scopeStack.push(name)
        complexities.set(name, 1)
      },
      exit() {
        scopeStack.pop()
      }
    },
    // Other branch nodes (loops, case, catch, ...) follow the same pattern
    IfStatement() {
      const currentFunction = scopeStack[scopeStack.length - 1]
      if (currentFunction) {
        complexities.set(
          currentFunction,
          (complexities.get(currentFunction) || 1) + 1
        )
      }
    }
  })
  
  return complexities
}

The Result

Based on these ideas, I built: Code Analyzer

Features:

Real-time code statistics (lines, characters, functions)
Cyclomatic complexity calculation with recommendations
Estimated code reading time
Support for multiple programming languages

Code analysis isn’t complex, but getting the details right takes effort. Hope this helps.
