From Cyclomatic Complexity to Code Quality: Building an Online Code Analyzer

During code reviews, I often encounter “spaghetti code”: functions hundreds of lines long, with seven or eight levels of nested if-else blocks. To quantify code quality, I settled on cyclomatic complexity as a solid metric, built a tool around it, and documented the implementation approach here.

What is Cyclomatic Complexity?

Cyclomatic complexity was introduced by Thomas McCabe in 1976 to measure code complexity. Simply put: the more branches in your code, the higher the complexity.

The formula:

Cyclomatic Complexity = Number of Branch Nodes + 1

Branch nodes include: if, while, for, switch, case, catch, ? : (ternary operator), &&, ||.
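
The "+ 1" in the formula comes from McCabe's graph definition. For a control-flow graph with $E$ edges, $N$ nodes, and $P$ connected components ($P = 1$ for a single function), the complexity is $M = E - N + 2P$; for structured code with one entry and one exit this reduces to the branch-count formula above. As a worked check, assume a function with a single if that chooses between two return statements, modeled as four nodes (the condition, the two returns, the exit) and four edges:

```latex
M = E - N + 2P
\qquad\Rightarrow\qquad
M = 4 - 4 + 2 \cdot 1 = 2
```

That matches "1 branch + 1".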

Examples:

// Cyclomatic complexity = 1 (no branches)
function add(a, b) {
  return a + b
}

// Cyclomatic complexity = 2 (1 if)
function abs(n) {
  if (n < 0) return -n
  return n
}

// Cyclomatic complexity = 4 (3 branches: if + for + if)
function sumEven(arr) {
  let sum = 0
  if (arr.length === 0) return 0
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] % 2 === 0) {
      sum += arr[i]
    }
  }
  return sum
}
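
The examples above only use if and for. As a small supplementary sketch (the function names here are made up for illustration), this is how the logical operators and the ternary operator from the branch list would be counted:

```typescript
// Cyclomatic complexity = 4 (3 branches: if + || + &&)
function canEdit(isAdmin: boolean, isOwner: boolean, isLocked: boolean): boolean {
  if (isAdmin || (isOwner && !isLocked)) return true
  return false
}

// Cyclomatic complexity = 3 (2 branches: two ternary operators)
function sign(n: number): number {
  return n > 0 ? 1 : n < 0 ? -1 : 0
}
```

Each && or || adds a branch because short-circuit evaluation can skip the right-hand operand.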

Why Does Cyclomatic Complexity Matter?

Experience-based thresholds:

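| Complexity | Risk Level | Recommendation                        |
|------------|------------|---------------------------------------|
| 1-5        | Low        | Clear code, easy to maintain          |
| 6-10       | Medium     | Acceptable, but monitor               |
| 11-20      | High       | Refactor recommended, split functions |
| 21+        | Very High  | Must refactor, hard to test           |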
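
In a tool, these thresholds can be turned into a small lookup. The sketch below is one possible encoding; the names `RiskAssessment` and `assessRisk` are illustrative and not taken from the analyzer itself:

```typescript
type RiskLevel = 'Low' | 'Medium' | 'High' | 'Very High'

interface RiskAssessment {
  level: RiskLevel
  recommendation: string
}

// Maps a cyclomatic complexity value to the risk bands listed above.
function assessRisk(complexity: number): RiskAssessment {
  if (!Number.isInteger(complexity) || complexity < 1) {
    throw new RangeError('complexity must be an integer >= 1')
  }
  if (complexity <= 5) {
    return { level: 'Low', recommendation: 'Clear code, easy to maintain' }
  }
  if (complexity <= 10) {
    return { level: 'Medium', recommendation: 'Acceptable, but monitor' }
  }
  if (complexity <= 20) {
    return { level: 'High', recommendation: 'Refactor recommended, split functions' }
  }
  return { level: 'Very High', recommendation: 'Must refactor, hard to test' }
}
```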

High-complexity code has several problems:

  1. High testing cost: A function with cyclomatic complexity N has N linearly independent paths, so basis-path testing needs N test cases (N is also an upper bound on the tests needed for full branch coverage)
  2. Difficult maintenance: More branches mean higher cognitive load
  3. High bug risk: Each branch is a potential hiding spot for bugs
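
To make the testing-cost point concrete, here is a sketch with a hypothetical `shippingCost` function of complexity 4. The test cases follow the usual basis-path recipe: one baseline case, then one extra case per decision that flips only that decision.

```typescript
// Hypothetical function: complexity = 4 (3 branches: if + ternary + if)
function shippingCost(weightKg: number, express: boolean): number {
  if (weightKg <= 0) throw new RangeError('weight must be positive')
  let cost = weightKg <= 1 ? 5 : 5 + (weightKg - 1) * 2
  if (express) cost *= 2
  return cost
}

interface Case {
  weightKg: number
  express: boolean
  expected: number | 'throws'
}

// Four cases for four independent paths
const cases: Case[] = [
  { weightKg: 1, express: false, expected: 5 },        // baseline path
  { weightKg: 0, express: false, expected: 'throws' }, // flips the guard
  { weightKg: 3, express: false, expected: 9 },        // flips the ternary
  { weightKg: 1, express: true, expected: 10 },        // flips the express check
]

// Returns true when the function behaves as the case expects
function passes(c: Case): boolean {
  try {
    return shippingCost(c.weightKg, c.express) === c.expected
  } catch {
    return c.expected === 'throws'
  }
}
```

Every added branch in `shippingCost` would add one more row to `cases`, which is why the cost grows with complexity.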

Building a Code Analyzer

1. Basic Statistics

The simplest part:

interface CodeStats {
  lines: number          // Total lines
  characters: number     // Character count
  words: number          // Word count
  functions: number      // Function count
  complexity: number     // Cyclomatic complexity
}

function analyzeCode(code: string): CodeStats {
  const lines = code.split('\n').length
  const characters = code.length
  const words = code.split(/\s+/).filter(w => w.length > 0).length
  
  // Count functions (simple regex match)
  const functions = (code.match(/function\s+\w+|=>/g) || []).length
  
  // Calculate cyclomatic complexity
  const complexity = calculateComplexity(code)
  
  return { lines, characters, words, functions, complexity }
}

2. Calculating Cyclomatic Complexity

The core algorithm:

function calculateComplexity(code: string): number {
  // Match branch nodes
  const branchPatterns = [
    /\bif\b/g,           // if statement
    /\bwhile\b/g,        // while loop
    /\bfor\b/g,          // for loop
    /\bswitch\b/g,       // switch statement
    /\bcase\b/g,         // case branch
    /\bcatch\b/g,        // catch exception
    /(?<!\?)\?(?![?.:])/g, // ternary "?" (skips ?., ?? and TypeScript's ?:)
    /&&/g,               // logical AND
    /\|\|/g,             // logical OR
  ]
  
  let branchCount = 0
  
  for (const pattern of branchPatterns) {
    const matches = code.match(pattern)
    if (matches) {
      branchCount += matches.length
    }
  }
  
  // Cyclomatic complexity = branch count + 1
  return branchCount + 1
}

This implementation has a problem: a regex cannot tell code from text, so it also counts keywords that appear inside string literals.

For example:

const message = "if you see this, it's not a branch"
// Regex matches "if", but it's just a string

3. More Accurate Approach: AST Parsing

Using Babel to parse the AST (Abstract Syntax Tree) allows precise identification of actual branch nodes:

import { parse } from '@babel/parser'
import traverse from '@babel/traverse'

function calculateComplexityAST(code: string): number {
  const ast = parse(code, {
    sourceType: 'module',
    plugins: ['jsx', 'typescript']
  })
  
  let complexity = 1  // Base complexity
  
  traverse(ast, {
    // Conditional statements
    IfStatement() { complexity++ },
    ConditionalExpression() { complexity++ },  // Ternary operator
    
    // Loop statements
    WhileStatement() { complexity++ },
    DoWhileStatement() { complexity++ },
    ForStatement() { complexity++ },
    ForInStatement() { complexity++ },
    ForOfStatement() { complexity++ },
    
    // Switch
    SwitchCase() { complexity++ },
    
    // Exception handling
    CatchClause() { complexity++ },
    
    // Logical operators
    LogicalExpression(path) {
      if (path.node.operator === '&&' || path.node.operator === '||') {
        complexity++
      }
    }
  })
  
  return complexity
}

The AST approach is more accurate but requires importing Babel in the browser (~1.5MB). For a simple tool, regex is sufficient.

Performance Optimization: Handling Large Files

When users paste thousands of lines of code, real-time analysis can cause lag. Here are some optimizations:

1. Debounce Input

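import { useEffect, useMemo, useState } from 'react'
import { debounce } from 'lodash-es'

function CodeAnalyzer() {
  const [code, setCode] = useState('')
  const [analysis, setAnalysis] = useState<CodeStats | null>(null)

  // Debounce: analyze only after the user stops typing for 300ms.
  // useMemo keeps the same debounced function across renders.
  const debouncedAnalyze = useMemo(
    () => debounce((value: string) => {
      setAnalysis(analyzeCode(value))
    }, 300),
    []
  )

  // Cancel any pending analysis when the component unmounts
  useEffect(() => () => debouncedAnalyze.cancel(), [debouncedAnalyze])

  const handleChange = (value: string) => {
    setCode(value)
    debouncedAnalyze(value)
  }

  // Render `analysis` next to the textarea as needed
  return <textarea value={code} onChange={e => handleChange(e.target.value)} />
}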
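
For readers who would rather not pull in lodash for one helper, the idea behind debounce fits in a few lines. This is a minimal sketch of the technique, not lodash's implementation (it has no leading or maxWait options):

```typescript
// Returns a wrapper that delays calling `fn` until `waitMs` milliseconds
// have passed without another call. Only the last call's arguments are used.
function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  waitMs: number
) {
  let timer: ReturnType<typeof setTimeout> | undefined

  const debounced = (...args: Args): void => {
    // A new call replaces the pending one
    if (timer !== undefined) clearTimeout(timer)
    timer = setTimeout(() => {
      timer = undefined
      fn(...args)
    }, waitMs)
  }

  // Drops the pending call, if any (useful on component unmount)
  const cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer)
    timer = undefined
  }

  return Object.assign(debounced, { cancel })
}
```

Each keystroke resets the timer, so the analysis runs once per pause in typing rather than once per character.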

2. Web Worker for Async Computation

Move analysis logic to a Web Worker to avoid UI blocking:

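// analyzer.worker.ts
// Assumes analyzeCode is exported from a sibling module
import { analyzeCode } from './analyzeCode'

self.onmessage = (e: MessageEvent<string>) => {
  self.postMessage(analyzeCode(e.data))
}

// main.tsx
import { useEffect, useRef, useState } from 'react'

function CodeAnalyzer() {
  const [analysis, setAnalysis] = useState<CodeStats | null>(null)
  const workerRef = useRef<Worker | null>(null)

  useEffect(() => {
    // Browsers cannot load .ts directly; bundlers such as Vite and
    // webpack 5 recognize this URL pattern and compile the worker file
    const worker = new Worker(
      new URL('./analyzer.worker.ts', import.meta.url),
      { type: 'module' }
    )
    worker.onmessage = (e: MessageEvent<CodeStats>) => setAnalysis(e.data)
    workerRef.current = worker
    return () => worker.terminate()  // Stop the worker on unmount
  }, [])

  const handleChange = (code: string) => {
    workerRef.current?.postMessage(code)  // Analysis runs off the main thread
  }

  return <textarea onChange={e => handleChange(e.target.value)} />
}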
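
One detail worth considering with a worker: the user may type again before the previous result arrives, in which case the UI can briefly show numbers for outdated text. A common remedy is to tag each request with an id and ignore responses that are not the latest. The helper below is a sketch of that idea; `createLatestOnly` and the message shapes are assumptions for illustration, and the worker would need to echo the `id` back with its result.

```typescript
interface AnalysisRequest {
  id: number
  code: string
}

interface AnalysisResponse<T> {
  id: number
  result: T
}

// Tracks the most recent request id and drops responses to older requests.
function createLatestOnly<T>(onResult: (result: T) => void) {
  let latestId = 0

  return {
    // Build the message to post to the worker
    nextRequest(code: string): AnalysisRequest {
      latestId += 1
      return { id: latestId, code }
    },
    // Call from worker.onmessage; returns false when the response is stale
    handleResponse(response: AnalysisResponse<T>): boolean {
      if (response.id !== latestId) return false
      onResult(response.result)
      return true
    },
  }
}
```

In the component, `handleChange` would post `tracker.nextRequest(code)` and the `onmessage` handler would pass `e.data` to `tracker.handleResponse`.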

3. Incremental Analysis

For very large files (100k+ lines), analyze only the recently modified parts:

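function incrementalAnalyze(
  oldCode: string,
  newCode: string,
  oldAnalysis: CodeStats
): CodeStats {
  const oldLines = oldCode.split('\n')
  const newLines = newCode.split('\n')

  // Find the diff: skip the unchanged lines at the start and end
  const shorter = Math.min(oldLines.length, newLines.length)
  let start = 0
  while (start < shorter && oldLines[start] === newLines[start]) start++
  let end = 0
  while (
    end < shorter - start &&
    oldLines[oldLines.length - 1 - end] === newLines[newLines.length - 1 - end]
  ) end++

  // Only re-analyze the changed region. This works for the regex counters
  // because they look at one line at a time.
  const removed = analyzeCode(oldLines.slice(start, oldLines.length - end).join('\n'))
  const added = analyzeCode(newLines.slice(start, newLines.length - end).join('\n'))

  return {
    lines: newLines.length,
    characters: newCode.length,
    words: oldAnalysis.words - removed.words + added.words,
    functions: oldAnalysis.functions - removed.functions + added.functions,
    // The "+ 1" in each complexity value cancels out in the difference
    complexity: oldAnalysis.complexity - removed.complexity + added.complexity
  }
}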

Edge Cases I Encountered

1. Keywords in Comments

// This is a comment with if and while
/* 
  for loop example
*/

Regex matches keywords in comments. Solution: remove comments before analysis.

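function removeComments(code: string): string {
  // Match single-line and multi-line comments in one pass, so whichever
  // comment starts first wins (a "//" inside a block comment is ignored).
  // Caveat: this still strips "//" inside strings such as "http://example.com".
  return code.replace(/\/\/.*$|\/\*[\s\S]*?\*\//gm, '')
}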

2. Keywords in Strings

const query = "SELECT * FROM users WHERE if = true"
const message = 'Press Enter to continue'

Same issue: the regex counts the if inside the SQL string. The AST approach avoids this naturally, because string contents are never parsed as statements.
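
If pulling in a parser is not an option, a middle ground is a small hand-written scanner that blanks out string contents and comments before the regex pass. The sketch below is an assumption-laden simplification, not a tokenizer: it treats template literals as plain strings (so code inside `${...}` is dropped) and does not recognize regex literals or JSX text.

```typescript
// Removes comments and empties string literals, keeping newlines so that
// line counts stay the same.
function stripStringsAndComments(code: string): string {
  let out = ''
  let i = 0
  while (i < code.length) {
    const ch = code[i]
    const next = code[i + 1]
    if (ch === '/' && next === '/') {
      // Line comment: skip to the end of the line (the newline is kept)
      while (i < code.length && code[i] !== '\n') i++
    } else if (ch === '/' && next === '*') {
      // Block comment: skip to the closing */
      i += 2
      while (i < code.length && !(code[i] === '*' && code[i + 1] === '/')) {
        if (code[i] === '\n') out += '\n'
        i++
      }
      i += 2
    } else if (ch === '"' || ch === "'" || ch === '`') {
      // String literal: skip to the matching unescaped quote
      const quote = ch
      i++
      while (i < code.length && code[i] !== quote) {
        if (code[i] === '\\') i++ // step over the backslash
        if (code[i] === '\n') out += '\n'
        i++
      }
      i++ // closing quote
      out += quote + quote // leave an empty literal behind
    } else {
      out += ch
      i++
    }
  }
  return out
}
```

Running the regex-based counter on `stripStringsAndComments(code)` instead of `code` handles both the comment case and the string case from this section, at the cost of one extra linear pass.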

3. Nested Functions

function outer() {
  function inner() {
    if (condition) {  // Which function does this if belong to?
      // ...
    }
  }
}

If you need to calculate complexity per function, maintain a scope stack during AST traversal:

function calculateFunctionComplexities(code: string): Map<string, number> {
  const ast = parse(code, {
    sourceType: 'module',
    plugins: ['jsx', 'typescript']
  })
  const complexities = new Map<string, number>()
  const scopeStack: string[] = []

  traverse(ast, {
    // Babel visitors take enter/exit handlers as an object per node type
    FunctionDeclaration: {
      enter(path) {
        const name = path.node.id?.name || 'anonymous'
        scopeStack.push(name)
        complexities.set(name, 1)
      },
      exit() {
        scopeStack.pop()
      }
    },
    IfStatement() {
      // Attribute the branch to the innermost enclosing function
      const currentFunction = scopeStack[scopeStack.length - 1]
      if (currentFunction) {
        complexities.set(
          currentFunction,
          (complexities.get(currentFunction) || 1) + 1
        )
      }
    }
    // Other branch node types (loops, SwitchCase, etc.) follow the same pattern
  })

  return complexities
}

The Result

Based on these ideas, I built: Code Analyzer

Features:

  • Real-time code statistics (lines, characters, functions)
  • Cyclomatic complexity calculation with recommendations
  • Estimated code reading time
  • Support for multiple programming languages

Code analysis isn’t complex, but getting the details right takes effort. Hope this helps.

