String Escape Implementation: From Regex Replacement to Unicode Handling

User input strings often cause problems when special characters break JSON parsing. Newlines, quotes, and backslashes all need proper handling, so I built a string escape tool. Here’s what I learned.

The Essence of Escaping

Escaping means converting special characters into escape sequences. For example, the newline character \n (ASCII 10) becomes two characters: \ and n.

The core is just a few regex replacements:

function escapeString(str: string): string {
  return str
    .replace(/\\/g, '\\\\')  // Backslash must be first
    .replace(/"/g, '\\"')    // Double quote
    .replace(/'/g, "\\'")    // Single quote
    .replace(/\n/g, '\\n')   // Newline
    .replace(/\r/g, '\\r')   // Carriage return
    .replace(/\t/g, '\\t')   // Tab
    .replace(/\0/g, '\\0')   // Null character
}
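To see exactly what the function produces, here is a small usage sketch. The function is repeated so the snippet runs on its own. The input contains one real newline and one real backslash:

```typescript
function escapeString(str: string): string {
  return str
    .replace(/\\/g, '\\\\')  // Backslash must be first
    .replace(/"/g, '\\"')    // Double quote
    .replace(/'/g, "\\'")    // Single quote
    .replace(/\n/g, '\\n')   // Newline
    .replace(/\r/g, '\\r')   // Carriage return
    .replace(/\t/g, '\\t')   // Tab
    .replace(/\0/g, '\\0')   // Null character
}

const input = 'He said "hi"\nC:\\temp'   // real newline, one real backslash
const output = escapeString(input)

console.log(output)         // He said \"hi\"\nC:\\temp
console.log(input.length)   // 20
console.log(output.length)  // 24 (each escaped character became two)
```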

Why Must Backslash Come First?

This is a classic pitfall. Every escape sequence the other replacements produce starts with a backslash. If the backslash pass runs after them, it doubles the backslashes they just inserted.

Take a string containing one real newline character (ASCII 10):

  • Correct order: the backslash pass finds nothing, then the newline pass produces \n (two characters: backslash + n). Unescaping gives the newline back.
  • Wrong order: the newline pass produces \n, then the backslash pass doubles that fresh backslash into \\n (three characters). Unescaping now yields a literal backslash followed by n instead of a newline.

The bug is easy to miss because it only appears when the input contains characters that the other passes escape (newlines, tabs, quotes, and so on). Consider a string with only literal backslashes, such as a\nb written as four characters (a + backslash + n + b). It becomes a\\nb in either order: the newline pass never matches it, and the backslash pass doubles its one backslash.

So the order must be strict: process backslash first, then the other escape characters.
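Here is a minimal sketch that reduces escapeString to just the backslash and newline passes, so the two orders can be run side by side. The function names are made up for the comparison:

```typescript
// Correct: backslashes are doubled before any new backslashes are introduced
function backslashFirst(str: string): string {
  return str.replace(/\\/g, '\\\\').replace(/\n/g, '\\n')
}

// Buggy: the backslash pass also doubles the backslash that '\\n' just added
function newlineFirst(str: string): string {
  return str.replace(/\n/g, '\\n').replace(/\\/g, '\\\\')
}

const realNewline = '\n'         // one character, ASCII 10
const literalBackslashN = '\\n'  // two characters: backslash + n

console.log(backslashFirst(realNewline))        // \n   (2 chars, correct)
console.log(newlineFirst(realNewline))          // \\n  (3 chars, wrong)
console.log(backslashFirst(literalBackslashN))  // \\n
console.log(newlineFirst(literalBackslashN))    // \\n  (same, so this input hides the bug)
```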

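Unescaping Needs a Single Pass

The natural guess is that unescaping just needs the opposite order: process the other escape sequences first and the backslash last. That still breaks. Take the escaped text \\n (three characters: backslash, backslash, n), which stands for a literal backslash followed by n:

  • Backslash first: \\ becomes \, leaving \n, which the newline pass then turns into a real newline.
  • Backslash last: the newline pass matches the second backslash plus the n, leaving a backslash followed by a real newline.

Neither order gives back backslash + n. Chained replacements cannot tell which backslashes have already been used up. Unescaping therefore has to read the string once, left to right, letting each backslash consume exactly the one character after it:

function unescapeString(str: string): string {
  // Escape letter -> the character it stands for
  const map: Record<string, string> = {
    '0': '\0', t: '\t', r: '\r', n: '\n',
    "'": "'", '"': '"', '\\': '\\',
  }
  // The global regex scans left to right without overlapping matches,
  // so \\n is read as (\\)(n), never as (\)(\n)
  return str.replace(/\\(["'\\0nrt])/g, (_match: string, c: string) => map[c])
}

Unknown sequences, such as \x, are left as they are.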
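A round-trip check is a cheap way to catch mistakes in either direction: unescaping the escaped text should give back exactly the original. The sketch below uses a hypothetical pair named escapeOnce/unescapeOnce. Each function makes a single regex pass with a lookup table. The samples mix real control characters with literal backslashes:

```typescript
// Special character -> the letter written after the backslash
const ESCAPES: Record<string, string> = {
  '\\': '\\', '"': '"', "'": "'", '\n': 'n', '\r': 'r', '\t': 't', '\0': '0',
}
// Letter after the backslash -> the character it stands for
const UNESCAPES: Record<string, string> = {
  '\\': '\\', '"': '"', "'": "'", n: '\n', r: '\r', t: '\t', '0': '\0',
}

function escapeOnce(str: string): string {
  // One pass: each special character becomes backslash + its letter
  return str.replace(/[\\"'\n\r\t\0]/g, (c) => '\\' + ESCAPES[c])
}

function unescapeOnce(str: string): string {
  // One left-to-right pass: each backslash consumes exactly the next character
  return str.replace(/\\(["'\\nrt0])/g, (_match: string, letter: string) => UNESCAPES[letter])
}

const samples = [
  '\\n',                    // literal backslash + n
  '\n',                     // real newline
  'a\\\nb',                 // backslash followed by a real newline
  'C:\\new\\table',         // Windows path with \n and \t lookalikes
  'quote "x" and \'y\'\0',  // both quotes plus a null character
]

for (const s of samples) {
  const back = unescapeOnce(escapeOnce(s))
  console.log(JSON.stringify(s), back === s ? 'ok' : 'MISMATCH')
}
```

Running your own escape/unescape pair through the same loop is a quick way to see whether it survives these inputs.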

Edge Cases

1. Null Character \0

The null character (ASCII 0) is an ordinary character in JavaScript strings and counts toward the length:

const str = 'hello\0world'
console.log(str.length)  // 11, not 10

But in C, \0 terminates strings. Be careful if your data goes to C programs.
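To make the C point concrete, here is a rough sketch of what a consumer that relies on NUL termination (strlen, printf with %s, and the like) would see. asCString is a made-up helper that mimics that behavior:

```typescript
// Hypothetical helper: keeps only the part of the string a C reader
// would see, i.e. everything before the first NUL character
function asCString(str: string): string {
  const end = str.indexOf('\0')
  return end === -1 ? str : str.slice(0, end)
}

const s = 'hello\0world'
console.log(s.length)             // 11
console.log(asCString(s))         // hello
console.log(asCString(s).length)  // 5
```

Before handing such strings to C code, either reject input that contains '\0' or escape it first.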

2. Windows Newline \r\n

Windows uses \r\n, Unix uses \n. When escaping:

// Windows text
const text = 'line1\r\nline2'
const escaped = escapeString(text)
// escaped === 'line1\\r\\nline2' (as a JS literal: backslash + r, backslash + n)

Unescaping restores the original \r\n. If you want Unix line endings in the output, convert \r\n to \n before escaping.
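One way to do that is to normalize line endings before escaping. The sketch below uses toUnixNewlines, a hypothetical helper name. Its regex also folds a lone \r (classic Mac OS line ending) into \n, which may or may not be what you want:

```typescript
// Convert Windows (\r\n) and lone \r line endings to Unix (\n)
function toUnixNewlines(text: string): string {
  return text.replace(/\r\n?/g, '\n')
}

const windowsText = 'line1\r\nline2\r\n'
const unixText = toUnixNewlines(windowsText)

console.log(JSON.stringify(unixText))  // "line1\nline2\n"
```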

3. Unicode Escape

The escaping above leaves non-ASCII characters as they are. If you need pure-ASCII output, escape them as \uXXXX:

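function escapeUnicode(str: string): string {
  // Matches control characters (U+0000 to U+001F), DEL (U+007F) and every
  // non-ASCII UTF-16 code unit, including each half of a surrogate pair
  return str.replace(/[\u0000-\u001f\u007f-\uffff]/g, (char) => {
    const code = char.charCodeAt(0)
    // Always four lowercase hex digits, e.g. a newline (10) becomes \u000a
    return `\\u${code.toString(16).padStart(4, '0')}`
  })
}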

escapeUnicode('你好')  // \u4f60\u597d

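Control characters, DEL, and every non-ASCII character become \uXXXX. Emoji and other characters outside the Basic Multilingual Plane come out as a surrogate pair (😀 becomes \ud83d\ude00), which is also the form JSON uses for \u escapes of such characters. The function leaves backslashes and quotes alone. To combine it with escapeString, run escapeString first; in the other order, escapeString would double the backslash of every \uXXXX.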
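Going the other way only needs String.fromCharCode. In the sketch below, unescapeUnicode is a hypothetical name. escapeUnicode works on UTF-16 code units, so restoring each \uXXXX separately also reassembles surrogate pairs. The sketch assumes its input came from escapeUnicode alone. If backslashes were escaped too, \uXXXX handling belongs inside a single-pass unescaper instead, for the same reason as \\n:

```typescript
function escapeUnicode(str: string): string {
  return str.replace(/[\u0000-\u001f\u007f-\uffff]/g, (char) => {
    const code = char.charCodeAt(0)
    return `\\u${code.toString(16).padStart(4, '0')}`
  })
}

// Hypothetical inverse: turn every \uXXXX back into its UTF-16 code unit
function unescapeUnicode(str: string): string {
  return str.replace(/\\u([0-9a-fA-F]{4})/g, (_match: string, hex: string) =>
    String.fromCharCode(parseInt(hex, 16)))
}

const escapedText = escapeUnicode('你好 😀')
console.log(escapedText)                   // \u4f60\u597d \ud83d\ude00
console.log(unescapeUnicode(escapedText))  // 你好 😀
```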

4. HTML Entities vs String Escape

Don’t confuse them:

  • String escape: for JSON and string literals in code; uses \n, \t
  • HTML entities: for HTML content; uses &lt;, &amp;

// JSON string
const json = '{"text": "Line1\\nLine2"}'

// HTML content
const html = '<div>Line1&lt;br&gt;Line2</div>'
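The same ordering rule shows up in HTML. The sketch below is a minimal HTML text escaper; escapeHtml is an illustrative name, not a standard API. It has to replace & before the others. Otherwise the & inside a freshly produced &lt; would be escaped again into &amp;lt;:

```typescript
// Minimal HTML text escaper; '&' goes first, for the same reason backslash does
function escapeHtml(str: string): string {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

console.log(escapeHtml('a < b && c > "d"'))  // a &lt; b &amp;&amp; c &gt; &quot;d&quot;
```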

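Real-World Use Cases

1. Generating JSON Strings

Escaping for JSON by hand is rarely worth it, because JSON.stringify already does it. It also only emits escapes that JSON accepts. escapeString's \' and \0 are JavaScript-only, and a JSON parser rejects them: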

function toJsonString(obj: unknown): string {
  return JSON.stringify(obj)
}
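For comparison with escapeString, here is what JSON.stringify produces for a string with double quotes, a newline, an apostrophe, and a null character:

```typescript
const value = 'He said "hi"\nit\'s\0done'

const json = JSON.stringify(value)
console.log(json)  // "He said \"hi\"\nit's\u0000done"

// JSON.parse reverses it exactly
console.log(JSON.parse(json) === value)  // true
```

The apostrophe passes through untouched, and the null character becomes \u0000.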

2. Database Queries

In SQL string literals, a single quote is escaped by doubling it:

function escapeSql(str: string): string {
  return str.replace(/'/g, "''")
}

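// Better: use parameterized queries instead of concatenating SQL. Doubling quotes
// alone is not a complete defense; MySQL, for example, also treats backslash as
// an escape character by default.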

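3. Regular Expressions

When you build a RegExp from user text, regex metacharacters must be escaped. In code this often means double escaping, because the backslash also has to be escaped inside the string literal ('\\.' for a literal dot):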

function escapeRegex(str: string): string {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const pattern = new RegExp(escapeRegex('a.b'))  // Matches literal a.b
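Here is a quick check of why the escaping matters. The function is repeated so the snippet stands alone:

```typescript
function escapeRegex(str: string): string {
  // '$&' in the replacement inserts the matched character, so each one gets a leading backslash
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const userText = 'a.b'

const raw = new RegExp(userText)                   // '.' means "any character"
const escaped = new RegExp(escapeRegex(userText))  // '.' means a literal dot

console.log(raw.test('axb'))      // true
console.log(escaped.test('axb'))  // false
console.log(escaped.test('a.b'))  // true
```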

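Performance Considerations

Each chained .replace() scans the whole string and can allocate a new one. escapeString therefore makes seven passes and up to seven intermediate strings. For large texts, a single pass can be cheaper: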

function escapeStringOptimized(str: string): string {
  const result: string[] = []
  for (let i = 0; i < str.length; i++) {
    const char = str[i]
    switch (char) {
      case '\\': result.push('\\\\'); break
      case '"': result.push('\\"'); break
      case "'": result.push("\\'"); break
      case '\n': result.push('\\n'); break
      case '\r': result.push('\\r'); break
      case '\t': result.push('\\t'); break
      case '\0': result.push('\\0'); break
      default: result.push(char)
    }
  }
  return result.join('')
}
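Another single-pass variant copies unchanged runs with slice instead of pushing one character at a time. The sketch below assumes that most characters need no escaping, which is the usual case for real text. escapeStringSlices is an illustrative name, and whether it is faster in your engine is something to measure rather than assume:

```typescript
// Special character -> its escape sequence; everything else is copied unchanged
const ESCAPE_SEQUENCES = new Map<string, string>([
  ['\\', '\\\\'], ['"', '\\"'], ["'", "\\'"],
  ['\n', '\\n'], ['\r', '\\r'], ['\t', '\\t'], ['\0', '\\0'],
])

function escapeStringSlices(str: string): string {
  let out = ''
  let runStart = 0  // start of the current run of characters that need no escaping
  for (let i = 0; i < str.length; i++) {
    const replacement = ESCAPE_SEQUENCES.get(str[i])
    if (replacement !== undefined) {
      // Copy the unchanged run in one slice, then append the escape sequence
      out += str.slice(runStart, i) + replacement
      runStart = i + 1
    }
  }
  // Copy whatever follows the last special character
  return out + str.slice(runStart)
}

console.log(escapeStringSlices('C:\\temp\t"x"'))  // C:\\temp\t\"x\"
```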

In practice, correctness matters more than performance for string escaping.

The Final Tool

Based on these ideas, I built: String Escape Tool

Features:

  • Bidirectional escape/unescape
  • Supports \n, \t, \r, \0, \\, \", \'
  • One-click swap input/output
  • Real-time preview

The core isn’t complex, but getting the replacement order and edge cases right requires careful thinking. Hope this helps.

