String Escape Implementation: From Regex Replacement to Unicode Handling
Dealing with user input strings often causes problems when special characters break JSON parsing. Newlines, quotes, backslashes - they all need proper handling. So I built a string escape tool and here’s what I learned.
The Essence of Escaping#
Escaping means converting special characters into escape sequences. For example, the newline character \n (ASCII 10) becomes two characters: \ and n.
The core is just a few regex replacements:
function escapeString(str: string): string {
  return str
    .replace(/\\/g, '\\\\') // Backslash must be first
    .replace(/"/g, '\\"')   // Double quote
    .replace(/'/g, "\\'")   // Single quote
    .replace(/\n/g, '\\n')  // Newline
    .replace(/\r/g, '\\r')  // Carriage return
    .replace(/\t/g, '\\t')  // Tab
    .replace(/\0/g, '\\0')  // Null character
}
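To make the effect concrete, here is a small usage sketch. It repeats the same replacement chain so that it runs on its own; the sample input is made up for illustration.

```typescript
// Same replacement chain as above, repeated so the snippet runs on its own.
function escapeString(str: string): string {
  return str
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/'/g, "\\'")
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r')
    .replace(/\t/g, '\\t')
    .replace(/\0/g, '\\0')
}

// Raw input: He said "hi", then a real newline, then C:\temp
const raw = 'He said "hi"\nC:\\temp'
const escaped = escapeString(raw)

console.log(raw.length)     // 20
console.log(escaped.length) // 24: two quotes, one newline and one backslash each gained a character
console.log(escaped)        // He said \"hi\"\nC:\\temp
```

Note that \' and \0 are JavaScript-style escapes. JSON itself does not define them, so this output is meant for code strings rather than for pasting into a JSON document.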
Why Must the Backslash Be First?#
This is a classic pitfall: every later replacement inserts a backslash, so the backslash pass has to run before any of them. Otherwise it doubles backslashes that the escaper itself just produced.
Consider a string holding one real newline character (ASCII 10):
- Correct order: the backslash pass finds nothing to change, then the newline pass produces \n (backslash + n). Correct.
- Wrong order: the newline pass produces \n (backslash + n), then the backslash pass doubles the new backslash and yields \\n. Unescaping that gives backslash + n, not a newline.
Now take a string that already contains the two characters backslash + n:
- Backslash first: the backslash pass produces \\n, and the newline pass finds no match because there is no real newline.
- Newline first: the newline pass finds no match, and the backslash pass produces \\n.
Both orders agree here, which is why the bug is easy to miss. It only shows up when the input contains a character that a later pass turns into a backslash sequence. With the wrong order, a real newline and a literal backslash + n both come out as \\n, so the original can no longer be recovered.
So the order must be strict: process the backslash first, then the other characters.
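The difference is easy to check in code. The two helpers below are hypothetical and cut down to just the backslash and newline passes, so the only thing that varies is the order.

```typescript
// Correct order: backslashes first, then the newline.
function escapeBackslashFirst(str: string): string {
  return str.replace(/\\/g, '\\\\').replace(/\n/g, '\\n')
}

// Wrong order: the backslash pass re-escapes the backslash that the
// newline pass just inserted.
function escapeNewlineFirst(str: string): string {
  return str.replace(/\n/g, '\\n').replace(/\\/g, '\\\\')
}

const realNewline = 'a\nb'  // 3 characters: a, newline, b
const backslashN = 'a\\nb'  // 4 characters: a, backslash, n, b

console.log(escapeBackslashFirst(realNewline)) // a\nb   (4 characters, correct)
console.log(escapeNewlineFirst(realNewline))   // a\\nb  (5 characters, double-escaped)
console.log(escapeBackslashFirst(backslashN))  // a\\nb  (5 characters)
console.log(escapeNewlineFirst(backslashN))    // a\\nb  (same output as the real newline above)
```

With the wrong order, two different inputs collapse into the same output, so no unescape function could tell them apart.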
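Unescape: Reversing the Order Is Not Enough#
It is tempting to unescape with the same chain in the opposite order: other escape sequences first, backslash last. That still fails. Take the escaped text \\n (backslash, backslash, n), which should become backslash + n. A chained /\\n/ replacement matches the second backslash together with the n and produces backslash + newline. Processing the backslash first fails too: \\n becomes \n, which the next pass then treats as a newline.
The fix is a single pass in which every match consumes a backslash together with the character it escapes, so no character is ever interpreted twice:
function unescapeString(str: string): string {
  // Maps the character after the backslash to the character it stands for
  const map: Record<string, string> = {
    '0': '\0', t: '\t', r: '\r', n: '\n', "'": "'", '"': '"', '\\': '\\',
  }
  // Scanning left to right, \\ is consumed as a pair before its second backslash can pair with a following n
  return str.replace(/\\([0trn'"\\])/g, (_match, ch: string) => map[ch])
}
Unknown sequences such as \x are left untouched.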
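A round trip (escape, then unescape, then compare with the original) is a quick way to test any unescape strategy. The sketch below is an illustration, not the tool's code. It uses a reduced escape set (backslash and newline only) and hypothetical names to compare a chained unescape with a single-pass one on the input backslash + n.

```typescript
// Escape chain limited to backslash and newline to keep the sketch short.
function escapeMini(str: string): string {
  return str.replace(/\\/g, '\\\\').replace(/\n/g, '\\n')
}

// Chained unescape, backslash last.
function unescapeChained(str: string): string {
  return str.replace(/\\n/g, '\n').replace(/\\\\/g, '\\')
}

// Single pass: each match consumes a backslash plus the character it escapes.
function unescapeSinglePass(str: string): string {
  return str.replace(/\\([n\\])/g, (_match: string, ch: string) =>
    ch === 'n' ? '\n' : '\\'
  )
}

const original = 'a\\nb'             // a, backslash, n, b
const escaped = escapeMini(original) // a, backslash, backslash, n, b

console.log(unescapeSinglePass(escaped) === original) // true
console.log(unescapeChained(escaped) === original)    // false: yields a, backslash, newline, b
```

The chained version goes wrong because its newline pass cannot tell that the backslash in front of the n is itself the second half of an escaped backslash.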
Edge Cases#
1. Null Character \0#
The null character (ASCII 0) is a valid character inside a JavaScript string and counts toward its length:
const str = 'hello\0world'
console.log(str.length) // 11, not 10
In C, however, \0 terminates a string, so data handed to C code may be cut off at the first null character.
2. Windows Newline \r\n#
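Windows uses \r\n, Unix uses \n. Escaping treats the two characters independently:
// Windows text
const text = 'line1\r\nline2'
const escaped = escapeString(text)
// Result as a string literal: 'line1\\r\\nline2' (backslash + r, backslash + n)
Unescaping restores the original \r\n. The escaper does not normalize line endings, so if you want Unix format you have to convert \r\n to \n yourself before escaping.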
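One way to do that extra handling is to normalize line endings before escaping. The helper name below is hypothetical, and treating a lone \r as a line break is an assumption that may not suit every input.

```typescript
// Hypothetical helper: collapse Windows (\r\n) and lone \r line endings to \n.
function normalizeNewlines(str: string): string {
  return str.replace(/\r\n?/g, '\n')
}

const windowsText = 'line1\r\nline2'

// JSON.stringify is used here only to make the control characters visible.
console.log(JSON.stringify(windowsText))                    // "line1\r\nline2"
console.log(JSON.stringify(normalizeNewlines(windowsText))) // "line1\nline2"
```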
3. Unicode Escape#
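The escape set above leaves non-ASCII characters untouched. To force pure-ASCII output, convert them to \uXXXX sequences:
function escapeUnicode(str: string): string {
  // Matches control characters, DEL, and every non-ASCII UTF-16 code unit
  return str.replace(/[\u0000-\u001f\u007f-\uffff]/g, (char) => {
    const code = char.charCodeAt(0)
    // Always emit four hex digits, e.g. \u000a for a newline
    return `\\u${code.toString(16).padStart(4, '0')}`
  })
}
escapeUnicode('你好') // \u4f60\u597d
Control characters and all non-ASCII characters become \uXXXX. Characters outside the Basic Multilingual Plane (most emoji, for example) come out as two escapes, one per surrogate.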
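Going the other way is a single replacement. The function below is a hypothetical inverse, sketched on the assumption that the input contains only \uXXXX escapes. It does not account for an escaped backslash in front of a u, which a full unescaper would have to handle in the same pass.

```typescript
// Hypothetical inverse of escapeUnicode: turn each \uXXXX sequence
// back into the UTF-16 code unit it names.
function unescapeUnicode(str: string): string {
  return str.replace(/\\u([0-9a-fA-F]{4})/g, (_match: string, hex: string) =>
    String.fromCharCode(parseInt(hex, 16))
  )
}

console.log(unescapeUnicode('\\u4f60\\u597d')) // 你好
console.log(unescapeUnicode('\\ud83d\\ude00')) // 😀 (the two surrogates recombine)
```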
4. HTML Entities vs String Escape#
Don’t confuse them:
- String escape: for JSON and code strings; uses \n, \t
- HTML entities: for HTML content; uses &lt;, &amp;
// JSON string
const json = '{"text": "Line1\\nLine2"}'
// HTML content
const html = '<div>Line1<br>Line2</div>'
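For contrast, here is a minimal sketch of entity escaping. The helper and its five-character table are assumptions for illustration and are not part of the tool described here. Because it works in a single pass, the ordering problem from the chained replacements does not arise.

```typescript
// Hypothetical helper: replace the five characters that are significant
// in HTML text and attribute values with entities.
function escapeHtml(str: string): string {
  const entities: Record<string, string> = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  }
  return str.replace(/[&<>"']/g, (ch: string) => entities[ch])
}

console.log(escapeHtml('<b>Tom & "Jerry"</b>'))
// &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
```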
Real-World Use Cases#
1. Generating JSON Strings#
For JSON output you rarely need a hand-written escaper, because JSON.stringify handles escaping automatically:
function toJsonString(obj: unknown): string {
return JSON.stringify(obj)
}
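A quick check of what JSON.stringify produces, using a made-up sample value. Note that JSON has no \' or \0 escapes: single quotes pass through unchanged, and a null character is written as \u0000.

```typescript
const value = { text: 'Line1\nLine2 "quoted" C:\\temp' }
const json = JSON.stringify(value)

console.log(json)
// {"text":"Line1\nLine2 \"quoted\" C:\\temp"}

// Parsing restores the original string exactly.
const parsed = JSON.parse(json) as { text: string }
console.log(parsed.text === value.text) // true

console.log(JSON.stringify("it's \0")) // "it's \u0000"
```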
2. Database Queries#
Inside a SQL string literal, a single quote is escaped by doubling it:
function escapeSql(str: string): string {
  return str.replace(/'/g, "''")
}
// Better: use parameterized queries instead of concatenating SQL
3. Regular Expressions#
When you build a regex from arbitrary text, every regex metacharacter in that text has to be escaped with a backslash so that it matches literally:
function escapeRegex(str: string): string {
return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}
const pattern = new RegExp(escapeRegex('a.b')) // Matches literal a.b
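The effect is easiest to see by comparing an escaped and an unescaped pattern. The snippet repeats escapeRegex so it runs on its own; the anchors and test strings are just for illustration.

```typescript
function escapeRegex(str: string): string {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const unescapedPattern = new RegExp('^a.b$')
const escapedPattern = new RegExp('^' + escapeRegex('a.b') + '$')

console.log(unescapedPattern.test('axb')) // true: "." matches any character
console.log(escapedPattern.test('axb'))   // false
console.log(escapedPattern.test('a.b'))   // true
console.log(escapeRegex('1+1=2?'))        // 1\+1=2\?
```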
Performance Considerations#
Each chained .replace() scans the whole string again and creates an intermediate string. For large texts, a single pass avoids that:
function escapeStringOptimized(str: string): string {
  const result: string[] = []
  for (let i = 0; i < str.length; i++) {
    const char = str[i]
    switch (char) {
      case '\\': result.push('\\\\'); break
      case '"': result.push('\\"'); break
      case "'": result.push("\\'"); break
      case '\n': result.push('\\n'); break
      case '\r': result.push('\\r'); break
      case '\t': result.push('\\t'); break
      case '\0': result.push('\\0'); break
      default: result.push(char)
    }
  }
  return result.join('')
}
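Another single-pass option is one regex with a lookup table. This is a sketch with hypothetical names and no benchmark behind it; whether it beats the loop depends on the engine and the input. Since every character is visited exactly once, replacement order stops mattering.

```typescript
// Lookup table: raw character -> escape sequence.
const ESCAPES: Record<string, string> = {
  '\\': '\\\\',
  '"': '\\"',
  "'": "\\'",
  '\n': '\\n',
  '\r': '\\r',
  '\t': '\\t',
  '\0': '\\0',
}

// Hypothetical variant: one regex pass, each special character looked up once.
function escapeStringSinglePass(str: string): string {
  return str.replace(/[\\"'\n\r\t\0]/g, (ch: string) => ESCAPES[ch])
}

console.log(escapeStringSinglePass('say "hi"\n')) // say \"hi\"\n
```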
In practice, correctness matters more than performance for string escaping.
The Final Tool#
Based on these ideas, I built: String Escape Tool
Features:
- Bidirectional escape/unescape
- Supports \n, \t, \r, \0, \\, \", \'
- One-click swap input/output
- Real-time preview
The core isn’t complex, but getting the replacement order and edge cases right requires careful thinking. Hope this helps.