# From JSON to Type Definitions: Type Inference and Multi-Language Code Generation
Last week I was integrating a third-party API with deeply nested JSON responses — five levels deep, thirty fields per object. Writing TypeScript interfaces by hand was painful. Every field required guessing the type, checking for nulls, figuring out whether that number is an integer or a float.
I maintain JsonKit, a developer toolbox that includes a JSON-to-Code generator. This article walks through how type inference and cross-language code generation actually work under the hood.
## Type Inference: Reversing from Values
JSON is loosely typed: its numbers don't distinguish int from float, and its objects have no fixed schema. To reverse-engineer type definitions for statically typed languages, the core is an `inferType` function:
```typescript
function inferType(value: unknown, prefix = ''): string {
  if (value === null) return 'any'
  if (typeof value === 'string') return 'string'
  if (typeof value === 'number') return Number.isInteger(value) ? 'int' : 'float'
  if (typeof value === 'boolean') return 'boolean'
  if (Array.isArray(value)) {
    const elementType = value.length > 0 ? inferType(value[0], prefix) : 'any'
    return `${elementType}[]`
  }
  // nested object: name its class after the field it hangs off
  if (typeof value === 'object') return toPascalCase(prefix)
  return 'any'
}
```
One interesting detail: integer vs float differentiation. JavaScript’s number doesn’t distinguish between them, but Java’s Integer and Double are separate types. Using Number.isInteger() gives a more accurate mapping — integers become int, non-integers become float. It’s not perfect (Java Integer overflows on large values), but it’s better than defaulting everything to Double.
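As a quick sanity check, the numeric branch behaves like this (a standalone sketch of the `Number.isInteger()` mapping described above, using an illustrative helper name):

```typescript
// Standalone sketch of the int/float split via Number.isInteger()
function inferNumberType(value: number): string {
  return Number.isInteger(value) ? 'int' : 'float'
}

console.log(inferNumberType(42))   // 'int'
console.log(inferNumberType(3.14)) // 'float'
console.log(inferNumberType(1e10)) // 'int' (large, but still a mathematical integer)
```

Note that `1e10` still maps to `int` even though it would overflow a 32-bit Java `Integer`, which is exactly the imperfection mentioned above.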
## Recursive Property Extraction: Handling Nesting and Arrays
Nested JSON is inevitable, so property extraction must be recursive:
```typescript
// Shape assumed from usage below
interface PropertyInfo { name: string; type: string; nestedClass?: string }

function extractProperties(obj: object, prefix = '') {
  const properties: PropertyInfo[] = []
  const nestedClasses: string[] = [] // generated class definitions, collected bottom-up
  for (const [key, value] of Object.entries(obj)) {
    if (value === null) {
      properties.push({ name: key, type: 'any' })
    } else if (Array.isArray(value)) {
      const elementType = value.length > 0 ? inferType(value[0], key) : 'any'
      properties.push({ name: key, type: `${elementType}[]` })
      if (value.length > 0 && typeof value[0] === 'object' && value[0] !== null) {
        const nested = extractProperties(value[0], key)
        nestedClasses.push(...nested.nestedClasses) // merge nested class definitions
      }
    } else if (typeof value === 'object') {
      const className = toPascalCase(key)
      properties.push({ name: key, type: className, nestedClass: className })
      const nested = extractProperties(value, key)
      nestedClasses.push(...nested.nestedClasses) // merge nested class definitions
    } else {
      properties.push({ name: key, type: inferType(value) })
    }
  }
  return { properties, nestedClasses }
}
```
Edge cases that require special handling:

- `null` values: impossible to infer the actual type, so fall back to `any`/`Object`
- Empty arrays `[]`: no sample data available, so the element type defaults to `any`
- Deeply nested objects: class names are auto-generated from field names — `userProfile` → `UserProfile`
One hidden issue with arrays: heterogeneous arrays like [1, "hello", true] where elements have different types. The current approach only inspects the first element. A more rigorous solution would compute a union type across all elements. In practice, real API responses rarely use heterogeneous arrays, so this simplification works fine.
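The union-type idea can be sketched in a few lines (a hypothetical `inferArrayType`, not part of the original generator, which collects the distinct element types instead of inspecting only the first):

```typescript
// Scalar branch only, mirroring the article's inferType
function inferScalarType(value: unknown): string {
  if (value === null) return 'any'
  if (typeof value === 'string') return 'string'
  if (typeof value === 'number') return Number.isInteger(value) ? 'int' : 'float'
  if (typeof value === 'boolean') return 'boolean'
  return 'any'
}

// Hypothetical extension: union type across ALL elements
function inferArrayType(values: unknown[]): string {
  if (values.length === 0) return 'any[]'
  const types = Array.from(new Set(values.map(inferScalarType)))
  // One element type keeps the simple form; mixed types become a union
  return types.length === 1 ? `${types[0]}[]` : `(${types.join(' | ')})[]`
}

console.log(inferArrayType([1, 2, 3]))          // 'int[]'
console.log(inferArrayType([1, 'hello', true])) // '(int | string | boolean)[]'
```

A union like `(int | string | boolean)[]` maps cleanly to TypeScript, but for Java or Go it would still have to collapse to `Object`/`interface{}`, which is another reason the first-element shortcut is a defensible default.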
## The Name Conversion Pipeline
Field names in JSON come in all flavors — snake_case, camelCase, PascalCase, kebab-case, and sometimes a mix. The converter needs a unified pipeline:
```typescript
function toPascalCase(str: string): string {
  return str
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2') // split camelCase boundaries first
    .split(/[-_\s]+/)
    .map(word => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
    .join('')
}

function toCamelCase(str: string): string {
  const pascal = toPascalCase(str)
  return pascal.charAt(0).toLowerCase() + pascal.slice(1)
}

function toSnakeCase(str: string): string {
  return str
    .replace(/([A-Z])/g, '_$1')
    .toLowerCase()
    .replace(/[-\s]+/g, '_') // normalize kebab-case and spaces
    .replace(/^_/, '')
}
```
The strategy is straightforward: first split into words (by -, _, spaces, or uppercase boundaries), then reassemble according to the target language’s convention. Java class names use PascalCase, Go JSON tags use snake_case, TypeScript properties use camelCase. No matter what format the input JSON uses, the output stays consistent.
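A few round trips make this concrete (a standalone copy of the Pascal-case helper so the examples run on their own; note the extra `replace` that treats camelCase boundaries as word breaks, which inputs like userProfile require):

```typescript
// Standalone copy: split on -, _, spaces, AND camelCase boundaries
function toPascalCase(str: string): string {
  return str
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .split(/[-_\s]+/)
    .map(word => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
    .join('')
}

console.log(toPascalCase('user_profile')) // 'UserProfile'
console.log(toPascalCase('user-profile')) // 'UserProfile'
console.log(toPascalCase('userProfile'))  // 'UserProfile'
```

All three spellings of the same field converge on one class name, which is what keeps the generated output consistent regardless of the input style.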
## The Type Mapping Layer
Every language has its own type system, so a mapping layer is essential. Here’s the Go version:
```typescript
function mapToGoType(type: string): string {
  if (type === 'string') return 'string'
  if (type === 'int') return 'int'
  if (type === 'float') return 'float64' // Go has no `float`
  if (type === 'boolean') return 'bool'
  if (type === 'any') return 'interface{}'
  if (type.endsWith('[]')) {
    const elementType = type.slice(0, -2)
    return `[]${mapToGoType(elementType)}` // Go puts brackets first
  }
  return type
}
```
Here’s how the internal types map across languages:
| Internal | Java | Python | Go | TypeScript |
|---|---|---|---|---|
| string | String | str | string | string |
| int | Integer | int | int | number |
| float | Double | float | float64 | number |
| boolean | Boolean | bool | bool | boolean |
| any | Object | Any | interface{} | any |
| array | List<T> | List[T] | []T | T[] |
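The TypeScript column shows why that target is the simplest: int and float both collapse into `number`. A minimal mapper in the same style as the Go version above (a hypothetical `mapToTsType`, not taken from the original generator):

```typescript
// Hypothetical TypeScript mapper, mirroring the structure of mapToGoType
function mapToTsType(t: string): string {
  if (t === 'int' || t === 'float') return 'number' // TS has a single numeric type
  if (t === 'string') return 'string'
  if (t === 'boolean') return 'boolean'
  if (t === 'any') return 'any'
  if (t.endsWith('[]')) return `${mapToTsType(t.slice(0, -2))}[]` // brackets stay last
  return t // nested class names pass through unchanged
}

console.log(mapToTsType('float[]')) // 'number[]'
console.log(mapToTsType('UserProfile')) // 'UserProfile'
```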
Beyond type mapping, each language has its own code style. Python generates @dataclass decorators — the standard data class approach since Python 3.7 that automatically provides __init__ and __repr__. Java produces full getter/setter methods following the JavaBean spec, required by frameworks like Spring and Jackson. Go appends json:"..." struct tags to ensure field names match the original JSON during serialization. TypeScript outputs clean interface declarations, the go-to pattern for frontend projects consuming API data.
One subtle detail with Go structs: exported field names must start with an uppercase letter (Go’s visibility rule), so toPascalCase(prop.name) is mandatory. At the same time, the json:"original_field_name" tag preserves the original JSON key for serialization compatibility.
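The pairing of exported field names with original-key tags can be sketched as a tiny emitter (a hypothetical `generateGoStruct` with an assumed property shape, not the tool's actual code):

```typescript
// Assumed intermediate shape: original JSON key plus already-mapped Go type
interface GoProp { name: string; goType: string }

function toPascal(s: string): string {
  return s.split(/[-_\s]+/).map(w => w.charAt(0).toUpperCase() + w.slice(1)).join('')
}

// Hypothetical emitter: exported (uppercase) field name + json tag with the raw key
function generateGoStruct(name: string, props: GoProp[]): string {
  const fields = props
    .map(p => `\t${toPascal(p.name)} ${p.goType} \`json:"${p.name}"\``)
    .join('\n')
  return `type ${name} struct {\n${fields}\n}`
}

console.log(generateGoStruct('User', [
  { name: 'user_name', goType: 'string' },
  { name: 'age', goType: 'int' },
]))
```

The emitted struct round-trips cleanly: Go's `encoding/json` serializes `UserName` back to `user_name` because the tag, not the field name, drives the output key.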
## The Complete Code Generation Flow
Putting it all together, here’s the full pipeline from JSON to generated code:
- Parse: `JSON.parse()` converts the input string into a JavaScript object
- Traverse: `extractProperties()` walks every field recursively, collecting property metadata and nested class definitions
- Infer: `inferType()` determines the type of each value
- Map: internal types are translated to the target language's type system
- Generate: code is emitted following the target language's syntax and conventions
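Strung together, the pipeline fits in a miniature sketch (flat objects only and a TypeScript target; the helper names are illustrative, and the real generator also handles nesting, arrays, and the other languages):

```typescript
// Step 3–4 collapsed: JS runtime type straight to a TS type
function inferTsType(v: unknown): string {
  if (typeof v === 'string') return 'string'
  if (typeof v === 'number') return 'number'
  if (typeof v === 'boolean') return 'boolean'
  return 'any'
}

function jsonToTsInterface(json: string, name = 'Root'): string {
  const obj = JSON.parse(json)                       // 1. Parse
  const lines = Object.entries(obj)                  // 2. Traverse (flat only)
    .map(([k, v]) => `  ${k}: ${inferTsType(v)};`)   // 3–4. Infer + map
  return `export interface ${name} {\n${lines.join('\n')}\n}` // 5. Generate
}

console.log(jsonToTsInterface('{"name":"JsonKit","count":42,"active":true}'))
```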
For input { "name": "JsonKit", "count": 42, "active": true }, TypeScript produces:
```typescript
export interface Root {
  name: string;
  count: number;
  active: boolean;
}
```
The Java version adds significantly more boilerplate:
```java
import java.util.List;

public class Root {
    private String name;
    private Integer count;
    private Boolean active;

    public String getName() { return this.name; }
    public void setName(String name) { this.name = name; }

    // ... additional getters/setters
}
```
## The JSON to Code Tool
Based on this approach, the JSON to Code tool on JsonKit supports all four languages. The entire generator is only about 250 lines of code, but covers nested objects, arrays, null values, name conversion, and most common edge cases.
If you’re dealing with similar data structure conversion problems, give it a try on JsonKit. The core type inference logic isn’t complicated, but getting each language’s idiomatic patterns right and handling all the edge cases cleanly — that’s where the real engineering value lives.
Related tools: JSON Formatter | JSON Schema Generator | JSON to CSV