JSON Validation: From Syntax Checks to Schema Enforcement
Validation is more than catching missing brackets. Learn the layered approach professional teams use — syntax, structure, type, and business-rule validation — with real examples.
Many production bugs that look mysterious turn out to be validation failures in disguise. A field that was supposed to be a number arrives as a string. A required key is silently absent. A list that should always contain at least one item shows up empty after a deploy. The application keeps running, the database keeps writing rows, and a few days later somebody notices the totals are wrong.
Defending against this is the job of validation. The mistake teams make is treating validation as a single step — "does this parse?" — when it should be a layered pipeline that runs at every boundary between trusted and untrusted data. This article walks through the layers, explains where each one belongs, and shows what the failure modes look like when one is missing.
Layer 1 — syntactic validation
The first question is the simplest: are these bytes valid JSON? You cannot reason about a payload until a parser has accepted it. A syntactic check catches missing brackets, stray commas, unterminated strings, and the dozens of other small typos that make a string look like JSON without actually being JSON.
Syntactic validation is the cheapest layer to add and the easiest to forget. Many teams rely on the parser inside their HTTP framework to handle it implicitly. That works in production, but during debugging it produces unhelpful error messages — "unexpected token at position 2174" — that take longer to read than the payload itself. Keeping a dedicated validation tool such as the JSON Validator open during development gives you human-readable line and column information without piping payloads into a debugger.
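As a minimal sketch of what a dedicated syntax check can report, the snippet below wraps Python's standard json module and turns a parse failure into a line-and-column message. The function name check_syntax is an illustrative choice, not part of any tool mentioned above.

```python
import json


def check_syntax(text: str) -> tuple[bool, str]:
    """Return (ok, message); on failure the message locates the first syntax error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # JSONDecodeError exposes msg, lineno and colno (both 1-based) and pos (0-based offset).
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return True, "valid JSON"
```

Calling check_syntax on a payload with a missing value yields a message that names the line of the mistake, which is usually easier to act on than a raw character offset.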
Layer 2 — structural validation
Once the bytes parse, the next question is whether the resulting object has the shape you expect. This is where schema validation comes in. A schema describes the structure of the data: which keys are required, what types each value must be, which enumerations are allowed, and how nested objects relate to each other.
Several formats compete in this space: JSON Schema, OpenAPI, Protocol Buffers, GraphQL types, and TypeScript-flavoured runtime validators like zod or valibot. The underlying ideas are similar across all of them. What matters is that you pick one and apply it at every boundary:
- When a request enters a service, validate the body, the query parameters, and the headers you actually use.
- When a response leaves a service, validate the body in development and staging environments. (You can decide whether to skip it in production for performance.)
- When a message is consumed from a queue or a webhook, validate before doing any work that has side effects.
- When configuration is loaded at boot time, validate before the process starts accepting traffic.
The reason boundaries matter is that internal code can usually trust its own data structures. The risk is at the seams, where untyped bytes become typed values. A failure that gets through structural validation is far more expensive to chase down than one that is caught at the door.
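To make the idea of a structural check concrete, here is a deliberately tiny, standard-library-only sketch. In real code a maintained schema library would normally do this job; the ORDER_SHAPE table, its field names, and the check_shape function are all hypothetical and exist only to show what "required keys and expected types" means at a boundary.

```python
import json

# Hypothetical shape for an order payload: key -> (expected type, required?)
ORDER_SHAPE = {
    "order_id": (str, True),
    "quantity": (int, True),
    "note": (str, False),
}


def check_shape(payload: object, shape: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the shape matches."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = []
    for key, (expected, required) in shape.items():
        if key not in payload:
            if required:
                problems.append(f"{key}: required key is missing")
            continue
        value = payload[key]
        # bool is a subclass of int in Python, so a JSON true must not pass as an integer.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


# Usage: parse first (layer 1), then check the shape (layer 2).
problems = check_shape(json.loads('{"quantity": "2"}'), ORDER_SHAPE)
```

Collecting every problem instead of stopping at the first one lets the caller fix the whole payload in a single round-trip.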
Layer 3 — semantic validation
Structural validation checks types and required fields. Semantic validation asks whether the values make sense. A quantity of -3 is a valid integer, but probably not a valid order line. A country code of "ZZ" is a valid string, but not a valid country. A discount of 110% is a valid number, but almost certainly a logic bug.
Semantic validation lives close to the domain code, because it encodes the rules of the business. The general pattern is:
- Pass the parsed and structurally validated data into a domain object constructor.
- Have the constructor enforce invariants — non-negative quantities, valid currency codes, plausible date ranges.
- Reject early, with a specific error message, before any database writes or downstream calls happen.
Semantic errors should never propagate as generic 500s. They are client-fixable problems and deserve a clear 4xx response that explains which rule was violated. A good error format names the field, the rule, and the offending value, and points the consumer at documentation.
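The constructor pattern above might look like the following sketch. The OrderLine class, the RuleViolation error, and the four-entry country set are assumptions made for illustration; a real system would load the full list of country codes and choose its own rule names.

```python
from dataclasses import dataclass

# Illustrative subset only; a real system would load the full ISO 3166-1 list.
KNOWN_COUNTRIES = {"DE", "FR", "JP", "US"}


class RuleViolation(ValueError):
    """Raised when a value is well-typed but breaks a business rule."""

    def __init__(self, field: str, rule: str, value: object):
        super().__init__(f"{field}: {rule} (got {value!r})")
        self.field, self.rule, self.value = field, rule, value


@dataclass(frozen=True)
class OrderLine:
    quantity: int
    country: str
    discount_percent: float

    def __post_init__(self):
        # Invariants run at construction time, before any write or downstream call.
        if self.quantity < 0:
            raise RuleViolation("quantity", "must not be negative", self.quantity)
        if self.country not in KNOWN_COUNTRIES:
            raise RuleViolation("country", "must be a known country code", self.country)
        if not 0 <= self.discount_percent <= 100:
            raise RuleViolation(
                "discount_percent", "must be between 0 and 100", self.discount_percent
            )
```

A constructor can only refuse by raising, so in this sketch the handler at the boundary would be expected to catch RuleViolation and translate its field, rule, and value into a 4xx response rather than letting it surface as a 500.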
Layer 4 — cross-field and cross-record validation
Some rules cannot be expressed by looking at a single field. A start date must be before an end date. A shipping address is required only when the order is physical. A discount code is valid only for the customer it was issued to. These rules cross fields, sometimes across records, and they are the layer most likely to be skipped because they are tedious to write.
Skipping them is exactly how the most embarrassing bugs reach production. The mitigation is to keep a list of cross-field rules in the same place as the schema, even if they are enforced separately in code. A future engineer reading the schema should be able to see which combinations are illegal without grep-ing the codebase.
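One way to keep such rules visible is a small named table that sits next to the schema. The sketch below assumes a hypothetical order dictionary with start_date, end_date, kind, and shipping_address keys that has already passed structural validation; the rule names are illustrative.

```python
from datetime import date

# Each entry is (rule name, predicate that returns True when the rule holds).
CROSS_FIELD_RULES = [
    ("start_before_end", lambda o: o["start_date"] < o["end_date"]),
    (
        "physical_requires_shipping_address",
        lambda o: o["kind"] != "physical" or bool(o.get("shipping_address")),
    ),
]


def violated_rules(order: dict) -> list[str]:
    """Return the names of every cross-field rule the order breaks."""
    return [name for name, holds in CROSS_FIELD_RULES if not holds(order)]


# Usage: a physical order with reversed dates and no address breaks both rules.
bad_order = {
    "start_date": date(2024, 5, 10),
    "end_date": date(2024, 5, 1),
    "kind": "physical",
}
broken = violated_rules(bad_order)
```

Because the rules are data rather than scattered if statements, the list doubles as the readable inventory of illegal combinations. Rules that span records, such as a discount code tied to one customer, would need a lookup and are left out of this sketch.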
What good error responses look like
A validation failure is a learning opportunity for the consumer. A good response gives them what they need to fix the call without opening a support ticket:
- A stable machine-readable error code, not just a sentence.
- The path to the offending field, in dot or JSON Pointer notation, so the consumer can navigate straight to it.
- A short, human-readable message explaining the rule.
- A reference to documentation when the rule is non-obvious.
- When safe, an echo of the offending value so the consumer can confirm what the server received.
Avoid leaking sensitive data in error messages, and avoid stack traces in production responses. Both are common, and both are avoidable.
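The checklist above could be assembled along these lines. The key names (code, path, message, docs, value) and the example.com URL are placeholders rather than a standard; the only borrowed convention is the JSON Pointer escaping from RFC 6901.

```python
def json_pointer(path: list) -> str:
    """Build an RFC 6901 JSON Pointer from a list of keys and array indexes."""
    # RFC 6901 escapes "~" as "~0" first, then "/" as "~1".
    return "".join(
        "/" + str(part).replace("~", "~0").replace("/", "~1") for part in path
    )


def validation_error(code, path, message, docs_url=None, value=None, echo_value=False):
    """Assemble one error entry; the value is echoed only when the caller opts in."""
    error = {"code": code, "path": json_pointer(path), "message": message}
    if docs_url is not None:
        error["docs"] = docs_url
    if echo_value:
        # Echoing is opt-in so that sensitive fields are never leaked by default.
        error["value"] = value
    return error


# Usage with hypothetical field names and a placeholder documentation URL.
example = validation_error(
    code="quantity.negative",
    path=["lines", 0, "quantity"],
    message="Quantity must not be negative.",
    docs_url="https://example.com/docs/errors#quantity",
    value=-3,
    echo_value=True,
)
```

Making the echo opt-in per call is one possible way to honour the "when safe" condition: the default response never repeats what the client sent.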
Validation in untrusted environments
Browser-side validation is a UX feature, not a security boundary. Form-level checks make the experience feel responsive and reduce round-trips, but the server must repeat every check, because the client is fully under the user's control. The same logic applies to webhooks: validate the signature first, then the structure, then the semantics, in that order. Skipping any step turns the endpoint into an arbitrary code path that anybody on the internet can poke.
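The ordering for webhooks can be sketched as follows. This assumes the sender signs the raw body with HMAC-SHA256 and transmits a hex digest, which is a common scheme but not a universal one; the event type names and status codes are hypothetical choices.

```python
import hashlib
import hmac
import json


def handle_webhook(raw_body: bytes, signature_hex: str, secret: bytes):
    """Return (status, detail) after checking signature, then structure, then semantics."""
    # 1. Signature: computed over the raw bytes and compared in constant time.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("ascii"), signature_hex.encode("utf-8")):
        return 401, "bad signature"
    # 2. Syntax and structure: only now is the body parsed at all.
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return 400, "body is not valid JSON"
    if not isinstance(event, dict) or not isinstance(event.get("type"), str):
        return 422, "event must be an object with a string 'type'"
    # 3. Semantics: the accepted event types here are a hypothetical example.
    if event["type"] not in {"order.created", "order.cancelled"}:
        return 422, f"unknown event type {event['type']!r}"
    return 200, "accepted"
```

Checking the signature before parsing means an unauthenticated caller never reaches the parser or any later code, which is the point of doing the steps in this order.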
Common anti-patterns
- Validating once at the edge. A monolith with one big request validator at the front door looks neat in a diagram but breaks down as soon as internal services start consuming messages from queues. Validate at every boundary, not just the public one.
- Using exceptions for normal flow. A failed validation is not an exceptional event. Use explicit result types or error responses; reserve exceptions for unexpected internal failures.
- Hand-rolled validators. Re-implementing JSON Schema by hand is rarely a good investment. Use a maintained library and put your effort into the rules, not the engine.
- Silent coercion. Quietly turning the string "true" into the boolean true feels helpful and creates bugs that survive for years. Reject mismatched types loudly during development, even if you choose to coerce in production.
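Two of these anti-patterns can be avoided together, as in this small sketch: the check returns an explicit result value instead of raising, and it refuses to coerce the string "true" into a boolean. The Ok and Invalid types and the parse_flag function are illustrative names, not an established API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: bool


@dataclass(frozen=True)
class Invalid:
    reason: str


def parse_flag(raw: object) -> Union[Ok, Invalid]:
    """Accept only a real JSON boolean; report anything else as data, not an exception."""
    if isinstance(raw, bool):
        return Ok(raw)
    # No coercion: "true", 1 and "yes" are all rejected with a specific reason.
    return Invalid(f"expected boolean, got {type(raw).__name__} {raw!r}")
```

The caller has to inspect the result before using the value, so a failed validation is handled as an ordinary branch rather than an unexpected control-flow jump.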
Validation as documentation
A side effect of taking validation seriously is that the schema becomes the single source of truth about what your API actually accepts. Generated API docs, client SDKs, mock servers, and contract tests can all consume the same schema. New engineers can read it instead of asking. Auditors can review it without spelunking through application code. The investment in writing the rules pays off again every time someone needs to integrate.
Putting it into practice
Start small. Add structural validation at one boundary in your system; the public API is usually the highest-leverage place. Once the schema is written, port it to the consumer side so the client and server agree about the contract. Layer in semantic and cross-field rules incrementally, and keep a tool like the browser-based JSON Validator handy for the inevitable times you need to check a payload by eye. Within a quarter, you should see far fewer of the bugs that wake you up at three in the morning.