Validate at the Boundary: Postal Codes as a Data-Governance Primitive
Yesterday we shipped a launch post about a checkout form that auto-fills city and state when the customer types their ZIP. Three seconds saved, scaled across a million customers, ~833 hours of human time returned. That's the easy story to tell — the demo plays in 15 seconds and everyone immediately gets it.
A reply on the post took the conversation somewhere more interesting. It came from Jonathan Papworth:
> Checkouts a direct win. But what about up streams points of entry in CRM. Stops dirty data entering the warehouse systems. Can also think from model perspective — if we use ZIPs for RLS this can prevent bad zips = users seeing data they shouldnt.
Two ideas in four sentences. Both worth pulling apart.
The checkout case is the easy demo. The CRM and RLS cases are the bigger value. This post is about why validate-at-the-boundary is a different category of problem, and what changes when you treat postal codes as a data-governance primitive instead of a checkout-UX nicety.
The two places dirty postal codes get born
Most postal-code data quality conversations focus on the customer-facing form: checkout, shipping address, account profile. Real customers, typing real addresses, occasionally fat-fingering them.
But that's the small source of dirty data in most companies. The bigger sources are:
- CRM lead capture — sales reps hand-typing prospects into Salesforce/HubSpot/Pipedrive after a conference. No autocomplete. No validation. Whatever they type lands.
- Marketing form imports — newsletter signups, gated-content forms, webinar registrations. Data quality here is lower than at checkout because the user has less skin in the game.
- CSV imports — list-buys, vendor data exports, partner integrations. Source quality is whatever it is, and once it lands in your warehouse, it's your problem.
- Internal data entry tools — operations teams editing customer records, support ticket fields, account hierarchies. No one reviewing the field-level entries.
What validate-at-write actually means
The fix isn't ML, it isn't a quarterly cleanup script, and it isn't trusting the source. The fix is: every postal code in your system gets validated against a real reference before it's accepted as a write.
[Form / CRM / Import] → [Validation API] → [If valid, write. If not, reject or flag.]
Concrete shape:
- Single record at the boundary. When a sales rep types a prospect into Salesforce, a Lightning Component or platform event fires POST /api/validate on the postal code field as they tab away. Invalid → red border, helpful message, reject the save.
- Bulk at the import. When a CSV upload arrives, the import job calls POST /api/validate-bulk with up to 1,000 records per request. Invalid records get flagged in a "review" queue rather than landing silently in production tables.
- Historical sweep. Existing dirty data gets the same validate-bulk treatment in chunks, producing a one-time cleanup report; a sketch follows this list.
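For the historical sweep, a minimal sketch looks like the following. The endpoint, the 1,000-record limit, and the postalCode / countryCode / apiKey request fields come from the bulk example later in this post; the Row type, the assumption that results come back in request order, and the shape of the cleanup report are illustrative assumptions, not documented behavior.

```typescript
// Sketch of a historical sweep: walk existing rows in chunks of 1,000 and
// flag every row whose postal code fails bulk validation.
// Uses the global fetch (Node 18+). Row, the result-ordering assumption,
// and the report shape are illustrative.
interface Row {
  id: string;
  zip: string;
  country?: string;
}

async function sweepExistingRows(rows: Row[]) {
  const needsReview: { id: string; zip: string }[] = [];

  for (let i = 0; i < rows.length; i += 1000) {
    const chunk = rows.slice(i, i + 1000);

    const res = await fetch("https://postaldatapi.com/api/validate-bulk", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        records: chunk.map((r) => ({
          postalCode: r.zip,
          countryCode: r.country ?? "US",
        })),
        apiKey: process.env.POSTALDATAPI_KEY,
      }),
    });

    const { results } = await res.json();

    // Assumes results arrive in the same order as the submitted records.
    results.forEach((result: { valid: boolean }, idx: number) => {
      if (!result.valid) {
        needsReview.push({ id: chunk[idx].id, zip: chunk[idx].zip });
      }
    });
  }

  return needsReview; // the one-time cleanup report: rows that need a human
}
```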
PostalDataPI is sub-millisecond per lookup server-side, $0.000028 per call, with bulk validate at the same flat rate. That's the math change that makes write-time validation viable for the first time at most companies.
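To make that concrete: at $0.000028 a call, a one-time sweep of ten million historical records comes to roughly $280.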
The Row-Level Security angle
Jonathan's second sentence is the one that made us stop and think.
In many enterprise data models, postal codes participate in row-level security. Think:
- Sales territories assigned by ZIP code or ZIP-prefix
- Regional customer-success queues filtered by location
- Partner portal access scoped to certain markets
- Healthcare data segmented by service area
- Financial data filtered by jurisdiction
When a malformed postal code lands in one of these models, it either falls out of the territory it belongs to or, once matching rules are loosened to tolerate dirty data, falls into a territory it doesn't. That second failure is a security problem, not a data-quality problem. And it's exactly the kind of bug that goes unnoticed because nothing visibly breaks: the report just contains rows it shouldn't, and nobody checks.
Validation at write time closes this hole at the boundary. If every postal code that enters the system is normalized to its canonical form (e.g., m4b1b3 → M4B 1B3, 90210-1234 → 90210), the matching rules downstream don't need to be permissive in the first place. Strict equality matching becomes safe.
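Here's a minimal sketch of that downstream payoff. The territory model, field names, and prefix scheme are invented for illustration; the only point is that the filter can stay strict because the boundary already guaranteed a normalized value.

```typescript
// Illustrative only: CustomerRow, Territory, and the ZIP-prefix scheme are
// made-up names, not part of any PostalDataPI response.
interface CustomerRow {
  customerId: string;
  zipcode: string; // guaranteed normalized at write time, e.g. "90210" or "M4B 1B3"
}

interface Territory {
  repId: string;
  zipPrefixes: string[]; // e.g. ["902", "903"] for a ZIP-prefix territory
}

// Safe once the boundary guarantees normalization: a strict prefix match,
// no lowercasing, no stripping spaces or dashes, no fuzzy fallback.
function rowsVisibleTo(territory: Territory, rows: CustomerRow[]): CustomerRow[] {
  return rows.filter((row) =>
    territory.zipPrefixes.some((prefix) => row.zipcode.startsWith(prefix))
  );
}
```

The moment you can't trust the stored value, that filter starts accumulating lowercase calls and fuzzy fallbacks, and that permissiveness is exactly where rows leak across territories.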
This is what "data-governance primitive" means in practice: the validation guarantee at the boundary lets every downstream system trust the field. Permission rules get simpler. Joins get cleaner. Reports get truthful.
What this looks like in code
```typescript
// Anywhere a postal code enters your system — form, import, sync — funnel
// through a validation step before the write is committed.
import fetch from "node-fetch"; // default export; on Node 18+ you can use the built-in global fetch instead

interface ValidationResult {
  valid: boolean;
  normalized: string | null;
  reason?: "not_found" | "invalid_format" | "unknown_country";
}

async function validatePostalCode(
  zipcode: string,
  country: string = "US"
): Promise<ValidationResult> {
  const res = await fetch("https://postaldatapi.com/api/validate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      zipcode,
      country,
      apiKey: process.env.POSTALDATAPI_KEY,
    }),
  });

  if (res.status === 404) {
    return { valid: false, normalized: null, reason: "not_found" };
  }
  if (!res.ok) {
    // Treat any other error response as a failed validation rather than
    // silently accepting the value.
    return { valid: false, normalized: null, reason: "invalid_format" };
  }

  const data = await res.json();
  return {
    valid: true,
    normalized: data.normalized ?? zipcode,
  };
}

// In your CRM save handler / form submit / import row handler:
const result = await validatePostalCode(input.zipcode, input.country);
if (!result.valid) {
  return reject(input, `Invalid postal code: ${result.reason}`);
}
record.zipcode = result.normalized;
await db.insert(record);
```
For bulk imports the shape is identical, just one call per 1,000 records:
```typescript
const res = await fetch("https://postaldatapi.com/api/validate-bulk", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    records: csvRows.map((r) => ({
      postalCode: r.zip,
      countryCode: r.country ?? "US",
    })),
    apiKey: process.env.POSTALDATAPI_KEY,
  }),
});

const { results } = await res.json();
const invalid = results.filter((r) => !r.valid);
const valid = results.filter((r) => r.valid);
// Route invalid to a review queue, write valid through.
```
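If you want that routing spelled out, here's one shape it might take. It assumes, beyond what the post shows, that bulk results come back in the same order as the submitted records, that each result carries normalized and reason fields like the single-record endpoint, and that reviewQueue is whatever table or queue you park rejects in.

```typescript
// Hypothetical routing: pair each result with its source row by index,
// write valid rows through with the normalized value, park the rest.
for (const [idx, r] of results.entries()) {
  const row = csvRows[idx];
  if (r.valid) {
    await db.insert({ ...row, zip: r.normalized ?? row.zip });
  } else {
    await reviewQueue.add({ row, reason: r.reason }); // reviewQueue is yours to define
  }
}
```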
Why this is a fit for PostalDataPI specifically
Three things have to be true for write-time validation to actually work in production:
- Fast enough to sit in the write path. Sub-millisecond lookups mean the save handler doesn't get slower.
- Cheap enough to run on every write. At $0.000028 per call, per-record validation never becomes a budget line anyone argues about.
- Bulk at the same economics. Imports and historical sweeps use the same tool as the single-record path, not a separate negotiation.
This is why we shipped bulk validate at flat rate — and why we keep the API surface narrow. The boundary-validation use case is the one we want to be unambiguously the right tool for.
The reframe
Most teams think about postal-code APIs as "the thing the checkout uses." That's a true use, but it's the smaller one. The bigger framing — the one Jonathan got us to articulate — is:
> Postal codes are a data-governance primitive. Validate them at the boundary, and every downstream system that depends on them gets simpler, faster, and more secure.
This is the same shape as the argument for typed-language compilers, schema validation at API boundaries, or input sanitization at HTTP edges. The pattern is old. The application to postal-code data, surprisingly, is still rare in practice — mostly because the underlying APIs were too expensive or too slow to make write-time validation feasible.
We think that's the change worth marking.
Thanks to Jonathan Papworth for the prompt. This post exists because he asked the better question.
If you want to try the validation pattern yourself: POST /api/validate for single records, POST /api/validate-bulk for up to 1,000 at a time, 1,000 free queries on signup, no credit card.