Validate at the Boundary: Postal Codes as a Data-Governance Primitive
Yesterday we shipped a launch post about a checkout form that auto-fills city and state when the customer types their ZIP. Three seconds saved, scaled across a million customers, ~833 hours of human time returned. That's the easy story to tell — the demo plays in 15 seconds and everyone immediately gets it.
A reply on the post took the conversation somewhere more interesting. It came from Jonathan Papworth:
> Checkouts a direct win. But what about up streams points of entry in CRM. Stops dirty data entering the warehouse systems. Can also think from model perspective — if we use ZIPs for RLS this can prevent bad zips = users seeing data they shouldnt.
Two ideas in four sentences. Both worth pulling apart.
The checkout case is the easy demo. The CRM and RLS cases are the bigger value. This post is about why validate-at-the-boundary is a different category of problem, and what changes when you treat postal codes as a data-governance primitive instead of a checkout-UX nicety.
The two places dirty postal codes get born
Most postal-code data quality conversations focus on the customer-facing form: checkout, shipping address, account profile. Real customers, typing real addresses, occasionally fat-fingering them.
But that's the small source of dirty data in most companies. The bigger sources are:
- CRM lead capture — sales reps hand-typing prospects into Salesforce/HubSpot/Pipedrive after a conference. No autocomplete. No validation. Whatever they type lands.
- Marketing form imports — newsletter signups, gated-content forms, webinar registrations. Data quality here is lower than at checkout because the user has less skin in the game.
- CSV imports — list-buys, vendor data exports, partner integrations. Source quality is whatever it is, and once it lands in your warehouse, it's your problem.
- Internal data entry tools — operations teams editing customer records, support ticket fields, account hierarchies. No one reviewing the field-level entries.
What validate-at-write actually means
The fix isn't ML, it isn't a quarterly cleanup script, and it isn't trusting the source. The fix is: every postal code in your system gets validated against a real reference before it's accepted as a write.
[Form / CRM / Import] → [Validation API] → [If valid, write. If not, reject or flag.]
Concrete shape:
- Single record at the boundary. When a sales rep types a prospect into Salesforce, a Lightning Component or platform event fires POST /api/validate on the postal code field as they tab away. Invalid → red border, helpful message, reject the save.
- Bulk at the import. When a CSV upload arrives, the import job calls POST /api/validate-bulk with up to 1,000 records per request. Invalid records get flagged in a "review" queue rather than landing silently in production tables.
- Historical sweep. Existing dirty data gets the same validate-bulk treatment in chunks, producing a one-time cleanup report; a sketch follows this list.
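For the historical sweep, a minimal sketch looks like the following. The endpoint, the 1,000-record limit, and the postalCode / countryCode / apiKey request fields come from the bulk example later in this post; the Row type, the assumption that results come back in request order, and the shape of the cleanup report are illustrative assumptions, not documented behavior.

```typescript
// Sketch of a historical sweep: walk existing rows in chunks of 1,000 and
// flag every row whose postal code fails bulk validation.
// Uses the global fetch (Node 18+). Row, the result-ordering assumption,
// and the report shape are illustrative.
interface Row {
  id: string;
  zip: string;
  country?: string;
}

async function sweepExistingRows(rows: Row[]) {
  const needsReview: { id: string; zip: string }[] = [];

  for (let i = 0; i < rows.length; i += 1000) {
    const chunk = rows.slice(i, i + 1000);

    const res = await fetch("https://postaldatapi.com/api/validate-bulk", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        records: chunk.map((r) => ({
          postalCode: r.zip,
          countryCode: r.country ?? "US",
        })),
        apiKey: process.env.POSTALDATAPI_KEY,
      }),
    });

    const { results } = await res.json();

    // Assumes results arrive in the same order as the submitted records.
    results.forEach((result: { valid: boolean }, idx: number) => {
      if (!result.valid) {
        needsReview.push({ id: chunk[idx].id, zip: chunk[idx].zip });
      }
    });
  }

  return needsReview; // the one-time cleanup report: rows that need a human
}
```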
PostalDataPI is sub-millisecond per lookup server-side, $0.000028 per call, with bulk validate at the same flat rate. That's the math change that makes write-time validation viable for the first time at most companies.
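To make that concrete: at $0.000028 a call, a one-time sweep of ten million historical records comes to roughly $280.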
The Row-Level Security angle
Jonathan's second sentence is the one that made us stop and think.
In many enterprise data models, postal codes participate in row-level security. Think:
- Sales territories assigned by ZIP code or ZIP-prefix
- Regional customer-success queues filtered by location
- Partner portal access scoped to certain markets
- Healthcare data segmented by service area
- Financial data filtered by jurisdiction
When a malformed postal code lands in one of these models, it either falls out of the territory it belongs to or, once matching rules are loosened to tolerate dirty data, falls into a territory it doesn't. That second failure is a security problem, not a data-quality problem. And it's exactly the kind of bug that goes unnoticed because nothing visibly breaks: the report just contains rows it shouldn't, and nobody checks.
Validation at write time closes this hole at the boundary. If every postal code that enters the system is normalized to its canonical form (e.g., m4b1b3 → M4B 1B3, 90210-1234 → 90210), the matching rules downstream don't need to be permissive in the first place. Strict equality matching becomes safe.
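Here's a minimal sketch of that downstream payoff. The territory model, field names, and prefix scheme are invented for illustration; the only point is that the filter can stay strict because the boundary already guaranteed a normalized value.

```typescript
// Illustrative only: CustomerRow, Territory, and the ZIP-prefix scheme are
// made-up names, not part of any PostalDataPI response.
interface CustomerRow {
  customerId: string;
  zipcode: string; // guaranteed normalized at write time, e.g. "90210" or "M4B 1B3"
}

interface Territory {
  repId: string;
  zipPrefixes: string[]; // e.g. ["902", "903"] for a ZIP-prefix territory
}

// Safe once the boundary guarantees normalization: a strict prefix match,
// no lowercasing, no stripping spaces or dashes, no fuzzy fallback.
function rowsVisibleTo(territory: Territory, rows: CustomerRow[]): CustomerRow[] {
  return rows.filter((row) =>
    territory.zipPrefixes.some((prefix) => row.zipcode.startsWith(prefix))
  );
}
```

The moment you can't trust the stored value, that filter starts accumulating lowercase calls and fuzzy fallbacks, and that permissiveness is exactly where rows leak across territories.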
This is what "data-governance primitive" means in practice: the validation guarantee at the boundary lets every downstream system trust the field. Permission rules get simpler. Joins get cleaner. Reports get truthful.
What this looks like in code
```typescript
// Anywhere a postal code enters your system — form, import, sync — funnel
// through a validation step before the write is committed.
import fetch from "node-fetch"; // default export; on Node 18+ you can use the built-in global fetch instead

interface ValidationResult {
  valid: boolean;
  normalized: string | null;
  reason?: "not_found" | "invalid_format" | "unknown_country";
}

async function validatePostalCode(
  zipcode: string,
  country: string = "US"
): Promise<ValidationResult> {
  const res = await fetch("https://postaldatapi.com/api/validate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      zipcode,
      country,
      apiKey: process.env.POSTALDATAPI_KEY,
    }),
  });

  if (res.status === 404) {
    return { valid: false, normalized: null, reason: "not_found" };
  }
  if (!res.ok) {
    // Treat any other error response as a failed validation rather than
    // silently accepting the value.
    return { valid: false, normalized: null, reason: "invalid_format" };
  }

  const data = await res.json();
  return {
    valid: true,
    normalized: data.normalized ?? zipcode,
  };
}

// In your CRM save handler / form submit / import row handler:
const result = await validatePostalCode(input.zipcode, input.country);
if (!result.valid) {
  return reject(input, `Invalid postal code: ${result.reason}`);
}
record.zipcode = result.normalized;
await db.insert(record);
```
For bulk imports the shape is identical, just one call per 1,000 records:
```typescript
const res = await fetch("https://postaldatapi.com/api/validate-bulk", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    records: csvRows.map((r) => ({
      postalCode: r.zip,
      countryCode: r.country ?? "US",
    })),
    apiKey: process.env.POSTALDATAPI_KEY,
  }),
});

const { results } = await res.json();
const invalid = results.filter((r) => !r.valid);
const valid = results.filter((r) => r.valid);
// Route invalid to a review queue, write valid through.
```
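If you want that routing spelled out, here's one shape it might take. It assumes, beyond what the post shows, that bulk results come back in the same order as the submitted records, that each result carries normalized and reason fields like the single-record endpoint, and that reviewQueue is whatever table or queue you park rejects in.

```typescript
// Hypothetical routing: pair each result with its source row by index,
// write valid rows through with the normalized value, park the rest.
for (const [idx, r] of results.entries()) {
  const row = csvRows[idx];
  if (r.valid) {
    await db.insert({ ...row, zip: r.normalized ?? row.zip });
  } else {
    await reviewQueue.add({ row, reason: r.reason }); // reviewQueue is yours to define
  }
}
```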
Why this is a fit for PostalDataPI specifically
Three things have to be true for write-time validation to actually work in production:
- Fast enough to sit in the write path. Sub-millisecond lookups mean the save handler doesn't get slower.
- Cheap enough to run on every write. At $0.000028 per call, per-record validation never becomes a budget line anyone argues about.
- Bulk at the same economics. Imports and historical sweeps use the same tool as the single-record path, not a separate negotiation.
This is why we shipped bulk validate at flat rate — and why we keep the API surface narrow. The boundary-validation use case is the one we want to be unambiguously the right tool for.
The reframe
Most teams think about postal-code APIs as "the thing the checkout uses." That's a true use, but it's the smaller one. The bigger framing — the one Jonathan got us to articulate — is:
> Postal codes are a data-governance primitive. Validate them at the boundary, and every downstream system that depends on them gets simpler, faster, and more secure.
This is the same shape as the argument for typed-language compilers, schema validation at API boundaries, or input sanitization at HTTP edges. The pattern is old. The application to postal-code data, surprisingly, is still rare in practice — mostly because the underlying APIs were too expensive or too slow to make write-time validation feasible.
We think that's the change worth marking.
Thanks to Jonathan Papworth for the prompt. This post exists because he asked the better question.
If you want to try the validation pattern yourself: POST /api/validate for single records, POST /api/validate-bulk for up to 1,000 at a time, 1,000 free queries on signup, no credit card.