You cannot improve what you are not measuring.
Most job board operators have a good view of their traffic metrics, their employer acquisition numbers, and their application volumes. Most do not have an equally clear view of their data quality metrics.
The result is that data quality problems accumulate undetected until they become visible in user behaviour. Bounce rates that are higher than expected. Engagement with filtering features that is lower than it should be. User feedback about stale listings or incorrect locations. By the time these signals appear, the underlying data problem has usually been building for months.
Every conversation about how to scale a job board to 1 million listings eventually arrives at the same realisation: volume without data quality does not produce a better product. It produces a larger version of the same problems. The boards that scale cleanly are the ones measuring data quality at every stage and fixing gaps before they compound.
A structured data quality audit, run against six specific metrics, gives operators an accurate picture of where their recruitment data platform is performing and where it is failing silently.
Metric 1: Title Normalisation Coverage
What to measure: The percentage of records in the index where the job title has been mapped to a standard taxonomy value, versus records where the title is stored as raw, un-normalised text.
Why it matters: Search and filtering on un-normalised titles returns inconsistent results. Job seekers searching for “Software Engineer” do not see records titled “Sr. SWE” or “Full-Stack Dev” unless normalisation has mapped those variants to the same taxonomy node.
What good looks like: 95% or higher normalisation coverage. Every record that remains un-normalised contributes noise to search results.
What poor coverage signals: The enrichment pipeline is not handling the full range of title formats entering from current sources. Common causes include new employer sources with unusual title conventions, new ATS platforms generating titles in unexpected formats, or a normalisation model that has not been updated to handle recently emerged role names.
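As a rough illustration, the coverage figure can be computed overall and per source, which helps show whether a gap is concentrated in a particular feed. This is a minimal sketch: the field names `source`, `raw_title`, and `taxonomy_id` and the sample records are assumptions made for the example, not a real schema.

```python
from collections import defaultdict

# Hypothetical sample records; `source` and `taxonomy_id` are assumed field names.
records = [
    {"source": "ats_a", "raw_title": "Software Engineer", "taxonomy_id": "software-engineer"},
    {"source": "ats_a", "raw_title": "Sr. SWE", "taxonomy_id": "software-engineer"},
    {"source": "ats_b", "raw_title": "Full-Stack Dev", "taxonomy_id": None},
    {"source": "ats_b", "raw_title": "Code Ninja", "taxonomy_id": None},
]


def title_coverage_by_source(records):
    """Return {source: fraction of records that carry a taxonomy mapping}."""
    totals = defaultdict(int)
    mapped = defaultdict(int)
    for record in records:
        totals[record["source"]] += 1
        if record.get("taxonomy_id"):
            mapped[record["source"]] += 1
    return {source: mapped[source] / totals[source] for source in totals}


# Overall coverage across the whole index, then the per-source breakdown.
overall = sum(1 for r in records if r.get("taxonomy_id")) / len(records)
by_source = title_coverage_by_source(records)

print(f"overall: {overall:.0%}")
for source, coverage in sorted(by_source.items()):
    print(f"{source}: {coverage:.0%}")
```

In this made-up sample, all of the unmapped titles come from one source, which is the pattern to look for when a new employer source or ATS platform has been added.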
Metric 2: Location Geocoding Coverage
What to measure: The percentage of records with a geocoded location, expressed as standardised city, country, and coordinates, versus records with only a raw location string.
Why it matters: Location-based search and proximity filtering operate on geocoded coordinates, not text strings. A record with location “NYC Hybrid 3 days” is not searchable by proximity. A record geocoded to 40.7128, -74.0060 is.
What good looks like: 90% or higher geocoding coverage. The 10% allowance accounts for genuinely ambiguous or remote-only listings where precise geocoding is not meaningful.
What poor coverage signals: Location strings from certain sources or geographies are not being resolved by the geocoding layer. This is often a regional coverage gap: geocoding models trained primarily on US and UK location formats handle those markets well but produce lower coverage for DACH, APAC, or LATAM location conventions.
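One way to make this measurement strict is to count a record as geocoded only when city, country, and plausible coordinates are all present. The sketch below assumes hypothetical field names (`city`, `country`, `lat`, `lon`) and uses invented sample data.

```python
def is_geocoded(record):
    """True only if city, country, and in-range coordinates are all present."""
    if not record.get("city") or not record.get("country"):
        return False
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or lon is None:
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def geocoding_coverage(records):
    """Fraction of records that pass the is_geocoded check."""
    if not records:
        return 0.0
    return sum(1 for r in records if is_geocoded(r)) / len(records)


# Hypothetical sample: two resolved locations, two raw strings left unresolved.
records = [
    {"raw_location": "New York, NY", "city": "New York", "country": "US",
     "lat": 40.7128, "lon": -74.0060},
    {"raw_location": "NYC Hybrid 3 days", "city": None, "country": None,
     "lat": None, "lon": None},
    {"raw_location": "München", "city": "Munich", "country": "DE",
     "lat": 48.1351, "lon": 11.5820},
    {"raw_location": "Remote", "city": None, "country": None,
     "lat": None, "lon": None},
]

coverage = geocoding_coverage(records)
print(f"geocoding coverage: {coverage:.0%}")
```

Grouping the same calculation by country or by source would help reveal whether a shortfall is a regional coverage gap.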
Metric 3: Salary Field Completion Rate
What to measure: The percentage of records with a structured salary value, either disclosed by the employer or estimated by the enrichment pipeline, versus records with a blank or unusable salary field.
Why it matters: Salary filters are one of the highest-engagement features on any job board. A filter that only works on 30% of listings is a liability rather than an asset.
What good looks like: 70% or higher salary completion after enrichment. Since approximately 70% of employers globally do not disclose salary, a recruitment data platform with salary estimation should be filling the majority of those gaps. A completion rate below 50% after enrichment indicates the estimation model is not running on the full index or is producing low-confidence estimates that are being excluded.
What poor coverage signals: Salary estimation is either not running on all sources or is not configured for the industries and geographies the board covers. Salary estimation models require market-specific training data. A model trained on US technology sector data produces poor estimates for UK healthcare roles.
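It can help to split the completion rate into disclosed and estimated salaries, and to see how many estimates are dropped for low confidence. The following is a sketch only: the `salary` structure, the `origin` and `confidence` fields, and the 0.6 cutoff are all assumptions chosen for illustration.

```python
MIN_CONFIDENCE = 0.6  # assumed cutoff below which an estimate is treated as unusable


def salary_completion(records, min_confidence=MIN_CONFIDENCE):
    """Return disclosed, estimated, and combined completion as fractions of the index."""
    total = len(records)
    if total == 0:
        return {"disclosed": 0.0, "estimated": 0.0, "completion": 0.0}
    disclosed = estimated = 0
    for record in records:
        salary = record.get("salary")
        if not salary:
            continue  # blank salary field
        if salary["origin"] == "disclosed":
            disclosed += 1
        elif salary["origin"] == "estimated" and salary.get("confidence", 0.0) >= min_confidence:
            estimated += 1
    return {
        "disclosed": disclosed / total,
        "estimated": estimated / total,
        "completion": (disclosed + estimated) / total,
    }


# Hypothetical sample: one disclosed, two usable estimates, one weak estimate, one blank.
records = [
    {"id": 1, "salary": {"origin": "disclosed", "min": 50000, "max": 60000}},
    {"id": 2, "salary": {"origin": "estimated", "min": 45000, "max": 55000, "confidence": 0.8}},
    {"id": 3, "salary": {"origin": "estimated", "min": 70000, "max": 90000, "confidence": 0.9}},
    {"id": 4, "salary": {"origin": "estimated", "min": 30000, "max": 80000, "confidence": 0.3}},
    {"id": 5, "salary": None},
]

result = salary_completion(records)
print(result)
```

A large gap between the number of estimates produced and the number that clear the confidence cutoff would point towards a model that is poorly matched to the board's industries or geographies.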
Metric 4: Duplicate Rate
What to measure: The percentage of records in the index that are duplicates of another record, representing the same job opening from a different source.
Why it matters: Duplicate listings degrade search quality directly. Job seekers see the same role multiple times. The board’s listing count is inflated. Application data is fragmented across multiple records for the same opening.
What good looks like: Below 3% duplicate rate across the full index. Above 5% is a signal that the deduplication layer is not performing adequately for the current source mix.
What poor coverage signals: New sources have been added to the pipeline without updating deduplication logic to handle their overlap with existing sources, or deduplication thresholds are set too conservatively and miss near-duplicate records that differ slightly in title or description across sources.
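A simple first estimate of the duplicate rate groups records by a normalised composite key and counts every record beyond the first in each group. This is a deliberately crude sketch: the key (employer, title, city) and the field names are assumptions, and exact-key matching will miss near-duplicates whose titles are worded differently, so the figure is a lower bound rather than the real rate.

```python
import re
from collections import Counter


def dedup_key(record):
    """Build a comparison key from lower-cased, punctuation-stripped fields."""
    def norm(value):
        return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()
    return (norm(record["employer"]), norm(record["title"]), norm(record["city"]))


def duplicate_rate(records):
    """Fraction of records that repeat a key already seen in the index."""
    if not records:
        return 0.0
    counts = Counter(dedup_key(r) for r in records)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(records)


# Hypothetical sample: the first two records are the same opening from two sources.
records = [
    {"source": "feed_a", "employer": "Acme Corp", "title": "Software Engineer", "city": "London"},
    {"source": "feed_b", "employer": "ACME Corp.", "title": "Software  Engineer", "city": "london"},
    {"source": "feed_a", "employer": "Acme Corp", "title": "Data Analyst", "city": "London"},
    {"source": "feed_b", "employer": "Globex", "title": "Software Engineer", "city": "Berlin"},
]

rate = duplicate_rate(records)
print(f"duplicate rate: {rate:.0%}")
```

Building the key from the normalised taxonomy title instead of the raw title would catch more cross-source variants.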
Metric 5: Listing Freshness Rate
What to measure: The percentage of listings in the index that have been verified as live within the last 48 hours, versus listings whose last verification was more than 48 hours ago.
Why it matters: Stale listings are one of the fastest ways to erode job seeker trust. A job seeker who applies to a role that was filled two weeks ago attributes the failure to the job board, not the employer.
What good looks like: 85% or higher freshness rate at the 48-hour window. The 15% allowance accommodates low-volume employer sources with infrequent posting activity where a 48-hour crawl cycle is disproportionate to the posting frequency.
What poor coverage signals: The crawling layer is not monitoring all sources at sufficient frequency. High-volume employer sources with daily posting activity should be crawled more than once per day. A freshness rate that is consistently low for particular sources usually indicates a crawl scheduling problem rather than a source-level issue.
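The freshness rate reduces to a timestamp comparison against a 48-hour cutoff. The sketch below assumes each record carries a hypothetical timezone-aware `last_verified_at` field, and treats a missing timestamp as stale.

```python
from datetime import datetime, timedelta, timezone


def freshness_rate(records, now, window=timedelta(hours=48)):
    """Fraction of records verified as live within `window` of `now`."""
    if not records:
        return 0.0
    cutoff = now - window
    fresh = sum(
        1 for r in records
        if r.get("last_verified_at") is not None and r["last_verified_at"] >= cutoff
    )
    return fresh / len(records)


# A fixed reference time keeps the example reproducible.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)

# Hypothetical sample: two recent verifications, one stale, one never verified.
records = [
    {"id": 1, "last_verified_at": now - timedelta(hours=3)},
    {"id": 2, "last_verified_at": now - timedelta(hours=47)},
    {"id": 3, "last_verified_at": now - timedelta(hours=72)},
    {"id": 4, "last_verified_at": None},
]

rate = freshness_rate(records, now)
print(f"freshness rate (48h): {rate:.0%}")
```

Running the same function per source would help separate crawl scheduling problems from sources that simply post infrequently.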
Metric 6: Skills Extraction Coverage
What to measure: The percentage of records with at least one extracted skill tag, versus records where the skills field is empty.
Why it matters: Skills-based search, matching, and recommendation features require structured skills data on a high percentage of the index to function correctly. A skills filter that only works on 40% of listings undermines the feature and the job seekers who rely on it.
What good looks like: 75% or higher skills extraction coverage. The 25% allowance accounts for listings whose job descriptions are genuinely sparse and do not contain extractable skill references.
What poor coverage signals: The skills extraction model is not running on all sources, is producing low-confidence extractions that are being excluded, or is not handling the industry vocabulary of the board’s vertical correctly. A board focused on healthcare will have lower skills coverage if the extraction model was not trained on clinical and medical role terminology.
Running the Audit
These six metrics can be measured directly against any job board’s index.
For boards with a job data management platform that includes quality monitoring, these metrics should be visible in a dashboard and tracked continuously. For boards that do not have this monitoring layer, the audit requires a direct query against the index to calculate each metric from the data itself.
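For a board without a monitoring layer, the final step of the audit can be as simple as comparing the six measured values against the targets described above. The sketch below uses those targets; the metric keys and the measured values are invented for illustration.

```python
# Targets taken from the six metrics above. "min" means the value should be at
# or above the target; "max" means it should stay below it.
THRESHOLDS = {
    "title_normalisation": ("min", 0.95),
    "geocoding": ("min", 0.90),
    "salary_completion": ("min", 0.70),
    "duplicate_rate": ("max", 0.03),
    "freshness_48h": ("min", 0.85),
    "skills_extraction": ("min", 0.75),
}


def audit(measured):
    """Return {metric: (measured value, target)} for every metric that misses its target."""
    failures = {}
    for metric, (kind, target) in THRESHOLDS.items():
        value = measured[metric]
        passed = value >= target if kind == "min" else value < target
        if not passed:
            failures[metric] = (value, target)
    return failures


# Hypothetical measurements from a direct query against the index.
measured = {
    "title_normalisation": 0.97,
    "geocoding": 0.84,
    "salary_completion": 0.72,
    "duplicate_rate": 0.06,
    "freshness_48h": 0.88,
    "skills_extraction": 0.75,
}

failures = audit(measured)
for metric, (value, target) in failures.items():
    print(f"{metric}: measured {value:.0%}, target {target:.0%}")
```

With these sample values, geocoding coverage and duplicate rate are the two metrics flagged for follow-up.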
Understanding where each metric stands is also the most practical starting point for any operator thinking about how to scale a job board to 1 million listings. At that volume, a 5% duplicate rate is 50,000 bad records. A 20% gap in title normalisation coverage is 200,000 records contributing noise to search results. The metrics that are acceptable at 10,000 listings become structural liabilities at scale.
At Propellum, these metrics are part of the continuous monitoring built into our AI job board infrastructure. When any metric drops below threshold across a client’s feed, the cause is identified and resolved before it reaches the board’s users. Quality monitoring is not a separate audit process. It runs alongside every record, continuously, as part of the data pipeline itself.
The boards that are improving their data quality over time are the ones that are measuring it. The boards that are scaling without quality degradation are the ones whose recruitment data platform is measuring it automatically.
Want to see where your current data layer stands on these metrics? Request a free data quality assessment →