The business problem behind the build.
Leaders were dealing with the consequences of roughly 200,000 duplicate and incomplete records across key units of the company. The issue affected how teams operated, how much they trusted the CRM, and how they prioritized manual cleanup work.
The system needed to process large record volumes, work across inconsistent sources, and still leave room for human review where confidence was lower.
A system designed for real decision-making.
We built a Spark-based matching engine with human-in-the-loop review. The pipeline normalizes messy fields, narrows comparisons through blocking keys, applies weighted scoring across identity and location signals, and routes records into action buckets for follow-through.
The design balanced scale and safety: automation handled high-confidence cases while ambiguous matches stayed reviewable by people who understood the operational context.
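The flow described above (normalize, block, score, route) can be sketched in a few lines of plain Python. This is a minimal illustration, not the production Spark code: the field names, the integer weights, the two thresholds, and the bucket labels are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
import re

# Hypothetical weights (points out of 100) and thresholds, for illustration only.
WEIGHTS = {"name": 50, "postal_code": 30, "city": 20}
AUTO_MERGE_AT = 80
REVIEW_AT = 50


def normalize(value):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", (value or "").lower())
    return " ".join(cleaned.split())


def blocking_key(record):
    """Assumed key: postal code plus the first three characters of the name."""
    return normalize(record["postal_code"]) + "|" + normalize(record["name"])[:3]


def score(a, b):
    """Add up the weights of fields that are non-empty and agree after normalization."""
    total = 0
    for field, weight in WEIGHTS.items():
        left, right = normalize(a[field]), normalize(b[field])
        if left and left == right:
            total += weight
    return total


def route(pair_score):
    """Map a score to an action bucket."""
    if pair_score >= AUTO_MERGE_AT:
        return "auto_merge"
    if pair_score >= REVIEW_AT:
        return "manual_review"
    return "no_match"


def match(records):
    """Compare only records that share a blocking key, then score and route each pair."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    results = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            pair_score = score(a, b)
            results.append((a["id"], b["id"], pair_score, route(pair_score)))
    return results


records = [
    {"id": 1, "name": "Acme Corp.", "postal_code": "10001", "city": "New York"},
    {"id": 2, "name": "ACME Corp", "postal_code": "10001", "city": "new york"},
    {"id": 3, "name": "Acme Holdings", "postal_code": "10001", "city": "New York"},
    {"id": 4, "name": "Zenith Ltd", "postal_code": "94105", "city": "San Francisco"},
]

for row in match(records):
    print(row)
```

In this toy data, records 1 and 2 agree on every field and land in the automatic bucket, while record 3 matches only on location and is held for review. Record 4 shares no blocking key with the others, so it is never compared at all.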
The decisions that made the workflow hold up.
- Blocking keys reduced the comparison footprint dramatically by generating pairs only among records that share a key, rather than comparing every record against every other.
- Weighted scoring and validation rules separated high-confidence matches from records that needed manual review.
- Configuration-driven mappings allowed the pipeline to accommodate upstream schema changes without rewriting the core matching logic.
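One way to picture the configuration-driven mappings in the last point is a per-source lookup from upstream column names to the canonical fields the matcher expects. The sketch below is an assumption about the general shape, not the actual configuration: the source names (`crm_a`, `crm_b`), the column names, and the `to_canonical` helper are all hypothetical.

```python
# Hypothetical per-source mappings: upstream column name -> canonical field name.
FIELD_MAPPINGS = {
    "crm_a": {"AccountName": "name", "Zip": "postal_code", "Town": "city"},
    "crm_b": {"company": "name", "postcode": "postal_code", "city": "city"},
}


def to_canonical(source, row, mappings=FIELD_MAPPINGS):
    """Rename one raw row's columns to canonical field names for the given source."""
    mapping = mappings[source]
    missing = [column for column in mapping if column not in row]
    if missing:
        # Fail loudly so an upstream schema change surfaces as a config problem.
        raise KeyError(f"{source} is missing expected columns: {missing}")
    return {canonical: row[column] for column, canonical in mapping.items()}


row_a = {"AccountName": "Acme Corp.", "Zip": "10001", "Town": "New York"}
row_b = {"company": "ACME Corp", "postcode": "10001", "city": "new york"}

canonical_a = to_canonical("crm_a", row_a)
canonical_b = to_canonical("crm_b", row_b)
print(canonical_a)
print(canonical_b)
```

Under this design, if an upstream system renamed `Zip` to `PostalCode`, only the `crm_a` entry in the mapping would change. The matching logic would keep reading `postal_code` and would not need to be touched.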
Workflow View
Entity Matching Flow
How raw records become cleaner operational data with the right review checkpoints.
Multi-source account data
Raw records from multiple business units are pulled into a common staging pipeline so matching can run across the full customer footprint.
Key signal: 200k problematic records
Result
The workflow restored a cleaner single source of truth, reduced duplicates by 70%, and removed more than one full-time equivalent of manual effort from the process.
The program worked because it treated data quality as an operational system, not a one-off cleanup project.
What this work says about how ControlThrive builds.
The best AI-enabled operations workflows know exactly where automation should stop and human review should begin.