Matching Engine for a Manufacturing Data Cleanup Program

What was at stake

The business problem behind the build.

Leaders were dealing with roughly 200,000 duplicate and incomplete records spread across the company's key business units. The problem shaped how teams operated, how much they trusted the CRM, and how they prioritized manual cleanup work.

The system needed to process large record volumes, work across inconsistent sources, and still leave room for human review where confidence was lower.

What we built

A system designed for real decision-making.

We built a Spark-based matching engine with human-in-the-loop review. The pipeline normalizes messy fields, narrows comparisons through blocking keys, applies weighted scoring across identity and location signals, and routes records into action buckets for follow-through.
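As a rough illustration of the normalization step, the sketch below cleans company names and postal codes so that trivially different spellings compare equal. It uses plain Python rather than Spark. The suffix list and the `normalize_name` / `normalize_postal` helpers are hypothetical, not the program's actual rules.

```python
import re
import unicodedata

# Hypothetical legal-entity suffixes to strip; a real program would tune this per source.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited", "corp", "corporation", "co", "gmbh"}


def normalize_name(raw: str) -> str:
    """Lowercase, strip accents and punctuation, and drop trailing legal suffixes."""
    # Decompose accented characters, then drop the combining marks ("É" -> "E").
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a lowercase ASCII letter, digit, or space.
    # Assumption: names are Latin-script; this filter would drop other scripts.
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    tokens = text.split()
    # Only strip suffixes from the end, so a leading "Co" in a name is kept.
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_postal(raw: str) -> str:
    """Uppercase and remove spaces and hyphens so "ab1 2cd" and "AB1-2CD" compare equal."""
    return re.sub(r"[\s-]", "", raw.upper())
```

For example, "Acme Manufacturing, Inc." and "ACMÉ Manufacturing Co." both normalize to "acme manufacturing". In a Spark job, functions like these would typically run as column transformations before any pairs are generated, so every later stage sees the same cleaned values.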

The design balanced scale and safety: automation handled high-confidence cases while ambiguous matches stayed reviewable by people who understood the operational context.
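One way to express the split between automated and human handling is a pair of score thresholds. The sketch below assumes a match score in [0, 1] and a boolean validation result per pair. The bucket names and the 0.90 / 0.65 cutoffs are hypothetical placeholders, not values from the engagement.

```python
from enum import Enum


class Action(Enum):
    AUTO_MERGE = "auto_merge"        # high confidence: merge without a reviewer
    MANUAL_REVIEW = "manual_review"  # ambiguous: queue for a person with context
    NO_MATCH = "no_match"            # low score: treat as distinct records


# Hypothetical thresholds; in practice they would be tuned against reviewed samples.
AUTO_MERGE_AT = 0.90
REVIEW_AT = 0.65


def route(score: float, passed_validation: bool = True) -> Action:
    """Map a candidate pair's match score to an action bucket."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # A high score that fails a validation rule is demoted to review, not auto-merged.
    if score >= AUTO_MERGE_AT and passed_validation:
        return Action.AUTO_MERGE
    if score >= REVIEW_AT:
        return Action.MANUAL_REVIEW
    return Action.NO_MATCH
```

Keeping the thresholds as named constants makes it easy to tighten the auto-merge band if reviewers find false merges, at the cost of a larger review queue.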

Implementation highlights

The decisions that made the workflow hold up.

Blocking keys sharply reduced the comparison footprint by generating candidate pairs only among records that share a key, rather than comparing every record against every other.
Weighted scoring and validation rules separated high-confidence matches from records that needed manual review.
Configuration-driven mappings allowed the pipeline to accommodate upstream schema changes without rewriting the core matching logic.
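The first two points can be sketched together: group records by a blocking key, compare only within each group, and score each pair as a weighted sum of field similarities. This is an illustration in plain Python under stated assumptions. The key recipe, the field weights, and the use of `difflib.SequenceMatcher` as the similarity measure are hypothetical choices, not the engine's actual configuration.

```python
from collections import defaultdict
from collections.abc import Iterator
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical signal weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {"name": 0.5, "street": 0.3, "phone": 0.2}


def blocking_key(rec: dict) -> str:
    """Hypothetical key: postal code plus the first three letters of the normalized name."""
    return f"{rec['postal']}|{rec['name'][:3]}"


def candidate_pairs(records: list[dict]) -> Iterator[tuple[dict, dict]]:
    """Yield only pairs that share a blocking key instead of all n*(n-1)/2 pairs."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; an empty value contributes no evidence."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(a: dict, b: dict) -> float:
    """Weighted sum of per-field similarities across identity and location signals."""
    return sum(w * similarity(a.get(f, ""), b.get(f, "")) for f, w in WEIGHTS.items())


records = [
    {"id": 1, "name": "acme manufacturing", "postal": "AB12CD", "street": "12 mill road", "phone": "5550100"},
    {"id": 2, "name": "acme mfg", "postal": "AB12CD", "street": "12 mill rd", "phone": "5550100"},
    {"id": 3, "name": "zenith tools", "postal": "AB12CD", "street": "4 quay st", "phone": ""},
]
pairs = list(candidate_pairs(records))
```

With three records sharing a postal code, only the two whose names also start with "acm" land in the same block, so one pair is scored instead of three. That pair scores roughly 0.78: strong on street and phone, weaker on the abbreviated name.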

Workflow View

Entity Matching Flow

How raw records become cleaner operational data with the right review checkpoints.

Data Inputs

Multi-source account data

Raw records from multiple business units are pulled into a common staging pipeline so matching can run across the full customer footprint.
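Configuration-driven mappings can be as simple as a per-source dictionary from staging fields to source columns. In the sketch below, the source names, column names, and the `to_staging` helper are hypothetical. The idea is that a renamed upstream column changes one config entry rather than the matching code.

```python
# Hypothetical per-source mapping config: staging field -> source column.
SOURCE_MAPPINGS = {
    "erp_unit_a": {"name": "CUST_NAME", "postal": "ZIP", "street": "ADDR_1", "phone": "TEL"},
    "crm_unit_b": {"name": "AccountName", "postal": "BillingPostalCode",
                   "street": "BillingStreet", "phone": "Phone"},
}

# Common staging schema every source is projected onto.
STAGING_FIELDS = ("name", "postal", "street", "phone")


def to_staging(source: str, row: dict) -> dict:
    """Project a raw source row onto the common staging schema."""
    mapping = SOURCE_MAPPINGS[source]
    staged = {"source": source}
    for field in STAGING_FIELDS:
        column = mapping.get(field)
        value = row.get(column) if column else None
        # Missing or null values become empty strings so every staged row has the same shape.
        staged[field] = "" if value is None else str(value).strip()
    return staged
```

In Spark, the same config could drive a select-and-rename over each source DataFrame before the sources are unioned into staging.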

Key signal: 200k problematic records

What changed

Result

The workflow restored a cleaner single source of truth, reduced duplicates by 70%, and removed more than one full-time equivalent of manual effort from the process.

The program worked because it treated data quality as an operational system, not a one-off cleanup project.

Takeaway

What this work says about how controlthrive builds.

The best AI-enabled operations workflows know exactly where automation should stop and human review should begin.

Next step

Have a similar workflow in mind?

Bring the process, bottleneck, or review workflow you want to improve. We can sort out whether it needs a workshop, a lighter decision layer, or a full build.


More work

Private Capital Advisory

A proprietary workflow that moved from scoring experiments to campaign-ready investor shortlists

How a private capital advisory firm operationalized investor targeting

A founder-led build that helped a private capital advisory team move from fragmented CRM context and tacit deal knowledge to a production AI workflow for investor search, review, and handoff.


Private Equity

Faster portfolio review through ranked alerts and supporting evidence

Portfolio Monitoring System for a Private Equity Team

A portfolio monitoring workflow designed to help investment teams prioritize what deserves a closer look after quarterly calls.
