Manufacturing 3 min read

Matching Engine for a Manufacturing Data Cleanup Program

Built a scalable matching engine that reduced duplicate and incomplete records across multiple business units.

70% duplicate reduction across a messy multi-source record base. Entity resolution / Operational AI / Process optimization.

Business problem: Duplicate and incomplete records across critical units
Delivery shape: Spark-based matching engine with human review
Why it worked: Automation handled scale while humans reviewed ambiguity
Read the Databricks case study

What was at stake

The business problem behind the build.

Leaders were dealing with the consequences of roughly 200,000 duplicate and incomplete records across key units of the company. The issue affected how teams operated, trusted the CRM, and prioritized manual cleanup work.

The system needed to process large record volumes, work across inconsistent sources, and still leave room for human review where confidence was lower.

What we built

A system designed for real decision-making.

We built a Spark-based matching engine with human-in-the-loop review. The pipeline normalizes messy fields, narrows comparisons through blocking keys, applies weighted scoring across identity and location signals, and routes records into action buckets for follow-through.
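To make the normalization and weighted-scoring steps concrete, here is a minimal sketch in plain Python rather than Spark. The field names, the weights, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions for illustration; the case study does not publish the actual fields, weights, or comparison functions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical weights over identity (name) and location (city, postal_code)
# signals. The real weights are not given in the case study.
WEIGHTS = {"name": 0.5, "city": 0.3, "postal_code": 0.2}


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    value = re.sub(r"[^a-z0-9 ]", " ", value.lower())
    return " ".join(value.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after normalization; a missing value scores 0."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(left: dict, right: dict) -> float:
    """Weighted sum of per-field similarities for one candidate pair."""
    return sum(
        weight * similarity(left.get(field, ""), right.get(field, ""))
        for field, weight in WEIGHTS.items()
    )
```

Because the weights sum to 1, the score stays between 0 and 1, which keeps any downstream thresholds easy to reason about. Treating a missing field as zero evidence is one possible choice; a production engine might instead renormalize over the fields that are present.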

The design balanced scale and safety: automation handled high-confidence cases while ambiguous matches stayed reviewable by people who understood the operational context.
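The split between automated handling and human review can be sketched as two thresholds on the match score. The threshold values and bucket names below are hypothetical; in practice they would likely be tuned against a sample of reviewed pairs.

```python
# Hypothetical cut-offs; the case study does not state the real values.
AUTO_MERGE_AT = 0.92
REVIEW_AT = 0.70


def route(score: float) -> str:
    """Map a match score to an action bucket."""
    if score >= AUTO_MERGE_AT:
        return "auto_merge"      # high confidence: handled by automation
    if score >= REVIEW_AT:
        return "human_review"    # ambiguous: queued for a person to decide
    return "no_match"            # low confidence: treated as distinct records


def bucket_pairs(scored_pairs):
    """Group (left_id, right_id, score) tuples by their action bucket."""
    buckets = {"auto_merge": [], "human_review": [], "no_match": []}
    for left_id, right_id, score in scored_pairs:
        buckets[route(score)].append((left_id, right_id, score))
    return buckets
```

Widening the gap between the two thresholds sends more pairs to reviewers and fewer to automation, so the thresholds act as the dial between scale and safety.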

Implementation highlights

The decisions that made the workflow hold up.

  • Blocking keys reduced the comparison footprint dramatically by narrowing pair generation to likely candidates.
  • Weighted scoring and validation rules separated high-confidence matches from records that needed manual review.
  • Configuration-driven mappings allowed the pipeline to accommodate upstream schema changes without rewriting the core matching logic.
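The effect of blocking on the comparison footprint can be illustrated with a short plain-Python sketch. The blocking key used here (a name prefix plus a postal-code prefix) is an assumption chosen for illustration, not the key the engine actually used.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Hypothetical key: first 3 alphanumeric name chars + first 3 postal chars."""
    name = "".join(ch for ch in record.get("name", "").lower() if ch.isalnum())
    postal = record.get("postal_code", "").strip()
    return f"{name[:3]}|{postal[:3]}"


def candidate_pairs(records):
    """Generate ID pairs only for records that share a blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


def all_pairs_count(n: int) -> int:
    """Number of comparisons without blocking: n choose 2."""
    return n * (n - 1) // 2
```

Without blocking, n records require n(n-1)/2 comparisons, so 200,000 records would mean roughly 20 billion pairs. With blocking, the count is the sum of the within-block pair counts, which is far smaller when blocks are small. The trade-off is that true duplicates whose keys differ are never compared, which is why engines of this kind often run several blocking keys and take the union of the resulting pairs.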

Workflow View

Entity Matching Flow

How raw records become cleaner operational data with the right review checkpoints.

Data Inputs

Multi-source account data

Raw records from multiple business units are pulled into a common staging pipeline so matching can run across the full customer footprint.
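One way to stage records from several sources into a common shape is a per-source column mapping held in configuration. The sketch below is an assumption about how that could look; the source names, column names, and target fields are invented for illustration.

```python
# Hypothetical mappings from each source's columns to the common schema.
# Handling an upstream schema change means editing this dict, not the code.
SOURCE_MAPPINGS = {
    "crm_a": {"AccountName": "name", "Town": "city", "Zip": "postal_code"},
    "erp_b": {"cust_nm": "name", "cust_city": "city", "post_cd": "postal_code"},
}

COMMON_FIELDS = ("name", "city", "postal_code")


def to_common_schema(source: str, row: dict) -> dict:
    """Rename a raw row's columns to the common fields; missing values become ''."""
    mapping = SOURCE_MAPPINGS[source]
    staged = {field: "" for field in COMMON_FIELDS}
    for src_col, field in mapping.items():
        value = row.get(src_col)
        if value is not None:
            staged[field] = str(value).strip()
    staged["source"] = source  # keep lineage so reviewers can trace a record
    return staged
```

For example, `to_common_schema("erp_b", {"cust_nm": " Acme Inc ", "post_cd": 48201})` yields a row with `name` set to `"Acme Inc"`, `postal_code` set to `"48201"`, and an empty `city`, so incomplete records stay visible downstream instead of being dropped.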

Key signal: 200k problematic records

What changed

Result

The workflow restored a cleaner single source of truth, reduced duplicates by 70%, and removed more than one full-time equivalent of manual effort from the process.

The program worked because it treated data quality as an operational system, not a one-off cleanup project.

Takeaway

What this work says about how ControlThrive builds.

The best AI-enabled operations workflows know exactly where automation should stop and human review should begin.

Next step

Have a similar workflow in mind?

Bring the process, bottleneck, or review workflow you want to improve. We can sort out whether it needs a workshop, a lighter decision layer, or a full build.
