Automated Financial Reconciliation Pipeline
Replacing a two-week manual month-end close with a Python ETL pipeline and an ML anomaly detection layer - reducing reconciliation time from 14 days to under 3 hours.

The Challenge
Our client’s finance team spent the first two weeks of every month downloading CSV exports from five different payment gateways, normalizing inconsistent formats in Excel, manually reconciling discrepancies, and producing a single consolidated report for leadership. The process was fully manual, error-prone, and blocked critical decisions until completion.
Two specific problems made it worse: each gateway formatted amounts, dates, and identifiers differently, and the reconciliation rules were held entirely in the heads of two senior team members. When one of them was on leave, the close slipped by days.
Our Approach
We took the view that the right solution was a composable pipeline, not another monolithic finance tool.
Step 1: Ingest and normalize. We built a Python ETL layer using pandas and pydantic to ingest exports from all five gateways. Each gateway got its own Adapter class responsible for one transformation: raw CSV in, normalized Transaction objects out. Adding a new gateway meant writing a new adapter, nothing else.
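To make the adapter idea concrete, here is a minimal sketch, not the production code. The field names on Transaction, the ExampleGatewayAdapter class, its CSV column names (txn_ref, amount_cents, ccy, created), and the choice to treat timestamps as UTC are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone
from decimal import Decimal
import io

import pandas as pd
from pydantic import BaseModel


class Transaction(BaseModel):
    """Normalized record shared by every adapter (hypothetical field names)."""

    gateway: str
    transaction_id: str
    amount: Decimal
    currency: str
    timestamp: datetime


class GatewayAdapter(ABC):
    """One adapter per gateway: raw CSV text in, Transaction objects out."""

    @abstractmethod
    def parse(self, raw_csv: str) -> list[Transaction]:
        ...


class ExampleGatewayAdapter(GatewayAdapter):
    """Hypothetical gateway that reports integer cents and DD/MM/YYYY dates."""

    def parse(self, raw_csv: str) -> list[Transaction]:
        # Read every column as text so pandas does not guess types.
        frame = pd.read_csv(io.StringIO(raw_csv), dtype=str)
        transactions = []
        for row in frame.to_dict(orient="records"):
            transactions.append(
                Transaction(
                    gateway="example",
                    transaction_id=row["txn_ref"].strip(),
                    # Convert integer cents to a decimal amount.
                    amount=Decimal(row["amount_cents"]) / 100,
                    currency=row["ccy"].strip().upper(),
                    # Assumption: this gateway exports timestamps in UTC.
                    timestamp=datetime.strptime(
                        row["created"], "%d/%m/%Y %H:%M"
                    ).replace(tzinfo=timezone.utc),
                )
            )
        return transactions


sample = "txn_ref,amount_cents,ccy,created\nA-1001,12550,usd,03/02/2024 14:30\n"
parsed = ExampleGatewayAdapter().parse(sample)
```

Because every adapter returns the same Transaction type, downstream code never needs to know which gateway a row came from.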
Step 2: Reconciliation engine. The core reconciliation logic - matching transactions across gateways by amount, currency, and a configurable time window - was implemented as a pure Python module with an extensive test suite. We encoded the domain rules that previously lived only in the team’s heads as explicit, version-controlled logic.
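The matching rule described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the engine itself: the Txn type, the reconcile function, the greedy closest-in-time strategy, and the 24-hour default window are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    txn_id: str
    amount: Decimal
    currency: str
    timestamp: datetime


def reconcile(left, right, window=timedelta(hours=24)):
    """Greedy one-to-one matching on amount, currency, and a time window.

    Returns (matched_pairs, unmatched_left, unmatched_right).
    """
    remaining = sorted(right, key=lambda t: t.timestamp)
    matched, unmatched_left = [], []
    for txn in sorted(left, key=lambda t: t.timestamp):
        candidates = [
            other
            for other in remaining
            if other.amount == txn.amount
            and other.currency == txn.currency
            and abs(other.timestamp - txn.timestamp) <= window
        ]
        if not candidates:
            unmatched_left.append(txn)
            continue
        # Prefer the candidate closest in time, then remove it so it cannot match twice.
        best = min(candidates, key=lambda other: abs(other.timestamp - txn.timestamp))
        remaining.remove(best)
        matched.append((txn, best))
    return matched, unmatched_left, remaining


left = [
    Txn("a1", Decimal("100.00"), "USD", datetime(2024, 2, 1, 10, 0)),
    Txn("a2", Decimal("100.00"), "USD", datetime(2024, 2, 1, 12, 0)),
    Txn("a3", Decimal("55.10"), "EUR", datetime(2024, 2, 2, 9, 0)),
]
right = [
    Txn("b1", Decimal("100.00"), "USD", datetime(2024, 2, 1, 11, 30)),
    Txn("b2", Decimal("55.10"), "EUR", datetime(2024, 2, 5, 9, 0)),  # outside the window
    Txn("b3", Decimal("100.00"), "USD", datetime(2024, 2, 1, 10, 5)),
]
matched, unmatched_left, unmatched_right = reconcile(left, right)
```

In this sample, a1 pairs with b3 and a2 with b1, while a3 and b2 stay unmatched because they are three days apart. Keeping the function free of I/O is what makes rules like this easy to cover with unit tests.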
Step 3: ML anomaly detection. After the first three months of operation, we added an isolation forest model, trained on historical reconciled data, that flags statistical outliers before the final report is generated. The model surfaces rows that look unusual - unexpectedly late transactions, duplicate amounts, currency mismatches - as a review queue rather than silently including or excluding them. This replaced the “gut check” pass that an experienced team member used to make through the data.
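A rough sketch of how such a review queue could be produced with scikit-learn's IsolationForest is shown below. The two features (amount and settlement delay in hours), the synthetic training data, and the contamination value are assumptions for illustration. The real feature set and tuning are not described here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for historical reconciled data:
# columns are [amount, settlement_delay_hours] (hypothetical features).
history = np.column_stack(
    [
        rng.normal(loc=100.0, scale=15.0, size=500),
        rng.normal(loc=24.0, scale=4.0, size=500),
    ]
)

# contamination is the assumed share of outliers in the training data.
model = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
model.fit(history)

current = np.array(
    [
        [102.0, 23.0],  # typical amount and delay
        [98.0, 26.0],  # typical amount and delay
        [5000.0, 240.0],  # very large and very late
    ]
)

labels = model.predict(current)  # 1 = inlier, -1 = outlier
scores = model.decision_function(current)  # lower = more anomalous

# Flagged rows go to a human review queue instead of being dropped.
review_queue = [row for row, label in zip(current.tolist(), labels) if label == -1]
```

Routing flagged rows to a queue, rather than filtering them out, keeps the final decision with the finance team.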
Step 4: Automated delivery. Cloud Scheduler triggers the pipeline, which runs on GCP Cloud Functions, nightly and at month-end. The output (PDF summaries and clean, unified spreadsheets) is delivered to the finance Slack channel via webhook and stored in a GCS bucket for the audit trail.
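The Slack side of the delivery step might look roughly like the sketch below, which uses only the standard library. The function names, the message layout, the sample counts, and the gs:// URI are hypothetical. The only fixed point is that Slack incoming webhooks accept a JSON body with a text field.

```python
import json
import urllib.request


def build_slack_payload(period, matched, unmatched, flagged, report_uri):
    """Build the JSON body for a Slack incoming webhook (plain text message)."""
    lines = [
        f"Reconciliation report for {period}",
        f"Matched: {matched}",
        f"Unmatched: {unmatched}",
        f"Flagged for review: {flagged}",
        f"Archived at: {report_uri}",
    ]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Illustrative values only; the counts and bucket path are made up.
payload = build_slack_payload(
    period="2024-02",
    matched=1840,
    unmatched=12,
    flagged=5,
    report_uri="gs://example-audit-bucket/2024-02/report.pdf",
)
# post_to_slack(WEBHOOK_URL, payload)  # requires a real webhook URL
```

Separating payload construction from the network call lets the message format be tested without touching Slack.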
The Results
- 98% reduction in manual overhead - the team went from two weeks of manual work to a 2-hour review of the anomaly queue.
- Month-end close reduced from 14 days to under 3 hours.
- Zero transcription errors across the first two full quarters of operation.
- Neither of the two senior team members is a single point of failure any longer - the reconciliation rules are in code, reviewed, and tested.
“The automation essentially gave us back half of our team’s capacity. The ROI was realized in the very first month of operation.” - Director of Finance
Technical Stack
- Python 3.12 - ETL pipeline and reconciliation engine
- pandas + pydantic - data normalization and validation
- scikit-learn - isolation forest anomaly detection
- GCP Cloud Functions + Cloud Scheduler - orchestration
- Google Cloud Storage - audit trail
- Slack webhooks - automated report delivery