Work

Compliance Hub is Swap's system of record for product classification: the tax codes and HS/HTS codes that decide how much duty and tax a cross-border shopper pays at checkout. I led two workstreams that moved classification from manual and partial to automated and destination-specific.

Challenge: shoppers were charged the wrong duty and tax because classification was manual, partial, and blind to what the product actually was. Solution: made Compliance Hub the source of truth, classifying only active products on change with full product context, synced into the fields checkout already reads. Impact: +40% duty/tax accuracy and ~60% less manual SKU review across 13K+ jurisdictions.

The problem

Transactions reached our tax engine without product-level tax codes, so it fell back to standard rates. That meant systematic over-collection: shoppers in markets with reduced rates or exemptions were charged more than the law required, inflating checkout prices and quietly costing conversion.
For duties, we passed merchant 6-digit HS codes to the tax engine, which extended them to destination-specific 8 or 10 digits using only the partial code and the destination country, ignoring product title, description, and material composition. For jewelry, religious items, and mixed-material goods, that produced wrong classifications.
We had no baseline for how wrong we were. The gap stayed invisible until a jewelry brand, with roughly 5,000 SKUs under manual review, flagged the discrepancies and stalled onboarding over customs accuracy. The complaint was the first real signal we had.

How I measured success

Classification coverage: share of active SKUs with destination-specific HS codes.
Merchant complaints: volume of HS code-related support tickets and onboarding concerns.
Duty accuracy: duty calculation disputes tied to HS codes.
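
As a minimal sketch of how the coverage metric could be computed: the `Sku` record, its field names, and the sample codes below are assumptions for illustration, not Compliance Hub's actual schema. Here a SKU counts as covered only when it has a code for every target market.

```python
from dataclasses import dataclass, field


@dataclass
class Sku:
    # Hypothetical record shape, for illustration only.
    sku_id: str
    active: bool
    # Maps destination country -> destination-specific HS/HTS code.
    hs_codes: dict[str, str] = field(default_factory=dict)


def classification_coverage(skus: list[Sku], markets: list[str]) -> float:
    """Share of active SKUs that have a code for every target market."""
    active = [s for s in skus if s.active]
    if not active:
        return 0.0
    covered = sum(1 for s in active if all(s.hs_codes.get(m) for m in markets))
    return covered / len(active)


# Sample data; the codes are placeholders, not verified classifications.
catalog = [
    Sku("ring-1", True, {"US": "7113.19.0000", "GB": "7113190000"}),
    Sku("ring-2", True, {"US": "7113.19.0000"}),  # missing the GB code
    Sku("old-3", False),                          # inactive: excluded
]
coverage = classification_coverage(catalog, ["US", "GB"])
```

Inactive SKUs are left out of the denominator so the number tracks what shoppers can actually buy.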

What I did

Made Compliance Hub the source of truth for classification, with two coordinated pipelines, since tax codes and customs codes can't come from a single API.
Ran classification asynchronously on product create/update, only for active products and only on change, so checkout never waits on a real-time third-party call and classification cost doesn't scale with the whole catalog.
Fed full product context (title, description, category, material composition) into classification to generate accurate destination-specific codes, and let merchants configure target markets and a default.
Synced generated codes into Shopify's market-specific HS fields, where checkout already reads them, so the change needed no checkout rewrite.
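
The "only active products, only on change" rule can be sketched as a fingerprint check in a product-event handler. This is an illustration under assumptions: the field names, the `status` values, and the in-memory queue are hypothetical stand-ins, not the production design.

```python
import hashlib
import json

# Fields assumed to feed classification (hypothetical names).
CONTEXT_FIELDS = ("title", "description", "category", "material_composition")


def context_fingerprint(product: dict) -> str:
    """Stable hash of the fields that classification depends on."""
    context = {k: product.get(k, "") for k in CONTEXT_FIELDS}
    payload = json.dumps(context, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_classification(product: dict, last_fingerprints: dict[str, str]) -> bool:
    """True only for active products whose classification inputs changed."""
    if product.get("status") != "active":
        return False
    return last_fingerprints.get(product["id"]) != context_fingerprint(product)


def handle_product_event(product: dict, last_fingerprints: dict[str, str], queue: list) -> None:
    """Enqueue work for a background worker instead of calling the API inline."""
    if needs_classification(product, last_fingerprints):
        queue.append(product["id"])
        # Simplification: a real system would record this after classification succeeds.
        last_fingerprints[product["id"]] = context_fingerprint(product)


seen, queue = {}, []
ring = {"id": "p1", "status": "active", "title": "Gold ring", "material_composition": "14k gold"}
handle_product_event(ring, seen, queue)                                   # new: enqueued
handle_product_event(ring, seen, queue)                                   # unchanged: skipped
handle_product_event({**ring, "price": 99}, seen, queue)                  # not an input: skipped
handle_product_event({**ring, "title": "Gold-plated ring"}, seen, queue)  # changed: enqueued
```

Hashing only the classification inputs means price or inventory updates never trigger a paid classification call.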

The alternative I rejected

The obvious build was our own fine-tuned HS classification model. We tried it. Against the tax engine's classification API fed with full product context, our model trailed by about 15 points on accuracy. In a regulated domain that gap is the difference between a correct customs invoice and an audit liability, so I chose to orchestrate the existing API with better inputs rather than own the model, and put the effort into the data contracts and sync that actually moved accuracy.

The lesson

The breakthrough on the jewelry disputes was unglamorous: running the classification API with full product context as input returned the exact HTS codes the merchant said were right. The accuracy problem wasn't the model; it was the inputs we were feeding it. The trade-off I designed around: making Compliance Hub a checkout dependency turned sync freshness into a reliability concern, and per-call costs scale with catalog size, so classifying only active products on change was the lever for both.

What I'd do differently

We found the problem the worst possible way, through a merchant complaint, because we had no view into accuracy before brands flagged it. If I ran it again, I'd ship the coverage and dispute instrumentation before the classification pipeline, not after. Knowing destination-specific code coverage and HS dispute rate from day one would have surfaced the jewelry misclassifications in-house, instead of in a stalled onboarding.

+40%

Duty/tax accuracy

~60%

Less manual SKU review

13K+

Jurisdictions in scope

Results showed up immediately after launch: correct destination-specific duties across the ~5,000 active SKUs that had been under manual review.

Proof in the wild

Red Equipment, a UK outdoor brand, rebuilt cross-border fulfillment on Swap, including the transparent duties and territory-specific tax/VAT this work powers, and reported +111% global sales YoY, +65% conversion, and +13% AOV. That's a platform-level outcome with many inputs; accurate, surprise-free duty and tax at checkout is one of them. Read the case study →

Swap earns revenue across cross-border commerce through service fees, duties, and taxes, and billing for it ran by hand. I owned, end to end, the platform that turned it into an automated, auditable system.

Challenge: every cycle, Finance reconciled multiple months of order data by hand, because different fees depend on different events (order placed, fulfilled, returned). It was slow, error-prone, hard to explain to merchants, and it tied up working capital while Swap fronted duties and taxes for weeks. Solution: a rule-based engine that captures each merchant's tax and fee rules once, freezes them in time, does the math, and issues invoices automatically. Impact: ~450 of 800 merchants moved onto automated billing, and invoices now generate in seconds instead of weeks.

The problem

Fees key off different timestamps (service fee on order creation, duties and tax on fulfilment, drawbacks on returns), so a single invoice pulled data across several months of exports.
Billing rules lived in spreadsheets and contracts, so each cycle started with guesswork, and mid-month changes mis-billed past orders because there was no point-in-time reference.
Opaque invoices drove disputes and eroded trust, and slow cycles delayed cash collection.

What I did

Built a versioned configuration layer as the single source of truth for each merchant's tax and fee rules, with validation, change logs, and access control.
Resolved every order event against the settings version that was true at the time (a nightly point-in-time job), so a creation-based fee and a fulfilment-based tax each bill on the correct historical rule.
Built a calculation engine for fees, duties, VAT and sales tax, exemptions, drawbacks, and multi-currency conversion, replacing the manual spreadsheet math.
Built an invoice-generation layer that maps each fee to the right line items and tax codes and produces finance-ready invoices, so Finance updates rates and rules without engineering.
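
The point-in-time resolution can be sketched as a lookup over versioned settings. This is a minimal illustration: `RuleVersion`, its single `service_fee_rate` field, and the dates and rates are assumptions, not the platform's real data model.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class RuleVersion:
    effective_from: datetime
    service_fee_rate: Decimal  # hypothetical rule field


def rule_at(versions: list[RuleVersion], event_time: datetime) -> RuleVersion:
    """Return the version in force at event_time (versions sorted ascending)."""
    starts = [v.effective_from for v in versions]
    idx = bisect_right(starts, event_time) - 1
    if idx < 0:
        raise LookupError("no rule version was in force at that time")
    return versions[idx]


versions = [
    RuleVersion(datetime(2024, 1, 1), Decimal("0.020")),
    RuleVersion(datetime(2024, 3, 15), Decimal("0.025")),  # mid-month change
]

# An order created before the change bills on the old rate, even if invoiced later.
order_created = datetime(2024, 3, 10)
order_total = Decimal("200.00")
fee = order_total * rule_at(versions, order_created).service_fee_rate
```

Each fee type would pass its own event timestamp (creation, fulfilment, return) to the same lookup, which is what keeps a mid-month rule change from re-pricing past orders.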

The lesson

The hard part of billing isn't the arithmetic; it's time. Rules change, but past orders must bill on the rule that was true when the event happened. Modelling billing as versioned, point-in-time settings rather than live state is what made it accurate, auditable, and trusted, and it's what let Finance own the rules instead of filing engineering tickets.

~56%

Of merchants migrated (~450 of 800)

Seconds

To generate an invoice (was multi-week manual)

0→1

Platform, owned end to end

Challenge: support split shipments (orders fulfilled from multiple warehouses across countries) where duties and tax calculate by warehouse country of origin, not by order. Solution: shipped tax and duty first, then built the per-line shipping calculator the feature actually needed to be usable. Impact: went from a launch almost nobody could activate to a feature sales now leads with.

Situation

We needed to support split shipments: orders with items fulfilled from multiple warehouses across countries. The high-stakes part was customs: taxes and duties calculate by warehouse country of origin, not by order, because customs invoices reflect country-of-origin liability. It was the top-requested feature from our largest enterprise merchants.
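
To make "by warehouse country of origin, not by order" concrete, here is a rough sketch that groups order lines by warehouse country and rates each group separately. The line-item fields and the flat per-country rates are hypothetical; real duty depends on the HS code, origin, and destination.

```python
from collections import defaultdict
from decimal import Decimal


def split_by_origin(line_items: list[dict]) -> dict[str, list[dict]]:
    """Group order lines into shipments keyed by warehouse country."""
    shipments = defaultdict(list)
    for item in line_items:
        shipments[item["warehouse_country"]].append(item)
    return dict(shipments)


def duty_by_origin(line_items: list[dict], duty_rates: dict[str, Decimal]) -> dict[str, Decimal]:
    """Duty per shipment: each group is rated on its own declared value."""
    totals = {}
    for country, items in split_by_origin(line_items).items():
        value = sum((i["price"] * i["quantity"] for i in items), Decimal("0"))
        totals[country] = value * duty_rates[country]
    return totals


order = [
    {"sku": "tee", "price": Decimal("30.00"), "quantity": 2, "warehouse_country": "GB"},
    {"sku": "cap", "price": Decimal("20.00"), "quantity": 1, "warehouse_country": "US"},
    {"sku": "bag", "price": Decimal("40.00"), "quantity": 1, "warehouse_country": "GB"},
]
# Hypothetical flat rates per origin, for illustration only.
rates = {"GB": Decimal("0.10"), "US": Decimal("0.00")}
duties = duty_by_origin(order, rates)
```

One order yields one duty figure per origin country, which is the shape a per-shipment customs invoice needs.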

The call

The challenge had two layers: line-item tax and duty (the visible problem), and line-item shipping fees (the hidden one). Shopify calculates shipping at order level, not line level, which made per-line shipping fees much harder. I made the call to ship the tax-and-duty solution fast and launch with free shipping as the only supported scenario. My logic: tax and duty was the bigger pain, and we'd handle shipping later. I convinced brands and sales to go along.

Recovery

After launch, brands signed up excited, then couldn't use the feature. Most merchants don't offer free shipping. The feature shipped, and almost nobody activated. So we went back and built the harder problem: a per-line shipping fee calculator inside the Swap portal, since Shopify couldn't support it. Merchants could define their own shipping criteria for split-shipment orders. It took weeks longer than the original launch. We re-released, and it's now a feature sales leads with.

The lesson

"Edge case" is a misleading term in niche or regulated domains. The edge is often what determines whether the feature is usable at all. I traded edge-case coverage for launch speed. The trade was wrong, because the edge wasn't the edge. It was the use case for most of the customer base.

What I'd do differently

Now I scope down the audience, not the feature. Before a release I ask what's the smallest scope where the whole flow works, not just the headline. When I shipped the LLM customs description generator later, I limited launch to clothing only and kept jewelry, footwear, and medical devices out until we'd evaluated composition accuracy. Same trade-off as split shipments, resolved the right way.

I shaped Swap's U.S. sales tax product and wrote the public guide that explains it. The work turns a problem most founders avoid into something operators can actually act on.

Challenge: U.S. sales tax isn't one tax. It's 12K+ jurisdictions and 50+ states with separate filings, and post-Wayfair economic-nexus rules make a brand liable the moment it crosses a state's sales or order threshold, often without the brand realizing it. Solution: nexus tracking across all 50 states, one-click registration, and audit-ready filing, with a public guide that explains the whole thing. Impact: rate handling from 0 to 11.25% automated, and a compliance problem founders avoid turned into a guided workflow.

The problem

After South Dakota v. Wayfair, physical presence stopped being the test. Economic nexus means crossing a state's sales or order volume makes a brand liable to register and file there, and the thresholds differ by state.
That leaves 50+ separate filing regimes and 12K+ local jurisdictions, each with its own rate. Most founders don't track when they cross a threshold, so they find out late, after the liability has accrued.

What I did

Shaped a product that tracks nexus across all 50 states and flags when a brand approaches or breaches a threshold.
Added one-click registration and audit-ready filing, so crossing a threshold leads to action instead of a scramble.
Wrote the public guide that explains nexus, thresholds, and filing in operator language, turning the product's logic into something a founder can act on.
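
The threshold check behind nexus tracking can be sketched as below. The state names, threshold values, and the 80% warning level are placeholders I chose for illustration; real thresholds differ by state and change over time. The sketch also assumes that crossing either the sales or the order test is enough to trigger nexus.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class NexusThreshold:
    # Illustrative values only; not real state thresholds.
    sales: Decimal
    orders: Optional[int] = None  # some states only test sales


def nexus_status(sales: Decimal, orders: int, threshold: NexusThreshold,
                 warn_at: Decimal = Decimal("0.8")) -> str:
    """Return 'breached', 'approaching', or 'clear' for one state."""
    sales_ratio = sales / threshold.sales
    order_ratio = Decimal(orders) / threshold.orders if threshold.orders else Decimal("0")
    ratio = max(sales_ratio, order_ratio)  # assumes either test triggers nexus
    if ratio >= 1:
        return "breached"
    if ratio >= warn_at:
        return "approaching"
    return "clear"


thresholds = {
    "STATE_A": NexusThreshold(Decimal("100000"), 200),
    "STATE_B": NexusThreshold(Decimal("500000")),
}
# Trailing sales and order counts per state (sample data).
trailing = {"STATE_A": (Decimal("42000"), 215), "STATE_B": (Decimal("410000"), 900)}
statuses = {s: nexus_status(*trailing[s], thresholds[s]) for s in thresholds}
```

STATE_A breaches on order count despite low sales, which is the kind of crossing founders tend to miss.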

12K+

Tax jurisdictions in the U.S.

50+

States needing separate filings

0–11.25%

Rate range handled automatically

Read the guide →

As Product Manager at OCBC, I owned Connect2OCBC, the bank's open-banking portal where fintech partners discover, test, and integrate against OCBC's 500+ APIs. I ran it across the lifecycle, from requirements to launch.

Challenge: a bank can publish APIs and still see little adoption if partners can't find the right one, try it safely, and go live without a sales conversation for every integration. OCBC's open-banking presence sat in InnoPay Quadrant 3. Solution: a self-serve portal with a clear path: get access, experiment in a sandbox, then go live. Impact: InnoPay ranking moved from Quadrant 3 to Quadrant 4 and targeted API consumption grew 10%.

The problem

OCBC had a large API surface but no easy front door. Partners couldn't discover the right API, test it against realistic data, or move to production on their own.
Without a self-serve path, every integration leaned on hand-holding, adoption lagged, and the bank's open-banking standing sat in InnoPay Quadrant 3.

What I did

Owned Connect2OCBC across the product lifecycle: concept docs, PRDs, mockups, and backlog and sprint priorities with the Scrum team.
Built the three-step path partners actually use: get access and create an app, discover and experiment with APIs in a sandbox, then test and go live.
Worked with business stakeholders and support functions to manage change impact, and partnered with clients to drive sales through the API offerings.

Q3 → Q4

InnoPay open-banking ranking

+10%

Targeted API consumption

500+

APIs available to partners

Visit Connect2OCBC →

trade-classify is a small, real system that classifies products to HS codes with an LLM, then refuses to guess when it isn't sure. I built it to make my day-job thinking public: how you ship AI in a regulated domain where a wrong answer is an audit liability, not a bad suggestion.

Challenge: LLMs will happily produce a confident, wrong HS code. In customs, that's a compliance risk, not a UX papercut. Solution: ground the model in the tariff schedule, draw a hard trust boundary between what auto-applies and what routes to a human, and measure it with an eval that reports recall on the risky subset, not just headline accuracy. Impact: 100% precision on auto-applied classifications, 100% recall on the risky subset, and 0% fabrication in the eval set.

What it proves

I ship production-shaped AI, not demos: a trust boundary that decides what the system is allowed to act on alone versus what a human reviews.
I design honest evals. The spec measures recall on the risky subset, because precision without recall is a lie you tell yourself.
I can build and reason about the technical layer myself, in the exact domain I work in day to day.
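
A toy sketch of the trust boundary and the three eval numbers follows. It is not the trade-classify code: the `Prediction` shape, the 0.9 threshold, the metric definitions, and the sample cases are my assumptions for illustration. Here "risky recall" is read as the share of risky items routed to a human, and "fabrication" as the share of predicted codes not found in the tariff schedule.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    code: str
    confidence: float
    in_schedule: bool  # was the code found in the tariff schedule?


def route(pred: Prediction, threshold: float = 0.9) -> str:
    """Trust boundary: auto-apply only grounded, high-confidence codes."""
    if pred.in_schedule and pred.confidence >= threshold:
        return "auto_apply"
    return "human_review"


def evaluate(cases: list[tuple[Prediction, str, bool]]) -> dict:
    """cases: (prediction, true_code, is_risky) tuples."""
    auto = [(p, t) for p, t, _ in cases if route(p) == "auto_apply"]
    risky = [p for p, _, r in cases if r]
    # Precision: of the codes applied without review, how many were right.
    precision = sum(p.code == t for p, t in auto) / len(auto) if auto else None
    # Risky recall: share of risky items that reached a human reviewer.
    recall = sum(route(p) == "human_review" for p in risky) / len(risky) if risky else None
    fabrication = sum(not p.in_schedule for p, _, _ in cases) / len(cases)
    return {"precision": precision, "risky_recall": recall, "fabrication": fabrication}


# Toy cases with sample codes; deliberately imperfect to show what each metric catches.
cases = [
    (Prediction("6109.10", 0.97, True), "6109.10", False),   # auto-applied, correct
    (Prediction("7113.19", 0.95, True), "7113.11", True),    # risky, auto-applied wrongly
    (Prediction("7117.19", 0.60, True), "7113.19", True),    # risky, routed to a human
    (Prediction("9999.99", 0.99, False), "3926.40", False),  # fabricated, routed to a human
]
report = evaluate(cases)
```

In this toy set, a headline accuracy figure would hide the second case; the risky-subset recall is what exposes it.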

100%

Precision on auto-applied codes

100%

Recall on the risky subset

0%

Fabrication in the eval set

View on GitHub →

Read the eval spec →

pm-ai-skills is a set of Claude Code skills I built for the way I actually work on AI products: turning a PRD into a testable prompt, writing a first-draft eval suite, and red-teaming a feature before it ships. I use them on my own builds, and they're public so other PMs can too.

Challenge: most PM AI tooling stops at "chat with the doc." The repetitive, high-leverage work (writing evals, pressure-testing prompts, finding where a feature breaks) stays manual. Solution: package that work as skills that produce a real first draft you edit, not a blank page. Impact: I dogfooded the eval skill to build trade-classify's eval spec.

What it proves

I build tools other people use, not just decks about tools.
I treat evals and red-teaming as first-class PM work, and I've made that repeatable.
It pairs with trade-classify: the skills built the eval, the eval proves the system.

View on GitHub →

A personal build: an agentic "Proactive Delivery Agent" that watches orders in flight, detects delivery exceptions early, and takes or recommends action before they turn into support tickets. It's where I'm sharpening hands-on agentic-system design: the model-native, decision-making layer, rather than AI-assisted features.

The idea, on paper

In active development. More detail and results to come.

Compliance — Automated Tax & Customs Classification

The problem

How I measured success

What I did

The alternative I rejected

Billing & Tax Automation Platform

The problem

What I did

Split Shipments from Multiple Warehouses

U.S. Sales Tax / Nexus

The problem

What I did

Connect2OCBC — Open-Banking API Sandbox

The problem

What I did

trade-classify — Grounded HS-Code Classification

What it proves

pm-ai-skills — Claude Code Skills for AI PMs

What it proves

Proactive Delivery Agent

The idea, on paper