
Industry Classifier

Industry classification with risk assessment. Official registry connectivity and adverse media screening.

The Problem

Classifying a business into the right industry code takes 10–30 minutes of manual research per entity: searching government registries, cross-referencing web presence, and interpreting ambiguous business activities. At scale, this becomes a bottleneck for merchant onboarding and KYB compliance workflows. Different jurisdictions use different classification systems (ANZSIC, SIC, NACE), and each client may enforce its own risk policy with distinct prohibited categories and thresholds.

The Solution

An AI-powered classification service that combines three intelligence sources: official government registries (ABR, NZBN, AUSTRAC), live web search for business activity research, and LLM reasoning for code assignment. Input data is first enriched from the registries. The AI then analyzes web presence to determine business activities and maps them to industry codes, which are validated against the official taxonomy hierarchy. A 4-tier risk engine evaluates the result: blacklist screening, code-to-risk mapping, keyword scanning with LLM disambiguation, and full LLM risk reasoning for ambiguous cases. Both the classification system and the risk policy are configurable per client.
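The four risk tiers can be sketched as a cascade that stops at the first tier that reaches a decision. This is a minimal sketch, not the production engine: the early-exit behaviour, the `RiskPolicy` fields, and the `confirm_keyword` and `reason_about_risk` callables (stand-ins for the LLM calls) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    PROHIBITED = 4


@dataclass
class RiskPolicy:
    """Hypothetical per-client policy; field names are illustrative."""
    blacklist: set[str] = field(default_factory=set)           # lower-cased entity names
    code_risk: dict[str, Risk] = field(default_factory=dict)   # industry code -> risk
    keywords: dict[str, Risk] = field(default_factory=dict)    # keyword -> risk


@dataclass
class RiskResult:
    level: Risk
    tier: int      # which tier produced the decision (1-4)
    reason: str


def assess(
    name: str,
    code: str,
    description: str,
    policy: RiskPolicy,
    confirm_keyword: Callable[[str, str], bool],  # assumed LLM disambiguation hook
    reason_about_risk: Callable[[str], Risk],     # assumed LLM reasoning hook
) -> RiskResult:
    # Tier 1: blacklist screening on the normalised entity name.
    if name.strip().lower() in policy.blacklist:
        return RiskResult(Risk.PROHIBITED, 1, "entity name is blacklisted")

    # Tier 2: direct code-to-risk mapping from the client policy.
    if code in policy.code_risk:
        return RiskResult(policy.code_risk[code], 2, f"code {code} mapped by policy")

    # Tier 3: keyword scan; the callable decides whether a hit is a true match.
    text = description.lower()
    for keyword, level in policy.keywords.items():
        if keyword in text and confirm_keyword(keyword, description):
            return RiskResult(level, 3, f"keyword '{keyword}' confirmed")

    # Tier 4: nothing decisive so far, so fall back to full reasoning.
    return RiskResult(reason_about_risk(description), 4, "LLM risk reasoning")
```

Passing the LLM steps in as plain callables keeps the cascade testable with stubs, and the `tier` field records which stage produced each decision.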

Architecture

%%{init: {'theme': 'dark', 'themeVariables': { 'fontFamily': 'Inter', 'secondaryColor': '#1e293b', 'primaryColor': '#3b82f6', 'primaryBorderColor': '#60a5fa' }}}%%
graph TB
    subgraph Input ["Input"]
        A["Company Data<br/>(name, identifiers, country)"]
        B["Configuration<br/>(Risk Policy + Classification System)"]
    end
    subgraph Enrichment ["Enrichment"]
        A --> C1["ABR Registry (AU)"]
        A --> C2["NZBN Registry (NZ)"]
        A --> C3["AUSTRAC Registry"]
        A --> C4["Web Search"]
    end
    subgraph Classification ["Classification"]
        C1 & C2 & C3 & C4 --> D1["Trust / Entity Detection"]
        B --> D2
        D1 --> D2["LLM Classification"]
        D2 --> D3["Code Validation"]
    end
    subgraph Risk ["Risk Assessment"]
        D3 --> E1["Blacklist Screening"]
        E1 --> E2["Code-to-Risk Mapping"]
        E2 --> E3["Keyword Scan + LLM Disambiguation"]
        E3 --> E4["LLM Risk Reasoning"]
    end
    subgraph Output ["Output"]
        E4 --> F["Result<br/>+ Risk Level + Confidence"]
    end
    classDef default fill:#0f172a,stroke:#334155,color:#fff,stroke-width:1px;
    classDef agent fill:#0f172a,stroke:#3b82f6,color:#fff;
    classDef process fill:#0f172a,stroke:#334155,color:#fff;
    classDef config fill:#1e1b4b,stroke:#818cf8,color:#fff,stroke-width:2px;
    class D2,E3,E4 agent;
    class C1,C2,C3,C4,D1,D3,E1,E2 process;
    class B config;
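Legend: AI Agent = LLM Classification, Keyword Scan + LLM Disambiguation, LLM Risk Reasoning. Process Step = registry lookups, web search, trust / entity detection, code validation, blacklist screening, code-to-risk mapping.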
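The Code Validation step in the diagram checks an LLM-proposed code against the taxonomy hierarchy. One way this might look is sketched below, under assumptions: the `PARENT` table is a tiny illustrative excerpt that mimics ANZSIC-style nesting (letter division, then 2-, 3-, and 4-digit levels) and is not authoritative taxonomy data, and the prefix-trimming fallback for unknown codes is a hypothetical design choice rather than the service's documented behaviour.

```python
from typing import Optional

# Illustrative parent links only; a real deployment would load the
# official taxonomy for the configured classification system.
PARENT: dict[str, Optional[str]] = {
    "H": None,
    "45": "H",
    "451": "45",
    "4511": "451",
    "4512": "451",
}


def ancestry(code: str, parent: dict[str, Optional[str]] = PARENT) -> list[str]:
    """Return [code, its parent, ..., root]. Raises KeyError for unknown codes."""
    chain: list[str] = []
    current: Optional[str] = code
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def validate(
    code: str, parent: dict[str, Optional[str]] = PARENT
) -> tuple[bool, list[str]]:
    """Check a proposed code against the taxonomy.

    Returns (True, ancestry) when the code exists. Otherwise returns
    (False, ancestry of the longest known prefix), or (False, []) when
    no prefix of the code is known.
    """
    if code in parent:
        return True, ancestry(code, parent)
    # Assumed fallback: trim trailing digits until a known level is found.
    for end in range(len(code) - 1, 0, -1):
        prefix = code[:end]
        if prefix in parent:
            return False, ancestry(prefix, parent)
    return False, []
```

Returning the nearest known ancestor for an invalid code gives the caller a coarser category to report, or a starting point for re-prompting the model, instead of a bare failure.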

Tags

Python, Web Search, Data Enrichment, Risk Assessment, Multi-Registry

Outcomes

  • Classification accuracy above 90% with official registry cross-validation
  • Processing time under 1 minute per entity (vs. 10–30 minutes manual)
  • Configurable risk policies and classification systems per client
  • 4-tier risk engine with hard-floor codes that cannot be overridden
  • Deployed in production for enterprise financial services clients
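The "hard-floor" behaviour in the outcomes above can be read as a minimum risk level attached to certain codes, which no later assessment may lower. The sketch below assumes that reading; the `RiskLevel` scale, the `final_risk` helper, and the example codes and client names are hypothetical.

```python
from enum import IntEnum
from typing import Mapping


class RiskLevel(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical per-client floors: industry code -> minimum risk level.
CLIENT_FLOORS: dict[str, dict[str, RiskLevel]] = {
    "client_a": {"X100": RiskLevel.HIGH},
    "client_b": {},
}


def final_risk(
    code: str,
    assessed: RiskLevel,
    hard_floors: Mapping[str, RiskLevel],
) -> RiskLevel:
    """Clamp the assessed level so it never falls below the code's floor."""
    floor = hard_floors.get(code)
    if floor is None:
        return assessed
    # A floor can raise the result but never lower it.
    return max(assessed, floor)
```

With these tables, `final_risk("X100", RiskLevel.LOW, CLIENT_FLOORS["client_a"])` yields `RiskLevel.HIGH`, while the same call with the `client_b` floors leaves the assessed level unchanged.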