Platform Architecture

How UnGovr discovers, processes, and serves government information across 200+ countries.

System overview

The data pipeline that powers 321,000+ government entities

321K+ Entities
200+ Countries
15K+ Grand Jury Reports
DATA SOURCES
Gov Websites: government website domains
Census / StatCan: boundaries and entities
European Registers: INSEE, Destatis, INE
Federal Registers: agencies and tribes
Open Records Laws: 140+ countries, 57 US jurisdictions
Public Input: service and records requests

PROCESSING PIPELINE
Domain Validator: LLM classification
Web Crawler: 3-tier fetch engine
ORE Extraction: HTML + PDF + OCR
Enrichment: chunking + embeddings
Entity Loader: registry + hierarchy
Request Router: AI-assisted routing

DATA STORE
ORE Store: extracts, vector search
Entity Graph: 321K+ entities, hierarchies
Domain Registry: domains mapped
Open Records DB: laws, deadlines, procedures

APPLICATIONS
Entity Browser: discover and navigate
Document Search: semantic and full-text
Grand Jury Reports: 15,000+ CA reports
Open Records: FOIA across 140+ countries
Service Requests: SMS / RCS / Web

High-level data flow: sources are crawled and processed through the pipeline, producing Open Records Extracts (OREs) stored in structured databases, and served through five application interfaces.

Technology Stack

Python, FastAPI, PostgreSQL, PostGIS, Playwright, GPU-accelerated OCR, Cloudflare, and more.


Request workflows

How requests flow through the system – from the public to the right government agency

Service Request

1. Resident reports an issue
Photo + description via SMS, RCS, web form, or email. Location is captured from GPS or entered manually.

2. AI classifies and routes
Issue type is extracted from the description. GPS coordinates are mapped to the correct jurisdiction using the entity graph.

3. Agency identified
The entity graph resolves overlapping jurisdictions – city vs. county vs. special district – to find the responsible agency.

4. Request formatted and sent
The request is formatted per the agency's intake method (Open311 API, email, or web form) and submitted.

5. Confirmation and tracking
Resident receives a confirmation with a tracking reference. Status updates are relayed as they arrive.

Key principle: The resident never needs to know which agency handles their issue. UnGovr resolves the jurisdiction automatically.
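As a rough illustration of steps 2 through 4, the sketch below classifies a description, picks a responsible agency from a set of overlapping candidates, and shapes the payload for that agency's intake method. Everything in it is an assumption for illustration: the keyword table stands in for the AI classifier, the `Agency` fields and agency names are hypothetical, and the Open311-style field names are only loosely modeled on GeoReport v2 (real service codes are defined per agency).

```python
from dataclasses import dataclass

# Hypothetical keyword table standing in for the AI classifier.
ISSUE_KEYWORDS = {"pothole": "roads", "streetlight": "lighting", "leak": "water"}


@dataclass
class Agency:
    name: str
    handles: set   # issue types this agency accepts (assumed field)
    intake: str    # "open311", "email", or "web_form"


def classify(description: str) -> str:
    """Return the first matching issue type, or "general" if none matches."""
    text = description.lower()
    for keyword, issue_type in ISSUE_KEYWORDS.items():
        if keyword in text:
            return issue_type
    return "general"


def pick_agency(issue_type: str, candidates: list) -> Agency:
    """Pick the first overlapping agency that handles this issue type."""
    for agency in candidates:
        if issue_type in agency.handles:
            return agency
    return candidates[0]  # assumed fallback: the first candidate


def format_request(agency: Agency, issue_type: str, description: str,
                   lat: float, lon: float) -> dict:
    """Shape the payload for the agency's intake method."""
    if agency.intake == "open311":
        return {"service_code": issue_type, "description": description,
                "lat": lat, "long": lon}
    # Email and web-form intake share a simple message shape in this sketch.
    return {"to": agency.name, "subject": f"Service request: {issue_type}",
            "body": f"{description}\nLocation: {lat}, {lon}"}


# Made-up agencies and coordinates, for illustration only.
candidates = [
    Agency("Example City Public Works", {"roads", "lighting"}, "open311"),
    Agency("Example Water District", {"water"}, "email"),
]
issue = classify("Water leak at the corner of 1st and Main")
agency = pick_agency(issue, candidates)
payload = format_request(agency, issue, "Water leak at the corner", 34.0, -119.0)
```

The resident supplies only a description and a location; the agency choice and payload shape are derived from them.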

Open Records Request

1. Resident describes what they need
Plain-language description of the records sought, plus a location or address. No need to know which agency holds the records.

2. Jurisdiction and law resolved
The location is mapped to the correct jurisdiction using the entity graph. UnGovr then identifies which open records law applies – federal FOIA, state open records act, or international FOI law – across 140+ countries and 57 US jurisdictions.

3. Request formatted to requirements
Each law has different requirements: response deadlines (3–30+ days), fee structures, residency rules, and appeal processes. The request is drafted to comply.

4. Submitted to the records officer
Sent via the agency's designated channel – email to the records officer, online portal submission, or postal mail where required.

5. Deadline tracking and follow-up
UnGovr tracks the statutory response deadline and notifies the requester. If the deadline passes, guidance on appeals is provided.

Key principle: Filing a records request shouldn't require knowing which agency to contact or which law applies. UnGovr resolves the jurisdiction from your location and handles the legal complexity.
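To make step 5 concrete, here is a minimal sketch of how a response deadline might be computed and checked. The `RecordsLaw` fields and the two example laws are hypothetical, and the business-day count ignores public holidays, which a real calculation would need to handle per jurisdiction.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RecordsLaw:
    name: str
    response_days: int
    business_days: bool  # True if the law counts working days only


def response_due(filed: date, law: RecordsLaw) -> date:
    """Return the date by which the agency must respond."""
    if not law.business_days:
        return filed + timedelta(days=law.response_days)
    due = filed
    remaining = law.response_days
    while remaining > 0:
        due += timedelta(days=1)
        if due.weekday() < 5:  # Monday to Friday; holidays are not modeled
            remaining -= 1
    return due


def deadline_status(today: date, filed: date, law: RecordsLaw) -> str:
    """Summarize where a request stands relative to its deadline."""
    due = response_due(filed, law)
    if today > due:
        return "overdue: show appeal guidance"
    return f"response due {due.isoformat()}"


# Hypothetical laws with illustrative values, not real statutes.
calendar_law = RecordsLaw("Example Calendar-Day Act", 10, business_days=False)
working_law = RecordsLaw("Example Working-Day Act", 5, business_days=True)

filed = date(2024, 1, 1)  # a Monday
calendar_due = response_due(filed, calendar_law)
working_due = response_due(filed, working_law)
```

With these made-up values, the calendar-day deadline falls ten days after filing, while the working-day deadline skips the weekend and lands on the following Monday.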

Under the hood

Entity registry

A comprehensive database of 321,000+ government entities across 200+ countries – from US cities and counties to French communes, Japanese prefectures, and Indigenous nations. Each entity is mapped to its geographic boundaries and connected to its domains, documents, and services.
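One way to picture the registry is as a set of records linked by parent pointers. The sketch below is an assumption about shape only: the `Entity` fields, the identifiers, and the `city.example` domain are all hypothetical, and geographic boundaries are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    entity_id: str
    name: str
    kind: str                        # e.g. "city", "county", "state"
    parent_id: Optional[str] = None  # None for a top-level entity
    domains: List[str] = field(default_factory=list)


def ancestors(entity_id: str, registry: Dict[str, Entity]) -> List[str]:
    """Walk parent pointers upward and return ancestor names, nearest first."""
    chain = []
    current = registry[entity_id].parent_id
    while current is not None:
        chain.append(registry[current].name)
        current = registry[current].parent_id
    return chain


def descendants(entity_id: str, registry: Dict[str, Entity]) -> List[str]:
    """Return the ids of every entity below the given one in the hierarchy."""
    found = []
    for entity in registry.values():
        if entity.parent_id == entity_id:
            found.append(entity.entity_id)
            found.extend(descendants(entity.entity_id, registry))
    return found


# Hypothetical ids and domain; only the place names are real.
registry = {
    e.entity_id: e
    for e in [
        Entity("us", "United States", "country"),
        Entity("us-ca", "California", "state", parent_id="us"),
        Entity("us-ca-ventura-co", "Ventura County", "county", parent_id="us-ca"),
        Entity("us-ca-ventura", "Ventura", "city", parent_id="us-ca-ventura-co",
               domains=["city.example"]),
    ]
}
chain = ancestors("us-ca-ventura", registry)
```

The same parent links can be walked downward with `descendants` to gather every entity inside a larger one, such as a county.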

Web crawler

Our crawler (UnGovrBot) discovers and indexes open records from government websites. We use a three-tier fetch engine – transparent HTTP, stealth browser, and hardened browser – to handle everything from simple sites to those with aggressive bot protection. We respect robots.txt and provide full documentation for webmasters.
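The tiered approach can be sketched as a robots.txt check followed by a fallback chain. This is not the crawler's actual code: the fetchers are passed in as plain callables (stubs here, so the example runs without network access), and the tier names simply mirror the description above. The robots.txt check uses Python's standard `urllib.robotparser`.

```python
from typing import Callable, Optional, Sequence, Tuple
from urllib.robotparser import RobotFileParser

USER_AGENT = "UnGovrBot"
Fetcher = Callable[[str], Optional[str]]  # returns page body, or None on failure


def allowed(robots_txt: str, url: str) -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def fetch_with_fallback(
    url: str, robots_txt: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Optional[Tuple[str, str]]:
    """Try each tier in order; return (tier name, body) from the first success."""
    if not allowed(robots_txt, url):
        return None  # disallowed by robots.txt, so no tier is attempted
    for name, fetch in tiers:
        body = fetch(url)
        if body is not None:
            return name, body
    return None


# Stub fetchers for illustration: the plain HTTP tier fails, the browser tier works.
tiers = [
    ("transparent_http", lambda url: None),
    ("stealth_browser", lambda url: "<html>agenda</html>"),
    ("hardened_browser", lambda url: "<html>agenda</html>"),
]
robots = "User-agent: *\nDisallow: /private/\n"
result = fetch_with_fallback("https://city.example/agendas", robots, tiers)
blocked = fetch_with_fallback("https://city.example/private/file", robots, tiers)
```

Ordering the tiers from cheapest to most expensive means the heavier browser tiers only run when the simpler fetch fails.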

Document processing and Open Records Extracts

Documents are processed to produce Open Records Extracts (OREs) – the raw text mined from open records responses. OREs are deduplicated, chunked, and vectorized for loading into a RAG-style knowledge system. Each entity gets its own isolated knowledge base, which can be queried individually (e.g. a single city) or merged into larger groupings (e.g. all entities in Ventura County). For PDFs, we use GPU-accelerated OCR where needed.
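A minimal sketch of the deduplicate-and-chunk stage might look like the following. The hashing scheme, chunk size, and overlap are assumptions chosen for illustration, and the embedding step is omitted because it depends on the model in use.

```python
import hashlib
from typing import List


def dedupe(texts: List[str]) -> List[str]:
    """Drop texts that are identical after whitespace and case normalization."""
    seen = set()
    unique = []
    for text in texts:
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap their neighbors."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the final window is not wholly inside the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


extracts = ["Budget  Report 2024", "budget report 2024", "Council minutes"]
unique = dedupe(extracts)
chunks = chunk_text("a" * 300, size=200, overlap=50)
```

Overlapping windows keep a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text.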

Geographic services

Given a location, we identify all the government entities that serve that area – handling complex overlapping jurisdictions where a single address might fall under a city, county, school district, water district, and fire district simultaneously. This powers entity discovery, service request routing, and open records request targeting.
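The core lookup can be illustrated with a plain ray-casting point-in-polygon test that returns every entity whose boundary contains the point. This is a sketch under simplifying assumptions: boundaries are single polygons on a flat plane with made-up coordinates, and points exactly on an edge are not treated specially. A spatial database such as PostGIS would normally do this work with an indexed query.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a rightward ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def entities_at(x: float, y: float, boundaries: Dict[str, List[Point]]) -> List[str]:
    """Return every entity whose boundary contains the point."""
    return [name for name, poly in boundaries.items() if point_in_polygon(x, y, poly)]


# Made-up square boundaries: a city and a water district inside one county.
boundaries = {
    "Example County": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Example City": [(2, 2), (6, 2), (6, 6), (2, 6)],
    "Example Water District": [(5, 0), (10, 0), (10, 10), (5, 10)],
}
in_city = entities_at(3, 3, boundaries)
in_overlap = entities_at(5.5, 5, boundaries)
```

A single point can match several entities at once, which is exactly the overlapping-jurisdiction case the routing and records workflows depend on.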

Messaging relay

UnGovr relays service requests and records requests to government agencies via SMS, RCS, and email on behalf of residents. The relay acts as an intermediary – the resident's personal phone number, email address, and identity are never shared with the agency unless the resident approves. The agency sees the request, not the requester. This protects anonymity while still delivering properly formatted, trackable requests.
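The intermediary idea can be sketched as an alias table: the agency only ever sees a random alias, and replies to that alias are mapped back to the resident's real contact. The `Relay` class, its method names, and the alias format are hypothetical, and a real relay would also need persistent storage, expiry, and abuse controls.

```python
import secrets
from typing import Dict, Tuple


class Relay:
    """Keeps resident contacts private behind per-request aliases."""

    def __init__(self) -> None:
        self._contacts: Dict[str, str] = {}  # alias -> private contact

    def submit(self, contact: str, body: str, share_contact: bool = False) -> dict:
        """Build the outbound message; the contact is included only on opt-in."""
        alias = "req-" + secrets.token_hex(4)
        self._contacts[alias] = contact
        outbound = {"reply_to": alias, "body": body}
        if share_contact:
            outbound["contact"] = contact
        return outbound

    def deliver_reply(self, alias: str, message: str) -> Tuple[str, str]:
        """Map an agency reply back to the resident's real contact."""
        return self._contacts[alias], message


relay = Relay()
outbound = relay.submit("+15550100", "Streetlight out at 5th and Oak")
reply = relay.deliver_reply(outbound["reply_to"], "Repair scheduled")
```

Because the alias is random and unique per request, it doubles as a tracking reference without revealing who filed the request.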

Open records engine

A database of open records laws across 140+ countries and 57 US jurisdictions, including response deadlines, fee structures, residency requirements, and appeal processes. This powers the open records request workflow and the public reference pages.
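Choosing the applicable law can be pictured as a keyed lookup: national agencies fall under the national law, while state or local agencies fall under their subdivision's law where one is listed. The table below is a tiny illustrative subset, and the function name and parameters are assumptions rather than the engine's actual interface.

```python
from typing import Dict, Optional, Tuple

# Illustrative entries only; keyed by (country code, subdivision code or None).
LAWS: Dict[Tuple[str, Optional[str]], str] = {
    ("US", None): "Freedom of Information Act (FOIA)",
    ("US", "CA"): "California Public Records Act",
    ("GB", None): "Freedom of Information Act 2000",
}


def applicable_law(
    country: str, subdivision: Optional[str] = None, national_agency: bool = True
) -> Optional[str]:
    """Return the law covering an agency, or None if the table has no entry."""
    if national_agency or subdivision is None:
        key = (country, None)
    else:
        key = (country, subdivision)
    return LAWS.get(key)


federal = applicable_law("US", "CA", national_agency=True)
state = applicable_law("US", "CA", national_agency=False)
missing = applicable_law("US", "TX", national_agency=False)
```

Returning `None` for an unlisted subdivision, rather than falling back to the national law, avoids citing a statute that does not cover state or local agencies.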

Design principles

Traceable sources

Every piece of data links back to its original source. We store source URLs, capture dates, and provenance information so users can verify information at its official source.

Incremental processing

We detect changes and reprocess only what has changed. This keeps our data current while minimizing load on source websites.
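One common way to detect changes is to compare a content hash against the hash stored from the previous crawl. The sketch below assumes that approach; the function name and the in-memory dictionary are hypothetical stand-ins for whatever store is actually used.

```python
import hashlib
from typing import Dict


def needs_reprocessing(url: str, content: bytes, seen_hashes: Dict[str, str]) -> bool:
    """Return True if the content is new or changed, and record its hash."""
    digest = hashlib.sha256(content).hexdigest()
    if seen_hashes.get(url) == digest:
        return False  # unchanged since the last crawl; skip downstream work
    seen_hashes[url] = digest
    return True


seen: Dict[str, str] = {}
first = needs_reprocessing("https://city.example/agenda", b"Agenda v1", seen)
repeat = needs_reprocessing("https://city.example/agenda", b"Agenda v1", seen)
changed = needs_reprocessing("https://city.example/agenda", b"Agenda v2", seen)
```

Hashing only avoids repeated processing; reducing load on the source site itself would also rely on techniques such as conditional HTTP requests.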

Structured extraction

Beyond full-text search, we extract structured information where possible – meeting dates, agenda items, findings and recommendations, response deadlines.
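As a small illustration, the sketch below pulls meeting dates and numbered agenda items out of plain text with regular expressions. The patterns are assumptions that only cover one date style ("January 14, 2025") and one item style ("1. Title"); real documents vary far more, and date parsing here assumes English month names.

```python
import re
from datetime import datetime
from typing import Dict, List

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")
ITEM_RE = re.compile(r"^\s*(\d+)\.\s+(.+)$", re.MULTILINE)


def extract_structure(text: str) -> Dict[str, List[str]]:
    """Return ISO-formatted dates and agenda item titles found in the text."""
    dates = [
        datetime.strptime(match, "%B %d, %Y").date().isoformat()
        for match in DATE_RE.findall(text)
    ]
    items = [title.strip() for _, title in ITEM_RE.findall(text)]
    return {"meeting_dates": dates, "agenda_items": items}


sample = (
    "City Council Regular Meeting\n"
    "January 14, 2025\n"
    "1. Call to order\n"
    "2. Approval of minutes\n"
)
fields = extract_structure(sample)
```

Normalizing dates to ISO format at extraction time makes them sortable and comparable across documents from different sources.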

Privacy by design

User data is minimized and protected. We don't track users across the web or build profiles for advertising purposes.

Technical collaboration

Interested in the technical details or want to contribute?

Get involved