Web Crawler

Discovering and indexing open records from government websites.

Our Promise

The UnGovr crawler is designed to be a good citizen of the web. We:

Identify ourselves clearly with a descriptive User-Agent string
Respect robots.txt path restrictions
Honor Crawl-delay directives in robots.txt when specified
Limit request rates to avoid overwhelming servers
Provide contact information for site administrators
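As a rough illustration of the robots.txt and Crawl-delay commitments above, the sketch below uses Python's standard `urllib.robotparser` module. It is not UnGovr's actual crawler code. The helper names (`load_rules`, `may_fetch`, `delay_for`), the 1-second fallback, and the sample rules are assumptions for illustration, and fetching robots.txt over the network is left out.

```python
from urllib.robotparser import RobotFileParser

# User-Agent string published on this page; robots.txt User-agent lines are
# matched against its product token ("UnGovrBot").
USER_AGENT = "UnGovrBot/0.3.53 (+https://ungovr.org/crawler)"

# Assumed fallback: wait 1 second between requests when no Crawl-delay is given.
DEFAULT_DELAY_SECONDS = 1.0


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that was already fetched (fetching is out of scope)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """True if the robots.txt path rules allow this crawler to request url."""
    return parser.can_fetch(USER_AGENT, url)


def delay_for(parser: RobotFileParser) -> float:
    """Seconds between requests: the Crawl-delay if present, else the default."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else DEFAULT_DELAY_SECONDS


if __name__ == "__main__":
    rules = load_rules(
        "User-agent: UnGovrBot\n"
        "Crawl-delay: 5\n"
        "Disallow: /private/\n"
        "\n"
        "User-agent: *\n"
        "Disallow: /\n"
    )
    print(may_fetch(rules, "https://cityhall.example/minutes/2024-01.pdf"))  # True
    print(may_fetch(rules, "https://cityhall.example/private/staff.html"))   # False
    print(delay_for(rules))                                                  # 5.0
```

One caveat of relying on the standard library: Python's parser applies the first rule in a group that matches a path rather than the longest match described in RFC 9309, and it ignores Crawl-delay values that are not whole numbers, so a production crawler may want a stricter parser.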

If you manage a government website and have questions about our crawler, please contact us at crawl@ungovr.org.

What We Crawl

We focus on publicly accessible government documents:

Meeting agendas and minutes
Budget documents and financial reports
Policy documents and ordinances
Public notices and announcements
Reports and studies

We do not crawl:

Login-protected or authenticated content
Personal information or private records
Non-government websites

Technical Details

User-Agent

UnGovrBot/0.3.53 (+https://ungovr.org/crawler)
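Site administrators writing robots.txt rules usually match on the product token, the part of the User-Agent string before the first "/". The minimal Python sketch below, using only the standard library, shows one way a client could attach the string above to every request and extract that token. `build_request` and `product_token` are hypothetical helper names, not part of any published UnGovr interface.

```python
from urllib.request import Request

USER_AGENT = "UnGovrBot/0.3.53 (+https://ungovr.org/crawler)"


def build_request(url: str) -> Request:
    """Create a GET request that identifies the crawler explicitly."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def product_token(user_agent: str) -> str:
    """Return the name robots.txt rules target: the text before the first '/'."""
    return user_agent.split("/", 1)[0]


if __name__ == "__main__":
    req = build_request("https://cityhall.example/agendas/")
    # urllib normalizes header names with str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
    print(product_token(USER_AGENT))  # UnGovrBot
```

Setting the header explicitly matters with `urllib.request`, which otherwise sends its own default `Python-urllib/x.y` identifier.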

Default Behavior

Maximum 1 request per second per domain (unless Crawl-delay specifies otherwise)
Respects robots.txt path restrictions
Only follows links within the same domain
Verifies that an external domain is a government website before crawling it
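The default behavior listed above can be sketched as a per-domain throttle plus a same-domain link filter. This Python example is a hypothetical illustration, not UnGovr's implementation: the class and function names are invented, "same domain" is assumed to mean an exact hostname match, a Crawl-delay is assumed to replace the 1-second default outright, and the verification of external domains is not modeled. The clock and sleep functions are injectable so the timing can be tested without real waits.

```python
import time
from urllib.parse import urljoin, urlsplit

# Assumed default from "Maximum 1 request per second per domain".
DEFAULT_DELAY_SECONDS = 1.0


class DomainThrottle:
    """Enforce a minimum gap between requests to the same host (hypothetical helper)."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> clock() value of the previous request
        self._crawl_delay = {}   # host -> Crawl-delay override, in seconds

    def set_crawl_delay(self, host, seconds):
        """Record a Crawl-delay read from the host's robots.txt."""
        self._crawl_delay[host] = float(seconds)

    def wait(self, url):
        """Sleep until url's host may be contacted again, then record the request time."""
        host = urlsplit(url).hostname
        gap = self._crawl_delay.get(host, DEFAULT_DELAY_SECONDS)
        last = self._last_request.get(host)
        if last is not None:
            remaining = gap - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
        self._last_request[host] = self._clock()


def same_domain_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep only http(s) links on the same host."""
    host = urlsplit(page_url).hostname
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.hostname == host:
            kept.append(absolute)
    return kept
```

Because the throttle keys on hostname, requests to different sites never delay each other, while repeated requests to one site are spaced by at least its delay. Under the exact-hostname assumption, `www.cityhall.example` and `cityhall.example` count as different domains.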

Working With Us

If you'd prefer we access your data differently, we're happy to work with you:

Provide data feeds (JSON, XML, RSS) instead of crawling
Schedule crawls during off-peak hours
Set up specific crawl rules for your site
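For sites that offer a feed instead of being crawled, reading it can be much simpler than parsing HTML pages. The sketch below assumes a plain RSS 2.0 feed (unnamespaced `channel` and `item` elements) and uses Python's built-in `xml.etree.ElementTree`. The `rss_items` name and the fields it keeps are illustrative choices, and Atom or JSON feeds would need separate handling.

```python
import xml.etree.ElementTree as ET


def rss_items(xml_text: str) -> list[dict[str, str]]:
    """Extract title, link, and pubDate from every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when a child element is missing.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
        })
    return items
```

Python's documentation warns that its standard XML parsers are not hardened against maliciously constructed input, so a crawler reading feeds from many sources might prefer a hardened parser such as `defusedxml`.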

Contact us at crawl@ungovr.org to discuss options.