
archiveinator

A local, self-hosted web page archiver with ad blocking and paywall bypass.

archiveinator saves web pages as self-contained single-file HTML documents you can open offline forever — no external dependencies, no external services, full control over your archives.

Choose Your Path

⌨️ CLI User

Archive pages from the command line with full pipeline control, cookie authentication, and paywall bypass.

CLI Reference →

🌐 Web UI User

Archive pages, manage site profiles, monitor RSS feeds, and schedule recurring archives — all from your browser.

Web UI Guide →

🐳 Docker User

Run archiveinator in a container — no Python setup needed. Ships with Chromium, monolith, and blocklists pre-installed.

Docker Guide →

Quick Links

Topic	Description
Getting Started	Prerequisites, installation, first-time setup, and your first archive
CLI Reference	All CLI commands with option tables and examples
Configuration	Full `config.yaml` reference — pipeline, UAs, timeouts, and more
Pipeline	All 17 pipeline steps explained, with their order and default state
Paywall Bypass	Detection logic, bypass trigger conditions, and the 10 bypass strategies
Web UI	Browser-based interface for archiving, profiles, RSS feeds, schedules, and bulk imports
Docker	Running archiveinator in a container — pull, run, volumes, and scripting
Development	Dev setup, project structure, testing, CI, and release process

How archiveinator Works

1. Enter a URL from the CLI, the web UI, or a scheduled cron job.
2. The pipeline processes the page: network-level ad blocking, a headless Chromium page load, paywall detection and bypass, DOM cleanup, and image deduplication.
3. Get a self-contained HTML archive, with all assets inlined into a single file that stays viewable offline forever.
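The last step works by embedding each external asset directly in the HTML, typically as a base64 `data:` URI, so the file no longer needs the network. The Docker image ships with monolith, a tool built for this kind of bundling, and archiveinator likely relies on it here. The snippet below is not archiveinator's or monolith's code. It is a minimal, stdlib-only sketch that assumes the image bytes have already been fetched into a dict. The names `to_data_uri` and `inline_images` are hypothetical, and the sketch handles only `<img src>`, not CSS, fonts, scripts, or `srcset`.

```python
import base64
import mimetypes
import re


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data: URI, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"  # fallback when the extension is unknown
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Matches the src attribute of an <img> tag. Requiring whitespace before "src="
# keeps it from matching attributes such as data-src. Illustrative only: a real
# archiver should use a proper HTML parser rather than a regex.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=)(["\'])(.*?)\2', re.IGNORECASE)


def inline_images(html: str, resources: dict[str, bytes]) -> str:
    """Hypothetical helper: replace each <img src> found in `resources` with a data: URI."""

    def replace(match: re.Match) -> str:
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        if url not in resources:
            return match.group(0)  # leave URLs we have no bytes for untouched
        return f"{prefix}{quote}{to_data_uri(resources[url], url)}{quote}"

    return IMG_SRC.sub(replace, html)


if __name__ == "__main__":
    page = '<p><img alt="logo" src="logo.png"></p>'
    print(inline_images(page, {"logo.png": b"\x89PNG"}))
```

Because every asset becomes part of the HTML text, the resulting file is larger than the original page (base64 adds roughly a third), which is the trade-off for a single file that opens without any external requests.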