archiveinator

A local, self-hosted web page archiver with ad blocking and paywall bypass.

archiveinator saves web pages as self-contained single-file HTML documents you can open offline forever — no external dependencies, no external services, full control over your archives.


Choose Your Path

⌨️ CLI User

Archive pages from the command line with full pipeline control, cookie authentication, and paywall bypass.

CLI Reference →

🌐 Web UI User

Archive pages, manage site profiles, monitor RSS feeds, and schedule recurring archives — all from your browser.

Web UI Guide →

🐳 Docker User

Run archiveinator in a container — no Python setup needed. Ships with Chromium, monolith, and blocklists pre-installed.

Docker Guide →


Topic            Description
Getting Started  Prerequisites, installation, first-time setup, and your first archive
CLI Reference    All CLI commands with option tables and examples
Configuration    Full config.yaml reference: pipeline, UAs, timeouts, and more
Pipeline         All 17 pipeline steps explained, with their order and default state
Paywall Bypass   Detection logic, bypass trigger conditions, and the 10 bypass strategies
Web UI           Browser-based interface for archiving, profiles, RSS feeds, schedules, and bulk imports
Docker           Running archiveinator in a container: pull, run, volumes, and scripting
Development      Dev setup, project structure, testing, CI, and release process

How archiveinator Works

  1. Enter a URL — from the CLI, web UI, or a scheduled cron job
  2. Pipeline processes the page — network-level ad blocking, headless Chromium page load, paywall detection and bypass, DOM cleanup, image deduplication
  3. Get a self-contained HTML archive — all assets inlined into a single file, viewable offline forever
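
The flow above can be pictured as an ordered list of steps, each of which can be switched on or off. The following is only an illustrative sketch of that idea, not archiveinator's actual code: `Page`, `Step`, `run_pipeline`, and the step names are all hypothetical, and the page load is faked with a string instead of a real headless Chromium session.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    """Hypothetical container for the page being archived."""
    url: str
    html: str = ""
    log: list[str] = field(default_factory=list)


@dataclass
class Step:
    """Hypothetical pipeline step: a name, a function, and an on/off state."""
    name: str
    run: Callable[[Page], Page]
    enabled: bool = True


def load_page(page: Page) -> Page:
    # Stand-in for the headless browser load; no network access happens here.
    page.html = (
        "<html><body><div class='ad'>ad</div>"
        f"<p>Article from {page.url}</p></body></html>"
    )
    return page


def remove_ads(page: Page) -> Page:
    # Stand-in for DOM cleanup: drop the fake ad container.
    page.html = page.html.replace("<div class='ad'>ad</div>", "")
    return page


def dedupe_images(page: Page) -> Page:
    # Placeholder that leaves the page unchanged.
    return page


def run_pipeline(page: Page, steps: list[Step]) -> Page:
    """Run enabled steps in order, recording which ran and which were skipped."""
    for step in steps:
        if not step.enabled:
            page.log.append(f"skip:{step.name}")
            continue
        page = step.run(page)
        page.log.append(f"ran:{step.name}")
    return page


# Assumed example configuration: one step is disabled by default.
EXAMPLE_STEPS = [
    Step("load_page", load_page),
    Step("remove_ads", remove_ads),
    Step("dedupe_images", dedupe_images, enabled=False),
]

if __name__ == "__main__":
    result = run_pipeline(Page("https://example.com/article"), EXAMPLE_STEPS)
    print(result.log)
    print(result.html)
```

Keeping each step as a named, independently toggled unit mirrors the idea of a pipeline whose steps have an order and a default state. For the real step names and defaults, see the Pipeline and Configuration pages.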

archiveinator © 2026. Distributed under the MIT License.