
archiveinator
A local, self-hosted web page archiver with ad blocking and paywall bypass.
archiveinator saves web pages as self-contained single-file HTML documents you can open offline forever — no external dependencies, no external services, full control over your archives.
Choose Your Path
⌨️ CLI User
Archive pages from the command line with full pipeline control, cookie authentication, and paywall bypass.
🌐 Web UI User
Archive pages, manage site profiles, monitor RSS feeds, and schedule recurring archives — all from your browser.
Quick Links
| Topic | Description |
|---|---|
| Getting Started | Prerequisites, installation, first-time setup, and your first archive |
| CLI Reference | All CLI commands with option tables and examples |
| Configuration | Full config.yaml reference: pipeline, user agents, timeouts, and more |
| Pipeline | All 17 pipeline steps explained, with their order and default state |
| Paywall Bypass | Detection logic, bypass trigger conditions, and the 10 bypass strategies |
| Web UI | Browser-based interface for archiving, profiles, RSS feeds, schedules, and bulk imports |
| Docker | Running archiveinator in a container — pull, run, volumes, and scripting |
| Development | Dev setup, project structure, testing, CI, and release process |
How archiveinator Works
- Enter a URL from the CLI, the web UI, or a scheduled cron job.
- The pipeline processes the page: network-level ad blocking, a headless Chromium page load, paywall detection and bypass, DOM cleanup, and image deduplication.
- You get a self-contained HTML archive, with all assets inlined into a single file that stays viewable offline.
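
To make the final step concrete, here is a minimal Python sketch of what "inlining assets into a single file" can look like. It is not archiveinator's actual implementation: the function names `to_data_uri` and `inline_assets` are hypothetical, the assets are assumed to be already downloaded into a dictionary, and only double-quoted `src="..."` attributes are handled. A real archiver would also need to cover stylesheets, fonts, `srcset`, and URLs inside CSS.

```python
import base64
import mimetypes
import re


def to_data_uri(data: bytes, url: str) -> str:
    """Encode raw asset bytes as a data: URI, guessing the MIME type from the URL."""
    mime, _ = mimetypes.guess_type(url)
    mime = mime or "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_assets(html: str, assets: dict[str, bytes]) -> str:
    """Replace src="..." values that appear in `assets` with data URIs.

    `assets` maps a URL exactly as written in the HTML to its downloaded bytes
    (a hypothetical structure for this sketch). Unknown URLs are left untouched.
    """

    def replace(match: re.Match) -> str:
        url = match.group(2)
        if url not in assets:
            return match.group(0)
        return f"{match.group(1)}{to_data_uri(assets[url], url)}{match.group(3)}"

    return re.sub(r'(src=")([^"]+)(")', replace, html)


# Usage: one image is available locally, the other is not.
page = '<img src="logo.png"><img src="missing.png">'
archived = inline_assets(page, {"logo.png": b"\x89PNG"})
print(archived)
```

Because the image bytes travel inside the HTML itself, the resulting file needs no network access to render, which is the property the archive format relies on. The trade-off is size: base64 encoding inflates each asset by roughly a third.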