# BucketHead

In-memory SQLite backed by periodic snapshots to S3-compatible bucket storage (AWS S3, Cloudflare R2, MinIO).
Your app writes to a regular sqlite3.Connection. BucketHead keeps the
database in memory, snapshots the whole thing to your bucket on a timer,
and restores it on startup. No Redis, no managed service — just SQLite
and one bucket.
## Why
- Fast. Reads and writes hit an in-memory SQLite; microsecond latencies. BucketHead is never in the hot path.
- Durable. Snapshots land on S3 / R2 / MinIO at a configurable cadence, plus one final flush on shutdown.
- Cheap. Dirty-bit optimization skips the upload whenever the database hasn't changed since the last flush.
- Structured. You're using SQLite — schemas, indexes, transactions, joins — all the usual stuff.
- R2-friendly. Zero egress fees plus the dirty-bit mean snapshot cost is dominated by storage, not requests.
## Install
Python 3.13+.
## Quickstart
```python
from pathlib import Path

from buckethead import BucketConfig, BucketSQLite

cfg = BucketConfig.for_r2(
    account_id="<cloudflare account id>",
    bucket="my-bucket",
    access_key_id="<r2 s3 api access key>",
    secret_access_key="<r2 s3 api secret>",
)

with BucketSQLite(cfg) as bh:
    # Raw SQL
    bh.connection.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    bh.connection.execute("INSERT INTO kv VALUES ('answer', '42')")
    bh.connection.commit()

    # Key/value interface
    bh.kv.set("user/123", "alice")
    bh.kv.get("user/123")  # "alice"

    # File store — content-addressable, dedup'd
    bh_key = bh.files.put(Path("/tmp/big.bin"))
    bh.files.get(bh_key, dest=Path("/tmp/out.bin"))

# On exit: final flush → snapshot uploaded to R2.
# On next startup: restored automatically.
```
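For intuition on what `bh.files.put` returns: content-addressable means the blob key derives from the file's bytes, so identical files map to one stored object. A sketch of that keying, with a hypothetical key format:

```python
import hashlib
import tempfile
from pathlib import Path

def bh_key_for(path: Path) -> str:
    # Hypothetical key derivation: SHA-256 of the file contents.
    return "bh-" + hashlib.sha256(path.read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as d:
    a, b = Path(d, "copy-a.bin"), Path(d, "copy-b.bin")
    a.write_bytes(b"same payload")
    b.write_bytes(b"same payload")
    # Identical content yields identical keys, so only one blob is stored.
    same = bh_key_for(a) == bh_key_for(b)
```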
## Env-driven config
For 12-factor deployments, skip passing a `BucketConfig` entirely —
`BucketSQLite()` auto-loads one from `BUCKETHEAD_*` env vars plus
`~/.config/buckethead/config.toml`:
```python
from buckethead import BucketSQLite

bh = BucketSQLite()
# reads BUCKETHEAD_BUCKET__NAME, BUCKETHEAD_BUCKET__ACCESS_KEY_ID,
# BUCKETHEAD_BUCKET__SECRET_ACCESS_KEY, and optionally
# BUCKETHEAD_BUCKET__ENDPOINT_URL or BUCKETHEAD_CLOUDFLARE__ACCOUNT_ID,
# plus BUCKETHEAD_SNAPSHOT__KEY and BUCKETHEAD_SNAPSHOT__BRANCH.
# Any [env] entries in ~/.config/buckethead/config.toml are merged in
# via os.environ.setdefault — live env wins.
```
All env vars use the `BUCKETHEAD_` prefix with `__` as the nesting
delimiter, so `BUCKETHEAD_BUCKET__NAME` lands in `settings.bucket.name`.
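The mapping rule itself fits in a few lines of plain Python (a sketch of the convention, not BucketHead's loader):

```python
def nest(env: dict[str, str], prefix: str = "BUCKETHEAD_") -> dict:
    """Fold PREFIX + SECTION__FIELD vars into {"section": {"field": value}}."""
    settings: dict = {}
    for key, value in env.items():
        if not key.startswith(prefix):
            continue  # unrelated env var
        section, _, field = key.removeprefix(prefix).partition("__")
        settings.setdefault(section.lower(), {})[field.lower()] = value
    return settings

env = {
    "BUCKETHEAD_BUCKET__NAME": "my-bucket",
    "BUCKETHEAD_SNAPSHOT__KEY": "app.db",
    "PATH": "/usr/bin",  # ignored: wrong prefix
}
settings = nest(env)
# settings == {"bucket": {"name": "my-bucket"}, "snapshot": {"key": "app.db"}}
```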
If you want the intermediate `BucketConfig` object, build it explicitly:

```python
from buckethead import BucketHeadSettings, BucketSQLite

cfg = BucketHeadSettings().to_bucket_config()
bh = BucketSQLite(cfg)
```
## Four typed views on your data
| Attribute | Access pattern | What it's for |
|---|---|---|
| `bh.connection` | raw SQL | anything — full SQLite is yours |
| `bh.kv` | string-keyed `set` / `get` / dict protocol | small configuration, cache entries, session data |
| `bh.docs` | named collections of JSON documents with a Mongo-lite filter DSL | structured-ish records you want to query by field |
| `bh.files` | content-addressable SHA-256 bh-key → bytes in R2 | arbitrary files (uploads, artifacts, ML inputs) |
```python
users = bh.docs.collection("users")
users.insert({"name": "alice", "age": 30, "tags": ["beta"]})
users.find({"age": {"$gte": 18}, "tags": {"$in": ["beta"]}})
```
See the DocStore API reference for the full operator list and SQL escape hatch.
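For intuition, a filter like the one above can be read as a per-document predicate. A toy evaluator for just `$gte` and `$in` (illustrative only; the real DocStore supports a fuller operator list and an SQL escape hatch):

```python
OPS = {
    "$gte": lambda value, arg: value >= arg,
    # Mongo-style $in: for a list field, match if any element is in arg.
    "$in": lambda value, arg: (
        any(x in arg for x in value) if isinstance(value, list) else value in arg
    ),
}

def matches(doc: dict, query: dict) -> bool:
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):  # operator clause, e.g. {"$gte": 18}
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:  # bare value means equality
            return False
    return True

doc = {"name": "alice", "age": 30, "tags": ["beta"]}
ok = matches(doc, {"age": {"$gte": 18}, "tags": {"$in": ["beta"]}})
# ok == True
```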
## Branches
BucketHead maintains multiple named snapshots against the same bucket, one per "branch". Useful for risky migrations or experimental work.
```python
bh.branches.create("experiment-1")  # fork from current
bh.branches.switch("experiment-1")  # flush outgoing, reload from target
bh.connection.execute("...")        # writes go to experiment-1's snapshot
bh.branches.switch("main")          # back to main
bh.branches.list()                  # ["experiment-1", "main"]

# Experiment succeeded — make main look like experiment-1:
bh.branches.switch("experiment-1")
bh.branches.overwrite("main")       # identity becomes main; in-memory unchanged
bh.branches.delete("experiment-1")  # optional cleanup
```
See the branches API reference for details.
## Tracking files on disk
`LocalFileTracker` keeps local filesystem paths in sync with `FileStore`
blobs and retains a full version history per path:
```python
from pathlib import Path

from buckethead import LocalFileTracker

tracker = LocalFileTracker(bh.connection, bh.files)
tracker.track(Path("/etc/app/settings.json"))
tracker.sync()                                   # SyncReport
tracker.history(Path("/etc/app/settings.json"))  # list[FileVersion]
```
Version metadata is snapshotted with the rest of the database, so the log survives restarts and travels across branches. See the local file tracking API for the full surface.
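The track/sync/history flow can be pictured as a digest log per tracked path: `sync()` appends a version whenever a file's bytes changed. A toy model (the real tracker persists versions in the database and stores blobs in the FileStore):

```python
import hashlib
import tempfile
from pathlib import Path

class ToyTracker:
    def __init__(self) -> None:
        self.tracked: list[Path] = []
        self.history_log: dict[Path, list[str]] = {}  # path -> digests

    def track(self, path: Path) -> None:
        self.tracked.append(path)
        self.history_log[path] = []

    def sync(self) -> int:
        """Record a new version for each tracked file whose bytes changed."""
        changed = 0
        for path in self.tracked:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            versions = self.history_log[path]
            if not versions or versions[-1] != digest:
                versions.append(digest)
                changed += 1
        return changed

with tempfile.TemporaryDirectory() as d:
    f = Path(d, "settings.json")
    f.write_text('{"debug": false}')
    tracker = ToyTracker()
    tracker.track(f)
    tracker.sync()                   # records version 1
    f.write_text('{"debug": true}')
    tracker.sync()                   # file changed: records version 2
    n_versions = len(tracker.history_log[f])
# n_versions == 2
```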
## Sharing files
Give a `FileStore` blob a shareable URL — either a public path on a
public-read share bucket, or a short-lived presigned GET on a private
one. The share bucket is a separate R2/S3 bucket; provision it once
with `buckethead provision share-bucket --project <name>`, then:
```python
from pathlib import Path

from buckethead import BucketSQLite

bh = BucketSQLite(project="my-project")
bh.start()
bh_key = bh.files.put(Path("/tmp/report.pdf"))
result = bh.shares.share(bh_key)  # copies to share bucket, returns URL
print(result.url)
```
`project=` pulls the share bucket name from
`~/.config/buckethead/config.toml` and the bucket credentials from
the configured secret store. If the project has no share bucket
attached, `bh.shares` raises — opt in per-project at provision time.
See the sharing API reference for ShareConfig,
ShareConfig.from_project, and the bh.shares.* surface.
## Observability
Pass callbacks to BucketSQLite (ref) to wire metrics or tracing:
```python
bh = BucketSQLite(
    cfg,
    on_flush_start=lambda: metrics.incr("flush.start"),
    on_flush_complete=lambda duration_s, bytes_uploaded: ...,
    on_flush_error=lambda exc: sentry.capture(exc),
)
```
`bytes_uploaded == 0` means the dirty-bit skipped the upload — the DB
didn't change since the last flush.
Deeper profiling via ProfilingConfig (ref):
```python
from buckethead import BucketSQLite, ProfilingConfig

bh = BucketSQLite(
    cfg,
    profiling_config=ProfilingConfig(
        io_counters=True,  # JSON summary of bytes / ops per R2 call
        memory=True,       # requires buckethead[profiling]
        cpu=True,          # requires buckethead[profiling]
    ),
)
```
## Scope

- Single-process. Multi-threaded in-memory access works (`bh.connect()` vends per-thread connections); cross-process sharing does not.
- Durability window. A hard crash loses up to `interval_seconds` of writes. Call `bh.force_flush()` after any write that must not be lost.
- DB size. Snapshot wall-time ≈ 1 ms per MB. Comfortable below 100 MB, usable up to ~500 MB.
- Not Redis-compatible. No wire protocol, no pub/sub, no replication.
## Dive deeper
- Architecture diagrams — sequence diagrams for the main flows.
- CLI — inspect / restore / files list/get/gc.
- API reference — every public class and function.