
BucketHead

In-memory SQLite backed by periodic snapshots to S3-compatible bucket storage (AWS S3, Cloudflare R2, MinIO).

Your app writes to a regular sqlite3.Connection. BucketHead keeps the database in memory, snapshots the whole thing to your bucket on a timer, and restores it on startup. No Redis, no managed service — just SQLite and one bucket.

Why

  • Fast. Reads and writes hit an in-memory SQLite database at microsecond latencies; the snapshot machinery never sits in the hot path.
  • Durable. Snapshots land on S3 / R2 / MinIO at a configurable cadence, plus one final flush on shutdown.
  • Cheap. Dirty-bit optimization skips the upload whenever the database hasn't changed since the last flush (sketched after this list).
  • Structured. You're using SQLite — schemas, indexes, transactions, joins — all the usual stuff.
  • R2-friendly. Zero egress fees plus the dirty-bit optimization mean snapshot cost is dominated by storage, not requests.
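
How a dirty-bit check can work in practice, as an illustrative sketch rather than BucketHead's actual internals: record sqlite3.Connection.total_changes at each flush and skip the upload when it hasn't moved since.

import sqlite3

class DirtyBit:
    """Illustrative dirty-bit: skip the upload when nothing changed since the last flush.

    Note: total_changes only counts rows touched by INSERT / UPDATE / DELETE on this
    connection, so schema-only changes would not trip it.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self._flushed_at = conn.total_changes

    def is_dirty(self) -> bool:
        return self.conn.total_changes != self._flushed_at

    def mark_flushed(self) -> None:
        self._flushed_at = self.conn.total_changes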

Install

uv add buckethead                # runtime only
uv add 'buckethead[profiling]'   # + memray / pyinstrument hooks

Python 3.13+.

Quickstart

from pathlib import Path
from buckethead import BucketConfig, BucketSQLite

cfg = BucketConfig.for_r2(
    account_id="<cloudflare account id>",
    bucket="my-bucket",
    access_key_id="<r2 s3 api access key>",
    secret_access_key="<r2 s3 api secret>",
)

with BucketSQLite(cfg) as bh:
    # Raw SQL
    bh.connection.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    bh.connection.execute("INSERT INTO kv VALUES ('answer', '42')")
    bh.connection.commit()

    # Key/value interface
    bh.kv.set("user/123", "alice")
    bh.kv.get("user/123")                      # "alice"

    # File store — content-addressable, dedup'd
    bh_key = bh.files.put(Path("/tmp/big.bin"))
    bh.files.get(bh_key, dest=Path("/tmp/out.bin"))

# On exit: final flush → snapshot uploaded to R2.
# On next startup: restored automatically.

Env-driven config

For 12-factor deployments, skip passing a BucketConfig entirely — BucketSQLite() auto-loads one from BUCKETHEAD_* env vars plus ~/.config/buckethead/config.toml:

from buckethead import BucketSQLite

bh = BucketSQLite()
# reads BUCKETHEAD_BUCKET__NAME, BUCKETHEAD_BUCKET__ACCESS_KEY_ID,
# BUCKETHEAD_BUCKET__SECRET_ACCESS_KEY, and optionally
# BUCKETHEAD_BUCKET__ENDPOINT_URL or BUCKETHEAD_CLOUDFLARE__ACCOUNT_ID,
# plus BUCKETHEAD_SNAPSHOT__KEY and BUCKETHEAD_SNAPSHOT__BRANCH.
# Any [env] entries in ~/.config/buckethead/config.toml are merged in
# via os.environ.setdefault — live env wins.
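
For local development or tests, the same variables can be injected in-process before construction. A minimal sketch using only the variable names documented above; the values are placeholders, and in real deployments they come from the environment:

import os

from buckethead import BucketSQLite

# Placeholders; supply real credentials and endpoint via your deployment environment.
os.environ.setdefault("BUCKETHEAD_BUCKET__NAME", "my-bucket")
os.environ.setdefault("BUCKETHEAD_BUCKET__ACCESS_KEY_ID", "<access key>")
os.environ.setdefault("BUCKETHEAD_BUCKET__SECRET_ACCESS_KEY", "<secret>")
os.environ.setdefault("BUCKETHEAD_BUCKET__ENDPOINT_URL", "<s3-compatible endpoint url>")

bh = BucketSQLite()   # picks the BUCKETHEAD_* variables up automatically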

All env vars use the BUCKETHEAD_ prefix with __ as the nesting delimiter, so BUCKETHEAD_BUCKET__NAME lands in settings.bucket.name. If you want the intermediate BucketConfig object, build it explicitly:

from buckethead import BucketHeadSettings, BucketSQLite

cfg = BucketHeadSettings().to_bucket_config()
bh = BucketSQLite(cfg)

Four typed views on your data

| Attribute | Access pattern | What it's for |
| --- | --- | --- |
| bh.connection | raw SQL | anything — full SQLite is yours |
| bh.kv | string-keyed set / get / dict protocol | small configuration, cache entries, session data |
| bh.docs | named collections of JSON documents with a Mongo-lite filter DSL | structured-ish records you want to query by field |
| bh.files | content-addressable SHA-256 bh-key → bytes in R2 | arbitrary files (uploads, artifacts, ML inputs) |

users = bh.docs.collection("users")
users.insert({"name": "alice", "age": 30, "tags": ["beta"]})
users.find({"age": {"$gte": 18}, "tags": {"$in": ["beta"]}})

See the DocStore API reference for the full operator list and SQL escape hatch.
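
The table above also mentions a dict protocol on bh.kv; a sketch of what that typically looks like, assuming the usual mapping dunders (see the kv API reference for the exact surface):

# Hypothetical illustration of the kv dict protocol.
bh.kv["session/abc"] = "token-123"       # same effect as bh.kv.set(...)
if "session/abc" in bh.kv:
    print(bh.kv["session/abc"])          # "token-123"
del bh.kv["session/abc"]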

Branches

BucketHead maintains multiple named snapshots against the same bucket, one per "branch". Useful for risky migrations or experimental work.

bh.branches.create("experiment-1")      # fork from current
bh.branches.switch("experiment-1")      # flush outgoing, reload from target
bh.connection.execute("...")            # writes go to experiment-1's snapshot
bh.branches.switch("main")              # back to main
bh.branches.list()                      # ["experiment-1", "main"]

# Experiment succeeded — make main look like experiment-1:
bh.branches.switch("experiment-1")
bh.branches.overwrite("main")           # identity becomes main; in-memory unchanged
bh.branches.delete("experiment-1")      # optional cleanup

See the branches API reference for details.

Tracking files on disk

LocalFileTracker keeps local filesystem paths in sync with FileStore blobs and retains a full version history per path:

from pathlib import Path
from buckethead import LocalFileTracker

tracker = LocalFileTracker(bh.connection, bh.files)
tracker.track(Path("/etc/app/settings.json"))
tracker.sync()                                      # SyncReport
tracker.history(Path("/etc/app/settings.json"))     # list[FileVersion]

Version metadata is snapshotted with the rest of the database, so the log survives restarts and travels across branches. See the local file tracking API for the full surface.

Sharing files

Give a FileStore blob a shareable URL — either a public path on a public-read share bucket, or a short-lived presigned GET on a private one. The share bucket is a separate R2/S3 bucket; provision it once with buckethead provision share-bucket --project <name>, then:

from buckethead import BucketSQLite

bh = BucketSQLite(project="my-project")
bh.start()

bh_key = bh.files.put(Path("/tmp/report.pdf"))
result = bh.shares.share(bh_key)    # copies to share bucket, returns URL
print(result.url)

project= pulls the share bucket name from ~/.config/buckethead/config.toml and the bucket credentials from the configured secret store. If the project has no share bucket attached, bh.shares raises — opt in per-project at provision time.

See the sharing API reference for ShareConfig, ShareConfig.from_project, and the bh.shares.* surface.

Observability

Pass callbacks to BucketSQLite (ref) to wire metrics or tracing:

bh = BucketSQLite(
    cfg,
    on_flush_start=lambda: metrics.incr("flush.start"),
    on_flush_complete=lambda duration_s, bytes_uploaded: ...,
    on_flush_error=lambda exc: sentry.capture(exc),
)

bytes_uploaded == 0 means the dirty-bit skipped the upload — the DB didn't change since the last flush.
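
For example, a callback that separates skipped flushes from real uploads, using only the documented (duration_s, bytes_uploaded) signature (the metrics client is a stand-in, as above):

def on_flush_complete(duration_s: float, bytes_uploaded: int) -> None:
    if bytes_uploaded == 0:
        metrics.incr("flush.skipped")        # dirty-bit short-circuit, nothing uploaded
    else:
        metrics.incr("flush.uploaded")
        metrics.timing("flush.duration_seconds", duration_s)

bh = BucketSQLite(cfg, on_flush_complete=on_flush_complete)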

Deeper profiling via ProfilingConfig (ref):

from buckethead import BucketSQLite, ProfilingConfig

bh = BucketSQLite(
    cfg,
    profiling_config=ProfilingConfig(
        io_counters=True,     # JSON summary of bytes / ops per R2 call
        memory=True,          # requires buckethead[profiling]
        cpu=True,             # requires buckethead[profiling]
    ),
)

Scope

  • Single-process. Multi-threaded in-memory access works (bh.connect() vends per-thread connections); cross-process sharing does not.
  • Durability window. A hard crash loses up to interval_seconds of writes. Call bh.force_flush() after any write that must not be lost (see the sketch after this list).
  • DB size. Snapshot wall-time ≈ 1 ms per MB. Comfortable below 100 MB, usable up to ~500 MB.
  • Not Redis-compatible. No wire protocol, no pub/sub, no replication.
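
A sketch of the durability pattern from the list above: make the write, then call bh.force_flush() so it is captured in a snapshot immediately instead of waiting up to interval_seconds for the next timer flush.

# After a write that must survive a hard crash:
bh.kv.set("order/42", "paid")
bh.force_flush()    # snapshot now rather than on the next timer tick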

Dive deeper