# BucketHead: Sequence Diagrams
Visual companion to plan/project-spec.md and plan/build-plan.md. Three
diagrams cover the three main flows; each maps to one source file so you can
go from diagram to code in one hop.
| Diagram | Source of truth | Entry points |
|---|---|---|
| Lifecycle | `src/buckethead/core.py` | `BucketSQLite.{start, stop, _flush_now}` |
| FileStore.put | `src/buckethead/files/store.py` | `FileStore.put` |
| FileStore.gc | `src/buckethead/files/store.py` | `FileStore.gc` |
## 1. BucketSQLite lifecycle
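Construction only stores configuration and opens nothing; everything interesting
happens between `start()` and `stop()`. The app always talks to `bh.connection`
directly, so BucketHead is not in the hot path: reads and writes go straight to
SQLite while the flush loop snapshots to R2 in the background.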
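A minimal sketch of the start/stop shape described above, assuming a plain `threading.Event` drives the flush loop and the flush step is passed in as a callable. `LifecycleSketch` and its `flush` parameter are hypothetical; the real `BucketSQLite` also restores the snapshot, constructs `FileStore`, and installs signal and `atexit` handlers, all of which are omitted here.

```python
import sqlite3
import threading
from typing import Callable, Optional


class LifecycleSketch:
    """Hypothetical stand-in for the start/stop shape of BucketSQLite."""

    def __init__(self, name: str, interval_seconds: float,
                 flush: Callable[[sqlite3.Connection], None]) -> None:
        # Construction only stores configuration; nothing is opened yet.
        self._uri = f"file:{name}?mode=memory&cache=shared"
        self._interval = interval_seconds
        self._flush = flush
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.connection: Optional[sqlite3.Connection] = None

    def start(self) -> None:
        # Shared-cache in-memory DB: it lives exactly as long as some connection is open.
        # check_same_thread=False because stop() may run on another thread (e.g. atexit).
        self.connection = sqlite3.connect(self._uri, uri=True, check_same_thread=False)
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        # wait() returns False on timeout (time to flush) and True once stop() sets the event.
        while not self._stop_event.wait(self._interval):
            self._flush(self.connection)

    def stop(self) -> None:
        if self.connection is None:
            return  # never started, or already stopped
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
        self._flush(self.connection)  # final flush, after the loop has exited
        self.connection.close()       # the in-memory DB disappears here
        self.connection = None
```

Waiting on the event instead of calling `time.sleep` lets `stop()` wake the loop immediately rather than waiting out the current interval.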
sequenceDiagram
autonumber
actor User as App
participant BH as BucketSQLite
participant FL as FlushLoop
participant FS as FileStore
participant BC as BucketClient
participant Conn as sqlite3 conn
participant R2 as R2 bucket
User->>BH: construct with configs
Note over BH: stores config, nothing opened yet
User->>BH: start
BH->>Conn: open shared-cache URI
BH->>BC: construct BucketClient
BH->>BC: exists snap.db
BC->>R2: HEAD snap.db
alt snapshot present
BH->>BC: download snap.db to tmp
BC->>R2: GET snap.db
BH->>Conn: source.backup into memory
BH->>Conn: PRAGMA integrity_check
else no snapshot
Note over BH: start empty
end
BH->>FS: construct FileStore
FS->>Conn: CREATE TABLE IF NOT EXISTS filestore
BH->>FL: start daemon thread
BH->>BH: install SIGTERM, SIGINT, atexit handlers
loop every interval_seconds
FL->>BH: _flush_now
BH->>BC: copy snap.db to snap.db.prev if keep_previous
BH->>Conn: memory.backup to tmp
BH->>BC: upload tmp to snap.db
BC->>R2: PUT snap.db
end
par app workload bypasses BucketHead
User->>Conn: execute INSERT
User->>Conn: commit
end
User->>BH: stop, or SIGTERM, or sys.exit
BH->>BH: uninstall signal handlers
BH->>FL: stop and join daemon thread
BH->>BH: final _flush_now
BH->>R2: COPY and PUT snap.db one last time
BH->>Conn: close, DB disappears here
BH->>BH: unregister atexit hook
BH->>BH: write io-summary.json if profiling enabled
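The snapshot round trip in the diagram can be sketched with the standard-library `sqlite3.Connection.backup` API. This is a hedged sketch, not BucketHead's code: a local directory stands in for the R2 bucket, `shutil.copyfile` stands in for the `BucketClient` upload, download, and copy calls, and the function names `flush_snapshot` and `restore_snapshot` are hypothetical.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def flush_snapshot(memory: sqlite3.Connection, bucket_dir: Path, keep_previous: bool = True) -> None:
    """Back up the live DB into bucket_dir/snap.db (bucket_dir is a stand-in for R2)."""
    snap = bucket_dir / "snap.db"
    if keep_previous and snap.exists():
        # Keep the last good snapshot before it is overwritten.
        shutil.copyfile(snap, bucket_dir / "snap.db.prev")
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = Path(tmp_dir) / "snap.db"
        target = sqlite3.connect(tmp_path)
        try:
            memory.backup(target)  # online page-by-page copy of the live DB
        finally:
            target.close()
        shutil.copyfile(tmp_path, snap)  # "upload" the finished file in one step


def restore_snapshot(memory: sqlite3.Connection, bucket_dir: Path) -> bool:
    """Load bucket_dir/snap.db into `memory` if present; return True if restored."""
    snap = bucket_dir / "snap.db"
    if not snap.exists():
        return False  # no snapshot: start empty
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = Path(tmp_dir) / "snap.db"
        shutil.copyfile(snap, tmp_path)  # "download" to a temp file
        source = sqlite3.connect(tmp_path)
        try:
            source.backup(memory)  # copy every page into the in-memory DB
        finally:
            source.close()
    (result,) = memory.execute("PRAGMA integrity_check").fetchone()
    if result != "ok":
        raise RuntimeError(f"snapshot failed integrity_check: {result}")
    return True
```

Backing up to a temporary file first and only then copying it over `snap.db` mirrors the diagram's order: the remote object is replaced by a complete file, never written page by page.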
## 2. FileStore.put(data)
`put` is content-addressable (the key, `bh_key`, is the SHA-256 hex digest of the data), idempotent, and crash-safe. It has three possible outcomes, depending on what is already present in R2 and SQLite.
sequenceDiagram
autonumber
actor User as App
participant FS as FileStore
participant DB as filestore table
participant BC as BucketClient
participant R2 as R2 bucket
User->>FS: put data plus optional metadata
FS->>FS: bh_key equals sha256 hex of data
FS->>DB: SELECT row where bh_key matches
FS->>BC: exists prefix plus bh_key
BC->>R2: HEAD files slash bh_key
alt row present AND R2 object present
FS-->>User: return bh_key, pure no-op
else R2 object missing
FS->>BC: upload data under prefix plus bh_key
BC->>R2: PUT files slash bh_key
FS->>DB: INSERT OR IGNORE full row
FS-->>User: return bh_key
else row missing but R2 object present
Note over FS: prior-crash recovery
FS->>DB: INSERT OR IGNORE, backfills row
FS-->>User: return bh_key
end
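A minimal sketch of the three `put` outcomes, assuming a Python dict stands in for the R2 bucket and an assumed `filestore` schema of `(bh_key, size, metadata)`. `FileStoreSketch` is hypothetical, and it returns an outcome label alongside `bh_key` purely for illustration; the diagram's `put` returns only `bh_key`.

```python
import hashlib
import json
import sqlite3
from typing import Dict, Optional, Tuple


class FileStoreSketch:
    """Hypothetical put() sketch; a dict stands in for the R2 bucket."""

    def __init__(self, conn: sqlite3.Connection, prefix: str = "files/") -> None:
        self.conn = conn
        self.prefix = prefix
        self.bucket: Dict[str, bytes] = {}  # stand-in for R2: object key -> bytes
        # Assumed schema; the real filestore table may have more columns.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS filestore ("
            " bh_key TEXT PRIMARY KEY, size INTEGER NOT NULL, metadata TEXT)"
        )

    def put(self, data: bytes, metadata: Optional[dict] = None) -> Tuple[str, str]:
        bh_key = hashlib.sha256(data).hexdigest()  # content address
        r2_key = self.prefix + bh_key
        row = self.conn.execute(
            "SELECT 1 FROM filestore WHERE bh_key = ?", (bh_key,)
        ).fetchone()
        in_r2 = r2_key in self.bucket  # a HEAD request in the real flow
        if row is not None and in_r2:
            return bh_key, "noop"
        outcome = "recovered" if in_r2 else "uploaded"
        if not in_r2:
            # Upload first, then record the row: a crash in between leaves an
            # R2 orphan (backfilled on retry or removed by gc), not a row without bytes.
            self.bucket[r2_key] = data
        with self.conn:  # commits the insert
            self.conn.execute(
                "INSERT OR IGNORE INTO filestore (bh_key, size, metadata) VALUES (?, ?, ?)",
                (bh_key, len(data), json.dumps(metadata or {})),
            )
        return bh_key, outcome
```

Because the key is derived from the bytes, repeated calls with the same data always target the same row and object, which is what makes the `INSERT OR IGNORE` backfill safe to repeat.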
## 3. FileStore.gc
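`gc` walks the R2 prefix and the `filestore` table. It deletes orphaned R2
objects (objects with no matching SQLite row) that are older than the grace
window, and it flags, but does not delete, rows whose R2 object has gone
missing. The grace window is the only thing keeping a concurrent `put` safe
from a simultaneous `gc`: `put` uploads the object before it inserts the row,
so for a short time a freshly uploaded object looks exactly like an orphan.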
sequenceDiagram
autonumber
actor User as Ops
participant FS as FileStore
participant DB as filestore table
participant BC as BucketClient
participant R2 as R2 bucket
User->>FS: gc with grace_seconds
FS->>DB: SELECT all bh_keys
FS->>BC: list_keys under prefix
loop each R2 object under prefix
BC->>R2: paginated ListObjectsV2
alt bh_key known in SQLite
Note over FS: skip, healthy
else orphan, no SQLite row
FS->>BC: head_object on r2 key
BC->>R2: HEAD files slash bh_key
alt age less than grace_seconds
Note over FS: skip, covers in-flight put
FS->>FS: orphans_skipped_grace plus 1
else older
FS->>BC: delete r2 key
BC->>R2: DELETE files slash bh_key
FS->>FS: orphans_deleted plus 1
end
end
end
loop each SQLite bh_key not in R2
FS->>FS: dangling_rows_found plus 1
Note over FS: flagged for human review
end
FS-->>User: return GCReport
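A minimal sketch of the `gc` pass, assuming the R2 listing and `HEAD` calls are collapsed into a dict that maps each object key to its last-modified time in epoch seconds. `gc_sketch` is hypothetical; the `GCReport` counters reuse the names from the diagram, but the real dataclass may differ.

```python
import sqlite3
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GCReport:
    # Counter names follow the diagram; the real GCReport may carry more fields.
    orphans_deleted: int = 0
    orphans_skipped_grace: int = 0
    dangling_rows_found: int = 0


def gc_sketch(conn: sqlite3.Connection, bucket: Dict[str, float], prefix: str = "files/",
              grace_seconds: float = 3600.0, now: Optional[float] = None) -> GCReport:
    """`bucket` maps R2 key -> last-modified epoch seconds (stand-in for list + HEAD)."""
    now = time.time() if now is None else now
    report = GCReport()
    known = {k for (k,) in conn.execute("SELECT bh_key FROM filestore")}
    seen = set()
    # sorted() snapshots the listing, so deleting from `bucket` inside the loop is safe.
    for r2_key in sorted(k for k in bucket if k.startswith(prefix)):
        bh_key = r2_key[len(prefix):]
        seen.add(bh_key)
        if bh_key in known:
            continue  # healthy
        if now - bucket[r2_key] < grace_seconds:
            report.orphans_skipped_grace += 1  # may be a put between its PUT and INSERT
        else:
            del bucket[r2_key]  # DELETE in the real flow
            report.orphans_deleted += 1
    # Rows whose object is gone are only counted for human review, never deleted.
    report.dangling_rows_found = len(known - seen)
    return report
```

Passing `now` explicitly keeps the grace-window decision deterministic in tests; in production the age would come from the object's last-modified time returned by `HEAD`.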