BucketHead — Sequence Diagrams

Visual companion to plan/project-spec.md and plan/build-plan.md. Three diagrams cover the three main flows; each maps to one source file so you can go from diagram to code in one hop.

| Diagram | Source of truth |
| --- | --- |
| Lifecycle | src/buckethead/core.py: BucketSQLite.{start, stop, _flush_now} |
| FileStore.put | src/buckethead/files/store.py: FileStore.put |
| FileStore.gc | src/buckethead/files/store.py: FileStore.gc |

1. BucketSQLite lifecycle

Construction only stores the configs; nothing is opened until start(), and everything interesting happens between start() and stop(). The app always talks to bh.connection directly, so BucketHead is not in the hot path.

sequenceDiagram
    autonumber
    actor User as App
    participant BH as BucketSQLite
    participant FL as FlushLoop
    participant FS as FileStore
    participant BC as BucketClient
    participant Conn as sqlite3 conn
    participant R2 as R2 bucket

    User->>BH: construct with configs
    Note over BH: stores config, nothing opened yet

    User->>BH: start
    BH->>Conn: open shared-cache URI
    BH->>BC: construct BucketClient
    BH->>BC: exists snap.db
    BC->>R2: HEAD snap.db
    alt snapshot present
        BH->>BC: download snap.db to tmp
        BC->>R2: GET snap.db
        BH->>Conn: source.backup into memory
        BH->>Conn: PRAGMA integrity_check
    else no snapshot
        Note over BH: start empty
    end
    BH->>FS: construct FileStore
    FS->>Conn: CREATE TABLE IF NOT EXISTS filestore
    BH->>FL: start daemon thread
    BH->>BH: install SIGTERM, SIGINT, atexit handlers

    loop every interval_seconds
        FL->>BH: _flush_now
        BH->>BC: copy snap.db to snap.db.prev if keep_previous
        BH->>Conn: memory.backup to tmp
        BH->>BC: upload tmp to snap.db
        BC->>R2: PUT snap.db
    end

    par app workload bypasses BucketHead
        User->>Conn: execute INSERT
        User->>Conn: commit
    end

    User->>BH: stop, or SIGTERM, or sys.exit
    BH->>BH: uninstall signal handlers
    BH->>FL: stop and join daemon thread
    BH->>BH: final _flush_now
    BH->>R2: COPY and PUT snap.db one last time
    BH->>Conn: close, DB disappears here
    BH->>BH: unregister atexit hook
    BH->>BH: write io-summary.json if profiling enabled
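
The two backup steps in the diagram (memory.backup to tmp on flush, source.backup into memory on start) can be sketched with the standard library alone. This is a minimal illustration, not the code in core.py: the helper names flush_to_file and restore_from_file are hypothetical, the upload and download to R2 are left out, and a plain ":memory:" connection stands in for the shared-cache URI.

```python
import os
import sqlite3
import tempfile


def flush_to_file(memory_conn: sqlite3.Connection, path: str) -> None:
    """Copy the live in-memory database into a snapshot file (hypothetical helper)."""
    target = sqlite3.connect(path)
    try:
        # Online backup: memory_conn stays usable while pages are copied.
        memory_conn.backup(target)
    finally:
        target.close()


def restore_from_file(path: str) -> sqlite3.Connection:
    """Load a snapshot file into a fresh in-memory database and verify it."""
    source = sqlite3.connect(path)
    memory_conn = sqlite3.connect(":memory:")
    try:
        source.backup(memory_conn)
    finally:
        source.close()
    (status,) = memory_conn.execute("PRAGMA integrity_check").fetchone()
    if status != "ok":
        memory_conn.close()
        raise RuntimeError(f"snapshot failed integrity_check: {status}")
    return memory_conn


def demo_round_trip() -> list:
    """Write a row, flush, drop the in-memory DB, then restore and read it back."""
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
    live.execute("INSERT INTO events (name) VALUES ('signup')")
    live.commit()
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = os.path.join(tmp, "snap.db")
        flush_to_file(live, snapshot)
        live.close()  # the in-memory database disappears here
        restored = restore_from_file(snapshot)
    rows = restored.execute("SELECT id, name FROM events").fetchall()
    restored.close()
    return rows
```

Calling demo_round_trip() should return the single row written before the flush, which is the property the final _flush_now in stop() relies on: anything committed before the last backup survives the connection being closed.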

2. FileStore.put(data)

put is content-addressable (bh_key is the SHA-256 hex digest of the data), idempotent, and crash-safe. It has three possible outcomes, depending on what is already present in R2 and SQLite.

sequenceDiagram
    autonumber
    actor User as App
    participant FS as FileStore
    participant DB as filestore table
    participant BC as BucketClient
    participant R2 as R2 bucket

    User->>FS: put data plus optional metadata
    FS->>FS: bh_key equals sha256 hex of data
    FS->>DB: SELECT row where bh_key matches
    FS->>BC: exists prefix plus bh_key
    BC->>R2: HEAD files slash bh_key

    alt row present AND R2 object present
        FS-->>User: return bh_key, pure no-op
    else R2 object missing
        FS->>BC: upload data under prefix plus bh_key
        BC->>R2: PUT files slash bh_key
        FS->>DB: INSERT OR IGNORE full row
        FS-->>User: return bh_key
    else row missing but R2 object present
        Note over FS: prior-crash recovery
        FS->>DB: INSERT OR IGNORE, backfills row
        FS-->>User: return bh_key
    end
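
The three branches above can be sketched as a small function. This is an illustration under stated assumptions rather than the code in store.py: FakeBucket is a hypothetical in-memory stand-in for BucketClient, the two-column filestore schema is invented for the demo, and the returned outcome label exists only to make the branch visible.

```python
from __future__ import annotations

import hashlib
import sqlite3


class FakeBucket:
    """Hypothetical in-memory stand-in for BucketClient and the R2 bucket."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self.objects

    def upload(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def put(conn: sqlite3.Connection, bucket: FakeBucket, data: bytes,
        prefix: str = "files/") -> tuple[str, str]:
    """Return (bh_key, outcome). The outcome label is for the demo only."""
    bh_key = hashlib.sha256(data).hexdigest()
    row = conn.execute(
        "SELECT 1 FROM filestore WHERE bh_key = ?", (bh_key,)
    ).fetchone()
    in_bucket = bucket.exists(prefix + bh_key)

    if row is not None and in_bucket:
        return bh_key, "noop"  # both sides present: nothing to do

    if not in_bucket:
        # Object first, row second: a crash in between leaves an orphan
        # object, never a row pointing at nothing.
        bucket.upload(prefix + bh_key, data)
        outcome = "uploaded"
    else:
        outcome = "backfilled"  # object survived an earlier crash, row did not

    conn.execute(
        "INSERT OR IGNORE INTO filestore (bh_key, size) VALUES (?, ?)",
        (bh_key, len(data)),
    )
    conn.commit()
    return bh_key, outcome


def demo_put() -> list[str]:
    conn = sqlite3.connect(":memory:")
    # Assumed schema; the real table carries more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filestore (bh_key TEXT PRIMARY KEY, size INTEGER)"
    )
    bucket = FakeBucket()
    outcomes = [put(conn, bucket, b"hello")[1]]      # first call uploads
    outcomes.append(put(conn, bucket, b"hello")[1])  # repeat is a no-op
    conn.execute("DELETE FROM filestore")            # simulate a lost row
    conn.commit()
    outcomes.append(put(conn, bucket, b"hello")[1])  # row is backfilled
    conn.close()
    return outcomes
```

Because the key is derived from the content and the insert is INSERT OR IGNORE, repeating any call is harmless, which is what makes the recovery branch safe to run at any time.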

3. FileStore.gc

gc walks the R2 prefix and the filestore table, deletes orphaned R2 objects older than the grace window, and flags (but does not delete) rows whose R2 object has gone missing. put uploads the object before inserting its row, so an in-flight put briefly looks like an orphan; the grace window is the only thing keeping a concurrent put safe from a simultaneous gc.

sequenceDiagram
    autonumber
    actor User as Ops
    participant FS as FileStore
    participant DB as filestore table
    participant BC as BucketClient
    participant R2 as R2 bucket

    User->>FS: gc with grace_seconds
    FS->>DB: SELECT all bh_keys
    FS->>BC: list_keys under prefix
    loop each R2 object under prefix
        BC->>R2: paginated ListObjectsV2
        alt bh_key known in SQLite
            Note over FS: skip, healthy
        else orphan, no SQLite row
            FS->>BC: head_object on r2 key
            BC->>R2: HEAD files slash bh_key
            alt age less than grace_seconds
                Note over FS: skip, covers in-flight put
                FS->>FS: orphans_skipped_grace plus 1
            else older
                FS->>BC: delete r2 key
                BC->>R2: DELETE files slash bh_key
                FS->>FS: orphans_deleted plus 1
            end
        end
    end
    loop each SQLite bh_key not in R2
        FS->>FS: dangling_rows_found plus 1
        Note over FS: flagged for human review
    end
    FS-->>User: return GCReport
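
The classification logic in the diagram can be sketched as follows. It is a simplified illustration, not the code in store.py: the name gc_sweep is hypothetical, a dict of key to last-modified time stands in for the paginated listing and HEAD calls, and the GCReport fields are assumed to match the counter names shown in the diagram.

```python
from __future__ import annotations

import sqlite3
import time
from dataclasses import dataclass


@dataclass
class GCReport:
    # Field names follow the counters in the diagram; the real class may differ.
    orphans_deleted: int = 0
    orphans_skipped_grace: int = 0
    dangling_rows_found: int = 0


def gc_sweep(conn: sqlite3.Connection, objects: dict[str, float],
             grace_seconds: float, now: float | None = None,
             prefix: str = "files/") -> GCReport:
    """objects maps bucket key to last-modified epoch seconds (listing stand-in)."""
    now = time.time() if now is None else now
    known = {row[0] for row in conn.execute("SELECT bh_key FROM filestore")}
    seen: set[str] = set()
    report = GCReport()

    for key, last_modified in list(objects.items()):
        if not key.startswith(prefix):
            continue
        bh_key = key[len(prefix):]
        seen.add(bh_key)
        if bh_key in known:
            continue  # healthy: object and row both exist
        if now - last_modified < grace_seconds:
            report.orphans_skipped_grace += 1  # may be an in-flight put
        else:
            del objects[key]  # stands in for the DELETE against the bucket
            report.orphans_deleted += 1

    # Rows whose object is gone are counted for review, never deleted.
    report.dangling_rows_found = len(known - seen)
    return report


def demo_gc() -> GCReport:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE filestore (bh_key TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO filestore VALUES (?)", [("aaa",), ("ccc",)])
    objects = {
        "files/aaa": 0.0,    # has a row: healthy
        "files/old": 0.0,    # orphan older than the grace window: deleted
        "files/new": 990.0,  # orphan inside the grace window: kept
    }
    report = gc_sweep(conn, objects, grace_seconds=60.0, now=1_000.0)
    conn.close()
    return report
```

In the demo, "ccc" has a row but no object, so it is the one dangling row; passing now explicitly keeps the age comparison deterministic.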