Architecture

This document defines the intended setup for GitTinkerer (comment-to-code automation), including boundaries, runtime contracts, and the on-disk audit artifacts.

Scope and constraints

  • Single repo per run
  • Triggered by:
    • PR comment (GitHub App webhook on issue_comment)
    • Web UI comment (Svelte app in web/)
  • Non-interactive execution (no prompts, no HITL)
  • Commits are pushed back to the target branch derived from PR metadata/ref
  • Review happens after execution via commits + artifacts

System context (boundaries)

flowchart LR
  Dev[Developer]

  subgraph GitHub[GitHub]
    PR[Pull Request]
    Comments[PR Comments]
    API[GitHub API]
    Repo[Git Repository]
    Webhook[GitHub Webhook Delivery]
  end

  subgraph VPS[VPS]
    subgraph GT["GitTinkerer (this repo)"]
      Entry["bin/gittinkerer (bash)"]
      Work[Workspace manager]
      Codex[Codex runner]
      Git[Git commit/push]
      Reply[Reply publisher]
      Art[Artifacts writer]
    end

    subgraph Web["Web UI (same repo)"]
      Svelte["Svelte app (web/)"]
    end

    Service["HTTP service (service/)"]

    CodexCLI["Codex CLI (installed/authenticated)"]
  end

  subgraph Traefik["Traefik (elsewhere)"]
    Proxy[Reverse proxy]
  end

  Dev -->|writes comment| Comments
  Dev -->|uses web UI| Proxy
  Proxy -->|routes HTTP| Svelte

  Comments --> Webhook
  Webhook -->|HTTP POST| Service
  Service -->|spawn| Entry

  Entry --> Work
  Work --> Repo

  Entry --> Codex
  Codex --> CodexCLI

  Entry --> Git
  Git --> Repo

  Entry --> Reply
  Reply --> API

  Entry --> Art

Execution flows

Service/web/data flow (with analytics)

flowchart LR
    subgraph Web
      UI["Web UI\n(/, /runs, /runs/:id, /admin, /analytics)"]
    end
    subgraph Service
      HTTP[HTTP API\nNode 24 service]
      Metrics[Metrics & Analytics]
      Sentry[Sentry SDK]
    end
    subgraph Data
      PG[(PostgreSQL\nruns, run_metrics,\npaused_repos)]
      Redis[(Redis\nrate limits & cache)]
    end

    UI -->|REST| HTTP
    HTTP -->|read/write| PG
    HTTP -->|cache rate limits\nrun status| Redis
    Metrics -->|aggregations| PG
    HTTP -->|errors| Sentry

Run lifecycle (API/web)

sequenceDiagram
    participant UI as Web UI / Webhook
    participant API as Service API
    participant PG as Postgres
    participant R as Redis
    participant S as Sentry

    UI->>API: POST /api/runs (payload)
    API->>PG: insert run (status=running, user/installation)
    API->>R: rate-limit check + status cache
    API-->>UI: 202 run_id
    API->>API: spawn CLI runner
    API-->>S: capture errors (if spawn/migrate fails)
    API->>PG: update status + metrics on completion
    API->>R: cache final status
    UI->>API: GET /api/runs/:id (polling)

Bash module flow (current implementation)

flowchart TD
  A[bin/gittinkerer] -->|parse_args| B[lib/args.sh]
  B -->|load_env| C[lib/env.sh]
  C -->|parse_payload| D[lib/payload.sh]
  D -->|init_artifacts| E[lib/artifacts.sh]
  E -->|init_workspace| F[lib/workspace.sh]
  F -->|build+run prompt| G[lib/codex.sh]
  G -->|diff/commit/push| H[lib/git.sh]
  H -->|publish reply| I{source?}
  I -->|pr| J[lib/reply/github.sh]
  I -->|web| K[lib/reply/web.sh]
  G -->|metrics| M[lib/metrics.sh]
  subgraph Error handling
    X[EXIT trap] -->|stage + message| Y[write_failure_artifacts]
  end

PR comment mode (happy path)

sequenceDiagram
  autonumber
  actor Dev as Developer
  participant GH as GitHub PR
  participant Service as VPS HTTP service
  participant GT as bin/gittinkerer
  participant Repo as Target Git Repo (workspace)
  participant Codex as Codex CLI
  participant API as GitHub API

  Dev->>GH: Comment with instruction (must start with /tinker)
  GH->>Service: issue_comment webhook delivery
  Service->>Service: Verify signature, gate on /tinker
  Service->>API: Fetch PR JSON via issue.pull_request.url
  Service->>GT: Spawn bin/gittinkerer with --payload-file (stdin not supported)

  GT->>GT: Create artifacts/<timestamp>/agent-run/
  GT->>Repo: Clone or fetch/pull
  GT->>Repo: Checkout ref derived from PR metadata/ref

  GT->>GT: Build prompt (includes instruction + reporting contract)
  GT->>Codex: Run Codex CLI (non-interactive)
  Codex-->>GT: Changes + response text

  GT->>Repo: Commit changes (nntin-bot)
  GT->>Repo: Push commits

  GT->>API: Post a new PR conversation comment (by PR number)
  GT->>GT: Write artifacts (prompt, diff, summary, etc.)
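
The "Verify signature" step in the sequence above can be sketched as follows. This is a minimal illustration, assuming the service checks GitHub's X-Hub-Signature-256 header (an HMAC-SHA256 of the raw request body, hex-encoded and prefixed with sha256=). The function name verifySignature is hypothetical and not taken from service/.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not from the repository): validates the
// X-Hub-Signature-256 header that GitHub attaches to webhook deliveries.
function verifySignature(
  secret: string,
  rawBody: string,
  header: string | undefined,
): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const received = Buffer.from(header, "utf8");
  const computed = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === computed.length && timingSafeEqual(received, computed);
}
```

The comparison must run against the raw request body, before any JSON parsing or re-serialization, because the digest is computed over the exact bytes GitHub sent.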

Web comment mode (happy path)

sequenceDiagram
  autonumber
  actor Dev as Developer
  participant Web as Svelte Web UI
  participant VPS as VPS
  participant GT as bin/gittinkerer
  participant Repo as Target Git Repo (workspace)
  participant Codex as Codex CLI
  participant GH as GitHub (repo + API)

  Dev->>Web: Submit instruction comment
  Web->>GT: Invoke run (local call on VPS)
  GT->>GT: Create artifacts/<timestamp>/agent-run/

  GT->>Repo: Clone or fetch/pull
  GT->>Repo: Checkout ref (from payload / derived context)

  GT->>Codex: Run Codex CLI
  Codex-->>GT: Changes + response text

  GT->>Repo: Commit + push to GitHub
  Repo->>GH: Push

  GT->>Web: Post reply to run_id callback
  GT->>GT: Write artifacts

Data persistence

Actual database schema (from service/src/infra/db/migrations.ts):

erDiagram
  runs {
    TEXT run_id PK
    TEXT repo_full_name
    TEXT repo_clone_url
    TEXT source
    TEXT status
    JSONB payload
    TEXT payload_path
    TEXT artifacts_dir
    TEXT comment_body
    TEXT comment_raw_body
    INT pr_number
    TEXT pr_head_ref
    TEXT pr_head_sha
    TEXT pr_base_ref
    TEXT timestamp
    TIMESTAMPTZ started_at
    TIMESTAMPTZ finished_at
    INT exit_code
    TEXT user_id
    TEXT installation_id
    TIMESTAMPTZ created_at
    TIMESTAMPTZ updated_at
  }

  run_metrics {
    BIGSERIAL id PK
    TEXT run_id FK
    TEXT metric_name
    NUMERIC metric_value
    TIMESTAMPTZ recorded_at
  }

  rate_limits {
    TEXT repo_full_name PK
    TIMESTAMPTZ last_reset_at
    INT allowed_per_window
    INT window_seconds
    INT used_in_window
    TIMESTAMPTZ updated_at
  }

  paused_repos {
    TEXT repo_full_name PK
    TEXT paused_reason
    TIMESTAMPTZ paused_until
    TIMESTAMPTZ created_at
  }

  paused_run_targets {
    TEXT repo_full_name PK
    TEXT user_id PK
    TEXT paused_reason
    TIMESTAMPTZ paused_until
    TIMESTAMPTZ created_at
  }

  runs ||--o{ run_metrics : "has metrics"
  runs ||--o{ paused_run_targets : "can be paused per user"

  • Indexes: repo/status/started/user/installation on runs; run_id/metric_name on run_metrics; repo_full_name and user_id on paused_run_targets.
  • Migrations are idempotent and tracked in schema_migrations; failures surface with a clear startup error.
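
To illustrate how the rate_limits columns could fit together, here is a minimal sketch of a fixed-window check. The field names mirror the columns in the diagram, but the fixed-window semantics and the function name consumeRateLimit are assumptions made for illustration. The service may compute limits differently, for example in Redis.

```typescript
// Field names mirror the rate_limits columns; the fixed-window behaviour
// below is an assumption, not taken from the service code.
interface RateLimitRow {
  repoFullName: string;
  lastResetAt: Date;
  allowedPerWindow: number;
  windowSeconds: number;
  usedInWindow: number;
}

interface RateLimitDecision {
  allowed: boolean;
  row: RateLimitRow; // the row as it would be written back
}

function consumeRateLimit(row: RateLimitRow, now: Date): RateLimitDecision {
  const elapsedMs = now.getTime() - row.lastResetAt.getTime();
  // Start a fresh window once the previous one has fully elapsed.
  const current: RateLimitRow =
    elapsedMs >= row.windowSeconds * 1000
      ? { ...row, lastResetAt: now, usedInWindow: 0 }
      : row;
  if (current.usedInWindow >= current.allowedPerWindow) {
    return { allowed: false, row: current };
  }
  return {
    allowed: true,
    row: { ...current, usedInWindow: current.usedInWindow + 1 },
  };
}
```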

Runtime components (VPS)

  • Entrypoint: bin/gittinkerer (bash)
  • Workspace manager: Ensures the single target repo is present and on the right ref.
  • Codex runner: Builds prompt, executes Codex CLI, captures response.
  • Git manager: Creates diff, commits as nntin-bot, pushes.
  • Reply publisher: Posts response back to PR comment thread or web conversation.
  • Artifacts writer: Writes a complete audit bundle per run.

Domain model (service/src/domain)

classDiagram
  class Run {
    +runId: string
    +repoFullName: string
    +repoCloneUrl: string
    +source: "pr"|"web"
    +status: RunStatus
    +payload: unknown
    +payloadPath: string?
    +artifactsDir: string?
    +commentBody: string?
    +commentRawBody: string?
    +prNumber: number?
    +prHeadRef: string?
    +prHeadSha: string?
    +prBaseRef: string?
    +timestamp: string?
    +startedAt: Date?
    +finishedAt: Date?
    +exitCode: number?
    +userId: string?
    +installationId: string?
    +createdAt: Date
    +updatedAt: Date
    +start()
    +succeed()
    +fail(error)
    +isTerminal()
    +duration: number?
  }

  class RunStatus {
    <<enum>>
    pending
    running
    success
    succeeded
    failed
    ignored
    unknown
  }

  class RepoRef {
    +fullName: string
    +cloneUrl: string
    +owner(): string
    +name(): string
    +getWorkspacePath(resolver)
  }

  class Instruction {
    +body: string
    +rawBody: string
    +source: "pr"|"web"
    +getPrompt()
    +fromComment(raw, source)
  }

  Run --> RunStatus : uses
  Run --> RepoRef : repo metadata
  Run --> Instruction : instruction prompt
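
The class diagram above can be read as roughly the following TypeScript. This is a trimmed sketch rather than the code in service/src/domain: it keeps only a few fields, passes timestamps explicitly, and assumes which statuses count as terminal.

```typescript
type RunStatus =
  | "pending" | "running" | "success" | "succeeded"
  | "failed" | "ignored" | "unknown";

// Assumption: the set of terminal statuses is a guess for illustration.
const TERMINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "success", "succeeded", "failed", "ignored",
]);

class RepoRef {
  readonly fullName: string;
  readonly cloneUrl: string;
  constructor(fullName: string, cloneUrl: string) {
    this.fullName = fullName;
    this.cloneUrl = cloneUrl;
  }
  // "OWNER/REPO" -> "OWNER"
  owner(): string { return this.fullName.split("/")[0] ?? ""; }
  // "OWNER/REPO" -> "REPO"
  name(): string { return this.fullName.split("/").slice(1).join("/"); }
}

class Run {
  readonly runId: string;
  status: RunStatus = "pending";
  startedAt: Date | undefined;
  finishedAt: Date | undefined;
  constructor(runId: string) { this.runId = runId; }

  start(now: Date): void { this.status = "running"; this.startedAt = now; }
  succeed(now: Date): void { this.status = "succeeded"; this.finishedAt = now; }
  fail(now: Date): void { this.status = "failed"; this.finishedAt = now; }
  isTerminal(): boolean { return TERMINAL_STATUSES.has(this.status); }

  // Duration in milliseconds, or undefined until both timestamps are set.
  get duration(): number | undefined {
    if (!this.startedAt || !this.finishedAt) return undefined;
    return this.finishedAt.getTime() - this.startedAt.getTime();
  }
}
```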

CLI contract: bin/gittinkerer (current)

  • Entry: bin/gittinkerer sources modules: args, env, payload, artifacts, sentry, workspace, metrics, git, codex, reply/github, reply/web.
  • Command: bin/gittinkerer run --payload-file <path> [--workspaces-dir <path>] [--artifacts-dir <path>] [--timestamp <value>] [--dry-run]
  • Parsing: lib/args.sh enforces run + --payload-file (file must exist); bad usage exits 2.
  • Environment: lib/env.sh loads .env and requires GITHUB_TOKEN (or GitHub App token) plus optional overrides for ARTIFACTS_DIR/WORKSPACES_DIR and bot identity.
  • Flow: EXIT trap writes failure artifacts with failure_stage/failure_message; stages advance in order env → payload → artifacts → workspace sync → codex → git → reply.
  • Dry run: skips push and reply but still writes artifacts and metrics.
  • Exit codes: 0 success, 1 failure at any stage, 2 usage/payload validation.

Runner invocation (service → CLI)

  • The HTTP service calls spawnGittinkerer with args: run --payload-file <path> and optional --workspaces-dir, --artifacts-dir, --timestamp, --dry-run.
  • Payload files are created under serviceConfig.directories.artifacts via createPayloadFile; cleaned up by cleanupPayloadFile after spawn completes.
  • The runner uses CLI_PATH and inherits service env; stdout/stderr are buffered, and duration is measured in ms for downstream analytics/logging.
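
As a rough illustration of the argument list described above, the sketch below builds the argv passed to the CLI. The RunnerOptions interface and the buildRunnerArgs name are hypothetical; only the flags themselves come from the CLI contract.

```typescript
// Hypothetical option bag; the flag names are the ones listed in the CLI contract.
interface RunnerOptions {
  payloadFile: string;
  workspacesDir?: string;
  artifactsDir?: string;
  timestamp?: string;
  dryRun?: boolean;
}

function buildRunnerArgs(opts: RunnerOptions): string[] {
  // "run --payload-file <path>" is always present.
  const args: string[] = ["run", "--payload-file", opts.payloadFile];
  if (opts.workspacesDir) args.push("--workspaces-dir", opts.workspacesDir);
  if (opts.artifactsDir) args.push("--artifacts-dir", opts.artifactsDir);
  if (opts.timestamp) args.push("--timestamp", opts.timestamp);
  if (opts.dryRun) args.push("--dry-run");
  return args;
}
```

Keeping the arguments as an array, rather than a single shell string, lets them be handed to a process-spawning API without shell quoting concerns.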

Metrics collection flow

sequenceDiagram
  autonumber
  participant Codex as lib/codex.sh
  participant Git as lib/git.sh
  participant Metrics as lib/metrics.sh
  participant Service as HTTP service

  Codex->>Metrics: record_metric("actual_prompt_tokens"|"completion_seconds")
  Git->>Metrics: record_metric("diff_loc")
  Metrics->>Metrics: flush_metrics (requires RUNNER_METRICS_TOKEN & PAYLOAD_RUN_ID)
  Metrics->>Service: POST /api/runs/:run_id/metrics (Bearer RUNNER_METRICS_TOKEN)
  Service-->>Metrics: 2xx on success (logged, queue cleared)
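
The flush step can be pictured as building one authenticated POST. The sketch below is written in TypeScript for consistency with the service, although the real logic lives in lib/metrics.sh. The JSON body shape ({ metrics: [...] }) and the buildMetricsRequest name are assumptions.

```typescript
interface Metric {
  name: string;
  value: number;
}

interface MetricsRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Returns null when flushing should be skipped: no run id, no token,
// or nothing queued. The body shape is an assumption for illustration.
function buildMetricsRequest(
  baseUrl: string,
  runId: string | undefined,
  token: string | undefined,
  queue: Metric[],
): MetricsRequest | null {
  if (!runId || !token || queue.length === 0) return null;
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/api/runs/${encodeURIComponent(runId)}/metrics`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ metrics: queue }),
  };
}
```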

Proposed payload schema (single repo)

The exact payload can evolve, but the following is the minimum shape GitTinkerer should accept.

web:

{
  "run_id": "2025-12-23T12:34:56Z-<random>",
  "source": "web",
  "repo": {
    "full_name": "OWNER/REPO",
    "clone_url": "https://github.com/OWNER/REPO.git"
  },
  "comment_body": "Instruction text (after stripping /tinker)",
  "comment_raw_body": "/tinker Instruction text (raw; artifact-only)",

  "web": {
    "web_conversation_id": "conv_01J...",
    "user_id": "user_123"
  },

  "pr": {
    "head_ref": "feature/branch-name"
  }
}

pr:

{
  "run_id": "2025-12-23T12:34:56Z-<random>",
  "source": "pr",
  "repo": {
    "full_name": "OWNER/REPO",
    "clone_url": "https://github.com/OWNER/REPO.git"
  },
  "comment_body": "Instruction text (after stripping /tinker)",
  "comment_raw_body": "/tinker Instruction text (raw; artifact-only)",

  "pr": {
    "number": 123,
    "comment_id": 999999,
    "head_ref": "feature/branch-name",
    "head_sha": "abcdef123456...",
    "base_ref": "main"
  },

  "web": {
    "web_conversation_id": "conv_01J..."
  }
}

Rules:

  • source is "pr" or "web".
  • If source == "pr", pr MUST be present with number, comment_id, head_ref, head_sha, and base_ref.
  • If source == "web", web.web_conversation_id (and web.user_id) MUST be present.
  • If source == "web", pr is optional and may include only head_ref for branch selection.
  • Target checkout/push ref is derived from PR metadata/ref (PR mode).
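
The rules above translate fairly directly into a validator. The following is a minimal sketch, not the check in lib/payload.sh or the service. It follows the rules list literally, so it requires pr.comment_id in PR mode, whereas the webhook notes treat that field as optional and a real validator may relax it. The validatePayload name is hypothetical.

```typescript
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a list of human-readable problems; an empty list means valid.
function validatePayload(payload: unknown): string[] {
  if (!isRecord(payload)) return ["payload must be a JSON object"];
  const errors: string[] = [];

  const source = payload["source"];
  if (source !== "pr" && source !== "web") {
    errors.push('source must be "pr" or "web"');
  }

  const repo = payload["repo"];
  if (
    !isRecord(repo) ||
    typeof repo["full_name"] !== "string" ||
    typeof repo["clone_url"] !== "string"
  ) {
    errors.push("repo.full_name and repo.clone_url are required");
  }

  if (source === "pr") {
    const pr = payload["pr"];
    for (const key of ["number", "comment_id", "head_ref", "head_sha", "base_ref"]) {
      if (!isRecord(pr) || pr[key] == null) {
        errors.push(`pr.${key} is required when source is "pr"`);
      }
    }
  }

  if (source === "web") {
    const web = payload["web"];
    for (const key of ["web_conversation_id", "user_id"]) {
      if (!isRecord(web) || typeof web[key] !== "string") {
        errors.push(`web.${key} is required when source is "web"`);
      }
    }
  }
  return errors;
}
```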

GitHub webhook notes (PR mode):

  • Trigger source is the GitHub App issue_comment webhook.
  • Only comments that begin with /tinker should trigger a run.
  • /tinker supports multiline instructions; everything after the prefix is treated as the instruction body.
  • comment_body should contain the stripped instruction text (no /tinker).
  • comment_raw_body is optional but recommended, and is used for audit only (stored in pr_comment.txt).
  • pr.comment_id is optional for this mode.
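
The /tinker gating and stripping described above could look like the sketch below. The parseTinkerComment name is hypothetical. Requiring whitespace after the prefix, so that a comment such as /tinkering does not trigger a run, is an assumption and not something the notes specify.

```typescript
interface ParsedComment {
  commentBody: string;    // instruction text without the /tinker prefix
  commentRawBody: string; // original comment, kept for audit only
}

// Returns null when the comment should not trigger a run.
function parseTinkerComment(raw: string): ParsedComment | null {
  const prefix = "/tinker";
  if (!raw.startsWith(prefix)) return null;
  const rest = raw.slice(prefix.length);
  // Assumption: the prefix must be followed by whitespace or end of text.
  if (rest.length > 0 && !/^\s/.test(rest)) return null;
  // Everything after the prefix, including later lines, is the instruction.
  return { commentBody: rest.trim(), commentRawBody: raw };
}
```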

Authorization

  • Owner-only restriction: Only repository owners can trigger /tinker commands.
  • The service verifies that repository.owner.type === 'User' (organization-owned repos are not supported).
  • The service verifies that comment.user.login === repository.owner.login.
  • Unauthorized attempts are logged to console and result in a GitHub comment explaining the restriction.
  • These checks occur before rate limiting or PR metadata fetching to minimize resource usage.

Reason: This prevents prompt injection attacks from untrusted collaborators and ensures only the repository owner controls AI-driven code changes.
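
The two ownership checks can be expressed as a small pure function over the webhook event. This is a sketch under the assumption that the event carries repository.owner.login, repository.owner.type, and comment.user.login, which are the fields named above. The authorize name and the reason strings are hypothetical.

```typescript
// Only the fields the checks need; the real webhook event has many more.
interface IssueCommentEvent {
  repository: { owner: { login: string; type: string } };
  comment: { user: { login: string } };
}

type AuthResult = { ok: true } | { ok: false; reason: string };

function authorize(event: IssueCommentEvent): AuthResult {
  const owner = event.repository.owner;
  if (owner.type !== "User") {
    return { ok: false, reason: "organization-owned repositories are not supported" };
  }
  if (event.comment.user.login !== owner.login) {
    return { ok: false, reason: "only the repository owner can trigger /tinker" };
  }
  return { ok: true };
}
```

Returning a reason string instead of a bare boolean makes it straightforward to both log the attempt and post the explanatory GitHub comment.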

Prompt contract (Codex)

Every run must include the instruction comment and MUST include the following post-change reporting requirement:

After making changes, write a short rationale explaining:
- What was changed
- Why it was changed
- Any assumptions made

Do not include internal reasoning or deliberation.

The agent response is:

  • Persisted to artifacts
  • Posted back to the origin channel (PR reply or web UI reply)
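
A prompt that satisfies this contract can be assembled by appending the reporting requirement to the instruction. The sketch below is illustrative only: the actual prompt is built in lib/codex.sh and may add more context, and the buildPrompt name is hypothetical.

```typescript
// The reporting requirement, copied from the contract above.
const REPORTING_CONTRACT: string = [
  "After making changes, write a short rationale explaining:",
  "- What was changed",
  "- Why it was changed",
  "- Any assumptions made",
  "",
  "Do not include internal reasoning or deliberation.",
].join("\n");

function buildPrompt(instruction: string): string {
  const body = instruction.trim();
  if (body.length === 0) {
    throw new Error("instruction must not be empty");
  }
  // Instruction first, then a blank line, then the reporting contract.
  return `${body}\n\n${REPORTING_CONTRACT}\n`;
}
```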

Artifacts contract

Each run writes to artifacts/<timestamp>/agent-run/ (or an overridden base). Required files:

agent-run/
├── prompt.txt
├── pr_comment.txt
├── diff.patch
├── files_changed.json
├── commit_sha.txt
└── summary.md

Success expectations:

  • prompt.txt: exact prompt sent to Codex (from lib/codex.sh)
  • pr_comment.txt: raw instruction text (if present in payload)
  • diff.patch: patch for all changes applied (lib/git.sh)
  • files_changed.json: JSON array of changed file paths
  • commit_sha.txt: SHA of pushed commit (empty if no commit)
  • summary.md: agent response/rationale (Codex output)

Failure expectations (explicit)

On ANY failure (including payload parsing, repo sync, Codex error, git push failure, or reply failure):

  • The run directory artifacts/<timestamp>/agent-run/ MUST still be created.
  • summary.md MUST exist and begin with a short, human-readable status section:
    • Status: failed
    • Failed stage: <stage-name> (e.g., payload, sync, codex, commit, push, reply)
    • A concise error message (no stack traces if they might leak sensitive paths/tokens)
  • files_changed.json MUST exist:
    • [] if no changes were applied
  • diff.patch MUST exist:
    • Empty file if no diff exists
  • commit_sha.txt MUST exist:
    • Empty file if no commit was created or push did not succeed
  • prompt.txt and pr_comment.txt MUST be written whenever the corresponding data was available at the time of failure.
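
The failure rules can be summarized as a function from "what was known at the time of failure" to the set of files that must exist. The sketch below is written in TypeScript for illustration, whereas the real write_failure_artifacts lives in the bash modules. The FailureContext shape and the choice to keep only the first line of the error message are assumptions.

```typescript
// Hypothetical description of what was known when the run failed.
interface FailureContext {
  stage: string;            // e.g. "payload", "sync", "codex", "push"
  message: string;
  prompt?: string;          // present only if the prompt was already built
  prComment?: string;       // present only if the payload carried a raw comment
  filesChanged?: string[];
  diff?: string;
  commitSha?: string;       // set only if a commit was pushed
}

// Maps file name -> file content for the agent-run/ directory.
function failureArtifacts(ctx: FailureContext): Record<string, string> {
  // Assumption: keep only the first line so stack traces are not written out.
  const firstLine = ctx.message.split("\n")[0] ?? "";
  const files: Record<string, string> = {
    "summary.md": `Status: failed\nFailed stage: ${ctx.stage}\n\n${firstLine}\n`,
    "files_changed.json": JSON.stringify(ctx.filesChanged ?? []),
    "diff.patch": ctx.diff ?? "",
    "commit_sha.txt": ctx.commitSha ?? "",
  };
  if (ctx.prompt !== undefined) files["prompt.txt"] = ctx.prompt;
  if (ctx.prComment !== undefined) files["pr_comment.txt"] = ctx.prComment;
  return files;
}
```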

Web UI hosting (Traefik-friendly)

The web UI runs on the VPS and is intended to sit behind Traefik (configured elsewhere):

  • Dev: npm run dev -- --host 0.0.0.0 --port 5173
  • Production-like preview: npm run preview -- --host 0.0.0.0 --port 4173

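Binding to 0.0.0.0 makes the dev or preview server listen on all network interfaces rather than only on localhost, which is what allows Traefik to route requests to it.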

HTTP API (Node 24)

The web UI interacts with a lightweight HTTP service running on the VPS (implementation in this repository).

This service is the only interface the web UI needs:

  • POST /api/runs
    • Accepts a JSON payload (single repo per run; see payload schema above)
    • Starts a non-interactive run by spawning bin/gittinkerer run --payload-file <temp>
    • Returns run_id immediately (and optional status metadata)
  • GET /api/runs/:run_id
    • Returns run status and results
    • MUST include enough information for the UI to display the agent response (e.g., summary.md contents)
    • MUST include enough information for operators to locate run artifacts (e.g., the artifact path)
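
From a client's point of view, the two endpoints combine into "submit, then poll until the run leaves pending/running". The sketch below shows only the polling half, with the HTTP call and the sleep injected so it stays independent of any particular fetch implementation. The RunView field names (summary, artifacts_dir) and the waitForRun name are assumptions, not the documented response shape.

```typescript
// Assumed response shape for GET /api/runs/:run_id (illustrative only).
interface RunView {
  run_id: string;
  status: string;
  summary?: string;
  artifacts_dir?: string;
}

type FetchRun = (runId: string) => Promise<RunView>;

interface PollOptions {
  maxAttempts: number;
  sleep: () => Promise<void>; // injected delay between polls
}

// Polls until the run is no longer pending/running or attempts run out.
async function waitForRun(
  fetchRun: FetchRun,
  runId: string,
  opts: PollOptions,
): Promise<RunView> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const run = await fetchRun(runId);
    if (run.status !== "pending" && run.status !== "running") return run;
    await opts.sleep();
  }
  throw new Error(`run ${runId} did not finish after ${opts.maxAttempts} polls`);
}
```

In a browser or Node client, fetchRun would wrap a GET request to /api/runs/:run_id and sleep would wrap a timer; both are left to the caller here.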

Configuration

See .env.example.

  • GITHUB_TOKEN is required on the VPS and should be minted by a GitHub App with least-privilege scopes.
  • The bot commit identity is:
    • nntin-bot
    • 48604375+nntin-bot@users.noreply.github.com
  • The HTTP service now requires a reachable PostgreSQL instance during startup; configure DATABASE_URL or PGHOST/PGUSER/PGPASSWORD/PGDATABASE before launching.
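
The PostgreSQL requirement can be checked early with a small resolver over the environment. This is a sketch with a hypothetical resolveDbConfig function. Giving DATABASE_URL precedence over the PG* variables is an assumption, and the service's own startup check may behave differently.

```typescript
type Env = Record<string, string | undefined>;

type DbConfig =
  | { kind: "url"; connectionString: string }
  | { kind: "params"; host: string; user: string; password: string; database: string };

function resolveDbConfig(env: Env): DbConfig {
  // Assumption: DATABASE_URL wins when both styles are configured.
  const url = env["DATABASE_URL"];
  if (url) return { kind: "url", connectionString: url };

  const host = env["PGHOST"];
  const user = env["PGUSER"];
  const password = env["PGPASSWORD"];
  const database = env["PGDATABASE"];
  if (!host || !user || !password || !database) {
    const missing = ["PGHOST", "PGUSER", "PGPASSWORD", "PGDATABASE"].filter(
      (key) => !env[key],
    );
    throw new Error(`Set DATABASE_URL or the missing variables: ${missing.join(", ")}`);
  }
  return { kind: "params", host, user, password, database };
}
```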

Docker Compose

graph TD
    %% ===== Host Machine =====
    subgraph "Host"
        SSH[~/.ssh<br/>Git Auth]
        Codex["/usr/local/bin/codex<br/>AI CLI"]
        RepoFS["Repo checkout<br/>(./ workspace)"]
    end

    %% ===== Docker Compose Stack =====
    subgraph "Docker Compose"
        subgraph "Services"
            P[postgres:16-alpine<br/>DB: Runs/Metrics<br/>Host:5432]
            R[redis:7-alpine<br/>Cache/Rate Limit<br/>Host:6379]
            S[service Node.js<br/>Webhooks/API/Admin<br/>Host:3000<br/>Mounts: SSH,Codex,Repo]
            W[web SvelteKit<br/>UI<br/>Host:5173]
        end

        %% User access
        W -->|API Calls<br/>VITE_SERVICE_URL| S

        %% Internal dependencies
        S -->|Queries / Inserts| P
        S -->|Cache / Limits| R

        %% Host mounts
        SSH -.->|/root/.ssh:ro| S
        Codex -.->|/usr/local/bin/codex:ro| S
        RepoFS -.->|./:/repo bind| S

        %% Persistence
        P -.->|pgdata| PGVol[pgdata Volume]
    end

    %% ===== Styling =====
    style S fill:#f9f,stroke:#333,stroke-width:3px
    style W fill:#bbf,stroke:#333

Redis 7.0+ is required.

Authentication

graph TD
    %% ===== Host Machine =====
    subgraph "Host"
        SSH[~/.ssh<br/>Git Auth]
        Codex["/usr/local/bin/codex<br/>AI CLI"]
        RepoFS["Repo checkout<br/>(./ workspace)"]
    end

    %% ===== Docker Compose Stack =====
    subgraph "Docker Compose"
        subgraph "Services"
            P[postgres:16-alpine<br/>DB: Runs/Metrics]
            R[redis:7-alpine<br/>Cache/Rate Limit]
            S[service Node.js<br/>API + Webhooks<br/>Internal:3000<br/>Mounts: SSH,Codex,Repo]
            W[web SvelteKit<br/>UI<br/>Internal:5173]
        end

        %% Internal service communication (NO Traefik)
        W -->|API Calls<br/>VITE_SERVICE_URL| S

        %% Internal dependencies
        S -->|Queries / Inserts| P
        S -->|Cache / Limits| R

        %% Host mounts
        SSH -.->|/root/.ssh:ro| S
        Codex -.->|/usr/local/bin/codex:ro| S
        RepoFS -.->|./:/repo bind| S

        %% Persistence
        P -.->|pgdata| PGVol[pgdata Volume]
    end

    %% ===== External Systems =====
    subgraph "External"
        GH[GitHub<br/>Webhooks & API]
        User[User Browser]
        Internet[Public Internet]
    end

    %% ===== Edge Layer =====
    subgraph "Ingress"
        T[Traefik<br/>Reverse Proxy<br/>TLS Termination]
        KC[Keycloak<br/>OIDC]
    end

    %% ===== Ingress Traffic =====

    %% GitHub webhooks (no auth, signed payload)
    GH -.->|Signed Webhook Events| Internet
    Internet -->|HTTPS<br/>/api/github/webhook| T
    T -->|Forward :3000| S

    %% User access to UI (Keycloak-protected)
    User -.->|HTTPS / WSS| Internet
    Internet -->|HTTPS :5173| T
    T -->|OIDC Auth| KC
    KC -->|Token| T
    T -->|Forward :5173<br/>with Keycloak Auth| W

    %% ===== Styling =====
    style T fill:#ffd,stroke:#333,stroke-width:3px
    style KC fill:#dfd,stroke:#333
    style S fill:#f9f,stroke:#333,stroke-width:3px
    style W fill:#bbf,stroke:#333
    style Internet fill:#eee,stroke:#666,stroke-dasharray: 5 5

Data aggregation

flowchart LR
    subgraph Web
      UI["Web UI<br/>(/, /runs, /runs/:id, /admin, /analytics)"]
    end
    subgraph Service
      HTTP[HTTP API<br/>Node 24 service]
      Metrics[Metrics & Analytics]
      Sentry[Sentry SDK]
    end
    subgraph Data
      PG[(PostgreSQL runs,<br/>run_metrics,<br/>paused_repos)]
      Redis[(Redis rate<br/>limits & cache)]
    end

    UI -->|REST| HTTP
    HTTP -->|read/write| PG
    HTTP -->|cache rate limits<br/>run status| Redis
    Metrics -->|aggregations| PG
    HTTP -->|errors| Sentry