Architecture Documentation

This document describes the internal architecture, state management, error handling, and technical design decisions of Immich Stack.

System Overview

Immich Stack is a stateless CLI application that synchronizes stacks in the Immich photo management system with groupings computed from configurable criteria, using Immich's REST API.

┌──────────────┐      ┌───────────────┐      ┌──────────────┐
│   CLI Tool   │ ───> │ Stacker Logic │ ───> │  Immich API  │
│  (Commands)  │      │  (Grouping)   │      │   (Stacks)   │
└──────────────┘      └───────────────┘      └──────────────┘
       │                      │                      │
       └──────────────────────┴──────────────────────┘
                        Configuration
                    (Criteria, Flags, Env)

Core Components

  1. Command Layer (cmd/): CLI interface and command orchestration
  2. Stacker Logic (pkg/stacker/): Grouping algorithm and parent selection
  3. API Client (pkg/immich/): HTTP client with retry logic and error handling
  4. Utilities (pkg/utils/): Shared types, logging, and helpers

State Management

Stateless Design Philosophy

Immich Stack is intentionally stateless between runs:

  • No persistent database or state files
  • Each run fetches fresh data from Immich API
  • Computed groupings are derived from criteria on each execution
  • No memory of previous runs or decisions

Why Stateless?

Advantages:

  • Resilient to Immich API changes (always uses current state)
  • Self-healing from transient errors (retry on next run)
  • Consistent with manually created stacks (no drift from external state)
  • No risk of state corruption or inconsistency
  • Simpler to reason about and debug

Trade-offs:

  • Must re-fetch all data on each run
  • Cannot track incremental progress within a run
  • No built-in idempotency tracking (relies on API state comparison)

State Lifecycle Per Run

Each execution follows this lifecycle:

1. Initialize
   ├─ Load configuration (env vars, CLI flags)
   ├─ Create logger
   └─ Create API client

2. Fetch Current State
   ├─ GET /stacks (all existing stacks)
   │  └─ Build stacksMap (asset ID → stack)
   ├─ GET /assets (all assets to process)
   │  └─ Enrich with stack information
   └─ GET /me (current user information)

3. Compute Desired State
   ├─ Apply grouping criteria to assets
   ├─ Form groups (potential stacks)
   └─ Determine parent for each group

4. Compare States
   ├─ Identify new stacks to create
   ├─ Identify stacks to delete
   └─ Identify stacks to update/replace

5. Apply Changes
   ├─ DELETE /stacks/{id} (remove old stacks)
   ├─ PUT /stacks (create/update stacks)
   └─ Log all actions

6. Cleanup
   └─ Exit (no state persisted)
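
This lifecycle can be summarized as a single run function. A minimal sketch, assuming hypothetical helpers fetchState, computeGroups, diffStates, and applyChanges (none of these names come from the actual codebase):

// runOnce performs one stateless pass: fetch, compute, compare, apply.
// All helper names are hypothetical illustrations of the lifecycle above.
func runOnce(client *immich.Client, criteria []Criterion) error {
    current, err := fetchState(client) // GET /stacks, /assets, /me
    if err != nil {
        return err
    }
    desired := computeGroups(current.Assets, criteria) // apply grouping criteria
    plan := diffStates(current.Stacks, desired)        // creates, deletes, updates
    return applyChanges(client, plan)                  // PUT/DELETE, or log in dry-run
}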

Stack State Representation

Current State (from Immich):

type TStack struct {
    ID             string
    PrimaryAssetID string
    Assets         []TAsset
}

Desired State (computed):

type Group struct {
    Key    string
    Assets []TAsset  // First asset is desired parent
}

Stack Comparison Logic

Determines whether an existing stack already matches the desired group:

func needsUpdate(existing TStack, desired Group) bool {
    // Different parent?
    if existing.PrimaryAssetID != desired.Assets[0].ID {
        return true
    }

    // Different asset membership?
    if !sameAssets(existing.Assets, desired.Assets) {
        return true
    }

    return false  // Stack is already correct
}
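
The sameAssets helper is elided above. A minimal order-insensitive membership check might look like this (a sketch; the real helper may differ):

// sameAssets reports whether two slices contain exactly the same
// asset IDs, ignoring order.
func sameAssets(a, b []TAsset) bool {
    if len(a) != len(b) {
        return false
    }
    ids := make(map[string]bool, len(a))
    for _, asset := range a {
        ids[asset.ID] = true
    }
    for _, asset := range b {
        if !ids[asset.ID] {
            return false
        }
    }
    return true
}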

Dry-Run Verification

How Dry-Run Works

Dry-run mode (DRY_RUN=true) simulates all operations without making API changes:

func (c *Client) ModifyStack(assetIDs []string) error {
    if c.dryRun {
        c.logger.Info("[DRY RUN] Would create stack")
        return nil  // No-op, just log
    }

    // Real API call; the payload is built from the asset IDs
    // (field name shown for illustration)
    payload := map[string]interface{}{"assetIds": assetIDs}
    return c.doRequest(http.MethodPut, "/stacks", payload, nil)
}

Dry-Run Guarantees

  1. No API Writes: Only GET requests executed, no PUT/POST/DELETE
  2. Full Simulation: All grouping and comparison logic runs normally
  3. Accurate Logging: Shows exactly what would happen in real run
  4. Safe Testing: Can test dangerous operations (RESET_STACKS, REPLACE_STACKS)

Dry-Run Workflow

User Request
    │
    ├─ DRY_RUN=true
    │   ├─ Fetch current state (READ)
    │   ├─ Compute desired state
    │   ├─ Compare states
    │   ├─ Log all planned actions
    │   └─ Exit (no writes)
    │
    └─ DRY_RUN=false
        ├─ Fetch current state (READ)
        ├─ Compute desired state
        ├─ Compare states
        ├─ Execute actions (WRITE)
        └─ Exit

Verifying Dry-Run Output

Look for these log patterns:

[DRY RUN] Would create stack with 3 assets
[DRY RUN] Would delete stack abc-123-def
[DRY RUN] Would replace stack xyz-456-uvw

Real runs show:

✅ Success! Stack created
🗑️  Deleted stack abc-123-def - replacing child stack with new one
🔄 Updated stack xyz-456-uvw

Error Recovery Mechanisms

Error Classification

Errors are classified into three categories:

  1. Transient Errors (retry automatically):
     • Network failures (connection timeout, DNS resolution)
     • Server errors (5xx responses)
     • Rate limiting (429 responses)

  2. Permanent Errors (fail immediately):
     • Authentication failures (401, 403)
     • Invalid request format (400)
     • Resource not found (404)

  3. Application Errors (log and continue):
     • Invalid asset data
     • Criteria parsing errors
     • Individual stack operation failures
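
The retry logic later in this section assumes errors carry their HTTP status code. A minimal sketch of such a typed error (the name APIError is an assumption, not necessarily the actual type):

// APIError wraps a failed HTTP response so callers can classify it.
// Hypothetical type used by the examples in this section.
type APIError struct {
    StatusCode int
    Body       string
}

func (e *APIError) Error() string {
    return fmt.Sprintf("API error: status %d: %s", e.StatusCode, e.Body)
}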

Error Handling Strategy

┌────────────────┐
│  API Request   │
└───────┬────────┘
        │
        ├─ Success (2xx)
        │  └─> Return data
        │
        ├─ Transient Error (5xx, timeout, 429)
        │  ├─> Retry with exponential backoff
        │  └─> Max 3 retries, then fail
        │
        ├─ Permanent Error (4xx except 429)
        │  └─> Fail immediately, log error
        │
        └─ Application Error
           └─> Log error, continue processing

Graceful Degradation

When errors occur during processing:

  1. Individual Asset Failure: Skip asset, continue with others
  2. Stack Operation Failure: Log error, continue with remaining stacks
  3. API Client Failure: Retry automatically, then fail entire run
  4. Criteria Parsing Failure: Fail fast (cannot proceed without valid criteria)
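
The skip-and-continue pattern for per-stack failures might look like this (a sketch; applyGroup is a hypothetical helper):

// Log and skip failed groups so one bad stack does not abort the run.
failed := 0
for _, group := range groups {
    if err := applyGroup(client, group); err != nil {
        logger.Errorf("Stack operation failed for group %s: %v", group.Key, err)
        failed++
        continue // graceful degradation: keep processing the rest
    }
}
logger.Infof("Run complete: %d group(s) failed", failed)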

Recovery Actions

For Transient Errors:

  • Automatic retry with exponential backoff (500ms, 1s, 2s)
  • Log retry attempts for debugging
  • Fail entire operation after max retries

For Permanent Errors:

  • Log detailed error message with context
  • Provide actionable remediation steps
  • Exit with non-zero status code

For Application Errors:

  • Log error with asset/stack context
  • Continue processing remaining items
  • Report summary at end of run

API Retry Logic and Backoff Strategy

Retry Configuration

const (
    maxRetries  = 3
    baseDelay   = 500 * time.Millisecond
)

Exponential Backoff

Each failed attempt waits exponentially longer before the next retry:

Retry 1: wait 500ms (baseDelay × 2^0)
Retry 2: wait 1s    (baseDelay × 2^1)
Retry 3: wait 2s    (baseDelay × 2^2)
Fail:    no more retries

Retry Implementation

func (c *Client) doRequest(method, path string, body, response interface{}) error {
    var err error

    // One initial attempt plus up to maxRetries retries
    for attempt := 0; attempt <= maxRetries; attempt++ {
        err = c.makeRequest(method, path, body, response)

        if err == nil {
            return nil // Success
        }

        if !isRetriable(err) {
            return err // Permanent error, don't retry
        }

        if attempt < maxRetries {
            delay := baseDelay * time.Duration(1<<attempt)
            c.logger.Warnf("Retry %d/%d after %v", attempt+1, maxRetries, delay)
            time.Sleep(delay)
        }
    }

    return fmt.Errorf("max retries exceeded: %w", err)
}

Retriable Conditions

func isRetriable(err error) bool {
    // Network errors (timeouts, DNS failures)
    if isNetworkError(err) {
        return true
    }

    // HTTP status codes, extracted via the typed APIError shown earlier
    var apiErr *APIError
    if errors.As(err, &apiErr) {
        if apiErr.StatusCode == 429 { // Rate limited
            return true
        }
        if apiErr.StatusCode >= 500 && apiErr.StatusCode < 600 { // Server errors
            return true
        }
    }

    return false // Other client errors (4xx) are not retriable
}

Backoff Jitter

To prevent a thundering herd, random jitter can be added:

delay := baseDelay * time.Duration(1<<attempt)
jitter := time.Duration(rand.Int63n(int64(delay / 2)))
time.Sleep(delay + jitter)

Rate Limiting Handling

When receiving 429 (Too Many Requests):

  1. Check Retry-After header if present
  2. Use exponential backoff if header absent
  3. Log rate limit event for monitoring
  4. Respect server's requested delay
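
A sketch of honoring Retry-After, assuming the HTTP response is available to the retry loop (retryDelay is a hypothetical helper):

// retryDelay prefers the server's Retry-After header (seconds form),
// falling back to exponential backoff when it is absent or malformed.
func retryDelay(resp *http.Response, attempt int) time.Duration {
    if resp != nil {
        if ra := resp.Header.Get("Retry-After"); ra != "" {
            if secs, err := strconv.Atoi(ra); err == nil {
                return time.Duration(secs) * time.Second
            }
        }
    }
    return baseDelay * time.Duration(1<<attempt)
}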

Concurrency Handling

Multi-User Operations

When processing multiple API keys:

API_KEY=user1_key,user2_key,user3_key

Processing is sequential, not concurrent:

apiKeys := strings.Split(os.Getenv("API_KEY"), ",")

for _, key := range apiKeys {
    client := immich.NewClient(apiURL, key, ...)

    user, err := client.GetCurrentUser()
    if err != nil {
        logger.Errorf("Failed for key: %v", err)
        continue  // Skip this user, continue with others
    }

    logger.Infof("Processing user: %s", user.Name)

    // Process stacks for this user
    if err := processStacks(client); err != nil {
        logger.Errorf("Error for user %s: %v", user.Name, err)
        continue
    }
}

Why Sequential Processing?

Design Choice: Sequential processing per user to:

  1. Avoid API Rate Limits: Concurrent requests could exceed limits
  2. Maintain Clear Logs: User-by-user logging is easier to follow
  3. Prevent Resource Contention: Single HTTP client per user
  4. Ensure Isolation: Errors in one user don't affect others

Within-User Parallelism

Within a single user's processing, operations are sequential:

Fetch Stacks → Fetch Assets → Group Assets → Apply Changes
    ↓             ↓               ↓              ↓
  Serial        Serial          Serial         Serial

Rationale:

  • Stacks depend on assets (must fetch stacks first)
  • Grouping requires all assets (can't parallelize)
  • Stack operations have dependencies (delete before create)

Thread Safety

The API client is not safe to share across goroutines, since its API key is mutable:

// Safe: New client per user
for _, key := range apiKeys {
    client := immich.NewClient(...)  // Fresh instance
    // Use client for this user only
}

// Unsafe: Sharing client across goroutines
client := immich.NewClient(...)
for _, key := range apiKeys {
    go func() {
        // DON'T DO THIS - not thread-safe
        client.SetAPIKey(key)
    }()
}

Signal Handling

Graceful shutdown for cron mode:

var shutdown atomic.Bool // requires "sync/atomic" (Go 1.19+)

sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

go func() {
    <-sigChan
    logger.Info("Received shutdown signal")
    shutdown.Store(true) // Set atomic flag
}()

for !shutdown.Load() {
    runStacker()
    time.Sleep(cronInterval)
}

API Client Architecture

HTTP Client Configuration

client := &http.Client{
    Timeout: 600 * time.Second,  // 10 minutes
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 100,
        IdleConnTimeout:     90 * time.Second,
    },
}

Request/Response Flow

1. Build Request
   ├─ Set method (GET, POST, PUT, DELETE)
   ├─ Build URL (baseURL + path)
   ├─ Marshal JSON body (if present)
   ├─ Set headers (Content-Type, x-api-key)
   └─ Create http.Request

2. Execute Request (with retries)
   ├─ Attempt 1: Send request
   │  ├─ Success? Return response
   │  └─ Retriable error? Continue
   ├─ Wait with exponential backoff
   ├─ Attempt 2: Send request
   │  └─ ...
   └─ Attempt 3: Send request
      └─ Fail if still erroring

3. Handle Response
   ├─ Check status code
   ├─ Read response body
   ├─ Unmarshal JSON (if expected)
   └─ Return data or error

Connection Pooling

Benefits of connection pooling:

  • Reduced Latency: Reuse existing TCP connections
  • Lower Overhead: Avoid handshake for each request
  • Better Performance: Especially for many small requests

Configuration:

MaxIdleConns: 100          // Total idle connections across all hosts
MaxIdleConnsPerHost: 100   // Idle connections per host
IdleConnTimeout: 90s       // Close idle connections after 90s

Grouping Algorithm

High-Level Flow

Assets (unsorted) → Group By Criteria → Sort Within Groups → Stacks

Grouping Process

  1. Initialize empty groups:

groups := make(map[string][]TAsset)

  2. Iterate over all assets:

for _, asset := range assets {
    key := computeGroupKey(asset, criteria)
    groups[key] = append(groups[key], asset)
}

  3. Compute the group key:

func computeGroupKey(asset TAsset, criteria []Criterion) string {
    keys := []string{}
    for _, crit := range criteria {
        switch crit.Key {
        case "originalFileName":
            keys = append(keys, extractFilename(asset, crit))
        case "localDateTime":
            keys = append(keys, formatTime(asset, crit))
        // ... other criteria
        }
    }
    return strings.Join(keys, "|")
}

Parent Selection Within Group

  1. Sort the group by promotion rules:

sort.Slice(group, func(i, j int) bool {
    return compareByPromotionRules(group[i], group[j])
})

  2. Apply promotion rules in order of precedence:

1. PARENT_FILENAME_PROMOTE list order (left to right)
2. PARENT_EXT_PROMOTE list order (left to right)
3. Built-in extension rank (.jpeg > .jpg > .png > others; see the sketch after this list)
4. Alphabetical order (case-insensitive)
5. Local date/time (earliest first)
6. Asset ID (lexicographic)

  3. The first asset becomes the parent:

parent := group[0]
children := group[1:]
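
A sketch of the built-in extension rank used in rule 3 (the rank values are illustrative; the full comparator chains all six rules in order):

// extRank gives lower values to preferred extensions, mirroring
// the built-in order .jpeg > .jpg > .png > others.
func extRank(filename string) int {
    switch strings.ToLower(filepath.Ext(filename)) {
    case ".jpeg":
        return 0
    case ".jpg":
        return 1
    case ".png":
        return 2
    default:
        return 3
    }
}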

Performance Characteristics

Time Complexity

  • Fetching Assets: O(n) where n = total assets
  • Grouping: O(n × m) where m = number of criteria
  • Sorting Groups: O(k × g log g) where k = number of groups, g = avg group size
  • Creating Stacks: O(k) API calls
  • Overall: O(n × m + k × g log g)

Space Complexity

  • Assets: O(n) - all assets stored in memory
  • Groups: O(n) - assets distributed across groups
  • Stacks Map: O(s) where s = number of existing stacks
  • Overall: O(n)

Bottlenecks

  1. Network I/O: Fetching large asset lists from API
  2. Regex Evaluation: Complex patterns on every asset
  3. JSON Marshaling: Large payloads for stack operations
  4. Memory: Large libraries (100k+ assets) can consume 1-2GB

Optimization Strategies

  • Use simple criteria (Legacy mode) for large libraries
  • Increase time deltas to reduce group count
  • Optimize regex patterns (anchors, no wildcards)
  • Filter assets with WITH_ARCHIVED/WITH_DELETED
  • Process in batches for very large libraries

Logging Architecture

Log Levels

trace   // Very detailed (e.g., HTTP request/response bodies)
debug   // Detailed (e.g., parent selection decisions)
info    // Standard (e.g., stack created, assets processed)
warn    // Warnings (e.g., retries, unexpected conditions)
error   // Errors (e.g., API failures, invalid config)
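
A minimal sketch of mapping a LOG_LEVEL environment variable onto these levels using logrus (the variable name is an assumption):

// Parse LOG_LEVEL ("trace", "debug", "info", ...) into a logrus level,
// defaulting to info when unset or invalid.
level, err := logrus.ParseLevel(os.Getenv("LOG_LEVEL"))
if err != nil {
    level = logrus.InfoLevel
}
logger.SetLevel(level)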

Structured Logging

Using logrus for structured logs:

logger.WithFields(logrus.Fields{
    "assetID": asset.ID,
    "filename": asset.OriginalFileName,
    "stackID": stack.ID,
}).Info("Stack created")

Log Formats

Text Format (human-readable):

level=info msg="Stack created" assetID=abc-123 filename=IMG_1234.jpg

JSON Format (machine-parseable):

{
  "level": "info",
  "msg": "Stack created",
  "assetID": "abc-123",
  "filename": "IMG_1234.jpg",
  "time": "2025-11-12T10:30:00Z"
}

Dual Logging

When LOG_FILE is set:

if logFile != "" {
    file, err := os.OpenFile(logFile, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
    if err == nil {
        logger.SetOutput(io.MultiWriter(os.Stdout, file))
    } else {
        // Fallback to stdout only
        logger.Warn("Could not open log file, using stdout only")
    }
}

Testing Architecture

Test Structure

pkg/
├─ stacker/
│  ├─ stacker.go          # Implementation
│  ├─ stacker_test.go     # Unit tests
│  └─ stacker_integration_test.go  # Integration tests
│
└─ immich/
   ├─ client.go           # API client
   └─ client_test.go      # Mock API tests

Test Categories

  1. Unit Tests: Test individual functions in isolation
  2. Integration Tests: Test component interactions
  3. Mock Tests: Test API client with mock HTTP server

Testing Best Practices

  • Use table-driven tests for multiple scenarios
  • Mock external dependencies (API, filesystem)
  • Test edge cases (empty groups, single-asset stacks)
  • Verify error handling paths
  • Check log output for correct messages
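
A minimal table-driven test sketch for the needsUpdate function shown earlier:

func TestNeedsUpdate(t *testing.T) {
    a := TAsset{ID: "a"}
    b := TAsset{ID: "b"}

    tests := []struct {
        name     string
        existing TStack
        desired  Group
        want     bool
    }{
        {"already correct", TStack{PrimaryAssetID: "a", Assets: []TAsset{a, b}}, Group{Assets: []TAsset{a, b}}, false},
        {"wrong parent", TStack{PrimaryAssetID: "b", Assets: []TAsset{a, b}}, Group{Assets: []TAsset{a, b}}, true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            if got := needsUpdate(tt.existing, tt.desired); got != tt.want {
                t.Errorf("needsUpdate() = %v, want %v", got, tt.want)
            }
        })
    }
}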

Security Considerations

API Key Handling

  • Never log API keys (sanitize in logs)
  • Store keys in environment variables, not files
  • Support multiple keys for multi-user scenarios
  • Validate key format before use
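
Key sanitization before logging might look like this (a sketch; redactKey is a hypothetical helper):

// redactKey masks all but the last four characters of an API key
// so it can appear safely in log output.
func redactKey(key string) string {
    if len(key) <= 4 {
        return "****"
    }
    return strings.Repeat("*", len(key)-4) + key[len(key)-4:]
}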

Input Validation

  • Validate all user inputs (criteria, env vars)
  • Sanitize regex patterns to prevent ReDoS
  • No database is used (stateless design), so there is no SQL injection surface
  • Validate file paths for log files

Network Security

  • Use HTTPS for API calls (validate TLS)
  • Set reasonable timeouts to prevent DoS
  • Respect server-imposed rate limits (429 handling)
  • Handle redirects securely

Future Architecture Considerations

Potential Improvements

  1. Incremental Processing: Track processed assets to skip on subsequent runs
  2. Parallel API Calls: Concurrent fetching/updating with proper throttling
  3. Persistent Cache: Cache asset metadata to reduce API calls
  4. Batch Optimization: Group stack operations into larger batches
  5. Streaming Processing: Process assets in streaming fashion for very large libraries

Scalability Limits

Current architecture scales to:

  • Assets: ~200k (limited by memory)
  • Stacks: ~50k (limited by API response size)
  • Users: Unlimited (sequential processing)
  • API Calls: Respects rate limits with exponential backoff

Extension Points

Areas designed for extension:

  • New Criteria Types: Add to criteria.go
  • Custom Comparison Logic: Extend grouping algorithm
  • Additional Commands: Add to cmd/ directory
  • Alternative APIs: Implement new client interface