
SwarmCracker TODO Implementation Plan

Status: ALL PHASES COMPLETE ✅


9 TODO items found across production code and tests. This plan prioritizes production code by complexity.


Phase 1: Quick Wins (LOW Complexity)

1.1 Fix HT/SMT Extraction

File: pkg/swarmkit/vmm.go:296
Issue: HtEnabled is hardcoded to false; it should be extracted from the machine config.

Current:

HtEnabled:  false, // TODO: Extract from machine config

Fix:

HtEnabled:  toBool(machineConfig["smt"]), // Extract from machine-config

Add helper:

// toBool reports whether v holds the boolean value true.
// Any non-bool value, including nil, is treated as false.
func toBool(v interface{}) bool {
    b, ok := v.(bool)
    return ok && b
}

Effort: 10 minutes
Impact: Correctness - enables SMT when configured
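To make the behavior of the helper concrete, here is a minimal sketch that assumes machineConfig is a generic map decoded from a JSON machine-config document. The function name smtFromMachineConfig and the sample documents are illustrative, not taken from the codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toBool mirrors the helper above: only a real bool true yields true.
func toBool(v interface{}) bool {
	b, ok := v.(bool)
	return ok && b
}

// smtFromMachineConfig (hypothetical name) decodes a machine-config JSON
// document into a generic map and reads its "smt" key.
func smtFromMachineConfig(raw string) (bool, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return false, err
	}
	return toBool(cfg["smt"]), nil
}

func main() {
	docs := []string{
		`{"vcpu_count": 2, "mem_size_mib": 512, "smt": true}`, // explicit true
		`{"vcpu_count": 2, "mem_size_mib": 512}`,              // key missing
		`{"smt": "true"}`,                                     // string, not bool
	}
	for _, raw := range docs {
		smt, err := smtFromMachineConfig(raw)
		fmt.Println(smt, err)
	}
}
```

Only the first document enables SMT. A missing key and the string "true" both fall back to false, so a config source that delivers strings would need an explicit conversion before calling the helper.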


1.2 YAML Config Loading

File: cmd/swarmcracker-agent/main.go:75
Issue: Hardcoded default config instead of YAML loading

Current:

// TODO: Implement YAML config loading
return &swarmkit.Config{...}

Fix:

import (
    "fmt"
    "os"

    "gopkg.in/yaml.v3"
)

func loadConfig(path string) (*swarmkit.Config, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, fmt.Errorf("failed to read config: %w", err)
    }

    config := &swarmkit.Config{}
    if err := yaml.Unmarshal(data, config); err != nil {
        return nil, fmt.Errorf("failed to parse config: %w", err)
    }

    // Set defaults for missing fields
    setConfigDefaults(config)

    return config, nil
}

func setConfigDefaults(c *swarmkit.Config) {
    if c.FirecrackerPath == "" {
        c.FirecrackerPath = "firecracker"
    }
    if c.KernelPath == "" {
        c.KernelPath = "/usr/share/firecracker/vmlinux"
    }
    // ... other defaults
}

Effort: 30 minutes
Impact: Essential - enables proper configuration


Phase 2: Medium Complexity

2.1 IO Device Discovery for Cgroups

File: pkg/jailer/cgroup.go:305
Issue: IO bandwidth limits not applied (need device major:minor)

Approach:
1. Discover block devices from rootfs path
2. Get major:minor numbers via /sys/dev/block/
3. Write to io.max file

Implementation:

func (m *CgroupManager) setIODeviceLimits(cgroupPath string, limits ResourceLimits) error {
    if limits.IOReadBPS == 0 && limits.IOWriteBPS == 0 {
        return nil
    }

    // Discover the block device backing the rootfs
    devices, err := m.discoverBlockDevices()
    if err != nil {
        return fmt.Errorf("failed to discover block devices: %w", err)
    }

    // A zero field means "no limit", which cgroup v2 spells "max";
    // a literal 0 would not mean unlimited.
    rbps, wbps := "max", "max"
    if limits.IOReadBPS > 0 {
        rbps = fmt.Sprint(limits.IOReadBPS)
    }
    if limits.IOWriteBPS > 0 {
        wbps = fmt.Sprint(limits.IOWriteBPS)
    }

    ioMaxPath := filepath.Join(cgroupPath, "io.max")
    for _, dev := range devices {
        // One write per device: "MAJ:MIN rbps=... wbps=..."
        line := fmt.Sprintf("%d:%d rbps=%s wbps=%s", dev.Major, dev.Minor, rbps, wbps)
        if err := os.WriteFile(ioMaxPath, []byte(line), 0644); err != nil {
            return fmt.Errorf("failed to write io.max for %d:%d: %w", dev.Major, dev.Minor, err)
        }
    }
    return nil
}

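func (m *CgroupManager) discoverBlockDevices() ([]BlockDevice, error) {
    // Parse /proc/mounts or use stat on rootfs to get device
    // Extract major:minor from /sys/dev/block/
    return nil, fmt.Errorf("discoverBlockDevices: not implemented") // placeholder so the outline compiles
}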

Effort: 1-2 hours
Impact: Correctness - enables IO throttling
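The two mechanical pieces of this task, splitting a device number into major:minor and rendering an io.max entry, can be sketched in isolation. This assumes the device number comes from a stat call on the rootfs (for example the Dev field of syscall.Stat_t on Linux). The BlockDevice shape and the names decodeDev and ioMaxLine are hypothetical stand-ins, not the project's actual definitions.

```go
package main

import (
	"fmt"
	"strconv"
)

// BlockDevice is a hypothetical stand-in for the plan's BlockDevice type.
type BlockDevice struct {
	Major, Minor uint32
}

// decodeDev splits a Linux dev_t into major and minor numbers using the
// same bit layout as glibc's major()/minor() macros.
func decodeDev(dev uint64) BlockDevice {
	major := uint32((dev&0x00000000000fff00)>>8) | uint32((dev&0xfffff00000000000)>>32)
	minor := uint32(dev&0x00000000000000ff) | uint32((dev&0x00000ffffff00000)>>12)
	return BlockDevice{Major: major, Minor: minor}
}

// ioMaxLine renders one io.max entry. A zero limit is written as "max",
// the cgroup v2 keyword for "no limit".
func ioMaxLine(dev BlockDevice, rbps, wbps uint64) string {
	val := func(v uint64) string {
		if v == 0 {
			return "max"
		}
		return strconv.FormatUint(v, 10)
	}
	return fmt.Sprintf("%d:%d rbps=%s wbps=%s", dev.Major, dev.Minor, val(rbps), val(wbps))
}

func main() {
	dev := decodeDev(0x800) // 8:0, conventionally the whole disk /dev/sda
	// Limit reads to 10 MiB/s and leave writes unlimited.
	fmt.Println(ioMaxLine(dev, 10<<20, 0)) // prints "8:0 rbps=10485760 wbps=max"
}
```

One caveat to verify on the target kernel: a stat on a file that lives on a partition returns the partition's device number, and io.max may only accept the whole-disk device. In that case the partition would need to be mapped to its parent disk, for example through /sys/dev/block/.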


2.2 Network Key Management

File: pkg/swarmkit/executor.go:332
Issue: SetNetworkBootstrapKeys does nothing

Approach:
1. Store keys in secure memory/disk
2. Use keys for VXLAN encryption
3. Integrate with VXLANManager

Implementation:

func (e *Executor) SetNetworkBootstrapKeys(keys []*api.EncryptionKey) error {
    if len(keys) == 0 {
        return nil
    }

    // Store keys securely
    e.networkKeys = keys

    // Pass to network manager if available
    if e.networkMgr != nil {
        return e.networkMgr.SetEncryptionKeys(keys)
    }

    return nil
}

// In NetworkManager:
func (nm *NetworkManager) SetEncryptionKeys(keys []*api.EncryptionKey) error {
    // Configure VXLAN with encryption keys
    // Keys used for secure peer communication
    return fmt.Errorf("SetEncryptionKeys: not implemented") // placeholder so the outline compiles
}

Effort: 2-3 hours
Impact: Security - enables encrypted overlay network
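Assigning the incoming slice directly to a field shares memory with the caller and is not safe under concurrent access. One possible way to hold the keys is sketched below, under the assumption that callers may reuse or mutate the slice they pass in. EncryptionKey here is a local, simplified stand-in for api.EncryptionKey, and keyStore is a hypothetical type. The sketch covers only aliasing and locking, not encrypted or on-disk storage.

```go
package main

import (
	"fmt"
	"sync"
)

// EncryptionKey is a simplified, hypothetical stand-in for api.EncryptionKey.
type EncryptionKey struct {
	Subsystem   string
	Key         []byte
	LamportTime uint64
}

// keyStore (hypothetical) guards the current key set with a mutex.
type keyStore struct {
	mu   sync.RWMutex
	keys []EncryptionKey
}

// Set replaces the stored keys with deep copies so later changes to the
// caller's slice or key bytes do not affect the store. Nil entries are skipped.
func (s *keyStore) Set(keys []*EncryptionKey) {
	copied := make([]EncryptionKey, 0, len(keys))
	for _, k := range keys {
		if k == nil {
			continue
		}
		c := *k
		c.Key = append([]byte(nil), k.Key...)
		copied = append(copied, c)
	}
	s.mu.Lock()
	s.keys = copied
	s.mu.Unlock()
}

// Keys returns deep copies of the stored keys.
func (s *keyStore) Keys() []EncryptionKey {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]EncryptionKey, len(s.keys))
	for i, k := range s.keys {
		out[i] = k
		out[i].Key = append([]byte(nil), k.Key...)
	}
	return out
}

func main() {
	store := &keyStore{}
	k := &EncryptionKey{Subsystem: "example", Key: []byte{1, 2, 3}, LamportTime: 7}
	store.Set([]*EncryptionKey{k})

	k.Key[0] = 9 // the caller mutates its own copy afterwards
	fmt.Println(store.Keys()[0].Key) // prints "[1 2 3]"
}
```

In the executor, SetNetworkBootstrapKeys could call such a store before forwarding the keys to the network manager.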


2.3 SwarmKit Node Discovery

File: pkg/network/manager.go:703
Issue: Peer discovery hardcoded/empty

Approach:
1. Query SwarmKit for active nodes
2. Filter by VXLAN-enabled nodes
3. Return peer IPs

Implementation:

func (nm *NetworkManager) discoverPeerWorkers() []string {
    if nm.nodeDiscovery == nil {
        return []string{}
    }

    nodes, err := nm.nodeDiscovery.GetNodes()
    if err != nil {
        nm.logger.Warn().Err(err).Msg("Failed to discover nodes")
        return []string{}
    }

    peers := []string{}
    for _, node := range nodes {
        if node.Status == api.NodeStatus_READY {
            // Extract VXLAN IP from node
            peers = append(peers, node.VXLANIP)
        }
    }
    return peers
}

// Add interface:
type NodeDiscovery interface {
    GetNodes() ([]NodeInfo, error)
}

Effort: 2-3 hours
Impact: Integration - automatic peer discovery
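Because discovery sits behind an interface, the filtering logic can be exercised without a running SwarmKit manager. The sketch below uses a fixed in-memory implementation. The NodeInfo fields, staticDiscovery, and peerIPs are hypothetical names, and skipping nodes with an empty address is an extra guard that the plan's code does not include.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeInfo is a hypothetical shape for the plan's NodeInfo type.
type NodeInfo struct {
	ID      string
	Ready   bool // stands in for node.Status == api.NodeStatus_READY
	VXLANIP string
}

// NodeDiscovery matches the interface proposed in the plan.
type NodeDiscovery interface {
	GetNodes() ([]NodeInfo, error)
}

// staticDiscovery is a test double that returns a fixed node list or error.
type staticDiscovery struct {
	nodes []NodeInfo
	err   error
}

func (s staticDiscovery) GetNodes() ([]NodeInfo, error) { return s.nodes, s.err }

var errUnreachable = errors.New("manager unreachable")

// peerIPs returns the VXLAN IPs of ready nodes. It returns an empty slice
// when discovery is unset or fails, and skips nodes without an address.
func peerIPs(d NodeDiscovery) []string {
	peers := []string{}
	if d == nil {
		return peers
	}
	nodes, err := d.GetNodes()
	if err != nil {
		return peers
	}
	for _, n := range nodes {
		if n.Ready && n.VXLANIP != "" {
			peers = append(peers, n.VXLANIP)
		}
	}
	return peers
}

func main() {
	d := staticDiscovery{nodes: []NodeInfo{
		{ID: "w1", Ready: true, VXLANIP: "10.0.0.2"},
		{ID: "w2", Ready: false, VXLANIP: "10.0.0.3"}, // not ready: filtered out
		{ID: "w3", Ready: true},                       // no address: filtered out
	}}
	fmt.Println(peerIPs(d))                                    // prints "[10.0.0.2]"
	fmt.Println(peerIPs(staticDiscovery{err: errUnreachable})) // prints "[]"
}
```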


Phase 3: High Complexity (Full Features)

3.1 Local Rootfs Preparation

File: cmd/swarmcracker/main.go:690
Issue: Rootfs not prepared locally before deploy

Full Workflow:
1. Pull OCI image (containerd/docker)
2. Extract filesystem to temp dir
3. Create ext4 rootfs image
4. Upload to remote hosts
5. Clean up temp files

Implementation:

func prepareLocalRootfs(imageRef string, outputDir string) (string, error) {
    // 1. Pull image using containerd or skopeo
    imageDir := filepath.Join(outputDir, "image-extract")
    if err := pullAndExtractImage(imageRef, imageDir); err != nil {
        return "", err
    }

    // 2. Create ext4 rootfs
    rootfsPath := filepath.Join(outputDir, "rootfs.ext4")
    if err := createExt4Rootfs(imageDir, rootfsPath, 512); err != nil {
        return "", err
    }

    // 3. Return path for upload
    return rootfsPath, nil
}

func createExt4Rootfs(sourceDir string, outputPath string, sizeMB int) error {
    // Create raw image file
    // Format as ext4
    // Mount and copy files
    // Unmount
    return fmt.Errorf("createExt4Rootfs: not implemented") // placeholder so the outline compiles
}

Dependencies:
- guestfs or mkfs.ext4 + mount commands
- containerd client or skopeo binary

Effort: 4-6 hours
Impact: Feature - enables full deployment workflow
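As one possible shape for createExt4Rootfs, the sketch below creates a sparse image file and builds an mkfs.ext4 invocation that populates the filesystem from a directory with the -d option, which avoids the mount, copy, and unmount steps. It assumes an e2fsprogs version whose mke2fs supports -d. The helper names and the /tmp/image-extract path are illustrative, and the command is only printed here, not executed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// createSparseImage creates (or truncates) a file of sizeMB mebibytes.
// The file is sparse, so it occupies little disk space until written to.
func createSparseImage(path string, sizeMB int) error {
	f, err := os.Create(path)
	if err != nil {
		return fmt.Errorf("create image: %w", err)
	}
	if err := f.Truncate(int64(sizeMB) << 20); err != nil {
		f.Close()
		return fmt.Errorf("size image: %w", err)
	}
	return f.Close()
}

// mkfsCommand builds the mkfs.ext4 call: -F allows formatting a regular
// file, -d copies the contents of sourceDir into the new filesystem.
func mkfsCommand(sourceDir, imagePath string) *exec.Cmd {
	return exec.Command("mkfs.ext4", "-F", "-d", sourceDir, imagePath)
}

func main() {
	dir, err := os.MkdirTemp("", "rootfs-sketch")
	if err != nil {
		fmt.Println("temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	img := filepath.Join(dir, "rootfs.ext4")
	if err := createSparseImage(img, 512); err != nil {
		fmt.Println(err)
		return
	}

	cmd := mkfsCommand("/tmp/image-extract", img)
	fmt.Println(cmd.Args) // shown for inspection; call cmd.Run() to format
}
```

The fixed 512 MB size mirrors the value passed in prepareLocalRootfs. An image larger than that would make formatting fail, so a real implementation would probably size the file from the extracted directory.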


3.2 Full Deployment Logic (Bash Script)

File: cmd/swarmcracker/main.go:942
Issue: Deployment script incomplete

The current placeholder still needs to:
1. Pull OCI image
2. Create rootfs
3. Set up network (TAP/bridge)
4. Start Firecracker VM
5. Configure VM via API

Implementation: Convert to Go code in deployToHost() instead of a bash script:

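func deployToHost(host string, plan *DeploymentPlan) error {
    client, err := createSSHClient(...)
    if err != nil {
        return fmt.Errorf("connect to %s: %w", host, err)
    }

    // 1. Upload rootfs
    if err := uploadRootfs(client, localRootfsPath, remoteRootfsPath); err != nil {
        return fmt.Errorf("upload rootfs to %s: %w", host, err)
    }

    // 2. Setup network on remote
    if err := setupRemoteNetwork(client, plan); err != nil {
        return fmt.Errorf("set up network on %s: %w", host, err)
    }

    // 3. Start Firecracker
    if err := startFirecrackerVM(client, taskID, plan); err != nil {
        return fmt.Errorf("start Firecracker on %s: %w", host, err)
    }

    // 4. Configure via API
    if err := configureVM(client, taskID, plan); err != nil {
        return fmt.Errorf("configure VM on %s: %w", host, err)
    }

    return nil
}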

Effort: 4-6 hours
Impact: Feature - completes deployment workflow


Phase 4: Test TODOs (Optional)

4.1 Network Connectivity Test

File: test/e2e/full_workflow_test.go:246
Effort: 2 hours
Deferred: Test enhancement, not production blocker

4.2 Network Setup Test

File: test/e2e/firecracker/04_real_image_test.go:187
Effort: 2 hours
Deferred: Test enhancement, not production blocker


| Phase | Task | Effort | Priority |
|-------|------|--------|----------|
| 1.1 | HT/SMT extraction | 10 min | 🔴 HIGH (correctness) |
| 1.2 | YAML config loading | 30 min | 🔴 HIGH (essential) |
| 2.1 | IO device discovery | 1-2 hr | 🟡 MEDIUM (feature) |
| 2.2 | Network key management | 2-3 hr | 🟡 MEDIUM (security) |
| 2.3 | SwarmKit node discovery | 2-3 hr | 🟡 MEDIUM (integration) |
| 3.1 | Local rootfs preparation | 4-6 hr | 🟢 LOW (future feature) |
| 3.2 | Full deployment logic | 4-6 hr | 🟢 LOW (future feature) |

Total Quick Wins: 40 minutes
Total Medium: 5-8 hours
Total High: 8-12 hours


Next Action

Start with Phase 1 (quick wins):
1. Fix HT/SMT extraction (10 min)
2. Implement YAML config loading (30 min)

Then proceed to Phase 2 based on priorities.