Introduce a new 'preflight' category and wire it through the UI and service logic. Updates:
- public/static/css/error-analysis.css: add .category-badge styles (preflight, proxmox, service, database, shell, build) to provide consistent color badges.
- public/static/js/error-analysis.js: add 'preflight' color to catColors so the UI can render the new category.
- service.go: register 'preflight' as a known category and reclassify exit codes 103–108 from various categories to 'preflight' (these are validation/initial checks).
This groups validation-related errors under a single category for clearer display and handling in the frontend.
When the client retries post_to_api() after a curl timeout (5s), the server
previously created a new record for every retry, because the installing status
path always called CreateTelemetry unconditionally. This inflated record counts
(~3x) and left orphan records that were cleaned up as 'unknown' after 4h.
Now UpsertTelemetry checks FindRecordByExecutionID before creating. If a record
with the same execution_id already exists, the retry is treated as idempotent
and returns success without creating a duplicate.
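The check-before-create behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's actual code: memStore and upsertInstalling are hypothetical stand-ins for the PocketBase client and for the FindRecordByExecutionID/CreateTelemetry pair.

```go
package main

import (
	"fmt"
	"sync"
)

// memStore is a hypothetical in-memory stand-in for the PocketBase client.
type memStore struct {
	mu      sync.Mutex
	records map[string]string // execution_id -> status
	creates int               // number of records actually created
}

func newMemStore() *memStore {
	return &memStore{records: make(map[string]string)}
}

// upsertInstalling creates a record only if none exists for executionID.
// It reports whether a new record was created; a retry returns false
// and is still considered a success by the caller.
func (s *memStore) upsertInstalling(executionID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.records[executionID]; exists {
		return false // retry of an earlier request: idempotent, no duplicate
	}
	s.records[executionID] = "installing"
	s.creates++
	return true
}

func main() {
	s := newMemStore()
	for i := 0; i < 3; i++ { // the original request plus two retries
		fmt.Println("created:", s.upsertInstalling("exec-123"))
	}
	fmt.Println("records:", s.creates)
}
```

Holding the lock across the lookup and the insert is what keeps two concurrent retries from both creating a record; a real PocketBase-backed version would need an equivalent guard such as a unique index on execution_id.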
Root cause of massive data loss: all traffic arrives from reverse proxy
10.0.1.7, but TRUSTED_PROXIES_CIDR was not set, so rate limiter saw ALL
requests as one IP → 60 RPM shared by all users → most requests rejected.
Fixes:
- Auto-trust RFC 1918/6598 private IPs (10.x, 172.16-31.x, 192.168.x,
100.64-127.x) and loopback as reverse proxy sources for X-Forwarded-For
- This means Docker/K8s/Caddy/Nginx setups work out of the box without
needing to configure TRUSTED_PROXIES_CIDR
- Increase default rate limit from 60→300 RPM, burst from 20→60
(telemetry payloads are tiny, no reason for aggressive limiting)
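A rough sketch of the auto-trust check described above, using only the standard library. The function name isTrustedProxy is an assumption for illustration; the ranges are the ones listed in the fix.

```go
package main

import (
	"fmt"
	"net/netip"
)

// cgnat is the RFC 6598 shared address space (100.64.0.0 - 100.127.255.255).
var cgnat = netip.MustParsePrefix("100.64.0.0/10")

// isTrustedProxy (hypothetical name) reports whether the direct peer address
// is loopback, RFC 1918 private, or RFC 6598 shared space, i.e. whether its
// X-Forwarded-For header may be believed.
// Note: netip.Addr.IsPrivate also accepts IPv6 unique local addresses (fc00::/7).
func isTrustedProxy(remote string) bool {
	addr, err := netip.ParseAddr(remote)
	if err != nil {
		return false // unparsable peers are never trusted
	}
	addr = addr.Unmap() // treat ::ffff:10.0.1.7 like 10.0.1.7
	return addr.IsLoopback() || addr.IsPrivate() || cgnat.Contains(addr)
}

func main() {
	for _, ip := range []string{"10.0.1.7", "172.20.0.2", "100.64.0.1", "8.8.8.8"} {
		fmt.Printf("%-12s trusted=%v\n", ip, isTrustedProxy(ip))
	}
}
```

Only the direct peer is checked here; the rate limiter would then key on the client address taken from X-Forwarded-For when the peer is trusted, and on the peer address otherwise.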
Root cause: ~50% of installing records were never created, causing all
subsequent configuring/validation/success/failed updates to be lost.
Server-side fixes:
- Add logging for ALL rejection points (rate limit, body too large, JSON
parse errors, validation failures) - previously completely silent
- configuring/validation pings now create a fallback record instead of
silently skipping when the installing record doesn't exist
- Terminal status updates (success/failed/aborted) now use full PATCH
with all fields to fill in missing specs from fallback records
- New UpdateTelemetryFull method for complete record updates
The configuring/validation fallback creates a record with minimal data.
When the final success/failed update arrives with full specs, the full
PATCH fills in all missing fields (ct_type, disk_size, etc). Fields with
omitempty+zero values are omitted from JSON, preserving existing values.
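The omitempty behaviour that makes the full PATCH safe can be seen in a small example. The statusPatch struct below is hypothetical and much smaller than the real update type; only the field names ct_type and disk_size come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusPatch is a hypothetical, reduced PATCH payload.
type statusPatch struct {
	Status   string `json:"status"`
	CTType   int    `json:"ct_type,omitempty"`
	DiskSize int    `json:"disk_size,omitempty"`
}

// patchBody serializes the payload; zero-valued omitempty fields are left
// out, so the stored values for those fields are not overwritten.
func patchBody(p statusPatch) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// ct_type is zero and therefore absent from the body.
	fmt.Println(patchBody(statusPatch{Status: "success", DiskSize: 8}))
	// prints {"status":"success","disk_size":8}
}
```

One consequence of this design is that a PATCH cannot deliberately reset such a field to zero; that would require pointer fields or an explicit field mask.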
The EnsurePipelineField migration was using PocketBase's legacy 'schema' key
which was renamed to 'fields' in PB v0.22+. Since the install script fetches
the latest PB release, the migration silently failed: it read an empty schema
array, PATCHed with the wrong key, and the pipeline field was never created.
All pipeline data sent to PB was silently dropped.
Changes:
- Auto-detect schema/fields key from PB collection response
- Use correct key and field format for both PB <0.22 and >=0.22
- Add retry logic (3 attempts with backoff) for startup race conditions
- Add diagnostic logging when pipeline steps are tracked
- Log which API format (schema vs fields) is being used
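Key auto-detection of this kind might look like the sketch below. detectFieldsKey is a hypothetical helper and the sample JSON bodies are invented; the only fact taken from the change is that the collection response carries either a 'schema' or a 'fields' key.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// detectFieldsKey (hypothetical helper) inspects a collection response and
// returns which top-level key holds the field definitions.
func detectFieldsKey(collectionJSON []byte) (string, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(collectionJSON, &raw); err != nil {
		return "", err
	}
	// Prefer the newer key when both are present.
	if _, ok := raw["fields"]; ok {
		return "fields", nil
	}
	if _, ok := raw["schema"]; ok {
		return "schema", nil
	}
	return "", errors.New("collection response has neither 'fields' nor 'schema'")
}

func main() {
	for _, body := range []string{
		`{"name":"telemetry","fields":[]}`,
		`{"name":"telemetry","schema":[]}`,
	} {
		key, err := detectFieldsKey([]byte(body))
		fmt.Println(key, err)
	}
}
```

Returning an error when neither key is present, rather than defaulting to one of them, avoids repeating the silent failure described above.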
- Replace General Information section with compact header strip showing
status badge, exit code, type, duration, and created time at a glance
- Pipeline section now always visible (shows 'not available' placeholder
when no data instead of being hidden entirely)
- Error section redesigned: parses structured error format
(exit_code=N | desc\n---\nlog) into separate category badge,
explanation text, and scrollable error trace with copy button
- Also handles container pipe-delimited fallback format (|---| separator)
- Moved IDs and timestamps to collapsed Metadata section at bottom
- Removed duplicate CSS blocks (second copy of detail styles cleaned up)
- Added new CSS: header-strip, error-category-badge, error-trace-box,
btn-copy-trace, error-trace-empty, pipeline-empty with dashed border
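The structured error format (exit_code=N | desc\n---\nlog) is parsed in the frontend; the same split is sketched here in Go for consistency with the server code. The type and function names are hypothetical, and the pipe-delimited container fallback is not covered.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsedError is a hypothetical holder for the three parts of the format.
type parsedError struct {
	ExitCode    int
	Description string
	Log         string
}

// parseStructuredError splits "exit_code=N | desc\n---\nlog" into its parts.
// It returns false when the input does not follow that format.
func parseStructuredError(raw string) (parsedError, bool) {
	head, log, _ := strings.Cut(raw, "\n---\n") // log is empty if no separator
	codePart, desc, ok := strings.Cut(head, " | ")
	if !ok || !strings.HasPrefix(codePart, "exit_code=") {
		return parsedError{}, false
	}
	code, err := strconv.Atoi(strings.TrimPrefix(codePart, "exit_code="))
	if err != nil {
		return parsedError{}, false
	}
	return parsedError{ExitCode: code, Description: strings.TrimSpace(desc), Log: log}, true
}

func main() {
	p, ok := parseStructuredError("exit_code=127 | Command not found\n---\nline 1\nline 2")
	fmt.Println(ok, p.ExitCode, p.Description)
}
```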
Track the progression of each installation through its phases:
installing → validation → configuring → success/failed/aborted
Server-side changes (service.go):
- Add PipelineStep type and in-memory pipelineTracker (sync.Map)
- Track each status transition with timestamp in trackPipelineStep()
- Add Pipeline (json.RawMessage) field to TelemetryOut and TelemetryStatusUpdate
- Include pipeline data in every PocketBase create/update
- Auto-cleanup tracker entries on terminal status (success/failed/aborted)
- Add EnsurePipelineField() auto-migration: creates 'pipeline' JSON field
in PocketBase collection on startup if missing
Frontend changes (dashboard.js + dashboard.css):
- Add renderPipeline() function with visual step indicator
- Show pipeline section in record detail modal (between General Info and
System Resources)
- Each step shows icon (checkmark/cross/pause), label, and timestamp
- Color-coded: green for passed phases, red for failed, purple for aborted
- Handles both JSON array (native) and string-wrapped JSON (fallback)
- Graceful degradation: no pipeline shown for records without data
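The server-side tracker described above could be approximated as follows. This is a sketch, not the code in service.go: the PipelineStep field names, the tracker type, and the track method are assumptions; only the use of sync.Map, per-transition timestamps, and cleanup on terminal status come from the description.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// PipelineStep records one status transition (field names are assumed).
type PipelineStep struct {
	Status    string    `json:"status"`
	Timestamp time.Time `json:"timestamp"`
}

// terminal lists the statuses after which an entry is dropped.
var terminal = map[string]bool{"success": true, "failed": true, "aborted": true}

// tracker maps execution_id -> []PipelineStep in memory.
type tracker struct{ m sync.Map }

// track appends a step and returns the pipeline so far. On a terminal
// status the entry is removed to keep the map from growing without bound.
// Load followed by Store is not atomic; concurrent updates for the same
// id would need an extra per-id lock.
func (t *tracker) track(id, status string) []PipelineStep {
	var prev []PipelineStep
	if v, ok := t.m.Load(id); ok {
		prev = v.([]PipelineStep)
	}
	// Copy before appending so stored slices are never mutated in place.
	steps := append(append([]PipelineStep(nil), prev...),
		PipelineStep{Status: status, Timestamp: time.Now().UTC()})
	if terminal[status] {
		t.m.Delete(id)
	} else {
		t.m.Store(id, steps)
	}
	return steps
}

func main() {
	tr := &tracker{}
	for _, s := range []string{"installing", "validation", "configuring", "success"} {
		fmt.Println(s, "-> steps so far:", len(tr.track("exec-123", s)))
	}
}
```

Because the tracker lives in memory, a service restart in the middle of an installation loses the earlier steps for that execution.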
Treat the new "validation" status like an in-progress ping similar to "configuring": add "validation" to status counts in dashboard and PBClient analysis, include it in stuck/installing logic, and prevent creating ghost records in UpsertTelemetry by skipping new record creation when payload.Status is "validation" or "configuring". Also add "validation" to allowedStatus and improve the warning log to include the actual status.
When the container sends 'configuring' before the host's 'installing'
record was written (race condition or curl timeout), UpsertTelemetry
created a new record with ct_type=0, all-zero hardware values, and
repo_source=N/A. These ghost records were then hidden by the dashboard
(which filters repo_source='ProxmoxVE').
Changes:
- Skip record creation for 'configuring' status when no existing record
is found (log warning instead of creating ghost)
- Add RepoSource to TelemetryStatusUpdate so PATCH updates carry
repo_source through to PocketBase
- Final states (failed/success) still create fallback records since
post_update_to_api sends full payloads with all fields
Exit codes 1/2 are categorized as 'shell' on the bash side but the Go
allowlist was missing it, causing fallback to 'unknown' for every general
shell error.
Introduce IPv4 anonymization for telemetry logs for GDPR compliance: add a regexp import and a sanitizeIPs helper (backed by ipv4Re) that masks the last three octets and keeps only the first, and apply it to TelemetryIn.Error after sanitizeMultiLine. Add "tool" to allowedType. Change enum validation to log a warning and fall back to the safe default "unknown" for gpu_vendor, gpu_passthrough, cpu_vendor, and error_category instead of rejecting the write; type validation still returns an error but now logs the rejection. This improves the privacy and robustness of telemetry ingestion.
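A minimal sketch of such a masking helper is shown below. The names sanitizeIPs and ipv4Re come from the description, but the pattern and the exact replacement text (first octet followed by .x.x.x) are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// ipv4Re matches dotted quads and captures the first octet.
// It does not validate octet ranges, so strings such as "1.2.3.4" inside
// version numbers are masked as well.
var ipv4Re = regexp.MustCompile(`\b(\d{1,3})\.\d{1,3}\.\d{1,3}\.\d{1,3}\b`)

// sanitizeIPs keeps the first octet and masks the remaining three.
// The "N.x.x.x" output form is an assumption made for this sketch.
func sanitizeIPs(s string) string {
	return ipv4Re.ReplaceAllString(s, "${1}.x.x.x")
}

func main() {
	fmt.Println(sanitizeIPs("curl: failed to connect to 192.168.1.50 port 443"))
	// prints: curl: failed to connect to 192.x.x.x port 443
}
```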
Expose full installation logs in error analysis and UI, and allow larger payloads. Includes: add InstallLog to error records and propagate it to dashboard responses; render a collapsible full-installation-log panel in recent errors with a toggle and log button in public/static/js/error-analysis.js; increase allowed error/log size (sanitizeMultiLine -> 128KB) and MaxBodyBytes -> 256KB to carry full logs; adjust timeouts/refresh TTLs for larger data sets and make various minor formatting/whitespace cleanups across dashboard.go and service.go. These changes improve debugging by preserving and surfacing install logs while tuning server limits and timeouts to accommodate larger payloads.
Remove the deprecated "configuring" status across telemetry handling and validation: it is no longer treated as a separate state or included in queries/allowedStatus. Simplify UpsertTelemetry to create on "installing" and update otherwise, with a fallback create when no existing record is found. Add an embedded public filesystem (go:embed public) with serveHTMLFile helper and route handlers for /error-analysis, /script-analysis and serving /static/ assets. Also tighten defaults and schedules: reduce MaxBodyBytes (16384 -> 1024) and increase cleanup intervals (CHECK_INTERVAL 15m -> 60m, STUCK_AFTER 2h -> 24h). These changes consolidate status handling and enable serving bundled frontend assets from the binary.
Remove large script/error analysis plumbing and persistent script-stats code from dashboard.go (types, aggregation, PB stores and helper functions). Clean up imports (remove log and sync), tighten auto-reclassification logic (drop SIGHUP/exit 129 branch, only handle SIGINT/ctrl-c/exit 130), and change tool tracking from type "pve" to "tool". Add DashboardHTML() which returns the embedded dashboard HTML/CSS/JS. These changes simplify the dashboard codepath and embed the frontend HTML, but remove the previous detailed analysis and persistent stats machinery that other callers may have relied on.
Add handling for SIGHUP/exit code 129 across the codebase so records from clients that reported status="failed" due to SIGHUP are reclassified as aborted. Updates include: string-matching for "sighup" alongside SIGINT/Ctrl+C in reclassification logic (dashboard, analytics, ingestion), PocketBase query filters to include/exclude SIGHUP and exit_code=129, and changing exit code 129's category/description to user_aborted (with an expanded text mentioning hangup). This unifies legacy SIGINT/SIGHUP behavior and improves accuracy of "aborted" vs "failed" classification.
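The reclassification rule can be summarized in a few lines. The function below is an illustrative sketch; its name and the exact substrings matched are assumptions based on the description (SIGHUP, SIGINT, Ctrl+C, exit codes 129 and 130).

```go
package main

import (
	"fmt"
	"strings"
)

// reclassify (hypothetical name) turns a "failed" record into "aborted"
// when the failure was caused by a hangup or interrupt signal.
// Any other status is returned unchanged.
func reclassify(status string, exitCode int, errText string) string {
	if status != "failed" {
		return status
	}
	if exitCode == 129 || exitCode == 130 { // 128+SIGHUP(1), 128+SIGINT(2)
		return "aborted"
	}
	lower := strings.ToLower(errText)
	for _, marker := range []string{"sighup", "sigint", "ctrl+c", "ctrl-c"} {
		if strings.Contains(lower, marker) {
			return "aborted"
		}
	}
	return status
}

func main() {
	fmt.Println(reclassify("failed", 129, ""))               // aborted
	fmt.Println(reclassify("failed", 1, "apt-get failed"))   // failed
	fmt.Println(reclassify("failed", 1, "received SIGHUP"))  // aborted
}
```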
- Type of Change checkboxes (bug, feature, breaking, dashboard, refactor, docs, security/privacy)
- Privacy prerequisite check (no new PII without PRIVACY.md update)
- Links to privacy documentation
- New docs/PRIVACY.md with full telemetry documentation:
consent model, what's collected (per type), what's NOT collected,
data processing, GDPR legal basis, how to opt out
- README: link to PRIVACY.md, ROPA.md, TOMS.md in Privacy section
If statsAllTime.Bootstrap fails, log the error but continue by initializing an empty stats map with a "_marker" entry set to today's date. This prevents repeated bootstrap attempts and allows the service to build stats incrementally going forward. The change acquires the stats mutex while updating the map and adds an informational log message.
Handle a new 'configuring' status alongside 'installing' across services and UI (filters, counts, badges, allowedStatus, stuck-install detection). Reclassify failed records with exit_code==0 as success (in Fetch* paths and main ingestion) and remove aggressive top-error truncation; increase error preview/full display limits. Expand exit code mappings and descriptions (many curl/apt/docker/signal/timeout codes added) and enhance categorizeErrorText to detect Docker/container, resource (OOM) and signal-related errors for better error_category assignment. Misc: add new HTML/CSS for the configuring badge and adjust related dashboard/error-analysis rendering.
Make request and background refresh timeouts depend on the requested data range: default 120s, 300s for >=90 days and 600s for >=365 days. Simplify and extend cache TTL behavior: keep a 2m default but use a 23h TTL for ranges longer than 7 days (replacing the previous 5m/15m tiers). These changes reduce frequent recomputation for analyses that span long time ranges and allow more time for heavy data fetches.
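The tiers above translate directly into two small functions. The names requestTimeout and cacheTTL are hypothetical, and the sketch assumes days is a positive day count.

```go
package main

import (
	"fmt"
	"time"
)

// requestTimeout returns the fetch timeout for a range of the given length.
func requestTimeout(days int) time.Duration {
	switch {
	case days >= 365:
		return 600 * time.Second
	case days >= 90:
		return 300 * time.Second
	default:
		return 120 * time.Second
	}
}

// cacheTTL returns how long a cached result for the range stays fresh.
func cacheTTL(days int) time.Duration {
	if days > 7 {
		return 23 * time.Hour
	}
	return 2 * time.Minute
}

func main() {
	for _, d := range []int{1, 7, 30, 90, 365} {
		fmt.Printf("days=%-3d timeout=%-6v ttl=%v\n", d, requestTimeout(d), cacheTTL(d))
	}
}
```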
Introduce ScriptStatsStore to persist aggregated script telemetry (7d, 30d, all-time) backed by PocketBase collections. Implements loading, bootstrap/rebuild, incremental updates, aggregation, and PB upserts (LoadFromPB, Bootstrap, Rebuild, IncrementalUpdate, Update, writeAllToPB, BuildData). Integrate stores into main: initialize stores, serve script analysis from in-memory persistent stores for days=0/<=7/<=30 (fast response + caching), allow days=0 as All Time, and wire stores into warmupCaches to load/update and pre-warm cached script data during startup/nightly. Also: adjust cache TTL behavior for live today data, change warmupCaches signature to accept stores, add startup bootstrapping/rebuild logic (including long timeout for all-time bootstrap), add sync mutex and imports. Misc: return JSON error payloads from GitHub issue API endpoint (instead of plain text), and fix JS error parsing to prefer data.error over data.message.
Parse and store script creation date, compute script age and installs-per-day, and surface that metric in the script analysis UI. ScriptInfo now includes Created (parsed from script_created), ScriptStat gains DaysOld and InstallsPerDay, and the top/bottom tables render an "Installs/Day" column (colspan adjusted). Also increase cache TTLs for larger ranges (uses 23h for >7 days/default) and limit warmup to ProxmoxVE only. Includes date parsing fallbacks and a minimum 1-day age to avoid division-by-zero.
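The installs-per-day computation with the 1-day minimum can be sketched as below. The function name and the single YYYY-MM-DD date layout are assumptions; the real parser is described as having several fallbacks.

```go
package main

import (
	"fmt"
	"time"
)

// installsPerDay (hypothetical helper) returns the script age in whole days,
// clamped to at least 1, and the average installs per day.
// Dates are assumed to be in YYYY-MM-DD form for this sketch.
func installsPerDay(installs int, created, today string) (daysOld int, perDay float64, err error) {
	const layout = "2006-01-02"
	c, err := time.Parse(layout, created)
	if err != nil {
		return 0, 0, err
	}
	t, err := time.Parse(layout, today)
	if err != nil {
		return 0, 0, err
	}
	daysOld = int(t.Sub(c).Hours() / 24)
	if daysOld < 1 {
		daysOld = 1 // same-day or future-dated scripts: avoid division by zero
	}
	return daysOld, float64(installs) / float64(daysOld), nil
}

func main() {
	days, rate, err := installsPerDay(50, "2024-01-01", "2024-01-11")
	fmt.Println(days, rate, err) // prints: 10 5 <nil>
}
```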
Rename and extend script lookup to return ScriptInfo (slug + type) and add a mapping from relation IDs to display types. The API request now requests the type field and results are stored as map[string]ScriptInfo instead of map[string]bool. Updated FetchScriptAnalysisData to use the new FetchKnownScripts, adjust logging and filtering, and append zero-usage scripts (with their type) to TopScripts for 30d+ and all-time views so scripts with no telemetry are represented. Unknown type IDs fall back to the raw type value.
- warmupCaches replaces warmupDashboardCache, now warms all 3 endpoints
- Nightly at 02:00 UTC: full warmup with 23h TTL for days>1
- Every 15min: refresh today-only data (fast, changes frequently)
- Startup: full warmup of all day ranges and repos
- 30d/AllTime data now instantly available from cache all day
- New page: Most Used Scripts (Top 10, expandable) + Recent Activity (Last 10, expandable)
- Filter by 7d / 30d / All Time + Source (ProxmoxVE/ProxmoxVED/All)
- Stats cards: Total Installs, Unique Scripts, Avg Installs/Script
- Success rate bar visualization per script (green/red/purple/yellow segments)
- Search filtering on both tables
- /api/scripts endpoint with caching (stale-while-revalidate)
- Nav links added to Error Analysis page
Reclassify runs showing status="failed" with exit_code=0 (and no error text) as success and skip exit_code 0 from error stats to avoid false errors. Expand and refine exit-code -> description/category mapping (curl errors, signals with names, BSD sysexits, apt cases, storage/timeout categories) for more accurate categorization. Improve error listing UI: allow wrapped error text, add short/full toggles for long errors, add escapeAttr to safely serialize attributes, change default period to "Today", and replace inline onclicks with data-attribute-driven issue buttons plus delegated click handler to avoid escaping issues. Minor CSS and layout tweaks for readability.
Combine Error Analysis and Failed Apps into a two-column responsive grid, update styles and interactions (new classes, hover states, sizing, truncation, badges, severity colors), and add a Deep Analysis action. Adjust client-side rendering: tighter item counts, changed text/formatting for counts and thresholds, improved accessibility/spacing and responsive behavior. In service.go, correct PocketBase filter negation to use the !~ operator for error pattern exclusions and add a clarifying comment.