FaultLine

Reliability & incident postmortem platform that models outages as timelines and surfaces org-wide reliability patterns.

Solo, Next.js, TypeScript, Supabase / Postgres, Multi-tenant, SRE workflows, Analytics

Overview

FaultLine logs incidents as evolving timelines with structured postmortems. Events are timestamped and used to derive metrics like time-to-impact and time-to-mitigation.

Services are modeled as first-class units with ownership and criticality, enabling org-wide reliability insights.

The Problem

Incidents are often tracked in ad-hoc documents with no shared structure, making it hard to compare outages or learn from patterns.

Without timelines and normalized postmortems, metrics have to be compiled by hand and reliability hotspots stay hidden.

Constraints

The system must preserve multi-tenant isolation, normalize entities for analytics, and compute metrics from events rather than manual input.

Org isolation across incidents, services, and actions.
Derived metrics from timeline events.
Normalized postmortem factors and ownership.
Import-ready architecture for external sources.
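
As a rough illustration of the org isolation constraint, the sketch below scopes every read to a single org. The `OrgScoped` shape and the `scopeToOrg` and `getForOrg` helpers are hypothetical names, and the in-memory array stands in for database tables. In a Postgres-backed app this kind of rule is often enforced in the database with row-level security; the write-up does not say which mechanism FaultLine uses.

```typescript
// Hypothetical shape: every tenant-owned row carries the org it belongs to.
interface OrgScoped {
  id: string;
  orgId: string;
}

interface Incident extends OrgScoped {
  title: string;
}

// Return only the rows owned by the given org.
function scopeToOrg<T extends OrgScoped>(rows: readonly T[], orgId: string): T[] {
  return rows.filter((row) => row.orgId === orgId);
}

// Look up one row by id, but only inside the caller's org.
// A row that exists in another org is treated as not found.
function getForOrg<T extends OrgScoped>(
  rows: readonly T[],
  orgId: string,
  id: string
): T | undefined {
  return scopeToOrg(rows, orgId).find((row) => row.id === id);
}

// Example data (invented for illustration).
const incidents: Incident[] = [
  { id: "inc_1", orgId: "org_a", title: "API latency spike" },
  { id: "inc_2", orgId: "org_b", title: "Queue backlog" },
];

const visibleToOrgA = scopeToOrg(incidents, "org_a");
const crossOrgLookup = getForOrg(incidents, "org_a", "inc_2");
```

Routing every lookup through the org-scoped helper means a valid id from another org behaves the same as a missing id.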

Solution

FaultLine models incidents as event timelines with actions and services to derive reliability metrics automatically. Postmortems capture root cause, detection, preventability, and contributing factors in a structured format.
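
A minimal sketch of how metrics might be derived from a timeline follows. The event kinds (`started`, `impact`, `mitigated`) and the two definitions used here are assumptions made for illustration: time-to-impact is taken as the gap from the first `started` event to the first `impact` event, and time-to-mitigation as the gap from first `impact` to first `mitigated`. The actual schema and definitions may differ.

```typescript
// Hypothetical event kinds; the real schema may use different names.
type EventKind = "started" | "impact" | "mitigated";

interface TimelineEvent {
  incidentId: string;
  kind: EventKind;
  occurredAt: string; // ISO 8601 timestamp
}

interface IncidentMetrics {
  timeToImpactMs: number | null;
  timeToMitigationMs: number | null;
}

// Earliest timestamp (ms since epoch) of a given kind, or null if none exists.
function firstAt(events: readonly TimelineEvent[], kind: EventKind): number | null {
  const times = events
    .filter((e) => e.kind === kind)
    .map((e) => Date.parse(e.occurredAt))
    .filter((t) => !Number.isNaN(t));
  return times.length > 0 ? Math.min(...times) : null;
}

// Derive both metrics from events alone; a missing event yields null
// instead of a guessed value.
function deriveMetrics(events: readonly TimelineEvent[]): IncidentMetrics {
  const started = firstAt(events, "started");
  const impact = firstAt(events, "impact");
  const mitigated = firstAt(events, "mitigated");
  return {
    timeToImpactMs: started !== null && impact !== null ? impact - started : null,
    timeToMitigationMs: impact !== null && mitigated !== null ? mitigated - impact : null,
  };
}
```

Returning `null` when an event is missing keeps incomplete timelines from producing misleading numbers in aggregate charts.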

Timeline-driven metrics

Time-to-impact and time-to-mitigation from events.

Service ownership

Criticality and ownership tied to each service.

Structured postmortems

Normalized factors and deduplicated root causes.

Org-wide insights

Trends across severity, frequency, and hotspots.
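
The org-wide insights above could be computed with simple grouped counts. The sketch below is one possible approach, not the project's actual query layer: `IncidentSummary`, the `sev1` to `sev3` labels, `countBy`, and `topHotspots` are all hypothetical names chosen for the example.

```typescript
// Hypothetical severity labels and incident summary shape.
type Severity = "sev1" | "sev2" | "sev3";

interface IncidentSummary {
  id: string;
  severity: Severity;
  serviceIds: string[]; // services affected by the incident
}

// Count items per key; one item may contribute to several keys.
function countBy<T>(
  items: readonly T[],
  keysOf: (item: T) => string[]
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    for (const key of keysOf(item)) {
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

// Services ranked by how many incidents touched them (ties broken by name).
function topHotspots(
  incidents: readonly IncidentSummary[],
  limit: number
): Array<[string, number]> {
  return Array.from(countBy(incidents, (i) => i.serviceIds).entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Incidents per severity level.
function severityCounts(incidents: readonly IncidentSummary[]): Map<string, number> {
  return countBy(incidents, (i) => [i.severity]);
}
```

In a Postgres-backed deployment the same grouping would more likely run as a `GROUP BY` query; the in-memory version only shows the shape of the calculation.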

Incident timeline with timestamped events — Event timeline used to compute incident metrics.

Incident postmortem summary view — Postmortem summary with root cause and resolution.

Architecture & Data Model

The schema is normalized around incidents, events, services, and actions. Postmortem factors are deduplicated to support analytics across time and teams.

Multi-tenant isolation is enforced throughout, and metrics are derived from event timestamps rather than entered by hand.

Incident events and actions as first-class entities.
Service criticality and ownership per org.
Normalized contributing factors for trend analysis.
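
To illustrate why deduplicated factors help trend analysis, here is a small sketch of interning free-text factor labels into a shared catalog. The normalization rule (trim, lowercase, collapse whitespace) and the names `FactorCatalog`, `normalizeLabel`, and `internFactor` are assumptions for the example, not the project's actual schema.

```typescript
// A catalog entry shared by every postmortem that cites the same factor.
interface Factor {
  key: string;   // normalized form used for deduplication
  label: string; // first label seen, kept for display
}

interface FactorCatalog {
  byKey: Map<string, Factor>;
}

function createCatalog(): FactorCatalog {
  return { byKey: new Map<string, Factor>() };
}

// Assumed normalization: trim, lowercase, collapse runs of whitespace.
function normalizeLabel(raw: string): string {
  return raw.trim().toLowerCase().replace(/\s+/g, " ");
}

// Return the existing factor for this label, or register a new one.
function internFactor(catalog: FactorCatalog, rawLabel: string): Factor {
  const key = normalizeLabel(rawLabel);
  const existing = catalog.byKey.get(key);
  if (existing) {
    return existing;
  }
  const created: Factor = { key, label: rawLabel.trim() };
  catalog.byKey.set(key, created);
  return created;
}
```

Because two postmortems that cite "Missing alert" and "missing  alert" resolve to the same entry, counts per factor stay comparable across teams and over time.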

Key Screens

Reliability insights dashboard — Org-wide reliability insights and severity trends.

Reliability charts and trend analysis — Trend analysis for frequency and time-to-mitigation.

FaultLine schema and relationships — Normalized entities for incidents and services.

Outcome

Models incidents as timelines with derived metrics.
Normalizes incidents, services, and postmortem factors for reliability analysis.
Surfaces org-wide trends across services and severity.
Supports action items and structured postmortems.
