Availability

Learn how Sedai helps avoid downtime through proactive remediations.

This feature is best for production resources that experience regular traffic and usage.

Sedai continuously analyzes traffic patterns and performance to build a baseline profile of normal resource behavior. If a resource starts to deviate from its usual state, Sedai's prediction engine generates a remediation to intervene and avoid downtime. Each remediation must pass a series of safety checks that validate Sedai's confidence and its ability to execute safely in production. Completed remediations are assessed to confirm that the changes achieved their intended effect, and the results are fed into the system's ML feedback loop. This feedback informs the decision model, so Sedai continually learns and adapts to production behavior in real time.
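To make the idea of a baseline and a deviation concrete, here is a minimal sketch. It is not Sedai's prediction engine, whose internals are not described on this page. It assumes a much simpler stand-in technique: a z-score check against the mean and standard deviation of recent samples. The function name `deviates_from_baseline`, the `threshold` parameter, and the sample latencies are all hypothetical.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, latest, threshold=3.0):
    """Return True if `latest` lies more than `threshold` sample standard
    deviations away from the mean of `history`.

    Illustrative only: a plain z-score test, assumed here as a stand-in
    for whatever model a real system would use.
    """
    if len(history) < 2:
        return False  # too little data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change counts as a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical latency samples (milliseconds) for one resource.
latencies_ms = [120, 118, 125, 122, 119, 121, 124]

normal_sample = deviates_from_baseline(latencies_ms, 123)
spike_sample = deviates_from_baseline(latencies_ms, 480)
```

In this sketch a reading of 123 ms stays within the baseline, while a jump to 480 ms is flagged. A production system would typically also account for seasonality and traffic volume, which this example ignores.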

Support

| Resource Type | Issue Detection |
| --- | --- |
| Serverless (AWS Lambda) | Throttling, timeouts, OOMs, errors |
| Containers (AWS ECS/Fargate) | Errors, out of memory/CPU, timeouts |
| Virtual Machines (AWS EC2, Azure VM) | Errors, out of memory/CPU, timeouts |
| Kubernetes (Stateless Workloads) | Errors, out of memory/CPU, timeouts |

In addition to remediations, Sedai will also generate an alert if it detects unhealthy behavior but is unsure how to intervene.

For each of the resource types mentioned above, you can individually control when a remediation is executed once Sedai detects an issue:

| Setting | Description |
| --- | --- |
| Autopilot | Automatically executes remediations. Best for optimal results: allows Sedai to intervene immediately and intercept issues ahead of time, and to continuously learn from resource behavior to improve its symptom detection and reaction. |
| Datapilot | Requires approval to execute. Best for learning how Sedai works and the types of issues it can detect. You can optionally approve remediations that Sedai will execute on your behalf. Note: If you do not approve a recommended remediation within 4 hours of its generation, the recommendation automatically expires because most issues are time-sensitive. |

By default, availability is set to Datapilot for all resource types.

Once a remediation is submitted for execution, it must first pass a series of safety checks to ensure Sedai can safely intervene. If the safety checks fail, then the remediation will not be attempted.
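The interaction between the two settings, the 4-hour approval window, and the safety checks can be summarized as a small decision function. This is a sketch based only on the behavior described on this page, not Sedai's implementation or API. The names `Mode`, `Remediation`, `decide`, and the returned status strings are hypothetical, and the safety checks are reduced to a single boolean.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

# From this page: unapproved recommendations expire after 4 hours.
APPROVAL_WINDOW = timedelta(hours=4)


class Mode(Enum):
    AUTOPILOT = "autopilot"
    DATAPILOT = "datapilot"


@dataclass
class Remediation:
    generated_at: datetime
    approved: bool = False  # assumed to be set only within the window


def decide(rem, now, safety_checks_pass, mode=Mode.DATAPILOT):
    """Return a hypothetical status string for a detected issue.

    The default mode is Datapilot, matching the documented default.
    """
    if mode is Mode.DATAPILOT and not rem.approved:
        if now - rem.generated_at > APPROVAL_WINDOW:
            return "expired"
        return "awaiting_approval"
    # Autopilot, or Datapilot with approval: safety checks gate execution.
    if not safety_checks_pass:
        return "not_attempted"
    return "execute"


t0 = datetime(2024, 1, 1, 12, 0)
pending = decide(Remediation(t0), t0 + timedelta(hours=1), True)
expired = decide(Remediation(t0), t0 + timedelta(hours=5), True)
auto = decide(Remediation(t0), t0, True, mode=Mode.AUTOPILOT)
blocked = decide(Remediation(t0), t0, False, mode=Mode.AUTOPILOT)
```

Here `pending` is `"awaiting_approval"`, `expired` is `"expired"`, `auto` is `"execute"`, and `blocked` is `"not_attempted"`. Treating a remediation at exactly 4 hours as still open is an assumption of this sketch.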

You can view a history of remediations and recommendations on the Activity Timeline page. Open recommendations are listed on the Tasks page.

You can optionally configure availability settings using tags.
