💡 Release Intelligence

Learn how to quickly validate production releases by understanding autonomously generated scorecards that grade release quality.

With Release Intelligence, deployments are autonomously detected and graded based on performance, error, and saturation metrics in varying traffic scenarios. Each release is analyzed for 24 hours and given an overall score that indicates whether it is behaving better than, worse than, or similarly to the resource's prior release.

Release Intelligence is designed to help your team push and iterate on production releases faster so that you can quickly validate behavior and track improvements or degradations. Scorecards can be used as a decision-making tool during your review process, and allow you to measure a release's success against your expectations as well as ML-based predictions for a resource's behavior. If a scorecard indicates a resource deviates from expected behavior, it is best practice to review the resource and verify it is functioning as you intended.

Scores

Following a release, sample datasets are pulled in varying traffic scenarios, which are identified based on a resource's observed traffic patterns. Traffic scenarios are categorized as either low, typical, or high. The analysis period will persist until datasets in high traffic scenarios are present, or up to 5 days post-release.
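Sedai does not publish the exact thresholds that separate these traffic scenarios, but the idea can be illustrated with a rough sketch: compare a sample window's request volume to the resource's historical distribution. The percentile cutoffs and function name below are assumptions for illustration only, not Sedai's actual logic.

```python
# Illustrative sketch only: classify a 20-minute sample window as Low, Typical,
# or High traffic relative to the resource's own observed history.
# The 25th/75th percentile cutoffs are assumptions, not Sedai's thresholds.
from statistics import quantiles

def classify_traffic(window_request_count: float, historical_counts: list[float]) -> str:
    q1, _, q3 = quantiles(historical_counts, n=4)  # 25th, 50th, 75th percentiles
    if window_request_count < q1:
        return "Low"
    if window_request_count > q3:
        return "High"
    return "Typical"

# Example: 1,200 requests in a window, against a history of 800-1,500 requests
# per window, lands between the cutoffs and is classified as Typical.
```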

Connected monitoring data is automatically prioritized, and the following metric categories are analyzed to generate a score:

  • Resource usage

  • Latency

  • Errors

A score will be generated as soon as data is available and may be updated as the analysis continues. After a release is deployed, we continue to analyze its performance and update the score after 24 hours (assuming there was sufficient data for analysis).

If the resource has insufficient traffic following a release, the analysis window will be extended for up to 5 days.

The analysis is based on sample datasets from 20-minute time windows, and each metric within a dataset is compared to the resource's behavior in prior releases.

The summary score primarily emphasizes the extent to which behavior changes (either better or worse), taking into account:

  • Whether latency is relatively longer or shorter compared to previous releases

  • Whether the error count is relatively higher or lower compared to previous releases

  • The amount of traffic relative to the resource's observed traffic patterns (classified as Low, Typical, or High)

We then aggregate results into an overarching score that prioritizes datasets with considerable change in high traffic scenarios. The score is graded based on the amount of deviation detected.
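Sedai's exact aggregation formula is not published; the sketch below only illustrates the idea described above, where each dataset's deviation is weighted so that considerable change in high traffic scenarios dominates the overall 0-10 score. The traffic weights and the linear deviation-to-score mapping are assumptions.

```python
# Illustrative sketch only: combine per-dataset deviations into an overall
# 0-10 score, weighting datasets from higher traffic scenarios more heavily.
# Weights and the deviation-to-score mapping are assumptions, not Sedai's formula.

TRAFFIC_WEIGHT = {"Low": 1.0, "Typical": 2.0, "High": 3.0}  # assumed weights

def overall_score(datasets: list[dict]) -> float:
    """Each dataset looks like {"traffic": "Low"|"Typical"|"High", "deviation": 0.0-1.0},
    where 0.0 means identical to prior releases and 1.0 means drastic change."""
    total_weight = sum(TRAFFIC_WEIGHT[d["traffic"]] for d in datasets)
    weighted_deviation = sum(
        TRAFFIC_WEIGHT[d["traffic"]] * d["deviation"] for d in datasets
    ) / total_weight
    return round(10 * (1 - weighted_deviation), 1)  # 10 = no detected deviation

# Example: a large change in a High traffic window pulls the score down more
# than the same change would in a Low traffic window.
print(overall_score([
    {"traffic": "Typical", "deviation": 0.1},
    {"traffic": "Typical", "deviation": 0.2},
    {"traffic": "High", "deviation": 0.7},
]))  # -> 6.1
```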

Latency and error count are primarily assessed for AWS Lambdas. For other resources, additional saturation metrics such as CPU and memory will also be taken into account during the analysis.

Score Categories

The score is based on a scale of 0 to 10, which is broken down into three levels of deviation based on which metrics changed and by how much (a simple mapping is sketched after the list below).

  • Minimal (10): The release is performing as expected compared to prior deployments with similar metric behavior.

  • Moderate (5.0 to 9.9): The release is exhibiting inconsistent behavior, which means:

    • Latency is either slightly longer or shorter, and/or

    • Errors are slightly higher or lower.

  • Significant (0 to 4.9): The release distinctly varies from previous observed behavior, which means:

    • Latency is either considerably longer or shorter, and/or

    • Errors are considerably higher or lower.
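The sketch below maps an overall score to the deviation level above. The boundaries mirror the documented ranges; the function itself is hypothetical.

```python
# Illustrative sketch only: map an overall score (0-10) to its deviation level.
def deviation_level(score: float) -> str:
    if score >= 10:
        return "Minimal"      # performing as expected vs. prior releases
    if score >= 5.0:
        return "Moderate"     # slightly longer/shorter latency and/or more/fewer errors
    return "Significant"      # considerable change from previously observed behavior

print(deviation_level(9.2))  # -> Moderate
```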

We recommend reviewing all scorecards with a moderate or significant deviation rating to ensure changes aren't negatively impacting your resource. You can connect custom alerts to your preferred integration from the Notifications page.

Keep in mind that we rate the level of deviation based on each resource's own history. You might receive a scorecard of significant deviations with considerably more errors even though the resource still has a reasonable overall error rate; compared to its historical error trends, we flag the increase as a departure from the resource's norm.
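A minimal sketch of this history-relative grading, with made-up numbers and an assumed threshold (Sedai's actual statistical method is not published):

```python
# Illustrative sketch only: deviation is judged against the resource's own
# history, not an absolute error budget. A jump from ~2 to 10 errors per window
# is flagged even though 10 errors may be acceptable in absolute terms.
from statistics import mean, stdev

def is_departure_from_norm(current_errors: int, historical_errors: list[int],
                           z_threshold: float = 3.0) -> bool:
    baseline, spread = mean(historical_errors), stdev(historical_errors)
    return abs(current_errors - baseline) > z_threshold * max(spread, 1.0)

print(is_departure_from_norm(10, [1, 2, 3, 2, 1, 2]))  # -> True
```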

A preliminary score is generated as soon as enough data is available. However, the final score may vary significantly based on further analysis. The initial score reflects performance immediately following the release, but we continue to analyze the release for up to 24 hours to assess its performance in varying traffic scenarios.

Scores are based on an analysis of available data following a release. If there is insufficient traffic to complete the analysis, the score will be blank or unavailable:

  • Blank ( — ): Incomplete analysis; we will attempt to analyze performance for up to a week post-release.

  • Unavailable (N/A): Unsuccessful analysis; no data available from the 7 days post-release.
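The sketch below shows one way these two display states could be derived from the analysis outcome; the function and its parameters are hypothetical, not Sedai's implementation.

```python
# Illustrative sketch only: derive the displayed scorecard value.
from typing import Optional

def display_score(score: Optional[float], days_since_release: int) -> str:
    if score is not None:
        return f"{score:.1f}"   # a preliminary or final score exists
    if days_since_release <= 7:
        return "—"              # Blank: analysis still being attempted
    return "N/A"                # Unavailable: no data in the week post-release
```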

We will only analyze a resource's latest release. If a new release is detected while an analysis is already underway, the existing analysis will be completed as-is and a new analysis will automatically be triggered.

Understanding a Score's Context

You should factor in your release's goal when reviewing scores, since context matters. For example, the goal of a release might be a change in metric behavior that ultimately leads to an improvement in performance (for example, fewer errors and a shorter duration). If that was not your primary goal, however, those unexpected deviations could indicate a deeper underlying issue, such as code that is not functioning as intended.

Examples of expected scorecards:

  • If you're attempting to improve latency, a lower score that indicates significant deviations is a good thing.

  • If you're pushing a brand new feature that is graded with moderate deviations in errors, you likely want to dig in and look for ways to improve your code.

Examples of unexpected scorecards:

  • If you pushed an update with minimal deviations but you expected to improve the error count, you missed your goal.

  • If you pushed an update with an unexpected and significant decline in latency, you might have mistakenly broken the service and should revisit your code.

How to View Scorecards

Navigate to Release Intelligence from the side navigation. This page provides an overview of all of your accounts and summarizes their total scorecards broken down by the past week, the past month, and the past three months (note: scorecards are only saved in Sedai for 90 days).

Choose an account and select View all scorecards. The page will refresh to display a timeline of all scorecards by resource in that account, ordered by the most recent. You can narrow down the list of scorecards by using the filters on the lefthand side of the page, or search for a single resource to view all of its releases in the past three months.

The list groups scorecards by date. You can scroll back in time to look at older releases, or quickly jump to a specified date in the upper righthand corner of the screen (select Go to date).

The scorecard itself has three distinct sections:

  • The far left portion provides the overall score

  • The middle portion includes when the release occurred, as well as details about the score

  • The far right portion identifies the resource

Scorecards might include signal icons next to their score description if Sedai detects signal activity within 24 hours of the release. Open the scorecard's side drawer and go to the Related Signals tab for details.

Select a scorecard to open the side drawer with its details. The side drawer has four sections:

  • At the top, you can view a summary of the resource itself and copy its ARN code by clicking the blue link icon.

  • Below that is the overall score summary (depending on when you view a scorecard, this could be preliminary or final).

  • Following the summary is the Assessment tab. This shows each dataset that contributed to the overall score and identifies the level of deviation per metric as well as the amount of traffic during that time period. If the score is final, you can toggle to view the preliminary score and its respective datasets. You can also select a dataset to view its metric and traffic charts.

  • The Related Signals tab shows any activity (including availability, optimization, or SLO signals) Sedai detected within 24 hours of the release.

The preliminary score may vary from the final score. We recommend reviewing all datasets to understand what changed throughout the analysis time period. For example, there might have been low traffic during the period when the initial data was assessed. Metrics might fluctuate more in higher traffic scenarios, which is why we continue to analyze the release and provide a final score after more time passes.

Manual Scorecard Analysis

While Release Intelligence automatically generates scorecards for all Lambda serverless functions and EKS resources once a deployment is detected, you can manually trigger a scorecard analysis for any resource at any time.

There are two primary reasons you might want to do this:

  • The resource is not a Lambda serverless function or EKS resource. We are currently unable to autonomously detect deployments for other resource types, but we can still analyze their releases when provided with the resource name and release time.

  • There was insufficient data due to low traffic following the release. For example, your Lambda is a low-priority service that is infrequently used. When its last release was deployed, we were unable to analyze its performance because we didn't have any datasets in the week following the release.

In either scenario, you can manually trigger an analysis for an on-demand scorecard and select how much time post-release you want Sedai to review performance.

To trigger a scorecard analysis, from within an account list of scorecards select the Analyze release button in the upper righthand corner. The side drawer will pop out and allow you to select the resource type and enter the resource name.
