Sysdig | Alan Murphy
https://sysdig.com/blog/author/alan-murphy/

Announcing AWS Lambda Telemetry API Support for Sysdig Monitor
https://sysdig.com/blog/aws-lambda-telemetry/
Fri, 11 Nov 2022 05:15:00 +0000

The Sysdig Monitor Extension for AWS Lambda Telemetry API is now available to aid observability in serverless computing environments.

Observability in serverless computing environments, such as AWS Lambda, has always been a challenge. The very nature of serverless environments puts traditional observability tools at a disadvantage, for the following reasons:

  • Serverless computing typically neither requires nor offers a static runtime environment
  • Serverless functions are ephemeral
  • Functions often execute for only a very short duration
  • When and where serverless functions are executed is distributed

Cloud providers often export the equivalent of control plane metrics for serverless functions by exposing runtime metrics such as invocations, concurrency, duration, and error count, but access to those metrics has historically come at a literal cost in both consumption and latency. Enter the newly announced AWS Lambda Telemetry API, which aims to address both problems by giving direct access to serverless function metrics at runtime.

AWS Lambda 101

AWS Lambda is a serverless service that lets you run code (or functions) on demand without provisioning runtime environments (such as Amazon EC2, Amazon EKS, or Kubernetes). Lambda functions range from the very straightforward (a Hello World Python script, for example, which when invoked simply returns the ubiquitous “Hello World” via HTTP along with an HTTP 200 response code) to functions that integrate directly with other AWS services, such as Lambda functions used to build and execute Amazon Alexa skills. For more information on AWS Lambda services, please visit the AWS Lambda services home page.

Although AWS Lambda does not require a pre-configured runtime, Lambda functions do still run in an execution environment. This execution environment also allows you to attach external code, tools, and post-processors, called Lambda Layers, to Lambda functions in that same runtime environment. Lambda Layers is a highly flexible system for acting on both the outcome of a Lambda function as well as actions around the execution of that function, such as extracting control plane events from the execution of each Lambda function.

Sysdig Monitor Lambda Extension

Today AWS announced the availability of the AWS Lambda Telemetry API, which gives providers such as Sysdig the ability to create specific telemetry tools that run alongside the respective Lambda functions, packaged as Lambda Layers. These tools, or extensions, can be attached to each Lambda function to consume telemetry data, such as function events, function start time, function end time, and function errors. When a Lambda function executes, the extension is executed along with that function.

Along with the general availability of AWS Lambda Telemetry API, Sysdig is excited to announce preview availability of the Sysdig Monitor Lambda Extension for AWS Lambda Telemetry API. This tool will allow Sysdig to generate and collect real-time metrics based on event data coming from each individual Lambda function and push those metrics directly into your Sysdig Monitor account. That allows you to consume near real-time serverless metrics along with your other core observability metrics.

How does it work?

The Sysdig Monitor Extension for AWS Lambda generates four metrics which are critical for monitoring a serverless control plane environment:

  • aws_lambda_invocations
  • aws_lambda_duration
  • aws_lambda_postruntime_extensions_duration
  • aws_lambda_errors

The AWS Lambda Telemetry API allows Sysdig to plug directly into the event stream from Lambda functions.

It pushes Lambda function events in a format compatible with OpenTelemetry events for consumption by other extensions, so that observability tools can natively ingest those events and, in Sysdig’s case, convert them into actionable metrics.

Given that this is a real-time architecture, events are generated every time a Lambda function is executed, resulting in immediate access to Lambda events at runtime in an OTEL-compatible format. As Lambda functions are invoked, events are triggered and exposed via the Telemetry API, allowing Sysdig to consume those events directly per-function at execution time of each function. As Sysdig consumes those events, real-time metrics, such as invocation count and function duration, are generated and pushed to the Sysdig Monitor platform.
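
To make the event-to-metric step more concrete, here is a minimal Python sketch. It is not Sysdig's implementation. The event shapes (a `type` such as `platform.start` or `platform.runtimeDone`, and a `record` carrying `status` and `metrics.durationMs`) are assumed from the public Telemetry API event schema, and both the function name `aggregate` and the mapping of events to metric names are assumptions made for illustration. The `aws_lambda_postruntime_extensions_duration` metric is left out because this post does not describe how it is derived.

```python
def aggregate(events):
    """Fold Telemetry-API-style events into simple per-batch metrics.

    Assumed mapping (illustrative only, not Sysdig's actual logic):
      platform.start       -> one invocation
      platform.runtimeDone -> adds durationMs, counts an error if status != "success"
    """
    totals = {
        "aws_lambda_invocations": 0,
        "aws_lambda_duration": 0.0,  # summed milliseconds
        "aws_lambda_errors": 0,
    }
    for event in events:
        event_type = event.get("type")
        if event_type == "platform.start":
            totals["aws_lambda_invocations"] += 1
        elif event_type == "platform.runtimeDone":
            record = event.get("record", {})
            totals["aws_lambda_duration"] += record.get("metrics", {}).get("durationMs", 0.0)
            if record.get("status") != "success":
                totals["aws_lambda_errors"] += 1
    return totals


# Hypothetical sample events, shaped like the assumed schema above.
sample_events = [
    {"type": "platform.start", "record": {"requestId": "a"}},
    {"type": "platform.runtimeDone",
     "record": {"requestId": "a", "status": "success", "metrics": {"durationMs": 100.0}}},
    {"type": "platform.start", "record": {"requestId": "b"}},
    {"type": "platform.runtimeDone",
     "record": {"requestId": "b", "status": "error", "metrics": {"durationMs": 50.5}}},
]
print(aggregate(sample_events))
```

Counters like these are cheap to update per event, which is what allows an extension to keep up with high-volume function workloads before pushing a batch to the monitoring backend.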

The result is lower latency in receiving metrics from Lambda functions, something that is critical in high-volume function workloads and environments. If you have a low-latency requirement for a system where other components may depend on the outcome of Lambda function events, the ability to funnel those events into a real-time observability stack, like Sysdig Monitor, is critical. By deploying the Sysdig Monitor Extension for AWS Lambda Telemetry API, users can immediately plug into that real-time event stream and lower the latency of receiving metrics into the observability stack.

For Sysdig Monitor users who are already ingesting metrics from AWS CloudWatch Metric Streams, the addition of metrics generated from the AWS Lambda Telemetry API gives access to very specific Lambda metrics at a much lower latency and a much greater fidelity. The list of metrics generated from the Lambda Telemetry API is not meant to be exhaustive nor to match the complete list of metrics available when streaming AWS Lambda metrics through AWS CloudWatch Metric Streams, but rather provide a highly focused list of critical metrics for real-time function monitoring.

For customers who may not wish to consume metrics through AWS CloudWatch Metric Streams, direct access to these critical Lambda function metrics through the Telemetry API and via the Sysdig Monitor extension can provide the required observability in near real-time with simplified instrumentation.

For more information, check the official documentation.

Sysdig Monitor Extension benefits

The goal of any observability tool is to drive lower MTTR with greater flexibility throughout the observability stack. In other words, get as close to the event source as possible in as little time as possible. With the preview availability of the Sysdig Monitor Extension for AWS Lambda Telemetry API, Sysdig users will immediately see the following benefits:

Simplified Instrumentation

By exposing platform metrics available directly via the Telemetry API, Sysdig users have a more direct path for ingesting critical Lambda serverless metrics into the Sysdig Monitor platform. This leads to lower latency and lower MTTR for real-time serverless functions.

By default, the Sysdig Monitor Extension for AWS Lambda Telemetry API pushes event-generated metrics to the Sysdig Monitor platform every 10 seconds, as opposed to the normal one-minute cadence of AWS Lambda metrics delivered through AWS CloudWatch Metric Streams. Quicker access to function state allows lower MTTR for Lambda functions, which becomes critical for real-time event-driven systems.
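
As a rough illustration of what a fixed push cadence means in practice, the Python sketch below accumulates metric values and hands back a batch once the flush interval has elapsed. The class `MetricBuffer` and its methods are hypothetical and are not the extension's actual code; only the 10-second default comes from this post. The caller passes the current time explicitly so the behavior is easy to follow and to test.

```python
class MetricBuffer:
    """Accumulate metric values and release them as a batch every flush_interval_s seconds."""

    def __init__(self, flush_interval_s=10.0):
        self.flush_interval_s = flush_interval_s
        self.pending = {}
        self.last_flush = 0.0

    def record(self, name, value, now):
        """Add a value; return a batch if the interval has elapsed, else None."""
        self.pending[name] = self.pending.get(name, 0.0) + value
        if now - self.last_flush >= self.flush_interval_s:
            return self.flush(now)
        return None

    def flush(self, now):
        """Hand back everything collected so far and start a new window."""
        batch, self.pending = self.pending, {}
        self.last_flush = now
        return batch


buffer = MetricBuffer()
first = buffer.record("aws_lambda_invocations", 1, now=3.0)    # inside the window
second = buffer.record("aws_lambda_invocations", 1, now=9.9)   # still inside the window
third = buffer.record("aws_lambda_invocations", 1, now=10.0)   # interval elapsed, batch released
```

With a 10-second window, a datapoint waits at most about 10 seconds before it is pushed, compared with up to a minute at a one-minute cadence.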

Enhanced observability

The AWS Lambda Telemetry API provides deeper insights into the runtime performance and phases of the Lambda execution environment lifecycle (initialization, invocation, etc.). By generating metrics based on real-time events via the Telemetry API, Sysdig provides greater insight into the performance and states of AWS Lambda functions:

  • Better cold start visibility (through events related to init phase)
  • Understanding initialization success and behavior (during init phase)
  • Visibility into issues which often occur during the invoke phase due to timeout/reset (through “phase” visibility on init events).

OpenTelemetry (OTEL) compatibility

Today, Sysdig supports the OTEL format implemented by AWS CloudWatch Metric Streams natively, receiving JSON-formatted metrics directly from AWS Kinesis Firehose and converting the metrics information contained in that data format into Prometheus-native metrics. With the addition of the Sysdig Monitor Lambda Extension, Sysdig extends that support to real-time events generated from Lambda functions by the AWS Lambda Telemetry API. The OTEL-compatible Sysdig Monitor Extension easily and efficiently consumes Lambda events via the Telemetry API to gather insights for Lambda function execution and environments.
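
As a rough illustration of that conversion, the Python sketch below turns one record in the CloudWatch Metric Streams JSON output format into Prometheus-style samples. The field names (`namespace`, `metric_name`, `dimensions`, `value`) come from that JSON format. The naming rule (lowercase, underscores, statistic as a suffix) and the helper names `metric_base_name` and `to_prometheus_samples` are assumptions for illustration, not Sysdig's actual conversion logic.

```python
import json
import re


def metric_base_name(namespace, metric_name):
    """Build a snake_case base name, e.g. AWS/Lambda + Errors -> aws_lambda_errors."""
    raw = f"{namespace}_{metric_name}"
    # Split CamelCase words: "ConcurrentExecutions" -> "Concurrent_Executions".
    raw = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", raw)
    # Collapse anything that is not a letter or digit into a single underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_").lower()


def to_prometheus_samples(record_json):
    """Map one metric stream record to {sample_name: (value, labels)}."""
    record = json.loads(record_json)
    base = metric_base_name(record["namespace"], record["metric_name"])
    labels = dict(record.get("dimensions", {}))
    # Each statistic (count, sum, max, min) becomes its own sample.
    return {f"{base}_{stat}": (value, labels) for stat, value in record["value"].items()}


# Hypothetical record; account, function name, and values are made up.
example_record = """
{"metric_stream_name": "example-stream", "account_id": "123456789012",
 "region": "us-east-1", "namespace": "AWS/Lambda", "metric_name": "Errors",
 "dimensions": {"FunctionName": "checkout"}, "timestamp": 1668143700000,
 "value": {"count": 4.0, "sum": 3.0, "max": 1.0, "min": 0.0}, "unit": "Count"}
"""
samples = to_prometheus_samples(example_record)
```

Keeping each statistic as a separate sample makes it straightforward to build ratios later, such as errors divided by invocations.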

Greater Flexibility

The Sysdig Monitor Extension for AWS Lambda is designed to augment the ingestion of the full metrics suite provided by AWS Lambda via AWS CloudWatch Metric Streams. It can also be used to ingest only the invocation-level metrics required for real-time monitoring of AWS Lambda functions, giving you the flexibility to choose the types of metrics you want to collect and from which source.

Availability

The Sysdig Monitor Extension for AWS Lambda Telemetry API is currently available in Preview mode. For more information on how to access, install, configure, and use the extension, please visit https://aws.amazon.com/blogs/compute/introducing-the-aws-lambda-telemetry-api/

You can try it for free right now by signing up for a 30-day trial and choosing an AWS region during the sign-up process.

Collect critical AWS metrics faster with Sysdig
https://sysdig.com/blog/aws-cloudwatch-stream-support-monitor/
Thu, 14 Jul 2022 19:01:15 +0000

Today, we are excited to announce support for Amazon CloudWatch Metric Streams. This support will enable our customers to ingest metrics from AWS CloudWatch in real time, increase metric and state fidelity, reduce time to ingestion and MTTR, and support cloud metrics at scale without the need to customize or re-configure new AWS service metrics.

In this blog, we dig deep into:

  • New support for ingesting real-time metrics for your AWS services via Amazon CloudWatch Metric Streams
  • The value of event-based metric ingestion via a push vs. pull model
  • Using real-time metrics to improve MTTR

For the past few years, Sysdig Monitor has supported ingesting metrics and metadata from AWS services by pulling from Amazon CloudWatch. This gives Sysdig Monitor customers granular control over which AWS CloudWatch metrics are ingested and how, but it does keep the frequency of that ingestion at arm's length: any pull-based model depends on how frequently metrics are ingested and made available to the monitoring system. When dealing with more critical systems, or systems where alerting needs to be as close to real time as possible, a different model is required: a push model where the underlying metrics acquisition engine can notify the monitoring system at the frequency required by the systems being monitored. Enter Amazon CloudWatch Metric Streams.

Launched in 2021, Amazon CloudWatch Metric Streams is a feature of Amazon CloudWatch which allows customers to send near real-time, continuous metrics from over 70 AWS services to external monitoring platforms such as Sysdig Monitor. This allows AWS administrators and operators to aggregate those near real-time metrics into systems used for monitoring other parts of the infrastructure, such as Kubernetes environments, and tie those metrics together to create a holistic view of application health and performance. Sysdig Monitor with Amazon CloudWatch Metric Streams sourced metrics and metadata enables you to continuously ingest application and infrastructure metrics, along with service and infrastructure metadata, providing insight into AWS cloud usage, performance, and overall system health of your applications and services.

Architecture showing how Amazon CloudWatch sends metrics to Sysdig Monitor via Kinesis.

The value of near real-time

The ultimate goal of any monitoring and alerting platform is to provide administrators and operators with immediate access to real-time system status:

  • How is my infrastructure currently handling load?
  • How are my applications performing?
  • When do I need to deploy a scale event or, even better, be notified when an autonomous scale event is deployed to handle a change in performance and load?

All of these questions, and the systems that provide resolution, point to a very important metric: how to minimize MTTR (Mean Time to Resolution) for any event. An event can be a failed system, an application that’s slow to respond under load, or even a drastic increase in cloud consumption costs due to any issue that’s managed by the cloud platform. The more we can reduce MTTR for any event, the more we can reduce costs, system churn, and ultimately customer unhappiness.

Traditional pull-based metrics ingestion can help with MTTR, but there will always be an inherent latency in that system. First, a service will need to generate a metric, then another service will need to query the source service for the latest metric state. Then, that receiving service will need to process, analyze, and act on any event that’s detected as part of that metric set. While we can minimize that latency by removing as many pieces in the communication chain as possible, the freshness of our data will always be limited by the cadence at which our receiving system polls for metric state changes.

Pull model vs. push model. The push model is preferable, as data is sent in real time.

In contrast, a streaming, or push, model allows the metric source to dictate the frequency at which it sends us metrics and state changes, which can vary based on the critical nature of the source service. Amazon CloudWatch Metric Streams hands control over metric delivery and cadence to the source AWS service, so the services can decide how frequently to deliver critical metrics, which metrics are the most important to deliver at any given time, and which services are considered more critical than others. Amazon ECS, for example, publishes metrics every minute by default, whereas Amazon EC2 sends metrics every five minutes by default. Using Amazon CloudWatch Metric Streams, the monitoring endpoint (Sysdig Monitor, in this case) will receive metrics for each as they arrive in CloudWatch: every minute and every five minutes, respectively.
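
The latency difference between the two models can be put into numbers with a small Python sketch. It assumes an idealized poller that queries at fixed multiples of its interval and ignores network and processing time; the function name `pull_delay` and the five-minute polling interval are illustrative assumptions, not measured values.

```python
import math


def pull_delay(publish_time_s, poll_interval_s):
    """Seconds a datapoint waits until the next poll at a multiple of poll_interval_s."""
    next_poll = math.ceil(publish_time_s / poll_interval_s) * poll_interval_s
    return next_poll - publish_time_s


# A source publishing every 60 seconds, polled every 300 seconds (hypothetical numbers).
publish_times = [60, 120, 180, 240, 300]
delays = [pull_delay(t, 300) for t in publish_times]
average_delay = sum(delays) / len(delays)
print(delays, average_delay)
```

In a push model the same datapoints are forwarded as they are published, so the added wait in this idealized picture drops to roughly zero and only transit and processing time remain.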

Configuring Amazon CloudWatch Metric Streams in Sysdig Monitor using CloudFormation templates

Configuring Amazon CloudWatch Metric Streams is done in two phases:

  1. Configuring a new Amazon CloudWatch Metric Stream in your AWS account
  2. Integrating your AWS account with Sysdig Monitor for real-time status monitoring and consumption of additional AWS resource information

Creating a new Stream can be done via a CloudFormation Template which is linked from the Sysdig UI. To get started, click on the “Start Installation” button and select CloudWatch Metric Streams -> Use CloudFormation Template. That will open a new window or tab which points to the pre-configured CloudFormation Template used to configure a new Stream and point that Stream at the Sysdig HTTPS receiver.

Connecting a new AWS account to Sysdig Monitor using a CloudFormation template.

Overview of the AWS CloudFormation template for creating a new Stream stack.

For more information on AWS account integration with Sysdig and configuring Amazon CloudWatch Metric Streams, please refer to the official documentation.

Using Sysdig Monitor out-of-the-box dashboards and alerts for Amazon CloudWatch Metric Streams

Once you have configured Amazon CloudWatch Metric Streams in Sysdig Monitor, our pre-built dashboards and alerts will be automatically available for you to start using right away. You can use them as-is or customize them to your heart’s content.

Dashboard in Sysdig Monitor showing metrics such as Container Count by Task, request time, and CPU usage.

An alert in Sysdig Monitor for HighFunctionErrorRate, which fires when (aws_lambda_errors_sum / aws_lambda_invocations_sum) > 0.15.
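
The HighFunctionErrorRate condition above can be expressed as a short Python sketch. The 0.15 threshold and the two metric names come from the alert itself; the function name and the choice to stay silent when there are no invocations are assumptions for illustration, not how Sysdig Monitor evaluates alerts internally.

```python
def high_function_error_rate(errors_sum, invocations_sum, threshold=0.15):
    """Return True when errors / invocations exceeds the threshold.

    errors_sum and invocations_sum stand in for aws_lambda_errors_sum and
    aws_lambda_invocations_sum over the same evaluation window.
    """
    if invocations_sum <= 0:
        # Assumption: with no invocations there is no error rate to alert on.
        return False
    return errors_sum / invocations_sum > threshold


fires = high_function_error_rate(errors_sum=20, invocations_sum=100)
quiet = high_function_error_rate(errors_sum=10, invocations_sum=100)
```

Alerting on the ratio rather than the raw error count keeps the alert meaningful for both low-traffic and high-traffic functions.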

Conclusion

As a powerful companion to traditional Amazon CloudWatch metrics, Amazon CloudWatch Metric Streams offers Sysdig Monitor users the ability to ingest and consume near real-time metrics from many AWS cloud platform and application services. With ingestion support for 70+ AWS services (at launch time), along with a set of curated out-of-the-box, per-service dashboards and alerts, Sysdig Monitor allows you to deploy a turn-key platform for all of your infrastructure and application metrics, regardless of location, environment, or cloud provider. Together, these metrics can be used to create a single pane of glass monitoring tool for your entire infrastructure, and with Amazon CloudWatch Metric Streams as the source for one major component of that system you’ll be able to increase visibility while reducing MTTR across your entire organization.

You can try it for free right now by signing up for a 30-day trial and choosing an AWS region during the sign-up process.
