Introducing New Investigation Features for Sysdig Secure
Harry Perks, Sysdig blog, 17 June 2024, https://sysdig.com/blog/sysdig-introduces-new-cdr-investigation-features/

Cloud migration and continuous innovation provide organizations with substantial gains in speed, scalability, and cost (to name a few). Most security teams have no choice but to make the jump to the cloud, in at least some capacity, to support and protect this rapidly expanding attack surface. 

But organizations and security teams aren’t alone. Threat actors have been readily adapting their craft to take advantage of cloud speed. As a result, cloud attacks happen fast, rapidly weaving through a target’s cloud estate and drawing on extensive capabilities to achieve their goals. 

A prime example is the SCARLETEEL attack, which can infiltrate an organization, execute cryptominers, uncover cloud credentials, pivot to other cloud accounts, and ultimately exfiltrate proprietary data – all in just 220 seconds. Investigating cloud attacks like SCARLETEEL has traditionally been a laborious, error-prone, and manual process. The odds are stacked against defenders, and the reality is that security teams are often unable to investigate threats before the attack completes. 

That’s why the 5/5/5 Benchmark for Cloud Detection and Response – the only industry standard for cloud security – sets the pace: five seconds to detect, five minutes to triage, and five minutes to respond. That leaves you just five minutes to investigate a cloud attack and head it off before it completes.

What’s new: Enhanced investigations capabilities

Today, Sysdig is streamlining cloud detection and response (CDR) use cases by automating the collection and correlation of events, posture, and vulnerabilities to identities. The cloud context these capabilities provide is unparalleled. An interactive visualization of this information helps analysts instantly conceptualize attacks, unlocking five-minute investigations across the most advanced threats. 

The key new capabilities enhancing investigations include:

Attack chain visualization 

Security teams can leverage any alert or suspicious finding as a starting point to launch an investigation with the Sysdig Cloud Attack Graph. The graph provides attack chain visualization and empowers security analysts to rapidly understand the relationships between resources, and their implications for the attack chain across any cloud environment.

Overlaying threat context with the Sysdig security graph gives responders a quick understanding of the blast radius of an attack.

Sysdig’s attack chain visualization accelerates investigations by automatically correlating cloud and workload events to identities. Deep context from command history, as well as network and file activity, is easily gleaned from the overlays. Sysdig’s automated captures enable analysts to dig deeper by automatically tying digital forensic evidence to the events. Real-time context is combined with vulnerabilities and misconfiguration findings to provide a comprehensive and holistic view of a threat. To further simplify workflows, and narrow an investigation window when necessary, all investigations are MITRE-mapped and filterable. 

Contextualize posture, vulnerabilities, and deep runtime insights, including activity audit and process trees.

Real-time identity correlation 

At their core, all cloud attacks revolve around identities. Whether it be human or machine, one or many, analysts need a way to stitch suspicious findings to identities and their associated behaviors. Sysdig’s enhanced investigation capabilities automatically correlate cloud events with enriched identity data. Using attack chain visualization, analysts can rapidly understand suspicious identity behaviors such as unusual logins, impossible travel scenarios, and malicious IP addresses. With this context, teams can rapidly understand the who, what, where, and how of threat actors in their infrastructure.
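
The post doesn’t say how Sysdig detects impossible travel. As an illustration only, here is a minimal Python sketch of one common heuristic: compare consecutive logins for the same identity and flag pairs whose implied travel speed exceeds what an airliner could manage. It assumes each login already carries a timestamp and geolocated coordinates; the Login type, the impossible_travel function, and the 900 km/h threshold are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical heuristic for illustration; not Sysdig's implementation.
EARTH_RADIUS_KM = 6371.0


@dataclass
class Login:
    identity: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(logins, max_kmh=900.0):
    """Return (previous, current) login pairs per identity whose implied speed exceeds max_kmh."""
    flagged = []
    last_seen = {}
    for login in sorted(logins, key=lambda item: item.when):
        prev = last_seen.get(login.identity)
        if prev is not None:
            hours = (login.when - prev.when).total_seconds() / 3600
            km = haversine_km(prev.lat, prev.lon, login.lat, login.lon)
            # Any distance covered in zero elapsed time is treated as impossible.
            if km > 0 and (hours <= 0 or km / hours > max_kmh):
                flagged.append((prev, login))
        last_seen[login.identity] = login
    return flagged


def at(hour):
    return datetime(2024, 6, 17, hour, tzinfo=timezone.utc)


logins = [
    Login("alice", at(10), 51.5074, -0.1278),    # London
    Login("alice", at(11), -33.8688, 151.2093),  # Sydney, one hour later
    Login("bob", at(9), 51.5074, -0.1278),       # London
    Login("bob", at(12), 48.8566, 2.3522),       # Paris, three hours later
]

for prev, cur in impossible_travel(logins):
    print(f"{cur.identity}: impossible travel between {prev.when:%H:%M} and {cur.when:%H:%M}")
```

A real system would also need to account for VPNs, shared egress IPs, and geolocation error, which this sketch ignores.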

Understand the activity happening in your cloud environments with identity investigation.

This visibility also helps teams rapidly rightsize excessive permissions, for example by restoring an identity’s permissions to what they were before a malicious adversary compromised it.

Understand an attached role and investigate it further.

Investigation workflow optimization 

A single purpose-built platform can break down silos and streamline downstream activities. Security becomes a critical and valuable business partner by delivering relevant, high-context guidance to key stakeholders. Rapid investigation findings enable prescriptive guidance for response actions across incident response, platform, developer, and DevSec teams. These accelerated findings allow response teams to initiate a response within five minutes, in line with the five-minute response standard outlined in the 5/5/5 Benchmark.

Closing the loop, the incident debrief findings these investigations provide (such as which misconfigurations, permissions, and vulnerabilities were abused to carry out the attack) can then be shared to tune and harden preventive controls. This focus on continuous improvement of preventive controls helps keep incidents from recurring, reducing organizational cloud risk.

Outpace cloud attacks with Sysdig’s enhanced investigations

Accelerating cloud detection and response is critical to combating modern attacks. The automation-fueled pace of cloud attacks means that investigations must move even faster. Sysdig’s enhanced investigations help security teams by increasing efficiency, closing skill gaps, and empowering security and platform teams to make better-informed decisions, faster.

Join our upcoming webinar, Cloud Investigations in Just 5 Minutes, for a discussion with security experts on the evolution of cloud detection and response and its impacts. 

Sysdig Advisor: Making Kubernetes troubleshooting effortless
Harry Perks, Sysdig blog, 16 May 2022, https://sysdig.com/blog/kubernetes-troubleshooting-advisor/

The cloud, Kubernetes, CI/CD, DevOps, GitOps… the last five years have seen a huge transformation in how organizations are architecting and shipping applications. It’s hard to keep up with the pace and learn all of this new tech!

Nearly 55% of respondents to Canonical’s 2021 Kubernetes and cloud native operations report highlighted how the lack of sufficient in-house skills and people power is the biggest challenge that Kubernetes brings to businesses. Let’s be clear – the shift to cloud native, when executed well, allows businesses to enjoy the fruits of their labor, but the human factor is often overlooked.

In practice, platform teams operate Kubernetes to provide an environment where application developers deploy their applications. When these teams lack the necessary skills, problems turn into breached SLAs, and that costs money: organizations often owe service credits or refunds when they miss availability targets of four or five nines (99.99% or 99.999%). Automation helps, but when problems arise it’s just as important to know where to look and what to look for as it is to know how to fix the issue.
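
To put “four or five nines” in concrete terms, here is a short Python calculation of the downtime each target allows. It assumes a 365-day year (plus a 30-day window for comparison) and ignores how a particular SLA measures availability, such as excluded maintenance windows; downtime_budget_minutes is a hypothetical helper. Four nines works out to roughly 52.6 minutes a year and five nines to about 5.3 minutes.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct, period_days=365):
    """Maximum downtime, in minutes, that still meets an availability target over a period."""
    return period_days * MINUTES_PER_DAY * (1 - availability_pct / 100)


for nines, target in (("four nines", 99.99), ("five nines", 99.999)):
    yearly = downtime_budget_minutes(target)
    monthly = downtime_budget_minutes(target, period_days=30)
    print(f"{nines} ({target}%): {yearly:.2f} min/year, {monthly:.2f} min per 30 days")
```

Each extra nine cuts the budget by a factor of ten, which is why fast troubleshooting matters so much at these targets.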

Accelerate troubleshooting by up to 10x

We’re excited to announce Advisor, a new Kubernetes troubleshooting product in Sysdig Monitor that accelerates troubleshooting by up to 10x. Advisor displays a prioritized list of issues and relevant troubleshooting data to surface the biggest problem areas and accelerate time to resolution.

Sysdig Monitor Advisor - Curated problem prioritization. Highlights problems like CrashLoopBackOff, Pending Pods, CPU Throttling, Node Pressure
Curated problem prioritization focuses your attention faster, so you can identify what’s on fire and what needs to be addressed soon.

Troubleshooting Kubernetes needs more than just metrics. For example, when debugging a CrashLoopBackOff: what was the container’s last state? What are the events? What do the container logs say? When an issue is identified, Advisor gives you all the information you need to solve it, removing the need to context-switch between logs, dashboards, and the command line (kubectl).
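
Advisor surfaces this for you. For readers curious where a container’s “last state” lives, here is a minimal Python sketch that reads it from a Pod object, such as the JSON returned by kubectl get pod <name> -o json. The crashloop_report function and the sample pod are hypothetical; the field names (containerStatuses, state.waiting.reason, lastState.terminated, restartCount) follow the Kubernetes Pod status API.

```python
def crashloop_report(pod):
    """List containers in CrashLoopBackOff with the reason and exit code of their last termination."""
    findings = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting", {})
        if waiting.get("reason") != "CrashLoopBackOff":
            continue
        terminated = status.get("lastState", {}).get("terminated", {})
        findings.append({
            "container": status["name"],
            "restarts": status.get("restartCount", 0),
            "last_reason": terminated.get("reason"),  # e.g. "OOMKilled" or "Error"
            "last_exit_code": terminated.get("exitCode"),
        })
    return findings


# Trimmed-down, hypothetical Pod object; real output has many more fields.
pod = {
    "metadata": {"name": "checkout-7d9f"},
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "restartCount": 6,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "restartCount": 0, "state": {"running": {}}},
        ]
    },
}

for finding in crashloop_report(pod):
    print(finding)
```

The last termination reason is often the quickest clue: an OOMKilled container points at memory limits, while a generic Error usually means reading the previous container’s logs.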

All of this information is actionable. The simple user interface surfaces all the important details in a single unified tool, with a curated, actionable set of remediation steps for Kubernetes breakages. No more digging through wikis, Stack Overflow, and blogs.

Sysdig Monitor Advisor - Identify and understand why a pod is in CrashLoopBackOff in 15 seconds

As soon as the agent is installed, Advisor will automatically identify problems by looking through thousands of different data points with zero configuration required.

Richest data for troubleshooting every type of problem

Of course, things can go wrong for a multitude of reasons, and Advisor is a powerful tool for troubleshooting any kind of problem. You can browse your infrastructure, logically grouped by cluster, application, workload, and pod, to understand what’s happening from a 10,000-foot view all the way down to deep network, file, and process metrics derived from syscalls for any pod in your environment. And it’s easy to see the right data: contextualize things with open alert incidents, container logs, object descriptions (e.g., kubectl describe pod), a feed of events from Kubernetes and containers, and kube-state-metrics.

Because Sysdig Advisor is doing the work for you, developers and other team members who don’t normally get access to kubectl can take advantage of all this information, too. No need to convince your security team to make an exception anymore when troubleshooting a Kubernetes application issue.

Examples of dashboard panels in Sysdig Monitor: Golden Signals and Requests by Status Code.
Zero app instrumentation golden signals, network, file, and process telemetry

And for platform teams, Advisor helps ensure your cluster is correctly sized. Be confident that you have enough capacity for new workloads, and that existing workloads aren’t hoarding resources and wasting infrastructure (and money!).

Quickly monitor cluster capacity health, and identify resource status of workloads

Advisor is now available to all customers at no additional cost, and additional troubleshooting features will be added over the coming weeks. We’re always happy to hear how our products are helping you with operational excellence. Reach out to your Sysdig contact or chat with us in-app. Speedy troubleshooting!

Write Prometheus queries faster with our new PromQL Explorer
Harry Perks, Sysdig blog, 3 March 2021, https://sysdig.com/blog/write-prometheus-queries-faster-with-our-new-promql-explorer/

We are announcing the new PromQL Explorer for Sysdig Monitor that will help you easily understand your monitoring data.

The new PromQL Explorer allows you to write PromQL queries faster by automatically identifying the common labels among different metrics. It also allows you to interactively modify the PromQL results using visual label filtering.

It’s all about labels

Sysdig’s native compatibility with Prometheus monitoring makes it possible to use Prometheus’s powerful query language, PromQL, in Sysdig Dashboards & Alerts. PromQL can query metrics using advanced functions, operators, and boolean logic.

Prometheus identifies each time series by its metric name and a set of key-value pairs (labels):

<metric name>{<label name>=<label value>, ...}

For example, a metric tracking HTTP requests by different labels may have the following time series:

http_requests_total{status_code="200", method="get", handler="/users"}
http_requests_total{status_code="200", method="post", handler="/order"}
http_requests_total{status_code="500", method="post", handler="/order"} 

That allows you to filter those HTTP requests, enabling queries like “give me the requests with a 200 OK response for the handler /users”.
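
In PromQL, that query is the selector http_requests_total{status_code="200", handler="/users"}. To illustrate what the selector does, here is a minimal Python sketch of equality matching only (the = matcher; !=, =~ and !~ are left out). The matches helper and the list-of-tuples representation are hypothetical and say nothing about how Prometheus actually stores data.

```python
def matches(labels, selector):
    """Equality matchers only: every selector label must have exactly the given value.

    As in PromQL, a label that is absent is treated as the empty string.
    """
    return all(labels.get(name, "") == value for name, value in selector.items())


# The three example series from the post, as (metric name, labels) pairs.
series = [
    ("http_requests_total", {"status_code": "200", "method": "get", "handler": "/users"}),
    ("http_requests_total", {"status_code": "200", "method": "post", "handler": "/order"}),
    ("http_requests_total", {"status_code": "500", "method": "post", "handler": "/order"}),
]

# Equivalent of: http_requests_total{status_code="200", handler="/users"}
selected = [
    labels
    for name, labels in series
    if name == "http_requests_total" and matches(labels, {"status_code": "200", "handler": "/users"})
]
print(selected)
```

Dropping the handler matcher would select both series with a 200 status code, which is how label filters narrow or widen a query.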

Labels are a fundamental element of the Prometheus data model because PromQL lets you filter and aggregate based not only on metrics but also on labels. To do this effectively, you need to know the labels of each metric you’re trying to combine in a PromQL query, because two metrics with different label sets can only be combined if you explicitly choose the labels they have in common (for example, with on(...)).
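
To make vector matching concrete, here is a minimal Python sketch of a one-to-one division with on(...), plus a helper that finds the label names shared by two sets of series. It is a simplification (no group_left/group_right, no metric-name handling, no zero-divisor handling), and common_labels, divide_on, and the sample series are hypothetical.

```python
def common_labels(a, b):
    """Label names present on every series of both inputs (candidates for on(...))."""
    label_sets = [set(labels) for labels, _ in a + b]
    return set.intersection(*label_sets) if label_sets else set()


def divide_on(a, b, on):
    """One-to-one `a / on(<on>) b`: pair series whose `on` label values are equal.

    As with PromQL one-to-one matching using on(), the result keeps only the `on`
    labels. Duplicate keys on the right (many-to-many) are not detected here.
    """
    def key(labels):
        return tuple(labels.get(name, "") for name in on)

    right = {key(labels): value for labels, value in b}
    result = []
    for labels, value in a:
        k = key(labels)
        if k in right:  # series without a partner are dropped, as in PromQL
            kept = {name: labels[name] for name in on if labels.get(name)}
            result.append((kept, value / right[k]))
    return result


# Hypothetical series A and B that share only the container_id label.
a = [({"container_id": "c1", "host": "h1"}, 0.5), ({"container_id": "c2", "host": "h1"}, 1.0)]
b = [({"container_id": "c1", "pod": "p1"}, 2.0), ({"container_id": "c2", "pod": "p2"}, 4.0)]

shared = common_labels(a, b)
print(shared)
print(divide_on(a, b, sorted(shared)))
```

Matching on a label only one side carries (host, here) pairs nothing, which is the same silent empty result you would get from PromQL.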

PromQL Explorer to the rescue

We’re excited to announce the PromQL Explorer, a new feature in Sysdig Monitor to query metrics using PromQL, understand the labels and values, and create queries faster before using them in Dashboards & Alerts.

PromQL can be used not only with metrics collected from Prometheus endpoints, but also with Sysdig native metrics collected by the agent.

using PromQL queries in the new Sysdig Monitor PromQL Explorer
Understand the time series associated with Sysdig native metrics

Using label filtering helps you visualize the common labels between metrics, which is key when combining multiple metrics.

Use the label filtering to identify common labels between queries for vector matching. In the above example, you can see that A and B metrics have only the container_id label in common.
Animated image showing how to create an alert directly from the new Sysdig Monitor PromQL Explorer
Easily use your new PromQL Query in a Dashboard or Alert

And more exciting things coming up

Over the next few weeks, Sysdig will be introducing new features that will help you write PromQL queries even faster.

By combining our unique ServiceVision™ capability with PromQL, Sysdig will automatically enrich your metrics with Kubernetes and application context without needing to instrument additional labels in your environment. This reduces operational complexity and cost since the enrichment takes place in our metric ingestion pipeline after time series have been sent to our backend.

Thus, you could go from this query:

sum by (cluster,owner_name) (sum by (cluster,namespace,pod) (sysdig_container_cpu_cores_used * on (container_id) group_right kube_pod_container_info) * on (cluster,namespace,pod) group_right kube_pod_owner{owner_kind="Node"}) / on (cluster, owner_name) group_left label_replace(kube_node_status_capacity_cpu_cores, "owner_name", "$1", "node", "(.*)") * 100

To the following one, which is much simpler:

sum by (kube_cluster_name,kube_node_name) (sysdig_container_cpu_cores_used) * 100

PromQL queries will be simplified with ServiceVision™, making it much easier to filter or aggregate metrics by Kubernetes context.
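
The simpler query works because the joins move out of the query and into ingestion. Here is a minimal Python sketch of that idea, assuming a lookup table from container_id to cluster and node. CONTAINER_CONTEXT, enrich, and sum_by are hypothetical names, and this is not how ServiceVision is actually implemented.

```python
from collections import defaultdict

# Hypothetical lookup table a backend might maintain: container_id -> Kubernetes context.
CONTAINER_CONTEXT = {
    "c1": {"kube_cluster_name": "prod", "kube_node_name": "node-a"},
    "c2": {"kube_cluster_name": "prod", "kube_node_name": "node-a"},
    "c3": {"kube_cluster_name": "prod", "kube_node_name": "node-b"},
}


def enrich(samples, context):
    """Attach Kubernetes labels to each (labels, value) sample at ingestion time."""
    return [
        ({**labels, **context.get(labels.get("container_id"), {})}, value)
        for labels, value in samples
    ]


def sum_by(samples, by):
    """PromQL-style sum by (<by>): add up values that share the same `by` label values."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[tuple(labels.get(name, "") for name in by)] += value
    return dict(totals)


# Hypothetical CPU cores used per container, before enrichment.
cpu_cores_used = [
    ({"container_id": "c1"}, 0.5),
    ({"container_id": "c2"}, 0.25),
    ({"container_id": "c3"}, 1.0),
]

print(sum_by(enrich(cpu_cores_used, CONTAINER_CONTEXT), ["kube_cluster_name", "kube_node_name"]))
```

Once every sample carries kube_cluster_name and kube_node_name, a plain sum by over those labels replaces the chain of on(...) joins.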

Want to get your hands dirty?

We’re starting to roll out the PromQL Explorer to users of our hosted SaaS service, and self-hosted customers will have access later in the year. You’ll find PromQL Explorer under the Explore tab in the Sysdig Monitor toolbar.

We welcome your feedback and would love to learn more about the queries you create. Reach out to us anytime.

Sysdig Monitor helps you gain visibility into your infrastructure, enriching your metrics with Kubernetes and application context. You’ll be set up in just a few minutes. Request your free trial today!

New and improved dashboards: PromQL, Teams sharing, and more!
Harry Perks, Sysdig blog, 11 June 2020, https://sysdig.com/blog/new-promql-dashboards-more/

To accompany Sysdig’s announcement of the first cloud-scale Prometheus monitoring offering, we had to re-architect our dashboarding experience from the ground up to support the Prometheus query language, PromQL. The query language is the standard method to query metrics within the ecosystem, and it’s an entirely new way to slice and dice metrics within Sysdig Monitor. However, we wanted to ensure the steep learning curve associated with PromQL is not prohibitive for anyone wanting to build dashboards faster.

Using dashboards within Sysdig Monitor provides a complete end-to-end solution with support for both PromQL and our simple, form-based editor. You can see all of your Prometheus metrics federated across multiple clouds, troubleshoot problems with Sysdig’s deep level of telemetry, provide RBAC to metrics with Teams, and ensure regulatory compliance with enterprise-grade access controls.

We’re happy to announce the general availability of our next generation dashboards. Starting today, users within our hosted cloud environment can get started with our dashboards, and self-hosted customers will receive access to these features over the course of the next few months.

The good news is that all of your dashboards will be migrated for you – there’s nothing you need to do. 🎉

PromQL or Sysdig’s form-based querying – or unite both

PromQL is a powerful way to query your metrics within Sysdig; you can perform complex mathematical operations, statistical analyses, and use a variety of functions to dig deeper with metrics. Using PromQL, you’ll now be able to answer more questions about the health and performance of your infrastructure using advanced functions and operators.

While mastering PromQL can make it feel like you’ve leveled up your monitoring expertise, it does have a steep learning curve, which is something we didn’t want to overlook. We’ve ensured that the form-based dashboard editor is retained for users wanting to get up and running quickly. If you want to run a basic query to look at your CPU usage grouped by Kubernetes deployment, you shouldn’t have to write complex PromQL queries composed of joins and functions. And it shouldn’t be complex for non-technical folks who just want to run a simple report to perform rightsizing tasks.

An example of how you can create a Dashboard in Sysdig without PromQL Knowledge, by using Sysdig Monitor’s form-based dashboards

Answer questions about the health and performance of your infrastructure without any PromQL knowledge by using Sysdig Monitor’s form-based dashboards

But what about when you want to know what the 95th percentile response time of web traffic in production was? Or what percentage of web requests were 5xx errors? How about the number of days before your file system fills up? And finally, how are you performing against SLOs over the last 30 days?

Beat outages by forecasting next week’s file system usage

First, craft a PromQL query that uses the predict_linear function to forecast next week’s disk usage for a given file system. Then, map the forecasted values to text within a number panel to make it obvious whether a problem is expected, so your team gets ahead of any issues.

We can then use the same query within Sysdig’s alerting engine to notify the team that there’s going to be a problem next week – via PagerDuty, OpsGenie, email, Slack, custom webhooks and more.

You can create alerts in Sysdig Monitor using PromQL. This "Disk space critical" alert triggers when the forecast free disk space for next week crosses the alert threshold.

predict_linear(node_filesystem_free{device=$Device}[7d], 604800)

(1 week = 604800 seconds)
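
Prometheus documents predict_linear as using simple linear regression over the samples in the range. Here is a minimal Python sketch of that calculation, assuming samples are (unix_seconds, value) pairs; it skips details such as NaN handling, and the sample data is made up. In an alert, a common pattern is to fire when the forecast drops below zero or below a safety margin.

```python
def predict_linear(samples, horizon_s, eval_time):
    """Fit y = intercept + slope * (t - eval_time) by least squares, then extrapolate.

    samples: (unix_seconds, value) pairs from the range, e.g. the last 7 days.
    Returns the predicted value horizon_s seconds after eval_time.
    """
    if len(samples) < 2:
        return None  # a line needs at least two points
    xs = [t - eval_time for t, _ in samples]  # centre time on the evaluation instant
    ys = [v for _, v in samples]
    n = len(samples)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x  # fitted value at eval_time
    return intercept + slope * horizon_s


DAY = 86400
WEEK = 7 * DAY  # 604800 seconds

# Hypothetical free space (GB) sampled once a day, shrinking by 1 GB per day.
samples = [(d * DAY, 100.0 - d) for d in range(7)]
forecast = predict_linear(samples, WEEK, eval_time=6 * DAY)
print(f"free space in one week: {forecast:.1f} GB")
```

Because the fit is linear, a sudden burst of writes at the end of the window moves the forecast only gradually; a shorter range reacts faster at the cost of more noise.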

Meet agreements by measuring SLOs using indicators

You can use the metrics being emitted from your infrastructure to measure your SLOs, ensuring that you’re keeping within the boundaries of your SLA. With histograms, we can easily understand the percentage of requests successfully delivered within a given time frame.

An example dashboard monitoring global SLOs, with panels for "Total requests", "Requests served within 1s", and "Requests served within 500ms"

sum(rate(http_request_duration_seconds_bucket{le="1"}[$__interval])) by (kubernetes_cluster_name) / sum(rate(http_request_duration_seconds_count[$__interval])) by (kubernetes_cluster_name)
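
Histogram buckets are cumulative, so the bucket with le="1" counts every request that took at most one second, and dividing it by the total count gives the share served within the target. Here is a minimal Python sketch with made-up counter values; increase and fraction_within are hypothetical helpers, and counter resets are ignored.

```python
INF = float("inf")


def increase(start, end):
    """Per-bucket increase between two snapshots of cumulative counters (ignores counter resets)."""
    return {le: end[le] - start[le] for le in end}


def fraction_within(start, end, le):
    """Share of requests in the window that completed within `le` seconds.

    rate() divides numerator and denominator by the same window length, so the
    ratio of rates essentially reduces to a ratio of increases. The +Inf bucket
    equals the _count series.
    """
    delta = increase(start, end)
    total = delta[INF]
    return delta[le] / total if total else float("nan")


# Hypothetical http_request_duration_seconds_bucket counters, keyed by the le label.
start = {0.5: 100, 1.0: 180, INF: 200}
end = {0.5: 500, 1.0: 900, INF: 1000}

print(f"served within 1s: {fraction_within(start, end, 1.0):.0%}")
print(f"served within 500ms: {fraction_within(start, end, 0.5):.0%}")
```

Comparing this fraction with the SLO target (say, 99% within one second) tells you how much error budget remains.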

Slice and dice multiple metrics with mathematical operations

Try taking multiple metrics and performing mathematical operations on them. For example, you can calculate JVM heap usage as a percentage by dividing the heap used by the maximum heap size.

An example PromQL panel mixing three metrics using mathematical functions like sum and average over time.

sum by (cluster_name) (avg_over_time(appinfo_jvm_mem_heap_used[$__interval])) / sum by (cluster_name) (avg_over_time(appinfo_jvm_mem_heap_max[$__interval])) * 100

Additionally, you can seamlessly unite both PromQL and Sysdig form-based panels within the same dashboard for a unified experience.

Two panels of the same dashboard, one using PromQL and the other form-based.

Use either PromQL or Sysdig’s simple form-based view – or both – within Sysdig’s new dashboards

What’s new and improved?

We listened to feedback from our customers about what was great – and not so great – about our previous generation of dashboards, and have addressed them. Here’s a list of what’s new, and what’s improved.

RBAC for Prometheus & improved dashboard sharing model

Sysdig Teams allow portions of your organization to only access the Prometheus metrics and telemetry that they care about. With full RBAC support, you can provide an application team responsible for maintaining an analytics tooling system access to only the metrics being emitted from their namespace, or give an on-call team read-only access to production hosts.

We’re committed to continuous improvements of the multi-tenant sharing capabilities within Sysdig Monitor, and we know our customers want to create a single dashboard and share it across their Sysdig Teams. They also want more fine-grained sharing controls.

Starting today, you can share your dashboard with users within your Sysdig Team, or share it across Teams with fine-grained access controls. Define who should be able to see those dashboards and what level of access they should be granted (View Only, or Collaborator with edit privileges).

Details of the settings panel, where you can set different permissions for each team, for example "Collaborator" for "Monitor Operations" or "View only" for "Monitor backend team".

Intelligent $__interval

Use $__interval within a query and Sysdig will intelligently populate the query with the most appropriate sampling interval for the time range you’ve selected. This balances access to the most granular data available against downsampling when you select a long time range.
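
The post doesn’t describe how Sysdig chooses the value. Here is a minimal Python sketch of one common approach, assuming a fixed list of available resolutions and a cap on points per series. AVAILABLE_STEPS, pick_interval, and the 300-point cap are hypothetical; this is not Sysdig’s actual algorithm.

```python
# Hypothetical resolutions (seconds) the backend can serve, finest first.
AVAILABLE_STEPS = [10, 60, 300, 600, 3600, 21600, 86400]


def pick_interval(range_s, max_points=300, steps=AVAILABLE_STEPS):
    """Finest available step that keeps a panel at or under max_points per series."""
    target = range_s / max_points
    for step in steps:
        if step >= target:
            return step
    return steps[-1]  # very long ranges fall back to the coarsest resolution


for label, range_s in (("10 minutes", 600), ("1 hour", 3600), ("7 days", 7 * 86400)):
    print(f"{label}: $__interval = {pick_interval(range_s)}s")
```

The trade-off is the one described above: short ranges get fine-grained data, while long ranges are downsampled so panels stay readable and fast.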

Scope variables

Configure scope variables at the dashboard level to quickly scope based on cluster, namespace, workload and more. You’ll be able to dynamically use that $variable within the query. This is very important when troubleshooting as it allows you to switch context quickly without reconfiguring PromQL queries.

An example panel where you can define a variable $elasticsearch_cluster in the query, then, in the UI, scope the data to display depending on that variable's value.

appinfo_jvm_mem_heap_used{cluster_name=$elasticsearch_cluster}

Smart autocompletion & syntax highlighting

Autocomplete suggests metrics, operators and functions, while syntax highlighting helps keep you on the right path and highlight problems within a query. This is invaluable in dynamic environments, and allows you to craft the right queries faster.

Time series name templating

Customize the time series on dashboard panels by using labels associated with Prometheus metrics and segments to gain context faster. For example, if a metric has a label indicating the job type, use {{job_type}} as the time series friendly name.
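
Here is a minimal Python sketch of how such a template could be rendered for each series, assuming placeholders are label names in double braces and missing labels render as empty text. render_series_name is hypothetical, and Sysdig’s handling of edge cases may differ.

```python
import re

# Matches {{label_name}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\}\}")


def render_series_name(template, labels):
    """Replace each {{label}} with the series' value for that label (empty if absent)."""
    return PLACEHOLDER.sub(lambda m: labels.get(m.group(1), ""), template)


print(render_series_name("Cluster {{cluster_name}}", {"cluster_name": "es-prod"}))
print(render_series_name("{{job_type}}", {"job_type": "batch"}))
```

Each series in the panel gets its own rendered name, so one template produces distinct, readable legend entries.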

Example of a time series template. Type: Lines, Query Display name is JVM Usage, and Timeseries Name is Cluster plus the actual cluster name.

Improved user experience

We’ve introduced a more fluid, natural dashboard building experience. The UI has been redesigned and a new panel editor makes it easier to craft the best way to visualize your metrics. They look really nice too!

A detail of the new panel editing experience. The UI covers the whole window, making it easier to craft panels.

A new editing experience utilizes the entire page, making it easier to craft panels.

Multi-metric, multi-segmentation

Configure multiple queries within a single panel, and configure each query with multiple segmentation and scoping options. Individual queries can be customized to render as a line or stacked area. For example, you could stack up the memory requests of all pods within a namespace as an area chart, and graph the maximum memory quota as a line chart to understand capacity issues.

A detail of panel editing. It shows two different metrics: Deployment Memory as a stacked area, and Quota limit as a line.

Event overlays

Contextualize metrics and understand the “why” faster with a unified view of both metrics and events. Configure Event Overlay to display events from Kubernetes (deployments, node failures, etc.) as well as alert events, security violations and any other events ingested using Sysdig’s open REST API.

A detail of a dashboard showcasing event overlays. Rectangles above each panel mark the moments when events occurred. Hovering over one displays further details on those events, such as the event’s timestamp, priority, type, name, and description.

Dashboard templates

Get up and running quickly with dashboard templates; view your infrastructure through the lens of one of Sysdig’s curated dashboards, or use it as a base to start building your own. We have dashboard templates for managing Kubernetes capacity and health, hosts and server performance, applications and services telemetry, and the security posture of your infrastructure with data fed from Sysdig Secure.

Additionally, we’ve released PromCat.io, a resource catalog for enterprise-class Prometheus monitoring. Leverage a complete turnkey solution to monitor Kubernetes and cloud-native applications with supported Prometheus exporters, coupled with meaningful dashboards and alerts to accelerate developer productivity.

You’ll find dashboard templates in the dashboard navigation. You can use predefined scope variables to easily see metrics from specific entities within your infrastructure. Keep in mind, dashboard templates aren’t designed to be edited, but we’ve made it simple to copy one and start customizing it.

Map values to text

Instantly understand what’s going on by mapping number panel values to text. If you have a metric that returns 1 for up and 0 for down, map those values to “UP” and “DOWN” respectively. Define thresholds so no one has to wonder whether a value is cause for concern. This is especially valuable when dashboards are shared between team members.
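
The mapping logic itself is simple. Here is a minimal Python sketch of both styles, exact values (1 and 0) and threshold ranges, using the disk-space labels from the example panel. The GiB thresholds, rule format, and map_to_text helper are hypothetical.

```python
UP_DOWN = {1: "UP", 0: "DOWN"}

# (exclusive upper bound in GiB, text), checked in order; thresholds are hypothetical.
DISK_FREE_RULES = [
    (5, "Disk space critical"),
    (20, "Disk space low"),
    (float("inf"), "Disk space OK"),
]


def map_to_text(value, rules):
    """Return the text of the first rule whose upper bound is above the value."""
    for upper, text in rules:
        if value < upper:
            return text
    return "No data"  # reached only for NaN or +Inf values


print(UP_DOWN[1])
for free_gib in (2, 12, 250):
    print(free_gib, "->", map_to_text(free_gib, DISK_FREE_RULES))
```

Ordering the rules from most to least severe keeps the first match meaningful and makes the thresholds easy to review.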

An example of a panel that shows free disk space as "Disk space OK", "Disk space low", or "Disk space critical", with a green, orange, or red background depending on the value.

Granular axis and legend controls

Get granular with your axis and legends. We’ve introduced more flexibility when customizing your axis, as well as better support for time series with long names. You can now configure the legend by toggling its visibility and moving it to the bottom of the panel.

The future

We’re delighted to release these new dashboards with PromQL capabilities and an entirely new user experience. We’re already hard at work building additional dashboarding functionality to support more flexible visualizations, as well as improvements to make it easier to build and manage dashboards. We’d love your feedback, not only on our new dashboards, but on what you’d like to see next.

Detecting + preventing cgroups escape via SCTP – CVE-2019-3874.
Harry Perks, Sysdig blog, 22 March 2019, https://sysdig.com/blog/detecting-and-preventing-cgroups-escape-via-sctp-cve-2019-3874/

This week, CVE-2019-3874 was disclosed. It details a flaw in the Linux kernel where an attacker can circumvent cgroup memory isolation using the SCTP socket buffer. In containerised environments, this could allow a container running as root to cause a denial of service (DoS).

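To check whether you’re vulnerable, see whether the SCTP module is loaded. Avoid running modprobe sctp as part of this check, because it loads the module; modprobe -n -v sctp performs a dry run if you only want to know whether the module could be loaded.

$ lsmod | grep sctp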
sctp                  311296  2
libcrc32c              16384  4 nf_conntrack,nf_nat,raid456,sctp

If you see sctp in the output, as above, you may be vulnerable. SELinux prevents non-root users from binding to SCTP sockets; however, where SELinux is not enforcing, it’s advisable to disable and blacklist the SCTP kernel module (this requires a hard reboot).
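
lsmod formats the contents of /proc/modules, so the same check can be scripted, for example as part of a fleet audit. Here is a minimal Python sketch; loaded_modules and sctp_loaded are hypothetical helpers. Keep in mind that a module that isn’t loaded can typically still be auto-loaded on demand when a process opens an SCTP socket, which is why blacklisting matters.

```python
def loaded_modules(proc_modules_text):
    """Module names from /proc/modules content (the first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def sctp_loaded(path="/proc/modules"):
    """True if the sctp module is currently loaded (the same data lsmod prints)."""
    with open(path) as f:
        return "sctp" in loaded_modules(f.read())
```

Run sctp_loaded() on each host; a True result corresponds to seeing sctp in the lsmod output.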

This vulnerability has been given a CVSS rating of 5.3 – a kernel patch is in development and expected soon.

Detecting & stopping CVE-2019-3874 using Sysdig Falco

Sysdig Falco is an open source container security monitor designed to detect anomalous activity in your containers. Falco taps into system calls to generate an event stream of all system activity. Its rules engine then lets you create rules based on this event stream and alert on system events that seem abnormal. Falco’s rich rule language allows you to write rules at the host level and identify suspicious activity.

Using Falco we can create a rule to detect containers attempting to bind SCTP and kill them before a DoS attack can be accomplished.

Because Falco uses Sysdig’s powerful filter engine, we can easily identify the use of SCTP by looking for protocol number 132 (SCTP) used as part of a socket bind. We can then create a rule that detects an SCTP bind attempt within a container:

Leveraging this rule within Sysdig Secure allows us to provide some enforcement by stopping the container. This effectively stops the attack in its tracks. With our policy configured to capture all activity that took place before, during, and after an attack, we have all the data we need to perform detailed forensics.
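
The Falco rule itself isn’t reproduced here. As a language-neutral illustration of the condition described above, here is a minimal Python sketch: flag socket or bind events that use protocol 132 and come from a container rather than the host. The event dictionaries with type, proto, and container_id keys are a hypothetical stand-in for Falco’s event fields; this is not Falco rule syntax.

```python
IPPROTO_SCTP = 132  # IANA protocol number assigned to SCTP


def is_sctp_in_container(event):
    """Flag socket/bind events that use SCTP and come from a container (not the host)."""
    return (
        event.get("type") in {"socket", "bind"}
        and event.get("proto") == IPPROTO_SCTP
        # Falco reports container.id as "host" for processes outside containers.
        and event.get("container_id", "host") != "host"
    )


# Hypothetical, simplified syscall events.
events = [
    {"type": "socket", "proto": 132, "container_id": "3ad7b26ded6d"},
    {"type": "socket", "proto": 132, "container_id": "host"},
    {"type": "bind", "proto": 6, "container_id": "3ad7b26ded6d"},  # TCP
]

for event in events:
    if is_sctp_in_container(event):
        print(f"SCTP socket activity in container {event['container_id']}")
```

Only the first event matches: the second is on the host and the third uses TCP.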

Detecting CVE-2019-3874: End-to-end security with Sysdig

Ideally, of course, Sysdig’s container runtime protection is the last line of defense against the most nefarious attacks. The Sysdig platform provides a comprehensive set of capabilities to secure your environment, such as image scanning, container compliance, and visibility, that can help detect and prevent exploitation of an attack like this one.

1. Prevent vulnerable images prior to deployment

Sysdig Secure prevents misconfigured images from being pushed through your CI/CD pipeline or run in production by integrating with tools such as Jenkins and Bamboo. When you secure your deployment pipeline with Sysdig's image scanning policies, you can block containers whose effective user is root from being deployed.
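This Python sketch illustrates the kind of check such a policy performs. It is not how Sysdig Secure implements it. The function reads the JSON printed by docker image inspect and treats an empty Config.User, "root", or UID 0 as root. It only looks at the configured user string, so it would not catch a named user that maps to UID 0 inside the image. The function name is hypothetical.

```python
import json


def effective_user_is_root(inspect_json: str) -> bool:
    """Return True if the image's configured user resolves to root.

    Expects the JSON array printed by `docker image inspect <image>`.
    An empty Config.User means the container runs as root by default.
    """
    image = json.loads(inspect_json)[0]
    user = (image.get("Config") or {}).get("User") or ""
    # User may be "name", "uid", "name:group" or "uid:gid"; only the user part matters here.
    user_part = user.split(":", 1)[0]
    return user_part in ("", "root", "0")
```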

In addition, if you’re not yet using pod security policies, you can generate effective policies with kube-psp-advisor.

2. Ensure a security posture based on best practices

Sysdig Secure offers comprehensive container compliance for Docker and Kubernetes environments. Users can run CIS benchmark checks for their Docker containers. This allows them to quickly isolate containers that are running in privileged mode.
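Outside of Sysdig, you can answer the same question for a single Docker host from docker container inspect output, which reports the privileged flag as HostConfig.Privileged. Here is a minimal sketch with a hypothetical function name:

```python
import json
from typing import List


def privileged_containers(inspect_json: str) -> List[str]:
    """Return names of containers whose HostConfig.Privileged flag is set.

    Expects the JSON array printed by `docker container inspect <id> [<id>...]`.
    """
    containers = json.loads(inspect_json)
    return [
        c.get("Name", "").lstrip("/")  # Docker prefixes container names with "/"
        for c in containers
        if (c.get("HostConfig") or {}).get("Privileged") is True
    ]
```

For example, you could feed it the output of docker container inspect $(docker ps -q).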

3. Gain visibility into container infrastructure

Sysdig Monitor provides a topology map that shows the average number of images running as the root user across multiple Kubernetes clusters, whether on-prem or in the cloud.

CIS metrics, such as the number of privileged containers, can also be viewed as time series, so users can quickly visualize their container security and risk posture over time.

The post Detecting + preventing cgroups escape via SCTP – CVE-2019-3874. appeared first on Sysdig.

]]>
Dynamic DNS & Falco: detecting unexpected network activity https://sysdig.com/blog/unexpected-domain-connection/ Sun, 18 Nov 2018 16:18:39 +0000 https://sysdig.com/?p=12100 Since the inception of Falco, we’ve seen users write custom rules covering a number of different use cases. Because Falco...

The post Dynamic DNS & Falco: detecting unexpected network activity appeared first on Sysdig.

]]>
Since the inception of Falco, we've seen users write custom rules covering a number of different use cases. Because Falco performs behavioral monitoring using a syntax built on system calls, you can write a rule for just about anything: opening a file, becoming root, or making a network connection.

Today I'm going to talk about using Falco in a more traditional manner: using a rule to detect unexpected network activity. A Sysdig Secure customer wanted a policy that would trigger if a connection was established to an unknown (and thus untrusted) domain name.

Out of the box, Falco has a macro to detect outbound network activity (shown here in simplified form):

- macro: outbound
  condition: >
    ((evt.type = connect and evt.dir=<) and
     (fd.typechar = 4 or fd.typechar = 6) and
     (fd.ip != "0.0.0.0" and fd.net != "127.0.0.0/8") and
     (evt.rawres >= 0 or evt.res = EINPROGRESS))

Let’s break this down:
evt.type = connect and evt.dir=< matches a "connect" exit event

fd.typechar = 4 or fd.typechar = 6 matches an IPv4 or IPv6 socket file descriptor

fd.ip != "0.0.0.0" and fd.net != "127.0.0.0/8" excludes local connections

evt.rawres >= 0 or evt.res = EINPROGRESS ensures we only match successful (or in-progress) connections
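To make the macro's logic concrete, here is a minimal Python sketch that evaluates the same four conditions over a hypothetical, simplified event record. The ConnectEvent fields stand in for the Falco fields named in their comments and are not Falco's data model. Like the macro, it excludes only 0.0.0.0 and 127.0.0.0/8, so it does not treat IPv6 loopback (::1) as local.

```python
import errno
import ipaddress
from dataclasses import dataclass


@dataclass
class ConnectEvent:
    # Hypothetical stand-ins for the Falco fields used by the outbound macro.
    evt_type: str   # evt.type
    direction: str  # evt.dir: ">" for enter, "<" for exit
    typechar: str   # fd.typechar: "4" (IPv4 socket) or "6" (IPv6 socket)
    server_ip: str  # fd.ip
    rawres: int     # evt.rawres: >= 0 on success, negative errno on failure


UNSPECIFIED_V4 = ipaddress.ip_address("0.0.0.0")
LOOPBACK_V4 = ipaddress.ip_network("127.0.0.0/8")


def is_outbound(e: ConnectEvent) -> bool:
    is_connect_exit = e.evt_type == "connect" and e.direction == "<"
    is_inet = e.typechar in ("4", "6")
    ip = ipaddress.ip_address(e.server_ip)
    # An IPv6 address never equals 0.0.0.0 and is never "in" an IPv4 network.
    not_local = ip != UNSPECIFIED_V4 and ip not in LOOPBACK_V4
    # Non-blocking connects return EINPROGRESS while the handshake is still under way.
    succeeded = e.rawres >= 0 or e.rawres == -errno.EINPROGRESS
    return is_connect_exit and is_inet and not_local and succeeded
```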

With this macro, we can easily write a rule that matches a connection to a specific IP address. This rule triggers when a connection to sysdig.com is established:

$ host sysdig.com
sysdig.com has address 35.184.21.208

- rule: Connection to sysdig.com
  desc: Detect attempts to connect to sysdig.com (35.184.21.208)
  condition: outbound and fd.sip="35.184.21.208"
  output: Outbound connection to sysdig.com (command=%proc.cmdline connection=%fd.name %container.info image=%container.image)
  priority: NOTICE
  tags: [network]

This gets us part of the way there, but what if sysdig.com uses dynamic DNS and its IP address changes? Our rule would be useless, because we've hardcoded the static IP 35.184.21.208. For example, s3.us-east-2.amazonaws.com can resolve to a different IP address on each lookup:

$ host s3.us-east-2.amazonaws.com
s3.us-east-2.amazonaws.com has address 52.219.96.58

$ host s3.us-east-2.amazonaws.com
s3.us-east-2.amazonaws.com has address 52.219.104.82

We could keep a number of different IP addresses in a Falco list:

- list: s3_ips
  items: ['"52.219.96.58"', '"52.219.104.82"']

- rule: Connection to S3
  desc: Detect attempts to connect to S3
  condition: outbound and fd.sip in (s3_ips)
  output: Outbound connection to S3 URL (command=%proc.cmdline connection=%fd.name %container.info image=%container.image)
  priority: NOTICE
  tags: [network]

…but it wouldn't be easy to keep this list fresh and up to date. Any address missing from the list is a blind spot, and in an allowlist-style rule, stale entries would cause expected network connections to trigger the policy.
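To see why this is fragile, consider what keeping the list current by hand would involve. The hypothetical helper below resolves each hostname once and renders a Falco list in the same quoted style as s3_ips. Each run captures only the addresses DNS returned at that moment, so you would have to re-run it constantly to keep the list accurate. The resolver is injectable so that you can test the rendering logic without network access.

```python
import socket
from typing import Callable, Iterable, Set


def resolve_ipv4(host: str) -> Set[str]:
    """Return the IPv4 addresses a single DNS lookup gives for host right now."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def falco_ip_list(name: str, hosts: Iterable[str],
                  resolve: Callable[[str], Set[str]] = resolve_ipv4) -> str:
    """Render a Falco list of quoted IP strings for the given hostnames (a point-in-time snapshot)."""
    ips: Set[str] = set()
    for host in hosts:
        ips |= resolve(host)
    # Each item is written as '"a.b.c.d"', matching the s3_ips example.
    items = ", ".join(f"'\"{ip}\"'" for ip in sorted(ips))
    return f"- list: {name}\n  items: [{items}]\n"
```

Running falco_ip_list("s3_ips", ["s3.us-east-2.amazonaws.com"]) twice a few seconds apart can produce different lists, which is exactly the problem described above.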

Hostnames in network rules using fd.sip.name

As a member of our customer success team, I'm well aware of how receptive Sysdig's product teams are to customer requests. Recognizing the awkwardness of dynamic DNS, and not wanting hard-coded IP addresses in Falco rules, our engineering team worked on a better way to solve this problem. In version 0.24.0 of Sysdig open source, we added several new filterchecks:

fd.cip.name: Domain name associated with the client IP address.
fd.sip.name: Domain name associated with the server IP address.
fd.lip.name: Domain name associated with the local IP address.
fd.rip.name: Domain name associated with the remote IP address.

These new filterchecks let you specify a domain name. Sysdig resolves it and intelligently keeps track of the IP addresses it resolves to. This means we can now pass domain names to Falco directly, without having to resolve the IP addresses ourselves and keep them up to date.
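This section doesn't describe Sysdig's internal implementation, but the idea can be sketched in Python. The sketch is a cache that periodically re-resolves each watched domain and answers "which of these domains does this IP belong to?". DomainIpCache and its refresh interval are assumptions for illustration. So is the choice to keep every address ever seen, which lets round-robin answers like the S3 lookups above accumulate instead of replacing each other.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Set


class DomainIpCache:
    """Hypothetical sketch of IP-to-domain attribution; not Sysdig's implementation."""

    def __init__(self, domains: Iterable[str], resolve: Callable[[str], Set[str]],
                 refresh_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve
        self._refresh_s = refresh_s
        self._clock = clock
        self._ips: Dict[str, Set[str]] = {d: set() for d in domains}
        self._last: Dict[str, Optional[float]] = {d: None for d in self._ips}

    def _refresh_stale(self) -> None:
        now = self._clock()
        for domain, last in self._last.items():
            if last is None or now - last >= self._refresh_s:
                # Union, not replace: keep addresses from earlier lookups too.
                self._ips[domain] |= self._resolve(domain)
                self._last[domain] = now

    def names_for_ip(self, ip: str) -> Set[str]:
        """Roughly the question fd.sip.name answers: which watched domains map to this IP."""
        self._refresh_stale()
        return {d for d, ips in self._ips.items() if ip in ips}
```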

In the example below, I define a list of trusted domain names (sysdig.com, github.com, and google.com). Any outbound connection to an IP address that none of these domain names resolves to will trigger the policy. This is a convenient way to tell Falco what's allowed and be alerted about everything else.

- list: trusted_domains
  items: [sysdig.com, github.com, google.com]

- rule: Unexpected outbound network connection
  desc: Detect outbound connections with destinations not on allowed list
  condition: >
    outbound
    and not (fd.sip.name in (trusted_domains))
  output: >
    Unexpected Outbound Connection
    (container=%container.name
    command=%proc.cmdline
    procpname=%proc.pname
    connection=%fd.name
    servername=%fd.sip.name
    serverip=%fd.sip
    type=%fd.type
    typechar=%fd.typechar
    fdlocal=%fd.lip
    fdremote=%fd.rip)
  priority: NOTICE
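You can mirror the rule's condition, outbound and not fd.sip.name in (trusted_domains), in a few lines of Python to show which destinations it would flag. This is a hypothetical sketch. It resolves each trusted domain once through an injected resolve function, whereas Sysdig maintains these mappings continuously.

```python
from typing import Callable, Iterable, List, Set


def unexpected_destinations(server_ips: Iterable[str], trusted_domains: Iterable[str],
                            resolve: Callable[[str], Set[str]]) -> List[str]:
    """Return the server IPs that none of the trusted domains resolves to."""
    allowed: Set[str] = set()
    for domain in trusted_domains:
        allowed |= resolve(domain)
    # Anything outside the allowed set is what the rule would alert on.
    return [ip for ip in server_ips if ip not in allowed]
```

In a quick test, you could map github.com and google.com to documentation-range addresses such as 192.0.2.10 and 192.0.2.20 with a fake resolver. A connection to an address like 198.51.100.7 would then be the only one reported.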
Dynamic DNS-based network policy for Kubernetes

Once a Falco rule has been written it can be easily added to Sysdig Secure. Rules can be associated with a policy, scoped to a specific portion of your infrastructure, and tied to remediation actions like killing or pausing a container.

In this example, the Falco rule above has been applied to the scope kubernetes.deployment.name = javaapp. If any container within that scope connects to an IP address outside those our trusted domains resolve to, an alert fires and, in this case, the container is also killed.
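Scoping can be thought of as a label match that gates the rule's actions. The following Python sketch is purely illustrative. The function names, the dictionary of labels, and the rule that all scope labels must match are assumptions, not Sysdig Secure's API.

```python
from typing import Dict, List


def policy_in_scope(container_labels: Dict[str, str], scope: Dict[str, str]) -> bool:
    """True if every scope label is present on the container with the same value."""
    return all(container_labels.get(k) == v for k, v in scope.items())


def actions_for_violation(container_labels: Dict[str, str], scope: Dict[str, str],
                          actions: List[str]) -> List[str]:
    """Actions to take when the rule fires for this container; none if it is out of scope."""
    return list(actions) if policy_in_scope(container_labels, scope) else []
```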

Using labels from Kubernetes or your cloud provider is a powerful way to tailor your policies to specific clusters, namespaces, or VPCs. Sysdig Secure does all of this tagging automatically and manages the distribution of Falco rules to all your endpoints, even across thousands of nodes.

Conclusion

Hopefully this blog gives you an idea of how powerful system calls can be as a data source for intrusion detection, auditing, and behavioral monitoring. We'd love to hear more about what you're doing with Falco in the wild. Reach out to us on Twitter or join our Slack.

]]>