“Chain”ging the Game – how runtime makes your supply chain even more secure

There is a lot of information out there (and growing) on software supply chain security. This info covers the basics around source and build, but does it cover your full software supply chain lifecycle? Is your build environment protected at runtime? Is your application protected at runtime after it's deployed?

This article will not only discuss what these concepts are, but also provide additional detail on the following:

  • What are Software Supply Chain attacks, SBOM, and the executive order?
  • Falco and Sysdig Secure capabilities that will have the most impact on securing your software supply chain.
  • Tips for implementing protection in your software supply chain so that you and your teams are better equipped to deal with attacks.

Read on, brave reader…


A man with alerts and dependencies.

What is a software supply chain attack?

A software supply chain attack occurs when a cyber threat actor infiltrates a software vendor’s network and employs malicious code to compromise the software before the vendor sends it to their customers. The affected software then compromises the customer’s data or system.

What is software supply chain security?

Software supply chain security is the set of activities that protect each link in that chain – the source code, the build system, and the delivery of artifacts – from tampering and compromise.

Diagram of Software Supply Chain

What is the executive order?

The Executive Order (EO) charges multiple agencies – including NIST – with enhancing cybersecurity through a variety of initiatives related to the security and integrity of the software supply chain.

Section 4 of the EO directs NIST to solicit input from the private sector, academia, government agencies, and others, and to identify existing – or develop new – standards, tools, best practices, and other guidelines to enhance software supply chain security.

Those guidelines are to include:

  • Criteria to evaluate software security.
  • Criteria to evaluate the security practices of the developers and suppliers themselves.
  • Innovative tools or methods to demonstrate conformance with secure practices.

What is an SBOM?

A software bill of materials (SBOM) is a list of components in a piece of software.

Software vendors often create products by assembling open source and commercial software components. The SBOM describes the components in a product. It is analogous to a list of ingredients on food packaging. Where you might consult a label to avoid foods that may cause an allergic reaction, SBOMs can help companies avoid consumption of software that could harm their organization.
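To make the analogy concrete, here is a purely illustrative, schematic SBOM fragment. Real SBOMs use standardized formats such as SPDX or CycloneDX, which carry this same kind of information (and much more):

# Illustrative only – real SBOMs use standardized formats like SPDX or CycloneDX.
component:
  name: payments-service        # hypothetical component name
  version: 1.4.2
dependencies:
  - name: openssl
    version: 1.1.1l
    supplier: OpenSSL Project
    license: Apache-2.0
  - name: zlib
    version: 1.2.11
    supplier: zlib project
    license: Zlib

With an inventory like this, you can answer "are we running the vulnerable version of X?" in minutes instead of weeks – but only if the build that produced it is trustworthy, which brings us to the next point.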

The TL;DR:

  • SBOMs are important for security.
  • But you can only rely on an SBOM if you can trust the process that generated it.
  • This means you need a secure runtime for your build environment. We discuss this in detail below.

Falco, the cloud-native runtime security project, is the de facto Kubernetes threat detection engine. Falco was created by Sysdig in 2016 and is the first runtime security project to join CNCF as an incubation-level project. Falco detects unexpected application behavior and alerts on threats at runtime. Let’s discuss how Falco is able to do this.

Falco uses system calls to secure and monitor a system, by:

  • Parsing the Linux system calls from the kernel at runtime.
  • Asserting the stream against a powerful rules engine.
  • Alerting when a rule is violated.

Falco provides over 120 out-of-the-box rules to protect you and your software supply chain at runtime. The practical examples are numerous; specific to the supply chain, however, there are rules that address issues we've recently seen with attacks.

Attack vector: Control plane/builder component runtime security

We have seen attackers successfully infiltrate for months via a lowest common denominator: a build machine or build pod. We described one such attack as an attacker dropping malicious code into the build and/or container while it's running, connecting to an external network, and sending key company assets out, literally unchecked… (SCARY!) These machines and processes historically have not been appropriately locked down, nor have they had any degree of auditing enabled to surface anomalous behavior.

This Outbound SSH Connection rule is one of the OOTB (out-of-the-box) rules; it detects outbound SSH connections and helps identify the attack vector we mention above.

- rule: Outbound SSH Connection
  desc: Detect Outbound SSH Connection
  condition: >
    ((evt.type = connect and evt.dir=<) or
    (evt.type in (sendto,sendmsg))) and ssh_port
  output: >
    Outbound SSH connection (user=%user.name server_ip=%fd.sip server_port=%fd.sport client_ip=%fd.cip)
  priority: WARNING
  tags: [network]

Below is an example of file integrity monitoring: if an attacker drops a malicious payload into a sensitive directory, or a process does something anomalous on the builder machine and/or pod, Falco detects it and can notify the team instantly through many methods (Falcosidekick has over 40 outputs to choose from).

These rules, in particular, address specific file access and integrity issues that could plague a build machine or build pod/container:

  • Detect New File: Detects when a new file is created.
  • Detect New Directory: Detects when a new directory is created.
  • Detect File Permission or Ownership Change: Detects file permissions or ownership change.
  • Detect Directory Change: Detects directory changes, including mkdir, rmdir, mvdir, mv.
  • Kernel Module Modification: Detects kernel modules changes via modprobe or insmod.
  • Node Created in Filesystem: Detects when a node is created via mknod.
  • Listen on a New Port: Detects when a new port is listening.

These rules were community created and added to the Security Hub.
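As a sketch of how you might add your own rule on top of those, here is a minimal custom Falco rule that alerts on writes beneath a build tree. The rule name and the /opt/build path are hypothetical placeholders – adjust them to your builder's layout (open_write is a macro shipped in Falco's default rules):

# Illustrative custom rule – directory and rule name are placeholders.
- rule: Write Below Build Directory
  desc: Detect writes beneath the build tree on a builder machine (illustrative)
  condition: >
    open_write and fd.name startswith /opt/build
  output: >
    File written below build dir (user=%user.name file=%fd.name
    command=%proc.cmdline container=%container.name)
  priority: WARNING
  tags: [filesystem, supply_chain]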

Sysdig Secure

The key to securing your supply chain is to block containers from the build process if they don't comply with your internal policies, or if there are specific things in the image itself that aren't compliant.

This can be done via image scanning integrations with your CI/CD process or build software (e.g., Jenkins, Bamboo, Tekton).

Diagram CI/CD process.
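As an illustration of such a gate, here is a minimal sketch of a Tekton task that fails the pipeline when a scan does not pass. The task name, scanner image, and scan command are hypothetical placeholders – consult your scanner's documentation for the real invocation; the point is that a non-zero exit code from the scan step blocks the build:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: scan-image                              # hypothetical task name
spec:
  params:
    - name: image
      type: string
  steps:
    - name: inline-scan
      image: example.com/inline-scanner:latest  # placeholder scanner image
      script: |
        # Hypothetical command: evaluate the image against internal policy.
        # A non-zero exit fails the task, which fails the pipeline run.
        scan-image --policy internal-policy "$(params.image)"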

Sysdig Secure embeds security and compliance into the build, run, and respond stages of the container and Kubernetes lifecycle.

A core component of this for runtime protection is Falco!

Here is an example, via the Sysdig Secure UI, where a “Write below root” event is detected. Note that it includes contextual detail of where it occurred: from the cluster name, all the way down to the pod it happened on… to the process… to the syscall.

The needle in the proverbial haystack, found!

Kubernetes Activity with alert summary.
Sysdig activity audit dashboard.
A man whose mind is blown.

Sysdig Secure also provides a capability called runtime profiling, which looks at network, process, file system, and syscall activity in a container at the speed of the syscall itself!

Remember when we discussed attackers? In this case, if Sysdig runtime profiling sees a deviation, you can create an actionable response and get a full forensic trail of all commands run, to better understand what the attacker did.

Sysdig secure image profiling.

Overall Tips to up your Software Supply “Chain Game”

To summarize, here are some tips you may use as a checklist:

  • Create a culture of security for not only security professionals, but also developers and operators. Reward the team for being security focused and diligent in patching dependencies.
  • Sign your software via cosign (sigstore).
  • Implement hardware level security (tpm and beyond!).
  • Ensure your systems are patched and control plane components are up to date (CIS benchmarking capabilities).
  • Ensure pipeline security.
  • Implement runtime security (Falco/ Sysdig Secure) for your control plane/build servers, as well as for your running applications.


Conclusion

The threat of software supply chain attacks is real. Without solutions in the OSS/commercial world, you run the risk of becoming another compromised organization, adding to the US$5.2 trillion (per Accenture) lost to cybercrime due to misconfiguration.

Think in broad strokes: implement organization-wide culture and tools that encompass not only pipeline security, but also runtime protection for the underlying builders and control plane.

Special thanks to Dan Lorenc and Stefano Chierici for providing feedback on this article.

Get involved!


At Sysdig Secure, we extend Falco with out-of-the-box rules, along with other open source projects, making it even easier to work with and manage Kubernetes security. Register for our free 30-day trial and see for yourself!

K3s + Sysdig: Deploying and securing your cluster… in less than 8 minutes!

As Kubernetes is eating the world, discover an alternative certified Kubernetes offering called K3s, made by the wizards at Rancher.

K3s is gaining a lot of interest in the community for its easy deployment, its low-footprint binary, and its fit for specific use cases where full Kubernetes may be too heavy. K3s is a fully CNCF (Cloud Native Computing Foundation) certified Kubernetes offering. This means you can write your YAML against regular Kubernetes, and it will work exactly the same on a k3s cluster.

In this blog post, we walk through a deployment of k3s. We will then install Falco OSS (for those who want to understand the open source runtime engine and how its rules and alerting work in k3s) and Sysdig Essentials, to walk through security and visibility of k3s quickly and enhance k3s' awesome with more awesomeness. We will provide steps and scripts to show you how to create a secure and fully observable cluster within eight minutes.

But first….

What is k3s?

k3s web page

Lightweight Kubernetes. Production ready, easy to install, half the memory, all in a binary of less than 100 MB. Useful for edge, IoT, CI, development, and many other ways to embed k8s capabilities in a small and powerful package.

In Rancher's own words, from their GitHub page:

K3s is a fully conformant production-ready Kubernetes distribution with the following changes:

  1. It is packaged as a single binary.
  2. It adds support for sqlite3 as the default storage backend. Etcd3, MySQL, and Postgres are also supported.
  3. It wraps Kubernetes and other components in a single, simple launcher.
  4. It is secure by default with reasonable defaults for lightweight environments.
  5. It has minimal to no OS dependencies (just a sane kernel and cgroup mounts needed).
  6. It eliminates the need to expose a port on Kubernetes worker nodes for the kubelet API by exposing this API to the Kubernetes control plane nodes over a websocket tunnel.

How does it work? (A diagram, because sometimes a picture speaks 1,000 words.) This diagram shows that the k3s components are similar to those in the vanilla Kubernetes distribution, pared down to the bare essentials.

Diagram showing k3s architecture

Let's deploy our k3s cluster! (Minutes 1-3)

We provide three options below to quickly deploy a k3s cluster with three nodes. You can add more nodes by updating the Terraform config or by adding them in your provider script.

A. Rancher Default Script

1. Download the latest k3s release. x86_64, ARMv7, and ARM64 are supported. Disclaimer: ARM isn't supported yet for Sysdig/Falco.

2. Run the server:

sudo k3s server &
# Kubeconfig is written to /etc/rancher/k3s/k3s.yaml
sudo k3s kubectl get node

On a different node, run the following. NODE_TOKEN comes from /var/lib/rancher/k3s/server/node-token.

# on a different (agent) node
sudo k3s agent --server https://myserver:6443 --token ${NODE_TOKEN}

B. k3sup

k3sup is a lightweight utility (from Alex Ellis) to get from zero to KUBECONFIG with k3s on any local or remote VM. All you need is SSH access; run the k3sup binary and get kubectl access immediately.

We prepared some Terraform scripts that provision three nodes on GKE or AWS, and then deploy k3s on them with k3sup. We also created a k3sup.sh script to make things even easier, although you will need to tweak it with the nodes’ IPs and your ssh keys.

C. Managed k3s Cluster

Civo Cloud is the world’s first k3s-powered, managed Kubernetes service. With a managed service, you’ll have your cluster ready in just a few clicks.

Let's install Falco! (Minutes 3-4) (OSS)

Falco is an open source runtime security tool. It was originally built by Sysdig, Inc., was donated to the CNCF, and is now a CNCF incubating project. Falco parses Linux system calls from the kernel at runtime and asserts the stream against a powerful rules engine. If a rule is violated, a Falco alert is triggered and can be sent through many convenient mechanisms (listed later in this blog).

Here’s a great diagram @krisnova put together to show how Falco works, which we modified to show k3s:

Falco default rulesets

By default, Falco ships with a mature set of rules that will check the kernel for unusual behavior, such as:

  • Privilege escalation using privileged containers
  • Namespace changes using tools like setns
  • Read/Writes to well-known directories such as /etc, /usr/bin, /usr/sbin, etc
  • Creating symlinks
  • Ownership and mode changes
  • Unexpected network connections or socket mutations
  • Spawned processes using execve
  • Executing shell binaries such as sh, bash, csh, zsh, etc.
  • Executing SSH binaries such as ssh, scp, sftp, etc.
  • Mutating Linux coreutils executables
  • Mutating login binaries
  • Mutating shadow-utils or passwd executables
    • shadowconfig
    • pwck
    • chpasswd
    • getpasswd
    • chage
    • useradd
    • etc.

Falco alerts

Falco can send alerts to one or more channels:

  • Standard Output
  • A file
  • Syslog
  • A spawned program
  • An HTTP(S) endpoint
  • A client via the gRPC API

Installing Falco

Now that we know what Falco is and what it does, here are the steps to run Falco on your k3s cluster:

  1. Deploy your cluster with one of the methods suggested above: k3s script, k3sup, or civo cloud.
  2. Optionally, create a namespace with: kubectl create ns falco.
  3. Add the falcosecurity charts repository, then run helm install falco falcosecurity/falco. If you are using GCP or newer kernels, try the eBPF driver instead: helm install falco falcosecurity/falco --set ebpf.enabled=true.
    helm repo add falcosecurity https://falcosecurity.github.io/charts
    helm repo update
    helm install falco falcosecurity/falco
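
If you later want to layer your own detections on top of the defaults, the chart accepts custom rules through a values file. Here is a minimal sketch – the customRules key is per the falcosecurity chart at the time of writing (verify against the chart's values.yaml), and the rule itself is illustrative:

# my-values.yaml – apply with: helm install falco falcosecurity/falco -f my-values.yaml
customRules:
  my-rules.yaml: |-
    - rule: Shell Spawned in k3s Container
      desc: A shell was spawned inside a container (illustrative custom rule)
      condition: container.id != host and proc.name in (sh, bash, zsh)
      output: Shell in container (user=%user.name container=%container.name cmdline=%proc.cmdline)
      priority: NOTICE
      tags: [shell, custom]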
    

Falco in action

Here’s an example of a rule violation and the resulting output from Falco (standard output):

The Falco Rule: Network tool launched in a container.

The Attack:

The Result: A notification in STDOUT, at the speed of the kernel, that can be sent as JSON and in other formats!

For more information on Falco rules and installation details, check out falco.org.

Let's install Sysdig! (Commercial Tool, Minutes 3-8)

With the new Sysdig onboarding, you'll be set in less than five minutes.

  1. Sign up for a free trial.
  2. Install via our “Get Started” curl scripts.

    • You will be provided with an access key and a curl statement that requires a cluster name. In this case, we used sysdig-k3s.
    • K3s uses a non-standard place for the containerd runtime. You will need to add this to the end of your curl statement:
      -cd unix:///run/k3s/containerd/containerd.sock -cv /run/k3s/containerd

Sysdig for k3s in action: The 5 essential workflows

Now that we are installed, let’s see what Sysdig brings to the table.

Earlier this year, we launched Sysdig Essentials to provide your organization with an easy way to run containers in production with confidence. The Sysdig Essentials pricing tier is aimed at organizations looking to start with the essential use cases; it gives a simplified on-ramp to a Secure DevOps approach.

Let's now see the five essential workflows in action on our freshly created k3s cluster. We'll start with image scanning and go all the way to overall visibility – it's vital to understand what's occurring during a security event.

1. Image scanning

In this example, we have deployed a busybox image as a k3s pod. The Sysdig image scanner can analyze the image both inline (locally in your CI/CD pipeline), and by fetching it from the registry.

Sysdig helps you follow image scanning best practices. For example, you can shift security left with inline scanning, detecting security issues earlier in the pipeline and blocking vulnerable images from being pushed to a registry.

All you need is an image… We've got the rest.

2. Runtime security: Falco rules engine… PLUS+

The same rule we used with the Falco example can be used in Sysdig Secure. However, this time we were able to stop the container and leave an audit trail.

The Sysdig Activity Audit shows the ncat command execution, along with all of the other commands that were run in the k3s cluster/pod/process/container.

All of the rich context you can see in our UI can be extracted to a downstream SIEM, to power your Kubernetes pursuits with embedded security … Powerful!

3. Compliance

Although there currently aren't CIS benchmarks for k3s, the normal Kubernetes host or Linux host benchmarks apply. This allows you to adjust and ensure your k3s cluster's compliance at the instance level.

You can visualize the compliance score across your environments as a dashboard within Sysdig.

4. Kubernetes and container monitoring

It's super important to understand the utilization, availability, and overall capacity of your k3s cluster. That key information will help you add or remove resources and size accordingly. Better management of your resources will save you money.

5. Application and cloud service monitoring

The out-of-the-box dashboards provided by Sysdig will help you monitor the most common cloud services. They follow the Golden Signals principles to give you a very good idea of the health and performance of your application as seen by your end users.

Sysdig is also fully compatible with Prometheus, the de facto standard for Kubernetes monitoring. This means your developers can keep using the tools they know and love while accessing a broad variety of resources from repositories like PromCat.
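
For illustration, here is the kind of minimal Prometheus scrape configuration your developers may already have in place – the job name and target are hypothetical placeholders for an app exposing a /metrics endpoint:

# prometheus.yml – minimal example; job and target names are hypothetical.
scrape_configs:
  - job_name: guestbook
    scrape_interval: 15s
    static_configs:
      - targets: ['guestbook.default.svc.cluster.local:8080']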

Summary

What did we learn today? K3s is excellent, and you get the ease of deployment and capabilities of Kubernetes in a simple binary.

Add Sysdig, and it's a no-brainer. Running k3s on your instances, along with the five essential workflows for DevOps, can all be done within eight minutes.

Deploy a k3s cluster today and secure it with Falco and Sysdig now!

Useful Links

https://k3s.io/ – K3s

https://github.com/alexellis/k3sup – Alex Ellis’ K3sup tool

https://falco.org – Falco project page

https://sysdig.com/company/free-trial/ – Sysdig Essentials 14-day trial

https://github.com/danpopSD/sysdig-k3s – POP’s Script Repo (Terraform and k3sup scripts)

https://civo.com – managed k3s clusters.

Thank you to Darren Shepherd and the Rancher team for your out-of-the-box thinking and creating k3s, as well as Alex Ellis, Saiyam Pathak, and the Civo Cloud team for the assist on this blog!

Sysdig + IBM Bluemix container service: A rhapsody in BLUEmix…

We at Sysdig were happy to see IBM launch their new container service. A very stable and well-managed cluster environment for your apps and containers, powered by IBM, is a big step forward for both Kubernetes and the cloud. And, in our humble opinion, pairing Bluemix up with a cornerstone container intelligence tool like Sysdig is a match made in DevOps heaven.

This blog was easy to write, as it took three steps to get up and running with Bluemix's excellent deployment and management CLI tools, along with the Sysdig Monitor DaemonSet deployment.

  1. Deploy your Cluster. Great tutorial here.
  2. Deploy Sysdig: Sign up for a free trial on our website, then deploy our daemonset.
  3. Prosper!

For context, Sysdig is the container monitoring company. We're based on the open source Linux troubleshooting project of the same name. The open source project allows you to see every single system call, down to the process, arguments, payload, and connection, on a single host. The commercial offering turns all this data into thousands of metrics for every container and host, aggregates it all, and gives you dashboarding, alerting, and an htop-like exploration environment.

Before we dive into the how of using Bluemix and Sysdig, let's talk about the particular challenges of monitoring containers.

Containers radically change monitoring

Container advantages

Containers are pretty powerful. They are:

  • Simple – typically an individual process
  • Small – 1/10th of the size of a VM means they’re portable
  • Isolated – They have few dependencies
  • Dynamic – Rapid startup times means they can be scaled, killed, and moved quickly

Keeping containers simple is core to their value proposition and what makes them a great building block for microservices. But this simplicity comes at a cost. From an ops perspective, you need deep visibility inside containers rather than just knowing that some containers exist.

Your containers are also likely managed by an orchestration system (think Bluemix, backed by Kubernetes), and your developers may be pushing new applications at any time without informing the ops team.

Alright, so now we know we're dealing with small black boxes that appear, die, and are moved at the whims of your orchestration system. Your developers are free to add and modify their applications at any time. And your job is to make sure that your company's apps are running properly, not to mention having the data to solve issues when they arise.

Now let’s solve these monitoring challenges using Bluemix and Sysdig!

Prerequisites

To gain access to your cluster, download and install a few CLI tools and the IBM Bluemix Container Service plug-in.

Bluemix CLI

Kubernetes CLI

  1. Deploy your IBM Bluemix Container Service via the excellent UI Bluemix provides.

IBM Bluemix container

  2. We install the Bluemix CLI plugin. In this example, I am using the Windows CLI. Install instructions here.

Bluemix windows CLI

  3. We access our cluster. Here are directions on how to connect:

Connect to cluster

After about 5-10 minutes we have a fully functional Bluemix Kubernetes Cluster:

Bluemix Kubernetes cluster

Sysdig Installation with DaemonSet (Look Ma no hands!)

Before deploying Sysdig, it's worth taking a moment to describe our instrumentation approach, because it's so different from other container monitoring approaches out there. We have developed a transparent, per-host instrumentation model that doesn't require developers to modify code, doesn't involve adding a container per pod, and certainly doesn't involve adding a process to every single container!

This transparent instrumentation – we call it ContainerVision – captures all application, container, StatsD, and host metrics with a single instrumentation point, leveraging the tracepoints facility in the kernel, and sends them to one container per host for processing and transfer. This eliminates the need to turn everything into a StatsD metric, which is something we've seen many people resort to. Unlike per-pod sidecar models, per-host agents drastically reduce the resource consumption of monitoring agents and require no modification to application code. It does, however, require a privileged container and a kernel module.

Sysdig ContainerVision

Ok, back to Bluemix. Schedule the Sysdig agent container as a DaemonSet by creating a resource manifest (YAML) file following this example provided on GitHub. Any parameters commented out with ‘#’ are optional, or are needed only in on-premises installations (we have both cloud and on-premises software offerings). At a minimum, you must enter your Sysdig Monitor agent access key, found in the User Profile tab of the Sysdig Monitor app settings (Sysdig -> Settings -> User Profile -> Access Key):

Bluemix CLI plugin
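
For orientation, here is a heavily trimmed sketch of what that DaemonSet manifest looks like. The field values are illustrative; the real manifest from the GitHub example (with its volume mounts and resource settings) should be your source of truth:

# Trimmed, illustrative sketch – use the full manifest from the GitHub example.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sysdig-agent
spec:
  selector:
    matchLabels:
      app: sysdig-agent
  template:
    metadata:
      labels:
        app: sysdig-agent
    spec:
      hostNetwork: true              # the agent observes host-level network traffic
      containers:
        - name: sysdig-agent
          image: sysdig/agent
          securityContext:
            privileged: true         # required: the agent loads a kernel module
          env:
            - name: ACCESS_KEY       # from Sysdig -> Settings -> User Profile
              value: "<YOUR-ACCESS-KEY>"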

Deploy the DaemonSet by issuing this command:

kubectl create -f sysdig.yaml

Now time to Sysdig!

Before we hop into the metrics, just a note: everything you see below is out of the box. No special configurations, no modifications to your application code, no configuring plug-ins. While Sysdig can give you even more than what's below, you'll see you start with quite a lot!

We now have an overview of the agent host(s) running my Bluemix cluster. In less than 20 minutes, I have full visibility into my cluster!

Sysdig cluster visibility

As you can see, I am seeing CPU, memory, and network details by selecting a node. Then, drilling down using Sysdig's out-of-the-box "Overview by Process", I get to a default dashboard showing per-process metrics.

Sysdig overview by process

I deployed the simple guestbook-go app from the Kubernetes example pages to the default namespace, and now I want to view Kubernetes as a whole rather than on a selective, host-by-host basis. We have the bases covered in terms of out-of-the-box visualization of Deployments, Namespaces, Pods, and Services:

Deployments, Namespaces, Pods, and Services

Here’s a Kubernetes overview of my cluster

Notice Sysdig is grabbing the metadata directly from the Kubernetes API without any direct configuration. I'm seeing our request count per service, top services, top namespaces, and host capacity. This helps me grow the environment naturally and improve capacity planning based on REAL metrics.

Sysdig Kubernetes metadata

I can even view how my applications are communicating through our great topology visualizations. Here's a network topology view:

Sysdig topology view

What about application metrics? How do I get to those?

Here's the same Redis example from the guestbook; we have a service-level view, not just single hosts, and much more.

Similarly, Sysdig can visualize StatsD, JMX, and Prometheus custom metrics… without even changing the collection endpoint you already have in place. (Yes, that sounds crazy. Read how it works here.)

Sysdig StatsD, JMX, Prometheus

All of that in 3 steps!

IBM Container Service is a great place for developers to host and run production applications. Sysdig makes it even better with true visibility and intelligence. We even have a four-part tutorial on monitoring Kubernetes with Sysdig here.

Sysdig and IBM bluemix

Monitoring Azure container service with Sysdig: A step-by-step guide

Sysdig hearts (can I emoji here?) monitoring Azure Container Service just as much as Amazon or other public cloud providers… and we can prove it to you!

Azure has made progress in leaps and bounds in terms of container and container orchestration support. This blog post will show you some great things you can do with Sysdig when monitoring Azure, giving you a full picture of your container or microservice-based applications.

Azure container and orchestrator setup

Create your container service in Azure with your favorite orchestrator (Sysdig supports all three of them). There's a great article with the step-by-step here: docs.microsoft.com/en-us/azure/container-service/container-service-deployment

Azure container console

The concept of a resource group is a good one: it groups your resources into easy-to-find "buckets" for deployment, troubleshooting, and overall ease of use, versus the independent, resource-by-resource method in Azure Portal Classic and other public cloud UIs (short of writing automation scripts). I created two Azure Container Services, using Docker Swarm and DC/OS Mesosphere as my orchestrators. Azure's quickstart templates can also be saved as JSON files so this process can be recreated or modified quickly.

DCOS/Swarm Resources

For this exercise, I created DC/OS and Docker Swarm instances. Kubernetes is also supported by Azure Container Service.

Here is a high-level Azure infrastructure dashboard (CPU, disk read/write, and network) of the Docker Swarm and DC/OS masters. Azure's dashboarding/metrics do a great job of showing my Azure services and infrastructure, along with resource groups and billing.

Azure container console

Deep container monitoring for Azure

What if I really want in-depth details about monitoring my microservices? Maybe I want to see how my individual containers or microservices are interacting. Perhaps I even want to see HTTP requests/responses, network connectivity between the containers, and how my front-end web services are communicating with back-end data services? Top connections/ports connecting to my containers? Maybe I want container CPU shares, memory, or disk space for my service, along with my overall Azure Container Service. How would I get more in-depth microservice/container visibility within Azure?

That’s where Sysdig comes in and can help you. Sysdig is the container monitoring specialist. We focus on the complex task of seeing inside containers, and then relating that information to your orchestrator and your cloud in real-time. With that, let’s put Sysdig to work on monitoring Azure.


The Sysdig steps for setting up Azure monitoring


For Docker

All you have to do is deploy the Sysdig monitoring agent into your cluster – no setup or configuration needed. See the specific instructions here: https://support.sysdigcloud.com/hc/en-us/articles/204498905-Sysdig-Install-Standard-Linux-Docker-CoreOS-

For Kubernetes


We can deploy our agent via a DaemonSet. Microsoft has documentation on the steps to accomplish this: https://docs.microsoft.com/en-us/azure/container-service/container-service-kubernetes-sysdig

For Mesosphere


You would install the agent on your master instances with the instructions above, and then deploy the agent via our Universe deployment. Here are the steps straight from Microsoft on the topic: https://docs.microsoft.com/en-us/azure/container-service/container-service-monitoring-sysdig

Done… That’s it.

Sysdig will then automatically collect metadata and other goodies from the orchestrator of your choice, and help you organize and group your applications (containers or microservices) for your teams to monitor in a more holistic way. Sysdig's magic is that, with this simple instrumentation method, you can even see what your applications are doing inside your containers.

Sysdig Monitor Azure

Here is the service performance of a simple MySQL demo Docker Compose service running on my Azure Container Service Docker Swarm cluster.

Sysdig Monitor Azure

Here is a topology view of response times for the services running in your Docker Swarm cluster.

Azure Topology Map

Here’s the MySQL Demo service with a client communicating with a backend MySQL server. We are showing service bottleneck details and response times.

Swarm Monitoring

We are even getting my service's MySQL metrics (and many other application integrations) without installing an agent directly into my container.

Swarm MySQL Monitoring
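
These integrations are driven by the agent's application check configuration. As a rough sketch – field names follow Sysdig's app-checks docs as we understand them, so verify against the current documentation – a MySQL check in dragent.yaml looks something like this:

# dragent.yaml fragment – illustrative; verify keys against Sysdig's app-checks docs.
app_checks:
  - name: mysql
    pattern:
      comm: mysqld                 # match the MySQL server process on the host
    conf:
      server: 127.0.0.1
      user: sysdig-cloud           # a MySQL user with minimal read privileges
      pass: <password>

Because the check runs from the per-host agent, nothing is added to the MySQL container itself.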

Sysdig even captures event data if one of your containers goes down, right from the orchestrator… we make your Azure Container Service even more fantastic.

Docker Event Monitoring

How does Sysdig do it?

Container and Service Vision, that's how! We automatically find and poll your orchestrator for metadata about your deployment. This allows us to aggregate your monitoring data on the fly to give you smarter, microservice-based views of your resources, instead of just physical host/IP/container views. We were also built to understand microservices from day one. We leverage the data in your orchestration system through functionality we call ServiceVision, which gives us an always-up-to-date view of where your containers are deployed and what they are supposed to be doing.

Sysdig

Take a look at all these out-of-the-box (OOTB) dashboards you can create quickly. All the usual suspects you can deploy with Azure Container Service (Docker Swarm, Kubernetes, and Mesosphere/Marathon) are covered, from overviews down to individual services/tasks; these templates are there right when you sign up for a trial or sign up for Sysdig.

Swarm Monitoring

Our agent runs at the kernel level, enabling Sysdig to provide container vision without bloating your code or containers. Here is a Docker Compose overview that lays out the more in-depth details from a service-oriented point of view; even if your application's/service's containers are scattered across different Azure regions, availability sets, etc., Sysdig can provide a unified, holistic view of the performance of that app/service.

Swarm Monitoring

Sysdig also gets DC/OS data from the orchestration API, along with other application-centric methods. You can see things such as master and agent node metrics, tasks, and even individual Marathon groupings.

Swarm Monitoring

We can even provide granular application- and/or service-level metrics for monitoring your Azure deployments. Here's an example Node.js app running on the three-agent Docker Swarm we deployed in our Azure Container Service:

Swarm Monitoring

We are able to segment out which container is using how much heap, for instance… all without anything installed in your containers or application code at all.

Service-based access control to Azure Container Service… Can Sysdig do that?

With Sysdig Teams, we can isolate specific Azure instances to the application teams needing visibility; we can even narrow it down to the specific applications they are building. In this example, I created two teams, AZURE-DCOS and AZURE-SWARM, and set up service-level access control granting visibility only into the DC/OS Mesos environment.

Swarm Monitoring

After switching to the AZURE-DCOS team, we only see the metrics I selected during team creation.

Swarm Monitoring

What if I don’t want to use a SaaS monitoring solution?

Sysdig can also be deployed as an on-premises installation, either in a standard Azure VM instance or in your brand-new Azure Container Service environment (contact Sysdig for details).

Directions here: https://support.sysdigcloud.com/hc/en-us/articles/206519903-On-Premises-Installation-Guide

Final thoughts

Sysdig works extremely well with Azure and can provide you and your team even more visibility into your applications, in addition to the default Azure tools. We hope this blog helps you take the leap in testing your applications/containers with Sysdig and Azure. Sysdig offers a 14-day trial if you want to take a test drive, and you can also contact us if you want a demo.
