Sysdig Blog | Javier Martínez (author archive)
https://sysdig.com/blog/author/javier-martinez/

Top metrics for Elasticsearch monitoring with Prometheus
https://sysdig.com/blog/elasticsearch-monitoring/ | May 5, 2023

Starting the journey for Elasticsearch monitoring is crucial to get the right visibility and transparency over its behavior.

Elasticsearch is one of the most widely used search and analytics engines. It provides both scalability and redundancy to deliver highly available search. As of 2023, more than sixty thousand companies of all sizes and backgrounds use it as their search solution to track a diverse range of data, such as analytics, logging, or business information.

By distributing data in JSON documents and indexing that data into several shards, Elasticsearch provides high availability, quick search, and redundancy capabilities.

In this article, we will evaluate the most important Prometheus metrics provided by the Elasticsearch exporter.

You will learn the main areas to focus on when monitoring an Elasticsearch system.

How to start monitoring Elasticsearch with Prometheus

As usual, the easiest way to start your Prometheus monitoring journey with Elasticsearch is to use PromCat.io to find the best configs, dashboards, and alerts. The Elasticsearch setup guide in PromCat includes the Elasticsearch exporter with a series of out-of-the-box metrics that will be automatically scraped by Prometheus. It also includes a collection of curated alerts and dashboards to start monitoring Elasticsearch right away.
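
If you prefer to wire things up manually, a minimal scrape configuration could look like the following sketch. It assumes the prometheus-community Elasticsearch exporter is reachable at elasticsearch-exporter:9114 (a hypothetical address; 9114 is the exporter's usual default port):

scrape_configs:
  - job_name: "elasticsearch"
    static_configs:
      # Hypothetical exporter address; adjust to your deployment
      - targets: ["elasticsearch-exporter:9114"]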

[Figure: Top metrics for Elasticsearch - metric list]

You can combine these metrics with the Node Exporter to get more insights into your infrastructure. Also, if you're running Elasticsearch on Kubernetes, you can use kube-state-metrics and cAdvisor to combine Kubernetes metrics with Elasticsearch metrics.

How to monitor Golden Signals in Elasticsearch

To review a bare minimum of important metrics, remember to check the so-called Golden Signals:

  • Errors.
  • Traffic.
  • Saturation.
  • Latency.

These represent a set of essential metrics to look at in any system, from a black-box monitoring perspective (focusing only on what's happening in the system, not why). In other words, Golden Signals measure symptoms, not causes, of the current problem. They are a good starting point for creating an Elasticsearch monitoring dashboard.

Errors

elasticsearch_cluster_health_status

Cluster health in Elasticsearch is measured by the colors green, yellow, and red, as follows:

  • Green: Data integrity is correct, no shard is missing.
  • Yellow: There’s at least one shard missing, but data integrity can be preserved due to replicas.
  • Red: A primary shard is missing or unassigned, and there’s a data loss.

With elasticsearch_cluster_health_status, you can quickly check the current health of the Elasticsearch data on a particular cluster. Remember that this won't tell you the actual cause of the data integrity loss, only that you need to act to prevent further problems.
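
As a sketch, assuming the exporter's usual color label, an alerting expression for a red cluster could look like this:

# Fires when the cluster reports red status (at least one primary shard missing)
elasticsearch_cluster_health_status{color="red"} == 1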

Traffic

elasticsearch_indices_search_query_total

This metric is a counter with the total number of search queries executed, which by itself won’t give you much information as a number.

Also consider using rate() or irate() to detect sudden changes or spikes in traffic. Dig deeper into Prometheus queries with our Getting started with PromQL guide.
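
For example, a simple query to visualize search traffic as queries per second might look like this:

# Search queries per second, averaged over the last 5 minutes
rate(elasticsearch_indices_search_query_total[5m])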

Saturation

For a detailed saturation analysis, check the section on How to monitor Elasticsearch infra metrics.

Latency

For a detailed latency analysis, check the section on How to monitor Elasticsearch index performance.

How to monitor Elasticsearch infra metrics

Infrastructure monitoring focuses on tracking the overall performance of the servers and nodes of a system. As with similar cloud applications, most of the effort will be spent on monitoring CPU and Memory consumption.

Monitoring Elasticsearch CPU

elasticsearch_process_cpu_percent

This is a gauge metric used to measure the current CPU usage percent (0-100) of the Elasticsearch process. Since chances are that you’re running several Elasticsearch nodes, you will need to track each one separately.

[Figure: Top metrics for Elasticsearch - CPU usage]

elasticsearch_indices_store_throttle_time_seconds_total

In case you’re using a file system as an index store, you can expect a certain level of delays in input and output operations. This metric represents how much your Elasticsearch index store is being throttled.

Since this is a counter metric that will only aggregate the total number of seconds, consider using rate or irate for an evaluation of how much it’s suddenly changing.

Monitoring Elasticsearch JVM Memory

Elasticsearch is based on Lucene, which is built in Java. This means that monitoring the Java Virtual Machine (JVM) memory is crucial to understand the current usage of the whole system.

elasticsearch_jvm_memory_used_bytes

This metric is a gauge that represents the memory usage in bytes for each JVM memory area (heap and non-heap).

[Figure: Top metrics for Elasticsearch - Memory used]
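
A common way to turn this into a saturation signal is to compare it against the maximum available memory. Assuming the exporter also exposes elasticsearch_jvm_memory_max_bytes, a sketch could be:

# Fraction of JVM heap in use per node; values consistently close to 1 indicate memory pressure
elasticsearch_jvm_memory_used_bytes{area="heap"}
  / elasticsearch_jvm_memory_max_bytes{area="heap"}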

How to monitor Elasticsearch index performance

Indices in Elasticsearch partition the data into logical namespaces. Elasticsearch indexes documents in order to retrieve or search them as fast as possible.

Every time a new index is created, you can define the number of shards and replicas for it:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
elasticsearch_indices_indexing_index_time_seconds_total

This metric is a counter of the accumulated seconds spent on indexing. It gives you a good approximation of Elasticsearch indexing performance.

Note that you can divide this metric by elasticsearch_indices_indexing_index_total in order to get the average indexing time per operation.
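
As a sketch, the average indexing time per operation over the last five minutes could be expressed as:

# Average seconds spent per indexing operation
rate(elasticsearch_indices_indexing_index_time_seconds_total[5m])
  / rate(elasticsearch_indices_indexing_index_total[5m])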

elasticsearch_indices_refresh_time_seconds_total

For an index to be searchable, Elasticsearch needs a refresh to be executed. This is configured with the index.refresh_interval setting, which defaults to one second.

The metric elasticsearch_indices_refresh_time_seconds_total is a counter with the total time dedicated to refreshing in Elasticsearch.

In case you want to measure the average time for refresh, you can divide this metric by elasticsearch_indices_refresh_total.

How to monitor Elasticsearch search performance

While Elasticsearch promises near-instant query speed, chances are that in the real world you will find that this is not always the case. The number of shards, the storage solution chosen, or the cache configuration might impact search performance, and it's crucial to track the current behavior.

Additionally, the use of wildcards, joins, or a large number of searched fields will drastically affect the overall processing time of search queries.

elasticsearch_indices_search_fetch_time_seconds

A counter metric aggregating the total amount of seconds dedicated to fetching results in search.

In case you want to retrieve the average fetch time per operation, just divide the result by elasticsearch_indices_search_fetch_total.

How to monitor Elasticsearch cluster performance

Apart from the usual cloud requirements, for an Elasticsearch system you will want to look at:

  • Number of shards.
  • Number of replicas.

As a rule of thumb, the ratio between the number of shards and GB of heap space should be less than 20.
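
One rough way to approximate this ratio with the metrics already mentioned in this article (a sketch, not an exact heap accounting) is:

# Approximate shards per GB of total JVM heap; keep it below ~20 as a rule of thumb
sum(elasticsearch_cluster_health_active_shards)
  / (sum(elasticsearch_jvm_memory_max_bytes{area="heap"}) / 1e9)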

Note as well that it’s suggested to have a separate cluster dedicated to monitoring.

elasticsearch_cluster_health_active_shards

This metric is a gauge that indicates the number of active shards (both primary and replicas) across the cluster.

elasticsearch_cluster_health_relocating_shards

Elasticsearch will dynamically move shards between nodes based on balancing or current usage. With this metric, you can control when this movement is happening.

Advanced Monitoring

Remember that the Prometheus exporter will give you a set of out-of-the-box metrics that are relevant enough to kickstart your monitoring journey. But the real challenge comes when you take the step to create your own custom metrics tailored to your application.

REST API

Additionally, note that Elasticsearch provides a REST API that you can query for more fine-grained monitoring.

VisualVM

The Java VisualVM project is an advanced dashboard for Memory and CPU monitoring. It features advanced resource visualization, as well as process and thread utilization.

Download the Dashboards

You can download the dashboards with the metrics seen in this article through the Promcat official page.

This is a curated selection of the above metrics that can be easily integrated with your Grafana or Sysdig Monitor solution.

[Figure: Top metrics for Elasticsearch - Grafana dashboards]

Conclusion

Elasticsearch is one of the most important search engines available, featuring high availability, high scalability, and distributed capabilities through redundancy.

Using the Elasticsearch exporter for Prometheus, you can easily kickstart the monitoring journey by automatically receiving the most important metrics.

As with many other applications, CPU and memory are crucial to understanding system saturation. You should be aware of the current CPU throttling and the memory handling of the JVM.

Finally, it’s important to dig deeper into the particularities of Elasticsearch, like indices and search capabilities, to truly understand the challenges of monitoring and visualization.


Want to dig deeper? Download the Prometheus monitoring guide and start your monitoring journey with the standard metrics solution.

Tales from the Kube!
https://sysdig.com/blog/tales-from-the-kube/ | April 26, 2023

Attendees of KubeCon + CloudNativeCon Europe 2023 in Amsterdam were able to see the first edition of Tales from the Kube!, a Sysdig comic book special feature offering an inside look at the life of a Kubernetes cluster.

Without much further ado, here’s the digital version for you to enjoy!

Credits:

  • Script: Javier Martínez
  • Ink and color: Xcar Malavida

Kubernetes CreateContainerConfigError and CreateContainerError
https://sysdig.com/blog/kubernetes-createcontainerconfigerror-createcontainererror/ | March 23, 2023

CreateContainerConfigError and CreateContainerError are two of the most prevalent Kubernetes errors found in cloud-native applications.

CreateContainerConfigError is an error happening when the configuration specified for a container in a Pod is not correct or is missing a vital part.

CreateContainerError is a problem happening at a later stage in the container creation flow. Kubernetes displays this error when it attempts to create the container in the Pod.

In this article, you will learn what these errors mean, what causes them, and how to troubleshoot them.

What is CreateContainerConfigError?

During the process to start a new container, Kubernetes first tries to generate the configuration for it. In fact, this is handled internally by calling a method called generateContainerConfig, which will try to retrieve:

  • Container command and arguments
  • Relevant persistent volumes for the container
  • Relevant ConfigMaps for the container
  • Relevant secrets for the container

Any problem in the elements above will result in a CreateContainerConfigError.

What is CreateContainerError?

Kubernetes throws a CreateContainerError when there's a problem in the creation of the container that is unrelated to configuration, like a referenced volume not being accessible or a container name already being in use.

As with other problems like CrashLoopBackOff, this article only covers the most common causes, but there are many others depending on your particular application.

How you can detect CreateContainerConfigError and CreateContainerError

You can detect both errors by running kubectl get pods:

NAME  READY STATUS                     RESTARTS AGE

mypod 0/1   CreateContainerConfigError 0        11m

As you can see from this output:

  • Pod is not ready: container has an error.
  • There are no restarts: these two errors are not like CrashLoopBackOff, where automatic retries are in place.

Kubernetes container creation flow

In order to understand CreateContainerError and CreateContainerConfigError, we need to first know the exact flow of container creation.

Kubernetes follows these steps every time a new container needs to be started:

  1. Pull the image.
  2. Generate container configuration.
  3. Precreate container.
  4. Create container.
  5. Pre-start container.
  6. Start container.

As you can see, steps 2 and 4 are where a CreateContainerConfigError and a CreateContainerError might appear, respectively.

[Figure: Create container and start container flow]

Common causes for CreateContainerError and CreateContainerConfigError

ConfigMap not found

Kubernetes ConfigMaps are a key element to store non-confidential information to be used by Pods as key-value pairs.

When adding a ConfigMap reference in a Pod, you are effectively indicating that it should retrieve specific data from it. But, if a Pod references a non-existent ConfigMap, Kubernetes will return a CreateContainerConfigError.
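
For illustration, here is a minimal Pod manifest with hypothetical names that would trigger this error if myconfigmap doesn't exist in the namespace:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: mycontainer
      image: nginx
      env:
        - name: APP_MODE
          valueFrom:
            configMapKeyRef:
              # If this ConfigMap (or the key) is missing, the container
              # stays in CreateContainerConfigError
              name: myconfigmap
              key: mode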

Secret not found

Secrets are a more secure way to store sensitive information in Kubernetes. Remember, though, that this is just raw data encoded in base64, so it's not really encrypted, just obfuscated.

In case a Pod contains a reference to a non-existent Secret, the kubelet will throw a CreateContainerConfigError, indicating that the data needed to build the container config couldn't be retrieved.

Container name already in use

While it's an unusual situation, in some cases a conflict might occur because a particular container name is already being used. Since every Docker container must have a unique name, you will need to either delete the original or rename the new one being created.

How to troubleshoot CreateContainerError and CreateContainerConfigError

While the causes for an error in container creation might vary, you can always rely on the following methods to troubleshoot the problem that’s preventing the container from starting.

Describe Pods

With kubectl describe pod, you can retrieve the detailed information for the affected Pod and its containers:

Containers:
  mycontainer:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:  3
---
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      myconfigmap
    Optional:  false

Get logs from containers

Use kubectl logs to retrieve the log information from containers in the Pod. Note that for Pods with multiple containers, you need to use the --all-containers parameter:

Error from server (BadRequest): container "mycontainer" in pod "mypod" is waiting to start: CreateContainerConfigError

Check the events

You can also run kubectl get events to retrieve all the recent events happening in your Pods. Remember that the describe pods command also displays the particular events at the end.
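
For example, to narrow the events down to a single Pod (assuming a Pod named mypod), you could run:

# Events related to one object only
kubectl get events --field-selector involvedObject.name=mypod
# Or list all recent events in chronological order
kubectl get events --sort-by=.metadata.creationTimestamp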

[Figure: Terminal windows for the kubectl commands used to troubleshoot a CreateContainerConfigError]

How to detect CreateContainerConfigError and CreateContainerError in Prometheus

When using Prometheus + kube-state-metrics, you can quickly retrieve Pods that have containers with errors at creation or config steps:

kube_pod_container_status_waiting_reason{reason="CreateContainerConfigError"} > 0
kube_pod_container_status_waiting_reason{reason="CreateContainerError"} > 0
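
If you want to be notified automatically, these queries can be wrapped in a Prometheus alerting rule. The following is a sketch with hypothetical group and severity names:

groups:
  - name: kubernetes-container-errors
    rules:
      - alert: CreateContainerConfigError
        expr: kube_pod_container_status_waiting_reason{reason="CreateContainerConfigError"} > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in Pod {{ $labels.pod }} has a CreateContainerConfigError"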

Other similar errors

Pending

Pending is a Pod status that appears when the Pod couldn't even be started. Note that this happens at scheduling time: kube-scheduler couldn't find a node for the Pod, either because there aren't enough resources or because the taints/tolerations configuration doesn't allow it.

ContainerCreating

ContainerCreating is another waiting status reason that can appear when the container could not be started because of a problem in the execution (e.g., no command specified):

Error from server (BadRequest): container "mycontainer" in pod "mypod" is waiting to start: ContainerCreating   

RunContainerError

This might be a similar situation to CreateContainerError, but note that this happens during the run step and not the container creation step.

A RunContainerError most likely points to problems happening at runtime, like attempts to write on a read-only volume.

CrashLoopBackOff

Remember that CrashLoopBackOff is not technically an error, but the grace period of waiting time that is added between retries.

Unlike CrashLoopBackOff events, CreateContainerError and CreateContainerConfigError won’t be retried automatically.

Conclusion

In this article, you have seen how both CreateContainerConfigError and CreateContainerError are important messages in the Kubernetes container creation process. Being able to detect them and understand at which stage they are happening is crucial for the day-to-day debugging of cloud-native services.

Also, it's important to know the internal behavior of the Kubernetes container creation flow and which errors might appear at each step.

Finally, CreateContainerConfigError and CreateContainerError might be mistaken for other Kubernetes errors, but these two happen at the container creation stage and are not automatically retried.


Troubleshoot CreateContainerError with Sysdig Monitor

With Sysdig Monitor’s Advisor, you can easily detect which containers are having CreateContainerConfigError or CreateContainerError problems in your Kubernetes cluster.

Advisor accelerates mean time to resolution (MTTR) with live logs, performance data, and suggested remediation steps. It’s the easy button for Kubernetes troubleshooting!


Try it free for 30 days!

Millions wasted on Kubernetes resources
https://sysdig.com/blog/millions-wasted-kubernetes/ | March 2, 2023

The Sysdig 2023 Cloud-Native Security and Container Usage Report has shed some light on how organizations are managing their cloud environments. Based on real-world customers, the report is a snapshot of the state of cloud-native in 2023, aggregating data from billions of containers.

Our report retrieves data from cloud projects in the following areas:

  • Number of containers that are using less CPU and memory than needed.
  • Number of containers with no CPU limits set.
  • Overallocation and estimated losses.

Limits and requests

Over the past year, we have covered the importance of Limits and Requests. Simply put, they provide a way of specifying the maximum and the guaranteed amount of a computing resource for a container, respectively.

But they are more than that – they also indicate your company’s intention for a particular process. They can define the eviction tier level and the Quality of Service for the Pods running those containers.

Our study shows that:

  • 49% of containers have no memory limits set.
  • 59% of containers have no CPU limits set.

While setting memory limits might have negative side effects, it's important to set CPU limits to avoid starving other processes when particular containers have drastic spikes in CPU consumption.
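
If you want to measure this in your own cluster, a sketch of a PromQL query (assuming the default kube-state-metrics metric names) that counts running containers without a CPU limit is:

# Containers that do not expose any CPU limit
count(
  kube_pod_container_info
    unless on (namespace, pod, container)
  kube_pod_container_resource_limits{resource="cpu"}
)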


59% of containers with no CPU limits

Our study showed that 59% of containers had no CPU limits set at all. Normally, adding CPU limits might lead to throttling, but the report also shows that, on average, 69% of the purchased CPU was unused, suggesting that no capacity planning was in place.

49% of containers with no memory limits

Almost half of the containers had no memory limits at all. This particular case is special, since adding a limit to memory might eventually cause OOM errors.

Kubernetes overallocation

Cloud providers give plenty of options to run applications with the ease of a click, which is a great way to kickstart the monitoring journey. However, cloud-native companies tend to allocate resources just to avoid becoming saturated, which can lead to astronomical costs.

Why does this happen?

  • Urge to scale quickly
  • Lack of resource consumption visibility
  • Multi-tenant scaling
  • Lack of Kubernetes knowledge
  • Lack of capacity planning

Since CPU is the most costly resource in a cloud instance, companies might be overspending on something they are never going to use.

By using the average cost for AWS pricing on nodes based on CPU and memory, we can then calculate the average savings for companies that address these problems.

Specifically, our report showed that companies with more than 1,000 nodes could reduce their wasted resources by $10M per year.


CPU overcommitment

In case the limits set are higher than the node's actual CPU capacity, kubectl describe node will display:

Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.)

This means that the node is overcommitted: if all containers try to consume their limits at the same time, some processes will be throttled.
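
A hedged way to quantify this per node with kube-state-metrics (assuming recent metric names that carry a node label) is to compare committed limits with allocatable CPU:

# Ratio of committed CPU limits to allocatable CPU per node; > 1 means the node is overcommitted
sum by (node) (kube_pod_container_resource_limits{resource="cpu"})
  / sum by (node) (kube_node_status_allocatable{resource="cpu"})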

Cost reduction strategies

Capacity planning

By using applications to track resource usage and by performing capacity planning, companies can mitigate these costs with a clear investment/return net gain. Both Limits and Requests are useful tools that can be used to restrict the usage, but they can be cumbersome as they can lead to Pod eviction or over-commitment.

LimitRanges are a useful tool to automatically assign a value range for both limits and requests to all containers within a namespace.

Autoscaling

Both vertical autoscaling (increasing the resource size on demand) and horizontal autoscaling (increasing or decreasing the number of Pods based on utilization) can be used to dynamically adapt to the current needs of your cloud-native solution.

ResourceQuota

Companies with multi-tenant solutions might come up with the problem that some of their projects are more demanding than others in terms of resources. Because of this, assigning the same resources might eventually cause overspending.

That’s why you can use ResourceQuotas to set a maximum amount of a resource to be consumed for all processes in a namespace.

Conclusion

There has been rapid growth in the number of companies investing in cloud solutions in recent years.
But with great power comes great responsibility: cloud projects need to find a balance for resources like CPU or memory.

Generally, they want to allocate enough so they never have saturation problems. But, on the other hand, over-allocating will lead to massive spending on unused resources.

The solution? Capacity planning, autoscaling, and visibility into costs are the best tools to take back control over your cloud-native spending.

Sysdig 2023 Cloud-Native Security and Usage Report: download the full report.

Reduce your Kubernetes costs with Sysdig Monitor

Sysdig Monitor can help you reach the next step in the Monitoring Journey.

With Cost Advisor, you can reduce Kubernetes resource waste by up to 40%.

And with our out-of-the-box Kubernetes Dashboards, you can discover underutilized resources in a couple of clicks.


Try it free for 30 days!

Monitoring with Custom Metrics
https://sysdig.com/blog/monitoring-custom-metrics/ | March 1, 2023

Custom metrics are application-level or business-related tailored metrics, as opposed to the ones that come directly out-of-the-box from monitoring systems like Prometheus (e.g: kube-state-metrics or node exporter).

By kickstarting a monitoring project with Prometheus, you might realize that you get an initial set of out-of-the-box metrics with just Node Exporter and Kube State Metrics. But, this will only get you so far since you will just be performing black box monitoring. How can you go to the next level and observe what’s beyond?

Custom metrics are an essential part of the day-to-day monitoring of cloud-native systems, as they provide an additional dimension at the business and application level. A custom metric can be:

  • Metrics provided by an exporter.
  • Tailored metrics designed by the customer.
  • An aggregate from previous existing metrics.

In this article, you will see:

Why custom metrics are important

Custom metrics allow companies to:

  • Monitor Key Performance Indicators (KPIs).
  • Detect issues faster.
  • Track resource utilization.
  • Measure latency.
  • Track specific values from their services and systems.

Examples of custom metrics:

  • Latency of transactions in milliseconds.
  • Database open connections.
  • % cache hits / cache misses.
  • orders/sales in e-commerce site.
  • % of slow responses.
  • % of responses that are resource intensive.

As you can see, any metric retrieved from an exporter or created ad hoc fits the definition of a custom metric.

When to use Custom Metrics

Autoscaling

By providing specific visibility over your system, you can define rules on how the workload should scale.

  • Horizontal autoscaling: add or remove replicas of a Pod.
  • Vertical autoscaling: modify limits and requests of a container.
  • Cluster autoscaling: add or remove nodes in a cluster.

If you want to dig deeper, check this article about autoscaling in Kubernetes.

Latency monitoring

Latency measures the time it takes for a system to serve a request. This monitoring golden signal is essential to understand what the end-user experience for your application is.

These are considered custom metrics as they are not part of the out-of-the-box set of metrics coming from Kube State Metrics or Node Exporter. In order to measure latency, you might want to either track individual systems (database, API) or end-to-end.

Application level monitoring

Kube-state-metrics or node-exporter might be a good starting point for observability, but they just scratch the surface as they perform black-box monitoring. By instrumenting your own application and services, you create a curated and personalized set of metrics for your own particular case.

Considerations when creating Custom Metrics

Naming

Check for any existing naming conventions, as new metric names might collide with existing ones or be confusing. A custom metric's name is the first description of its purpose.

Labels

Thanks to labels, we can add dimensions to our metrics, allowing us to filter and refine by additional characteristics. Cardinality is the number of possible values for each label; since each combination of values requires its own time series, it can increase resource usage drastically. Choosing the correct labels carefully is key to avoiding a cardinality explosion, which is one of the causes of resource spending spikes.
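
A quick, hedged way to spot which metrics are driving cardinality in Prometheus is to count series per metric name:

# Top 10 metric names by number of time series currently stored
topk(10, count by (__name__) ({__name__=~".+"}))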

Costs

Custom metrics may have some costs associated with them, depending on the monitoring system you are using. Double-check which dimension is used to scale costs:

  • Number of time series
  • Number of labels
  • Data storage

Custom Metric lifecycle

In case the custom metric is related to a job or a short-lived script, consider using the Pushgateway, so the values survive after the process exits.
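
As a sketch, pushing a single value for a hypothetical batch job to a Pushgateway instance (assumed here to be reachable at pushgateway.example.com:9091) looks like this:

# Push one sample for job "my_batch_job"; the metric name and address are illustrative
echo "my_batch_job_duration_seconds 42" \
  | curl --data-binary @- http://pushgateway.example.com:9091/metrics/job/my_batch_job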

Kubernetes Metric API

One of the most important features of Kubernetes is the ability to scale the workload based on the values of metrics automatically.

The metrics APIs are defined in the official Kubernetes repositories:

  • metrics.k8s.io
  • custom.metrics.k8s.io
  • external.metrics.k8s.io

Creating new metrics

You can set new metrics by calling the K8s metrics API as follows:

curl -X POST \
  -H 'Content-Type: application/json' \
  http://localhost:8001/api/v1/namespaces/custom-metrics/services/custom-metrics-apiserver:http/proxy/write-metrics/namespaces/default/services/kubernetes/test-metric \
  --data-raw '"300m"'

Prometheus custom metrics

As we mentioned, every exporter that we include in our Prometheus integration will account for several custom metrics.

Check the following post for a detailed guide on Prometheus metrics.

Challenges when using custom metrics

Cardinality explosion

While the resources consumed by some metrics might be negligible, the moment these are available to be used with labels in queries, things might get out of hand.

Cardinality refers to the Cartesian product of a metric and its label values. The result is the number of time series that need to be stored for that single metric.

[Figure: Custom metrics - cardinality example]

Also, every metric is scraped and stored in the time series database according to your scrape_interval. The more frequently you scrape (the shorter the interval), the more samples are stored.
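
For reference, the scrape interval is configured in prometheus.yml, either globally or per job. A minimal sketch with hypothetical job names:

global:
  scrape_interval: 30s        # default for every job unless overridden
scrape_configs:
  - job_name: "my-app"        # hypothetical job
    scrape_interval: 15s      # per-job override: more samples, more storage
    static_configs:
      - targets: ["my-app:8080"]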

All these factors will eventually lead to:

  • Higher resource consumption.
  • Higher storage demand.
  • Monitoring performance degradation.

Moreover, most common monitoring tools don't give visibility into the current cardinality of metrics or the associated costs.

Exporter overuse

Exporters are a great way to include relevant metrics in your system. With them, you can easily instrument relevant metrics bound to your microservices and containers. But with great power comes great responsibility. Chances are that many of the metrics included in the package may not be relevant to your business at all.

By enabling custom metrics and exporters in your solution, you may end up having a burst in the amount of time series database entries.

Custom metrics cost spikes

Because of the elements explained above, monitoring costs could increase suddenly, as your current solution might be consuming more resources than expected, or certain pricing thresholds in your monitoring solution might have been surpassed.

Alert fatigue

With metrics, most companies and individuals would love to start adding alerts and notifications when their values exceed certain thresholds. However, this can lead to more notification sources and a reduced attention span.
Learn more about alert fatigue and how to mitigate it.

Conclusion

Custom metrics are the next step for cloud-native monitoring, as they are the core of business observability. While using Prometheus along with kube-state-metrics and node exporter is a nice starting point, eventually companies and organizations will need to take the next step and create tailored, on-point metrics to suit their needs.

Want to learn how you can save 75% on custom metrics? Watch the webinar.

Kubernetes OOM and CPU Throttling
https://sysdig.com/blog/troubleshoot-kubernetes-oom/ | January 25, 2023

Introduction

When working with Kubernetes, Out of Memory (OOM) errors and CPU throttling are the main headaches of resource handling in cloud applications. Why is that?

CPU and Memory requirements in cloud applications are ever more important, since they are tied directly to your cloud costs.

With limits and requests, you can configure how your pods should allocate memory and CPU resources in order to prevent resource starvation and adjust cloud costs.

In case a Node doesn’t have enough resources, Pods might get evicted via preemption or node-pressure.
When a process runs Out Of Memory (OOM), it’s killed since it doesn’t have the required resources.
In case CPU consumption is higher than the actual limits, the process will start to be throttled.

But how can you actively monitor how close your Kubernetes Pods are to OOM and CPU throttling?

Kubernetes OOM

Every container in a Pod needs memory to run.

Kubernetes limits are set per container in either a Pod definition or a Deployment definition.

All modern Unix systems have a way to kill processes in case they need to reclaim memory. This will be marked as Error 137 or OOMKilled.

   State:          Running
      Started:      Thu, 10 Oct 2019 11:14:13 +0200
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 10 Oct 2019 11:04:03 +0200
      Finished:     Thu, 10 Oct 2019 11:14:11 +0200

This Exit Code 137 means that the process used more memory than the allowed amount and had to be terminated.

This is a feature present in Linux, where the kernel sets an oom_score value for each process running in the system. Additionally, it allows setting a value called oom_score_adj, which Kubernetes uses to implement its Quality of Service classes. The kernel also features an OOM Killer, which reviews processes and terminates those that are using more memory than they should.

Note that in Kubernetes, a process can reach any of these limits:

  • A Kubernetes Limit set on the container.
  • A Kubernetes ResourceQuota set on the namespace.
  • The node’s actual Memory size.

[Figure: Kubernetes OOM graph]

Memory overcommitment

Limits can be higher than requests, so the sum of all limits can be higher than node capacity. This is called overcommit and it is very common. In practice, if all containers use more memory than requested, it can exhaust the memory in the node. This usually causes the death of some pods in order to free some memory.

Monitoring Kubernetes OOM

When using node exporter in Prometheus, there’s one metric called node_vmstat_oom_kill. It’s important to track when an OOM kill happens, but you might want to get ahead and have visibility of such an event before it happens.

Instead, you can check how close a process is to its Kubernetes memory limit:

(sum by (namespace,pod,container)
(container_memory_working_set_bytes{container!=""}) / sum by
(namespace,pod,container)
(kube_pod_container_resource_limits{resource="memory"})) > 0.8

Kubernetes CPU throttling

CPU Throttling is a behavior where processes are slowed when they are about to reach some resource limits.

Similar to the memory case, these limits could be:

  • A Kubernetes Limit set on the container.
  • A Kubernetes ResourceQuota set on the namespace.
  • The node's actual CPU capacity.

Think of the following analogy. We have a highway with some traffic where:

  • CPU is the road.
  • Vehicles represent the processes, each one with a different size.
  • Multiple lanes represent having several cores.
  • A request would be an exclusive road, like a bike lane.

Throttling here is represented as a traffic jam: eventually, all processes will run, but everything will be slower.

CPU process in Kubernetes

CPU is handled in Kubernetes with shares. Each CPU core is divided into 1024 shares, then divided between all processes running by using the cgroups (control groups) feature of the Linux kernel.

[Figure: Kubernetes shares system for CPU]

If the CPU can handle all current processes, then no action is needed. If processes are using more than 100% of the CPU, then shares come into play. Like any Linux system, Kubernetes relies on the kernel's CFS (Completely Fair Scheduler), so the processes with more shares will get more CPU time.

Unlike memory, Kubernetes won’t kill Pods because of throttling.

[Figure: Kubernetes throttling graph]

You can check CPU stats in /sys/fs/cgroup/cpu/cpu.stat
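
From inside a container (or its cgroup directory on the node, for cgroup v1), that file reports the number of enforcement periods, how many of them were throttled, and the total throttled time:

# nr_periods: CFS periods elapsed; nr_throttled: periods where the cgroup was throttled;
# throttled_time: total throttled time in nanoseconds
cat /sys/fs/cgroup/cpu/cpu.stat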

CPU overcommitment

As we saw in the limits and requests article, it's important to set limits or requests when we want to restrict the resource consumption of our processes. Nevertheless, beware of setting total requests larger than the actual CPU size: since requests are a guaranteed amount, Pods won't be scheduled if the node can't satisfy them.

Monitoring Kubernetes CPU throttling

You can check how close a process is to the Kubernetes limits:

(sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total
{container!=""}[5m])) / sum by (namespace,pod,container)
(kube_pod_container_resource_limits{resource="cpu"})) > 0.8

In case we want to track the amount of throttling happening in our cluster, cAdvisor provides container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. With these two, you can easily calculate the percentage of throttled CPU periods.
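
For example, a sketch of the throttled percentage per container over the last five minutes could be:

# Percentage of CPU periods in which each container was throttled
100 * sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m]))
  / sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total{container!=""}[5m]))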

Best practices

Beware of limits and requests

Limits are a way to set up a maximum cap on resources in your node, but these need to be treated carefully, as you might end up with a process throttled or killed.

Prepare against eviction

By setting very low requests, you might think this will grant a minimum of either CPU or memory to your process. But the kubelet will evict those Pods whose usage is higher than their requests first, so you're marking them as the first to be killed!

In case you need to protect specific Pods against preemption (when kube-scheduler needs to allocate a new Pod), assign Priority Classes to your most important processes.

Throttling is a silent enemy

By setting unrealistic limits or overcommitting, you might not be aware that your processes are being throttled, and performance impacted. Proactively monitor your CPU usage and know your actual limits in both containers and namespaces.

Wrapping up

Here's a cheat sheet on Kubernetes resource management for CPU and memory. It summarizes the current article plus the other articles that are part of the same series:

[Figure: Kubernetes resources cheat sheet]

Reduce your Kubernetes costs with Sysdig Monitor

Sysdig Monitor can help you reach the next step in the Monitoring Journey.

With Cost Advisor, you can reduce Kubernetes resource waste by up to 40%.

And with our out-of-the-box Kubernetes Dashboards, you can discover underutilized resources in a couple of clicks.


Try it free for 30 days!

Kubernetes Services: ClusterIP, Nodeport and LoadBalancer
https://sysdig.com/blog/kubernetes-services-clusterip-nodeport-loadbalancer/ | December 8, 2022

Pods are ephemeral. And they are meant to be. They can be seamlessly destroyed and replaced if using a Deployment. Or they can be scaled at some point when using Horizontal Pod Autoscaling (HPA).

This means we can’t rely on the Pod IP address to connect with applications running in our containers internally or externally, as the Pod might not be there in the future.

You may have noticed that Kubernetes Pods get assigned an IP address:

stable-kube-state-metrics-758c964b95-6fnbl               1/1     Running   0          3d20h   100.96.2.5      ip-172-20-54-111.ec2.internal   <none>           <none>
stable-prometheus-node-exporter-4brgv                    1/1     Running   0          3d20h   172.20.60.26    ip-172-20-60-26.ec2.internal

This is a unique and internal IP for this particular Pod, but there’s no guarantee that this IP will exist in the future, due to the Pod’s nature.

Services

A Kubernetes Service is a mechanism to expose applications both internally and externally.

Every Service creates a stable, long-lived IP address that can be used as a connector.

Additionally, it will open a port that will be linked with a targetPort. Some services can create ports in every Node, and even external IPs to create connectors outside the cluster.

With the combination of both IP and Port, we can create a way to uniquely identify an application.

Creating a service

Every Service has a selector filter that links it with a set of Pods in your cluster.

spec:
  selector:
    app.kubernetes.io/name: myapp

So all Pods with the label app.kubernetes.io/name: myapp will be linked to this Service.

There are three port attributes involved in a Service configuration:

  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30036
    protocol: TCP
  • port: the new service port that will be created to connect to the application.
  • targetPort: application port that we want to target with the services requests.
  • nodePort: this is a port in the range of 30000-32767 that will be open in each node. If left empty, Kubernetes selects a free one in that range.
  • protocol: TCP is the default one, but you can use others like SCTP or UDP.

You can review services created with:

kubectl get services
kubectl get svc

Types of services

Kubernetes allows the creation of these types of services:

  • ClusterIP (default)
  • Nodeport
  • LoadBalancer
  • ExternalName

Let’s see each of them in detail.

ClusterIP

This is the default type for service in Kubernetes.

As indicated by its name, this is just an address that can be used inside the cluster.

Take, for example, the initial helm installation for Prometheus Stack. It installs Pods, Deployments, and Services for the Prometheus and Grafana ecosystem.

NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   3m27s
kubernetes                                ClusterIP   100.64.0.1      <none>        443/TCP                      18h
prometheus-operated                       ClusterIP   None            <none>        9090/TCP                     3m27s
stable-grafana                            ClusterIP   100.66.46.251   <none>        80/TCP                       3m29s
stable-kube-prometheus-sta-alertmanager   ClusterIP   100.64.23.19    <none>        9093/TCP                     3m29s
stable-kube-prometheus-sta-operator       ClusterIP   100.69.14.239   <none>        443/TCP                      3m29s
stable-kube-prometheus-sta-prometheus     ClusterIP   100.70.168.92   <none>        9090/TCP                     3m29s
stable-kube-state-metrics                 ClusterIP   100.70.80.72    <none>        8080/TCP                     3m29s
stable-prometheus-node-exporter           ClusterIP   100.68.71.253   <none>        9100/TCP                     3m29s

[Figure: Kubernetes Services - ClusterIP]

This creates a connection using an internal Cluster IP address and a Port.
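
As an illustration, a minimal ClusterIP Service manifest (with the same hypothetical names used elsewhere in this article) would look like this:

apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: ClusterIP      # the default; can be omitted
  selector:
    app: myapp
  ports:
    - port: 80         # Service port inside the cluster
      targetPort: 8080 # container port receiving the traffic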

But, what if we need to use this connector from outside the Cluster? This IP is internal and won’t work outside.

This is where the rest of the services come in…

NodePort

A NodePort differs from the ClusterIP in the sense that it exposes a port in each Node.

When a NodePort is created, kube-proxy exposes a port in the range 30000-32767:

apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  selector:
  	app: myapp
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30036
    protocol: TCP

[Figure: Kubernetes Services - NodePort]

NodePort is the preferred element for non-HTTP communication.

The problem with using a NodePort is that you still need to access each of the Nodes separately.

So, let’s have a look at the next item on the list…

LoadBalancer

A LoadBalancer is a Kubernetes service that:

  • Creates a service like ClusterIP
  • Opens a port in every node like NodePort
  • Uses a LoadBalancer implementation from your cloud provider (your cloud provider needs to support this for LoadBalancers to work).
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  ports:
  - name: web
    port: 80
  selector:
    app: web
  type: LoadBalancer
my-service                                LoadBalancer   100.71.69.103   <pending>     80:32147/TCP                 12s
my-service                                LoadBalancer   100.71.69.103   a16038a91350f45bebb49af853ab6bd3-2079646983.us-east-1.elb.amazonaws.com   80:32147/TCP                 16m

In this case, Amazon Web Service (AWS) was being used, so an external IP from AWS was created.

Then, if you run kubectl describe service my-service, you will find that several new attributes have been added:

Name:                     my-service
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app.kubernetes.io/name=pegasus
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       100.71.69.103
IPs:                      100.71.69.103
LoadBalancer Ingress:     a16038a91350f45bebb49af853ab6bd3-2079646983.us-east-1.elb.amazonaws.com
Port:                     <unset>  80/TCP
TargetPort:               9376/TCP
NodePort:                 <unset>  32147/TCP
Endpoints:                <none>
Session Affinity:         None
External Traffic Policy:  Cluster

The main difference from NodePort is that a LoadBalancer can be accessed through a single external endpoint and will try to distribute requests equally across Nodes.

[Figure: Kubernetes Service - LoadBalancer]

ExternalName

The ExternalName service was introduced due to the need to connect to an element outside of the Kubernetes cluster. Think of it not as a way to connect to an item within your cluster, but as a connector to an element that lives outside of it.

This serves two purposes:

  • It creates a single endpoint for all communications to that element.
  • In case that external service needs to be replaced, it’s easier to switch by just modifying the ExternalName, instead of all connections.
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: ExternalName
  externalName: db.myexternalserver.com

Conclusion

Services are a key aspect of Kubernetes, as they provide a way to expose internal endpoints inside and outside of the cluster.

A ClusterIP Service just creates a connector for in-cluster communication. Use it only when you have an application that needs to be reached by other applications inside the cluster.

NodePort and LoadBalancer are used for external access to your applications. It's preferred to use LoadBalancer to equally distribute requests in multi-Pod implementations, but note that your cloud provider needs to implement load balancing for this to be available.

Apart from these, Kubernetes provides Ingresses, a way to create an HTTP connection with load balancing for external use.


Monitor Kubernetes and troubleshoot issues up to 10x faster

Sysdig can help you monitor and troubleshoot your Kubernetes cluster with the out-of-the-box dashboards included in Sysdig Monitor. Advisor, a tool integrated in Sysdig Monitor accelerates troubleshooting of your Kubernetes clusters and its workloads by up to 10x.


Sign up for a 30-day trial account and try it yourself!

Understanding Kubernetes Limits and Requests
https://sysdig.com/blog/kubernetes-limits-requests/ | November 18, 2022

When working with containers in Kubernetes, it's important to know what resources are involved and how much of them is needed. Some processes will require more CPU or memory than others. Some are critical and should never be starved.

Knowing that, we should configure our containers and Pods properly in order to get the best of both.

In this article, we will see what limits and requests are, how they work, and some best practices for setting them.

Introduction to Kubernetes Limits and Requests

Limits and Requests are important settings when working with Kubernetes. This article will focus on the two most important resources: CPU and memory.

Kubernetes defines Limits as the maximum amount of a resource to be used by a container. This means that the container can never consume more than the memory amount or CPU amount indicated. 

Requests, on the other hand, are the minimum guaranteed amount of a resource that is reserved for a container.

Hands-on example

Let’s have a look at this deployment, where we are setting up limits and requests for two different containers on both CPU and memory.

kind: Deployment
apiVersion: apps/v1
…
template:
  spec:
    containers:
      - name: redis
        image: redis:5.0.3-alpine
        resources:
          limits:
            memory: 600Mi
            cpu: 1
          requests:
            memory: 300Mi
            cpu: 500m
      - name: busybox
        image: busybox:1.28
        resources:
          limits:
            memory: 200Mi
            cpu: 300m
          requests:
            memory: 100Mi
            cpu: 100m

Let’s say we are running a cluster with, for example, 4 cores and 16GB RAM nodes. We can extract a lot of information:

[Figure: Kubernetes Limits and Requests practical example]

  1. Pod effective request is 400 MiB of memory and 600 millicores of CPU. You need a node with enough free allocatable space to schedule the pod.
  2. CPU shares for the redis container will be 512, and 102 for the busybox container. Kubernetes always assigns 1024 shares to every core, so redis: 1024 * 0.5 cores ≅ 512 and busybox: 1024 * 0.1 cores ≅ 102.
  3. Redis container will be OOM killed if it tries to allocate more than 600MB of RAM, most likely making the pod fail.
  4. Redis will suffer CPU throttling if it tries to use more than 100ms of CPU in every 100ms period (since we have 4 cores, the available time would be 400ms every 100ms), causing performance degradation.
  5. Busybox container will be OOM killed if it tries to allocate more than 200MB of RAM, resulting in a failed pod.
  6. Busybox will suffer CPU throttle if it tries to use more than 30ms of CPU every 100ms, causing performance degradation.

Kubernetes Requests

Kubernetes defines requests as a guaranteed minimum amount of a resource to be used by a container.

Basically, it will set the minimum amount of the resource for the container to consume.

When a Pod is scheduled, kube-scheduler will check the Kubernetes requests in order to allocate it to a particular Node that can satisfy at least that amount for all containers in the Pod. If the requested amount is higher than the available resource, the Pod will not be scheduled and remain in Pending status.

For more information about Pending status, check Understanding Kubernetes Pod pending problems.

In this example, in the container definition we set a request of 0.1 CPU cores (100m) and 4Mi of memory:

resources:
   requests:
        cpu: 0.1
        memory: 4Mi

Requests are used:

  • When allocating Pods to a Node, so the indicated requests by the containers in the Pod are satisfied.
  • At runtime, the indicated amount of requests will be guaranteed as a minimum for the containers in that Pod.

[Figure: How to set good CPU requests]

Kubernetes Limits

Kubernetes defines limits as a maximum amount of a resource to be used by a container.

This means that the container can never consume more than the memory amount or CPU amount indicated.

    resources:
      limits:
        cpu: 0.5
        memory: 100Mi

Limits are used:

  • When allocating Pods to a Node. If no requests are set, by default, Kubernetes will assign requests = limits.
  • At runtime, Kubernetes will check that the containers in the Pod are not consuming a higher amount of resources than indicated in the limit.

[Figure: Setting good limits in Kubernetes]

CPU particularities

CPU is a compressible resource, meaning that it can be stretched in order to satisfy all the demand. In case that the processes request too much CPU, some of them will be throttled.

CPU represents computing processing time, measured in cores. 

  • You can use millicores (m) to represent smaller amounts than a core (e.g., 500m would be half a core)
  • The minimum amount is 1m
  • A Node might have more than one core available, so requesting CPU > 1 is possible

[Figure: Kubernetes requests for CPU]

Memory particularities

Memory is a non-compressible resource, meaning that it can’t be stretched in the same manner as CPU. If a process doesn’t get enough memory to work, the process is killed.

Memory is measured in Kubernetes in bytes.

  • You can use, E, P, T, G, M, k to represent Exabyte, Petabyte, Terabyte, Gigabyte, Megabyte and kilobyte, although only the last four are commonly used. (e.g., 500M, 4G)
  • Warning: don’t use lowercase m for memory (this represents Millibytes, which is ridiculously low)
  • You can define Mebibytes using Mi, as well as the rest as Ei, Pi, Ti (e.g., 500Mi)

A Mebibyte (and its analogues Kibibyte, Gibibyte, ...) is 2 to the power of 20 bytes. It was created to avoid confusion with the Kilo and Mega definitions of the metric system. You should be using this notation, as it describes binary sizes unambiguously, while Kilo and Mega are multiples of 1000.

[Figure: Kubernetes limits for memory]

Best practices

In very few cases should you be using limits to control your resources usage in Kubernetes. This is because if you want to avoid starvation (ensure that every important process gets its share), you should be using requests in the first place. 

By setting up limits, you are only preventing a process from retrieving additional resources in exceptional cases, causing an OOM kill in the event of memory, and Throttling in the event of CPU (process will need to wait until the CPU can be used again).

For more information, check the article about OOM and Throttling.

If you’re setting a request value equal to the limit in all containers of a Pod, that Pod will get the Guaranteed Quality of Service. 
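
As an illustration, a container spec like the following sketch (hypothetical values) results in the Guaranteed QoS class, because requests equal limits for both CPU and memory:

    resources:
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: 500m        # equal to the request
        memory: 256Mi    # equal to the request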

Note as well that Pods with a resource usage higher than their requests are more likely to be evicted, so setting very low requests causes more harm than good. For more information, check the article about Pod eviction and Quality of Service.

Namespace ResourceQuota

Thanks to namespaces, we can isolate Kubernetes resources into different groups, also called tenants.

With ResourceQuotas, you can set a memory or CPU limit for the entire namespace, ensuring that the entities in it can't consume more than that amount in total.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
spec:
  hard:
    requests.cpu: 2
    requests.memory: 1Gi
    limits.cpu: 3
    limits.memory: 2Gi

  • requests.cpu: the maximum amount of CPU for the sum of all requests in this namespace
  • requests.memory: the maximum amount of Memory for the sum of all requests in this namespace
  • limits.cpu: the maximum amount of CPU for the sum of all limits in this namespace
  • limits.memory: the maximum amount of memory for the sum of all limits in this namespace

Then, apply it to your namespace:

kubectl apply -f resourcequota.yaml --namespace=mynamespace

You can list the current ResourceQuota for a namespace with:

kubectl get resourcequota -n mynamespace

Note that if you set up ResourceQuota for a given resource in a namespace, you then need to specify limits or requests accordingly for every Pod in that namespace. If not, Kubernetes will return a “failed quota” error:

Error from server (Forbidden): error when creating "mypod.yaml": pods "mypod" is forbidden: failed quota: mem-cpu-demo: must specify limits.cpu,limits.memory,requests.cpu,requests.memory

In case you try to add a new Pod with container limits or requests that exceed the current ResourceQuota, Kubernetes will return an “exceeded quota” error:

Error from server (Forbidden): error when creating "mypod.yaml": pods "mypod" is forbidden: exceeded quota: mem-cpu-demo, requested: limits.memory=2Gi,requests.memory=2Gi, used: limits.memory=1Gi,requests.memory=1Gi, limited: limits.memory=2Gi,requests.memory=1Gi

Namespace LimitRange

ResourceQuotas are useful if we want to restrict the total amount of a resource allocatable for a namespace. But what happens if we want to give default values to the elements inside?

LimitRanges are a Kubernetes policy that restricts the resource settings for each entity in a namespace.

apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
  - default:
      cpu: 500m
    defaultRequest:
      cpu: 500m
    min:
      cpu: 100m
    max:
      cpu: "1"
    type: Container

  • default: Containers created will have this limit if none is specified.
  • defaultRequest: Containers created will have this request if none is specified.
  • min: Containers created can’t have limits or requests smaller than this.
  • max: Containers created can’t have limits or requests bigger than this.

Later, if you create a new Pod with no requests or limits set, LimitRange will automatically set these values to all its containers:

    Limits:
      cpu:  500m
    Requests:
      cpu:  500m

Now, imagine that you add a new Pod with a CPU limit of 1200m. You will receive the following error:

Error from server (Forbidden): error when creating "pods/mypod.yaml": pods "mypod" is forbidden: maximum cpu usage per Container is 1, but limit is 1200m

Note that, even with no LimitRange set, the kube-scheduler effectively treats containers with no CPU request as if they requested 100m CPU when deciding where to place Pods.

Conclusion

Choosing the optimal requests and limits for our Kubernetes cluster is key to keeping both resource consumption and costs under control.

Oversizing, or dedicating too many resources to our Pods, may make costs skyrocket.

Undersizing, or dedicating too little CPU or memory, will lead to applications not performing correctly, or even to Pods being evicted.

As mentioned, Kubernetes limits shouldn’t be used except in very specific situations, as they may cause more harm than good: a container can be OOM killed when it hits its memory limit, or throttled when it hits its CPU limit.

For requests, use them when you need to ensure a process gets a guaranteed share of a resource.

Kubernetes resources cheatsheet

Reduce your Kubernetes costs with Sysdig Monitor

Sysdig Monitor can help you reach the next step in the Monitoring Journey.

With Cost Advisor, you can reduce Kubernetes resource waste by up to 40%.

And with our out-of-the-box Kubernetes Dashboards, you can discover underutilized resources in a couple of clicks.

Custom metrics monitoring

Try it free for 30 days!

The post Understanding Kubernetes Limits and Requests appeared first on Sysdig.

]]>
The four Golden Signals of Monitoring https://sysdig.com/blog/golden-signals-kubernetes/ Thu, 27 Oct 2022 10:14:00 +0000 https://sysdig.com/?p=68448 Golden Signals are a reduced set of metrics that offer a wide view of a service from a user or...

The post The four Golden Signals of Monitoring appeared first on Sysdig.

]]>
Golden Signals are a reduced set of metrics that offer a wide view of a service from a user or consumer perspective: Latency, Traffic, Errors and Saturation. By focusing on these, you can be quicker at detecting potential problems that might be directly affecting the behavior of the application.

Google introduced the term “Golden Signals” to refer to the essential metrics that you need to measure in your applications. They are the following:

  • Errors – rate of requests that fail.
  • Saturation – consumption of your system resources.
  • Traffic – amount of use of your service per time unit.
  • Latency – the time it takes to serve a request.

This is just a set of essential signals to start monitoring in your system. In other words, if you’re wondering which signals to monitor, you will need to look at these four first.

Errors

The Errors golden signal measures the rate of requests that fail.

Note that measuring the raw number of errors might not be the best course of action. If your application has a sudden peak of requests, the number of failed requests will logically increase too.

That’s why monitoring systems usually focus on the error rate, calculated as the percentage of failed calls out of the total.

If you’re managing a web application, typically you will discriminate between those calls returning HTTP status in the 400-499 range (client errors) and 500-599 (server errors).
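
For example, a Prometheus recording rule like the following sketch precomputes that ratio, assuming your services expose an http_requests_total counter with a code label (a common, but not universal, instrumentation convention):

groups:
- name: golden-signals-errors
  rules:
  - record: job:http_requests_error_ratio:rate5m
    # Fraction of requests returning a 5xx status over the last 5 minutes, per job.
    expr: |
      sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))
      /
      sum by (job) (rate(http_requests_total[5m]))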

Measuring errors in Kubernetes

One thermometer for the errors happening in Kubernetes is the kubelet. You can use several of the kubelet metrics exposed to Prometheus to measure the number of errors.

The most important one is kubelet_runtime_operations_errors_total, which indicates low-level issues in the node, like problems with the container runtime.

If you want to visualize the error rate per operation type, you can divide it by kubelet_runtime_operations_total.

Errors example

Here’s the Kubelet Prometheus metric for error rate in a Kubernetes cluster:

sum(rate(kubelet_runtime_operations_errors_total{cluster="",
job="kubelet", metrics_path="/metrics"}[$__rate_interval])) 
by (instance, operation_type)
Errors diagram

Saturation

Saturation measures the consumption of your system resources, usually as a percentage of the maximum capacity. Examples include:

  • CPU usage
  • Disk space
  • Memory usage
  • Network bandwidth

In the end, cloud applications run on machines, which have a limited amount of these resources.

In order to correctly measure, you should be aware of the following:

  • What are the consequences if the resource is depleted? It could be that your entire system becomes unusable because that resource has run out. Or maybe further requests are throttled until the system is less saturated.
  • Saturation is not only about resources that are about to be depleted. It’s also about over-provisioning, or allocating more resources than needed, which is crucial for cost savings.

Measuring saturation in Kubernetes

Since saturation depends on the resource being observed, you can use different metrics for Kubernetes entities:

  • node_cpu_seconds_total to measure machine CPU utilization.
  • container_memory_usage_bytes to measure the memory utilization at container level (paired with container_memory_max_usage_bytes).
  • The number of Pods that a Node can hold is also a limited Kubernetes resource.

Saturation example

Here’s a PromQL example of a Saturation signal, measuring CPU usage percent in a Kubernetes node.

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Saturation diagram

Traffic

Traffic measures the amount of use of your service per time unit.

In essence, this will represent the usage of your current service. This is important not only for business reasons, but also to detect anomalies.

Is the number of requests too high? This could be due to a peak of users, or to a misconfiguration causing retries.

Is the number of requests too low? That may indicate that one of your systems is failing.

Still, traffic signals should always be measured with a time reference. As an example, this blog receives more visits from Tuesday to Thursday.

Depending on your application, you could be measuring traffic by:

  • Requests per minute for a web application
  • Queries per minute for a database application
  • Endpoint requests per minute for an API

Traffic example

Here’s a Google Analytics chart displaying traffic distributed by hour:

Traffic diagram

Latency

Latency is defined as the time it takes to serve a request.

Average latency

When working with latencies, your first impulse may be to measure average latency, but depending on your system that might not be the best idea. There may be very fast or very slow requests distorting the results.

Instead, consider using percentiles like p99, p95, or p50 (also known as the median) to measure how long the fastest 99%, 95%, or 50% of requests took to complete.

Failed vs. successful

When measuring latency, it’s also important to discriminate between failed and successful requests, as failed ones may take noticeably less time than successful ones.

Apdex Score

As described above, latency information may not be informative enough:

  • Some users might perceive the application as slower, depending on the action they are performing.
  • Some users might perceive the application as slower, compared to the typical latencies of the industry.

This is where the Apdex (Application Performance Index) comes in. It’s defined as:

Apdex_t = (Satisfied + Tolerant / 2) / Total number of requests

Where t is the target latency that we consider reasonable.

  • Satisfied represents the number of requests served under the target latency.
  • Tolerant represents the number of requests that exceed the target latency but complete below four times the target.
  • Frustrated represents the number of requests above the tolerant latency.

The output of the formula is an index from 0 to 1, indicating how performant our system is in terms of latency.
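
As an illustration, here's a sketch of a Prometheus recording rule that computes an Apdex score with a target latency t of 0.3 seconds, assuming your application exposes an http_request_duration_seconds histogram with bucket boundaries at 0.3 and 1.2 seconds (4t):

groups:
- name: apdex
  rules:
  - record: job:apdex_score:rate5m
    # Because histogram buckets are cumulative, (bucket{le="0.3"} + bucket{le="1.2"}) / 2
    # equals satisfied + tolerant/2, so dividing by the total count gives the Apdex score.
    expr: |
      (
        sum by (job) (rate(http_request_duration_seconds_bucket{le="0.3"}[5m]))
        +
        sum by (job) (rate(http_request_duration_seconds_bucket{le="1.2"}[5m]))
      ) / 2
      /
      sum by (job) (rate(http_request_duration_seconds_count[5m]))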

Measuring latency in Kubernetes

In order to measure the latency in your Kubernetes cluster, you can use metrics like http_request_duration_seconds_sum.

You can also measure the latency for the api-server by using Prometheus metrics like apiserver_request_duration_seconds.

Latency example

Here’s an example of a Latency PromQL query for the 95% best performing HTTP requests in Prometheus:

histogram_quantile(0.95, sum(rate(prometheus_http_request_duration_seconds_bucket[5m]))
by (le))
Latency diagram

RED Method

The RED Method was created by Tom Wilkie, from Weaveworks. It is heavily inspired by the Golden Signals and it’s focused on microservices architectures.

RED stands for:

  • Rate
  • Error
  • Duration

Rate measures the number of requests per second (equivalent to Traffic in the Golden Signals).

Error measures the number of failed requests (similar to the one in Golden Signals).

Duration measures the amount of time to process a request (similar to Latency in Golden Signals).

USE Method

The USE Method was created by Brendan Gregg and it’s used to measure infrastructure.

USE stands for:

  • Utilization
  • Saturation
  • Errors

That means for every resource in your system (CPU, disk, etc.), you need to check the three elements above.

Utilization is defined as the percentage of usage for that resource.

Saturation is defined as the amount of extra work the resource can’t service and has to queue.

Errors are defined as the number of errors happening in the system.

While it may not be intuitive, Saturation in the Golden Signals does not correspond to Saturation in USE, but rather to Utilization.

Golden Signals vs Red vs Use

A practical example of Golden signals in Kubernetes

As an example to illustrate the use of Golden Signals, here’s a simple Go application with Prometheus instrumentation. The application applies a random delay between 0 and 12 seconds in order to produce usable latency information. Traffic is generated with curl, using several infinite loops.

A histogram was included to collect metrics related to latency and requests. These metrics will help us obtain the first three Golden Signals: latency, request rate, and error rate. To obtain saturation directly with Prometheus and node-exporter, use the percentage of CPU used in the nodes.



File: main.go
-------------
package main
import (
    "fmt"
    "log"
    "math/rand"
    "net/http"
    "time"
    "github.com/gorilla/mux"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)
func main() {
    //Prometheus: Histogram to collect required metrics
    histogram := prometheus.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "greeting_seconds",
        Help:    "Time take to greet someone",
        Buckets: []float64{1, 2, 5, 6, 10}, // Bucket boundaries in seconds; responses can take up to 12s, so the slowest fall into the +Inf bucket
    }, []string{"code"}) //This will be partitioned by the HTTP code.
    router := mux.NewRouter()
    router.Handle("/sayhello/{name}", Sayhello(histogram))
    router.Handle("/metrics", promhttp.Handler()) //Metrics endpoint for scrapping
    router.Handle("/{anything}", Sayhello(histogram))
    router.Handle("/", Sayhello(histogram))
    //Registering the defined metric with Prometheus
    prometheus.Register(histogram)
    log.Fatal(http.ListenAndServe(":8080", router))
}
func Sayhello(histogram *prometheus.HistogramVec) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        //Monitoring how long it takes to respond
        start := time.Now()
        defer r.Body.Close()
        code := 500
        defer func() {
            httpDuration := time.Since(start)
            histogram.WithLabelValues(fmt.Sprintf("%d", code)).Observe(httpDuration.Seconds())
        }()
        if r.Method == "GET" {
            vars := mux.Vars(r)
            code = http.StatusOK
            if _, ok := vars["anything"]; ok {
                //Sleep random seconds
                rand.Seed(time.Now().UnixNano())
                n := rand.Intn(2) // n will be 0 or 1
                time.Sleep(time.Duration(n) * time.Second)
                code = http.StatusNotFound
                w.WriteHeader(code)
            }
            //Sleep random seconds
            rand.Seed(time.Now().UnixNano())
            n := rand.Intn(12) // n will be between 0 and 11
            time.Sleep(time.Duration(n) * time.Second)
            name := vars["name"]
            greet := fmt.Sprintf("Hello %s \n", name)
            w.Write([]byte(greet))
        } else {
            code = http.StatusBadRequest
            w.WriteHeader(code)
        }
    }
}

The application was deployed in a Kubernetes cluster with Prometheus and Grafana, and we generated a dashboard with the Golden Signals. These are the PromQL queries used to obtain the data for the dashboard:

Latency:

sum(greeting_seconds_sum)/sum(greeting_seconds_count)  //Average
histogram_quantile(0.95, sum(rate(greeting_seconds_bucket[5m])) by (le)) //Percentile p95

Request rate:

sum(rate(greeting_seconds_count{}[2m]))  //Including errors
rate(greeting_seconds_count{code="200"}[2m])  //Only 200 OK requests

Errors per second:

sum(rate(greeting_seconds_count{code!="200"}[2m]))

Saturation:

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Conclusion

Golden Signals, RED, and USE are guidelines on what you should be focusing on when looking at your systems, but they are just the bare minimum of what to measure.

Understand the errors in your system. They will be a thermometer for all the other metrics, as they will point to any unusual behavior. And remember that you need to mark requests as erroneous correctly, counting only those that are genuinely exceptional failures. Otherwise, your system will be prone to false positives or false negatives.

Measure latency of your requests. Try to understand your bottlenecks and what the negative experiences are when latency is higher than expected.

Visualize saturation and understand the resources involved in your solution. What are the consequences if a resource gets depleted?

Measure traffic to understand your usage curves. You will be able to find the best time to take down your system for an update, or you could be alerted when there’s an unexpected amount of users.

Once metrics are in place, it’s important to set up alerts, which will notify you in case any of these metrics reach a certain threshold.
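
For example, here's a sketch of a Prometheus alerting rule built on the error-rate query above for the demo app (the 5% threshold and 10-minute duration are illustrative choices, not values from the original setup):

groups:
- name: golden-signals-alerts
  rules:
  - alert: HighErrorRate
    # Fire when more than 5% of requests have failed over the last 5 minutes.
    expr: |
      sum(rate(greeting_seconds_count{code!="200"}[5m]))
      /
      sum(rate(greeting_seconds_count[5m])) > 0.05
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Error rate above 5% for the greeting service"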


Monitor Kubernetes and troubleshoot issues up to 10x faster

Sysdig can help you monitor and troubleshoot your Kubernetes cluster with the out-of-the-box dashboards included in Sysdig Monitor. Advisor, a tool integrated in Sysdig Monitor, accelerates troubleshooting of your Kubernetes clusters and their workloads by up to 10x.

Custom metrics monitoring

Sign up for a 30-day trial account and try it yourself!

The post The four Golden Signals of Monitoring appeared first on Sysdig.

]]>
Kubernetes ErrImagePull and ImagePullBackOff in detail https://sysdig.com/blog/kubernetes-errimagepull-imagepullbackoff/ Wed, 05 Oct 2022 13:20:46 +0000 https://sysdig.com/?p=54936 Pod statuses like ImagePullBackOff or ErrImagePull are common when working with containers. ErrImagePull is an error happening when the image...

The post Kubernetes ErrImagePull and ImagePullBackOff in detail appeared first on Sysdig.

]]>
Pod statuses like ImagePullBackOff or ErrImagePull are common when working with containers.

ErrImagePull is an error happening when the image specified for a container can’t be retrieved or pulled.

ImagePullBackOff is the waiting grace period applied between pull retries while the problem is fixed.

In this article, we will take a look at what these statuses mean, what causes them, and how to debug and monitor them.


Container Images

One of the greatest strengths of containerization is the ability to run any particular image in seconds. A container is a group of processes executing in isolation from the underlying system. A container image contains all the resources needed to run those processes: the binaries, libraries, and any necessary configuration.

A container registry is a repository for container images, where there are two basic actions:

  • Push: upload an image so it’s available in the repo
  • Pull: download an image to use it in a container


The docker CLI will be used in the examples for this article, but you can use any tool that implements the Open Container Initiative Distribution specs for all the container registry interactions.

Pulling images

Images can be referenced by name. In addition, a particular version of an image can be labeled with a specific tag, and it can also be identified by its digest, a hash of its content.

The tag latest refers, by convention, to the most recent version of a given image.

Pull images by name

If you only provide the image name, the image tagged latest will be pulled:

docker pull nginx
kubectl run mypod --image=nginx

Pull images by name + tag

If you don’t want to pull the latest image, you can provide a specific release tag:

docker pull nginx:1.23.1-alpine
kubectl run mypod --image=nginx:1.23.1-alpine

For more information, you can check this article about tag mutability.

Pull images by digest

A digest is a sha256 hash of the actual image content. You can pull an image by its digest to verify the authenticity and integrity of what you downloaded.

docker pull nginx@sha256:d164f755e525e8baee113987bdc70298da4c6f48fdc0bbd395817edf17cf7c2b
kubectl run mypod --image=nginx@sha256:d164f755e525e8baee113987bdc70298da4c6f48fdc0bbd395817edf17cf7c2b

Image Pull Policy

Kubernetes features the ability to set an Image Pull Policy (imagePullPolicy field) for each container. Based on this, the way the kubelet retrieves the container image will differ.

There are three different values for imagePullPolicy:

  • Always
  • IfNotPresent
  • Never

Always

With imagePullPolicy set to Always, the kubelet will check the registry every time it starts a container with this image.

IfNotPresent

With imagePullPolicy set to IfNotPresent, the kubelet will only pull the image from the registry if it doesn’t already exist locally on the node.

Never

With imagePullPolicy set to Never, the kubelet will never try to pull the image from the registry. If the image is cached locally (pre-pulled), it will be used to start the container.

If the image is not present locally, Pod creation will fail with an ErrImageNeverPull error.

ImagePullPolicy description: always, never and ifnotpresent

Note that you can force the Always policy for every container in the cluster by enabling the AlwaysPullImages admission controller.

Default Image Pull Policy

  • If you omit the imagePullPolicy and the tag is latest, imagePullPolicy is set to Always.
  • If you omit the imagePullPolicy and the tag for the image, imagePullPolicy is set to Always.
  • If you omit the imagePullPolicy and the tag is set to a value different than latest, imagePullPolicy is set to IfNotPresent.
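
To make the chosen behavior explicit instead of relying on these defaults, you can set the field directly in the container spec. A minimal sketch (Pod and container names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: pullpolicy-demo      # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.23.1-alpine
    # Pull from the registry only if the image is not already cached on the node.
    imagePullPolicy: IfNotPresent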

ErrImagePull

When Kubernetes tries to pull an image for a container in a Pod, things might go wrong. The status ErrImagePull is displayed when kubelet tried to start a container in the Pod, but something was wrong with the image specified in your Pod, Deployment, or ReplicaSet manifest.

Imagine that you are using kubectl to retrieve information about the Pods in your cluster:

$ kubectl get pods
NAME    READY   STATUS             RESTARTS   AGE
goodpod 1/1     Running            0          21h
mypod   0/1     ErrImagePull       0          4s

Which means:

  • Pod is not in READY status
  • Status is ErrImagePull

Additionally, you can check the logs for containers in your Pod:

$ kubectl logs mypod --all-containers
Error from server (BadRequest): container "mycontainer" in pod "mypod" is waiting to start: trying and failing to pull image

In this case, this points to a 400 error (BadRequest), most likely because the specified image is not available or doesn’t exist.

ImagePullBackOff

ImagePullBackOff is a Kubernetes waiting status, a grace period with an increased back-off between retries. After the back-off period expires, kubelet will try to pull the image again.

This is similar to the CrashLoopBackOff status, which is also a grace period between retries after an error in a container. Back-off time is increased each retry, up to a maximum of five minutes.

Note that ImagePullBackOff is not an error. As mentioned, it’s just a status reason that is caused by a problem when pulling the image.

$ kubectl get pods
NAME    READY   STATUS             RESTARTS   AGE
goodpod 1/1     Running            0          21h
mypod   0/1     ImagePullBackOff   0          84s

Which means:

  • Pod is not in READY status
  • Status is ImagePullBackOff
  • Unlike CrashLoopBackOff, there are no restarts (technically, the Pod hasn’t even started)

$ kubectl describe pod mypod
State:          Waiting
Reason:       ImagePullBackOff
...
Warning  Failed     3m57s (x4 over 5m28s)  kubelet            Error: ErrImagePull
Warning  Failed     3m42s (x6 over 5m28s)  kubelet            Error: ImagePullBackOff
Normal   BackOff    18s (x20 over 5m28s)   kubelet            Back-off pulling image "failed-image"
ErrImagePull and ImagePullBackOff timeline

Debugging ErrImagePull and ImagePullBackOff

There are several potential causes of why you might encounter an Image Pull Error. Here are some examples:

  • Wrong image name
  • Wrong image tag
  • Wrong image digest
  • Network problem or image repo not available
  • Pulling from a private registry without providing an imagePullSecret

This is just a list of possible causes, but it’s important to note that there might be many others based on your solution. The best course of action would be to check:

$ kubectl describe pod podname

$ kubectl logs podname --all-containers

$ kubectl get events --field-selector involvedObject.name=podname

In the following example you can see how to dig into the logs, where an image error is found.

Three terminals with debugging options for ErrImagePull and ImagePullBackOff
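
If the root cause is a private registry with missing credentials, the usual fix is to reference registry credentials from the Pod spec via imagePullSecrets. Here's a minimal sketch, assuming a docker-registry Secret named regcred already exists in the namespace (the Secret name and image are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  imagePullSecrets:
  - name: regcred            # hypothetical Secret holding the registry credentials
  containers:
  - name: app
    image: registry.example.com/myteam/myapp:1.0   # hypothetical private image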

Other image errors

ErrImageNeverPull

This error appears when the kubelet fails to find the image on the node and the imagePullPolicy is set to Never. To fix it, either change the pull policy to allow images to be pulled externally or pre-pull the correct image onto the node.

Pending

Remember that an ErrImagePull and the associated ImagePullBackOff may be different from a Pending status on your Pod.

Pending status, most likely, is the result of kube-scheduler not being able to assign your Pod to a working or eligible Node.

Monitoring Image Pull Errors in Prometheus

Using Prometheus and Kube State Metrics (KSM), we can easily track our Pods with containers in ImagePullBackOff or ErrImagePull statuses.

kube_pod_container_status_waiting_reason{reason="ErrImagePull"}
kube_pod_container_status_waiting_reason{reason="ImagePullBackOff"}

In fact, these two metrics are complementary, as we can see in the following Prometheus queries:

Monitoring ErrImagePull and ImagePullBackOff in Prometheus

The Pod shifts between the ImagePullBackOff waiting period and the image pull retry, which returns ErrImagePull again.

Also, if you’re using containers with imagePullPolicy set to Never, remember that you need to track the error as ErrImageNeverPull.

kube_pod_container_status_waiting_reason{reason="ErrImageNeverPull"}

Conclusion

Container images are a great way to kickstart your cloud application needs. Thanks to them, you have access to thousands of curated applications that are ready to be started and scaled.

However, due to misconfiguration, mismatched tags, or registry problems, image errors might start appearing. A container can’t start properly if the image reference is malformed or there are errors in the setup.

Kubernetes provides a grace period in case of an image pull error. This ImagePullBackOff is quite useful, as it gives you time to fix the problem in the image definition. But you need to be aware of when this happens in your cluster and what it means each time.


Monitor Kubernetes and troubleshoot issues up to 10x faster

Sysdig can help you monitor and troubleshoot your Kubernetes cluster with the out-of-the-box dashboards included in Sysdig Monitor. Advisor, a tool integrated in Sysdig Monitor, accelerates troubleshooting of your Kubernetes clusters and their workloads by up to 10x.

Custom metrics monitoring

Sign up for a 30-day trial account and try it yourself!

The post Kubernetes ErrImagePull and ImagePullBackOff in detail appeared first on Sysdig.

]]>