How Much Does Your Managed Service for Prometheus Cost?

By Victor Hernando - APRIL 25, 2023
Topics: Monitoring

Are you using a managed service for Prometheus and finding the costs too high? Or are you considering delegating your Prometheus metrics ingestion, processing, and management and want to know more about the costs involved?

Nowadays, many companies opt for a managed service for Prometheus instead of maintaining their own OSS Prometheus monitoring stack. This approach is becoming increasingly common because it lets businesses reduce their operational and infrastructure monitoring spending and offload the maintenance burden.

When evaluating and ultimately choosing a managed Prometheus service, there are many factors to consider, with pricing being one of the most important. Planning ahead and understanding how costs are calculated is crucial to avoid unpleasant surprises in the form of unexpectedly high bills.

This article will provide insights into how the leading managed Prometheus service providers charge for their services, highlighting the most expensive and the most affordable options.

Managed service for Prometheus pricing

Disclaimer: The prices in this article are those that were current when this blog post was written (April 2023). For current pricing, please check the public pricing information from each vendor.

AWS

The current Amazon managed service for Prometheus prices are available here. Amazon also provides its own pricing calculator to estimate your bill.

Amazon charges separately for different components of its Amazon managed service for Prometheus, such as metric ingestion, storage, and queries. Metric ingestion is charged per sample.

Metrics ingestion | Cost ($/10M samples)
First 2 billion samples | $0.90
Next 250 billion samples | $0.35
Over 252 billion samples | $0.16

Other costs | Cost
Metrics storage | $0.03/GB-month
Query Samples Processed (QSP) | $0.10/billion samples processed

Google Cloud

Google Cloud monitoring pricing is available on its website. Google also has its own pricing calculator: select “Cloud operations (Logging, Monitoring, Trace, Managed Prometheus)” to estimate your bill.

Google Cloud charges metric ingestion per sample. The following table shows the Google Cloud monitoring pricing.

Metrics ingestion | Cost ($/1M samples)
First 50 billion (B) samples | $0.15
Next 50B-250B samples | $0.12
Next 250B-500B samples | $0.09
> 500B samples | $0.06

Some other items such as “Monitoring API calls” usage and “Execution of monitoring uptime checks” may or may not be charged, depending on the usage within a full month.

Azure

Azure Monitor pricing is available here. If you want to estimate your Azure Monitor bill, you can use its pricing calculator. Azure charges metrics ingestion per sample.

Metrics ingestion | Cost ($/10M samples)
Any number of samples | $0.16

Queries | Cost ($/1B samples)
Metrics queries | $0.10

Azure charges for alerts and notifications as well, like emails, push notifications, or web hooks, among others.

Grafana Labs

If you want to check the Grafana Cloud pricing model, visit its website. On the pricing page, you can also calculate the estimated cost of your bill. Grafana Labs charges metrics ingestion per time series (TS).

Metrics ingestion | Cost ($/1000 time series)
First 20K TS | Included in monthly usage subscription
Next 1K TS | $8

Grafana Labs charges based on active series, which may cause your bill to vary depending on your metric usage. On the other hand, alerting and QSP are included, so no extra costs are incurred.

Per Grafana Labs documentation, a time series is considered active if new data points have been received within the last 15 or 30 minutes.

Sysdig Monitor

Sysdig Monitor charges metric ingestion per time series. Sysdig’s metrics pricing is shown below.

Metrics ingestion | Cost ($/1000 time series)
Node exporter, cAdvisor, and KSM metrics included | $0
First 2K TS included in agent subscription per node | $0
> 2K TS per agent | $5

Note that node exporter, KSM, and cAdvisor time series are included in the Sysdig Agent price, so they are not charged. In addition, alerting and QSP are included, so you won’t be charged for these features either.

Price comparison per TS

If you extrapolate the information you obtained from providers that charge based on samples, you can obtain the time series equivalent. This way, you can compare costs between managed Prometheus services providers.

How can you calculate the TS equivalent from samples? Start by working out how many samples a single time series produces in a month at different sampling intervals.

  • 60s interval -> 1TS / 60s * 3,600s in an hour * 744 hours in a month = 44,640 samples
  • 30s interval -> 1TS / 30s * 3,600s in an hour * 744 hours in a month = 89,280 samples
  • 10s interval -> 1TS / 10s * 3,600s in an hour * 744 hours in a month = 267,840 samples

Based on these numbers you can easily convert prices per sample into prices per time series.

Vendor | TS conversion | Number of samples | Price | Samples per price unit | Cost ($/TS)
AWS (60s) 2B samples | 44,802.87 | 44,640.00 | $0.90 | 10,000,000.00 | 0.0040
AWS (60s) Next 250B samples | 5,600,358.42 | 44,640.00 | $0.35 | 10,000,000.00 | 0.0016
AWS (60s) over 252B samples | 47,903,493.71 | 44,640.00 | $0.16 | 10,000,000.00 | 0.0007
AWS (30s) 2B samples | 22,401.43 | 89,280.00 | $0.90 | 10,000,000.00 | 0.0080
AWS (30s) Next 250B samples | 2,800,179.21 | 89,280.00 | $0.35 | 10,000,000.00 | 0.0031
AWS (30s) over 252B samples | 50,726,074.35 | 89,280.00 | $0.16 | 10,000,000.00 | 0.0014
AWS (10s) 2B samples | 7,467.14 | 267,840.00 | $0.90 | 10,000,000.00 | 0.0241
AWS (10s) Next 250B samples | 933,393.07 | 267,840.00 | $0.35 | 10,000,000.00 | 0.0094
AWS (10s) over 252B samples | 52,607,794.78 | 267,840.00 | $0.16 | 10,000,000.00 | 0.0043
GCP (60s) First 50B samples | 1,120,071.68 | 44,640.00 | $0.15 | 1,000,000.00 | 0.0067
GCP (60s) Next 50B-250B samples | 4,480,286.74 | 44,640.00 | $0.12 | 1,000,000.00 | 0.0054
GCP (60s) Next 250B-500B samples | 5,600,358.42 | 44,640.00 | $0.09 | 1,000,000.00 | 0.0040
GCP (60s) >500B samples | 42,347,938.15 | 44,640.00 | $0.06 | 1,000,000.00 | 0.0027
GCP (30s) First 50B samples | 560,035.84 | 89,280.00 | $0.15 | 1,000,000.00 | 0.0134
GCP (30s) Next 50B-250B samples | 2,240,143.37 | 89,280.00 | $0.12 | 1,000,000.00 | 0.0107
GCP (30s) Next 250B-500B samples | 2,800,179.21 | 89,280.00 | $0.09 | 1,000,000.00 | 0.0080
GCP (30s) >500B samples | 47,948,296.58 | 89,280.00 | $0.06 | 1,000,000.00 | 0.0054
GCP (10s) First 50B samples | 186,678.61 | 267,840.00 | $0.15 | 1,000,000.00 | 0.0402
GCP (10s) Next 50B-250B samples | 746,714.46 | 267,840.00 | $0.12 | 1,000,000.00 | 0.0321
GCP (10s) Next 250B-500B samples | 933,393.07 | 267,840.00 | $0.09 | 1,000,000.00 | 0.0241
GCP (10s) >500B samples | 51,681,868.86 | 267,840.00 | $0.06 | 1,000,000.00 | 0.0161
Azure (60s) Any number of samples | - | 44,640.00 | $0.16 | 10,000,000.00 | 0.0007
Azure (30s) Any number of samples | - | 89,280.00 | $0.16 | 10,000,000.00 | 0.0014
Azure (10s) Any number of samples | - | 267,840.00 | $0.16 | 10,000,000.00 | 0.0043
Grafana Labs (60s) First 20,000 TS | - | - | - | - | Included in subscription
Grafana Labs (60s) > 20,000 TS | - | - | - | - | 0.0080
Sysdig (10s) node exporter, cAdvisor, KSM TS | - | - | - | - | Included in subscription
Sysdig (10s) First 2,000 TS per node | - | - | - | - | Included in subscription
Sysdig (10s) > 2,000 TS per node | - | - | - | - | 0.0050
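
The conversion behind the cost-per-TS figures can be sketched in a few lines of Python. This is only an illustration under the article’s assumptions (a 744-hour month and the April 2023 list prices); the function names are made up for this example and do not come from any vendor tool.

```python
SECONDS_PER_HOUR = 3_600
HOURS_PER_MONTH = 744  # 31-day month, the value used in the article


def samples_per_series(interval_seconds: int) -> int:
    """Samples one time series produces in a month at a given sampling interval."""
    return SECONDS_PER_HOUR * HOURS_PER_MONTH // interval_seconds


def cost_per_series(interval_seconds: int, price: float, samples_per_price_unit: int) -> float:
    """Monthly cost in dollars of one time series, given a per-sample price tier."""
    return samples_per_series(interval_seconds) * price / samples_per_price_unit


if __name__ == "__main__":
    # AWS first tier: $0.90 per 10M samples
    for interval in (60, 30, 10):
        cost = cost_per_series(interval, 0.90, 10_000_000)
        print(f"{interval}s: {samples_per_series(interval):,} samples, ${cost:.4f} per TS")
```

Swapping in another vendor’s price and price unit (for example $0.15 per 1M samples for the first GCP tier) gives that vendor’s per-TS cost at the same interval.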

K8s single cluster use case

Note that the number of time series in your Prometheus instance can vary significantly depending on your architecture. The more applications you run, and the more operational tasks (redeployments, creations, deletions, and scaling) take place in your cluster, the more time series you will generate. Depending on how volatile your Pods and Kubernetes objects are, cardinality explosions may occur and cause serious trouble. More time series means more storage, a higher risk of scalability and performance issues, and higher costs.

Let’s mock up a sample architecture that will serve as a foundation to estimate the managed service for Prometheus costs for every vendor.

For this use case, we have the following information about the Kubernetes infrastructure:

  • One Kubernetes cluster
  • 25 nodes

Below is the total number of time series registered in the Prometheus instance after a few days. This Kubernetes cluster was running under a normal load, without heavy workloads or peaks of user activity. To emulate a minimal but realistic application lifecycle, we redeployed a few applications every day. This is the number of time series generated by each job:

  • kubernetes-apiservers: 73,713 TS
  • kubernetes-pods: 275,421 TS
  • kubernetes-nodes-cadvisor: 257,649 TS
  • kubernetes-service-endpoints (node exporter + KSM): 144,202 TS
  • kubernetes-nodes: 42,166 TS
  • kube-dns: 370 TS
  • etcd: 4,399 TS
  • prometheus: 1,068 TS
  • felix_metrics: 4,008 TS
  • kube_controller_metrics: 63 TS
  • Time series TOTAL: 803,059

In terms of query processing and user activity, let’s start with the following assumptions:

  • 10 different users accessing Prometheus with their own dashboards and graphs reporting data for their projects.
  • An average of eight graphs querying data per user.
  • The refresh interval for each graph is 10 seconds. Assuming each user spends an average of two hours per day on the dashboards, that is 720 queries per graph, or 5,760 queries per user per day across eight graphs.
  • Data is being shown in a three hour timeframe on average.
  • The highest number of samples processed in this environment for 3 hours is: 867,303,720.
  • We’ll assume 300,000 samples on average per query.
  • Monitoring query processing ~525,657,600,000 (1)
  • Alerting query processing ~5,256,000,000,000 (2)

(1) Monitoring query processing has been calculated from: 10 monitoring users * 5,760 queries per day and user * 30.42 days a month * 300,000 avg samples per query (3h).

(2) Alerting query processing has been calculated from: 2 executions per minute * 200 alerts * 60 minutes per hour * 730 hours per month * 300,000 avg samples per alerting rule.
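
As a rough cross-check, the two query volumes can be recomputed with a short Python sketch. The function and constant names are hypothetical. Note that the published monitoring figure is reproduced with a 30.42-day month, while the alerting figure uses 730 hours per month.

```python
SAMPLES_PER_QUERY = 300_000  # average samples per query assumed in the article
DAYS_PER_MONTH = 30.42       # value that reproduces the article's monitoring figure
HOURS_PER_MONTH = 730        # value used in the alerting footnote


def monitoring_samples(users: int, graphs_per_user: int = 8,
                       hours_per_day: int = 2, refresh_seconds: int = 10) -> int:
    """Samples processed per month by dashboard queries."""
    queries_per_graph = hours_per_day * 3_600 // refresh_seconds  # 720 with the defaults
    queries_per_user = queries_per_graph * graphs_per_user        # 5,760 with the defaults
    return round(users * queries_per_user * DAYS_PER_MONTH * SAMPLES_PER_QUERY)


def alerting_samples(alert_rules: int, executions_per_minute: int = 2) -> int:
    """Samples processed per month by alert rule evaluations."""
    return executions_per_minute * alert_rules * 60 * HOURS_PER_MONTH * SAMPLES_PER_QUERY


if __name__ == "__main__":
    total = monitoring_samples(10) + alerting_samples(200)
    print(f"monitoring: {monitoring_samples(10):,}")
    print(f"alerting:   {alerting_samples(200):,}")
    print(f"total:      {total:,} samples (~{total / 1e9:,.0f} billion)")
```

Because alert rules are evaluated around the clock, they dominate the total even with only 200 rules; the dashboard traffic is roughly a tenth of the alerting volume under these assumptions.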

Once we have all the data, let’s do some math!

First, we need to calculate the number of samples ingested based on our numbers.

803,059 TS / 10 collection interval in seconds * 3600 seconds * 744 hours in a month = 215,091,322,560 -> ~215 billion samples.
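
The same calculation, written as a small Python helper, makes it easy to see how strongly the sampling interval drives the ingested volume. The helper name is an assumption for illustration; the 744-hour month is the article’s.

```python
def monthly_samples(time_series: int, interval_seconds: int, hours_per_month: int = 744) -> int:
    """Total samples ingested in a month for a number of time series."""
    return time_series * 3_600 * hours_per_month // interval_seconds


if __name__ == "__main__":
    for interval in (60, 30, 10):
        print(f"{interval}s: {monthly_samples(803_059, interval):,} samples")
```

Going from a 60-second to a 10-second interval multiplies the ingested samples by six, which matters for every vendor that bills per sample.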

For those services where storage is charged, let’s assume that the storage initially needed for that volume of metrics is ~12.92GB.

With regards to queries processed, based on the previous calculations, we’ll assume that the total volume is ~5,781 billion query samples processed per month. You may think this number is too large but, on the contrary, it may be too small if you take into account that a single query can process millions of samples.

Notice that a 10 seconds sampling interval was used to calculate the total number of samples. Some vendors like Grafana Labs implement a 60 second sampling interval by default.

Disclaimer: Below is a quick calculation of the managed service for Prometheus costs for every vendor. Bear in mind that this is an approximation, and cost may fluctuate depending on the usage of your monitoring platform.

AWS

Let’s see what the costs charged by each service would be.

Service | Cost
TS first 2B samples | 2B samples * $0.90 / 10M = $180
TS next 250B samples | 213,091,322,560 samples * $0.35 / 10M = $7,458.19
Storage | $0.03 * 12.92GB * 365 days = $141.52 / month
Query Samples Processed (QSP) | $0.10 / B * 5,781 B samples = $578.1 / month
Total cost | $180 + $7,458.19 + $141.52 + $578.1 = $8,357.81 / month

So, if you use Amazon managed service for Prometheus to process the monitoring data for the architecture defined earlier, you would spend around $8,357 a month.
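
The AWS estimate can be reproduced with a minimal Python sketch. It only covers the first two ingestion tiers (enough for ~215 billion samples), takes the storage figure as-is from the article rather than recomputing it, and uses made-up names. Small differences in the last cent come from rounding.

```python
PRICE_UNIT = 10_000_000            # AWS prices ingestion per 10M samples
FIRST_TIER_SAMPLES = 2_000_000_000
FIRST_TIER_PRICE = 0.90            # $ per 10M samples, first 2B samples
SECOND_TIER_PRICE = 0.35           # $ per 10M samples, next 250B samples
QSP_PRICE_PER_BILLION = 0.10


def aws_monthly_cost(samples: int, storage_cost: float, qsp_billions: float) -> float:
    """Ingestion + storage + query cost in dollars for up to 252B samples."""
    if samples > 252_000_000_000:
        raise ValueError("this sketch only covers the first two ingestion tiers")
    first = min(samples, FIRST_TIER_SAMPLES)
    second = samples - first
    ingestion = (first * FIRST_TIER_PRICE + second * SECOND_TIER_PRICE) / PRICE_UNIT
    return ingestion + storage_cost + qsp_billions * QSP_PRICE_PER_BILLION


if __name__ == "__main__":
    # 141.52 is the monthly storage figure from the article's table
    total = aws_monthly_cost(215_091_322_560, storage_cost=141.52, qsp_billions=5_781)
    print(f"AWS estimate: ${total:,.2f} / month")
```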

GCP

It’s now time to analyze the pricing for Google Cloud monitoring.

Service | Cost
TS first 50B samples | 50B samples * $0.15 / 1M = $7,500
Next 50B-250B samples | 165,091,322,560 samples * $0.12 / 1M = $19,810.95
Total cost | $7,500 + $19,810.95 = $27,310.95

If you own a Google Cloud monitoring instance for processing the same data, you’ll pay around $27,310 a month.

Azure

Let’s analyze the Azure offering for monitoring and ingesting your Prometheus metrics.

Service | Cost
TS | 215,091,322,560 samples * $0.16 / 10M = $3,441.46
Query processing | $0.10 / B * 5,781 B samples = $578.1
Total cost | $3,441.46 + $578.1 = $4,019.56 / month

Using Azure for this specific use case would cost around $4,019 a month. You’d also need to take into account the costs related to alerts and notifications, which are charged separately.

Grafana Labs

These are the Grafana Labs costs for its Grafana Cloud product with a 10-second sampling interval.

Since the cost will vary depending on your active metrics, let’s suppose all of your metrics (100%) are active.

Service | Cost
Service fee | $299 / month
TS first 20k | Included in subscription
TS > 20k | (803,059 - 20,000) = 783,059 TS * $8 / 1,000 = $6,264.47
Total cost | $299 + $6,264.47 = $6,563.47

With Grafana Labs, total cost would be around $6,563 a month.
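
Because this pricing is per active series rather than per sample, the estimate reduces to a one-line formula. The sketch below uses the article’s April 2023 figures ($299 service fee, 20,000 series included, $8 per additional 1,000 series); the function name is hypothetical.

```python
BASE_FEE = 299.0             # monthly service fee from the article
INCLUDED_SERIES = 20_000     # series covered by the subscription
PRICE_PER_1000_SERIES = 8.0  # $ per additional 1,000 active series


def grafana_monthly_cost(active_series: int) -> float:
    """Monthly cost in dollars for a number of active time series."""
    billable = max(0, active_series - INCLUDED_SERIES)
    return BASE_FEE + billable / 1_000 * PRICE_PER_1000_SERIES


if __name__ == "__main__":
    print(f"${grafana_monthly_cost(803_059):,.2f} / month")
```

If only part of your series are active in a billing window, pass that smaller number instead of the full total.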

Sysdig Monitor

Sysdig Monitor implements a 10-second sampling interval by default, resulting in up to 6x more data points than competitors that sample every 60 seconds, and at a lower cost, as you’ll see next.

First of all, you need to pull out node exporter, cAdvisor, and KSM metrics from the current numbers. For the sake of simplicity, let’s subtract the following jobs from the total number of time series:

  • kubernetes-nodes-cadvisor: 257,649 TS
  • kubernetes-service-endpoints (node exporter + KSM): 144,202 TS
  • kubernetes-nodes: 42,166 TS

The new TOTAL number of time series is: 803,059 – 257,649 – 144,202 – 42,166 = 359,042 TS. The number of billable time series has been reduced by ~55%!

Service | Cost
Agent cost | $30 * 25 nodes = $750
Metrics included in agent subscription | 2,000 * 25 nodes = 50,000 TS included free of charge
Metrics ingestion | Next 309,042 TS * $5 / 1,000 = $1,545.21 / month
Total cost | $750 + $1,545.21 = $2,295.21 / month

If you use Sysdig Monitor as your managed service for Prometheus, you’d pay around $2,295 a month.
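
The steps above (exclude the bundled jobs, subtract the per-node allowance, then price the remainder) can be sketched as follows. The $30 agent price, the 2,000 series per node, and the $5 per 1,000 series come from the article; the function and constant names are assumptions for illustration.

```python
AGENT_PRICE = 30.0                # $ per node per month, from the article
INCLUDED_SERIES_PER_NODE = 2_000  # series covered by each agent subscription
PRICE_PER_1000_SERIES = 5.0       # $ per additional 1,000 series


def sysdig_monthly_cost(total_series: int, excluded_series: int, nodes: int) -> float:
    """Monthly cost in dollars after removing free and included time series."""
    included = nodes * INCLUDED_SERIES_PER_NODE
    billable = max(0, total_series - excluded_series - included)
    return nodes * AGENT_PRICE + billable / 1_000 * PRICE_PER_1000_SERIES


if __name__ == "__main__":
    # cAdvisor + node exporter/KSM + kubernetes-nodes jobs are not billed
    excluded = 257_649 + 144_202 + 42_166
    print(f"${sysdig_monthly_cost(803_059, excluded, nodes=25):,.2f} / month")
```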

Price comparison

After doing all the calculations, it’s time to sum up the costs of every service. This time, the price is calculated by TS for different ingestion intervals (60s, 30s, and 10s).

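TS calculator | AWS First ~44,800/22,400/7,500 TS | AWS Next ~933,393 TS | GCP First ~1,120,071/560,035/186,678 TS | GCP Next ~746,714 TS | Azure – any number of TS | Grafana Labs & Sysdig | QSP | Disk | TOTAL
AWS (60s) | $180.00 | $1,184.70 | - | - | - | - | $578.17 | $141.52 | $2,084.38
AWS (30s) | $180.00 | $2,439.40 | - | - | - | - | $578.17 | $141.52 | $3,339.08
AWS (10s) | $180.00 | $7,458.20 | - | - | - | - | $578.17 | $141.52 | $8,357.88
GCP (60s) | - | - | $5,377.28 | - | - | - | - | - | $5,377.28
GCP (30s) | - | - | $7,500.00 | $2,603.65 | - | - | - | - | $10,103.65
GCP (10s) | - | - | $7,500.00 | $19,810.96 | - | - | - | - | $27,310.96
Azure (60s) | - | - | - | - | $573.58 | - | $578.17 | - | $1,151.74
Azure (30s) | - | - | - | - | $1,147.15 | - | $578.17 | - | $1,725.32
Azure (10s) | - | - | - | - | $3,441.46 | - | $578.17 | - | $4,019.63
Grafana Labs (60s) | - | - | - | - | - | $6,563.47 | - | - | $6,563.47
Sysdig (10s) | - | - | - | - | - | $2,295.21 | - | - | $2,295.21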

When comparing managed service for Prometheus costs, pay attention to the metric sampling interval. Sysdig’s sampling interval is 10 seconds by default, while DIY Prometheus, GCP, and Grafana pull metrics every 60 seconds. Despite collecting 6x more data than some of its competitors, Sysdig is the cheapest option. There are no extra charges for storage or query samples processed with Sysdig. Bear in mind that storage and query charges can dramatically increase your bill, and that QSP volume is inherently hard to forecast because it depends on how your users query the data.

Comparing all the managed service for Prometheus prices analyzed in this article at a 10-second metrics ingestion interval, you’ll see a huge difference among vendors:

  • AWS: ~$8,357 / month
  • GCP: ~$27,310 / month
  • Azure: ~$4,019 / month
  • Grafana Labs: ~$6,563 / month
  • Sysdig: ~$2,295 / month

Azure is almost 2x more costly than Sysdig, AWS is almost 4x, and GCP is almost 12x.
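
For readers who want the exact multiples, the short Python sketch below divides each monthly estimate from the list above by the cheapest one. The dictionary simply restates the rounded figures; the names are illustrative.

```python
# Rounded monthly estimates from the list above (10-second interval)
MONTHLY_COST = {
    "AWS": 8_357,
    "GCP": 27_310,
    "Azure": 4_019,
    "Grafana Labs": 6_563,
    "Sysdig": 2_295,
}


def ratios_to_cheapest(costs: dict[str, float]) -> dict[str, float]:
    """Express every vendor's cost as a multiple of the cheapest vendor."""
    cheapest = min(costs.values())
    return {vendor: cost / cheapest for vendor, cost in costs.items()}


if __name__ == "__main__":
    for vendor, ratio in sorted(ratios_to_cheapest(MONTHLY_COST).items(), key=lambda kv: kv[1]):
        print(f"{vendor}: {ratio:.2f}x")
```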

K8s multi-cluster use case

If you skipped the “K8s single cluster” use case, please take a look at the first paragraph of that section. It is key to understand how TS are generated and how volatile and dynamic the numbers can be for every use case.

This time, we’ll analyze costs in a larger scenario. This architecture is made up of five K8s clusters with 50 nodes each.

  • Five Kubernetes clusters
  • 50 nodes per cluster

The total number of time series has been calculated under regular load, with no stress or high load peaks. For the sake of simplicity, during the testing cycle we redeployed, scaled down, and scaled up a few deployments to emulate a real application lifecycle. Below is the number of time series generated by this group of K8s clusters, by job:

  • kubernetes-apiservers: 368,565 TS
  • kubernetes-pods: 4,131,315 TS
  • kubernetes-nodes-cadvisor: 3,607,086 TS
  • kubernetes-service-endpoints (node exporter + KSM): 2,018,828 TS
  • kubernetes-nodes: 505,992 TS
  • kube-dns: 1,850 TS
  • etcd: 21,995 TS
  • prometheus: 5,340 TS
  • felix_metrics: 20,040 TS
  • kube_controller_metrics: 315 TS
  • Time series TOTAL: 10,681,326

In terms of query processing and user activity, let’s start with the following assumptions:

  • 50 different users accessing Prometheus with their own dashboards and graphs reporting data for their projects.
  • An average of eight graphs querying data per user.
  • The refresh interval for each graph is 10 seconds. Assuming each user spends an average of two hours per day on the dashboards, that is 720 queries per graph, or 5,760 queries per user per day across eight graphs.
  • Data is being shown in a three hour timeframe on average.
  • The highest number of samples processed in this environment for three hours is: 11,535,832,080.
  • We’ll assume 300,000 samples on average per query.
  • Monitoring query processing ~2,628,288,000,000 (1)
  • Alerting query processing ~26,280,000,000,000 (2)

(1) Monitoring query processing has been calculated from: 50 monitoring users * 5,760 queries per day and user * 30.42 days a month * 300,000 avg samples per query (3h).

(2) Alerting query processing has been calculated from: 2 executions per minute * 200 alerts per cluster * 5 clusters * 60 minutes per hour * 730 hours per month * 300,000 avg samples per alerting rule.

Disclaimer: Below is a quick calculation of the managed service for Prometheus costs for every vendor. Bear in mind that this is an approximation, and cost may fluctuate depending on the usage of your monitoring platform.

This time, we’ll go straight to the point and calculate the costs for each provider using the price per TS we got from the previous section.

Price comparison

Let’s compare the metrics ingestion prices for every service at each sampling interval.

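TS calculator | AWS First ~44,800/22,400/7,500 TS | AWS Next ~5,600,358/2,800,179/933,393 TS | AWS over 252B samples | GCP First ~1,120,071/560,035/186,678 TS | GCP Next ~4,480,286/2,240,143/746,714 TS | GCP Next ~5,600,358/2,800,179/933,393 TS | GCP over 500B samples | Azure – any number of TS | Grafana Labs & Sysdig | QSP | Disk | TOTAL
AWS (60s) | $180.00 | $8,750.00 | $3,597.03 | - | - | - | - | - | - | $2,890.83 | $1,882.28 | $17,300.13
AWS (30s) | $180.00 | $8,750.00 | $11,226.06 | - | - | - | - | - | - | $2,890.83 | $1,882.28 | $24,929.16
AWS (10s) | $180.00 | $8,750.00 | $41,742.18 | - | - | - | - | - | - | $2,890.83 | $1,882.28 | $55,445.29
GCP (60s) | - | - | - | $7,500.00 | $24,000.00 | $20,413.30 | $0.00 | - | - | - | - | $51,913.30
GCP (30s) | - | - | - | $7,500.00 | $24,000.00 | $22,500.00 | $27,217.73 | - | - | - | - | $81,217.73
GCP (10s) | - | - | - | $7,500.00 | $24,000.00 | $22,500.00 | $141,653.18 | - | - | - | - | $195,653.18
Azure (60s) | - | - | - | - | - | - | - | $7,629.03 | - | $2,890.83 | - | $10,519.86
Azure (30s) | - | - | - | - | - | - | - | $15,258.06 | - | $2,890.83 | - | $18,148.89
Azure (10s) | - | - | - | - | - | - | - | $45,774.18 | - | $2,890.83 | - | $48,665.01
Grafana Labs (60s) | - | - | - | - | - | - | - | - | $85,589.61 | - | - | $85,589.61
Sysdig (10s) | - | - | - | - | - | - | - | - | $27,747.10 | - | - | $27,747.10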

In summary, these are the costs for every service ingesting ~11 million time series with a 10-second interval.

  • AWS: ~$55,445/month
  • GCP: ~$195,653/month
  • Azure: ~$48,665/month
  • Grafana Labs: ~$85,589/month
  • Sysdig: ~$27,747/month
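
The per-sample ingestion figures for this scenario can be recomputed with a generic tier walker. The sketch below encodes the April 2023 tiers quoted earlier in the article for AWS, GCP, and Azure. The data layout and function names are assumptions, and it covers ingestion only (no QSP or storage).

```python
INF = float("inf")

# (tier size in samples, price per price unit); list prices from the article
PRICING = {
    "AWS": {"unit": 10_000_000,
            "tiers": [(2_000_000_000, 0.90), (250_000_000_000, 0.35), (INF, 0.16)]},
    "GCP": {"unit": 1_000_000,
            "tiers": [(50_000_000_000, 0.15), (200_000_000_000, 0.12),
                      (250_000_000_000, 0.09), (INF, 0.06)]},
    "Azure": {"unit": 10_000_000, "tiers": [(INF, 0.16)]},
}


def monthly_samples(time_series: int, interval_seconds: int, hours_per_month: int = 744) -> int:
    """Total samples ingested in a month for a number of time series."""
    return time_series * 3_600 * hours_per_month // interval_seconds


def ingestion_cost(vendor: str, samples: int) -> float:
    """Walk the vendor's tiers and add up the cost of the samples in each one."""
    plan = PRICING[vendor]
    remaining, total = samples, 0.0
    for tier_size, price in plan["tiers"]:
        in_tier = min(remaining, tier_size)
        total += in_tier / plan["unit"] * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return total


if __name__ == "__main__":
    samples = monthly_samples(10_681_326, interval_seconds=10)
    for vendor in PRICING:
        print(f"{vendor}: ${ingestion_cost(vendor, samples):,.2f} ingestion / month")
```

Adding the QSP and disk figures from the table to the AWS and Azure ingestion results gives the monthly totals listed above.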

For a volume of ~11 million time series, Sysdig’s managed service for Prometheus is significantly cheaper than its competitors. GCP is almost 4x more expensive than AWS and Azure. The Grafana Labs cost grows significantly at this scale, making it the second most expensive option.

Sysdig’s managed service for Prometheus benefits

Sysdig’s managed service for Prometheus stands out as the most affordable option on the market, with significant cost savings compared to other cloud providers such as GCP or AWS.

If you are an OSS Prometheus user looking to delegate your Prometheus metrics ingestion and management, you’ll benefit from huge cost savings. Sysdig’s managed service for Prometheus can help you reduce operational costs by taking care of metrics maintenance, scalability, storage, performance, and issue resolution.

If you are already delegating your metrics to a managed Prometheus provider, keep in mind that you can still reduce your costs and even obtain more features for less. Query Samples Processed (QSP) charges can be particularly tricky to calculate, as they depend on many factors such as the number of concurrent users, graphs, dashboards, refresh intervals, infrastructure size, etc.

Sysdig’s agent ingestion interval is set to 10 seconds by default, whereas others opt for longer intervals of up to 60 seconds. Reducing this interval can negatively impact performance and reliability in some cases, resulting in a poor user experience. By choosing Sysdig’s managed service for Prometheus, you can benefit from shorter metrics ingestion intervals without sacrificing performance, stability, or reliability, all at a lower price!

When it comes to querying and analyzing your historical data, performance is key. A managed service for Prometheus that cannot return your data in a timely manner is not fit for operations and can cause a lot of harm. Sysdig Monitor rolls up historical data over time. This is a key feature(1) that lets Sysdig Monitor process queries much faster than its competitors.

In addition to including KSM, node exporter, and cAdvisor metrics free of charge, as well as all the metrics collected by Sysdig (including your own custom metrics and platform metrics), you can benefit from the metric enrichment Sysdig provides out of the box. With Sysdig, your Prometheus metrics gain cloud and Kubernetes context.

Sysdig is much more than a monitoring tool to ingest and store metrics and analyze your data. Metrics enrichment, eBPF instrumentation, Sysdig Advisor for troubleshooting, Cost Advisor for reducing your K8s and cloud costs, and out-of-the-box dashboards, alerts, and integrations are some of the benefits that Sysdig Monitor brings. Check out this article and discover these features and much more!

(1) If you want to customize how Sysdig Monitor rolls up your data, please reach out to Sysdig support representatives.

Cut operational and infrastructure related costs

There are other costs associated with Prometheus monitoring worth mentioning. Prometheus metrics cost, the number of time series you manage, and QSP volume are not the only areas where you can save money. A managed service for Prometheus takes most of the burden off your team by ingesting and processing your Prometheus metrics automatically and making all your data available to you.

Businesses may struggle to maintain, scale, and support their Prometheus monitoring infrastructure. This task can be challenging and painful, especially when cardinality explosion comes into play: time series start to grow exponentially, causing serious trouble with stability, scalability, and costs.

There’s no need to worry anymore about whether your monitoring infrastructure is well sized, whether you can scale your environment up in a timely manner, or about that one issue that keeps your organization from consuming your metrics. Relying on Sysdig Monitor to ingest, process, and manage your Prometheus metrics and your observability platform can help you dramatically reduce your operational and infrastructure costs.

In terms of Kubernetes and cloud costs, Sysdig can do much more to help you with cutting costs. Sysdig’s Cost Advisor is a tool included in Sysdig Monitor that helps you identify in which areas you are overspending. You can drill down through your whole infrastructure, get granular information, and finally reduce your wasted spending by rightsizing your workloads based on Sysdig’s recommendations. Do you want to learn more? Check how Cost Advisor can help you to reduce your wasted spending by 40% on average!

Conclusion

While the vendors analyzed in this article offer similar managed services for Prometheus, their associated costs vary widely. Some vendors also charge for QSP and storage, which can increase your bill substantially and make it difficult to limit and control costs, since these charges are tied to usage. The more users query, inspect, and monitor your data, the higher your bill will be.

Sysdig’s managed service for Prometheus is significantly cheaper than its competitors, even when ingesting metrics every 10 seconds. QSP and storage are included in the price, so there are no surprises when the bill arrives.

To learn more about how to reduce costs with Sysdig’s managed service for Prometheus, visit the Sysdig Monitor trial page and request a 30-day free account. You’ll be up and running in minutes!