Prometheus Vs Victoria Metrics Load Testing

Introduction

Jan 3, 2024

We are running an extensive load testing program to assess how Prometheus and VictoriaMetrics perform under a range of load conditions. The goal is to understand how each system handles increasing stress and data influx. Our evaluation covers response times, scalability, resource utilization, and stability across different scenarios. By subjecting both systems to realistic conditions, we expect the results to show which solution best fits our monitoring and metrics requirements and, ultimately, to improve the efficiency of our infrastructure.

Factors affecting load testing

Active Time Series: The number of active time series largely determines resource utilization in both Prometheus and VictoriaMetrics. Prometheus keeps active time series in memory during the compaction window, whereas VictoriaMetrics agents (vmagent) retain the data until it is written to storage through vminsert. This difference in how active series are managed affects the memory and storage requirements of each system and, in turn, their performance characteristics.

Ingestion Rate: The ingestion rate is directly tied to the volume of metrics being scraped. As the ingestion rate increases, resource utilization rises with it.

Targets: The number of targets directly affects CPU demand: more targets mean higher CPU usage. The number of targets also affects the number of active time series in the system.

Load testing scenarios

We ran the following load scenarios and captured resource utilization data for each.

Baseline load test

  • Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 1000
  • Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 2000
  • Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 5000

Current production scale load test

  • Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 1000
  • Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 1000
  • Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 2000
  • Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 2000
  • Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 5000
  • Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 5000

Future projection load test

  • Active Time Series : 15 Million, Ingestion Rate: 180k, Target: 1000
  • Active Time Series : 15 Million, Ingestion Rate: 250k, Target: 1000
  • Active Time Series : 15 Million, Ingestion Rate: 180k, Target: 2000
  • Active Time Series : 15 Million, Ingestion Rate: 250k, Target: 2000
  • Active Time Series : 15 Million, Ingestion Rate: 180k, Target: 5000
  • Active Time Series : 15 Million, Ingestion Rate: 250k, Target: 5000
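The scenario lists above can also be written down programmatically, which makes it easy to see how many series each target has to expose on average. This is a minimal Python sketch, not the tooling used for the tests: the `Scenario` class, its field names, and the group labels are hypothetical, and "180k" is read as 180,000.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """One load-test scenario (hypothetical helper, not the authors' tooling)."""
    group: str
    active_series: int   # active time series
    ingestion_rate: int  # ingestion rate, with "180k" read as 180_000
    targets: int         # number of scrape targets

    @property
    def series_per_target(self) -> float:
        # Average number of active series each target must expose.
        return self.active_series / self.targets


def build_matrix() -> list[Scenario]:
    targets = (1000, 2000, 5000)
    # Baseline: a single ingestion rate across all target counts.
    scenarios = [Scenario("baseline", 5_000_000, 180_000, t) for t in targets]
    for group, series in (("production", 10_000_000), ("projection", 15_000_000)):
        # Same ordering as the lists above: targets outer, ingestion rate inner.
        for t, rate in product(targets, (180_000, 250_000)):
            scenarios.append(Scenario(group, series, rate, t))
    return scenarios


if __name__ == "__main__":
    for s in build_matrix():
        print(f"{s.group:<10} series={s.active_series:>10,} "
              f"rate={s.ingestion_rate:>7,} targets={s.targets:>4} "
              f"series/target={s.series_per_target:,.0f}")
```

With a fixed number of active series, adding targets lowers the series count per target (for example, 5 million series over 1000 targets is 5000 series per target), so the target dimension mostly stresses scraping overhead rather than total data volume.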

Baseline load test

Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 1000

Resource Utilization:

  • Prometheus: CPU 1.5, Memory 10.6 Gi
  • VictoriaMetrics: CPU 3.68, Memory 5.45 Gi

[Figure: CPU usage]

[Figure: Memory usage]

Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 2000

Resource Utilization:

  • Prometheus: CPU 1.5, Memory 12.3 Gi
  • VictoriaMetrics: CPU 4.36, Memory 5.83 Gi

[Figure: CPU usage]

[Figure: Memory usage]

Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 5000

Resource Utilization:

  • Prometheus: CPU 1.5, Memory 12.5 Gi
  • VictoriaMetrics: CPU 4.68, Memory 5.86 Gi

[Figure: CPU usage]

[Figure: Memory usage]

Baseline load test conclusion

In the baseline scenarios, VictoriaMetrics consistently consumed more CPU than Prometheus. During the load test we captured a CPU flame graph for vmagent, the VictoriaMetrics agent, which turned out to be the main source of the elevated CPU usage. We then applied a set of optimizations to reduce the CPU usage of the VictoriaMetrics stack.
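To put numbers on that gap, the ratios can be computed directly from the baseline figures reported above. This is a small Python sketch; the dictionary layout and the function name are illustrative only, and memory is assumed to be in the same unit for both systems.

```python
# Baseline results from the three scenarios above: targets -> (cpu, memory).
PROMETHEUS = {1000: (1.5, 10.6), 2000: (1.5, 12.3), 5000: (1.5, 12.5)}
VICTORIA_METRICS = {1000: (3.68, 5.45), 2000: (4.36, 5.83), 5000: (4.68, 5.86)}


def baseline_ratios() -> dict[int, tuple[float, float]]:
    """Return targets -> (VM CPU / Prometheus CPU, Prometheus memory / VM memory)."""
    ratios = {}
    for targets, (p_cpu, p_mem) in PROMETHEUS.items():
        v_cpu, v_mem = VICTORIA_METRICS[targets]
        ratios[targets] = (v_cpu / p_cpu, p_mem / v_mem)
    return ratios


if __name__ == "__main__":
    for targets, (cpu_ratio, mem_ratio) in baseline_ratios().items():
        print(f"{targets} targets: VictoriaMetrics uses {cpu_ratio:.2f}x the CPU, "
              f"Prometheus uses {mem_ratio:.2f}x the memory")
```

On these numbers, VictoriaMetrics used roughly 2.5 to 3.1 times the CPU of Prometheus before optimization, while needing about half the memory.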

Optimization after baseline load test

To optimize vmagent, we made the following changes:

1. Label Reduction: We stopped adding agent-specific labels to all metrics. This reduced CPU utilization, as confirmed by the vmagent flame graphs.

2. Preserved Kubernetes Metadata: We kept the labels generated from target sources as they are instead of discarding the extra ones, which avoided additional processing.

3. Disabled `remove_stalemaker`: Because we scrape exclusively with vmagent and do not use Prometheus federation, the default `remove_stalemaker` behavior was unnecessary. Turning it off noticeably reduced memory usage.

4. Network Bandwidth Optimization: vmagent’s egress bandwidth for 10 million time series ranged from 30 to 60 Mbps. Enabling compression cut this to about 1 Mbps.
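Compression pays off here because metric payloads repeat the same label names and values many times over. The Python sketch below illustrates that effect with the standard `zlib` module on a synthetic payload. It is an illustration only: it does not reproduce vmagent's wire format or codec, and the metric name, label names, and helper functions are made up.

```python
import zlib


def synthetic_payload(series: int = 2000) -> bytes:
    """Build a text payload with highly repetitive labels (made-up metric names)."""
    lines = [
        f'http_requests_total{{namespace="shop",pod="api-{i % 50}",'
        f'instance="10.0.{i % 256}.1:8080",code="200"}} {i}'
        for i in range(series)
    ]
    return "\n".join(lines).encode("utf-8")


def compression_ratio(raw: bytes) -> float:
    """Return the raw size divided by the zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw))


def reduction_factor(before_mbps: float, after_mbps: float) -> float:
    """How many times smaller the bandwidth became."""
    return before_mbps / after_mbps


if __name__ == "__main__":
    raw = synthetic_payload()
    print(f"synthetic payload compresses {compression_ratio(raw):.1f}x with zlib")
    # Figures reported above: 30 to 60 Mbps before, about 1 Mbps after.
    print(f"reported reduction: {reduction_factor(30, 1):.0f}x to "
          f"{reduction_factor(60, 1):.0f}x")
```

The same repetition is what label reduction removes at the source: fewer labels per sample means less data to process and send in the first place.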

[Figure: Flame graph before label reduction]

[Figure: Flame graph after label reduction]

Current production scale load test

[Figure: Current zones' resource utilization, active time series, targets, and ingestion rate]

Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 1000

Resource Utilization:

  • Prometheus: CPU 1.70, Memory 13.5 Gi
  • VictoriaMetrics: CPU 1.40, Memory 4.27 Gi

[Figure: CPU usage]

[Figure: Memory usage]

[Figure: Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 1000

Resource Utilization:

  • Prometheus: CPU 2.51, Memory 18.7 Gi
  • VictoriaMetrics: CPU 2.30, Memory 5.40 Gi

[Figure: CPU usage]

[Figure: Memory usage]

[Figure: Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 2000

Resource Utilization:

  • Prometheus: CPU 1.89, Memory 19.7 Gi
  • VictoriaMetrics: CPU 1.69, Memory 5.27 Gi

[Figure: CPU usage]

[Figure: Memory usage]

[Figure: Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 2000

Resource Utilization:

  • Prometheus: CPU 2.23, Memory 17.9 Gi
  • VictoriaMetrics: CPU 2.05, Memory 4.87 Gi

[Figure: CPU usage]

[Figure: Memory usage]

[Figure: Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 5000

Resource Utilization:

  • Prometheus: CPU 1.91, Memory 20.7 Gi
  • VictoriaMetrics: CPU 1.30, Memory 4.92 Gi

[Figure: CPU usage]

[Figure: Memory usage]

[Figure: Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 5000

Resource Utilization:

  • Prometheus: CPU 2.27, Memory 25.5 Gi
  • VictoriaMetrics: CPU 2.06, Memory 5.09 Gi

[Figure: CPU usage]

[Figure: Memory usage]

[Figure: Grafana reference]

Load test conclusion

  • Based on the load tests above, VictoriaMetrics outperforms Prometheus. Its CPU utilization was slightly lower than Prometheus’s, and its memory usage was roughly three to five times lower across all production-scale scenarios.
  • As the load increased, Prometheus’s memory utilization rose with it, from 13.5 Gi in the lightest scenario to 25.5 Gi in the heaviest.
[Figure: Prometheus memory utilization]
  • VictoriaMetrics’ memory utilization stayed consistent across load conditions (4.27 Gi to 5.40 Gi). This indicates that memory is not a limiting factor for VictoriaMetrics as metrics volume grows, and that it can handle growing metric datasets effectively.
[Figure: VictoriaMetrics memory utilization]
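The memory comparison can be checked against the six production-scale results reported above. This is a minimal Python sketch; the tuple layout and function names are illustrative, and both memory columns are assumed to be in Gi.

```python
# (targets, ingestion rate, Prometheus memory, VictoriaMetrics memory)
RESULTS = [
    (1000, "180k", 13.5, 4.27),
    (1000, "250k", 18.7, 5.40),
    (2000, "180k", 19.7, 5.27),
    (2000, "250k", 17.9, 4.87),
    (5000, "180k", 20.7, 4.92),
    (5000, "250k", 25.5, 5.09),
]


def memory_ratios() -> list[float]:
    """Prometheus memory divided by VictoriaMetrics memory, per scenario."""
    return [prom / vm for _, _, prom, vm in RESULTS]


def memory_spread(column: int) -> float:
    """Difference between the highest and lowest memory value in a column."""
    values = [row[column] for row in RESULTS]
    return max(values) - min(values)


if __name__ == "__main__":
    for (targets, rate, _, _), ratio in zip(RESULTS, memory_ratios()):
        print(f"{targets} targets @ {rate}: Prometheus uses {ratio:.2f}x the memory")
    print(f"Prometheus spread: {memory_spread(2):.2f} Gi")
    print(f"VictoriaMetrics spread: {memory_spread(3):.2f} Gi")
```

The ratio runs from about 3.2x at 1000 targets and 180k to about 5x at 5000 targets and 250k. Prometheus memory varies by 12 Gi across the scenarios, VictoriaMetrics memory by a little over 1 Gi.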

Part 1 — https://medium.com/@zetablogs/supercharge-your-monitoring-migrate-from-prometheus-to-victoriametrics-for-scalability-and-speed-e1e9df786145

Part 2 — https://medium.com/@zetablogs/part-2-supercharge-your-monitoring-migrate-from-prometheus-to-victoriametrics-for-optimised-cpu-9a90c015ccba

Thanks

Authors:

Vijesh Nair → linkedin.com/in/vijesh-nair-b651a2a1

Ritesh Sanjay → linkedin.com/in/riteshsanjaymahajan

Reviewers:

Shashidhar Soppin → linkedin.com/in/shashidhar-soppin-8264282

Praveen Irrinki → linkedin.com/in/pirrinki

Shaik Idris → linkedin.com/in/shaikidris

Written by 'Celebration of Engineering'

Engineering adventures and the stories from the trenches