Prometheus vs VictoriaMetrics Load Testing

Introduction

'Celebration of Engineering'
Jan 3, 2024

In our ongoing project, we are running an extensive load testing program to assess how Prometheus and VictoriaMetrics perform under a range of load conditions. Our objective is to understand how the two systems handle varying levels of stress and data influx. The evaluation covers response times, scalability, resource utilization, and stability. By subjecting both systems to realistic conditions, we expect to gather the evidence we need to choose the most suitable solution for our monitoring and metrics requirements, and ultimately to run our infrastructure more efficiently.

Factors affecting load testing

Active Time Series: The number of active time series largely determines resource utilization and workload in both Prometheus and VictoriaMetrics. Prometheus keeps active time series in memory until they are compacted to disk, whereas the VictoriaMetrics agent (vmagent) holds the data only until it has been written to storage via vminsert. This difference drives the memory and storage requirements of each system and, in turn, their performance characteristics.

Ingestion Rate: The ingestion rate scales with the volume of metrics being scraped. As the ingestion rate increases, resource utilization rises with it.

Targets: The number of targets directly influences CPU demand: more targets mean higher CPU usage. The number of targets also affects the number of active time series in the system.
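The three factors are linked. As a rough, back-of-the-envelope sketch: if every active series is scraped once per interval, the steady-state ingestion rate is approximately the active series count divided by the scrape interval. The code below assumes that relation, and it also assumes the ingestion rates quoted in this post are samples per second, which the post does not state. The function names are illustrative only.

```python
def ingestion_rate(active_series: int, scrape_interval_s: float) -> float:
    """Approximate samples/second if each active series is scraped once per interval."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    return active_series / scrape_interval_s


def implied_scrape_interval(active_series: int, rate: float) -> float:
    """Invert the relation: the average scrape interval implied by an ingestion rate."""
    if rate <= 0:
        raise ValueError("ingestion rate must be positive")
    return active_series / rate


def series_per_target(active_series: int, targets: int) -> float:
    """Average number of active series each target would have to expose."""
    if targets <= 0:
        raise ValueError("target count must be positive")
    return active_series / targets


if __name__ == "__main__":
    # (active series, ingestion rate) pairs taken from the scenarios in this post.
    for series, rate in [
        (5_000_000, 180_000),
        (10_000_000, 180_000),
        (10_000_000, 250_000),
        (15_000_000, 180_000),
        (15_000_000, 250_000),
    ]:
        interval = implied_scrape_interval(series, rate)
        print(f"{series:>10} series at {rate} samples/s -> ~{interval:.1f} s average interval")
```

Under these assumptions, raising the active series count at a fixed ingestion rate implies a longer average scrape interval, while raising the target count at a fixed series count means fewer series per target.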

Load testing scenarios

We ran the following load scenarios and captured resource utilization data for each.

Baseline load test

  • Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 1000
  • Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 2000
  • Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 5000

Current production scale load test

  • Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 1000
  • Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 1000
  • Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 2000
  • Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 2000
  • Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 5000
  • Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 5000

Future projection load test

  • Active Time Series : 15 Million, Ingestion Rate: 180k, Target: 1000
  • Active Time Series : 15 Million, Ingestion Rate: 250k, Target: 1000
  • Active Time Series : 15 Million, Ingestion Rate: 180k, Target: 2000
  • Active Time Series : 15 Million, Ingestion Rate: 250k, Target: 2000
  • Active Time Series : 15 Million, Ingestion Rate: 180k, Target: 5000
  • Active Time Series : 15 Million, Ingestion Rate: 250k, Target: 5000
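The scenario lists above form a small test matrix: three target counts crossed with one or two ingestion rates per group. A minimal sketch of generating that matrix programmatically is shown below; the `Scenario` class and `build_scenarios` function are hypothetical names introduced for illustration, not part of our test harness.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    group: str
    active_series: int
    ingestion_rate: int  # as quoted in the post (180k, 250k); unit not stated there
    targets: int


TARGETS = (1000, 2000, 5000)

# group name -> (active time series, ingestion rates tested)
GROUPS = {
    "baseline": (5_000_000, (180_000,)),
    "current production scale": (10_000_000, (180_000, 250_000)),
    "future projection": (15_000_000, (180_000, 250_000)),
}


def build_scenarios() -> list[Scenario]:
    """Expand each group into one scenario per (targets, ingestion rate) pair."""
    scenarios = []
    for group, (series, rates) in GROUPS.items():
        # Targets vary slowest, matching the order of the lists above.
        for targets, rate in product(TARGETS, rates):
            scenarios.append(Scenario(group, series, rate, targets))
    return scenarios


if __name__ == "__main__":
    for s in build_scenarios():
        print(f"{s.group}: {s.active_series} series, {s.ingestion_rate} rate, {s.targets} targets")
```

Keeping the matrix in one place makes it harder to skip a combination by accident when a new ingestion rate or target count is added.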

Baseline load test

Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 1000

Resource Utilization:

Prometheus:

  • CPU: 1.5
  • Memory: 10.6 Gi

VictoriaMetrics:

  • CPU: 3.68
  • Memory: 5.45 Gi

[Figures: CPU usage, memory usage]

Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 2000

Resource Utilization:

Prometheus:

  • CPU: 1.5
  • Memory: 12.3 Gi

VictoriaMetrics:

  • CPU: 4.36
  • Memory: 5.83 Gi

[Figures: CPU usage, memory usage]

Active Time Series : 5 Million, Ingestion Rate: 180k, Target: 5000

Resource Utilization:

Prometheus:

  • CPU: 1.5
  • Memory: 12.5 Gi

VictoriaMetrics:

  • CPU: 4.68
  • Memory: 5.86 Gi

[Figures: CPU usage, memory usage]

Baseline load test conclusion

In our baseline scenarios, VictoriaMetrics consistently consumed more CPU than Prometheus (3.68 to 4.68 versus a steady 1.5), while using roughly half the memory. During the load test we captured a CPU flame graph for vmagent, which emerged as the primary source of the elevated CPU usage. We then applied a set of optimizations to bring down the CPU consumption of the VictoriaMetrics stack.
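To make the baseline comparison concrete, the sketch below computes the CPU and memory ratios from the figures reported in the baseline results above. The numbers are copied from this post; the helper name `baseline_ratios` is illustrative.

```python
# (targets, Prometheus CPU, Prometheus memory, VictoriaMetrics CPU, VictoriaMetrics memory)
# Values as reported in the baseline results of this post.
BASELINE = [
    (1000, 1.5, 10.6, 3.68, 5.45),
    (2000, 1.5, 12.3, 4.36, 5.83),
    (5000, 1.5, 12.5, 4.68, 5.86),
]


def baseline_ratios(rows):
    """Return (targets, VM-to-Prometheus CPU ratio, Prometheus-to-VM memory ratio)."""
    result = []
    for targets, prom_cpu, prom_mem, vm_cpu, vm_mem in rows:
        result.append((targets, vm_cpu / prom_cpu, prom_mem / vm_mem))
    return result


if __name__ == "__main__":
    for targets, cpu_ratio, mem_ratio in baseline_ratios(BASELINE):
        print(
            f"{targets} targets: VictoriaMetrics used {cpu_ratio:.2f}x the CPU, "
            f"Prometheus used {mem_ratio:.2f}x the memory"
        )
```

The CPU gap widens as the target count grows, which is consistent with the scraping agent being the hot spot.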

Optimization after baseline load test

To optimize vmagent, we made four changes:

1. Label Reduction: We stopped adding agent-specific labels to every metric. This reduced CPU utilization, as confirmed by the vmagent flame graphs.

2. Preserved Kubernetes Metadata: We kept the labels generated from target sources as they were instead of discarding any of them, which avoided the additional processing that dropping labels would require.

3. Disabled `remove_stalemaker`: Because we scrape exclusively with vmagent and do not use Prometheus federation, the default stale-marker handling (`remove_stalemaker`) was unnecessary. Turning it off noticeably reduced memory usage.

4. Network Bandwidth Optimization: With 10 million time series, vmagent's egress bandwidth ranged from 30 to 60 Mbps. Enabling compression cut this to a much more manageable 1 Mbps.
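Going from 30 to 60 Mbps down to about 1 Mbps is a 30x to 60x reduction, which is plausible because metric payloads repeat the same label names and values many times. The sketch below illustrates both effects (fewer labels and compression) on a synthetic payload. It is an illustration only: it uses a Prometheus-style text rendering and Python's `zlib` as stand-ins, not vmagent's actual wire format or compression algorithm, and the metric name and the `agent` and `agent_zone` labels are hypothetical.

```python
import zlib


def render_samples(n_series: int, extra_labels: dict[str, str]) -> bytes:
    """Render one sample per series in a Prometheus-style text form (illustration only)."""
    lines = []
    for i in range(n_series):
        labels = {
            "job": "node",
            "instance": f"10.0.{i // 256}.{i % 256}:9100",
            **extra_labels,  # e.g. hypothetical agent-specific labels
        }
        label_str = ",".join(f'{key}="{value}"' for key, value in labels.items())
        lines.append(f"http_requests_total{{{label_str}}} {i}")
    return "\n".join(lines).encode()


def size_report(n_series: int = 10_000) -> dict[str, int]:
    """Compare payload sizes with and without extra labels, raw and compressed."""
    with_agent = render_samples(n_series, {"agent": "vmagent-0", "agent_zone": "zone-a"})
    without_agent = render_samples(n_series, {})
    return {
        "raw_with_agent_labels": len(with_agent),
        "raw_without_agent_labels": len(without_agent),
        "compressed_without_agent_labels": len(zlib.compress(without_agent)),
    }


if __name__ == "__main__":
    for name, size in size_report().items():
        print(f"{name}: {size} bytes")
```

The exact ratios depend on the data and the algorithm, so treat the output as a qualitative demonstration rather than a prediction of vmagent's bandwidth.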

[Figure: flame graph before label reduction]

[Figure: flame graph after label reduction]

Current production scale load test

[Figure: current zones' resource utilization, active time series, targets, and ingestion rate]

Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 1000

Resource Utilization:

Prometheus:

  • CPU: 1.70
  • Memory: 13.5 Gi

VictoriaMetrics:

  • CPU: 1.40
  • Memory: 4.27 Gi

[Figures: CPU usage, memory usage, Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 1000

Resource Utilization:

Prometheus:

  • CPU: 2.51
  • Memory: 18.7 Gi

VictoriaMetrics:

  • CPU: 2.30
  • Memory: 5.40 Gi

[Figures: CPU usage, memory usage, Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 2000

Resource Utilization:

Prometheus:

  • CPU: 1.89
  • Memory: 19.7 Gi

VictoriaMetrics:

  • CPU: 1.69
  • Memory: 5.27 Gi

[Figures: CPU usage, memory usage, Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 2000

Resource Utilization:

Prometheus:

  • CPU: 2.23
  • Memory: 17.9 Gi

VictoriaMetrics:

  • CPU: 2.05
  • Memory: 4.87 Gi

[Figures: CPU usage, memory usage, Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 180k, Target: 5000

Resource Utilization:

Prometheus:

  • CPU: 1.91
  • Memory: 20.7 Gi

VictoriaMetrics:

  • CPU: 1.30
  • Memory: 4.92 Gi

[Figures: CPU usage, memory usage, Grafana reference]

Active Time Series : 10 Million, Ingestion Rate: 250k, Target: 5000

Resource Utilization:

Prometheus:

  • CPU: 2.27
  • Memory: 25.5 Gi

VictoriaMetrics:

  • CPU: 2.06
  • Memory: 5.09 Gi

[Figures: CPU usage, memory usage, Grafana reference]

Load test conclusion

  • Across the production-scale load tests, VictoriaMetrics outperformed Prometheus. Its CPU utilization was lower in every scenario, by roughly 8% to 32%. Its memory usage was roughly three to five times lower, ranging from about 3.2x (4.27 Gi versus 13.5 Gi) in the lightest scenario to about 5x in the heaviest.
  • Prometheus's memory utilization generally rose as the load increased.
  • VictoriaMetrics's memory utilization stayed within a narrow band (roughly 4.3 to 5.4 Gi) across load conditions, which suggests that memory is unlikely to become its limiting factor as metric volume grows at this scale.
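The ratios behind these conclusions can be recomputed from the production-scale figures reported above. The following is a minimal sketch using the numbers from this post; the helper names are illustrative.

```python
# (targets, ingestion rate, Prometheus CPU, Prometheus memory, VM CPU, VM memory)
# Values as reported in the production-scale results of this post.
PRODUCTION = [
    (1000, "180k", 1.70, 13.5, 1.40, 4.27),
    (1000, "250k", 2.51, 18.7, 2.30, 5.40),
    (2000, "180k", 1.89, 19.7, 1.69, 5.27),
    (2000, "250k", 2.23, 17.9, 2.05, 4.87),
    (5000, "180k", 1.91, 20.7, 1.30, 4.92),
    (5000, "250k", 2.27, 25.5, 2.06, 5.09),
]


def memory_ratios(rows):
    """Prometheus memory divided by VictoriaMetrics memory, per scenario."""
    return [prom_mem / vm_mem for _, _, _, prom_mem, _, vm_mem in rows]


def cpu_savings_percent(rows):
    """How much less CPU VictoriaMetrics used than Prometheus, in percent, per scenario."""
    return [100 * (prom_cpu - vm_cpu) / prom_cpu for _, _, prom_cpu, _, vm_cpu, _ in rows]


if __name__ == "__main__":
    rows = zip(PRODUCTION, memory_ratios(PRODUCTION), cpu_savings_percent(PRODUCTION))
    for (targets, rate, *_), mem_ratio, cpu_saving in rows:
        print(
            f"{targets} targets, {rate}: memory {mem_ratio:.2f}x lower, "
            f"CPU {cpu_saving:.1f}% lower"
        )
```

The memory advantage grows with load because Prometheus's memory climbs while VictoriaMetrics's stays nearly flat.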

Part 1 — https://medium.com/@zetablogs/supercharge-your-monitoring-migrate-from-prometheus-to-victoriametrics-for-scalability-and-speed-e1e9df786145

Part 2 — https://medium.com/@zetablogs/part-2-supercharge-your-monitoring-migrate-from-prometheus-to-victoriametrics-for-optimised-cpu-9a90c015ccba

Thanks

Authors:

Vijesh Nair → linkedin.com/in/vijesh-nair-b651a2a1

Ritesh Sanjay → linkedin.com/in/riteshsanjaymahajan

Reviewers:

Shashidhar Soppin → linkedin.com/in/shashidhar-soppin-8264282

Praveen Irrinki → linkedin.com/in/pirrinki

Shaik Idris → linkedin.com/in/shaikidris
