Trend Monitoring Suites

I hate Cacti. Sorry guys, it does have its good points: if you want to monitor just switch/router interface stats via SNMP, and that’s *it*, it’s very easy. But when you want to plot technical data that you source through something other than SNMP, working through the Cacti template system is like wading through tar.

Step in some newer projects. Ganglia was really interesting; a colleague found it thanks to a presentation from Flickr. I really liked how easy it was to configure. Set the agent up on a bunch of PCs, run the web interface on one, and bang, graphs. It’s that easy. We installed the agent on a couple of trial PCs and we had graphs. We then wrote some scripts to measure other metrics from custom applications. If we could write a script that produced a number, then we could graph that metric in Ganglia, just by passing the number to the bundled ‘gmetric’ application. Brilliant. But what if we can’t run an agent on the device that we want to monitor, such as a switch? There has been talk on the Ganglia developers list since 2004 about incorporating SNMP support, but no real evidence of traction. So it won’t work for me.
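To give a feel for the ‘gmetric’ approach, here is a minimal sketch in Python. It assumes gmetric is installed and on the PATH of a machine already running the Ganglia agent. The helper names and the `app_queue_depth` metric are made up for illustration; only the `--name`, `--value`, `--type` and `--units` options belong to gmetric itself.

```python
import subprocess


def gmetric_command(name, value, metric_type="float", units=""):
    """Build the argv list for one gmetric call (nothing is executed here)."""
    cmd = [
        "gmetric",
        "--name=" + name,
        "--value=" + str(value),
        "--type=" + metric_type,
    ]
    if units:
        cmd.append("--units=" + units)
    return cmd


def send_metric(name, value, metric_type="float", units=""):
    """Hand the number to gmetric; raises if gmetric is missing or fails."""
    subprocess.run(gmetric_command(name, value, metric_type, units), check=True)


if __name__ == "__main__":
    # Hypothetical metric: queue depth reported by a custom application.
    print(" ".join(gmetric_command("app_queue_depth", 42, "uint32", "jobs")))
```

Run something like `send_metric(...)` from cron or a small loop and the metric should show up alongside the built-in ones. Building the command separately from running it makes the script easy to check on a box without Ganglia.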

So let me offer a golden rule of performance monitoring. If you are going to write a performance monitoring suite, make sure it supports SNMP on day one. If you are writing a monitoring layer for your application, make sure it uses SNMP.

In steps Zabbix. The best of both worlds. Here, there’s an agent again, so if you want to monitor the health of a server over time, you configure the agent and it sends figures back to a monitoring box. Figures appear. There’s also an SNMP interface, so you point it at a router, tell it the community, and more figures appear.

No graphs yet, but that’s because you configure them yourself, and it’s really easy. Want to aggregate all of the exit ports on your router? Make a graph using those metrics! If you can imagine it, you can graph it with Zabbix. Some of it is quite clunky, e.g. configuring the SNMP community for each device is a bit slow, but the back end is just MySQL, so you can change the community for a device with an “update items set snmp_community='xx' where hostid='yy';” instead of using the clunky interface. Also, to measure interface stats, you must change the ifInOctets and ifOutOctets items to store the delta as ‘speed over time’ rather than just accepting the counter value, otherwise your graphs will show nothing more than the port counting more data as time goes by.
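The counter-versus-rate point is easier to see in code. The sketch below is not how Zabbix does it internally, just the arithmetic behind a ‘speed over time’ setting: take two raw counter samples and divide the difference by the elapsed time. It assumes the 32-bit ifInOctets/ifOutOctets counters and at most one wrap between samples; a device reboot that resets the counter would need separate handling. The function names are made up for illustration.

```python
COUNTER32_MAX = 2 ** 32  # ifInOctets/ifOutOctets are 32-bit SNMP counters


def octets_per_second(prev_value, prev_time, curr_value, curr_time,
                      modulus=COUNTER32_MAX):
    """Turn two raw counter samples into an average rate in octets/second."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # A counter that wrapped past its maximum looks like a negative delta;
    # taking the difference modulo the counter size recovers the true delta
    # (assuming at most one wrap between the two samples).
    delta = (curr_value - prev_value) % modulus
    return delta / elapsed


def aggregate_rate(port_rates):
    """Sum per-port rates, e.g. for all exit ports on one router."""
    return sum(port_rates)
```

Plot the raw counter and you get a line that only ever climbs; plot the output of `octets_per_second` and you get actual traffic. Summing the per-port rates gives the kind of aggregate graph described above.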

I strongly recommend Zabbix to anyone who finds Cacti arduous to configure.

2 Responses to “Trend Monitoring Suites”

  1. Comment from Jean-Marc Liotier:

    I am also a Cacti refugee, but my needs were much simpler than the ones described here so I chose Munin instead. Munin is extremely simple to deploy, and writing custom scripts is rather easy. It uses an agent separate from the graphing component, it supports SNMP and no MySQL is needed. Take a look at it if you don’t need a system as full-featured as Zabbix.

  2. Comment from tomahawk:

    I agree about Munin, but would also recommend taking a look at collectd (http://collectd.org). It’s kind of a cross between Munin and Ganglia (no need for central server - data can be pushed to a multicast group or single host if you wish).

    It’s designed to act as a collector - it can gather stats every 10 seconds which can make for some finely detailed graphs (very useful when you need to investigate small outages/blips).
