After rolling out openvpn across our infrastructure, I need to know how it is behaving and whether I can trust it for production workflows, logging, etc.
inventory_processes_perf += [ ( ['!ping'], ALL_HOSTS, "openvpn", "/usr/sbin/openvpn", GRAB_USER, 1, 1, 1, 1), ]
That one line of configuration, appended to inventory_processes_perf, gives us:
- Automatic inventory of all openvpn processes on all hosts.
- Graphs of the memory the openvpn client is consuming.
- Notification if openvpn appears more than once.
- Notification if openvpn crashes.
- Notification if the owner of any openvpn process changes.
That's about as DRY as it gets. Nothing changes on the clients, no additional load, no new connections.
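To make the nine fields of that tuple easier to read, here is a minimal Python sketch that labels them and shows how the four trailing numbers could turn a process count into a state. The field names, and my reading of the numbers as warnmin, okmin, okmax, warnmax, are assumptions about the rule layout rather than something check_mk prints for you. ALL_HOSTS and GRAB_USER are given placeholder values here, and ProcessRule and state_for_count are names I made up so the snippet runs outside of check_mk.

```python
from collections import namedtuple

# Placeholders for the constants check_mk provides inside main.mk.
# The values are assumed; they only exist so this sketch runs on its own.
ALL_HOSTS = "ALL_HOSTS"
GRAB_USER = "GRAB_USER"

# Hypothetical field names for the nine positions of the rule.
ProcessRule = namedtuple(
    "ProcessRule",
    "tags hosts service pattern user warnmin okmin okmax warnmax",
)

rule = ProcessRule(["!ping"], ALL_HOSTS, "openvpn", "/usr/sbin/openvpn",
                   GRAB_USER, 1, 1, 1, 1)


def state_for_count(rule, count):
    """Map a process count to OK/WARN/CRIT using the four levels."""
    if count < rule.warnmin or count > rule.warnmax:
        return "CRIT"
    if count < rule.okmin or count > rule.okmax:
        return "WARN"
    return "OK"


for n in (0, 1, 2):
    print(n, state_for_count(rule, n))
```

With all four levels set to 1, anything other than exactly one openvpn process is critical, which would cover both the "crashed" and the "appears more than once" notifications.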
How does check_mk do this? The check_mk "agent" is a single bash script.
One of the things it does is dump the process list, amongst which is this:
$ check_mk_agent | grep ope[n]vpn
(root,4960,2228,0.0) /usr/sbin/openvpn --writepid /var/run/openvpn.client.pid --daemon ovpn-client --status /var/run/openvpn.client.status 10 --cd /etc/openvpn --config /etc/openvpn/client.conf --script-security 2
Pro tip: the brackets in the grep pattern stop grep from matching its own entry in the process list, so you don't have to grep out the grep afterwards.
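If the bracket trick looks like magic, this small Python sketch shows why it works. It uses Python's re module as a stand-in for grep, and the two command lines are shortened examples: the pattern matches the text "openvpn", but grep's own command line contains the brackets literally, so it does not match itself.

```python
import re

# Same pattern as in the shell one-liner; [n] is a character class
# that matches the single letter "n".
pattern = re.compile(r"ope[n]vpn")

# The daemon's command line contains the plain text "openvpn".
daemon = "/usr/sbin/openvpn --daemon ovpn-client"
# grep's own command line contains "ope[n]vpn", brackets included.
grep_itself = "grep ope[n]vpn"

print(bool(pattern.search(daemon)))
print(bool(pattern.search(grep_itself)))
```

The first search finds a match and the second does not, which is exactly the filtering you want.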
Each entry gives the owner, vsize, rsize, cpu%, and full command line of a process. This seems obvious once you start slicing and dicing it, but no other distributed monitoring system has caught on to it.
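Slicing and dicing that output is not much work. Here is a rough Python sketch that splits one line into its fields, assuming every entry has the same "(owner,vsize,rsize,cpu) command" shape as the sample above. The regular expression and the parse_ps_line name are mine, not check_mk's.

```python
import re

# Shape assumed from the sample line: "(owner,vsize,rsize,cpu) command".
PS_LINE = re.compile(
    r"^\((?P<owner>[^,]+),(?P<vsize>\d+),(?P<rsize>\d+),(?P<cpu>[\d.]+)\)"
    r"\s+(?P<cmd>.+)$"
)


def parse_ps_line(line):
    """Return a dict of fields, or None if the line has another shape."""
    m = PS_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "owner": m.group("owner"),
        "vsize": int(m.group("vsize")),
        "rsize": int(m.group("rsize")),
        "cpu": float(m.group("cpu")),
        "cmd": m.group("cmd"),
    }


sample = ("(root,4960,2228,0.0) /usr/sbin/openvpn "
          "--writepid /var/run/openvpn.client.pid --daemon ovpn-client")
proc = parse_ps_line(sample)
print(proc["owner"], proc["vsize"], proc["rsize"], proc["cpu"])
print(proc["cmd"])
```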
Check_mk (combined with icinga/nagios) processes this list centrally to give you all the aforementioned bullet points. It can now notice if an extra sshd server appears, or if apache2 starts forking out of control, or if the owner of a php-fpm process is suddenly root. It passes along load and RAM usage as performance counters to nagios, giving you trends and graphs.
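To give a feel for the central side, here is a simplified Python sketch of that kind of evaluation: it picks the processes that belong to one service, flags any owner other than the one seen at inventory time, and adds up memory and CPU as performance counters. This is my own illustration, not check_mk's code. The check_process name, the prefix matching on the command line, and the sample numbers are all made up.

```python
def check_process(procs, pattern, expected_owner):
    """procs: (owner, vsize, rsize, cpu, cmd) tuples from one host."""
    # Simplification: a process belongs to the service if its command
    # line starts with the pattern.
    matching = [p for p in procs if p[4].startswith(pattern)]
    # Owners that differ from the one recorded at inventory time.
    strangers = sorted({p[0] for p in matching if p[0] != expected_owner})
    # Totals that could be handed on as performance counters.
    perfdata = {
        "count": len(matching),
        "vsize": sum(p[1] for p in matching),
        "rsize": sum(p[2] for p in matching),
        "cpu": sum(p[3] for p in matching),
    }
    state = "CRIT" if strangers else "OK"
    return state, strangers, perfdata


# Invented sample data: two php-fpm workers, then a third owned by root.
before = [
    ("www-data", 210000, 15000, 0.5, "php-fpm: pool www"),
    ("www-data", 210000, 14000, 0.25, "php-fpm: pool www"),
    ("root", 6500, 1100, 0.0, "/usr/sbin/sshd"),
]
after = before + [("root", 210000, 16000, 1.0, "php-fpm: pool www")]

print(check_process(before, "php-fpm: pool", "www-data"))
print(check_process(after, "php-fpm: pool", "www-data"))
```

The second call reports the root-owned worker, and the totals in perfdata are the sort of numbers that end up as graphs.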
All this just from the process list returned by /usr/bin/check_mk_agent.