Chef Monitoring
Sander Botman
Chef Enterprise is the central place for all our system configuration. It is, however, hard to see all the changes that are being made, so Chef is sometimes perceived as a “black box” controlling all configurations within the enterprise. If configured right, the amount of control is awesome and saves you a lot of time. But with 150+ engineers changing cookbooks, roles, node properties or environment settings, you might want to know who is changing what! Where the hell did this attribute change? Who updated this role information? Unfortunately, there is only one source you can use to retrieve this information: the nginx log on your front-end servers. Splunk, Graylog, Kibana or another tool could help you see that a role or cookbook has been touched, but you still miss the actual change itself.
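To make the idea of mining the nginx log concrete, here is a minimal sketch of how a modifying request could be picked out of a log line. It is not the chef-monitor code. The sample line, the `parse_change` helper and the assumption that object URLs look like `/organizations/<org>/<type>/<name>` are illustrative; your log format may differ.

```python
import re

# Assumption: the request appears in the log line as "METHOD /path HTTP/x.y",
# as it does in nginx's common access-log formats.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')

# Assumption: object URLs follow /organizations/<org>/<type>[/<name>].
OBJECT_RE = re.compile(
    r'^/organizations/(?P<org>[^/]+)/'
    r'(?P<type>roles|cookbooks|nodes|environments|data)'
    r'(?:/(?P<name>[^?]+))?(?:\?.*)?$'
)

# Requests with these methods create, update or delete an object.
CHANGE_METHODS = {"PUT", "POST", "DELETE"}


def parse_change(line):
    """Return a dict describing a change, or None for reads and unrelated lines."""
    request = REQUEST_RE.search(line)
    if request is None or request.group("method") not in CHANGE_METHODS:
        return None
    obj = OBJECT_RE.match(request.group("path"))
    if obj is None:
        return None
    return {
        "method": request.group("method"),
        "org": obj.group("org"),
        "type": obj.group("type"),
        # name is None when the request targets the collection URL itself
        "name": obj.group("name"),
    }


# Hypothetical log line, for illustration only.
sample = ('10.0.0.5 - - [01/Jan/2014:12:00:00 +0000] '
          '"PUT /organizations/acme/roles/webserver HTTP/1.1" 200 512')
print(parse_change(sample))
```

Note that the log line only tells you that an object was touched and by which request. The content of the change is not in the log, which is exactly the gap described above.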
Knowing that tracing changes in Chef is not trivial, we decided to create a process that eliminates the change-tracking problem. The result is Chef-monitor.
The monitor is built out of two components. The first is “chef-logmon”, a process running on every front-end server that scans the nginx log and sends this information to the existing RabbitMQ service. This is the RabbitMQ service that lives on the back-end and comes with Chef Enterprise by default. The second is “chef-worker”, which picks up all the information from RabbitMQ, downloads the object from Chef Enterprise and stores it on disk. This process can run on a separate “monitoring server”; it only needs access to Chef with an account that has enough permissions to read every object.
The “chef-worker” is combined with an SVN or Git repository and sends a diff on every commit, showing you exactly what has been changed on the object. The repository can also serve as a backup of your entire Chef environment: with just a few lines of code you can import everything into a new environment.
Implemented in a typical highly available Chef Enterprise environment, the architecture looks as follows: 2 front-end web servers (in blue), 2 back-end database servers (in red), plus our monitor server (in green).
With all the required components in place, as described above, the process looks like this:
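In rough terms, that flow can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: an in-memory `queue.Queue` stands in for RabbitMQ, a dictionary stands in for the Chef server, and log lines are reduced to a hypothetical `METHOD path` form. The function names `logmon` and `worker` merely mirror the two components and are not the gem's API.

```python
import queue


def logmon(log_lines, mq):
    """Stand-in for chef-logmon: publish one message per modifying request."""
    for line in log_lines:
        method, path = line.split()[:2]
        if method in ("PUT", "POST", "DELETE"):
            mq.put({"method": method, "path": path})


def worker(mq, fetch, store):
    """Stand-in for chef-worker: fetch each changed object and keep a copy."""
    while True:
        try:
            msg = mq.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        if msg["method"] == "DELETE":
            # The object is gone on the server, so drop the local copy too.
            store.pop(msg["path"], None)
        else:
            # Download the current version of the object and store it.
            store[msg["path"]] = fetch(msg["path"])


# Hypothetical, simplified inputs.
server = {"/roles/web": {"run_list": ["recipe[nginx]"]}}
lines = ["GET /roles/web", "PUT /roles/web", "DELETE /nodes/old"]

mq = queue.Queue()
store = {"/nodes/old": {}}  # plays the role of the on-disk repository
logmon(lines, mq)
worker(mq, server.get, store)
print(store)
```

The point of the split is that the front-end side only has to notice that something changed, while the worker does the comparatively slow download and storage somewhere else.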
Scanning the log and downloading the object can take a couple of seconds, so it can happen that two people change the same object at the exact same moment. With such a simultaneous change you may miss one of the changes. This is, however, apparent in the diff, which makes troubleshooting a lot easier. The result of a diff looks somewhat like this:
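As a rough illustration of how such a diff can be produced (not the actual chef-monitor output, which comes from the SVN or Git repository), the sketch below compares two versions of a role with Python's standard `difflib`. The role content and the `object_diff` helper are hypothetical.

```python
import difflib
import json


def object_diff(old, new, name):
    """Return a unified diff between two versions of a JSON-like object."""
    # Sorting keys keeps the output stable, so only real changes show up.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
            lineterm="",
        )
    )


# Hypothetical role, before and after an attribute change.
before = {
    "name": "webserver",
    "run_list": ["recipe[nginx]"],
    "default_attributes": {"nginx": {"worker_processes": 2}},
}
after = {
    "name": "webserver",
    "run_list": ["recipe[nginx]"],
    "default_attributes": {"nginx": {"worker_processes": 4}},
}

print(object_diff(before, after, "roles/webserver.json"))
```

If two edits land between two downloads, a diff like this simply contains both sets of changed lines, which is why a combined change still stands out.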
If you want to know more about this, our cookbook and chef-monitor gem are open source. Use the links below to read more or to implement this yourself. Contributions of any kind are highly appreciated.