tick-charts Update
This is an update to an earlier blog post I wrote on tick-charts, the project to make the InfluxData stack as easy as possible to run on Kubernetes. While developing the project I've learned that some of the original implementation decisions were not the right way to go! Chronograf has also matured significantly since then, which has required additions to the chart to support a few different flavors of OAuth: Heroku, GitHub, and Google.
Telegraf Changes
The primary change from the first version is the re-architecture of the telegraf chart. Initially I thought that a monolithic chart for telegraf would be the best way to go. That chart put host-level monitoring as well as polling in one values.yaml file. This becomes complicated because telegraf is primarily configuration driven, and holding both configurations in one file quickly becomes too complex. Another downside of that chart was that spinning up a single telegraf instance required deleting a bunch of configuration for the daemonset.
To reduce this complexity I've taken a two-pronged approach. First, I split the telegraf chart in two, providing a much cleaner interface for users. Next, I reduced the amount of configuration required for the daemonset (telegraf-ds). The defaults provide the basics for host-level monitoring in Kubernetes: Kubelet and Docker polling plus CPU, memory, disk, system load, and network statistics. All the user is required to set is config.outputs.influxdb.url. If you don't want to host your own InfluxDB, you can easily spin up an InfluxCloud instance to hold the data.
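As a sketch, a minimal values file for the daemonset chart could look like the following; the InfluxDB URL is a placeholder and should point at your own instance:

```yaml
# values.yaml -- minimal configuration for the telegraf-ds chart.
# Everything else (Kubelet/Docker polling, CPU, memory, disk, system
# load, and network statistics) is covered by the chart defaults.
config:
  outputs:
    influxdb:
      # Placeholder; replace with your InfluxDB or InfluxCloud endpoint.
      url: "http://influxdb.tick.svc.cluster.local:8086"
```

Installing would then be something like `helm install -f values.yaml telegraf-ds` (the exact chart reference depends on how you've fetched the repository).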
The chart for spinning up single telegraf instances is now called telegraf-s. Currently it is implemented in much the same way as the old telegraf chart: using some custom Go templates to generate the configuration. This is error prone and becomes difficult for plugins that require a substantial amount of configuration, such as snmp or jolokia. To eliminate that complexity and the burden of maintaining custom code, I've added a toToml template function to Helm. Once the 2.3 release of Helm is available, the telegraf-s chart will be modified to use it. This will make creating telegraf instances in your cluster extremely easy. Using a tool like remarshal to convert the TOML output by telegraf into YAML, the workflow for generating a new telegraf instance to monitor a piece of your cluster will look as follows:
```shell
# On mac at least...
$ telegraf -sample-config -input-filter nginx:cloudwatch -output-filter kafka_producer | toml2yaml | pbcopy
```
The resulting blob of yaml can be added directly to your values.yaml file under the config section and edited there.
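Once toToml is available, the chart template that renders telegraf.conf can collapse to roughly the following. This is a sketch; the template filename, the "fullname" helper, and the values layout are assumptions based on common chart conventions:

```yaml
# templates/configmap.yaml (Helm template) -- sketch of rendering the
# entire config section back to TOML with the toToml function that
# lands in Helm 2.3.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "fullname" . }}
data:
  telegraf.conf: |-
{{ toToml .Values.config | indent 4 }}
```

With this in place, whatever YAML you paste under config round-trips back to the TOML that telegraf actually reads, with no custom template code to maintain per plugin.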
Next Steps
I plan to continue working on improving the production-readiness, examples, and user experience for deploying the TICK stack. The following items are on my # TODO: for this project:
- Production Readiness
  - Make InfluxDB deploy with basic authentication enabled
  - Add backup/restore job examples to back up InfluxDB to S3 or another object store
  - Create a job to dynamically reload configuration for telegraf on upgrade
- User Experience
  - Create a top-level chart so that the whole deployment can be managed from one chart
  - Reduce the time from zero to dashboards to as little as possible
  - Address any pain points that begin to show with increasing usage
- Examples
  - Monitoring Prometheus endpoints with Telegraf
  - Using queues in Kubernetes with Telegraf
  - Monitoring {{ .telegraf_plugin.name }} (suggestions welcome!)
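For the Prometheus-endpoints example above, the telegraf side would be roughly this snippet under the chart's config section. The scrape URL is a placeholder, and the exact shape of the config block depends on the chart's values schema:

```yaml
# Sketch: scrape an application's Prometheus metrics endpoint
# with telegraf's prometheus input plugin.
config:
  inputs:
    prometheus:
      urls:
        - "http://my-app.default.svc.cluster.local:9090/metrics"
```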
As suggested above, I would also love input on what features the community would like to see from this integration. Please post your questions, concerns, suggestions, or other comments on this post over on our Community site. I can't wait to hear from you!