29 March 2020
I run all of my compute workloads on kubernetes despite it being overkill for the size of projects that I run. In any case, I used to use Google's managed Google Kubernetes Engine service, but they recently announced that they would be introducing a cluster management fee that amounts to about $73 USD per month. One of the benefits of kubernetes is that it's cloud-agnostic, so I immediately started looking for alternatives. I ultimately settled on Digital Ocean's managed offering, but one of the things that I lost was the ability to search logs using Stackdriver. Of course one can always use kubectl logs, but that's not convenient when dealing with multiple replicas.
I also decided to try out the Loggly service as they have a very generous free tier. However, my applications are somewhat noisy, logging all of the kubernetes health check events and filling up my quota. But I also didn't want to lose these events completely; I just didn't need to see them unless I was looking for them. So I settled upon the following architecture:

- A filtered subset of logs, excluding the noisy health check events, goes to Loggly where it can be searched.
- All logs go to AWS CloudWatch.
- All logs also go to Google Cloud Storage. These aren't searchable without downloading them and using grep or similar. But this is the cheapest storage option and allows us to store logs for much longer in the event that we do need them for some reason.
I realize that these are somewhat unique requirements, and most people who actually need kubernetes are at organizations of a size where they can probably just pay for a single logging service and be done with it. But here's how I solved it for myself.
Most people who need logging to an external service in kubernetes use fluentd. They also provide handy prebuilt images with the most popular plugins: https://github.com/fluent/fluentd-kubernetes-daemonset. The only issue is that the images they distribute only include a single plugin at a time. So what I have done is create a new image based on their image that includes the configuration for multiple plugins at once. It's available on GitHub and I publish the resulting images to Docker Hub. This image is based on the upstream image and just includes the fluentd plugins for Loggly, AWS CloudWatch, and GCS all-in-one.
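As a sketch, such a derived image can be as little as a Dockerfile that installs the extra plugin gems on top of an upstream image. The base image tag here is an assumption (pick whichever upstream variant you start from), and the gem names are those of the commonly used plugins, not necessarily the exact ones in my repository:

```
# Base image tag is an assumption; use whichever upstream variant suits you
FROM fluent/fluentd-kubernetes-daemonset:v1.9-debian-cloudwatch-1

USER root
# Install the additional output plugins alongside the one baked into the base image
RUN fluent-gem install fluent-plugin-loggly fluent-plugin-gcs
USER fluent
```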
By default the configuration in this image ships all of the logs to all three destinations; however, per the first bullet point above, we only want to ship a subset of logs to Loggly. The way we manage this on the actual cluster is by overwriting the fluent.conf with another ConfigMap. I've cut out the actual plugin configuration here to just show the important parts:
Using the @copy directive we send all of the logs to both CloudWatch and GCS, and then a relabel store lets us reprocess all of the log messages again under the @LOGGLYSMALL label. Inside that label we use the grep filter to exclude logs that we don't want sent to Loggly, such as all of the kubernetes health check requests (pattern /GET \/info\/ping/).
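Reconstructing the shape of that configuration (plugin bodies elided; the @type values for the CloudWatch, GCS, and Loggly outputs are assumptions based on the common plugins, not necessarily the exact ones in my image), it looks roughly like this:

```
<match **>
  @type copy
  <store>
    @type cloudwatch_logs
    # ... CloudWatch configuration ...
  </store>
  <store>
    @type gcs
    # ... GCS configuration ...
  </store>
  <store>
    # Re-enter the pipeline under a new label for Loggly-specific filtering
    @type relabel
    @label @LOGGLYSMALL
  </store>
</match>

<label @LOGGLYSMALL>
  <filter **>
    @type grep
    <exclude>
      key log
      pattern /GET \/info\/ping/
    </exclude>
  </filter>
  <match **>
    @type loggly
    # ... Loggly configuration ...
  </match>
</label>
```

The key design point is that copy fans the stream out to every store, while the relabel store gives the Loggly pipeline its own processing stage where the grep filter can drop the health check noise without affecting the other destinations.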
Then we need to update the DaemonSet volume mount:
- mountPath: /fluentd/etc/fluent.conf
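In fuller context, the mount and the volume backing it might look like the following; the volume and ConfigMap names here are assumptions:

```
# In the DaemonSet's pod spec; "fluentd-config" is an assumed name
volumeMounts:
  - name: fluentd-config
    mountPath: /fluentd/etc/fluent.conf
    subPath: fluent.conf
volumes:
  - name: fluentd-config
    configMap:
      name: fluentd-config
```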
An important note here is that because we're using a subPath, the running containers do not get updated when this configuration changes. We need to manually restart the containers with kubectl -n kube-system rollout restart daemonset fluentd.