FluentD is a data collection platform and a popular choice on Kubernetes for aggregating logs. Aggregating logs is all well and good, but to properly manage logs you really want to ship them to a log management platform, ideally one which provides some degree of visualisation and insight; unless you really love working with raw logs, it's nice to be able to view them and spot patterns in a manner that's a little more helpful. In AWS (using Elastic Kubernetes Service), the native way to work with logs is AWS' own CloudWatch service. In this post we'll be looking at how to deploy FluentD to EKS and integrate it properly with CloudWatch.
Isn’t this already documented?
Well, for the most part it's pretty well documented in the AWS docs here, and there's no point really retreading the same ground, but a few things are oddly omitted from the documentation that are worth being aware of.
The high-level steps we need to follow are:
- Create a Kubernetes Namespace named amazon-cloudwatch
- Create a Kubernetes ServiceAccount named fluentd
- Create a Kubernetes ConfigMap named cluster-info
- Install FluentD by deploying the FluentD DaemonSet to your Cluster
The YAML manifests to deploy the Namespace, ServiceAccount and DaemonSet are hosted on GitHub, so the steps above can be completed quickly by running the below commands:
#--Create Namespace
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml

#--Create ConfigMap
kubectl create configmap cluster-info --from-literal=cluster.name=tinfoilcluster --from-literal=logs.region=eu-west-2 -n amazon-cloudwatch

#--Deploy DaemonSet and ServiceAccount
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluentd/fluentd.yaml
NOTE: The above examples use a cluster named tinfoilcluster in region eu-west-2 and will capture all logs. The previously linked full documentation covers the details of this deployment and breaks down how to tune FluentD.
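For reference, the cluster-info ConfigMap that the kubectl create configmap command above generates is equivalent to applying a manifest along these lines (using the same example cluster name and region):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: amazon-cloudwatch
data:
  cluster.name: tinfoilcluster
  logs.region: eu-west-2
```

The FluentD DaemonSet reads these two values to determine which CloudWatch Log Groups to write to and in which region.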
This will result in a number of pods being created within the amazon-cloudwatch namespace, which can be verified by running:
#--Verify running FluentD pods
kubectl get pods -n amazon-cloudwatch
So We Still Don’t Have Logs…
Within the CloudWatch Log Groups tab we're hoping to see that (by default) three new log groups have been created:
The single Log Group above is the Control Plane logs only, which was set up when creating the Cluster; our FluentD logs are nowhere to be seen. A critical part of the setup is that the IAM role used by the Node Group(s) must have the ability to write to CloudWatch. This role is most quickly visible from the EKS console by viewing the Node Group(s):
By default, an EC2 Node Group IAM Role needs only the below Policies assigned in order to function:
None of these Policies, however, grants us the rights to write to CloudWatch, so we'll need to attach another. Within the IAM console, we can assign the minimum possible permissions to our Role:
Clicking the Attach Policies button will allow us to search the built-in Managed Policies or roll one of our own; however, one ideal for our needs already exists, named CloudWatchAgentServerPolicy:
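If you'd rather attach the policy outside the console, a minimal Terraform sketch might look like the below. The role name tinfoil-node-group-role is a hypothetical example; substitute the IAM role your Node Group actually uses:

```hcl
# Attach the AWS-managed CloudWatchAgentServerPolicy to the Node Group's
# IAM role (hypothetical name) so FluentD on the nodes can write to CloudWatch
resource "aws_iam_role_policy_attachment" "node_cloudwatch" {
  role       = "tinfoil-node-group-role"
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}
```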
So Now We Have Logs
So our Log Groups have now sprung into life, and within them are the Log Streams from our various Kubernetes entities:
All good, but what's with that retention? One month might be a little low for your needs, and I've found that if you try to change the retention it will just flip straight back; if you try to delete and recreate the Log Group, it will just flip straight back too. Isn't that nice.
What does work, however, is manually creating those Log Groups yourself (or using your Infrastructure as Code tooling to create them, presumably the same one you're using to deploy your Cluster) and setting the retention at creation time. From that point on you can alter the retention without issue, and Log Streams will continue to be fed into the Log Groups, as long as you name them correctly.
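By way of example, a Terraform sketch for pre-creating the Log Groups with retention set up front might look like this. It assumes the tinfoilcluster name from earlier and the default Container Insights Log Group naming; adjust the retention period to taste:

```hcl
# Pre-create the Container Insights Log Groups with retention set at
# creation time; FluentD will feed Log Streams into them by name
resource "aws_cloudwatch_log_group" "container_insights" {
  for_each = toset(["application", "host", "dataplane"])

  name              = "/aws/containerinsights/tinfoilcluster/${each.key}"
  retention_in_days = 90
}
```

As long as the names match what FluentD expects, the Log Streams land in these pre-created groups and the retention you set here sticks.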