7. Monitoring your cluster and applications
With okctl, we create an observability stack in the cluster that provides metrics, traces and logs from the Kubernetes cluster, relevant AWS resources and the applications running in the cluster. It is built on the following backends:
- AWS CloudWatch for AWS resources, including EKS control plane logs
- Loki for logs
- Prometheus for metrics
- Tempo for traces
These backends provide us with the basic building blocks we need to build a fully functional observability stack.
We love declarative configuration; being able to check everything into git is the best thing to happen since sliced bread. We use declarative configuration to add dashboards, alerts, and scrapers to Grafana and Prometheus. By using ConfigMaps with labels to add dashboards to Grafana, we can easily track these resources in our version control system. The same goes for the ServiceMonitor type for Prometheus.
We will eventually use the AlertManager for setting up alerts. Feel free to do so now, but note that we haven't started looking at this in depth yet.
NB: We currently only support Prometheus for metrics. As such, you need to ensure that your application has a metrics endpoint that can be scraped by the ServiceMonitor you set up.
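As a rough illustration of what such a metrics endpoint can look like, here is a minimal sketch using only the Python standard library. It serves a single counter in the Prometheus text exposition format on /metrics. The metric name example_app_requests_total, the port 8080 and the helper names are assumptions made for this example; most real applications would use a Prometheus client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real application tracks its own metrics.
REQUESTS_TOTAL = 0


def render_metrics(requests_total: int) -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP example_app_requests_total Requests handled by the app.\n"
        "# TYPE example_app_requests_total counter\n"
        f"example_app_requests_total {requests_total}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        global REQUESTS_TOTAL
        if self.path == "/metrics":
            # Scrape requests are not counted as application traffic.
            body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            REQUESTS_TOTAL += 1
            body = b"hello\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the server; the port number is an assumption for this sketch."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the server. For the endpoint to be scraped, the Service port that a ServiceMonitor refers to would need to point at this container port.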
The full list of available CustomResourceDefinitions provides a good overview of the capabilities provided by the Prometheus Operator.
For setting up monitoring of your application, we recommend reading this guide. The most relevant part is the setup of the ServiceMonitor:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-app
      labels:
        team: frontend
    spec:
      selector:
        matchLabels:
          app: example-app
      endpoints:
      - port: web
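A ServiceMonitor finds its targets through Services: spec.selector.matchLabels is compared against the Service's labels, and each endpoint's port refers to a Service port by name. The sketch below mimics that matching in plain Python, to make it easier to see why a monitor might not pick up a Service. The function name service_matches and the example Service are hypothetical, and the sketch ignores namespaces and matchExpressions.

```python
def service_matches(monitor: dict, service: dict) -> bool:
    """Return True if the Service has every label in matchLabels and
    exposes a port whose name is referenced by one of the endpoints."""
    wanted = monitor["spec"]["selector"].get("matchLabels", {})
    labels = service["metadata"].get("labels", {})
    if any(labels.get(key) != value for key, value in wanted.items()):
        return False
    # Only named ports can be referenced by an endpoint's `port` field.
    port_names = {p["name"] for p in service["spec"].get("ports", []) if "name" in p}
    endpoints = monitor["spec"].get("endpoints", [])
    return any(ep.get("port") in port_names for ep in endpoints)


# The ServiceMonitor from the example above, as a Python dict.
monitor = {
    "spec": {
        "selector": {"matchLabels": {"app": "example-app"}},
        "endpoints": [{"port": "web"}],
    }
}

# Hypothetical Service for the same application.
service = {
    "metadata": {"name": "example-app", "labels": {"app": "example-app"}},
    "spec": {"ports": [{"name": "web", "port": 8080}]},
}
```

With these two definitions, service_matches(monitor, service) is true. Renaming the Service port or changing the app label makes it false, which is a common reason for metrics not showing up.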
Once you have set up a ServiceMonitor for your application, you can log in to the Grafana dashboard, open the query explorer with the Prometheus datasource selected, and start searching for the metrics you have defined.
We scrape the logs of all pods in the Kubernetes cluster using Promtail and send these on to Loki. For details on how best to set up your logging for use with Loki, we recommend reading the documentation. Essentially, these logs will be available and can be queried from the query explorer in Grafana when the Loki datasource is selected.
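Pod logs are whatever your containers write to stdout and stderr, so one option is to emit one JSON object per line, which Loki's json parser can split into fields at query time. The following is a minimal sketch with the Python standard library. The field names level, logger and msg, and the helper build_logger, are choices made for this example rather than anything okctl requires.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace handlers to avoid duplicate lines
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as build_logger("example-app").info("order %s created", 42) then writes a single JSON line to stdout, where it is available for collection together with the rest of the pod's output.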
We have set up Grafana for you, along with multiple datasources. Once you have found the metrics, traces or logs you are interested in following, you can set up a dashboard for easy viewing. The easiest way of creating a dashboard is through the UI. Once you are satisfied with the result, you can export it as JSON and add it to Grafana via a declarative config.
This works because we have enabled a sidecar for dashboards. In essence, you define a ConfigMap like so:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: sample-grafana-dashboard
      labels:
        grafana_dashboard: "1"
    data:
      k8s-dashboard.json: |-
        [...]
For more inspiration, you can take a look at some of the default dashboards.
The important part is the grafana_dashboard label. Also, please remember that the name of the dashboard file, e.g. k8s-dashboard.json, needs to be unique. If you use the same name everywhere, the dashboards will overwrite each other.
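To make the export step less error-prone, the wrapping of an exported dashboard into a ConfigMap can be scripted. The sketch below builds the manifest as a Python dict and derives both the ConfigMap name and the data key from a name you pass in, so that each dashboard gets its own file name. The function dashboard_configmap and the "-grafana-dashboard" naming scheme are assumptions for this example.

```python
import json
import re


def dashboard_configmap(name: str, dashboard: dict) -> dict:
    """Wrap an exported Grafana dashboard in a ConfigMap manifest."""
    # Reduce the name to lowercase letters, digits and dashes.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError("name must contain at least one letter or digit")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"{slug}-grafana-dashboard",
            # The label the dashboard sidecar looks for.
            "labels": {"grafana_dashboard": "1"},
        },
        # One key per dashboard; the key is the dashboard's file name.
        "data": {f"{slug}.json": json.dumps(dashboard, indent=2)},
    }
```

Printing json.dumps(dashboard_configmap("checkout", exported), indent=2), where exported is the dashboard JSON loaded as a dict, gives a manifest you can check into git. kubectl accepts JSON manifests as well as YAML, so no conversion is needed.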