Ceph Monitoring

About Ceph Monitoring

The monitoring system in Ceph is based on Grafana, using Prometheus as the datasource and the native ceph-mgr prometheus module as the metric exporter. Prometheus node_exporter is used for node metrics (CPU, memory, etc.).

For long-term metric storage, Thanos is used to store metrics in S3 (Meyrin).

Access the monitoring system

  • All Ceph monitoring dashboards are available in monit-grafana (Prometheus). Although prometheus is the main datasource for ceph metrics, some plots/dashboards may still require the legacy Graphite datasource.

  • The prometheus server is configured on the host cephprom.cern.ch, hostgroup ceph/prometheus

  • Configuration files (Puppet):

    • it-puppet-hostgroup-ceph/code/manifests/prometheus.pp
    • it-puppet-hostgroup-ceph/data/hostgroup/ceph/prometheus.yaml
    • it-puppet-hostgroup-ceph/data/hostgroup/ceph.yaml
    • Alertmanager templates: it-puppet-hostgroup-ceph/code/files/prometheus/am-templates/ceph.tmpl
    • Alert definition: it-puppet-hostgroup-ceph/code/files/generated_rules/
  • Thanos infrastructure is under ceph/thanos hostgroup, configured via the corresponding hiera files.

An analogous QA infrastructure is also available, with all components replicated (cephprom-qa, thanos-store-qa, etc.). This QA infrastructure is configured by overriding the puppet environment:

  • it-puppet-hostgroup-ceph/data/hostgroup/ceph/environments/qa.yaml

Add/remove a cluster to/from the monitoring system

  • Enable the prometheus mgr module in the cluster:
ceph mgr module enable prometheus

NOTE: Make sure that port 9283 is accepting connections.
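To verify that the exporter is reachable from another node, a simple TCP probe is enough. Below is a minimal sketch; the host name in the comment is a placeholder, adapt it to a mgr instance of your cluster:

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical mgr host; replace with a real mgr instance of your cluster:
# print(is_port_open("cephmon-01.cern.ch", 9283))
```

Alternatively, `curl http://<mgr-host>:9283/metrics` should return the metric payload once the module is enabled.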

Instances that include the hg_ceph::classes::mgr class will be automatically discovered through puppetdb and scraped by prometheus.

  • To ensure that we don't lose metrics during mgr failovers, all of the cluster's mgrs are scraped. As a side benefit, we can monitor the online status of the mgrs.
  • Run or wait for a puppet run on cephprom.cern.ch.

Add/remove a node for node metrics (cpu, memory, etc)

Instances that include the prometheus::node_exporter class (anything under the ceph top-level hostgroup) will be automatically discovered through puppetdb and scraped by prometheus.
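This discovery mechanism corresponds to Prometheus's PuppetDB service discovery. As an illustration, such a scrape config could look like the sketch below; the URL and query are assumptions, the real configuration lives in the Puppet-managed files listed above:

```yaml
scrape_configs:
  - job_name: node
    puppetdb_sd_configs:
      - url: https://puppetdb.example.cern.ch:8081   # assumed PuppetDB endpoint
        query: 'resources { type = "Class" and title = "Prometheus::Node_exporter" }'
        port: 9100            # default node_exporter port
        refresh_interval: 5m
```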

Add/remove an alert rule to/from the monitoring system

Alerts are defined in yaml files managed by puppet in:

  • it-puppet-hostgroup-ceph/files/prometheus/generated_rules

They are organised by service, so add the alert to the appropriate file (e.g. ceph alerts in alerts_ceph.yaml). The file rules.yaml is used to add recording rules.
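Each of these files follows the standard Prometheus rule-file layout; a minimal sketch (the group name and alert are illustrative, not existing rules):

```yaml
groups:
  - name: ceph
    rules:
      - alert: "SomeCephAlert"          # hypothetical alert name
        expr: "ceph_health_status == 2" # HEALTH_ERR
        for: "5m"
        labels:
          severity: "warning"
```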

There are 3 notification channels currently: e-mail, SNOW ticket and Mattermost message.

Before creating the alert, make sure you test your query in advance, for example using the Explore panel on Grafana. Once the query is working, proceed with the alert definition.

A prometheus alert could look like this:

  - alert: "CephOSDReadErrors"
    annotations:
      description: "An OSD has encountered read errors, but the OSD has recovered by retrying the reads. This may indicate an issue with hardware or the kernel."
      documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#bluestore-spurious-read-errors"
      summary: "Device read errors detected on cluster {{ $labels.cluster }}"
    expr: "ceph_health_detail{name=\"BLUESTORE_SPURIOUS_READ_ERRORS\"} == 1"
    for: "30s"
    labels:
      severity: "warning"
      type: "ceph_default"
  • alert: Mandatory. The name of the alert, which becomes the subject of the e-mail, the title of the SNOW ticket and the title of the Mattermost notification. Try to follow the same naming pattern as the existing alerts (CephDAEMONAlert): daemon in uppercase, the rest in camel case.
  • expr: Mandatory. PromQL query that defines the alert. The alert triggers if the query returns one or more matches. It's a good exercise to tune the query in promdash to ensure that it is well formed.
  • for: Mandatory. The alert is only triggered if it stays active for more than the specified time (e.g. 30s, 1m, 1h).
  • annotations:summary: Mandatory. Expresses the actual alert in a concise way.
  • annotations:description: Optional. Allows specifying more detailed information about the alert when the summary is not enough.
  • annotations:documentation: Optional. Allows specifying the URL of the documentation/procedure to follow to handle the alert.
  • labels:severity: Mandatory. Defines the notification channel to use, based on the following:
    • warning/critical: Sends an e-mail to ceph-alerts.
    • ticket: Sends an e-mail AND creates a SNOW ticket.
    • mattermost: Sends an e-mail AND sends a Mattermost message to the ceph-bot channel.
  • labels:type: Optional. Distinguishes alerts created upstream (ceph_default) from those created by us (ceph_cern). It has no actual effect on the alert's functionality.
  • labels:xxxxx: Optional. You can add custom labels that can be used in the template.


  • In order for the templating to work as expected, make sure that the labels cluster or job_name are part of the resulting query. If the query does not preserve labels (like count), you can specify the label and value manually in the labels section of the alert definition.
  • All annotations, if defined, will appear in the body of the ticket, e-mail or Mattermost message generated by the alert.
  • Alerts are evaluated against the local prometheus server, which only contains metrics for the last 7 days. Take that into account when defining alerts that evaluate longer periods (e.g. using predict_linear). In such cases, you can create the alert in Grafana using the Thanos-LTMS datasource (more on that later in this doc).
  • In grafana or promdash you can inspect alerts by querying the metric called ALERTS.
  • For more information about how to define an alert, refer to the Prometheus Documentation.
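As an illustration of the long-period case, a capacity-prediction alert could be sketched as below. The alert name and threshold are hypothetical, and a rule like this needs enough retention behind it to make the prediction meaningful:

```yaml
- alert: "CephClusterFullPrediction"    # hypothetical alert
  annotations:
    summary: "Cluster {{ $labels.cluster }} predicted to be full within 5 days"
  expr: "predict_linear(ceph_cluster_total_used_bytes[2d], 5 * 86400) > ceph_cluster_total_bytes"
  for: "1h"
  labels:
    severity: "warning"
```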

Create / Link procedure/documentation to a Prometheus Alert

Prometheus alerts are pre-configured to show the procedure needed to handle the alert via the documentation annotation. This is an optional field that can be configured per alert rule.

Step 1: Create the procedure in case it does not exist yet.

Update the file rota.md in this repository and add the new procedure. Use this file for convenience, but you can create a new file if needed.

Step 2: Edit the alert rule and link to the procedure.

Edit the alert following the instructions above, and add the link to the procedure in the annotations section, under the key documentation, for example:

- alert: "CephMdsTooManyStrays"
  annotations:
    documentation: "http://s3-website.cern.ch/cephdocs/ops/rota.html#cephmdstoomanystrays"
    summary: "The number of strays is above 500K"
  expr: "ceph_mds_cache_num_strays > 500000"
  for: "5m"
  labels:
    severity: "ticket"

Push the changes and the prometheus server will reload automatically, picking up the new configuration. The next time the alert is triggered, a link to the procedure will be shown in the alert body.

Silence Alarms

You can use the alertmanager Web Interface to silence alarms during scheduled interventions. Please always specify a reason for silencing the alarms (a link to a JIRA issue or ticket is a plus). Additionally, for the alerts that generate an e-mail, you will find a link to silence the alert in the e-mail body.

Alert Grouping

Alert grouping is enabled by default, so if the same alert is triggered on different nodes, we only receive one ticket listing all involved nodes.
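Grouping is controlled by the route section of the Alertmanager configuration; a sketch of the relevant fragment (the values are illustrative, the real settings live in the Puppet-managed configuration):

```yaml
route:
  receiver: email
  group_by: ['alertname', 'cluster']  # one notification per alert name and cluster
  group_wait: 30s                     # how long to batch alerts of a new group
  group_interval: 5m                  # minimum interval between updates for a group
```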

Modifying AlertManager Templates

Both the e-mail and SNOW ticket templates are customizable. To do that, edit the following puppet file:

  • it-puppet-hostgroup-ceph/code/files/prometheus/am-templates/ceph.tmpl

You have to use Golang's Template syntax. The structure of the file is as follows:

{{ define "ceph.email.subject" }}
{{ end }}
{{ define "ceph.email.body" }}
{{ end }}

For reference check the default AlertManager Templates
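As an illustration, a subject template iterating over the grouped alert data could look like the sketch below (this is not the template currently deployed):

```gotmpl
{{ define "ceph.email.subject" }}[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }} ({{ len .Alerts }} alert(s)){{ end }}
```

.Status, .CommonLabels and .Alerts are standard fields of the Alertmanager template data, and toUpper is one of its built-in template functions.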

In case you add new templates, make sure that you adapt the AlertManager configuration accordingly:

- name: email
  email_configs:
  - to: ceph-admins@cern.ch
    from: alertmanager@localhost
    smarthost: cernmx.cern.ch:25
    headers:
      Subject: '{{ template "ceph.email.subject" . }}'
    html: '{{ template "ceph.email.body" . }}'

Note: A restart of AlertManager is needed for the changes to be applied.

Accessing the prometheus dashboard (promdash)

The prometheus dashboard, or Promdash, is a powerful interface that allows you to quickly assess the prometheus server status and also provides a quick way of querying metrics. The prometheus dashboard is accessible from this link: Promdash.

  • The prometheus dashboard is useful for:
    • Checking the status of all targets: Target status
    • Checking the status of the alerts: Alert Status
    • Debugging: you can execute PromQL queries directly on the dashboard and quickly change the intervals.
    • In grafana there is an icon right next to the metric definition to view the current query in promdash.
    • You can also use the Grafana Explorer.

Note: This will only give you access to the metrics of the last 7 days; refer to the next chapter for accessing older metrics.
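The same metrics can also be queried programmatically through the standard Prometheus HTTP API; a minimal sketch, where the server URL in the comment is an assumption:

```python
import json
import urllib.parse
import urllib.request

def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urllib.parse.urlencode({'query': promql})}"

def instant_query(base_url: str, promql: str) -> dict:
    """Run an instant PromQL query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, promql)) as resp:
        return json.load(resp)

# Hypothetical server URL; adapt to your prometheus instance:
# firing = instant_query("http://cephprom.cern.ch:9090", 'ALERTS{alertstate="firing"}')
```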

Long Term Metric Storage - LTMS

The long-term metrics are kept in the CERN S3 service using Thanos. The bucket is called prometheus-storage and is accessed using the EC2 credentials of Ceph's Openstack project. Accessing these metrics is transparent from Grafana:

  • Metrics of the last 7 days are served directly from prometheus local storage
  • Older metrics are pulled from S3.
  • Since the metrics in S3 include downsampled versions (5m, 1h), queries are usually much faster than against the local prometheus.
  • RAW metrics are also kept, so it is possible to zoom in to the 15-second resolution.

Accessing the thanos dashboard

There is a thanos version of promdash here, from where you can access all historical metrics. This dashboard has some thanos-specific features, like deduplication (for use cases with more than one prometheus server scraping the same data) and the possibility of showing downsampled data (thanos stores two downsampled versions of the metrics, with 1h and 5m resolution). This downsampled data is also stored in S3.

Thanos Architecture

You can find more detailed information on the Thanos official webpage, but this is the list of active components in our current setup with a high-level description of what they do:


  • thanos-sidecar: Every time Prometheus dumps its data to disk (by default, every 2 hours), the thanos-sidecar uploads the metrics to the S3 bucket. It also acts as a proxy that serves Prometheus's local data.

  • thanos-store: The storage proxy that serves the metrics stored in S3.

  • thanos-query: Reads the data from the store(s) and sidecar(s) and answers PromQL queries using the standard Prometheus HTTP API. This is the component that monitoring dashboards have to point to.

  • thanos-compact: A detached component that compacts the data in S3 and also creates the downsampled versions.
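All of these components reach the S3 bucket through a Thanos objstore configuration file; a sketch of its shape, where the endpoint and credential values are placeholders:

```yaml
type: S3
config:
  bucket: prometheus-storage     # the bucket named above
  endpoint: s3.cern.ch           # assumed CERN S3 endpoint
  access_key: <EC2_ACCESS_KEY>   # EC2 credentials of the Openstack project
  secret_key: <EC2_SECRET_KEY>
```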