Galaxy Monitoring with Telegraf and Grafana

Last modification: Apr 6, 2021

Telegraf, InfluxDB, and Grafana

General purpose tools for monitoring systems and services.

Tool      Use
Telegraf  Plugin-driven server agent for collecting and reporting metrics
InfluxDB  Purpose-built time series database
Grafana   Dashboard for beautiful analytics and monitoring

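Dataflow: Telegraf collects metrics and writes them to InfluxDB; Grafana queries InfluxDB to render dashboards.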
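To make the hand-off between the collector and the database more concrete, here is a minimal sketch of how a single metric can be serialized in InfluxDB line protocol, the text format commonly used to write points to InfluxDB. The measurement name `galaxy_jobs`, the tags, and the field values are hypothetical, and the helper functions are illustrative only; they are not part of Telegraf or any InfluxDB client library.

```python
def escape(value: str, chars: str) -> str:
    """Backslash-escape each character of `chars` that occurs in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value) -> str:
    """Render a field value using line-protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; backslashes and quotes are escaped.
    return '"' + escape(str(value), '\\"') + '"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line: measurement,tag=value field=value timestamp."""
    head = escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys are the recommended order
        head += "," + escape(key, ",= ") + "=" + escape(str(tags[key]), ",= ")
    body = ",".join(
        escape(key, ",= ") + "=" + format_field(val) for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical metric: job count and load reported by one host.
line = to_line(
    "galaxy_jobs",
    {"host": "node 1", "state": "running"},
    {"count": 42, "load": 0.5},
    1617667200000000000,
)
print(line)
# galaxy_jobs,host=node\ 1,state=running count=42i,load=0.5 1617667200000000000
```

Tags are indexed metadata suited to grouping and filtering in a dashboard, while fields hold the measured values that get plotted.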

Grafana showcase

If you see a dashboard you like, you can export its configuration and import it into your own Grafana instance to use with your own data. Copy away!
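Exported dashboards are JSON documents, so pointing one at your own data is largely a matter of rebinding its data source references. The sketch below assumes a simplified export in which data sources appear as `${NAME}` placeholders declared under an `__inputs` key; the exact layout varies between Grafana versions, and the dashboard content and the data source name `my-influxdb` are hypothetical. Grafana's own import dialog normally does this step for you.

```python
import json


def bind_datasources(node, mapping):
    """Recursively replace ${NAME} placeholders in all strings of a JSON tree."""
    if isinstance(node, dict):
        return {key: bind_datasources(val, mapping) for key, val in node.items()}
    if isinstance(node, list):
        return [bind_datasources(val, mapping) for val in node]
    if isinstance(node, str):
        for name, target in mapping.items():
            node = node.replace("${" + name + "}", target)
    return node


# Hypothetical, heavily simplified dashboard export.
exported = {
    "__inputs": [{"name": "DS_INFLUXDB", "type": "datasource", "pluginId": "influxdb"}],
    "title": "Galaxy",
    "panels": [{"title": "Job counts", "datasource": "${DS_INFLUXDB}"}],
}

# Bind the placeholder to a local data source name (an assumption for this sketch).
dashboard = bind_datasources(exported, {"DS_INFLUXDB": "my-influxdb"})
dashboard.pop("__inputs", None)  # the placeholder declarations are no longer needed

print(json.dumps(dashboard, indent=2))
```

The function builds a new structure and leaves the exported original untouched, so the same export can be bound to several data sources.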

Galaxy dashboard showing route timings, user counts, job counts, etc.

Node detail dashboard with filesystem usage, process states, CPU, memory, load, network, etc.

Database dashboard showing transactions, tuples fetched/modified, and index sizes for each database.

User statistics page for the EU Galaxy server with 23k users, 30k workflows, 400k histories, 13M jobs, and 30M datasets. Additional breakdowns show years of compute time on various clusters, including 1k years on the de.NBI cloud.

CVMFS dashboard showing the repositories each server supports in green and the missing ones in white. About 90% of repositories are supported.

Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

This material is licensed under the Creative Commons Attribution 4.0 International License.