Galaxy Monitoring
Last modification: Mar 1, 2022
Manage Jobs
An admin interface to list current unfinished jobs and finished jobs of a certain age.
- You can stop unfinished jobs
- You can show details of old jobs
- You can lock the server to prevent it from spawning new jobs (e.g. for maintenance)
Log Files
- Galaxy logs (journalctl -f -u galaxy)
  - Web (uWSGI)
  - Handler
- nginx logs (/var/log/nginx/*)
Analytics
Can we make better walltime decisions?
scripts/runtime_stats.py: Database-driven job runtime statistics
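As an illustration of how runtime statistics can feed a walltime decision, the sketch below picks a limit from a list of past runtimes. It is not what scripts/runtime_stats.py does internally; the function name, the 95th-percentile choice, and the safety factor are assumptions made for this example.

```python
import math


def suggest_walltime(runtimes_s, quantile=0.95, safety_factor=1.5):
    """Suggest a walltime limit in seconds from past job runtimes.

    Hypothetical helper: takes the nearest-rank quantile of the observed
    runtimes and multiplies it by a safety factor.
    """
    if not runtimes_s:
        raise ValueError("need at least one runtime")
    ordered = sorted(runtimes_s)
    # Nearest-rank quantile: the smallest observed runtime such that at
    # least `quantile` of all jobs finished within it.
    rank = max(1, math.ceil(quantile * len(ordered)))
    return math.ceil(ordered[rank - 1] * safety_factor)


# Made-up runtimes (seconds) for one tool, including one long outlier.
past_runtimes = [60, 75, 80, 90, 120, 150, 200, 240, 300, 3600]
print(suggest_walltime(past_runtimes))
```

A lower quantile trades a tighter limit against more jobs being killed for exceeding it, so the right value depends on how costly a rerun is.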
Reports
Galaxy ships with its own app that reports usage numbers (users, jobs, data, etc.)
Nagios
Nagios is a general-purpose tool for monitoring systems and services.
Galaxy-specific check in contrib/nagios/: Runs Galaxy jobs
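Nagios decides a service's state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and its first line of output. The sketch below shows how a job-running check could map a test job's elapsed time to such a status. It is not the check shipped in contrib/nagios/; the function name and thresholds are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_job_runtime(elapsed_s, warn_s=60.0, crit_s=300.0):
    """Map a test job's elapsed time to a (status, message) pair.

    Hypothetical helper: `elapsed_s` is None when the job never finished.
    """
    if elapsed_s is None:
        return CRITICAL, "CRITICAL - test job did not finish"
    if elapsed_s >= crit_s:
        return CRITICAL, f"CRITICAL - test job took {elapsed_s:.1f}s"
    if elapsed_s >= warn_s:
        return WARNING, f"WARNING - test job took {elapsed_s:.1f}s"
    return OK, f"OK - test job finished in {elapsed_s:.1f}s"


status, message = check_job_runtime(12.0)
# A real plugin would print the message and then exit with `status`.
print(status, message)
```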
Sentry
- Motto: “Stop hoping your users will report errors”
- Error tracking and analysing tool.
- Galaxy has Sentry middleware that you can enable in configuration.
Job Metrics
Galaxy can collect metrics on each job through configurable plugins in job_metrics_conf.xml.
Some plugins:
- core: captures Galaxy slots, start and end of job, runtime
- cpuinfo: processor count for each job
- env: dump environment for each job
- collectl: monitor a wide array of system performance data
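To make the configuration concrete, the sketch below embeds a minimal job_metrics_conf.xml and lists the plugins it enables. The layout shown (a job_metrics root with one empty element per plugin) is an assumption based on Galaxy's sample file; check the sample shipped with your Galaxy version for the options each plugin accepts. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed minimal configuration: one empty element per enabled plugin.
SAMPLE_CONF = """<?xml version="1.0"?>
<job_metrics>
  <core />
  <cpuinfo />
  <env />
  <collectl />
</job_metrics>
"""


def enabled_plugins(conf_xml):
    """Return the plugin names listed in a job metrics configuration string."""
    root = ET.fromstring(conf_xml)
    # Each direct child of the root element names one plugin.
    return [child.tag for child in root]


print(enabled_plugins(SAMPLE_CONF))
```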
Telegraf, InfluxDB, and Grafana
General-purpose tools for monitoring systems and services.
Tool | Use |
---|---|
Telegraf | plugin-driven server agent for collecting & reporting metrics |
InfluxDB | purpose-built time series database |
Grafana | dashboard for beautiful analytics and monitoring |
Dataflow:
- Galaxy produces data
- Telegraf consumes and buffers it before sending it on
- InfluxDB stores the data
- Grafana is used to visualise it
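One common way for data to travel along this chain is InfluxDB line protocol, a text format that Telegraf can both read and write. The sketch below formats a single data point that way. The measurement, tag, and field names are made up for illustration, and the helper skips the escaping that real line protocol requires for spaces, commas, and equals signs.

```python
def format_field(key, value):
    """Format one field; ints get an `i` suffix, strings are quoted."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f"{key}={'true' if value else 'false'}"
    if isinstance(value, int):
        return f"{key}={value}i"
    if isinstance(value, float):
        return f"{key}={value}"
    return f'{key}="{value}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp.

    Assumes keys and values contain no spaces, commas, or equals signs.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(format_field(k, v) for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical data point: number of running jobs at a given time.
line = to_line_protocol(
    "galaxy_queue", {"state": "running"}, {"count": 12}, 1646092800000000000
)
print(line)
```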
Infrastructure for Grafana
- Everything captured in Galaxy Ansible infrastructure-playbook repository.
- Ansible playbook to install Telegraf.
- Ansible tasks for installing InfluxDB and Grafana.
Grafana showcase
- usegalaxy.eu public server
- usegalaxy.org.au public server
- usegalaxy.org private server
If you see a dashboard you like, you can export its configuration and load it into your own Grafana with your own data. Copy away!