name: inverse layout: true class: center, middle, inverse
---
# Galaxy Monitoring
Authors:
Nate Coraor
Björn Grüning
Simon Gladman
Helena Rasche
last_modification
Updated: Mar 1, 2022
text-document
Plain-text slides
Tip:
press
P
to view the presenter notes |
arrow-keys
Use arrow keys to move between slides
??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- ## Manage Jobs An admin interface to list current unfinished jobs and finished jobs of a certain age. * You can stop unfinished jobs * You can show details of old jobs * You can lock the server from spawning new jobs. (e.g. for maintenance.) --- # Log Files - Galaxy logs (`journalctl -f -u galaxy`) - Web (uWSGI) - Handler - nginx logs (`/var/log/nginx/*`) --- # Analytics Can we make better walltime decisions? `scripts/runtime_stats.py`: Database-driven job runtime statistics --- # Reports Galaxy ships with its own app that reports usage (user, job, data, etc numbers) --- # Nagios [Nagios](https://www.nagios.com/) is a general-purpose tool for monitoring systems and services. Galaxy-specific check in `contrib/nagios/`: Runs Galaxy jobs --- # Sentry * Motto: *"Stop hoping your users will report errors"* * Error tracking and analysing tool. * Galaxy has Sentry middleware that you can enable in configuration. --- # Job Metrics Galaxy can collect metrics on each job through configurable plugins in `job_metrics_conf.xml`. Some plugins: - `core`: Captures Galaxy slots, start and end of job, runtime - `cpuinfo`: processor count for each job - `env`: dump environment for each job - `collectl`: monitor a wide array of system performance data --- # Telegraf, InfluxDB, and Grafana General purpose tools for monitoring systems and services. Tool | Use --- | --- [Telegraf](https://github.com/influxdata/telegraf) | plugin-driven server agent for collecting & reporting metrics [Influxdb](https://github.com/influxdata/influxdb/) | purpose built time series database [Grafana](https://grafana.com/) | dashboard for beautiful analytics and monitoring Dataflow: - Galaxy produces data - Telegraf consumes and buffers it, before sending it to - InfluxDB which stores the data - And Grafana is used to visualise it --- # Infrastructure for Grafana * Everything captured in Galaxy Ansible [infrastructure-playbook](https://github.com/galaxyproject/infrastructure-playbook/) repository. * Ansible [playbook](https://github.com/dj-wasabi/ansible-telegraf) to install Telegraf. * Ansible [tasks](https://github.com/galaxyproject/infrastructure-playbook/blob/master/roles/stats/tasks/redhat.yml) for installing InfluxDB and Grafana. --- # Grafana showcase * usegalaxy.eu [public server](https://stats.usegalaxy.eu) * usegalaxy.org.au [public server](https://stats.genome.edu.au) * usegalaxy.org private server If you see a dashboard you can export its configuration and put it on your Grafana with your data. Copy away! --- ## Thank You! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Authors:
Nate Coraor
Björn Grüning
Simon Gladman
Helena Rasche
This material is licensed under the Creative Commons Attribution 4.0 International License
.