name: inverse
layout: true
class: center, middle, inverse

---

# Galaxy Monitoring

<div class="contributors-line">
		Authors: 
<a href="/training-material/hall-of-fame/natefoo/" class="contributor-badge contributor-natefoo"><img src="https://avatars.githubusercontent.com/natefoo?s=27" alt="Avatar">Nate Coraor</a>

<a href="/training-material/hall-of-fame/bgruening/" class="contributor-badge contributor-bgruening"><img src="/training-material/assets/images/orcid.png" alt="orcid logo"/><img src="https://avatars.githubusercontent.com/bgruening?s=27" alt="Avatar">Björn Grüning</a>

<a href="/training-material/hall-of-fame/slugger70/" class="contributor-badge contributor-slugger70"><img src="https://avatars.githubusercontent.com/slugger70?s=27" alt="Avatar">Simon Gladman</a>

<a href="/training-material/hall-of-fame/hexylena/" class="contributor-badge contributor-hexylena"><img src="/training-material/assets/images/orcid.png" alt="orcid logo"/><img src="https://avatars.githubusercontent.com/hexylena?s=27" alt="Avatar">Helena Rasche</a>

</div>

<div class="footnote" style="bottom: 8em;"><i class="far fa-calendar" aria-hidden="true"></i><span class="visually-hidden">last_modification</span> Updated: Mar 1, 2022</div>

<div class="footnote" style="bottom: 6em;">

<i class="fas fa-file-alt" aria-hidden="true"></i><span class="visually-hidden">text-document</span><a href="slides-plain.html"> Plain-text slides</a>
</div>

<div class="footnote" style="bottom: 2em;">
    <strong>Tip: </strong>press <kbd>P</kbd> to view the presenter notes
    | <i class="fa fa-arrows" aria-hidden="true"></i><span class="visually-hidden">arrow-keys</span> Use arrow keys to move between slides

</div>

???
Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off

Press `C` to create a new window where the same presentation will be displayed.
This window is linked to the main window. Changing slides on one will cause the
slide to change on the other.

Useful when presenting.

---

## Manage Jobs

The admin interface lists current unfinished jobs as well as finished jobs of a certain age.

* You can stop unfinished jobs
* You can show details of old jobs
* You can lock the server so that it does not start new jobs (e.g. for maintenance)

---

# Log Files

- Galaxy logs (`journalctl -f -u galaxy`)
  - Web (uWSGI)
  - Handler
- nginx logs (`/var/log/nginx/*`)
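
As a rough illustration of what the nginx logs can tell you, the sketch below counts responses by status class. It assumes nginx's default `combined` log format; the sample lines and the `count_status_classes` helper are made up for this example.

```python
import re
from collections import Counter

# Matches the quoted request followed by the status code, as in nginx's
# default "combined" format: ... "GET /path HTTP/1.1" 200 1234 ...
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')

def count_status_classes(lines):
    """Count responses per status class (2xx, 4xx, 5xx, ...)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)[0] + "xx"] += 1
    return counts

# Made-up sample lines; in practice, read them from a file under /var/log/nginx/
sample = [
    '10.0.0.1 - - [01/Mar/2022:12:00:00 +0000] "GET /api/histories HTTP/1.1" 200 512 "-" "curl/7.68.0"',
    '10.0.0.2 - - [01/Mar/2022:12:00:01 +0000] "POST /api/tools HTTP/1.1" 502 166 "-" "curl/7.68.0"',
]
print(count_status_classes(sample))
```

A growing share of 5xx responses usually means the problem is behind nginx, so the Galaxy logs are the next place to look.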

---

# Analytics

Can we make better walltime decisions?

`scripts/runtime_stats.py`: Database-driven job runtime statistics
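
To make the walltime question concrete: one simple approach is to take a high percentile of a tool's past runtimes and add a safety margin. The sketch below is not what `scripts/runtime_stats.py` does internally; the runtimes, the 95th percentile, and the 1.5x margin are assumptions chosen for illustration.

```python
import math

def suggest_walltime(runtimes_seconds, percentile=95, margin=1.5):
    """Suggest a walltime limit in seconds from historical runtimes."""
    if not runtimes_seconds:
        raise ValueError("need at least one runtime")
    ordered = sorted(runtimes_seconds)
    # Nearest-rank percentile: the smallest value with at least
    # `percentile` percent of the observations at or below it
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return math.ceil(ordered[rank - 1] * margin)

# Hypothetical runtimes (seconds) of one tool, including one outlier
past_runtimes = [120, 95, 300, 110, 2400, 130, 105, 98, 115, 125]
print(suggest_walltime(past_runtimes))
```

With few observations a single outlier dominates the result, so more history gives a more trustworthy limit.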

---

# Reports

Galaxy ships with its own reports app, which summarises usage: numbers of users, jobs, data, etc.

---
# Nagios

[Nagios](https://www.nagios.com/) is a general-purpose tool for monitoring systems and services.

Galaxy-specific check in `contrib/nagios/`: Runs Galaxy jobs
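
Nagios plugins report their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a one-line message. The sketch below shows that convention for a job-based check. It is not the script in `contrib/nagios/`; the `job_state` and `elapsed` inputs and the thresholds are hypothetical.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def evaluate_job(job_state, elapsed, warn_after=60, crit_after=300):
    """Map the outcome of a test job to an (exit_code, message) pair."""
    if job_state == "ok":
        if elapsed >= crit_after:
            return CRITICAL, f"CRITICAL - job took {elapsed}s"
        if elapsed >= warn_after:
            return WARNING, f"WARNING - job took {elapsed}s"
        return OK, f"OK - job finished in {elapsed}s"
    if job_state == "error":
        return CRITICAL, "CRITICAL - job failed"
    return UNKNOWN, f"UNKNOWN - unexpected job state {job_state!r}"

code, message = evaluate_job("ok", elapsed=42)
print(message)
# A real plugin would finish with sys.exit(code) so Nagios sees the status
```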

---

# Sentry

* Motto: *"Stop hoping your users will report errors"*
* Error tracking and analysis tool.
* Galaxy has Sentry middleware that you can enable in configuration.

---
# Job Metrics

Galaxy can collect metrics on each job through configurable plugins in `job_metrics_conf.xml`.

Some plugins:
- `core`: captures Galaxy slots, job start and end times, and runtime
- `cpuinfo`: records the processor count for each job
- `env`: dumps the environment for each job
- `collectl`: monitors a wide array of system performance data
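
To check which plugins a configuration enables, you can list the child elements of the config file. The XML fragment below is illustrative only and assumes each plugin is a child element of `<job_metrics>` named after the plugin; it is not a recommended production setup.

```python
import xml.etree.ElementTree as ET

# Illustrative config fragment (assumed layout: one child element per plugin)
EXAMPLE_CONF = """
<job_metrics>
    <core />
    <cpuinfo />
    <env />
</job_metrics>
"""

def enabled_plugins(xml_text):
    """Return the names of the plugins enabled in a job metrics config."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]

print(enabled_plugins(EXAMPLE_CONF))
```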

---

# Telegraf, InfluxDB, and Grafana

General-purpose tools for monitoring systems and services.

Tool     | Use
---      | ---
[Telegraf](https://github.com/influxdata/telegraf) | plugin-driven server agent for collecting & reporting metrics
[InfluxDB](https://github.com/influxdata/influxdb/) | purpose-built time series database
[Grafana](https://grafana.com/)  | dashboard for beautiful analytics and monitoring

Dataflow:

- Galaxy produces data
- Telegraf consumes and buffers it, then sends it to InfluxDB
- InfluxDB stores the data
- Grafana visualises it
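
Telegraf and InfluxDB exchange measurements as text in the InfluxDB line protocol: `measurement,tags fields timestamp`. The sketch below formats one such point. The measurement name `galaxy_queue` and its tag and field names are hypothetical, and escaping of special characters is omitted for brevity.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as InfluxDB line protocol (sketch, no escaping)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_items = []
    for key, value in sorted(fields.items()):
        if isinstance(value, bool):
            field_items.append(f"{key}={'true' if value else 'false'}")
        elif isinstance(value, int):
            field_items.append(f"{key}={value}i")  # integers carry an "i" suffix
        elif isinstance(value, float):
            field_items.append(f"{key}={value}")
        else:
            field_items.append(f'{key}="{value}"')  # strings are double-quoted
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {','.join(field_items)} {timestamp_ns}"

print(to_line_protocol(
    "galaxy_queue",
    tags={"state": "queued"},
    fields={"count": 12},
    timestamp_ns=1646136000000000000,
))
```

Tags are indexed and suit values you filter or group by in Grafana (such as a job state), while fields hold the numbers you plot.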

---
# Infrastructure for Grafana

* Everything is captured in the Galaxy Ansible [infrastructure-playbook](https://github.com/galaxyproject/infrastructure-playbook/) repository.
* Ansible [role](https://github.com/dj-wasabi/ansible-telegraf) to install Telegraf.
* Ansible [tasks](https://github.com/galaxyproject/infrastructure-playbook/blob/master/roles/stats/tasks/redhat.yml) for installing InfluxDB and Grafana.

---
# Grafana showcase

* usegalaxy.eu [public server](https://stats.usegalaxy.eu)
* usegalaxy.org.au [public server](https://stats.genome.edu.au)
* usegalaxy.org private server

If you see a dashboard you like, you can export its configuration and import it into your own Grafana to use with your own data. Copy away!

---

## Thank You!

This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!


<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">
This material is licensed under the Creative Commons Attribution 4.0 International License</a>.