Galactic Database
Contributors
Galactic Database
.left[Galaxy uses a database for:]
- Galaxy objects and all their relations (users, histories, datasets, workflows)
- Job state persistence, job dispatching
.left[Galaxy does not use a database for:]
- Dataset contents: files on disk
Speaker Notes
- Galaxy objects like users, histories, datasets, and workflows are stored in the database.
- All job information is likewise stored there.
- User and reference data, however, are stored outside of the database.
- You will want to backup both user data and your database.
Defaults
- Galaxy uses the SQLAlchemy database abstraction layer. This allows for different database servers to be plugged in.
- By default Galaxy automatically creates and uses an SQLite database during the first startup.
- The database is in the file
database/universe.sqlite
- The database is in the file
Speaker Notes
- Galaxy uses SQLAlchemy for interacting with databases.
- This allows Galaxy to transparently use sqlite or postgres.
Choices
- SQLite
- Useful for single-user Galaxy or development.
- PostgreSQL
- The recommended standard for anything serious.
MySQL- Supported by SQLAlchemy but Galaxy is not tested against it.
Speaker Notes
- sqlite is fine in development, but should not be used for production.
- postgres is the best for any production server.
- Do not use mysql if possible.
Sizing
Galaxy rarely deletes from the database, most objects are marked deleted.
Allocate at least 20 GB of disk to start, 50+ GB if expanding would be difficult.
8-16 GB memory should be sufficient.
Recommended: Run PostgreSQL on a different server for resource isolation.
Speaker Notes
- Data is not removed from the database, so plan accordingly.
- Allocated at least 20 GB of disk to start, and 50 if expanding later is difficult.
- The RAM usage is usually not significant.
- We recommend running the database on a separate server from Galaxy for better isolation.
Configuration
database_connection
is specified as a database URL in galaxy.yml
-
Default SQLite:
sqlite:///./database/universe.sqlite?isolation_level=IMMEDIATE
-
Local PostgreSQL (socket):
postgresql:///<db_name>[?host=/var/run/postgresql]
-
Network PostgreSQL:
postgresql://[user][:password]@<host>[:5432]/<db_name>
Speaker Notes
- The parameter database_connection tells galaxy where the database is.
- There are several styles, depending on how you connect to your database.
New Database
On first startup with an empty database, Galaxy creates its schema
Speaker Notes
- On first startup with an empty database, Galaxy creates its schema.
Migrations
Changes in the Galaxy DB model (when upgrading Galaxy) are captured incrementally in the form of atomic scripts.
Each script can both upgrade and downgrade a DB.
$ ./manage_db.sh upgrade
$ ./manage_db.sh downgrade --version=132
Speaker Notes
- Whenever Galaxy is upgraded, there may be changes to the database schema.
- These are stored in migration scripts, and you can use the manage db script to up or downgrade.
Tuning - Pool
If the server logs errors about not having enough database pool connections.
Galaxy config option | default value | usegalaxy.org value |
---|---|---|
database_engine_option_pool_size |
5 |
10 |
database_engine_option_max_overflow |
10 |
20 |
Speaker Notes
- Databases have a limited number of connection slots.
- Galaxy can pool connections, and re-use existing connections from a pool when it needs to query the DB.
- There are a couple of options for controlling pooling.
- The defaults are generally fine until you see errors.
Tuning - Server-side cursors
If large database query results are causing memory or response time issues in the Galaxy process, leave it on server (PostgreSQL only, recommended).
Galaxy config option | default value | usegalaxy.org value |
---|---|---|
database_engine_option_server_side_cursors |
false |
true |
Speaker Notes
- Large queries may slow down Galaxy or the DB.
- You can enable server side cursors to help with this.
Tuning - Slow query logging
Queries slower than this threshold (in s) will be logged at debug level.
Galaxy config option | default value | usegalaxy.org value |
---|---|---|
slow_query_log_threshold |
0 |
2 |
Speaker Notes
- If you notice slow responses, you can enable slow query logging.
- This will print a message in your Galaxy log if a query takes more than a specified time period.
- This can be useful to help you debug issues.
Tuning - TS install database
Galaxy can track Tool Shed data in a separate DB.
Galaxy config option | default value | usegalaxy.org value |
---|---|---|
install_database_connection |
value of database_connection |
SQLite DB in CVMFS |
All other database config options but prefixed with install_
are also available.
.left[This allows:]
- Bootstrapping fresh Galaxy instances with prebuilt/tested tool sets
- Atomic installation/rollback (esp. with SQLite: backup and restore DB file)
Speaker Notes
- Galaxy can track tool shed data in a separate database
- This can enable deploying Galaxies with prebuilt tool sets
Access through model
Python script to access Galaxy’s database layer via the Galaxy model.
(venv)$ python -i scripts/db_shell.py --config config/galaxy.yml
>>> new_user = User('foo@example.org', 'secret')
>>> new_user.username = 'foo'
>>> sa_session.add(new_user)
>>> sa_session.flush()
>>> sa_session.query(User).all()
Speaker Notes
- You can use the Galaxy python models to interact with the database.
- After activating the galaxy virtual environment, you can use the DB shell script to interface.
- This can allow scripting tasks like resetting passwords.
Useful queries
Captured in gxadmin (tutorial)
Speaker Notes
- Many useful DB queries are captured in gxadmin
- Look into this if you need to query the database for information like running jobs or recent users.