name: inverse layout: true class: center, middle, inverse
---
# Scripting Galaxy using the API and BioBlend
Authors:
Olivia Doppelt-Azeroual
Fabien Mareuil
Nicola Soranzo
Dannon Baker
last_modification
Updated: Jul 14, 2022
video-slides
Video slides
|
text-document
Plain-text slides
Tip:
press
P
to view the presenter notes |
arrow-keys
Use arrow keys to move between slides
??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- ## Requirements Before diving into this slide deck, we recommend you to have a look at: - [JupyterLab](https://jupyterlab.readthedocs.io/) --- ### <i class="far fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions - What is a REST API? - How to interact with Galaxy programmatically? - Why and when should I use BioBlend? --- ## Galaxy API - **Application Programming Interface (API)**: the protocol defined by a software for how it can be controlled by an external program - Galaxy provides a rich API: - Over the HTTP protocol - The Galaxy UI is being migrated on top of the Galaxy API ??? - An Application Programming Interface provides software developers with a definition of the methods to interact with a program or a library. - When the program is a remote web server like Galaxy, the methods are represented by URIs and the communication is through the HTTP protocol. - Nowadays, most of the Galaxy user interface makes use of the backend API to implement an asynchronous web application. --- ## Interacting with Galaxy: UI vs. API - The Galaxy UI is good for: - exploring and visualizing data - experimenting - graphically designing workflows - people not comfortable with the command line - The Galaxy API is good for: - interact programmatically with the server - complex control: **branching** and **looping** (not yet possible in workflows) - automate repetitive tasks - integration with external resources ??? - The Galaxy user interface is a better choice for explorative analysis, visualising data and drawing workflows. - The API instead allows you to automate tasks using Galaxy's capabilities programmatically. - A typical use case is to upload FASTQ files as soon as your sequencer finishes writing them, and running a quality control workflow. - Importantly, however you interact with Galaxy, you can equally benefit from features like reproducibility of the analysis and data sharing. - In fact, all the work done via the API is still accessible when you return to the UI. --- ## Galaxy API functionalities - Users can: - upload and download data - run tools and workflows, ... - manage histories and datasets - Admins can also manage: - data libraries - Tool Shed repositories - users, quotas, roles... - Source code lives at https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/webapps/galaxy/api/ ??? - Most of the operations you would normally perform on the Galaxy UI are available via the API. - You can for example upload data, run tools and workflows, and manage your histories. - It is also possible to perform admin tasks, like manage data libraries and install tools. --- ## RESTful API .left[REpresentational State Transfer (REST) is the architectural style of the World Wide Web:] - client–server - API requests: - standard HTTP request **methods** (GET, POST, PUT, DELETE,...) and status codes - Uniform **Resource** Identifiers (URIs) - Query and payload for parameters ??? - The Galaxy API follows the REST model typical of web applications. - A REST API specifies a protocol for the interaction between a client and a server. - The client sends requests composed by an HTTP method and a resource. - The main HTTP methods are: GET (to retrieve a resource); POST (to create); PUT (to replace); and DELETE. - Resources are identified by Uniform Resource Identifiers. - Examples of resources in the Galaxy API are: datasets, tools, jobs, histories, libraries, users; essentially anything that is recorded in the Galaxy database. - The client often need to pass additional parameters or data to specify how a request should be carried out. - The server replies to the request with a status code (to indicate if there was an error) and usually some data. --- ## API requests - **HTTP method + URI [+ payload]** 1. `GET https://usegalaxy.org/api/histories?order=name` -> ordered list of histories - URI parameters: IDs in path, others in the *query* (`?name=value&...`) ??? - Let's see some examples of possible API requests to a Galaxy server. - In the first example we use the GET method to retrieve a resource, in this case the list of histories. - The URI starts with the protocol and address of the server, followed by the resource we are interested in: the histories. - A URI may then include an optional query, preceded by a question mark, containing a sequence of request parameters specified as key-equal-value and separated by ampersands. - In this example, the query is used to ask that the list of histories is ordered by name. --- ## API requests - **HTTP method + URI [+ payload]** 1. `GET https://usegalaxy.org/api/histories?order=name` -> ordered list of histories 2. `POST /api/histories {"name": "New analysis"}` -> create a history named "New analysis" 3. `PUT /api/histories/<id> {"published": true}` -> publish a history 4. `DELETE /api/histories/<history_id>/contents/<id>` -> delete a history dataset - URI parameters: IDs in path, others in the *query* (`?name=value&...`) - POST/PUT payload as JSON ??? - In the second example, the POST method is used to create a resource on the server, in this case a new history. - Parameters for the POST request are passed as a payload in JSON format; more on this later. - In this example the payload contains the new name for the history to be created. - In the third example, we use the PUT method to update an existing resource, in this case to make a history public. - The history to modify is indentified by appending its ID to the histories URI. - The parameters for PUT requests are also passed in a payload. - In the fourth example, we use the DELETE method to remove a resource from the server. - In this case, we request the deletion of a particular dataset in a specific history, as indicated in the URI. --- ## JSON format .left[JavaScript Object Notation https://www.json.org/] <img style="float: right;" height="80" width="80" alt="JSON logo" src="../../images/json160.gif" /> - Lightweight data-interchange text format - Easy to read/write for both humans and machines - ECMA-404 open standard (2013), [RFC 8259](https://tools.ietf.org/html/rfc8259) (2017) ```json {"history_id": "b5731bb49a17bf50", "id": "df06cc665d85b6ea", "inputs": {"0": {"id": "bbd44e69cb8906b51528b5d606d1fdd0", "src": "hda"}}, "model_class": "WorkflowInvocation", "outputs": ["bbd44e69cb8906b528819eaaff340ecd", "0ff30b4e2a4bed9e"], "state": "scheduled", "update_time": "2015-07-03T19:28:39.544574", "workflow_id": "56482e194d798eb6"} ``` ??? - Request payloads passed to Galaxy and data returned by the API are encoded in the JSON format. - JSON is a standard format used to exchange text data and is supported in all major programming languages. - In JSON, strings are enclosed by double quotes, dictionaries by curly braces, and arrays by square brackets. - Dictionaries and arrays can be nested at will, as shown in the example. --- ## Status codes and errors - HTTP status codes: - 200 OK, 400 your error, 500 server error, ...<br> https://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml - Galaxy error codes and messages: [`lib/galaxy/exceptions/`](https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/exceptions) - Still a work in progress ??? - In REST APIs, the server should use the standardised HTTP status codes in its responses to indicate either a successful request or the type of error encountered. - In the Galaxy API, error status codes and messages are returned to the client by raising specific Python exceptions in the backend. --- ## How to access a REST API .left[With anything that can communicate over HTTP:] - Command line: - `curl` - GUI: - Browsers: only GET - [RESTClient](https://addons.mozilla.org/firefox/addon/restclient/) add-on for Firefox - [Advanced REST Client](https://advancedrestclient.com/) - Software libraries: - General HTTP libraries (e.g. *requests* for Python) - Service-specific libraries (e.g. *BioBlend* to access Galaxy using Python) ??? - A REST API can be accessed via the HTTP protocol. There are 3 main ways to do that. - On the command line, the curl tool can perform any type of HTTP request; look at its manual for details. - For graphical user interfaces, you can perform GET requests directly on your browser by simply entering the URI on the address bar. - For more complex requests, you can use the Advanced REST Client open source app. - The third and most common way is to write a program. - All programming languages have some general library to communicate over HTTP. - For Python, requests is probably the best library to do that. - Many web services provide dedicated higher-level libraries to access their REST API. - In particular, BioBlend is a Python library to interact with Galaxy that we will describe later. --- ## Security - Most API calls require authentication - When the UI accesses the API, session auth is used - Other callers needs an **API key**: alphanumeric string (32 chars) identifying a registered user<br> Keep it secure, it’s the same as a username+password! - Always use HTTPS: - **https**://example.org/api?key=foo is safer due to the encryption of the transmitted data ??? - A note about security. - When programmatically performing requests that require authentication, the client need to pass an API key. - An API key is an alphanumeric string uniquely identifying a user on a server. - Since it is equivalent to the combination of your username and password, keep it secure! - In particular, always use the HTTPS protocol to make requests, not HTTP which sends data unencrypted. --- ## Advanced Galaxy API config .left[Options in `config/galaxy.yml`:] - User impersonation by adding `run_as` in the payload ```yaml # Optional list of email addresses of API users who can make calls on # behalf of other users. api_allow_run_as: foo@foo.com ``` - Bootstrapping Galaxy ```yaml # API key that allows performing some admin actions without actually # having a real admin user in the database and config. Only set this # if you need to bootstrap Galaxy, in particular to create a real # admin user account via API. You should probably not set this on a # production server. master_api_key: MASTERLOCK ``` ??? - There are 2 options in the Galaxy configuration file that are relevant for the API. - api_allow_run_as allows the specified user to impersonate any other user on the Galaxy instance. - master_api_key instead is an optional special API key that can be used to bootstrap a Galaxy instance, in particular to create an initial admin account. --- ## Galaxy API Modernization .left[Moving to [FastAPI](https://fastapi.tiangolo.com/)] - Main advantages: - [Async requests](https://fastapi.tiangolo.com/async/) - Subscriptions via [websockets](https://fastapi.tiangolo.com/advanced/websockets/) - Integrated [OpenAPI](https://spec.openapis.org/oas/v3.1.0) interactive documentation, e.g. https://usegalaxy.org/api/docs - Reduce maintenance burden - Challenges: - Need to refactor over 200 endpoints ??? - The Galaxy API is currently in the process of migrating towards FastAPI, which is a modern framework for building REST APIs in Python with really interesting features. - For example, FastAPI can serve requests with better performances using asynchronous coroutines. - Another performance advantage is the ability to avoid inefficient polling by using WebSockets subscriptions. - In addition to the performance benefits, by using FastAPI in combination with type annotations, the Galaxy API can comply with the OpenAPI standard. - This standard can greatly enhance interoperability with other systems and reduce the maintenance burden of documentation and client code generation. --- ## Galaxy API pros and cons - Pros: - Integrated with Galaxy - Well tested - Language agnostic - Cons: - Very low-level ??? - To summarise this first part, Galaxy has an extensive REST API that allows users and admins to interact programmatically with a server. - It can be accessed with any programming language, but it's also very low level. - For example, you need to construct complex URIs, encode payloads and decode returned data. --- ## BioBlend - BioBlend is a **Python library** that wraps the functionality of Galaxy and CloudMan APIs - Started by Enis Afgan, Nuwan Goonasekera and Clare Sloggett in 2012. Contributions by the Galaxy Team and the community - Open source (MIT license) - Available via PyPI and from https://github.com/galaxyproject/bioblend ??? - A Python library called BioBlend was created in 2012 to make it easier to interact with the Galaxy API. - BioBlend is open source and developed by a community of contributors. - It is hosted on GitHub and can be installed via pip. --- ## BioBlend features - Stable procedural API - Supported under Python >=3.7 - Wraps all main Galaxy API controllers - Extensive Continuous Integration testing: - on Galaxy release_17.09 and later - \>240 unit tests - Well-documented on https://bioblend.readthedocs.io ??? - BioBlend has a very stable procedural API and works on all supported Python versions. - It provides methods wrapping all the important Galaxy API endpoints. - The library uses Continuous Integration to perform a large number of tests on a wide range of Galaxy releases. - The documentation of BioBlend is very well curated and is often more accurate than the corresponding Galaxy API one. --- ## BioBlend limitations - Python-only (but separate *blend4j* and *blend4php* exist) - Its methods just deserialize the JSON response - No isolation from changes in the Galaxy API - Need to extract the entity ID for further processing - No explicit modeling of Galaxy entities and their relationships - Complex operations still need many function calls - Need for higher-level functionality ??? - Although easier to use than the Galaxy API, BioBlend has some limitations. - First of all, it's only available for Python. - There are alternative libraries for Java and PHP, but they are less complete. - Another limitation of BioBlend is that it doesn't shield the caller from possible changes in responses from the Galaxy API. - It can also be annoying to have to constantly keep track of entity IDs. - This happens because BioBlend does not try to model Galaxy entities and how they are connected. --- ## BioBlend.objects - BioBlend.objects is an extra layer which adds an **object-oriented interface** for the Galaxy API - Started by Simone Leo, Luca Pireddu and Nicola Soranzo at CRS4 in 2013 - Distributed with BioBlend - Presently limited to datasets, histories, invocations, jobs, libraries, tools and workflows ??? - To implement this modelling of Galaxy entities, some years ago an object-oriented interface was added on top of BioBlend: BioBlend objects - This is developed and distributed together with BioBlend itself. - When using this interface, methods will return objects encapsulating the dictionaries returned by the Galaxy API. - The user can then invoke further methods on these objects, for example the download method for a Dataset object. - Only a subset of BioBlend is available through the object interface, but most of the common user functionalities are included. --- ## References - Galaxy API docs: https://docs.galaxyproject.org/en/master/api_doc.html - BioBlend docs: https://bioblend.readthedocs.io/ - BioBlend chat: https://matrix.to/#/#galaxyproject_bioblend:gitter.im - C. Sloggett, N. Goonasekera, E. Afgan. BioBlend: automating pipeline analyses within Galaxy and CloudMan. *Bioinformatics* 29(13), 1685-1686, 2013, doi:[10.1093/bioinformatics/btt199](https://doi.org/10.1093/bioinformatics/btt199) - S. Leo, L. Pireddu, G. Cuccuru, L. Lianas, N. Soranzo, E. Afgan, G. Zanetti. BioBlend.objects: metacomputing with Galaxy. *Bioinformatics* 30 (19), 2816-2817, 2014, doi:[10.1093/bioinformatics/btu386](https://doi.org/10.1093/bioinformatics/btu386) ??? - Here you can find the links to the documentation of the Galaxy API and of BioBlend. - We have a dedicated Gitter channel to chat about BioBlend. - If you use BioBlend or BioBlend objects, please cite these papers. --- ### <i class="fas fa-key" aria-hidden="true"></i><span class="visually-hidden">keypoints</span> Key points - The API allows you to use Galaxy's capabilities programmatically. - BioBlend makes using the Galaxy API from Python easier. - BioBlend objects is an object-oriented interface for interacting with Galaxy. --- ## Thank You! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Authors:
Olivia Doppelt-Azeroual
Fabien Mareuil
Nicola Soranzo
Dannon Baker
This material is licensed under the Creative Commons Attribution 4.0 International License
.