name: inverse layout: true class: center, middle, inverse
---
# Galaxy Interactive Environments
Authors:
Saskia Hiltemann
Björn Grüning
Helena Rasche
last_modification
Updated: Jul 10, 2021
text-document
Plain-text slides
Tip:
press
P
to view the presenter notes |
arrow-keys
Use arrow keys to move between slides
??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- ## Requirements Before diving into this slide deck, we recommend you to have a look at: - Docker basics --- ### <i class="far fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions - What are Galaxy Interactive Environments (GIEs)? - How to enable GIEs in Galaxy? - How to develop your own GIE? --- ### <i class="fas fa-bullseye" aria-hidden="true"></i><span class="visually-hidden">objectives</span> Objectives - Implement a Hello-World Galaxy Interactive Environment --- # Interactive Environments --- ### Why IEs? - Embedded access to third-party application inside of Galaxy - Interactively analyze data, access analysis products within Galaxy ![Ipython is shown in the center pane of galaxy](../../images/vis_IE_ipython0.png) - Bring external analysis platform to the data instead of vice-versa - no need to download/re-upload your data --- ### Who should use IEs? - Everyone! - Programming Environments for Bioinformaticians and Data Scientists - Jupyter - Rstudio - Visualization IEs are great for life scientists - IOBIO (bam/vcf visualizations) - Phinch (metagenomics visualizations) --- ## Types of visualizations in Galaxy GIE for visualization? Check that it is the right choice for your project - **Trackster** - built-in genome browser - **Display applications** - UCSC Genome Browser - IGV - **Galaxy tools** - JBrowse - Krona - **Visualization plugins** - Charts - Generic - **Interactive Environments** - Jupyter/Rstudio - IOBIO (bam/vcf visualizations) - Phinch (metagenomics visualizations) --- ## Which should I use? ![Flowchart. Only available on an external website? If yes use a display application. Does it need to be served (e.g. python), if yes use an interactive tool. Is it computationally intensive, then it needs to be a regular tool. Is it written in javascript? Then it shold be a generic plugin. If it passes all these tests it can be a charts plugin.](../../images/which_viz_flowchart.png) --- ### How to launch an IE? - Can be bound to specific datatypes - Available under the visualizations button on the dataset .image-25[![visualisation button in galaxy is clicked on a dataset. Jupyter and Rstudio appear below charts.](../../images/vis_IE_button.png)] - Or more general-purpose applications (Jupyter/Rstudio) - IE launcher .image-25[![A drop down menu from the masthead of galaxy shows interactive environments as well.](../../images/vis_IE_launcher_menu.png)] --- ### IE Launcher - Choose between different available docker images - Attach one or more datasets from history .image-75[![An IE launcher is shown with select boxes for the GIE, image, and datasets. Launch is shown below.](../../images/vis_IE_launcher.png)] --- ### How does it work? - Docker Containers are launched on-demand by users.. - ..and killed automatically when users stop using them ![schematic of request flow. A users requests an IE session with galaxy which connects to a docker host, launches the IE, and communicates this to a NodeJS proxy living next to galaxy. The user opens the connection through the proxy to access the container.](../../images/vis_IE_infra.png) .footnote[ Admin Docs: https://docs.galaxyproject.org/en/master/admin/interactive_environments.html ] --- ### Jupyter - General purpose/ multi-dataset - Provides special functions to interact with the Galaxy history (get/put datasets) - Ability to save and load notebooks .image-75[![jupyter seen in the main panel of galaxy](../../images/vis_IE_ipython1.png)] --- ### Jupyter ![the above screenshot but they have scrolled to show embedded plots computed in jupyter in galaxy](../../images/vis_IE_ipython2.png) --- ### Jupyter ![Another screenshot of jupyter in galaxy, this time fetching datasets from galaxy and running samtools.](../../images/vis_IE_ipython3.png) --- ### Rstudio - General purpose/ multi-dataset - Provides special functions to interact with the Galaxy history - Ability to save and load workbook and R history object ![default rstudio in galaxy, multiple panes are visible for computing, values, and files.](../../images/vis_IE_rstudio.png) --- ### IOBIO - Visualizes single dataset - Only available for datasets of specific formats ![IOBIO are dynamic javascript-y visualisations with lots of constantly updating graphs that are recalculated on the fly. They are shown in two scratchbooks in Galaxy](../../images/vis_IE_iobio.png) --- ### IOBIO ![a close up of iobio with several graphs and illegible text.](../../images/vis_IE_iobio2.png) --- ### Phinch ![Phinch is shown inside galaxy, another visually intensive graphing application for metagenomics data.](../../images/vis_IE_phinch.png) --- ### Admin - Prerequisites: NodeJs (npm) and Docker; Galaxy user must be able to talk to the docker daemon - Enable IEs in `galaxy.yml` ```bash interactive_environment_plugins_directory = config/plugins/interactive_environments ``` - Install node proxy ```bash $ cd $GALAXY/lib/galaxy/web/proxy/js/ $ npm install . ``` - Can configure GIEs to run on another host .footnote[ Advanced configurations: https://docs.galaxyproject.org/en/master/admin/interactive_environments.html] --- ### Development - Not hard to build! - All the magic is in: ```bash $GALAXY/config/plugins/interactive_environments/$ie_name/ ``` | Component | File | |------------------------------------|------------------------------| | Visualization Plugin Configuration | ../config/${ie_name}.xml | | IE specific Configuration | ../config/${ie_name}.ini | | Mako Template | ../templates/${ie_name}.mako | --- ### Development ![schematic of GIE with a box on the left labelled ipython.mako being invoked by a launch ipython notebook call. This has boxes for running a docker container and then proxying authentication. These point to a docker container with a config.yaml containing the history id, api key and password, the ipython galaxy notebook, and the ipython webservice which loops while the connection is active. This proxies the connection and points to a cartoon of ipython in galaxy's center panel.](../../images/vis_IE_ipython_components.png) --- ### Hello World Example - All files in this example available from https://github.com/hexylena/hello-world-interactive-environment/ - Create a GIE that shows the directory listing of `import` folder (datasets loaded into GIE by user) ```bash $ tree $GALAXY_ROOT/config/plugins/interactive_environments/helloworld/ config/plugins/interactive_environments/helloworld/ ├── config │ ├── helloworld.ini │ ├── helloworld.ini.sample │ └── helloworld.xml ├── static │ └── js │ └── helloworld.js └── templates └── helloworld.mako ``` --- Create GIE plugin XML file `config/helloworld.xml` ```xml <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE interactive_environment SYSTEM "../../interactive_environments.dtd"> <!-- This is the name which will show up in the User's Browser --> <interactive_environment name="HelloWorld"> <data_sources> <data_source> <model_class>HistoryDatasetAssociation</model_class> <!-- filter which types of datasets are appropriate for this GIE --> <test type="isinstance" test_attr="datatype" result_type="datatype">tabular.Tabular</test> <test type="isinstance" test_attr="datatype" result_type="datatype">data.Text</test> <to_param param_attr="id">dataset_id</to_param> </data_source> </data_sources> <params> <param type="dataset" var_name_in_template="hda" required="true">dataset_id</param> </params> <!-- Be sure that your entrypoint name is correct! --> <entry_point entry_point_type="mako">helloworld.mako</entry_point> </interactive_environment> ``` --- Set up `.ini` file, which controls docker interaction `config/helloworld.ini.sample` ```ini [main] # Unused [docker] # Command to execute docker. For example `sudo docker` or `docker-lxc`. #command = docker {docker_args} # The docker image name that should be started. image = hello-ie # Additional arguments that are passed to the `docker run` command. #command_inject = --sig-proxy=true -e DEBUG=false # URL to access the Galaxy API with from the spawn Docker container, if empty # this falls back to galaxy.yml's galaxy_infrastructure_url and finally to the # Docker host of the spawned container if that is also not set. #galaxy_url = # The Docker hostname. It can be useful to run the Docker daemon on a different # host than Galaxy. #docker_hostname = localhost [..] ``` --- - Create mako template `templates/helloworld.mako` - Loads configuration from `ini` file - launches docker container, - builds a URL to the correct endpoint through Galaxy NodeJS proxy - set environment variable `CUSTOM` to be passed to container - attach dataset selected by user (`hda`) ```mako <%namespace name="ie" file="ie.mako" /> <% # Sets ID and sets up a lot of other variables ie_request.load_deploy_config() # Define a volume that will be mounted into the container. # This is a useful way to provide access to large files in the container, # if the user knows ahead of time that they will need it. user_file = ie_request.volume( hda.file_name, '/import/file.dat', how='ro') # Launch the IE. This builds and runs the docker command in the background. ie_request.launch( volumes=[user_file], env_override={ 'custom': '42' } ) [..] ``` --- (continued) ```mako [..] # Only once the container is launched can we template our URLs. The ie_request # doesn't have all of the information needed until the container is running. url = ie_request.url_template('${PROXY_URL}/helloworld/') %> <html> <head> ${ ie.load_default_js() } </head> <body> <script type="text/javascript"> ${ ie.default_javascript_variables() } var url = '${ url }'; ${ ie.plugin_require_config() } requirejs(['interactive_environments', 'plugin/helloworld'], function(){ load_notebook(url); }); </script> <div id="main" width="100%" height="100%"> </div> </body> </html> ``` --- Lastly we must write the `load_notebook` function, `static/js/helloworld.js` ```javascript function load_notebook(url){ $( document ).ready(function() { test_ie_availability(url, function(){ append_notebook(url) }); }); } ``` --- ### Hello World Example - The only thing missing now is the GIE (Docker) container itself - Container typically consists of: - Dockerfile - Proxy configuration (e.g. nginx) - Custom startup script/entrypoint - Script to monitor traffic and kill unused containers - The actual application for the users (here: simple python process which serves directory contents of `/import` folder of container) --- ```dockerfile FROM ubuntu:14.04 # These environment variables are passed from Galaxy to the container # and help you enable connectivity to Galaxy from within the container. # This means your user can import/export data from/to Galaxy. ENV DEBIAN_FRONTEND=noninteractive \ API_KEY=none \ DEBUG=false \ PROXY_PREFIX=none \ GALAXY_URL=none \ GALAXY_WEB_PORT=10000 \ HISTORY_ID=none \ REMOTE_HOST=none RUN apt-get -qq update && \ apt-get install --no-install-recommends -y \ wget procps nginx python python-pip net-tools nginx # Our very important scripts. Make sure you've run `chmod +x startup.sh # monitor_traffic.sh` outside of the container! ADD ./startup.sh /startup.sh ADD ./monitor_traffic.sh /monitor_traffic.sh # /import will be the universal mount-point for IPython # The Galaxy instance can copy in data that needs to be present to the # container RUN mkdir /import [..] ``` --- (continued) ```dockerfile [..] # Nginx configuration COPY ./proxy.conf /proxy.conf VOLUME ["/import"] WORKDIR /import/ # EXTREMELY IMPORTANT! You must expose a SINGLE port on your container. EXPOSE 80 CMD /startup.sh ``` --- - Proxy configuration (nginx) - reverse proxy our directory listing process running on port 8000 ```nginx server { listen 80; server_name localhost; access_log /var/log/nginx/localhost.access.log; # Note the trailing slash used everywhere! location PROXY_PREFIX/helloworld/ { proxy_buffering off; proxy_pass http://127.0.0.1:8000/; proxy_redirect http://127.0.0.1:8000/ PROXY_PREFIX/helloworld/; } } ``` --- - Create the `startup.sh` file which starts our directory listing service ```bash #!/bin/bash # First, replace the PROXY_PREFIX value in /proxy.conf with the value from # the environment variable. sed -i "s|PROXY_PREFIX|${PROXY_PREFIX}|" /proxy.conf; # Then copy into the default location for ubuntu+nginx cp /proxy.conf /etc/nginx/sites-enabled/default; # Here you would normally start whatever service you want to start. In our # example we start a simple directory listing service on port 8000 cd /import/ && python -mSimpleHTTPServer & # Launch traffic monitor which will automatically kill the container if # traffic stops /monitor_traffic.sh & # And finally launch nginx in foreground mode. This will make debugging # easier as logs will be available from `docker logs ...` nginx -g 'daemon off;' ``` --- - Lastly, the script to monitor traffic and shut down if user is no longer connected, `monitor_traffic.sh` ```bash #!/bin/bash while true; do sleep 60 if [ `netstat -t | grep -v CLOSE_WAIT | grep ':80' | wc -l` -lt 3 ] then pkill nginx fi done ``` --- ### Hello World Example - We are now ready to build our container, and try out our new GIE! - If the container is hosted on a service like Dockerhub or quay.io, it will be automatically fetched on first run. ```bash $ cd hello-ie $ docker build -t hello-ie . ``` ![A directory listing is shown in the center panel.](../../images/vis_IE_helloworld.png) .footnote[Try it yourself: https://github.com/hexylena/hello-world-interactive-environment] --- ### <i class="fas fa-key" aria-hidden="true"></i><span class="visually-hidden">keypoints</span> Key points - Interactive Environments offer access to third-party applications within Galaxy - Interactive Environments run in a docker images for sandboxing and easy dependency management --- ## Thank You! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Authors:
Saskia Hiltemann
Björn Grüning
Helena Rasche
This material is licensed under the Creative Commons Attribution 4.0 International License
.