name: inverse layout: true class: center, middle, inverse
---
# Introduction to Galaxy
Authors:
Andrea Bagnacani
Bérénice Batut
Saskia Hiltemann
Anne Pajon
Nicola Soranzo
Helena Rasche
Christopher Barnett
Michele Maroni
Anne Fouilloux
Nadia Goué
Olha Nahorna
Dave Clements
last_modification
Updated: Mar 10, 2022
text-document
Plain-text slides
Tip:
press
P
to view the presenter notes |
arrow-keys
Use arrow keys to move between slides
??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- # What is Galaxy? --- [![Galaxy](../../../shared/images/galaxy_logo_25percent_transparent.png)](https://galaxyproject.org/) **Data Intensive *analysis* for everyone** - Versatile and reproducible workflows - **Web** platform - **Open source** under [Academic Free License](https://opensource.org/licenses/AFL-3.0) - Developed at Penn State, Johns Hopkins, OHSU and Cleveland Clinic with substantial outside contributions ![Galaxy resources](../images/titleless-use-resource-banner.png) ??? - The Galaxy Team is composed by bioinformaticians and software engineers - OHSU = Oregon Health & Science University - Workflows are similar to cooking recipes. This metaphor helps comprehension of workflows for users researchers, citizens and hobbyists that are not familiar with data intensive research. --- ### Core values - **Accessibility** - Users without programming experience can easily upload/retrieve data, run complex tools and workflows, and visualize data - **Reproducibility** - Galaxy captures information so that any user can understand and repeat a complete computational analysis - **Transparency** - Users can share or publish their analyses (histories, workflows, visualizations) - Pages: online Methods for your paper ??? **accessible** **reproducible** **transparent** research means *sharing everything*. The Galaxy framework aims to make as simple as possible for researchers to: - share their analyses - track all used tools and versions - check all parameters - justify each step in the analysis - publish the findings with all aforementioned information Pages: interactive, web-based documents that describe a complete analysis. --- ## Galaxy growth - More than 8,500 ready to use tools for users - More than 11,800 [citations](https://www.zotero.org/groups/1732893/galaxy) - More than 170 [public Galaxy resources](https://galaxyproject.org/use/) - - 130+ public servers, many more non-public - - Both general-purpose and domain-specific ??? - The number of public Galaxy resources should be regularly updated. ![Galaxy timeline](../images/galaxy_timeline.png) # Galaxy ecosystem ![Galaxy ecosystem](/training-material/shared/images/ecosystem_main_scheme.png) --- # User Interface ??? So now that we know what Galaxy and the Galaxy Project are all about, let's look at the Galaxy interface. --- ### Main Galaxy interface ![Galaxy user interface](../images/galaxy_interface.png) Home page divided into 3 panels --- ### Top menu ![Top menu](../images/galaxy_interface_menu.png) Link | Usage -- | -- <i class="fas fa-home" aria-hidden="true"></i><span class="visually-hidden">galaxy-home</span> (or *Analyze Data*) | go back to the homepage *Workflow* | access existing workflows or create new one using the editable diagrammatic pipeline *Visualize* | create new visualisations and launch Interactive Environments *Shared Data* | access data libraries, histories, workflows, visualizations and pages shared with you *Help* | links to Galaxy Help Forum (Q&A), Galaxy Community Hub (Wiki), and Interactive Tours *User* | your preferences and saved histories, datasets, pages and visualizations --- ### Tools ![Tool interface](../images/101_11.png) - The tool search helps in finding a tool in a crowded toolbox --- ### Tool interface .pull-left[.image-50[![Screenshot of the tool interface for the Sort tool showing some inputs and an execute button.](../images/galaxy_tool.png)]] .pull-right[ - A tool form contains: - input datasets and parameters - help, citations, metadata - an `Execute` button to start a job, which will add some output datasets to the history - New tool versions can be installed without removing old ones to ensure reproducibility ] ??? The tool form is generated from a simple XML file describing: - the input datasets and their datatypes - the tool parameters (numerical, text, boolean, selections, colour) - the dependencies required to run the tool - how to generate a command to execute the tool with the specified inputs and parameters - the output datasets the tool should produce and their datatypes - tests - help, citations - various metadata (e.g. the tool version) Tools can be viewed as tiny LEGO pieces: each one solves a specific problem, and they can be combined together to build complex analysis pipelines. --- ### Tool Shed .image-50[![Screenshot of the toolshed homepage](../images/galaxy_ts.png)] - Free "app" store: [Galaxy Tool Shed](https://toolshed.g2.bx.psu.edu/) - Thousands of tools already available - Most software can be integrated - If a tool is not available, ask the Galaxy community for help! - Only a Galaxy admin can install tools --- ### History - Location of all analyses <img style="float: right;" alt="History" src="../images/history.png" /> - collects all datasets produced by tools - collects all operations performed on the data - For each dataset (the heart of Galaxy’s reproducibility), the history tracks - name, format, size, creation time, datatype-specific metadata - tool id, version, inputs, parameters - standard output (`stdout`) and error (`stderr`) - state (<span style="background-color: grey">waiting</span>, <span style="background-color: yellow">running</span>, <span style="background-color: green">success</span>, <span style="background-color: red">failed</span>) - hidden, deleted, purged ??? - We say *datasets* to refer to files as well as databases - Purged means permanently deleted --- ### Multiple histories - You can have as many histories as you want - each history should correspond to a **different analysis** - and should have a meaningful **name** .image-75[![Screenshot of galaxy showing multiple histories next to each other in the multi-history view](../images/galaxy_interface_multiple_histories.png)] ??? - Give it a good name so you can find it later, which can otherwise become difficult when you have a large number of old histories. - You can drag and drop datasets between histories --- ### History options menu .pull-left[ History behavior is controlled by the *History options* (gear icon) ![History menu](../images/galaxy_interface_history_long_menu.png) ] .pull-right[ .image-75[![History options gear button](../images/galaxy_interface_history_menu.png)] - *Create new history* (+ icon) will **not** make your current history disappear - To see all of your histories, use the history switcher .image-75[![History Switcher](../images/galaxy_interface_history_switch.png)] - *Copy Datasets* from one history to another and save disk space for your quota ] ??? - Copying datasets between histories does not affect your quota, only a single copy of the file is stored on disk because datasets are never modified after creation. --- # Loading data ??? So now you know about the tools to manipulate data and the history where you can see your data, your inputs and outputs. Let's discuss how to get data into Galaxy --- ### Importing data - Copy/paste some text - Upload files from your local computer - Upload data from an internet URL - Upload data from online databases: UCSC, BioMart, ENCODE, modENCODE, Flymine etc. - Import from Shared Data (libraries, histories, pages) - Upload data from FTP See [Getting data into Galaxy](../../galaxy-interface/tutorials/get-data/slides.html) --- ### Datatypes - Tools only accept input datasets with the appropriate datatypes - When uploading a dataset, its datatype can be either: - automatically detected - assigned by the user - Datasets produced by a tool have their datatype assigned by the tool - To change the datatype of a dataset, either: - <i class="fas fa-pencil-alt" aria-hidden="true"></i><span class="visually-hidden">galaxy-pencil</span> *Edit attributes* and *Datatypes* (if original wrong), or - <i class="fas fa-pencil-alt" aria-hidden="true"></i><span class="visually-hidden">galaxy-pencil</span> *Edit attributes* and *Convert* ??? - When you upload data, Galaxy will try to autodetect the format of the data, but can sometimes get it wrong, so you may need to correct it later. - Edit Attributes → Datatype is used to fix a wrongly assigned datatype - Edit Attributes → Convert Formats creates a new dataset using a tool that converts the original dataset in the new format - New datatypes can be added to the Galaxy code base, if missing --- ### Reference datasets #### Example: reference Genome .pull-left[ - Genome build specifies which genome assembly a dataset is associated with - e.g. mm10, hg38... - Can be assigned by a tool or by the user - Users can create custom genome builds - New builds can be added by the admin ] .pull-right[ ![Genome Builds](../images/galaxy_interface_build.png) ] ??? - Just like datatypes, you can specify which genome assembly your dataset is about. Some tools need to know this, and Galaxy can tell the tool for you. --- # Workflows ??? Now that you've got data into Galaxy, you know you can use tools to manipulate this data, and histories to keep track of what you've done. You're only missing one key part: workflows. These help you easily reproduce the exact analysis that you ran. --- ### Workflow Editor ![Workflow interface](../images/RNA-Seq_workflow_editing.png) - **Extracted** from a history - **Built manually** by adding and configuring tools using the canvas - **Imported** using an existing shared workflow ??? Biologist: - workflows are great - single button to run all of these 50 different tools - a lot of work once to figure out analysis, but easy in the future to just rerun, go get coffee and wait for thing to be done :) Bioinf / dev: - Boxes are workflow steps - 2 types: *input* and *tool* steps - Steps are connected by arrows representing the flow of datasets - Tool panel on the left with Inputs on top (to add input datasets and collections) - Small tool form on the right - Extracting a workflow from a history allows to easily convert an existing history into an analysis workflow --- ### Why would you want to create workflows? - **Re-run** the same analysis on different input data sets - **Change parameters** before re-running a similar analysis - Make use of the workflow job **scheduling** - jobs are submitted as soon as their inputs are ready - Create **sub**-workflows: a workflow inside another workflow - **Share** workflows for publication and with the community ??? Potential information overload for newbies --- ### Visualizations .image-75[![Examples of 6 different visualisations in galaxy features an MSA, a couple graphs, RNA structure, and 3d visualisations of proteins](../../dev/images/charts_examples2.png)] - Datatypes know what tools can be used to visualize datasets: - Sequencing data has a button for visualizing in IGV - Tabular data will prompt you to build charts - Protein data can be seen in a 3D viewer - Interactive environments: Jupyter, RStudio, etc --- ### Sharing data - Share everything you do in Galaxy - histories, workflows, and visualizations - Directly using a Galaxy account's email addresses on the same instance - Using a web link, with anyone who knows the link - Using a web link and publishing it to make it accessible to everyone from the *Shared Data* menu See [Sharing your History in Galaxy](/training-material/faqs/galaxy/histories_sharing.html) --- ### Community - Support forum: [Galaxy Help](https://help.galaxyproject.org/) ![Galaxy Help](../images/galaxy_help.png) - Community curated documentation: [Galaxy Community Hub](https://galaxyproject.org/) - [Events](https://galaxyproject.org/events/) all around the world - Galaxy Training for scientists, developers, admins, instructors: [Galaxy Training Community](/training-material/) - Training questions? Chat with us on [Gitter](https://gitter.im/Galaxy-Training-Network/Lobby) ??? - know was a lot - we'll come back - slides are always available online - first real analysis after the coffee? --- ## Related tutorials --- ## Thank You! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Authors:
Andrea Bagnacani
Bérénice Batut
Saskia Hiltemann
Anne Pajon
Nicola Soranzo
Helena Rasche
Christopher Barnett
Michele Maroni
Anne Fouilloux
Nadia Goué
Olha Nahorna
Dave Clements
This material is licensed under the Creative Commons Attribution 4.0 International License
.