Frequently Asked Questions

Account


Can I create multiple Galaxy accounts?

  • You ARE NOT allowed to create more than 1 account per Galaxy server.
  • You ARE allowed to have accounts on different servers.

For example, you are allowed to have 1 account on Galaxy US, and another account on Galaxy EU, but never 2 accounts on the same Galaxy.

WARNING: Having multiple accounts is a violation of the terms of service, and may result in deletion of your accounts.


Need more disk space?


Other tips:

  • Forgot your password? You can request a reset link on the login page.
  • If you want to associate your account with a different email address, you can do so under User -> Preferences in the top menu bar.
  • To start over with a new account, delete your existing account(s) before creating a new one. This can be done in the User -> Preferences menu in the top bar.

Changing account email or password

  1. Make sure you are logged in to Galaxy.
  2. Go to User > Preferences in the top menu bar.
  3. To change email and public name, click on Manage Information and to change password, click on Change Password.
  4. Make the changes and click on the Save button at the bottom.
  5. To change email successfully, verify your account by email through the activation link sent by Galaxy.

Note: Don’t open another account if your email changes; update the existing account’s email instead. A new account will be detected as a duplicate, which can get your accounts disabled and deleted.

How can I reduce quota usage while still retaining prior work (data, tools, methods)?

  • Download Datasets as individual files or entire Histories as an archive. Then purge them from the public server.
  • Transfer/Move Datasets or Histories to another Galaxy server, including your own Galaxy. Then purge.
  • Copy your most important Datasets into a new/other History (inputs, results), then purge the original full History.
  • Extract a Workflow from the History, then purge it.
  • Back-up your work. It is a best practice to download an archive of your FULL original Histories periodically, even those still in use, as a backup.

Resources: Much discussion about all of the above options can be found at the Galaxy Help forum.

How do I create an account on a public Galaxy instance?

  1. To create an account at any public Galaxy instance, choose your server from the list of available Galaxy platforms.

    There are 3 main public Galaxy servers: UseGalaxy.org, UseGalaxy.eu, and UseGalaxy.org.au.

  2. Click on “Login or Register” in the masthead on the server.

    Login or Register on the top panel

  3. Click on Register here and fill in the required information.
  4. Click on the Create button; your account is now created.
  5. Check for a Confirmation Email in the email you used for account creation.
  6. Click on the Email confirmation link to fully activate your account.

How to update account preferences?

  1. Log in to Galaxy
  2. Navigate to User > Preferences on the top menu bar.
  3. Here you can update various preferences, such as:
    • pref-info Manage Information (edit your email addresses, custom parameters, or change your public name)
    • pref-password Change Password
    • pref-identities Manage Third-Party Identities (connect or disconnect access to your third-party identities)
    • pref-permissions Set Dataset Permissions for New Histories (grant others default access to newly created histories; changes made here will only affect histories created after these settings have been stored)
    • pref-dataprivate Make All Data Private
    • pref-apikey Manage API Key (access your current API key or create a new one)
    • pref-cloud Manage Cloud Authorization (add or modify the configuration that grants Galaxy access to your cloud-based resources)
    • pref-toolboxfilters Manage Toolbox Filters (customize your Toolbox by displaying or omitting sets of Tools)
    • pref-custombuilds Manage Custom Builds (add or remove custom builds using history datasets)
    • pref-signout Sign out of Galaxy (signs you out of all sessions)
    • pref-notifications Enable notifications (allow push and tab notifications on job completion)
    • pref-delete Delete Account (on this Galaxy server)

Analysis


Adding a custom database/build (dbkey)

Galaxy may have several reference genomes built-in, but you can also create your own.
  • In the top menu bar, go to User and select Custom Builds
  • Choose a name for your reference build
  • Choose a dbkey for your reference build
  • Under Definition, select the option FASTA-file from history
  • Under FASTA-file, select your fasta file
  • Click the Save button

Beware of Cuts

Galaxy has several different cut tools
Warning: Beware of Cuts

The section below uses the Cut tool. There are two Cut tools in Galaxy for historical reasons. This example uses the tool with the full name Cut columns from a table (cut); the same logic applies to the other tool, which simply has a slightly different interface.
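
For orientation only, the Galaxy Cut tools behave much like the Unix cut command. A minimal command-line sketch, where the file name and column numbers are purely illustrative:

  # Keep columns 1 and 3 of a tab-separated table, roughly what the Galaxy
  # tool "Cut columns from a table (cut)" does with the column list "c1,c3".
  cut -f 1,3 input.tsv > columns_1_and_3.tsv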

Extended Help for Differential Expression Analysis Tools

The error and usage help in this FAQ applies to:

  • DESeq2
  • limma
  • edgeR
  • goseq
  • DEXSeq
  • DiffBind
  • StringTie
  • featureCounts
  • HTSeq
  • Kallisto
  • Salmon
  • Sailfish
  • DEXSeq-Count

Expect odd errors or content problems if any of the usage requirements below are not met:

  • Differential expression tools all require count dataset replicates when used in Galaxy. At least two per factor level and the same number per factor level. These must all contain unique content.
  • Factor/Factor level names should only contain alphanumeric characters and optionally underscores. Avoid starting these with a number and do not include spaces.
  • If the tool uses Conditions, the same naming requirements apply. DEXSeq additionally requires that the first Condition is labeled as Condition.
  • Reference annotation should be in GTF format for most of these tools, with no header/comment lines. Remove all GTF header lines with the tool Remove beginning of a file. If any comment lines are internal to the file, those should be removed as well; the tool Select can be used.
  • Make sure that if a GTF dataset is used, and tool form settings are expecting particular attributes, those are actually in your annotation file (example: gene_id).
  • GFF3 data (when accepted by a tool) should have a single # comment line; any other comment lines (at the start or internal), which usually start with ##, should be removed. The tool Select can be used.
  • If a GTF dataset is not available for your genome, a two-column tabular dataset containing transcript <tab> gene can be used instead with most of these tools (see the sketch after this list). Some reformatting of a different annotation file type might be needed. Tools in the groups under GENERAL TEXT TOOLS can be used.
  • Make sure that if your count inputs have a header, the option Files have header? is set to Yes. If no header, set to No.
  • Custom genomes/transcriptomes/exomes must be formatted correctly before mapping.
  • Any reference annotation should be an exact match for any genome/transcriptome/exome used for mapping. Build and version matter.
  • Avoid using UCSC’s annotation extracted from their Table Browser. All GTF datasets from the UCSC Table Browser have the same content populated for the transcript_id and gene_id values: both contain the transcript_id. This creates scientific content problems, effectively meaning that the counts will be summarized “by transcript” and not “by gene”, even if labeled in a tool’s output as being “by gene”. It is usually possible to extract gene/transcript mappings in tabular format from other related tables. Review the Table Browser usage at UCSC for how to link/extract data, or ask them for guidance if you need extra help to get this information for a specific data track.
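
If you need the two-column transcript <tab> gene mapping mentioned above and only have a GTF, a rough command-line sketch is below. It assumes the GTF contains "transcript" feature lines and standard gene_id/transcript_id attributes; adjust the feature name (e.g. to "exon") and file names for your data:

  # Extract "transcript <tab> gene" pairs from the GTF attribute column (column 9).
  awk -F'\t' '$3 == "transcript" {
    match($9, /transcript_id "[^"]+"/); t = substr($9, RSTART+15, RLENGTH-16);
    match($9, /gene_id "[^"]+"/);       g = substr($9, RSTART+9,  RLENGTH-10);
    print t "\t" g
  }' annotation.gtf | sort -u > transcript_to_gene.tsv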

Note: Selected genomes at UCSC do have a reference annotation GTF pre-computed and available with a Gene Symbol populated into the “gene_id” value. Find these in the UCSC “Downloads” area. When available, the link can be directly copy/pasted into the Upload tool in Galaxy. Allow Galaxy to autodetect the datatype to produce an uncompressed GTF dataset in your history, ready to use with tools.

My jobs aren't running!

  1. Please make sure you are logged in. At the top menu bar, you should see a section labeled “User”. If you see “Login/Register” here you are not logged in.

  2. Activate your account. If you have recently registered your account, you may first have to activate it. You will receive an e-mail with an activation link.
    • Make sure to check your spam folder!
  3. Be patient. Galaxy is a free service; when a lot of people are using it, you may have to wait longer than usual (especially for ‘big’ jobs, e.g. alignments).

  4. Contact Support. If you really think something is wrong with the server, you can ask for support.

Reporting usage problems, security issues, and bugs

  • For reporting Usage Problems, related to tools and functions, head to the Galaxy Help site.
    • Red Error Datasets:
    • Unexpected results in Green Success Dataset:
      • To resolve it you may be asked to send in a shared history link and possibly a shared workflow link. For sharing your history, refer to this link.
      • To reach our support team, visit Support FAQs.
    • Functionality problems:
      • Using Galaxy Help is the best way to get help in most cases.
      • If the problem is more complex, email a description of the problem and how to reproduce it.
    • Administrative problems:
      • If the problem is present in your own Galaxy, the administrative configuration may be a factor.
      • For the fastest help directly from the development community, admin issues can be alternatively reported to the mailing list or the GalaxyProject Gitter channel.
  • For Security Issues, do not report them via GitHub. Kindly disclose these as explained in this document.
  • For Bug Reporting, create a Github issue. Include the steps mentioned here.
  • Use GTN Search to find prior Q&A, FAQs, tutorials, and other documentation across all Galaxy resources, and to check whether your issue has already been encountered by someone else.

Results may vary

Comment: Results may vary

Your results may be slightly different from the ones presented in this tutorial due to differing versions of tools, reference data, external databases, or because of stochastic processes in the algorithms.

Troubleshooting errors

When you get a red dataset in your history, it means something went wrong. But how can you find out what it was? And how can you report errors?

When something goes wrong in Galaxy, there are a number of things you can do to find out what it was. Error messages can help you figure out whether it was a problem with one of the settings of the tool, or with the input data, or maybe there is a bug in the tool itself and the problem should be reported. Below are the steps you can follow to troubleshoot your Galaxy errors.

  1. Expand the red history dataset by clicking on it.
    • Sometimes you can already see an error message here
  2. View the error message by clicking on the bug icon galaxy-bug

  3. Check the logs. Output (stdout) and error logs (stderr) of the tool are available:
    • Expand the history item
    • Click on the details icon
    • Scroll down to the Job Information section to view the 2 logs:
      • Tool Standard Output
      • Tool Standard Error
    • For more information about specific tool errors, please see the Troubleshooting section
  4. Submit a bug report if you are still unsure what the problem is.
    • Click on the bug icon galaxy-bug
    • Write down any information you think might help solve the problem
      • See this FAQ on how to write good bug reports
    • Click galaxy-bug Report button
  5. Ask for help!

Will my jobs keep running?

Galaxy is a fantastic system, but some users find themselves wondering:

Will my jobs keep running once I’ve closed the tab? Do I need to keep my browser open?

No, you don’t! You can safely:

  1. Start jobs
  2. Shut down your computer

and your jobs will keep running in the background! Whenever you next visit Galaxy, you can check if your jobs are still running or completed.

However, this is not true for uploads from your computer: you must wait for an upload from your local machine to finish. (Uploads via URL are not affected; if you’re uploading from a URL you can shut down your computer.)


Collections


Adding a tag to a collection

  • Click on the collection
  • Add a tag starting with # in the Add tags field

    Tags starting with # will be automatically propagated to the outputs of tools using this collection.

  • Press Enter
  • Check that the tag is appearing below the collection name

Creating a dataset collection

  • Click on Operations on multiple datasets (check box icon) at the top of the history panel Operations on multiple datasets button
  • Check all the datasets in your history you would like to include
  • Click For all selected.. and choose Build dataset list

    build list collection menu item

  • Enter a name for your collection
  • Click Create List to build your collection
  • Click on the checkmark icon at the top of your history again

Creating a paired collection

  • Click on Operations on multiple datasets (check box icon) at the top of the history panel Operations on multiple datasets button
  • Check all the datasets in your history you would like to include
  • Click For all selected.. and choose Build List of Dataset Pairs

  • Change the text of unpaired forward to a common selector for the forward reads
  • Change the text of unpaired reverse to a common selector for the reverse reads
  • Click Pair these datasets for each valid forward and reverse pair.
  • Enter a name for your collection
  • Click Create List to build your collection
  • Click on the checkmark icon at the top of your history again

Renaming a collection

  1. Click on the collection
  2. Click on the name of the collection at the top
  3. Change the name
  4. Press Enter

Data upload


Data retrieval with “NCBI SRA Tools” (fastq-dump)

This section will guide you through downloading experimental metadata, organizing the metadata to short lists corresponding to conditions and replicates, and finally importing the data from NCBI SRA in collections reflecting the experimental design.

Downloading metadata

  • It is critical to understand the condition/replicate structure of an experiment before working with the data so that it can be imported as collections ready for analysis. Direct your browser to SRA Run Selector and in the search box enter GEO data set identifier (for example: GSE72018). Once the study appears, click the box to download the “RunInfo Table”.

Organizing metadata

  • The “RunInfo Table” provides the experimental condition and replicate structure of all of the samples. Prior to importing the data, we need to parse this file into individual files that contain the sample IDs of the replicates in each condition. This can be achieved by using a combination of the ‘group’, ‘compare two datasets’, ‘filter’, and ‘cut’ tools to end up with single column lists of sample IDs (SRRxxxxx) corresponding to each condition.
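
The same parsing can also be sketched on the command line. The file name, condition label, and column numbers below are illustrative and depend on your RunInfo Table, so check its header first:

  # Print the header to locate the Run (SRR) column and the condition column.
  head -1 SraRunTable.tsv
  # Write one single-column list of SRR IDs per condition
  # (use -F',' instead if your table is comma-separated).
  awk -F'\t' '$10 == "treated" {print $1}' SraRunTable.tsv > treated_SRR_ids.txt
  awk -F'\t' '$10 == "control" {print $1}' SraRunTable.tsv > control_SRR_ids.txt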

Importing data

  • Provide the files with SRR IDs to NCBI SRA Tools (fastq-dump) to import the data from SRA to Galaxy. By organizing the replicates of each condition in separate lists, the data will be imported as “collections” that can be directly loaded to a workflow or analysis pipeline.

Directly obtaining UCSC sourced *genome* identifiers

Option 1

  1. Go to UCSC Genome Browser, navigate to “genomes”, then the species of interest.
  2. On the home page for the genome build, immediately under the top navigation box, in the blue bar next to the full genome build name, you will find View sequences button.
  3. Click on the View sequences button and it will take you to a detail page with a table listing out the contents.

Option 2

  1. Use the tool Get Data -> UCSC Main.
  2. In the Table Browser, choose the target genome and build.
  3. For “group” choose the last option “All Tables”.
  4. For “table” choose “chromInfo”.
  5. Leave all other options at default and send the output to Galaxy.
  6. This new dataset will load as a tabular dataset into your history.
  7. It will list out the contents of the genome build, including the chromosome identifiers (in the first column).

How can I upload data using EBI-SRA?

  1. Search for your data directly in the tool and use the Galaxy links.
  2. Be sure to check your sequence data for correct quality score formats and the metadata “datatype” assignment.

Importing data from a data library

As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

  • Go into Shared data (top panel) then Data libraries
  • Navigate to the correct folder as indicated by your instructor
  • Select the desired files
  • Click on the To History button near the top and select as Datasets from the dropdown menu
  • In the pop-up window, select the history you want to import the files to (or create a new one)
  • Click on Import

Importing via links

  • Copy the link location
  • Open the Galaxy Upload Manager (galaxy-upload on the top-right of the tool panel)

  • Select Paste/Fetch Data
  • Paste the link into the text field

  • Press Start

  • Close the window

NCBI SRA sourced fastq data

In these FASTQ data:

  • The quality score identifier (+) is sometimes not a match for the sequence identifier (@).
  • The forward and reverse reads may be interlaced and need to be separated into distinct datasets.
  • Both may be present in a dataset. Correct the first, then the second, as explained below.
  • Format problems of any kind can cause tool failures and/or unexpected results.
  • Fix the problems before running any other tools (including FastQC, Fastq Groomer, or other QA tools)

For inconsistent sequence (@) and quality (+) identifiers

  • Correct the format by running the tool Replace Text in entire line with these options:

    • Find pattern: ^\+SRR.+
    • Replace with: +

Note: If the quality score line is named like “+ERR” instead (or other valid options), modify the pattern search to match.
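
Outside Galaxy, the same correction can be sketched with sed; the file names are illustrative, and the pattern should be adjusted as described in the note above:

  # Reset every "+SRR..." quality-score separator line to a bare "+".
  sed 's/^+SRR.*/+/' input.fastq > fixed.fastq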

For interlaced forward and reverse reads

Solution 1 (reads named /1 and /2)

  • Use the tool FASTQ de-interlacer on paired end reads

Solution 2 (reads named /1 and /2)

  • Create distinct datasets from an interlaced fastq dataset by running the tool Manipulate FASTQ reads on various attributes on the original dataset. It will run twice.

Note: The solution does NOT use the FASTQ Splitter tool. The data to be manipulated are interlaced sequences. This is different in format from data that are joined into a single sequence.

  • Use the Manipulate FASTQ settings to produce a dataset that contains the /1 reads

    Match Reads

    • Match Reads by Name/Identifier
    • Identifier Match Type Regular Expression
    • Match by .+/2

    Manipulate Reads

    • Manipulate Reads by Miscellaneous Actions
    • Miscellaneous Manipulation Type Remove Read
  • Use these Manipulate FASTQ settings to produce a dataset that contains the /2 reads

    • Exact same settings as above except for this change: Match by .+/1

Solution 3 (reads named /1 and /3)

  • Use the same operations as in Solution 2 above, except change the first Manipulate FASTQ query term to be:
  • Match by .+/3

Solution 4 (reads named without /N)

  • If your data has differently formatted sequence identifiers, the “Match by” expression from Solution 2 above can be modified to suit your identifiers.

Alternative identifiers such as:

@M00946:180:000000000-ANFB2:1:1107:14919:14410 1:N:0:1
@M00946:180:000000000-ANFB2:1:1107:14919:14410 2:N:0:1
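
As a command-line alternative to the Galaxy tools above, a strictly interlaced FASTQ (four lines per record, with each forward record immediately followed by its mate) can be split with a short awk sketch; the file names are illustrative:

  awk '{ rec = rec $0 ORS }
       NR % 4 == 0 {
         # Every 4 lines is one read; odd records go to file 1, even records to file 2.
         if (++pair % 2 == 1) printf "%s", rec > "reads_1.fastq"
         else                 printf "%s", rec > "reads_2.fastq"
         rec = ""
       }' interlaced.fastq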

Upload fastqsanger datasets via links

  1. Click on Upload Data on the top of the left panel:

    UploadDataButton

  2. Click on Paste/Fetch:

    PasteFetchButton

  3. Paste URL into text box that would appear:

    PasteFetchModal

  4. Set Type (set all) to fastqsanger or, if your data is compressed as in URLs above (they have .gz extensions), to fastqsanger.gz

    ChangeTypeDropDown:

Upload few files (1-10)

  1. Click on Upload Data on the top of the left panel
  2. Click on Choose local file and select the files or drop the files in the Drop files here part
  3. Click on Start
  4. Click on Close

Upload many files (>10) via FTP

  1. Make sure to have an FTP client installed

    There are many options. We can recommend FileZilla, a free FTP client that is available on Windows, MacOS, and Linux.

  2. Establish FTP connection to the Galaxy server
    1. Provide the Galaxy server’s FTP server name (e.g. usegalaxy.org, ftp.usegalaxy.eu)
    2. Provide the username (usually the email address) and the password on the Galaxy server
    3. Connect
  3. Add the files to the FTP server by dragging/dropping them or right clicking on them and uploading them

    The FTP transfer will start. Wait until it is done.

  4. Open the Upload menu on the Galaxy server
  5. Click on Choose FTP file on the bottom
  6. Select files to import into the history
  7. Click on Start
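
If you prefer to script the transfer instead of using a graphical FTP client, curl can upload over FTP as well. A minimal sketch: the file name is illustrative, the FTP host differs per Galaxy server (see step 2 above), and curl will prompt for your Galaxy password:

  curl -T my_reads.fastq.gz --user 'you@example.org' ftp://ftp.usegalaxy.eu/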

Datasets


Adding a tag

Tags can help you to better organize your history and track datasets.
  • Click on the dataset
  • Click on galaxy-tags Edit dataset tags
  • Add a tag starting with #

    Tags starting with # will be automatically propagated to the outputs of tools using this dataset.

  • Check that the tag is appearing below the dataset name

Changing database/build (dbkey)

You can tell Galaxy which dbkey (e.g. reference genome) your dataset is associated with. This may be used by tools to automatically use the correct settings.
  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, change the Database/Build field
  • Select your desired database key from the dropdown list
  • Click the Save button

Changing the datatype

Galaxy will try to autodetect the datatype of your files, but you may need to manually set this occasionally.
  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, click on the galaxy-chart-select-data Datatypes tab on the top
  • Select your desired datatype
    • tip: you can start typing the datatype into the field to filter the dropdown menu
  • Click the Save button

Converting the file format

Some datasets can be transformed into a different format. Galaxy has some built-in file conversion options depending on the type of data you have.
  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, click on the galaxy-gear Convert tab on the top
  • Select the appropriate datatype from the list
  • Click the Create dataset button to start the conversion.

Creating a new file

Galaxy allows you to create new files from the upload menu. You can supply the contents of the file.
  • Open the Galaxy Upload Manager
  • Select Paste/Fetch Data
  • Paste the file contents into the text field

  • Press Start and Close the window

Datasets not downloading at all

  1. Check to see if pop-ups are blocked by your web browser. Where to check can vary by browser and extensions.
  2. Double check your API key, if used. Go to User > Preferences > Manage API key.
  3. Check the sharing/permission status of the Datasets. Go to Dataset > Pencil icon galaxy-pencil > Edit attributes > Permissions. If you do not see a “Permissions” tab, then you are not the owner of the data.

Notes:

  • If the data was shared with you by someone else from a Shared History, or was copied from a Published History, be aware that there are multiple levels of data sharing permissions.
  • All data are set to not shared by default.
  • Datasets sharing permissions for a new history can be set before creating a new history. Go to User > Preferences > Set Dataset Permissions for New Histories.
  • User > Preferences > Make all data private is a “one click” option to unshare ALL data (Datasets, Histories). Note that once confirmed and all data is unshared, the action cannot be “undone” in batch, even by an administrator. You will need to re-share data again and/or reset your global sharing preferences as wanted.
  • Only the data owner has control over sharing/permissions.
  • Any data you upload or create yourself is automatically owned by you with full access.
  • You may not have been granted full access if the data were shared or imported, and someone else is the data owner (your copy could be “view only”).
  • After you have a fully shared copy of any shared/published data from someone else, then you become the owner of that data copy. If the other person or you make changes, it applies to each person’s copy of the data, individually and only.
  • Histories can be shared with included Datasets. Datasets can be downloaded/manipulated by others or viewed by others.
  • Sharing access to Datasets is distinct from, but related to, access to Histories.

Detecting the datatype (file format)

  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, click on the galaxy-chart-select-data Datatypes tab on the top
  • Click the Auto-detect button to have Galaxy try to autodetect it.

Different dataset icons and their usage

Icons provide a visual experience for objects, actions, and ideas

Dataset icons and their usage:

  • galaxy-eye “Eye icon”: Display data of the job in the browser.
  • galaxy-pencil “Pencil icon”: Edit attributes of the job.
  • galaxy-cross “‘X’ icon”: Delete the job.
  • galaxy-info “Info icon”: Job details and run information.
  • galaxy-refresh “Refresh/Rerun icon”: Run this (selected) job again or examine original submitted form (filled in).
  • galaxy-bug “Bug icon”: Review and optionally submit a bug report.

Downloading datasets

  1. Click on the dataset in your history to expand it
  2. Click on the Download icon galaxy-save to save the dataset to your computer.

Downloading datasets using command line

From the terminal window on your computer, you can use wget or curl.

  1. Make sure you have wget or curl installed.
  2. Click on the Dataset name, then click on the copy link icon galaxy-link. This is the direct-downloadable dataset link.
  3. Once you have the link, use any of the following commands:
    • For wget

      wget '<link>'
      wget -O outfile '<link>' # save under a custom file name
      wget --no-check-certificate '<link>' # ignore SSL certificate warnings
      wget -c '<link>' # continue an interrupted download

    • For curl

      curl -o outfile '<link>'
      curl -o outfile --insecure '<link>' # ignore SSL certificate warnings
      curl -C - -o outfile '<link>' # continue an interrupted download

  4. For dataset collections and datasets within collections you have to supply your API key with the request
    • Sample commands for wget and curl respectively are:

      wget https://usegalaxy.org/api/dataset_collections/d20ad3e1ccd4595de/download?key=MYSECRETAPIKEY

      curl -o myfile.txt https://usegalaxy.org/api/dataset_collections/d20ad3e1ccd4595de/download?key=MYSECRETAPIKEY

Finding BAM dataset identifiers

Quickly learn what the identifiers are in any BAM dataset that is the result of mapping:
  1. Run Samtools: IdxStats on the aligned data (bam dataset).
  2. The “index header” chromosome names and lengths will be listed in the output (along with read counts).
  3. Compare the chromosome identifiers to the chromosome (aka “chrom”) field in all other inputs: VCF, GTF, GFF(3), BED, Interval, etc.

Note:

  • The original mapping target may have been a built-in genome index, custom genome (transcriptome, exome, other) – the same bam data will still be summarized.
  • This method will not work for “sequence-only” bam datasets, as these usually have no header.
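
The same check can be done outside Galaxy with samtools, assuming it is installed and the BAM is coordinate-sorted; the file name is illustrative:

  samtools index aligned.bam                  # create the BAM index if it does not exist yet
  samtools idxstats aligned.bam | cut -f 1,2  # reference name <tab> length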

Finding Datasets

  • To review all active Datasets in your account, go to User > Datasets.

Notes:

  • Logging out of Galaxy while the Upload tool is still loading data can cause uploads to abort. This is most likely to occur when a dataset is loaded by browsing local files.
  • If you have more than one browser window open, each with a different Galaxy History loaded, the Upload tool will load data into the most recently used history.
  • Click on refresh icon galaxy-refresh at the top of the History panel to display the current active History with the datasets.

How to unhide "hidden datasets"?

If you have run a workflow with hidden datasets, in your History:

  • Click the gear icon galaxy-gear → Click Unhide Hidden Datasets
  • Or use the toggle hidden to view them

When using the Copy Datasets feature, hidden datasets will not be available to transfer from the Source History list of datasets. To include them:

  1. Click the gear icon galaxy-gear → Click Unhide Hidden Datasets
  2. Click the gear icon galaxy-gear → Click Copy Datasets

Mismatched Chromosome identifiers and how to avoid them

  • The methods listed here help to identify and correct errors or unexpected results linked to inputs having non-identical chromosome identifiers and/or different chromosome sequence content.

  • If using a Custom Reference genome, the methods below also apply, but the first step is to make certain that the Custom Genome is formatted correctly. Improper formatting is the most common root cause of CG related errors.

Method 1: Finding BAM dataset identifiers

Method 2: Directly obtaining UCSC sourced genome identifiers

Method 3: Adjusting identifiers for UCSC sourced data used with other sourced data

Method 4: Adjusting identifiers or input source for any mixed sourced data

A Note on Built-in Reference Genomes

  • The default variant for all genomes is “Full”, defined as all primary chromosomes (or scaffolds/contigs) including mitochondrial plus associated unmapped, plasmid, and other segments.
  • When only one version of a genome is available for a tool, it represents the default “Full” variant.
  • Some genomes will have more than one variant available.

    • The “Canonical Male” or sometimes simply “Canonical” variant contains the primary chromosomes for a genome. For example a human “Canonical” variant contains chr1-chr22, chrX, chrY, and chrM.
    • The “Canonical Female” variant contains the primary chromosomes excluding chrY.

Moving datasets between Galaxy servers

On the origin Galaxy server:

  1. Click on the name of the dataset to expand the info.
  2. Click on the Copy link icon galaxy-link.

On the destination Galaxy server:

  1. Click on Upload data > Paste / Fetch Data and paste the link. Select attributes, such as genome assembly, if required. Hit the Start button.

Note: The copy link icon galaxy-link cannot be used to move HTML or SQLite datasets (but these can be downloaded using the download button galaxy-save).

Purging datasets

  1. All account Datasets can be reviewed under User > Datasets.
  2. To permanently delete: use the link from within the dataset, or use the Operations on Multiple Datasets functions, or use the Purge Deleted Datasets option in the History menu.

Notes:

  • Within a History, deleted/permanently deleted Datasets can be reviewed by toggling the deleted link at the top of the History panel, found immediately under the History name.
  • Both active (shown by default) and hidden (the other toggle link, next to the deleted link) datasets can be reviewed the same way.
  • Click on the far right “X” to delete a dataset.
  • Datasets in a deleted state are still part of your quota usage.
  • Datasets must be purged (permanently deleted) to not count toward quota.

Quotas for datasets and histories

  • Deleted datasets and deleted histories containing datasets are considered when calculating quotas.
  • Permanently deleted datasets and permanently deleted histories containing datasets are not considered.
  • Histories/datasets that are shared with you are only partially considered unless you import them.

Note: To reduce quota usage, refer to How can I reduce quota usage while still retaining prior work (data, tools, methods)? FAQ.

Renaming a dataset

  • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
  • In the central panel, change the Name field
  • Click the Save button

Understanding job statuses

Job statuses help you understand which stage of processing each of your jobs is in.

  • Green: The job was completed successfully.
  • Yellow: The job is executing. Allow it to complete! Jobs that exceed the maximum allowed runtime will fail with a “wall-time” error and turn red.
  • Grey: The job is being evaluated to run (new dataset) or is queued. Allow this to complete.
  • Red: The job has failed.
  • Light Blue: The job is paused. This indicates either an input has a problem or that you have exceeded the disk quota set by the administrator of the Galaxy instance you are working on.
  • Grey, Yellow, Grey again: The job is waiting to run due to admin re-run or an automatic fail-over to a longer-running cluster.
  • Bright blue with moving arrow: May be found in earlier Galaxy versions. Applies to the “Get Data → Upload File” tool only - the upload job is queuing or running.

It is essential to allow queued jobs to remain queued and not delete/re-run them.

Working with GFF, GTF, GTF2, and GFF3 reference annotation

  • All annotation datatypes have a distinct format and content specification.
    • Data providers may release variations of any, and tools may produce variations.
    • GFF3 data may be labeled as GFF.
    • Content can overlap but is generally not understood by tools that are expecting just one of these specific formats.
  • Best practices
    • The sequence identifiers must exactly match between the reference annotation and the reference genome/transcriptome/exome.
    • Most tools expect GTF format unless the tool form specifically notes otherwise.
      • Get the GTF version from the data providers if it is available.
      • If only GFF3 is available, you can attempt to transform it with the tool gffread.
    • Was GTF data detected as GFF during Upload? It probably has headers. Remove the headers (lines that start with a “#”) with the Select tool, using the option “NOT Matching” with the regular expression ^# (see the command-line sketch after this list).
    • UCSC annotation
      • Find annotation under their Downloads area. The path will be similar to: https://hgdownload.soe.ucsc.edu/goldenPath/<database>/bigZips/genes/
      • Copy the URL from UCSC and paste it into the Upload tool, allowing Galaxy to detect the datatype.
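
A command-line equivalent of the header-removal step above, with illustrative file names:

  # Drop header/comment lines (those starting with "#") from a GTF.
  grep -v '^#' with_headers.gtf > no_headers.gtf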

Working with deleted datasets

Deleted datasets and histories can be recovered by users as they are retained in Galaxy for a time period set by the instance administrator. Deleted datasets can be undeleted or permanently deleted within a History. Links to show/hide deleted (and hidden) datasets are at the top of the History panel.

  • To review or adjust an individual dataset:
    1. Click on the name to expand it.
    2. If it is only deleted, but not permanently deleted, you’ll see a message with links to recover or to purge.
      • Click on Undelete it to recover the dataset, making it active and accessible to tools again.
      • Click on Permanently remove it from disk to purge the dataset and remove it from the account quota calculation.
  • To review or adjust multiple datasets in batch:
    1. Click on the checked box icon galaxy-selector near the top right of the history panel to switch into “Operations on Multiple Datasets” mode.
    2. Check the selection box for each dataset you want to modify, then choose your option (show, hide, delete, undelete, purge, or group datasets).

Working with very large fasta datasets

  • Run FastQC on your data to make sure the format/content is what you expect. Run more QA as needed.
    • Search GTN tutorials with the keyword “qa-qc” for examples.
    • Search Galaxy Help with the keywords “qa-qc” and “fasta” for more help.
  • Assembly result?
    • Consider filtering by length to remove reads that did not assemble.
    • Formatting criteria:
      • All sequence identifiers must be unique.
      • Some tools will require that there is no description line content, only identifiers, in the fasta title line (“>” line). Use NormalizeFasta to remove the description (all content after the first whitespace) and wrap the sequences to 80 bases.
  • Custom genome, transcriptome, or exome?
    • Only appropriate for smaller genomes (bacterial, viral, most insects).
    • Not appropriate for any mammalian genomes, or some plants/fungi.
    • Sequence identifiers must be an exact match with all other inputs or expect problems. See Working with GFF, GTF, GTF2, and GFF3 reference annotation above.
    • Formatting criteria:
      • All sequence identifiers must be unique.
      • ALL tools will require that there is no description content, only identifiers, in the fasta title line (“>” line). Use NormalizeFasta to remove the description (all content after the first whitespace) and wrap the sequences to 80 bases (a command-line sketch of the description removal follows this list).
      • The only exception is when executing the MakeBLASTdb tool and when the input fasta is in NCBI BLAST format (see the tool form).
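
A minimal command-line sketch of the description-removal part of that clean-up (it does not re-wrap sequence lines to 80 bases; file names are illustrative):

  # Keep only the identifier (text before the first whitespace) on each ">" title line.
  sed '/^>/ s/[[:space:]].*//' input.fasta > identifiers_only.fasta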

Working with very large fastq datasets

  • Run FastQC on your data to make sure the format/content is what you expect. Run more QA as needed.
    • Search GTN tutorials with the keyword “qa-qc” for examples.
    • Search Galaxy Help with the keywords “qa-qc” and “fastq” for more help.
  • To create a single smaller input, search the tool panel with the keyword “subsample” for tool choices (an outside-Galaxy sketch follows this list).
  • To create multiple smaller inputs, start with Split file to dataset collection, then merge the results back together using a tool specific to the datatype. Example: BAM results? Use MergeSamFiles.
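
Outside Galaxy, one common way to subsample is seqtk, assuming it is installed; the seed, read count, and file names are illustrative:

  # Draw a reproducible random sample of 10,000 reads.
  seqtk sample -s 100 big_reads.fastq.gz 10000 > subsample_10k.fastq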

Datatypes


Best practices for loading fastq data into Galaxy

  • As of release 17.09, fastq data will have the datatype fastqsanger auto-detected when that quality score scaling is detected and “autodetect” is used within the Upload tool. Compressed fastq data will be converted to uncompressed in the history.
  • To preserve fastq compression, directly assign the appropriate datatype (eg: fastqsanger.gz).
  • If the data is close to or over 2 GB in size, be sure to use FTP.
  • If the data was already loaded as fastq.gz, don’t worry! Just test the data for correct format (as needed) and assign the metadata type.

Compressed FASTQ files (`*.gz`)

  • Files ending in .gz are compressed (zipped) files.
    • The fastq.gz format is a compressed version of a fastq dataset.
    • The fastqsanger.gz format is a compressed version of the fastqsanger datatype, etc.
  • Compression saves space (and therefore your quota).
  • Tools can accept the compressed versions of input files
  • Make sure the datatype (compressed or uncompressed) is correct for your files, or it may cause tool errors.
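
If you compress data yourself before uploading, a quick sketch with illustrative file names:

  gzip -c reads.fastq > reads.fastq.gz   # compress, keeping the original file
  gzip -t reads.fastq.gz                 # verify the compressed file is intact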

FASTQ files: `fastq` vs `fastqsanger` vs ..

FASTQ files come in various flavours. They differ in the encoding scheme they use. See our QC tutorial for a more detailed explanation of encoding schemes.

Nowadays, the most commonly used encoding scheme is sanger. In Galaxy, this is the fastqsanger datatype. If you are using older datasets, make sure to verify the FASTQ encoding scheme used in your data.

Be Careful: choosing the wrong encoding scheme can lead to incorrect results!

Tip: There are 2 Galaxy datatypes that have similar names but are not the same: please do not confuse fastqsanger and fastqcssanger (note the additional cs).

Tip: When in doubt, choose fastqsanger

How do `fastq.gz` datasets relate to the `.fastqsanger` datatype metadata assignment?

Before assigning fastqsanger or fastqsanger.gz, be sure to confirm the format.

TIP:

  • Using non-fastqsanger scaled quality values will cause scientific problems with tools that expect fastqsanger-formatted input.
  • Even if the tool does not fail, get the format right from the start to avoid problems. Incorrect format is still one of the most common reasons for tool errors or unexpected results (within Galaxy or not).
  • For more information, see How to format fastq data for tools that require .fastqsanger format? below.

How to format fastq data for tools that require .fastqsanger format?

  • Most tools that accept FASTQ data expect it to be in a specific FASTQ version: .fastqsanger. The .fastqsanger datatype must be assigned to each FASTQ dataset.

In order to do that:

  • Watch the FASTQ Prep Illumina video for a complete walk-through.
  • Run FastQC first to assess the type.
    • Run FASTQ Groomer if the data needs to have the quality scores rescaled.
    • If you are certain that the quality scores are already scaled to Sanger Phred+33 (the result of an Illumina 1.8+ pipeline), the datatype .fastqsanger can be directly assigned. Click on the pencil icon galaxy-pencil to reach the Edit Attributes form. In the center panel, click on the “Datatype” tab, enter the datatype .fastqsanger, and save.
  • Run FastQC again on the entire dataset if any changes were made to the quality scores for QA.

Other tips

  • If you are not sure what type of FASTQ data you have (maybe it is not Illumina?), see the help directly on the FASTQ Groomer tool for information about types.
    • For Illumina, first run FastQC on a sample of your data (how to read the full report). The output report will note the quality score type interpreted by the tool. If not .fastqsanger, run FASTQ Groomer on the entire dataset. If .fastqsanger, just assign the datatype.
    • For SOLiD, run NGS: Fastq manipulation → AB-SOLID DATA → Convert, to create a .fastqcssanger dataset. If you have uploaded a color space fastq sequence with quality scores already scaled to Sanger Phred+33 (.fastqcssanger), first confirm by running FastQC on a sample of the data. Then, if you want to double-encode the color space into pseudo-nucleotide space (required by certain tools), see the instructions on the Fastq Manipulation tool form for the conversion.
    • If your data is FASTA, but you want to use tools that require FASTQ input, use the tool NGS: QC and manipulation → Combine FASTA and QUAL. This tool will create “placeholder” quality scores that fit your data. On the output, click on the pencil icon galaxy-pencil to reach the Edit Attributes form. In the center panel, click on the “Datatype” tab, enter the datatype .fastqsanger, and save.

Identifying and formatting Tabular Datasets

Format help for Tabular/BED/Interval Datasets

A Tabular datatype is human readable and has tabs separating data columns. Please note that tabular data is different from comma-separated data (.csv). Common tabular datatypes are .bed, .gtf, .interval, and .txt.

  1. Click the pencil icon galaxy-pencil to reach the Edit Attributes form.
    1. Change the datatype (3rd tab) and save.
    2. Label columns (1st tab) and save.
    3. Metadata will be assigned, then the dataset can be used.
  2. If the required input is a BED or Interval datatype, adjusting the data (.tab → .bed, .tab → .interval) may be possible using a combination of Text Manipulation tools, to create a dataset that matches the required specifications.
  3. Some tools require that BED format be followed, even if the datatype Interval (with less strict column ordering) is accepted on the tool form.
    • These tools will fail if they are run with malformed BED datasets or non-specific column assignments.
    • Solution: reorganize the data to be in BED format and rerun.
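
A command-line sketch of such a reorganization, assuming chrom, start, and end happen to sit in columns 1, 4, and 5 of your file (the column numbers and file names are illustrative):

  # Reorder columns into minimal 3-column BED: chrom, start, end.
  awk 'BEGIN{FS=OFS="\t"} {print $1, $4, $5}' input.interval > reordered.bed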

Understanding Datatypes

  • Allow Galaxy to detect the datatype during Upload, and adjust from there if needed.
  • Tool forms will filter for the appropriate datatypes it can use for each input.
  • Directly changing a datatype can lead to errors. Be intentional and consider converting instead when possible.
  • Dataset content can also be adjusted (tools: Data manipulation) and the expected datatype detected. Detected datatypes are the most reliable in most cases.
  • If a tool does not accept a dataset as valid input, it is not in the correct format with the correct datatype.
  • Once a dataset’s content matches the datatype and that dataset is used repeatedly (example: reference annotation), use that same dataset for all steps in an analysis or expect problems. This may mean rerunning prior tools if you need to make a correction.
  • Tip: Not sure what datatypes a tool is expecting for an input?
    1. Create a new empty history
    2. Click on a tool from the tool panel
    3. The tool form will list the accepted datatypes per input
  • Warning: In some cases, tools will transform a dataset to a new datatype at runtime for you.
    • This is generally helpful, and best reserved for smaller datasets.
    • Why? This can also unexpectedly create hidden datasets that are near duplicates of your original data, only in a different format.
    • For large data, that can quickly consume working space (quota).
    • Deleting/purging any hidden datasets can lead to errors if you are still using the original datasets as an input.
    • Consider converting to the expected datatype yourself when data is large.
    • Then test the tool directly on converted data. If it works, purge the original to recover space.

Using compressed fastq data as tool inputs

  • If the tool accepts fastq input, then .gz compressed data assigned to the datatype fastq.gz is appropriate.
  • If the tool accepts fastqsanger input, then .gz compressed data assigned to the datatype fastqsanger.gz is appropriate.
  • Using uncompressed fastq data is still an option with tools. The choice is yours.

TIP: Avoid labeling compressed data with an uncompressed datatype, and the reverse. Jobs using mismatched datatype versus actual format will fail with an error.


Features


Using the Scratchbook to view multiple datasets

If you would like to view two or more datasets at once, you can use the Scratchbook feature in Galaxy:

  1. Click on the Scratchbook icon galaxy-scratchbook on the top menu bar.
    • You should see a little checkmark on the icon now
  2. View galaxy-eye a dataset by clicking on the eye icon galaxy-eye to view the output
    • You should see the output in a window overlayed over Galaxy
    • You can resize this window by dragging the bottom-right corner
  3. Click outside the file to exit the Scratchbook
  4. View galaxy-eye a second dataset from your history
    • You should now see a second window with the new dataset
    • This makes it easier to compare the two outputs
  5. Repeat this for as many files as you would like to compare
  6. You can turn off the Scratchbook galaxy-scratchbook by clicking on the icon again

Why not use Excel?

Excel is a fantastic tool and a great place to build simple analysis models, but when it comes to scaling, Galaxy wins every time.

You could just as easily use Excel to answer the same question, and if the goal is to learn how to use a tool, then either tool would be great! But what if you are working on a question where your analysis matters? Maybe you are working with human clinical data trying to diagnose a set of symptoms, or you are working on research that will eventually be published and maybe earn you a Nobel Prize?

In these cases your analysis, and the ability to reproduce it exactly, is vitally important, and Excel won’t help you here. It doesn’t track changes and it offers very little insight to others on how you got from your initial data to your conclusions.

Galaxy, on the other hand, automatically records every step of your analysis. And when you are done, you can share your analysis with anyone. You can even include a link to it in a paper (or your acceptance speech). In addition, you can create a reusable workflow from your analysis that others (or yourself) can use on other datasets.

Another challenge with spreadsheet programs is that they don’t scale to support next generation sequencing (NGS) datasets, a common type of data in genomics, and which often reach gigabytes or even terabytes in size. Excel has been used for large datasets, but you’ll often find that learning a new tool gives you significantly more ability to scale up, and scale out your analyses.


Histories


Copy a dataset between histories

Sometimes you may want to use a dataset in multiple histories. You do not need to re-upload the data, but you can copy datasets from one history to another.

There are 3 ways to copy datasets between histories:

  1. From the original history

    1. Click on the galaxy-gear icon (History options) on the top of the history panel
    2. Click on Copy Dataset
    3. Select the desired files

    4. Give a relevant name to the “New history”

    5. Click on the new history name in the green box that has just appeared to switch to this history
  2. From the galaxy-columns View all histories

    1. Click on galaxy-columns View all histories on the top right
    2. Switch to the history in which the dataset should be copied
    3. Drag the dataset to copy from its original history
    4. Drop it in the target history
  3. From the target history

    1. Click on User in the top bar
    2. Click on Datasets
    3. Search for the dataset to copy
    4. Click on it
    5. Click on Copy to History

Creating a new history

Histories are an important part of Galaxy; most people use a new history for every new analysis. Always give your histories good names, so you can easily find your results later.

Click the new-history icon at the top of the history panel.

If the new-history icon is missing:

  1. Click on the galaxy-gear icon (History options) on the top of the history panel
  2. Select the option Create New from the menu

Downloading histories

  1. Click on the gear icon galaxy-gear on the top of the history panel.
  2. Select “Export History to File” from the History menu.
  3. Click on the “Click here to generate a new archive for this history” text.
  4. Wait for the Galaxy server to prepare the history for download.
  5. Click on the generated link to download the history.

Find all Histories and purge (aka permanently delete)

  1. Login to your Galaxy account.
  2. On the top navigation bar, click on User.
  3. In the drop-down menu that appears, click on Histories.
  4. Click on Advanced Search; additional fields will be displayed.
  5. Next to the Status field, click All; a list of all histories will be displayed.
  6. Check the box next to Name in the displayed list to select all histories.
  7. Click Delete Permanently to purge all histories.
  8. A pop-up dialogue box will appear letting you know that the history contents will be removed and this cannot be undone; click OK to confirm.

Finding Histories

  1. To review all histories in your account, go to User > Histories in the top menu bar.
  2. At the top of the History listing, click on Advanced Search.
  3. Set the status to all to view all of your active, deleted, and permanently deleted (purged) histories.
  4. Histories in all states are listed for registered accounts, meaning you will always find your data here if it ever appears to be “lost”.
  5. Note: Permanently deleted (purged) Histories may be fully removed from the server at any time. The data content inside the History is always removed at the time of purging (by a double-confirmed user action), but the purged History artifact may still be in the listing. Purged data content cannot be restored, even by an administrator.

Finding and working with "Histories shared with me"

How to find and work on histories shared with you

To find histories shared with me:

  1. Log into your account.
  2. Select User, and in the drop-down menu select Histories shared with me.

To work with shared histories:

  • Import (copy) the History into your account to work with it.
  • Unshare Histories that you no longer want shared with you or that you have already made a copy of.

Note: Shared Histories (whether or not copied into your account) count in part toward your total account data quota usage. More details on how shared histories affect account quota usage can be found at this link.

How to set Data Privacy Features?

Privacy controls are only enabled if desired. Otherwise, datasets by default remain private and unlisted in Galaxy. This means that a dataset you’ve created is virtually invisible until you publish a link to it.

Below are three optional ways to make Histories private; you can use any of them depending on what you want to achieve:

  1. Change the privacy settings of an individual dataset.

    • Click on the dataset name to expand it
    • Click the pencil galaxy-pencil icon
    • Go to the Permissions tab
    • The Permissions tab has two input fields; use the second one, labeled access
    • Search for the name of the user you want to grant permission to
    • Click Save permissions

    gif of the process described above, in Galaxy

    Note: Adding additional roles to the ‘access’ permission along with your “private role” does not do what you may expect. Since roles are always logically added together, only you will be able to access the dataset, since only you are a member of your “private role”.

  2. Make all datasets in the current history private.

    • Open the History Options galaxy-gear menu galaxy-gear at the top of your history panel
    • Click the Make Private option in the dropdown menu
    • This also sets the default for all new datasets in this history to private

    gif of the process described above, in Galaxy

  3. Set the default privacy settings for new histories

    • Click the User button on the top menu bar for a dropdown galaxy-dropdown
    • Click on Preferences in the dropdown galaxy-dropdown
    • Select Set Dataset Permissions for New Histories
    • Add a permission and click Save permissions

    gif of the process described above, in Galaxy

    Note: Changes made here will only affect histories created after these settings have been stored.

Importing a history

  1. Open the link to the shared history
  2. Click on the new-history Import history button on the top right
  3. Enter a title for the new history
  4. Click on Import

Renaming a history

  1. Click on Unnamed history (or the current name of the history) (Click to rename history) at the top of your history panel
  2. Type the new name
  3. Press Enter

Searching your history

To make it easier to find datasets in large histories, you can filter your history by keywords as follows:

  1. Click on the search datasets box at the top of the history panel.

    history search box

  2. Type a search term in this box
    • For example a tool name, or sample name
  3. To undo the filtering and show your full history again, press on the clear search button galaxy-clear next to the search box

Sharing your History

You can share your work in Galaxy. There are various ways you can give other users access to one of your histories.

Sharing your history allows others to import and access the datasets, parameters, and steps of your history.

  1. Share via link
    • Open the History Options galaxy-gear menu (gear icon) at the top of your history panel
      • galaxy-toggle Make History accessible
      • A Share Link will appear that you give to others
    • Anybody who has this link can view and copy your history
  2. Publish your history
    • galaxy-toggle Make History publicly available in Published Histories
    • Anybody on this Galaxy server will see your history listed under the Shared Data menu
  3. Share only with another user.
    • Click the Share with a user button at the bottom
    • Enter an email address for the user you want to share with
    • Your history will be shared only with this user.
  4. Finding histories others have shared with me
    • Click on User menu on the top bar
    • Select Histories shared with me
    • Here you will see all the histories others have shared with you directly

Note: If you want to make changes to your history without affecting the shared version, make a copy by going to the galaxy-gear History options menu in your history and clicking Copy.

Transfer entire histories from one Galaxy server to another

  1. Click on galaxy-gear in the history panel of the sender Galaxy server
  2. Click on Export to File
  3. Select either exporting history to a link or to a remote file
  4. Click on the link text to generate a new archive for the history if exporting to a link
  5. Wait for the link to generate
  6. Copy the link address or click on the generated link to download the history archive
  7. Click on User on the top menu of the receiver Galaxy server
  8. Click on Histories to view saved histories
  9. Click on Import history in the grey button on the top right
  10. Select the appropriate importing method based on the choices made in steps 3 and 6
    • Choose Export URL from another galaxy instance if link address was copied in step 6
    • Select Upload local file from your computer if history archive was downloaded in step 6
    • Choose Select a remote file if history was exported to a remote file in step 3
  11. Click the link text to check out your histories if import is successful

If the history being transferred is too large, you can:

  1. Click on galaxy-gear in the history panel of the sender Galaxy server
  2. Click Copy Datasets to move just the important datasets into a new history
  3. Create the archive from that smaller history
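
The same transfer can be scripted through the Galaxy API. Below is a minimal sketch using the BioBlend Python library; both server URLs, both API keys, and the archive file name are placeholders.

```python
# A minimal sketch with BioBlend: export a history from one server and import it on another.
# Both URLs and API keys are placeholders for your own accounts.
from bioblend.galaxy import GalaxyInstance

sender = GalaxyInstance(url="https://usegalaxy.eu", key="SENDER_API_KEY")
receiver = GalaxyInstance(url="https://usegalaxy.org", key="RECEIVER_API_KEY")

history_id = sender.histories.get_histories()[0]["id"]

# Trigger the archive export and wait for it to be generated.
jeha_id = sender.histories.export_history(history_id, wait=True)

# Download the archive to disk, then upload it to the receiving server.
with open("history_archive.tar.gz", "wb") as archive:
    sender.histories.download_history(history_id, jeha_id, archive)

receiver.histories.import_history(file_path="history_archive.tar.gz")
```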

Undeleting history

To undelete your deleted histories:
  • Click on User then select Histories
  • Click on Advanced search on the top left side below Saved Histories
  • On Status click Deleted
  • Select the history you want to undelete using the checkbox on the left side
  • Click Undelete button below the deleted histories
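
This can also be done via the API. A minimal sketch with the BioBlend Python library follows; the URL and key are placeholders, and it assumes the history was deleted but not purged.

```python
# A minimal sketch using BioBlend; URL and key are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# List deleted (but not purged) histories, then restore the first one.
deleted = gi.histories.get_histories(deleted=True)
for h in deleted:
    print(h["id"], h["name"])

gi.histories.undelete_history(deleted[0]["id"])
```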

Unsharing unwanted histories

  • All account Histories owned by others but shared with you can be reviewed under User > Histories shared with me.
  • The other person does not need to unshare a history with you. Unshare histories yourself on this page using the pull-down menu per history.
  • Dataset and History privacy options, including sharing, can be set under User > Preferences.

Three key features to work with shared data are:

  • View is a review feature. The data cannot be worked with, but many details, including tool and dataset metadata/parameters, are included.
  • Copy those you want to work with. This will increase your quota usage, but it allows you to manipulate the datasets or the history independently of the original owner. All History/Dataset functions are available if the other person granted you full access to the datasets.
  • Unshare any on the list that you no longer need. After a history is copied, you keep your version of the history, even if it is later unshared or the other person changes their version. In other words, each account’s version of a History and the Datasets in it is distinct (however, if the Datasets themselves were not shared with full access, you will only be able to “view” them, not work with or download them).

Note: “Histories shared with me” account for only a tiny part of your quota usage. Unsharing will not significantly reduce quota usage unless hundreds (or more!) of large histories were shared with you. Sharing a History with someone else does not increase or decrease your quota usage.


Interactive tools


Knitting RMarkdown documents in RStudio

Hands-on: Knitting RMarkdown documents in RStudio

One of the other nice features of RMarkdown documents is producing presentation-quality output. You can take, for example, a tutorial and render it as a polished HTML, PDF, or Word report that can easily be shared with colleagues or students.

Screenshot of the metadata with html_notebook and word_document being visible and a number of options controlling their output. TOC, standing for table of contents, has been set to true for both.

Now you’re ready to preview the document:

screenshot of preview dropdown with options like preview, knit to html, knit to pdf, knit to word

Click Preview. A window will pop up with a preview of the rendered version of this document.

screenshot of rendered document with the table of contents on left, title is in a large font, and there are coloured boxes similar to GTN tutorials offering tips and more information

The preview is really similar to the GTN rendering: no cells have been executed, and no output is embedded yet in the preview document. But if you have run some cells (e.g. the first few, which load a library and preview the msleep dataset), their output appears in the preview:

screenshot of the rendered document with a fancy table browser embedded as well as the output of each step

When you’re ready to distribute the document, you can instead use the Knit button. This runs every cell in the entire document fresh, and then compiles the outputs together with the rendered markdown to produce a nice result file as HTML, PDF, or Word document.

screenshot of the console with 'chunks' being knitted together

tip Tip: PDF + Word require a LaTeX installation

You might need to install additional packages to compile the PDF and Word document versions

And at the end you can see a pretty document rendered with all of the output of every step along the way. This is a fantastic way to distribute read-only lesson materials, e.g. to students who might struggle with using an RMarkdown document or who just want to read the output without running it themselves.

screenshot of a PDF document showing the end of the tutorial where a pretty plot has been rendered and there is some text for conclusions and citations

Launch JupyterLab

Hands-on: Launch JupyterLab

tip Tip: Launch JupyterLab in Galaxy

Currently JupyterLab in Galaxy is available on Live.useGalaxy.eu, usegalaxy.org and usegalaxy.eu.

hands_on Hands-on: Run JupyterLab

  1. Open the Interactive Jupyter Notebook tool (interactive_tool_jupyter_notebook)
  2. Click Execute
  3. The tool will start running and will keep running until you stop it
  4. Click on the User menu at the top and go to Active Interactive Tools and locate the JupyterLab instance you started.
  5. Click on your JupyterLab instance

tip Tip: Launch Try JupyterLab if not available on Galaxy

If JupyterLab is not available on the Galaxy instance:

  1. Start Try JupyterLab

Launch RStudio

Hands-on: Launch RStudio

Depending on which server you are using, you may be able to run RStudio directly in Galaxy. If that is not available, RStudio Cloud can be an alternative.

Launch RStudio in Galaxy

Currently RStudio in Galaxy is only available on UseGalaxy.eu and UseGalaxy.org

  1. Open the RStudio tool to launch RStudio
  2. Click Execute
  3. The tool will start running and will keep running until you stop it
  4. Click on the “User” menu at the top and go to “Active Interactive Tools” and locate the RStudio instance you started.

Launch RStudio Cloud if not available on Galaxy

If RStudio is not available on the Galaxy instance:

  1. Register for RStudio Cloud, or login if you already have an account
  2. Create a new project

Learning with RMarkdown in RStudio

Hands-on: Learning with RMarkdown in RStudio

Learning with RMarkdown is a bit different than you might be used to. Instead of copying and pasting code from the GTN into a document, you will be able to run the code directly as it was written, inside RStudio! You can now focus just on the code and reading within RStudio.

  1. Load the notebook if you have not already, following the tip box at the top of the tutorial

    Screenshot of the Console in RStudio. There are three lines visible of not-yet-run R code with the download.file statements which were included in the setup tip box.

  2. Open it by clicking on the .Rmd file in the file browser (bottom right)

    Screenshot of Files tab in RStudio, here there are three files listed, a data-science-r-dplyr.Rmd file, a css and a bib file.

  3. The RMarkdown document will appear in the document viewer (top left)

    Screenshot of an open document in RStudio. There is some yaml metadata above the tutorial showing the title of the tutorial.

You’re now ready to view the RMarkdown notebook! Each notebook starts with a lot of metadata about how to build the notebook for viewing, but you can ignore this for now and scroll down to the content of the tutorial.

You’ll see codeblocks scattered throughout the text, and these are all runnable snippets that appear like this in the document:

Screenshot of the RMarkdown document in the viewer, a cell is visible between markdown text reading library tidyverse. It is slightly more grey than the background region, and it has a run button at the right of the cell in a contextual menu.

And you have a few options for how to run them:

  1. Click the green run arrow on the cell
  2. Press ctrl+enter
  3. Use the Run menu at the top to run all chunks

    Screenshot of the run dropdown menu in R, the first item is run selected lines showing the mentioned shortcut above, the second is run next chunk, and then it also mentions a 'run all chunks below' and 'restart r and run all chunks' option.

When you run cells, the output will appear below in the Console. RStudio essentially copies the code from the RMarkdown document, to the console, and runs it, just as if you had typed it out yourself!

Screenshot of a run cell, its output is included below in the RMarkdown document and the same output is visible below in the console. It shows a log of loading the tidyverse library.

One of the best features of RMarkdown documents is that they include a very nice table browser which makes previewing results a lot easier! Instead of needing to use head every time to preview the result, you get an interactive table browser for any step which outputs a table.

Screenshot of the table browser. Below a code chunk is a large white area with two images, the first reading 'r console' and the second reading 'tbl_df'. The tbl_df is highlighted like it is active. Below that is a pretty-printed table with bold column headers like name and genus and so on. At the right of the table is a small arrow indicating you can switch to seeing more columns than just the initial three. At the bottom of the table is 1-10 of 83 rows written, and buttons for switching between each page of results.

Open a Terminal in Jupyter

Hands-on: Open a Terminal in Jupyter

This tutorial will let you accomplish almost everything from this view, running code in the cells below directly in the training material. You can choose between running the code here, or opening up a terminal tab in which to run it. Here are some instructions for how to do this on various environments.

Jupyter on UseGalaxy.* and MyBinder.org

  1. Use the File → New → Terminal menu to launch a terminal.

    screenshot of jupyterlab showing the File menu expanded to show new and terminal option.

  2. Disable “Simple” mode in the bottom left hand corner, if it is activated.

    screenshot of jupyterlab showing a toggle labelled simple

  3. Drag one of the terminal or notebook tabs to the side to have the training materials and terminal side-by-side

    screenshot of jupyterlab with notebook and terminal side-by-side.

CoCalc

  1. Use the Split View functionality of CoCalc to split your view into two portions.

    screenshot of cocalc button to split views

  2. Change the view of one panel to a terminal

    screenshot of cocalc swapping view port to that of a terminal

Open interactive tool

  1. Go to User > Active Interactive Tools
  2. Wait for the interactive tool to be in the running state (check the Job Info column)
  3. Click on the name of the interactive tool to open it

Stop RStudio

Hands-on: Stop RStudio

When you have finished your R analysis, it’s time to stop RStudio.

  1. First, save your work into Galaxy, to ensure reproducibility:
    1. You can use gx_put(filename) to save individual files by supplying the filename
    2. You can use gx_save() to save the entire analysis transcript and any data objects loaded into your environment.
  2. Once you have saved your data, you can proceed in 2 different ways:
    • Deleting the corresponding history dataset (named RStudio and shown in a yellow, “in progress” state), OR
    • Clicking on the “User” menu at the top, going to “Active Interactive Tools”, locating the RStudio instance you started, selecting the corresponding checkbox, and finally clicking on the “Stop” button at the bottom.

Reference genomes


How to use Custom Reference Genomes?

A reference genome contains the nucleotide sequence of the chromosomes, scaffolds, transcripts, or contigs for a single species. It is representative of a specific genome build or release. There are two options for using reference genomes in Galaxy: native (provided by the server administrators and used by most of the tools) and custom (uploaded by users in FASTA format).

There are five basic steps to use a Custom Reference Genome:

  1. Obtain a FASTA copy of the target genome.
  2. Use FTP to upload the genome to Galaxy and load it into a history as a dataset.
  3. Clean up the format with the tool NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
  4. Make sure the chromosome identifiers are a match for other inputs.
  5. Set a tool form’s options to use a custom reference genome from the history and select the loaded genome.

Sorting Reference Genome

Certain tools expect reference genomes to be sorted in lexicographical order. These tools are often downstream of the initial mapping tools, which means that a large investment in a project has already been made before a sorting problem pops up in the later analysis steps. How to avoid this? Always sort your FASTA reference genome dataset at the beginning of a project. Many sources only provide sorted genomes, but double-checking is your own responsibility, and it is super easy in Galaxy!

  1. Convert Formats -> FASTA-to-Tabular
  2. Filter and Sort -> Sort on column: c1 with flavor: Alphabetical everything in: Ascending order
  3. Convert Formats -> Tabular-to-FASTA

Note: The above sorting method is for most tools, but not all. In particular, GATK tools have a tool-specific sort order requirement.
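
If you prefer to check or fix the sort order outside Galaxy first, a small script can do the same thing. Below is a minimal sketch using Biopython (an assumption; any FASTA parser would work) that sorts records lexicographically by identifier; the file names are placeholders.

```python
# A minimal sketch using Biopython (pip install biopython); file names are placeholders.
from Bio import SeqIO

# Read all records, sort them lexicographically by identifier, and write them back out.
records = sorted(SeqIO.parse("genome.fasta", "fasta"), key=lambda rec: rec.id)
count = SeqIO.write(records, "genome_sorted.fasta", "fasta")
print(f"Wrote {count} sorted sequences")
```

Biopython's FASTA writer wraps sequence lines at 60 characters, which falls within the 40-80 range recommended elsewhere in this FAQ.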

Troubleshooting Custom Genome fasta

If a custom genome/transcriptome/exome dataset is producing errors, double check the format and that the chromosome identifiers match between ALL inputs. Clicking on the bug icon galaxy-bug will often provide a description of the problem. This does not automatically submit a bug report, and it is not always necessary to do so, but it is a good way to get some information about why a job is failing.

  • Custom genome not assigned as FASTA format

    • Symptoms include: Dataset not included in custom genome “From history” pull down menu on tool forms.
    • Solution: Check datatype assigned to dataset and assign fasta format.
    • How: Click on the dataset’s pencil icon galaxy-pencil to reach the “Edit Attributes” form, and in the Datatypes tab > redetect the datatype.
    • If fasta is not assigned, there is a format problem to correct.
  • Incomplete Custom genome file load

    • Symptoms include: Tool errors occur the first time you use the Custom genome.
    • Solution: Use Text Manipulation → Select last lines from a dataset to check last 10 lines to see if file is truncated.
    • How: Reload the dataset (switch to FTP if not using already). Check your FTP client logs to make sure the load is complete.
  • Extra spaces, extra lines, inconsistent line wrapping, or any deviation from strict FASTA format

    • Symptoms include: RNA-seq tools (Cufflinks, Cuffcompare, Cuffmerge, Cuffdiff) fail with the error Error: sequence lines in a FASTA record must have the same length!.
    • Solution: Test and correct the file locally and re-upload it, or test/fix it within Galaxy, then re-run.
    • How:
      • Quick re-formatting Run the dataset through the tool NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
      • Optional Detailed re-formatting Start with FASTA manipulation → FASTA Width formatter with a value between 40-80 (60 is common) to reformat wrapping. Next, use Filter and Sort → Select with “>” to examine identifiers. Use a combination of Convert Formats → FASTA-to-Tabular, Text Manipulation tools, then Tabular-to-FASTA to correct.
      • With either of the above, finish by using Filter and Sort → Select with ^\w*$ to search for empty lines (use “NOT matching” to remove these lines and output a properly formatted fasta dataset).
  • Inconsistent line wrapping, common if merging chromosomes from various Genbank records (e.g. primary chroms with mito)

    • Symptoms include: Tools (SAMTools, Extract Genomic DNA, but rarely alignment tools) may complain about unexpected line lengths/missing identifiers. Or they may just fail for what appears to be a cluster error.
    • Solution: Test and correct the file locally and re-upload it, or test/fix it within Galaxy.
    • How: Use NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace. Finish by using Filter and Sort → Select with ^\w*$ to search for empty lines (use “NOT matching” to remove these lines and output a properly formatted fasta dataset).
  • Unsorted fasta genome file

    • Symptoms include: Tools such as Extract Genomic DNA report problems with sequence lengths.
    • Solution: First try sorting and re-formatting in Galaxy then re-run.
    • How: To sort, follow instructions for Sorting a Custom Genome.
  • Identifier and Description in “>” title lines used inconsistently by tools in the same analysis

    • Symptoms include: Will generally manifest as a false genome-mismatch problem.
    • Solution: Remove the description content and re-run all tools/workflows that used this input. Mapping tools will usually not fail, but downstream tools will. When this comes up, it usually means that an analysis needs to be started over from the mapping step to correct the problems. No one enjoys redoing this work. Avoid the problems by formatting the genome, by double checking that the same reference genome was used for all steps, and by making certain the ‘identifiers’ are a match between all planned inputs (including reference annotation such as GTF data) before using your custom genome.
    • How: To drop the title line description content, use NormalizeFasta using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace. Next, double check that the chromosome identifiers are an exact match between all inputs.
  • Unassigned database

    • Symptoms include: Tools report that no build is available for the assigned reference genome.
    • Solution: This occurs with tools that require an assigned database metadata attribute. SAMTools and Picard often require this assignment.
    • How: Create a Custom Build and assign it to the dataset.

Sequencing


Illumina MiSeq sequencing

Comment: Illumina MiSeq sequencing

Illumina MiSeq sequencing is based on sequencing by synthesis. As the name suggests, fluorescent labels are measured for every base that binds at a specific moment at a specific place on a flow cell. These flow cells are covered with oligos (small single strand DNA strands). In the library preparation the DNA strands are cut into small DNA fragments (differs per kit/device) and specific pieces of DNA (adapters) are added, which are complementary to the oligos. Using bridge amplification large amounts of clusters of these DNA fragments are made. The reverse string is washed away, making the clusters single stranded. Fluorescent bases are added one by one, which emit a specific light for different bases when added. This happens for whole clusters, so this light can be detected and this data is basecalled (translated from light to a nucleotide) to a nucleotide sequence (read). For every base a quality score is determined and also saved per read. This process is repeated for the reverse strand on the same place on the flow cell, so the forward and reverse reads are from the same DNA strand. The forward and reverse reads are linked together and should always be processed together!

For more information watch this video from Illumina

Nanopore sequencing

Comment: Nanopore sequencing

Nanopore sequencing has several properties that make it well-suited for our purposes

  1. Long-read sequencing technology offers simplified and less ambiguous genome assembly
  2. Long-read sequencing gives the ability to span repetitive genomic regions
  3. Long-read sequencing makes it possible to identify large structural variations

How nanopore sequencing works

When using Oxford Nanopore Technologies (ONT) sequencing, the change in electrical current is measured over the membrane of a flow cell. When nucleotides pass the pores in the flow cell the current change is translated (basecalled) to nucleotides by a basecaller. A schematic overview is given in the picture above.

When sequencing using a MinIT or MinION Mk1C, the basecalling software is present on the devices. With basecalling the electrical signals are translated to bases (A, T, G, C) with a quality score per base. The sequenced DNA strand will be basecalled and this will form one read. Multiple reads will be stored in a fastq file.


Support


Contacting Galaxy Administrators

If you suspect there is something wrong with the server, or would like to request a tool to be installed, you should contact the server administrators for the Galaxy you are on.

Where do I get more support?

If you need support for using Galaxy, running your analysis or completing a tutorial, please try one of the following options:


Tools


Changing the tool version

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool.

Switching to a different version of a tool:

  • Open the tool
  • Click on the tool-versions versions logo at the top right
  • Select the desired version from the dropdown list

If a Tool is Missing

To use the tools installed and available on the Galaxy server:

  1. At the top of the left tool panel, type in a tool name or datatype into the tool search box.
  2. Shorter keywords find more choices.
  3. Tools can also be directly browsed by category in the tool panel.

If you can’t find a tool you need for a tutorial on Galaxy, please:

  1. Check that you are using a compatible Galaxy server
    • Navigate to the overview box at the top of the tutorial
    • Find the “Supporting Materials” section
    • Check “Available on these Galaxies”
    • If your server is not listed here, the tutorial is not supported on your Galaxy server
    • You can create an account on one of the supporting Galaxies

      screenshot of overview box with available Galaxies section

  2. Use the Tutorial mode feature
    • Open your Galaxy server
    • Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
    • Navigate to your tutorial
    • Tool names in tutorials will be blue buttons that open the correct tool for you
    • Note: this does not work for all tutorials (yet)

      gif showing how GTN-in-Galaxy works

  3. Still not finding the tool?

Multiple similar tools available

Sometimes there are multiple tools with very similar names. If the parameters in the tutorial don’t match with what you see in Galaxy, please try the following:

  1. Use Tutorial Mode curriculum in Galaxy, and click on the blue tool button in the tutorial to automatically open the correct tool and version (not available for all tutorials yet)

    Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.

    • Open your Galaxy server
    • Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
    • Navigate to your tutorial
    • Tool names in tutorials will be blue buttons that open the correct tool for you
    • Note: this does not work for all tutorials (yet)

      gif showing how GTN-in-Galaxy works

    • You can click anywhere in the grey-ed out area outside of the tutorial box to return back to the Galaxy analytical interface
  2. Check that the entire tool name matches what you see in the tutorial.

Organizing the tool panel

Galaxy servers can have a lot of tools available, which can make it challenging to find the tool you are looking for. To help find your favourite tools, you can:

  • Keep a list of your favourite tools to find them back easily later.
    • Adding tools to your favourites
      • Open a tool
      • Click on the star icon galaxy-star next to the tool name to add it to your favourites
    • Viewing your favourite tools
      • Click on the star icon galaxy-star at the top of the Galaxy tool panel (above the tool search bar)
      • This will filter the toolbox to show all your starred tools
  • Change the tool panel view
    • Click on the galaxy-panelview icon at the top of the Galaxy tool panel (above the tool search bar)
    • Here you can view the tools by EDAM ontology terms
      • EDAM Topics (e.g. biology, ecology)
      • EDAM Operations (e.g. quality control, variant analysis)
      • You can always get back to the default view by choosing “Full Tool Panel”

Re-running a tool

  1. Expand one of the output datasets of the tool (by clicking on it)
  2. Click on the re-run galaxy-refresh button to re-run the tool

This is useful if you want to run the tool again but with slightly different parameters, or if you just want to check which parameter settings you used.

Regular Expressions 101

Regular expressions are a standardized way of describing patterns in textual data. They can be extremely useful for tasks such as finding and replacing data. They can be a bit tricky to master, but learning even just a few of the basics can help you get the most out of Galaxy.

Finding

Below are just a few examples of basic expressions:

Regular expression Matches
abc an occurrence of abc within your data
(abc|def) abc or def
[abc] a single character which is either a, b, or c
[^abc] a character that is NOT a, b, nor c
[a-z] any lowercase letter
[a-zA-Z] any letter (upper or lower case)
[0-9] numbers 0-9
\d any digit (same as [0-9])
\D any non-digit character
\w any alphanumeric character
\W any non-alphanumeric character
\s any whitespace
\S any non-whitespace character
. any character
\. the character . (the backslash escapes the dot so it matches a literal period)
{x,y} between x and y repetitions
^ the beginning of the line
$ the end of the line

Note: as you can see, characters such as *, ?, ., and + have a special meaning in a regular expression. If you want to match those characters literally, you can escape them with a backslash. So \? matches the question mark character exactly.

Examples

Regular expression matches
\d{4} 4 digits (e.g. a year)
chr\d{1,2} chr followed by 1 or 2 digits
.*abc$ anything with abc at the end of the line
^$ empty line
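
If you want to experiment with these patterns outside Galaxy, Python's built-in re module behaves the same way for these basics. A minimal sketch (the sample strings are made up):

```python
import re

# The year pattern from the table: exactly 4 digits.
print(re.findall(r"\d{4}", "Samples collected in 1984 and 2021"))  # ['1984', '2021']

# chr followed by 1 or 2 digits.
print(bool(re.search(r"chr\d{1,2}", "chr14:1000-2000")))  # True

# ^$ matches an empty line.
print(bool(re.search(r"^$", "")))  # True
```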

Replacing

Sometimes you need to capture the exact value you matched on in order to use it in your replacement. We do this using capture groups (...), which we can refer to using \1, \2, etc. for the first and second captured values.

Regular expression Input Captures
chr(\d{1,2}) chr14 \1 = 14
(\d{2}) July (\d{4}) 24 July 1984 \1 = 24, \2 = 1984

An expression like s/find/replacement/g indicates a replacement expression: it will search (s) for any occurrence of find and replace it with replacement. It does this globally (g), which means it doesn’t stop after the first match.

Example: s/chr(\d{1,2})/CHR\1/g will replace chr14 with CHR14 etc.
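
The same substitution can be tried out in Python, where re.sub plays the role of s/../../g and \1 refers to the first capture group. A minimal sketch with a made-up input string:

```python
import re

text = "variants on chr14 and chr3"

# Equivalent of s/chr(\d{1,2})/CHR\1/g: capture the chromosome number and reuse it as \1.
print(re.sub(r"chr(\d{1,2})", r"CHR\1", text))  # variants on CHR14 and CHR3
```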

Note: In Galaxy, you are often asked to provide the find and replacement expressions separately, so you don’t have to use the s/../../g structure.

There is a lot more you can do with regular expressions, and there are a few different flavours in different tools/programming languages, but these are the most important basics that will already allow you to do many of the tasks you might need in your analysis.

Tip: RegexOne is a nice interactive tutorial to learn the basics of regular expressions.

Tip: Regex101.com is a great resource for interactively testing and constructing your regular expressions, it even provides an explanation of a regular expression if you provide one.

Tip: Cyrilex is a visual regular expression tester.

Select multiple datasets

  1. Click on param-files Multiple datasets
  2. Select several files by keeping the Ctrl (or COMMAND) key pressed and clicking on the files of interest

Selecting a dataset collection as input

  1. Click on param-collection Dataset collection in front of the input parameter you want to supply the collection to.
  2. Select the collection you want to use from the list

Sorting Tools

Sometimes input errors are caused by unsorted inputs. Try using these tools:

  • Picard SortSam: Sort SAM/BAM by coordinate or queryname.
  • Samtools Sort: Alternate for SAM/BAM, best when used for coordinate sorting only.
  • SortBED order the intervals: Best choice for BED/Interval.
  • Sort data in ascending or descending order: Alternate choice for Tabular/BED/Interval/GTF.
  • VCFsort: Best choice for VCF.
  • Tool Form Options for Sorting: Some tools have an option to sort inputs during job execution. Whenever possible, sort inputs before using tools, especially if jobs fail for not having enough memory resources.

Tool doesn't recognize input datasets

The expected input datatype assignment is explained on the tool form. Review the input select areas and the help section below the Execute button.

Understanding datatypes FAQ.

No datasets or collections available? Solutions:

  1. Upload or Copy an appropriate dataset for the input into the active history.
    • To load new datasets, review the Upload tool and more choices under Get Data within Galaxy.
    • To copy datasets from a different history into the active history see this FAQ.
    • To use datasets loaded into a shared Data Library see this FAQ.
  2. Resolve a datatype assignment incompatibility between the dataset and the tool.
  3. Individual datasets and dataset collections are selected differently on tool forms.
    • To select a collection input on a tool form see this FAQ.

Using tutorial mode

Tutorial mode saves you screen space, finds the tools you need, and ensures you use the correct versions for the tutorials to run.

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.

  • Open your Galaxy server
  • Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
  • Navigate to your tutorial
  • Tool names in tutorials will be blue buttons that open the correct tool for you
  • Note: this does not work for all tutorials (yet)

    gif showing how GTN-in-Galaxy works

  • You can click anywhere in the grey-ed out area outside of the tutorial box to return back to the Galaxy analytical interface

Viewing tool logs (`stdout` and `stderr`)

Most tools create log files as output, which can contain useful information about how the tool ran (stdout, or standard output), and what went wrong (stderr, or standard error).

To view these log files in Galaxy:

  • Expand one of the outputs of the tool in your history
  • Click on View details details
  • Scroll to the Job Information section
    • Here you will find links to the log files (stdout and stderr).
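
These logs can also be fetched through the API, which is handy when preparing a bug report. Below is a minimal sketch with the BioBlend Python library; the URL, key, and job ID are placeholders, and the exact key names for the logs ('tool_stdout'/'tool_stderr' vs 'stdout'/'stderr') can vary between Galaxy releases, so treat them as assumptions.

```python
# A minimal sketch using BioBlend; URL, key, and job ID are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

job_id = "PASTE_JOB_ID_HERE"  # shown in the Job Information section
details = gi.jobs.show_job(job_id, full_details=True)

# The exact key names vary between Galaxy releases, so fall back accordingly.
print(details.get("tool_stdout") or details.get("stdout"))
print(details.get("tool_stderr") or details.get("stderr"))
```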

Where is the tool help?

Finding tool support

There is documentation available on the tool form itself which mentions the following information:

  • Parameters
  • Expected format for input dataset(s)
  • Links to publications and ToolShed source repositories
  • Tool and wrapper version(s)
  • 3rd party author web sites and documentation

Scroll down on the tool form to locate:

  • Information about expected inputs/outputs
  • Expanded definitions
  • Sample data
  • Example use cases
  • Graphics

Troubleshooting


How to find and correct tool errors related to Metadata?

Finding and Correcting Metadata

Tools can error when the wrong dataset attributes (metadata) are assigned. Some of these wrong assignments may be:

  • Tool outputs, which are automatically assigned without user action.
  • Incorrect autodetection of datatypes, which need manual modification.
  • Undetected attributes, which require user action (example: assigning database to newly uploaded data).

How to notice missing Dataset Metadata:

  • Dataset will not be downloaded when using the disk icon galaxy-save.
  • Tools error when using a specific dataset that was previously used successfully.
  • Tools error with a message that ends with: OSError: [Errno 2] No such file or directory.

Solution:

Click on the dataset’s pencil icon galaxy-pencil to reach the Edit Attributes forms and do one of the following as applies:

  • Directly reset metadata
    • Find the tab for the metadata you want to change, make the change, and save.
  • Autodetect metadata
    • Click on the Auto-detect button. The dataset will turn yellow in the history while the job is processing.

Incomplete Dataset Download

In case the dataset downloads incompletely:

  • Use the Google Chrome web browser. Sometimes Chrome works better at supporting continuous data transfers.
  • Use the command-line option instead. The data may really be too large to download, or your connection may be too slow. The command line can also be a faster way to download multiple datasets and to ensure a complete transfer (for small or large data).

Understanding 'canceled by admin' or cluster failure error messages

The initial error message could be:


This job failed because it was cancelled by an administrator.
Please click the bug icon to report this problem if you need help.

Or


job info:
Remote job server indicated a problem running or monitoring this job.
  • Causes:
    • Server or cluster error.
    • Less frequently, input problems are a factor.
  • Solutions:

Understanding 'exceeds memory allocation' error messages

The error messages displayed are as follows:


job info:
This job was terminated because it used more memory than it was allocated.
Please click the bug icon to report this problem if you need help.

Or


stderr:
Fatal error: Exit code 1 ()
slurmstepd: error: Detected 1 oom-kill event(s) in step XXXXXXX.batch cgroup.

Sometimes this message may appear at the bottom


job stderr:
slurmstepd: error: Detected 1 oom-kill event(s) in step XXXXXXX.batch cgroup.

In rare cases when the memory quota is exceeded very quickly, an error message such as the following can appear


job stderr:
Fatal error: Exit code 1 ()
Traceback (most recent call last):
(other lines)
Memory Error

Note: Job runtime memory is different from the amount of free storage space (quota) in an account.

  • Causes:
    • The job ran out of memory while executing on the cluster node that ran the job.
    • The most common reasons for this error are input and tool parameter problems that must be adjusted/corrected.
  • Solutions:
    • Try at least one rerun to execute the job on a different cluster node.
    • Review the Solutions section of the Understanding input error messages FAQ.
    • Your data may actually be too large to process at a public Galaxy server. Alternatives include setting up a private Galaxy server.

Understanding ValueError error messages

The full error is usually a longer message seen only after clicking on the bug icon or by reviewing the job details stderr.

How to do both is covered in the Troubleshooting errors FAQ.


stderr
...
Many lines of text, may include parameters
...
...
ValueError: invalid literal for int() with base 10: some-sequence-read-name
  • Causes:
    • MACS2 produces this error the first time it is run. MACS is not the only tool that can produce this issue, but it is the most common.
  • Solutions:
    • Try at least one rerun.
    • MACS/2 is not capable of interpreting sequence read names with spaces included. Try the following two approaches:
      • Remove unmapped reads from the SAM dataset. There are several filtering tools in the groups SAMTools and Picard that can do this.
      • Convert the SAM input to BAM format with the tool SAMtools: SAM-to-BAM. When compressed input is given to MACS, the spaces are no longer an issue.

Understanding input error messages

Input problems are very common across any analysis that makes use of programmed tools.

  • Causes:
    • No quality assurance or content/formatting checks were run on the first datasets of an analysis workflow.
    • Incomplete dataset Upload.
    • Incorrect or unassigned datatype or database.
    • Tool-specific formatting requirements for inputs were not met.
    • Parameters set on a tool form are a mismatch for the input data content or format.
    • Inputs were in an error state (red) or were putatively successful (green) but are empty.
    • Inputs do not meet the datatype specification.
    • Inputs do not contain the exact content that a tool is expecting or that was input in the form.
    • Annotation files are a mismatch for the selected or assigned reference genome build.
    • Special case: Some of the data were generated outside of Galaxy, but later a built-in indexed genome build was assigned in Galaxy for use with downstream tools. This scenario can work, but only if those two reference genomes are an exact match.
  • Solutions:
    • Review our Troubleshooting Tips for what and where to check.
    • Review the GTN for related tutorials on tools/analysis plus FAQs.
    • Review Galaxy Help for prior discussion with extended solutions.
    • Review datatype FAQs.
    • Review the tool form.
      • Input selection areas include usage help.
      • The help section at the bottom of a tool form often has examples. Does your own data match the format/content?
      • See the links to publications and related resources.
    • Review the inputs.
      • All inputs must be in a success state (green) and actually contain content.
      • Did you directly assign the datatype or convert the datatype? What results when the datatype is detected by Galaxy? If these differ, there is likely a content problem.
      • For most analyses, allowing Galaxy to detect the datatype during Upload is best, and adjusting a datatype later should rarely be needed. If a datatype is modified, the change should have a specific purpose/reason.
      • Does your data have headers? Is that in specification for the datatype? Does the tool form have an option to specify if the input has headers or not? Do you need to remove headers first for the correct datatype to be detected? Example GTF.
      • Large inputs? Consider modifying your inputs to be smaller. Examples: FASTQ and FASTA.
    • Run quality checks on your data.
      • Search GTN tutorials with the keyword “qa-qc” for examples.
      • Search Galaxy Help with the keywords “qa-qc” and your datatype(s) for more help.
    • Reference annotation tips.
    • Input mismatch tips.
      • Do the chromosome/sequence identifiers exactly match between all inputs? Search Galaxy Help for more help about how to correct build/version identifier mismatches between inputs.
      • “Chr1” and “chr1” and “1” do not mean the same thing to a tool.
    • Custom genome/transcriptome/exome tips: see FASTA.

Understanding walltime error messages

The full error message will be reported as below, and can be found by clicking on the bug icon for a failed job run (red dataset):


job info:
This job was terminated because it ran longer than the maximum allowed job run time.
Please click the bug icon to report this problem if you need help.

Or sometimes,


job stderr:
slurmstepd: error: *** JOB XXXX ON XXXX CANCELLED AT 2019-XX-XXTXX:XX:XX DUE TO TIME LIMIT ***

job info:
Remote job server indicated a problem running or monitoring this job.
  • Causes:
    • The job execution time exceeded the “wall-time” on the cluster node that ran the job.
    • The server may be undergoing maintenance.
    • Very often input problems also cause this same error.
  • Solutions:
    • Try at least one rerun.
    • Check the server homepage for banners or notices. Selected servers also post status here.
    • Review the Solutions section of the Understanding input error messages FAQ.
    • Your data may actually be too large to process at a public Galaxy server. Alternatives include setting up a private Galaxy server.

What information should I include when reporting a problem?

Writing bug reports is a good skill for bioinformaticians to have, and a key point is to include enough information in your first message to make resolving your issue more efficient and a better experience for everyone.

What to include

  1. Which commands did you run, precisely? We want details. Which flags did you set?
  2. Which server(s) did you run those commands on?
  3. What account/username did you use?
  4. Where did it go wrong?
  5. What were the stdout/stderr of the tool that failed? Include the text.
  6. Did you try any workarounds? What results did those produce?
  7. (If relevant) screenshot(s) that show exactly the problem, if it cannot be described in text. Is there a details panel you could include too?
  8. If there are job IDs, please include them as text so administrators don’t have to manually transcribe the job ID in your picture.

It makes the process of answering ‘bug reports’ much smoother for us, as we will have to ask you these questions anyway. If you provide this information from the start, we can get straight to answering your question!

What does a GOOD bug report look like?

The people who provide support for Galaxy are largely volunteers in this community, so try and provide as much information up front to avoid wasting their time:

I encountered an issue: I was working on (this server) and trying to run (tool) + (version number) but all of the output files were empty. My username is jane-doe.

Here is everything that I know:

  • The dataset is green, the job did not fail
  • This is the standard output/error of the tool that I found in the information page (insert it here)
  • I have read it but I do not understand what X/Y means.
  • The job ID from the output information page is 123123abdef.
  • I tried re-running the job and changing parameter Z but it did not change the result.

Could you help me?


User preferences


Does your account usage quota seem incorrect?

  1. Log out of Galaxy, then back in again. This refreshes the disk usage calculation displayed in the Masthead usage (summary) and under User > Preferences (exact).

Note:

  • Your account usage quota can be found at the bottom of your user preferences page.

Forgot Password

  1. Go to the Galaxy server you are using.
  2. Click on Login or Register.
  3. Enter your email in the Public Name or Email Address entry box.
  4. Click on the link under the password entry box titled Forgot password? Click here to reset your password.
  5. An email will be sent with a password reset link. This email may be in your email Spam or Trash folders, depending on your filters.
  6. Click on the reset link in the email or copy and paste it into a web browser window.
  7. Enter your new password and click on Save new password.

Getting your API key

  1. In your browser, open your Galaxy homepage
  2. Log in, or register a new account, if it’s the first time you’re logging in
  3. Go to User -> Preferences in the top menu bar, then click on Manage API key
  4. If there is no current API key available, click on Create a new key to generate it
  5. Copy your API key to somewhere convenient, you will need it throughout this tutorial
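
Once you have the key, it is what authenticates every API request you make. For example, with the BioBlend Python library you can quickly check that the key works; this is a minimal sketch where the URL and key are placeholders.

```python
# A minimal sketch using BioBlend; URL and key are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# If the key is valid, this returns your account details.
user = gi.users.get_current_user()
print(user["username"], user["email"])
```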

Visualisation


Using IGV with Galaxy

You can send data from your Galaxy history to IGV for viewing as follows:

  1. Install IGV on your computer (IGV download page)
  2. Start IGV
  3. In recent versions of IGV, you will have to enable the port:
    • In IGV, go to View > Preferences > Advanced
    • Check the box Enable Port
  4. In Galaxy, expand the dataset you would like to view in IGV
    • Make sure you have set a reference genome/database correctly (dbkey) (instructions)
    • Under display in IGV, click on local

Workflows


Annotate a workflow

  • Open the workflow editor for the workflow
  • Click on galaxy-pencil Edit Attributes on the top right
  • Write a description of the workflow in the Annotation box
  • Add a tag (which will help to search for the workflow) in the Tags section

Creating a new workflow

You can create a Galaxy workflow from scratch in the Galaxy workflow editor.
  1. Click Workflow on the top bar
  2. Click the new workflow galaxy-wf-new button
  3. Give it a clear and memorable name
  4. Clicking Save will take you directly into the workflow editor for that workflow
  5. Need more help? Please see the How to make a workflow subsection here

Extracting a workflow from your history

Galaxy can automatically create a workflow based on the analysis you have performed in a history. This means that once you have done an analysis manually once, you can easily extract a workflow to repeat it on different data.
  1. Clean up your history: remove any failed (red) jobs from your history by clicking on the galaxy-cross button.

    This will make the creation of the workflow easier.

  2. Click on galaxy-gear (History options) at the top of your history panel and select Extract workflow.

    `Extract Workflow` entry in the history options menu

    The central panel will show the content of the history in reverse order (oldest on top), and you will be able to choose which steps to include in the workflow.

  3. Replace the Workflow name with something more descriptive.

  4. Rename each workflow input in the boxes at the top of the second column.

  5. If there are any steps that shouldn’t be included in the workflow, you can uncheck them in the first column of boxes.

  6. Click on the Create Workflow button near the top.

    You will get a message that the workflow was created.

Hiding intermediate steps

When a workflow is executed, the user is usually primarily interested in the final product and not in all intermediate steps. By default all the outputs of a workflow will be shown, but we can explicitly tell Galaxy which outputs to show and which to hide for a given workflow. This behaviour is controlled by the little checkbox in front of every output dataset:

Asterisk for `out_file1` in the `Select First` tool

Importing a workflow

  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click on the upload icon galaxy-upload at the top-right of the screen
  • Provide your workflow
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
  • Click the Import workflow button
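
Workflows can also be imported programmatically, which is useful for keeping workflows under version control. Below is a minimal sketch using the BioBlend Python library; the URL, key, and file path are placeholders.

```python
# A minimal sketch using BioBlend; URL, key, and file path are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# Import a workflow from a local .ga file (e.g. one exported from another Galaxy).
wf = gi.workflows.import_workflow_from_local_path("my-workflow.ga")
print(wf["id"], wf["name"])
```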

Importing a workflow using the search

  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click on the galaxy-upload Import icon at the top-right of the screen
  • Click on the search form under Import a Workflow from Configured GA4GH Tool Registry Servers (e.g. Dockstore)

  • Select the relevant TRS Server

  • Type the query

  • Expand the correct workflow

  • Click on the wanted version

    The workflow will be imported in your workflows

Make a workflow public

  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows
  • Click on the interesting workflow
  • Click on Share
  • Click on Make Workflow accessible. This makes the workflow publicly accessible but unlisted.
  • To also list the workflow in the Shared Data section (in the top menu bar) of Galaxy, click Make Workflow publicly available in Published Workflows

Opening the workflow editor

  1. In the top menu bar, click on Workflows
  2. Click on the name of the workflow you want to edit

    Workflow drop down menu showing Edit option

  3. Select galaxy-wf-edit Edit from the dropdown menu to open the workflow in the workflow editor

Renaming workflow outputs

  1. Open the workflow editor
  2. Click on the tool in the workflow to get the details of the tool on the right-hand side of the screen.
  3. Scroll down to the Configure Output section of your desired parameter, and click it to expand it.
    • Under Rename dataset, give it a meaningful name

      Rename output datasets

Running a workflow

  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click on the workflow-run (Run workflow) button next to your workflow
  • Configure the workflow as needed
  • Click the Run Workflow button at the top-right of the screen
  • You may have to refresh your history to see the queued jobs
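
Running a workflow can likewise be scripted. The minimal sketch below uses the BioBlend Python library; the URL, key, workflow name, and dataset ID are placeholders, and it assumes the workflow's inputs are mapped by their input step index.

```python
# A minimal sketch using BioBlend; URL, key, workflow name, and dataset ID are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

workflow_id = gi.workflows.get_workflows(name="My QC workflow")[0]["id"]
history_id = gi.histories.create_history(name="Workflow run")["id"]

# Map each workflow input (here the input step with index "0") to an existing dataset.
inputs = {"0": {"src": "hda", "id": "PASTE_DATASET_ID_HERE"}}

invocation = gi.workflows.invoke_workflow(workflow_id, inputs=inputs, history_id=history_id)
print(invocation["id"], invocation["state"])
```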

Setting parameters at run-time

  1. Open the workflow editor
  2. Click on the tool in the workflow to get the details of the tool on the right-hand side of the screen.
  3. Scroll down to the parameter you want users to provide every time they run the workflow
  4. Click on the arrow in front of the name workflow-runtime-toggle to toggle to set at runtime

Viewing a workflow report

When creating a workflow in Galaxy, you can also define an output report page that should be created. Here you can display certain outputs of the pipeline (e.g. output files, tables, images, etc.) and other information about the run.
  • Go to User on the top menu bar of Galaxy.
  • Click on Workflow invocations
    • Here you will find a list of all the workflows you have run
  • Click on the name of a workflow invocation to expand it

    workflow invocations list

  • Click on View Report to go to the workflow report page
  • Note: The report can also be downloaded in PDF format by clicking on the galaxy-wf-report-download icon.



Still have questions?
Gitter Chat Support
Galaxy Help Forum