InterMine integration with Galaxy

Overview
Questions:
  • How to export your query results from your InterMine of choice to Galaxy?

  • How to export a list of identifiers from Galaxy to your InterMine of choice?

Objectives:
  • Learn how to import/export data from/to InterMine instances

  • Understand the InterMine Interchange Dataset

Time estimation: 1 hour
Last modification: Oct 18, 2022
License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

Introduction

InterMine (Smith et al. 2012) is a well-established platform for integrating and accessing life sciences data. It provides the integrated data through a web interface and RESTful web services.

Many organizations download and deploy InterMine on their own servers. There are more than 30 instances around the world (registered at registry.intermine.org), covering many organisms, including human data, model animals, plants and drug targets.

InterMine has been integrated with Galaxy in both directions. The InterMine Server tool in Galaxy lets you import the results of any InterMine search. Conversely, the InterMine Interchange format lets you export a list of identifiers from Galaxy into any InterMine instance of your choice.

Learn more in this tutorial.

Agenda

In this tutorial, we will cover:

  1. Introduction
  2. Import data from InterMine
  3. Export identifiers into InterMine
    1. Get data
    2. Create InterMine Interchange dataset
    3. Send identifiers to InterMine
  4. Conclusion

Import data from InterMine

Hands-on: Import

  1. Open the InterMine Server tool. Search the Galaxy tool panel for InterMine (the search is not case sensitive, so intermine works too) and click on InterMine Server under Get Data.

  2. This will redirect you to the InterMine registry, which shows a full list of InterMines and the various organisms they support. Find an InterMine that has the organism type you’re working with, and click on it to redirect to that InterMine.

  3. Once you arrive at your InterMine of choice, you can run a query as usual: a search, a list results page, a template, or a query in the query builder. Eventually you’ll be presented with an InterMine results table.

  4. Click on Export (top right). This will bring up a modal window.
  5. Select Send to Galaxy and double-check that the Galaxy Location is correct.
  6. Click on the Send to Galaxy button on the bottom right of the pop-up window.

    If you get an error when you click on the Send to Galaxy button, make sure your browser allows pop-ups for this site and try again.

You have now exported your query results from InterMine to Galaxy.
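Once the results are in your history, you can process them like any other tabular dataset. The minimal sketch below shows how to load them in Python. It assumes the export is tab-separated text whose first line holds the column names. Check your own dataset first, because the columns depend on your query. The helper `load_results` and the column names in the sample are hypothetical.

```python
import csv
import io
from typing import Dict, List


def load_results(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated query results into one dict per row.

    Assumes (not guaranteed by InterMine) that the first line is a header.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical two-row export; real column names depend on your query.
sample = "Gene.symbol\tGene.chromosome.primaryIdentifier\nsymA\t4\nsymB\t4\n"
rows = load_results(sample)
print(len(rows), rows[0]["Gene.symbol"])  # prints: 2 symA
```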

Export identifiers into InterMine

Get data

Hands-on: Data upload
  1. Import some fly data from Zenodo or from the data library

    https://zenodo.org/record/3407174/files/GenesLocatedOnChromosome4.tsv
    
    • Copy the link location
    • Open the Galaxy Upload Manager (the upload icon at the top right of the tool panel)

    • Select Paste/Fetch Data
    • Paste the link into the text field

    • Press Start

    • Close the window

    As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

    • Go into Shared data (top panel) then Data libraries
    • Navigate to the correct folder as indicated by your instructor
    • Select the desired files
    • Click on the To History button near the top and select as Datasets from the dropdown menu
    • In the pop-up window, select the history you want to import the files to (or create a new one)
    • Click on Import
  2. Rename the dataset to GenesLocatedOnChromosome4

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, change the Name field
    • Click the Save button
  3. Inspect the data

The dataset contains the secondary identifier and the symbol of Drosophila melanogaster genes, together with their location on chromosome 4.

Question

Do the data contain the type, e.g. Protein or Gene?

No, they don’t, so we have to specify the type when we create the InterMine Interchange dataset.
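You can also run this check in code by looking for a column whose every value is a known InterMine class name. This is a minimal sketch. `KNOWN_TYPES`, `find_type_column` and the sample rows are hypothetical. The real set of valid classes depends on the data model of the InterMine you target.

```python
from typing import Optional, Sequence

# Hypothetical subset of InterMine class names; the real set depends on
# the data model of the InterMine instance you are targeting.
KNOWN_TYPES = {"Gene", "Protein"}


def find_type_column(rows: Sequence[Sequence[str]]) -> Optional[int]:
    """Return the 1-based index of a column whose every value is a known type.

    Returns None when no such column exists, meaning the type must be
    supplied by hand when creating the InterMine Interchange dataset.
    """
    if not rows:
        return None
    n_cols = min(len(r) for r in rows)
    for col in range(n_cols):
        if all(r[col] in KNOWN_TYPES for r in rows):
            return col + 1  # Galaxy numbers columns from 1
    return None


# Hypothetical rows shaped like the tutorial dataset:
# secondary identifier, symbol, location (no type column).
rows = [["ID_1", "SYM_1", "4:100..200"], ["ID_2", "SYM_2", "4:300..400"]]
print(find_type_column(rows))  # prints: None
```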

Create InterMine Interchange dataset

We will use the Create InterMine Interchange Dataset tool to generate an intermediate dataset, which will be used to send the identifiers (e.g. gene identifiers) to InterMine. This dataset requires the identifier’s type (e.g. Gene), the identifier itself (e.g. WBGene00007063) and, optionally, the organism’s name.
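The three pieces of information listed above can be modelled as one tab-separated row per identifier. The sketch below does this with a small record type. The class name `InterchangeRecord`, the column order and the empty field for a missing organism are all assumptions for illustration. The tool's actual output layout, including any header line, may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterchangeRecord:
    """One identifier to send to InterMine (hypothetical model)."""

    feature_type: str               # must be a class in the InterMine data model, e.g. "Gene"
    identifier: str                 # e.g. "WBGene00007063"
    organism: Optional[str] = None  # optional and need not be precise

    def to_line(self) -> str:
        # Assumed layout: type, identifier, organism, tab-separated;
        # a missing organism becomes an empty field.
        return "\t".join([self.feature_type, self.identifier, self.organism or ""])


record = InterchangeRecord("Gene", "WBGene00007063")
print(repr(record.to_line()))  # prints: 'Gene\tWBGene00007063\t'
```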

Hands-on: Generate InterMine file
  1. Create InterMine Interchange dataset (tool ID: toolshed.g2.bx.psu.edu/repos/iuc/intermine_galaxy_exchange/galaxy_intermine_exchange/0.0.1) with the following parameters:
    • “Tabular file”: select the GenesLocatedOnChromosome4 dataset, which contains some fly genes
    • “Feature Type Column”: Column: 1
    • “Feature Type”: Gene
    • “Feature Identifier column”: Column: 2
    Comment
    • In this example, the GenesLocatedOnChromosome4 dataset does not contain the type, so we have to specify it in “Feature Type”.
    • “Feature Type”: the type of the identifiers you are exporting to InterMine, in this example Gene. It must be a class in the InterMine data model.
    • “Feature Identifier column”: select a column from the input file which contains the identifier. We have selected Column 2, which contains the gene symbol.
    • “Feature Identifier”: this could be, for example, a gene symbol such as GATA1, another identifier such as FBgn0000099, or a protein accession. In our example we do not have to edit anything, because the values for this field are contained in Column 2 of the GenesLocatedOnChromosome4 dataset.
    • “Organism Name column”: select a column from the input file which contains the organism’s name, if you have multiple organisms in the same dataset.
    • “Organism Name”: alternatively, you can provide the organism’s name directly. The organism’s name is not mandatory, but it is good practice to provide it if it is known. It does not have to be precise.
  2. Click on Execute
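The parameters above amount to a mapping from input columns to (type, identifier, organism) triples. The minimal sketch below illustrates that mapping in Python. The function names and sample rows are hypothetical. Giving a fixed value precedence over its column is an assumption that mirrors this tutorial, where “Feature Type” is set to Gene for a dataset without a type column. The real tool's rules may differ.

```python
from typing import List, Optional, Sequence, Tuple


def pick(row: Sequence[str], fixed: Optional[str], col: Optional[int]) -> str:
    """Return the fixed value if non-empty, else the 1-based column, else ''."""
    if fixed:
        return fixed
    if col is not None:
        return row[col - 1]
    return ""


def build_interchange(
    rows: Sequence[Sequence[str]],
    identifier_col: int,
    feature_type: Optional[str] = None,
    type_col: Optional[int] = None,
    organism: Optional[str] = None,
    organism_col: Optional[int] = None,
) -> List[Tuple[str, str, str]]:
    """Map tabular rows to (type, identifier, organism) triples.

    Columns are 1-based, as in Galaxy. Fixed values take precedence over
    columns (an assumption, not documented tool behaviour).
    """
    if not feature_type and type_col is None:
        raise ValueError("a feature type or a type column is required")
    out: List[Tuple[str, str, str]] = []
    for row in rows:
        out.append(
            (
                pick(row, feature_type, type_col),
                row[identifier_col - 1],
                pick(row, organism, organism_col),
            )
        )
    return out


# Hypothetical rows: secondary identifier in column 1, symbol in column 2.
rows = [["ID_1", "SYM_1"], ["ID_2", "SYM_2"]]
for triple in build_interchange(rows, identifier_col=2, feature_type="Gene",
                                type_col=1, organism="D. melanogaster"):
    print("\t".join(triple))
```

Because “Feature Type” is fixed to Gene, column 1 is ignored for the type, and every row comes out as Gene, its symbol, and the organism name.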

Send identifiers to InterMine

Once the interchange dataset has been generated, click on the green Create InterMine Interchange on data dataset in your history to expand it.

Hands-on: Send data
  1. Click on view intermine at Registry to be redirected to the InterMine registry, which shows a full list of InterMines and the various organisms they support.
  2. Find an InterMine that supports the organism you’re working with (in our case FlyMine) and click on its green Send to button to export the identifiers.
    You will be redirected to the FlyMine List Analysis page, which shows the identifiers you have just exported from Galaxy.

Conclusion

You have now exported your identifiers from Galaxy to InterMine.

Frequently Asked Questions

Have questions about this tutorial? Check out the tutorial FAQ page or the FAQ page for the Using Galaxy and Managing your Data topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.

References

  1. Smith, R. N., J. Aleksic, D. Butano, A. Carr, S. Contrino et al., 2012 InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics 28: 3163–3165. 10.1093/bioinformatics/bts577


Citing this Tutorial

  1. Daniela Butano, Yo Yehudi, 2022 InterMine integration with Galaxy (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/intermine/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012


@misc{galaxy-interface-intermine,
author = "Daniela Butano and Yo Yehudi",
title = "InterMine integration with Galaxy (Galaxy Training Materials)",
year = "2022",
month = "10",
day = "18",
url = "\url{https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/intermine/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                   
