Enable upload via FTP

Authors: Lucille Delisle
Overview
Questions:
  • How can I set up FTP to be easy for my users?

  • Can I authenticate FTP users with Galaxy credentials?

Objectives:
  • Configure Galaxy and install an FTP server.

  • Use an Ansible playbook for this.

Time estimation: 1 hour
Last modification: Oct 18, 2022
License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

This tutorial will guide you through setting up a File Transfer Protocol (FTP) server so that Galaxy users can use it to upload large files. As noted on the Galaxy Community Hub, uploading data directly from the browser can be unreliable and cumbersome. FTP allows users to monitor the upload status and to resume interrupted transfers.

Agenda
  1. FTP
  2. FTP and Galaxy
    1. Installing and Configuring
    2. Check it works
Comment: Galaxy Admin Training Path

The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.

  1. Step 1 ansible-galaxy
  2. Step 2 tus
  3. Step 3 cvmfs
  4. Step 4 singularity
  5. Step 5 tool-management
  6. Step 6 data-library
  7. Step 7 connect-to-compute-cluster
  8. Step 8 job-destinations
  9. Step 9 pulsar
  10. Step 10 gxadmin
  11. Step 11 monitoring
  12. Step 12 tiaas
  13. Step 13 reports
  14. Step 14 ftp

FTP

FTP is a very old and reliable communication protocol that has been around since 1971 (Bhushan 1971). It requires a server (here, our Galaxy server) and a client (the user’s computer). The FTP server needs at least two ports accessible from outside: one for commands and one for data transfer. The command port is usually 21.

FTP supports two modes: active and passive. Active mode requires that the user’s computer be reachable from the internet, which is rarely the case in the age of Network Address Translation (NAT) and firewalls. Passive mode is therefore the most commonly used. In passive mode, a client connects to the FTP server and requests a channel for sending files. The server responds with an IP address and a port from its range of “Passive Ports”, and the client opens the data connection to that address.
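To make the passive-mode exchange concrete, here is a minimal Python sketch that decodes the reply a server sends to the PASV command. The reply layout (four address bytes followed by two port bytes) comes from the FTP specification, RFC 959. The function name parse_pasv_reply and the sample address are hypothetical and only serve as illustration.

```python
import re


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Decode a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply.

    Returns the (ip, port) pair the client should connect to for the data channel.
    """
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a valid PASV reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError(f"byte value out of range in: {reply!r}")
    ip = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes: high byte first, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return ip, port


# 218 * 256 + 192 = 56000, the first port of the range assumed in this tutorial.
print(parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,218,192)"))
```

The port returned here is the one that must be reachable through any firewall, which is why the server needs a known range of passive ports.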

Comment: Requirements for Running This Tutorial

Your VM or wherever you are installing Galaxy needs to have the following ports available:

  • 21
  • Some high range of ports not used by another service, e.g. 56k-60k

You need to know which ports are open so you can use them for the transfer (PassivePorts). In this training we assume that 56k to 60k are open.

Precisely which ports you use is not important, and these numbers can differ between sites.
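If you are unsure whether a port is reachable from outside, a rough check can be scripted. The following Python sketch is only a heuristic, not a replacement for checking your firewall rules. The function name probe_port and the three labels it returns are assumptions made for this example.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP port as 'open', 'closed' or 'filtered'.

    'open'     : something accepted the connection.
    'closed'   : the host answered, but nothing listens there (connection refused).
    'filtered' : no usable answer, often a sign of a firewall dropping packets.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except OSError:
        # Timeouts, unreachable networks and name-resolution failures all land here.
        return "filtered"
```

For example, probe_port("galaxy.example.org", 21) (a hypothetical hostname) should return "open" once the FTP server is running. For a port in the passive range, "closed" is the expected answer while no transfer is in progress, because the server only listens on a passive port when a client has requested a data channel. A "filtered" result suggests the port is blocked before it reaches the server.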

FTP and Galaxy

To allow your users to upload via FTP, you will need to:

  • configure Galaxy to know where the files are uploaded
  • install an FTP server
  • allow your FTP server to read Galaxy’s database so that users can log in with their Galaxy credentials and upload to the correct directory

For secure transmission we will use FTP over SSL/TLS (FTPS) rather than the SSH File Transfer Protocol (SFTP), because Galaxy users do not correspond to system users on the machine.

Installing and Configuring

Luckily for us, the Galaxy Project has written an Ansible role for this purpose. It will install proftpd. First we need to install the role, and then update our playbook to use it.

If the terms “Ansible”, “role” and “playbook” mean nothing to you, please check out the Ansible introduction slides and the Ansible introduction tutorial.

It is possible to have Ansible installed on the remote machine and run it there, rather than running it on your local machine and connecting to the remote machine.

Your hosts file will need to use localhost, and whenever you run playbooks with ansible-playbook -i hosts playbook.yml, you will need to add -c local to your command.

Be certain that the playbook that you’re writing on the remote machine is stored somewhere safe, like your user home directory, or backed up on your local machine. The cloud can be unreliable and things can disappear at any time.

Hands-on: Setting up ftp upload with Ansible
  1. In your playbook directory, add the galaxyproject.proftpd role to your requirements.yml

    --- a/requirements.yml
    +++ b/requirements.yml
    @@ -38,3 +38,5 @@
       version: 0.12.0
     - src: usegalaxy_eu.tiaas2
       version: 0.0.8
    +- src: galaxyproject.proftpd
    +  version: 0.3.1
       
    
  2. Install the role with:

    Input: Bash
    ansible-galaxy install -p roles -r requirements.yml
    
  3. Because we are using certbot in this training, we will request a copy of the private key for proftpd. Add the following lines to your group_vars/galaxyservers.yml file:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -159,9 +159,11 @@ certbot_well_known_root: /srv/nginx/_well-known_root
     certbot_share_key_users:
       - nginx
       - rabbitmq
    +  - proftpd
     certbot_post_renewal: |
         systemctl restart nginx || true
         systemctl restart rabbitmq-server || true
    +    systemctl restart proftpd || true
     certbot_domains:
      - "{{ inventory_hostname }}"
     certbot_agree_tos: --agree-tos
       
    

    This will make a copy of the current letsencrypt key available as /etc/ssl/user/privkey-proftpd.pem, and automatically restart proftpd every time the key is updated.

  4. We will configure Galaxy to enable FTP file upload. Add the following lines to your group_vars/galaxyservers.yml file, in the galaxy_config/galaxy section:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -100,6 +100,9 @@ galaxy_config:
         outputs_to_working_directory: true
         # TUS
         tus_upload_store: /data/tus
    +    # FTP
    +    ftp_upload_dir: /data/uploads
    +    ftp_upload_site: "{{ inventory_hostname }}"
       gravity:
         galaxy_root: "{{ galaxy_root }}/server"
         app_server: gunicorn
       
    

To check the other options for setting up ftp in Galaxy, please check the Galaxy configuration documentation.

  1. Then we will set the variables for proftpd. Add the following lines to your group_vars/galaxyservers.yml file. Please replace the PassivePorts below with the range of ports that is appropriate for your machine!

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -250,6 +250,27 @@ rabbitmq_users:
         password: "{{ vault_rabbitmq_password_vhost }}"
         vhost: /pulsar/galaxy_au
        
    +# Proftpd:
    +proftpd_galaxy_auth: yes
    +galaxy_ftp_upload_dir: "{{ galaxy_config.galaxy.ftp_upload_dir }}"
    +proftpd_display_connect: |
    +  {{ inventory_hostname }} FTP server
    +
    +  Unauthorized access is prohibited
    +proftpd_create_ftp_upload_dir: yes
    +proftpd_options:
    +  - User: galaxy
    +  - Group: galaxy
    +  - Port: 21
    +proftpd_sql_db: galaxy@/var/run/postgresql
    +proftpd_sql_user: galaxy
    +proftpd_conf_ssl_certificate: /etc/ssl/certs/cert.pem
    +proftpd_conf_ssl_certificate_key: /etc/ssl/user/privkey-proftpd.pem
    +proftpd_global_options:
    +  - PassivePorts: 56000 60000
    +proftpd_use_mod_tls_shmcache: false
    +proftpd_tls_options: NoSessionReuseRequired
    +
     # Telegraf
     telegraf_plugins_extra:
       listen_galaxy_routes:
       
    

    Here is a description of the set variables:

    Variable                          Description
    proftpd_galaxy_auth               Attempt to authenticate users against a Galaxy database.
    galaxy_ftp_upload_dir             Path to the Galaxy FTP upload directory; should match ftp_upload_dir in your Galaxy config.
    proftpd_display_connect           Message to display when users connect to the FTP server. This should be the message itself, not the path to a file.
    proftpd_create_ftp_upload_dir     Whether to allow the role to create the upload directory, owned by galaxy_user.
    proftpd_options                   Any option for proftpd. Here we set the user and group of galaxy_user, and the command port.
    proftpd_sql_db                    Database name to connect to for authentication info.
    proftpd_sql_user                  Value of the username parameter to SQLConnectInfo (default: the value of galaxy_user).
    proftpd_conf_ssl_certificate      Path on the remote host where the SSL certificate file is.
    proftpd_conf_ssl_certificate_key  Path on the remote host where the SSL private key file is.
    proftpd_global_options            Set arbitrary options in the <Global> context. Here we set the PassivePorts range.
    proftpd_use_mod_tls_shmcache      By default proftpd uses mod_tls_shmcache, which is not installed on the server, so we disable it.
    proftpd_tls_options               Additional options for TLS. We will use NoSessionReuseRequired.

    As a security measure, mod_tls only accepts SSL/TLS data connections that reuse the SSL session of the control connection. Unfortunately, some clients (e.g. curl and FileZilla) do not reuse SSL sessions. To relax this requirement for data connections, we set NoSessionReuseRequired.

  2. Add the new role to the list of roles under the roles key in your playbook, galaxy.yml:

    --- a/galaxy.yml
    +++ b/galaxy.yml
    @@ -31,6 +31,7 @@
           become_user: "{{ galaxy_user.name }}"
         - usegalaxy_eu.rabbitmq
         - galaxyproject.nginx
    +    - galaxyproject.proftpd
         - galaxyproject.tusd
         - galaxyproject.cvmfs
         - galaxyproject.gxadmin
       
    
  3. Run the playbook

    Input: Bash
    ansible-playbook galaxy.yml
    

Congratulations, you’ve set up FTP for Galaxy.

Check it works

Hands-on: Checking proftpd from the server
  1. SSH into your machine

  2. Check that proftpd is active with systemctl status proftpd.

  3. Check that the port has been correctly assigned with sudo lsof -i -P -n.

    Question

    What do you see?

    You should see all the ports used by the server. What interests us is the line with proftpd. You should see TCP *:21 (LISTEN).

  4. Check that the directory /data/uploads/ has been created and is empty.

    Input: Bash
    sudo tree /data/uploads/
    
Hands-on: Checking that Galaxy detected the FTP option
  1. Open your Galaxy in a browser.

  2. Log in with a user (FTP is only possible for authenticated sessions).

  3. Click on the upload button. At the bottom, you should now see “Choose FTP files”.

  4. Click on the Choose FTP files button. You should see the message “Your FTP directory does not contain any files.”

It’s working!

Hands-on: Upload your first file

There are three options for uploading files. Choose whichever is easiest for you.

  1. FileZilla

    1. Follow the tutorial to upload a file.
    2. You will see a message asking you to approve the certificate; approve it.
  2. lftp

    You can use lftp locally to test the FTP server.

    1. Install lftp with sudo apt-get install lftp.
    2. Add the public certificate to the list of known certificates (only for LetsEncrypt Staging Certificates!):
      Input: Bash
      mkdir -p ~/.lftp
      echo "set ssl:ca-file \"/etc/ssl/certs/cert.pem\"" > ~/.lftp/rc
      
    3. Connect to the server, for example with the admin account:
      Input: Bash
      lftp admin@example.org@$HOSTNAME
      
    4. Enter the password of the admin@example.org Galaxy user.
    5. Put a random file:

      put /srv/galaxy/server/CITATION

    6. Check it is there with ls.
    7. Leave lftp with quit.
  3. Curl

    Input: Bash
    curl -T {"/srv/galaxy/server/CITATION"} ftp://localhost --user admin@example.org:password --ssl -k
    

    Here -T says to upload a file, --ssl ensures that the FTP connection is SSL/TLS encrypted, and -k ignores any certificate issues as the hostname localhost will not match the certificate we have.
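The same upload can also be scripted. Below is a minimal sketch using ftplib from the Python standard library, assuming the server was configured as in this tutorial (explicit FTPS on port 21). The helper names connect_ftps and upload are hypothetical. The verify=False switch plays the same role as curl's -k and should only be used for testing. ftplib is generally reported not to reuse the control connection's TLS session for data connections, so this sketch probably depends on the NoSessionReuseRequired option as well.

```python
import ssl
from ftplib import FTP_TLS
from pathlib import Path


def connect_ftps(host: str, user: str, password: str, verify: bool = True) -> FTP_TLS:
    """Open an explicit FTPS session (AUTH TLS on the command port) and log in."""
    context = ssl.create_default_context()
    if not verify:
        # Same effect as curl's -k: skip certificate checks (testing only).
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    ftps = FTP_TLS(host, context=context, timeout=30)
    ftps.login(user, password)  # login() secures the control channel first
    ftps.prot_p()  # encrypt the data channel as well
    return ftps


def upload(ftps, local_path: str) -> str:
    """Upload one file in binary mode and return the remote file name."""
    path = Path(local_path)
    with path.open("rb") as handle:
        ftps.storbinary(f"STOR {path.name}", handle)
    return path.name
```

Run on the server itself, connect_ftps("localhost", "admin@example.org", "password", verify=False) followed by upload(ftps, "/srv/galaxy/server/CITATION") and ftps.quit() would mirror the curl command above.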

Hands-on: Check where the file has been uploaded
  1. SSH into your machine

  2. Check the directory /data/uploads/.

    Input: Bash
    sudo tree /data/uploads/
    
    Question

    What do you see?

    Since I uploaded a file called CITATION as the admin@example.org user, I see:

    /data/uploads/
    └── admin@example.org
        └── CITATION
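The layout above (one sub-directory per Galaxy user, named after the user's email address) is easy to inspect programmatically. The following Python sketch lists the files waiting for each user. The function name list_ftp_uploads is hypothetical, and the path you pass in is assumed to be the ftp_upload_dir configured earlier. Because the directories belong to the galaxy user, the script may need to run as that user or with sudo.

```python
from pathlib import Path


def list_ftp_uploads(upload_dir: str) -> dict[str, list[str]]:
    """Map each user directory under upload_dir to the files waiting in it."""
    uploads = {}
    for user_dir in sorted(Path(upload_dir).iterdir()):
        if user_dir.is_dir():
            # Paths are reported relative to the user's own directory.
            uploads[user_dir.name] = sorted(
                str(f.relative_to(user_dir))
                for f in user_dir.rglob("*")
                if f.is_file()
            )
    return uploads
```

A user whose files have all been imported keeps an empty directory, so they appear in the result with an empty list.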
    
Hands-on: Use it in Galaxy
  1. Open your Galaxy in a browser.

  2. Log in with the user you used to upload the file.

  3. Click on the upload button.

  4. Click on the Choose FTP files button. You should see your file.

  5. Click on it and click on Start to launch the upload. It should go to your history as a new dataset.

  6. Click on the Choose FTP files button again. Your file has disappeared. By default, files are removed from the FTP directory once they are imported.

    To keep them instead, add ftp_upload_purge: false to the galaxy_config/galaxy variables (next to ftp_upload_dir).

Congratulations! Let your users know this is an option; many of them will prefer to start large uploads from an FTP client.

Comment: Got lost along the way?

If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

If you’re using git to track your progress, remember to add your changes and commit with a good commit message!

Key points
  • FTP is easy to deploy thanks to the role

  • Users can be authenticated with their Galaxy credentials, which simplifies user management significantly

References

  1. Bhushan, A. K., 1971. File Transfer Protocol. RFC 114, RFC Editor. https://www.rfc-editor.org/rfc/rfc114

Glossary

FTP
File Transfer Protocol
NAT
Network Address Translation

Citing this Tutorial

  1. Lucille Delisle, 2022 Enable upload via FTP (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/admin/tutorials/ftp/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012


@misc{admin-ftp,
author = "Lucille Delisle",
title = "Enable upload via FTP (Galaxy Training Materials)",
year = "2022",
month = "10",
day = "18",
url = "\url{https://training.galaxyproject.org/training-material/topics/admin/tutorials/ftp/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                   

Congratulations on successfully completing this tutorial!