Enable upload via FTP
Overview
Questions:
- How can I set up FTP so that it is easy for my users?
- Can I authenticate FTP users with Galaxy credentials?
Objectives:
- Configure Galaxy and install an FTP server.
- Use an Ansible playbook for this.
Requirements:
- Galaxy Server administration
  - Ansible: slides - tutorial hands-on
  - Galaxy Installation with Ansible: slides - tutorial hands-on
Time estimation: 1 hour
Last modification: Oct 18, 2022
This tutorial will guide you through setting up a File Transfer Protocol (FTP) server so that Galaxy users can use it to upload large files. As noted on the Galaxy community hub, uploading data directly from the browser can be unreliable and cumbersome. FTP allows users to monitor the upload status and to resume interrupted transfers.
Comment: Galaxy Admin Training Path
The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.
FTP
FTP is a very old and reliable communication protocol that has been around since 1971 (Bhushan 1971). It requires a server (here, our Galaxy server) and a client (the user’s computer). The FTP server needs at least two ports accessible from outside: one for commands and one for the transfer. The command port is usually 21.
FTP supports two different modes: active and passive. Active mode requires that the user’s computer be reachable from the internet, which, in the age of Network Address Translation (NAT) and firewalls, is rarely the case. Passive mode is therefore the most commonly used. In passive mode, a client connects to the FTP server and requests a channel for sending files. The server responds with an IP and port, taken from its range of “Passive Ports”.
Comment: Requirements for Running This Tutorial
Your VM, or wherever you are installing Galaxy, needs to have the following ports available:
- 21
- Some high range of ports not used by another service, e.g. 56000-60000
You need to know which ports are open so you can use them for the transfer (PassivePorts). In this training we assume that ports 56000 to 60000 are open.
Precisely which ports you use is not important, and these numbers can differ between sites.
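To make the passive-mode handshake more concrete: under RFC 959, the server answers the `PASV` command with a reply of the form `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)`, and the data port is `p1 * 256 + p2`. The sketch below decodes such a reply in POSIX shell. The function name `pasv_port` and the sample reply (which uses the documentation address 192.0.2.10) are made up for illustration, and the exact wording of the reply varies between servers.

```shell
# Hypothetical helper: decode "host port" from an FTP PASV reply (RFC 959).
pasv_port() {
  # Keep only the six comma-separated numbers between the parentheses.
  nums="${1#*\(}"
  nums="${nums%%\)*}"
  # Split on commas into $1..$6: four address bytes, then two port bytes.
  old_ifs="$IFS"
  IFS=,
  set -- $nums
  IFS="$old_ifs"
  echo "$1.$2.$3.$4 $(( $5 * 256 + $6 ))"
}

# Illustrative reply, not captured from a real server.
pasv_port "227 Entering Passive Mode (192,0,2,10,218,200)"
```

Here 218 * 256 + 200 = 56008, which falls inside the 56000 to 60000 range assumed in this tutorial. A firewall that only opened port 21 would let the login succeed but block this data connection.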
FTP and Galaxy
To allow your users to upload via FTP, you will need to:
- configure Galaxy to know where the files are uploaded.
- install an FTP server
- allow your FTP server to read Galaxy’s database so users can use their credentials and upload to the correct directory.
For secure transmission we will use SSL/TLS (FTPS), not the SSH File Transfer Protocol (SFTP), as the Galaxy users don’t correspond to users on the machine.
Installing and Configuring
Luckily for us, there is an Ansible role written by the Galaxy Project for this purpose. It will install proftpd. First we need to install the role, and then update our playbook to use it.
If the terms “Ansible”, “role” and “playbook” mean nothing to you, please check out the Ansible introduction slides and the Ansible introduction tutorial.
It is possible to have Ansible installed on the remote machine and run it there, not just from your local machine connecting to the remote machine.
Your hosts file will need to use localhost, and whenever you run playbooks with ansible-playbook -i hosts playbook.yml, you will need to add -c local to your command.
Be certain that the playbook that you’re writing on the remote machine is stored somewhere safe, like your user home directory, or backed up on your local machine. The cloud can be unreliable and things can disappear at any time.
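As a rough sketch of that local setup, the snippet below writes a minimal inventory and shows the matching command. The group name `galaxyservers` is an assumption inferred from the `group_vars/galaxyservers.yml` file used later in this tutorial, and `hosts.local` is a hypothetical file name chosen so that an existing `hosts` file is not overwritten.

```shell
# Sketch: write a local-only inventory next to the playbook.
# "hosts.local" is a hypothetical name; "galaxyservers" is assumed to be
# the group that group_vars/galaxyservers.yml applies to.
cat > hosts.local <<'EOF'
[galaxyservers]
localhost
EOF

# The playbook would then be run with a local connection instead of SSH:
#   ansible-playbook -i hosts.local galaxy.yml -c local
```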
Hands-on: Setting up FTP upload with Ansible
In your playbook directory, add the galaxyproject.proftpd role to your requirements.yml:
--- a/requirements.yml
+++ b/requirements.yml
@@ -38,3 +38,5 @@
   version: 0.12.0
 - src: usegalaxy_eu.tiaas2
   version: 0.0.8
+- src: galaxyproject.proftpd
+  version: 0.3.1
Install the role with:
Input: Bash
ansible-galaxy install -p roles -r requirements.yml
As we are using certbot in this training, we will ask it for a private key for proftpd. Add the following lines to your group_vars/galaxyservers.yml file:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -159,9 +159,11 @@ certbot_well_known_root: /srv/nginx/_well-known_root
 certbot_share_key_users:
   - nginx
   - rabbitmq
+  - proftpd
 certbot_post_renewal: |
     systemctl restart nginx || true
     systemctl restart rabbitmq-server || true
+    systemctl restart proftpd || true
 certbot_domains:
   - "{{ inventory_hostname }}"
 certbot_agree_tos: --agree-tos
This will make a copy of the current letsencrypt key available as /etc/ssl/user/privkey-proftpd.pem, and automatically restart proftpd every time the key is updated.
We will configure Galaxy to enable FTP file upload. Add the following lines to your group_vars/galaxyservers.yml file, in the galaxy_config/galaxy section:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -100,6 +100,9 @@ galaxy_config:
     outputs_to_working_directory: true
     # TUS
     tus_upload_store: /data/tus
+    # FTP
+    ftp_upload_dir: /data/uploads
+    ftp_upload_site: "{{ inventory_hostname }}"
   gravity:
     galaxy_root: "{{ galaxy_root }}/server"
     app_server: gunicorn
For the other FTP-related options in Galaxy, please check the Galaxy configuration documentation.
Then we will set the variables for proftpd. Add the following lines to your group_vars/galaxyservers.yml file. Please replace the PassivePorts below with the range of ports that is appropriate for your machine!
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -250,6 +250,27 @@ rabbitmq_users:
     password: "{{ vault_rabbitmq_password_vhost }}"
     vhost: /pulsar/galaxy_au
 
+# Proftpd:
+proftpd_galaxy_auth: yes
+galaxy_ftp_upload_dir: "{{ galaxy_config.galaxy.ftp_upload_dir }}"
+proftpd_display_connect: |
+  {{ inventory_hostname }} FTP server
+
+  Unauthorized access is prohibited
+proftpd_create_ftp_upload_dir: yes
+proftpd_options:
+  - User: galaxy
+  - Group: galaxy
+  - Port: 21
+proftpd_sql_db: galaxy@/var/run/postgresql
+proftpd_sql_user: galaxy
+proftpd_conf_ssl_certificate: /etc/ssl/certs/cert.pem
+proftpd_conf_ssl_certificate_key: /etc/ssl/user/privkey-proftpd.pem
+proftpd_global_options:
+  - PassivePorts: 56000 60000
+proftpd_use_mod_tls_shmcache: false
+proftpd_tls_options: NoSessionReuseRequired
+
 # Telegraf
 telegraf_plugins_extra:
   listen_galaxy_routes:
Here is a description of the variables we set (variable: description):
- proftpd_galaxy_auth: Attempt to authenticate users against a Galaxy database.
- galaxy_ftp_upload_dir: Path to the Galaxy FTP upload directory; should match ftp_upload_dir in your Galaxy config.
- proftpd_display_connect: Message to display when users connect to the FTP server. This should be the message itself, not the path to a file.
- proftpd_create_ftp_upload_dir: Whether to allow the role to create this directory with owner galaxy_user.
- proftpd_options: Any option for proftpd; we just set the user and group of the galaxy_user, and the port.
- proftpd_sql_db: Database name to connect to for authentication info.
- proftpd_sql_user (default: the value of galaxy_user): Value of the username parameter to SQLConnectInfo.
- proftpd_conf_ssl_certificate: Path on the remote host where the SSL certificate file is.
- proftpd_conf_ssl_certificate_key: Path on the remote host where the SSL private key file is.
- proftpd_global_options: Set arbitrary options in the <Global> context. Here we set the PassivePorts range.
- proftpd_use_mod_tls_shmcache: By default proftpd uses mod_tls_shmcache, which is not installed on the server, so we just disable it.
- proftpd_tls_options: Additional options for TLS. We use NoSessionReuseRequired. As a security measure, mod_tls only accepts SSL/TLS data connections that reuse the SSL session of the control connection. Unfortunately, some clients (e.g. curl, FileZilla) do not reuse SSL sessions. Setting NoSessionReuseRequired relaxes the requirement that the SSL session from the control connection be reused for data connections.
Add the new role to the list of roles under the roles key in your playbook, galaxy.yml:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -31,6 +31,7 @@
       become_user: "{{ galaxy_user.name }}"
     - usegalaxy_eu.rabbitmq
     - galaxyproject.nginx
+    - galaxyproject.proftpd
     - galaxyproject.tusd
     - galaxyproject.cvmfs
     - galaxyproject.gxadmin
Run the playbook:
Input: Bash
ansible-playbook galaxy.yml
Congratulations, you’ve set up FTP for Galaxy.
Check it works
Hands-on: Checking proftpd from the server
SSH into your machine
Check the active status of proftpd with systemctl status proftpd.
Check that the port has been correctly assigned with sudo lsof -i -P -n.
Question: What do you see?
You should see all the ports used by the server. The line of interest is the one for proftpd, which should show TCP *:21 (LISTEN).
Check that the directory /data/uploads/ has been created and is empty.
Input: Bash
sudo tree /data/uploads/
Hands-on: Checking that Galaxy detected the FTP option
Open your Galaxy in a browser.
Log in with a user (FTP is only possible for authenticated sessions).
Click on the upload button. You should now see “Choose FTP files” at the bottom.
Click on the Choose FTP files button. You should see the message “Your FTP directory does not contain any files.”
It’s working!
Hands-on: Upload your first file
There are three options for uploading files; you can choose whichever is easiest for you.
FileZilla
- Follow the tutorial to upload a file.
- You will get a message asking you to approve the certificate; approve it.
lftp
You can use lftp locally to test the FTP server.
- Install lftp with sudo apt-get install lftp.
- Add the public certificate to the list of known certificates (only for LetsEncrypt Staging Certificates!):
Input: Bash
mkdir .lftp
echo "set ssl:ca-file \"/etc/ssl/certs/cert.pem\"" > .lftp/rc
- Connect to the server with, for example, the admin account:
Input: Bash
lftp admin@example.org@$HOSTNAME
- Enter the password of the admin@example.org Galaxy user.
- Put a random file:
put /srv/galaxy/server/CITATION
- Check it is there with ls.
- Leave lftp with quit.
Curl
Input: Bash
curl -T {"/srv/galaxy/server/CITATION"} ftp://localhost --user admin@example.org:password --ssl -k
Here -T says to upload a file, --ssl asks curl to use SSL/TLS for the FTP connection (curl falls back to an unencrypted connection if the server does not offer TLS; --ssl-reqd makes it fail instead), and -k ignores any certificate issues, as the hostname localhost will not match the certificate we have.
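For uploads from another machine, a stricter variant of the curl command may be preferable. The sketch below is one possible wrapper, not part of the tutorial's setup: the function name `ftps_upload`, the `CURL` override used for the dry run, and the host `galaxy.example.org` are all hypothetical. It assumes you connect with the server's real hostname, so the certificate matches and `-k` is not needed.

```shell
# Hypothetical wrapper around a curl FTPS upload.
# Usage: ftps_upload FILE HOST USER
ftps_upload() {
  file="$1"
  host="$2"
  user="$3"
  # --ssl-reqd: abort if the server does not offer TLS (no plaintext fallback).
  # --user without ":password": curl prompts for the password instead of
  #   reading it from the command line.
  # ${CURL:-curl}: set CURL=echo for a dry run that only prints the arguments.
  ${CURL:-curl} -T "$file" "ftp://$host/" --user "$user" --ssl-reqd
}

# Dry run: prints the arguments curl would receive, without connecting.
CURL=echo ftps_upload /srv/galaxy/server/CITATION galaxy.example.org admin@example.org
```

Leaving the password off the command line keeps it out of the shell history and the process list.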
Hands-on: Check where the file has been uploaded
SSH into your machine
Check the directory /data/uploads/.
Input: Bash
sudo tree /data/uploads/
Question: What do you see?
As I uploaded a file called CITATION with the admin@example.org user, I see:
/data/uploads/
└── admin@example.org
    └── CITATION
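If tree is not installed, the same per-user layout can be inspected with standard tools. The helper below is a hypothetical sketch (the name `list_ftp_uploads` is not part of Galaxy or proftpd): it prints each user directory with the number of files waiting in it. It is demonstrated on a throwaway directory; on the server you would point it at the upload directory and run it as a user allowed to read it.

```shell
# Hypothetical helper: print "<user> <file count>" for each per-user
# directory under the FTP upload directory.
list_ftp_uploads() {
  upload_dir="$1"
  for user_dir in "$upload_dir"/*/; do
    # Skip the unexpanded pattern when there are no subdirectories.
    [ -d "$user_dir" ] || continue
    count="$(find "$user_dir" -type f | wc -l | tr -d ' ')"
    echo "$(basename "$user_dir") $count"
  done
}

# Demonstration on a temporary directory that mimics the layout above.
demo="$(mktemp -d)"
mkdir -p "$demo/admin@example.org"
touch "$demo/admin@example.org/CITATION"
list_ftp_uploads "$demo"
rm -r "$demo"
```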
Hands-on: Use it in Galaxy
Open your Galaxy in a browser.
Log in with the user you used to upload the file.
Click on the upload button.
Click on the Choose FTP files button. You should see your file.
Click on it and click on Start to launch the upload. It should go to your history as a new dataset.
Click again on the Choose FTP files button. Your file has disappeared: by default, files are removed from the FTP directory at import.
If you would rather keep the files after import, add ftp_upload_purge: false to the galaxy_config/galaxy variables (next to ftp_upload_dir).
Congratulations! Let your users know this is an option; many of them will prefer to start large uploads from an FTP client.
Comment: Got lost along the way?
If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.
If you’re using git to track your progress, remember to add your changes and commit with a good commit message!
Key points
FTP is easy to deploy thanks to the role
Users can be authenticated with their Galaxy credentials, which simplifies user management significantly
Frequently Asked Questions
Have questions about this tutorial? Check out the tutorial FAQ page or the FAQ page for the Galaxy Server administration topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.
References
- Bhushan, A. K., 1971 File Transfer Protocol: RFC Editor RFC 114. https://www.rfc-editor.org/rfc/rfc114
Glossary
- FTP
- File Transfer Protocol
- NAT
- Network Address Translation
Citing this Tutorial
- Lucille Delisle, 2022 Enable upload via FTP (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/admin/tutorials/ftp/tutorial.html Online; accessed TODAY
- Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012
Congratulations on successfully completing this tutorial!
@misc{admin-ftp,
  author = "Lucille Delisle",
  title = "Enable upload via FTP (Galaxy Training Materials)",
  year = "2022",
  month = "10",
  day = "18",
  url = "\url{https://training.galaxyproject.org/training-material/topics/admin/tutorials/ftp/tutorial.html}",
  note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
  doi = {10.1016/j.cels.2018.05.012},
  url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
  year = 2018,
  month = {jun},
  publisher = {Elsevier {BV}},
  volume = {6},
  number = {6},
  pages = {752--758.e1},
  author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
  title = {Community-Driven Data Analysis Training for Biology},
  journal = {Cell Systems}
}