name: inverse
layout: true
class: center, middle, inverse
---
# Submitting SARS-CoV-2 sequences to ENA
Authors: Miguel Roncoroni

Updated: Aug 10, 2021

Tip: press `P` to view the presenter notes | Use arrow keys to move between slides
???

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off.

Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting.

---

### <i class="fas fa-bullseye" aria-hidden="true"></i><span class="visually-hidden">objectives</span> Objectives

- Introduce the European Nucleotide Archive (ENA)
- Learn the requirements to submit raw SARS-CoV-2 sequences to ENA in Galaxy
- Get an overview of ENA's metadata model and how metadata objects are linked

---

### The European Nucleotide Archive

.pull-left[
.left[
ENA is:
- a FAIR and Open repository for sequence data (reads, assemblies, annotations)
- part of the International Nucleotide Sequence Database Collaboration (INSDC) with NCBI and DDBJ
- the [COVID-19 data portal](https://www.covid19dataportal.org/) repository for SARS-CoV-2 sequences
]
]

.pull-right[
![ENA-FAIR](../../images/upload-data-to-ena/ENA-FAIR.png)

The European Nucleotide Archive and INSDC
]

---

## SARS-CoV-2 sequences

.left[
Why is raw SARS-CoV-2 sequence data important?
- Allows reuse of data and reproducibility of analysis
- Enables discovery of minor allelic variants and [intrahost variation](https://virological.org/t/global-framework-for-sars-cov-2-data-analysis-application-to-intrahost-variation-part-1/623)
]

.image-40[
![Intrahost variation](../../images/upload-data-to-ena/intra-host.png)
]

.reduce70[Minor allelic variants can be used to detect intrahost variation. From Maier et al., 2021 [doi.org/10.1101/2021.03.25.437046](https://doi.org/10.1101/2021.03.25.437046)]

---

### Submitting reads with Galaxy

.left[
Why use Galaxy to submit to ENA?
- intuitive graphical user interface (GUI)
- simple metadata input via a template spreadsheet or interactively
- no bioinformatics skills needed
]

.image-75[![upload-tool](../../images/upload-data-to-ena/upload-tool.png) ]

---

### Submission overview

.image-100[
![reads-submission](../../images/upload-data-to-ena/reads_submission.png)
]

---

### What you need

.left[
Data:
- compressed fastq format (*.fastq.gz, *.fastq.bz2)
- human traces removed ([tutorial](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/human-reads-removal/tutorial.html))

Metadata:
- interactive metadata input (for a few submissions), or
- metadata [template spreadsheet](https://drive.google.com/file/d/1Gx78GKh58PmRjdmJ05DBbpObAL-3oUFX/view?usp=sharing) (for bulk submissions)

Credentials:
- [ENA Webin credentials](https://www.ebi.ac.uk/ena/submit/sra/#registration) in your Galaxy user information
]

.left[
![ENA-credentials](../../images/upload-data-to-ena/ENA-credentials.png)
]

---

### Metadata

.left[
For the submission of SARS-CoV-2 reads, [ENA's metadata model](https://ena-docs.readthedocs.io/en/latest/submit/general-guide/metadata.html) requires:
- study, sample, experiment and run information
- additional information for viral samples ([viral checklist](https://www.ebi.ac.uk/ena/browser/view/ERC000033))
]

![metadata-model](../../images/upload-data-to-ena/metadata_model_reads.png)

---

### Metadata

.left[
Interactive metadata input in Galaxy:
]

![interactive metadata](../../images/upload-data-to-ena/interactive_metadata.png)

---

### Metadata

.left[
Metadata [template spreadsheet](https://drive.google.com/file/d/1Gx78GKh58PmRjdmJ05DBbpObAL-3oUFX/view?usp=sharing):
- one sheet each for study, sample, experiment and run
- built-in controlled vocabulary
]

![metadata_template](../../images/upload-data-to-ena/metadata_template.png)

---

### Metadata

.left[
- Different metadata objects are linked using Aliases
- Aliases must be unique
]
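
.left[
A minimal sketch of how alias-based linking could be checked before submission. The class names, field names, and `check_links` helper are hypothetical and are not ENA's schema or the Galaxy tool's format. Uniqueness is checked per object type here, which is an assumption; the slides only state that aliases must be unique.
]

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    alias: str
    study_alias: str   # alias of the study this experiment belongs to
    sample_alias: str  # alias of the sample that was sequenced


@dataclass(frozen=True)
class Run:
    alias: str
    experiment_alias: str  # alias of the experiment this run belongs to
    file_name: str         # data file attached to this run


def duplicates(aliases):
    """Return the aliases that occur more than once, sorted."""
    return sorted(a for a, n in Counter(aliases).items() if n > 1)


def check_links(study_aliases, sample_aliases, experiments, runs):
    """Return a list of problems; an empty list means all links resolve."""
    problems = []
    groups = {
        "study": list(study_aliases),
        "sample": list(sample_aliases),
        "experiment": [e.alias for e in experiments],
        "run": [r.alias for r in runs],
    }
    # Assumption: aliases only need to be unique within one object type.
    for kind, aliases in groups.items():
        for alias in duplicates(aliases):
            problems.append(f"duplicate {kind} alias: {alias}")

    studies, samples = set(study_aliases), set(sample_aliases)
    for e in experiments:
        if e.study_alias not in studies:
            problems.append(f"experiment {e.alias}: unknown study {e.study_alias}")
        if e.sample_alias not in samples:
            problems.append(f"experiment {e.alias}: unknown sample {e.sample_alias}")

    experiment_aliases = {e.alias for e in experiments}
    for r in runs:
        if r.experiment_alias not in experiment_aliases:
            problems.append(f"run {r.alias}: unknown experiment {r.experiment_alias}")
    return problems


# Illustrative aliases and file names only.
experiments = [Experiment("exp_1", "study_1", "sample_1")]
runs = [
    Run("run_1", "exp_1", "sample_1_R1.fastq.gz"),
    Run("run_2", "exp_9", "sample_2_R1.fastq.gz"),  # points at a missing experiment
]
problems = check_links(["study_1"], ["sample_1"], experiments, runs)
print(problems)
```

.left[
In this example `run_2` refers to an experiment alias that does not exist, so the check reports one problem for that run.
]
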
![metadata-model](../../images/upload-data-to-ena/metadata_model_reads.png)

---

### Aliases

.left[
Aliases link metadata objects:
- Experiments are linked to Study and Samples
- Runs are linked to Experiments
]

![study-sample](../../images/upload-data-to-ena/study_sample_exp.png)

---

### Aliases

.left[
Aliases link metadata objects:
- Experiments are linked to Study and Samples
- Runs are linked to Experiments
]

![exp-run](../../images/upload-data-to-ena/exp_run.png)

---

### Aliases

.left[
Aliases link metadata to data:
- Data (filename.fastq.gz) is linked to Run Alias
]

![data-metadata](../../images/upload-data-to-ena/data_metadata.png)

---

### <i class="fas fa-key" aria-hidden="true"></i><span class="visually-hidden">keypoints</span> Key points

- ENA is a FAIR data repository for SARS-CoV-2 raw and assembled nucleotide data
- You can easily submit reads to ENA using Galaxy's ENA upload tool (GUI, no bioinformatics skills needed)

---

## Thank You!

This material is the result of collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
This material is licensed under the Creative Commons Attribution 4.0 International License.