Submitting SARS-CoV-2 sequences to ENA

Authors: Miguel Roncoroni

Updated: Aug 10, 2021

Objectives

  • Introduce the European Nucleotide Archive (ENA)

  • Learn the requirements to submit raw SARS-CoV-2 sequences to ENA in Galaxy

  • Get an overview of ENA's metadata model and how metadata objects are linked

The European Nucleotide Archive

ENA is:

  • a FAIR and Open repository for sequence data (reads, assemblies, annotations)
  • part of the International Nucleotide Sequence Database Collaboration (INSDC), together with NCBI and DDBJ
  • the COVID-19 data portal repository for SARS-CoV-2 sequences

[Figure: The European Nucleotide Archive and INSDC]

SARS-CoV-2 sequences

Why is raw SARS-CoV-2 sequence data important?

  • Allows reuse of data and reproducibility of analysis
  • Enables discovery of minor allelic variants and intrahost variation

[Figure: Intrahost variation]

Minor allelic variants can be used to detect intrahost variation. From Maier et al., 2021, doi.org/10.1101/2021.03.25.437046

Submitting reads with Galaxy

Why use Galaxy to submit to ENA?

  • intuitive graphical user interface (GUI)
  • simple metadata input via a template spreadsheet or interactively
  • no bioinformatics skills needed

[Figure: Galaxy's ENA upload tool]

Submission overview

[Figure: reads submission overview]

What you need

Data:

  • compressed fastq format (.fastq.gz, .fastq.bz2)
  • human traces removed (see the linked tutorial)
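
As a rough illustration of the file-format requirement above, the following minimal sketch checks whether a filename ends with one of the two compressed FASTQ extensions named on this slide. The function name `is_accepted_read_file` and the constant `ACCEPTED_SUFFIXES` are hypothetical and are not part of Galaxy or ENA; the check looks at the name only, not at the file contents, and it assumes no formats beyond the two listed here.

```python
# Extensions taken from the slide; any further accepted formats are not covered here.
ACCEPTED_SUFFIXES = (".fastq.gz", ".fastq.bz2")


def is_accepted_read_file(filename: str) -> bool:
    """Return True if the filename ends with one of the listed compressed FASTQ suffixes."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return filename.endswith(ACCEPTED_SUFFIXES)


if __name__ == "__main__":
    for name in ["sample1.fastq.gz", "sample2.fastq.bz2", "sample3.fastq"]:
        status = "ok" if is_accepted_read_file(name) else "needs compression or renaming"
        print(f"{name}: {status}")
```

A check like this could be run over a folder of reads before starting the upload, so that uncompressed files are caught early.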

Metadata:

  • interactive metadata input (for a few submissions), or
  • metadata template spreadsheet (for bulk submissions)

Credentials:

[Figure: ENA credentials]

Metadata

For the submission of SARS-CoV-2 reads ENA's metadata model requires:

  • study, sample, experiment and run information
  • additional information for viral samples (viral checklist)
[Figure: metadata model]

Metadata

Interactive metadata input in Galaxy:

[Figure: interactive metadata input]

Metadata

Metadata template spreadsheet:

  • one sheet each for study, sample, experiment and run
  • built-in controlled vocabulary
[Figure: metadata template spreadsheet]

Metadata

  • Different metadata objects are linked using Aliases
  • Aliases must be unique
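
The uniqueness requirement can be checked before submission with a few lines of code. This is only a sketch: the helper `duplicate_aliases` is hypothetical, and it assumes uniqueness is checked separately for each list of aliases passed in (for example, all sample aliases in one template), which the slide does not spell out.

```python
from collections import Counter


def duplicate_aliases(aliases):
    """Return the aliases that occur more than once, in first-seen order."""
    counts = Counter(aliases)
    # Counter keeps insertion order, so duplicates are reported in the order they first appear.
    return [alias for alias, n in counts.items() if n > 1]


if __name__ == "__main__":
    # Illustrative aliases only; real aliases are chosen by the submitter.
    sample_aliases = ["sample_A", "sample_B", "sample_A"]
    print(duplicate_aliases(sample_aliases))
```

An empty result means every alias in the list is unique.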
[Figure: metadata model]

Aliases

Aliases link metadata objects:

  • Experiments are linked to Study and Samples
  • Runs are linked to Experiments
[Figure: study-sample links]
[Figure: experiment-run links]

Aliases

Aliases link metadata to data:

  • Data (filename.fastq.gz) is linked to Run Alias
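
To make the alias links concrete, here is a small sketch of how experiments, runs and data files refer to each other, together with a check for references that point at a missing alias. The class and field names (`Experiment`, `Run`, `study_alias`, and so on) are hypothetical and chosen for illustration; they are not the column names of the Galaxy template or of ENA's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    alias: str
    study_alias: str   # link to a Study
    sample_alias: str  # link to a Sample


@dataclass(frozen=True)
class Run:
    alias: str
    experiment_alias: str  # link to an Experiment
    filename: str          # data file attached to this run alias


def broken_links(study_aliases, sample_aliases, experiments, runs):
    """Return a list of messages, one per reference to an alias that does not exist."""
    problems = []
    for exp in experiments:
        if exp.study_alias not in study_aliases:
            problems.append(f"experiment {exp.alias}: unknown study {exp.study_alias}")
        if exp.sample_alias not in sample_aliases:
            problems.append(f"experiment {exp.alias}: unknown sample {exp.sample_alias}")
    experiment_aliases = {exp.alias for exp in experiments}
    for run in runs:
        if run.experiment_alias not in experiment_aliases:
            problems.append(f"run {run.alias}: unknown experiment {run.experiment_alias}")
    return problems


if __name__ == "__main__":
    experiments = [Experiment("exp_1", "study_1", "sample_1")]
    runs = [Run("run_1", "exp_1", "filename.fastq.gz")]
    print(broken_links({"study_1"}, {"sample_1"}, experiments, runs))
```

Each run carries both its experiment alias and its filename, which mirrors how the data file is tied to the rest of the metadata through the run alias.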
[Figure: data-metadata links]

Key points

  • ENA is a FAIR data repository for SARS-CoV-2 raw and assembled nucleotide data

  • You can easily submit reads to ENA using Galaxy's ENA upload tool (GUI, no bioinformatics skills needed)

Thank You!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

This material is licensed under the Creative Commons Attribution 4.0 International License.
