Before diving into this slide deck, we recommend that you have a look at:
N50: given a set of sequences of varying lengths, the N50 is the length L of the shortest contig such that contigs of length L or longer cover at least 50% of the assembly.
L50: given a set of sequences of varying lengths, the L50 is the smallest number of sequences, taken from longest to shortest, whose lengths sum to at least 50% of the assembly.
N50 describes a sequence length whereas L50 describes a number of sequences.
Example:
N50 = 8 and L50 = 4
Alhakami, H., Mirebrahim, H., & Lonardi, S. (2017). A comparative evaluation of genome assembly reconciliation tools. Genome biology, 18(1), 1-14.
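The two definitions can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular tool; the function name `n50_l50` and the contig lengths are hypothetical and are not the data behind the example above.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest sequence first
    half = sum(ordered) / 2                   # 50% of the assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # length is the N50 (a sequence length),
            # count is the L50 (a number of sequences)
            return length, count


# Hypothetical contig lengths, for illustration only
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(contig_lengths)
print(n50, l50)  # prints "70 2"
```

Here the assembly size is 300, and the two longest contigs (80 + 70 = 150) already reach half of it, so N50 is 70 and L50 is 2.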
It also includes:
Quantitative assessment of genome assembly based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs.
Tip: Reference databases are constructed using known genomes. Species with few or no closely related genomes available can score very poorly.
BUSCO: Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs
Eukaryota: 255 single-copy orthologs from 70 species; Arthropoda: 1013 single-copy orthologs from 90 species; Fungi: 758 single-copy orthologs from 549 species
Waterhouse, R. M., Zdobnov, E. M. & Kriventseva, E. V. Correlating Traits of Gene Retention, Sequence Divergence, Duplicability and Essentiality in Vertebrates, Arthropods, and Fungi. Genome Biol Evol 3, 75–86 (2011).
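BUSCO results are commonly reported as a one-line summary of the form C (complete), S (single-copy), D (duplicated), F (fragmented), M (missing) and n (number of orthologs searched). The sketch below parses such a line. It assumes exactly this notation; the precise layout may differ between BUSCO versions, the function name `parse_busco_summary` is hypothetical, and the values in the example are made up.

```python
import re

# Assumed layout of the one-line summary: C:..%[S:..%,D:..%],F:..%,M:..%,n:..
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages (floats) and n (int) found in a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a BUSCO one-line summary")
    fields = {key: float(value)
              for key, value in match.groupdict().items() if key != "n"}
    fields["n"] = int(match.group("n"))
    return fields


# Made-up values, for illustration only
example = "C:92.5%[S:80.0%,D:12.5%],F:3.5%,M:4.0%,n:255"
scores = parse_busco_summary(example)
print(scores["C"], scores["D"], scores["n"])
```

Having the scores as numbers makes it easy to compare several assemblies of the same species side by side.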
The aim is to check the coherence of the assembly against the content of the reads that were used to produce it. Basically, for each k-mer frequency in the read spectrum, how many k-mers ended up absent from the assembly, included once, included twice, etc.
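The comparison can be sketched as a small table that maps each pair (multiplicity in the reads, copy number in the assembly) to a number of distinct k-mers. This is a toy illustration under strong simplifications: it ignores reverse complements, uses plain Python strings, and would not scale to real data, where dedicated k-mer counters are used. The function names are hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer (forward strand only) in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def spectrum_by_copy_number(reads, assembly, k):
    """Map (read multiplicity, assembly copy number) -> number of distinct k-mers."""
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers(assembly, k)
    table = Counter()
    for kmer, multiplicity in read_counts.items():
        # copy number 0 means the k-mer is seen in the reads but missing
        # from the assembly
        table[(multiplicity, asm_counts.get(kmer, 0))] += 1
    return table


# Toy data, for illustration only
reads = ["ACGTAC", "CGTACG"]
assembly = ["ACGTA"]
table = spectrum_by_copy_number(reads, assembly, k=3)
for (multiplicity, copies), n_kmers in sorted(table.items()):
    print(multiplicity, copies, n_kmers)
```

In this toy case all four read k-mers occur twice in the reads; three of them are present once in the assembly and one (TAC) is absent from it.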
http://lh3.github.io/2021/04/17/concepts-in-phased-assemblies
Dot plots are widely used to quickly compare 2 sequence sets. They provide a synthetic overview of:
A non-exhaustive list of tools for making dot plots:
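Independently of any particular tool, the principle of a dot plot can be sketched as follows: a dot is drawn at (i, j) whenever the word of length k starting at position i of the first sequence equals the word starting at position j of the second. This minimal version looks at the forward strand only and uses exact matches; real tools rely on alignments or indexed seeds. The function names are hypothetical.

```python
def dotplot_matches(seq_a, seq_b, k):
    """Return (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k]."""
    positions = {}
    for j in range(len(seq_b) - k + 1):
        positions.setdefault(seq_b[j:j + k], []).append(j)
    matches = []
    for i in range(len(seq_a) - k + 1):
        for j in positions.get(seq_a[i:i + k], []):
            matches.append((i, j))
    return matches


def render(seq_a, seq_b, k):
    """Draw the matches as text: one row per position in seq_b."""
    hits = set(dotplot_matches(seq_a, seq_b, k))
    width = len(seq_a) - k + 1
    rows = []
    for j in range(len(seq_b) - k + 1):
        rows.append("".join("*" if (i, j) in hits else "." for i in range(width)))
    return "\n".join(rows)


# Two identical toy sequences give an unbroken diagonal
print(render("ACGTT", "ACGTT", 3))
```

An unbroken diagonal indicates collinear, identical sequences; gaps or shifts in the diagonal point to insertions, deletions or rearrangements, and parallel diagonals to repeats.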
The quality of an assembly is often validated using other data from the same individual or from other individuals (RNA-Seq alignment, Hi-C alignment, DNA-Seq alignment, ...).
The positions of the telomeric repeats in the chromosome assemblies are also of interest for evaluating correctness.
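As a rough illustration, one can count copies of a telomeric motif at both ends of each assembled sequence: a chromosome-scale sequence is expected to carry the repeat at its ends rather than in the middle. The sketch below assumes the vertebrate motif TTAGGG (other clades use different motifs); the window size and the function names are arbitrary, hypothetical choices.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomere_counts(seq, motif="TTAGGG", window=1000):
    """Count motif copies (either strand) in the first and last `window` bases."""
    seq = seq.upper()
    patterns = (motif, revcomp(motif))

    def count(segment):
        # str.count counts non-overlapping occurrences
        return sum(segment.count(p) for p in patterns)

    # If the sequence is shorter than 2 * window, the two ends overlap
    return count(seq[:window]), count(seq[-window:])


# Toy sequence with repeats at both ends, for illustration only
toy = "CCCTAA" * 5 + "ACGT" * 50 + "TTAGGG" * 4
print(telomere_counts(toy, window=40))  # prints "(5, 4)"
```

Sequences with a high count at only one end, or with a cluster of repeats far from the ends, are candidates for closer inspection.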
The identification of organelles (mitochondria, chloroplast, ...) can also inform us about the completeness of the assembly. However, the structure of organelle genomes may lead the assembler to treat them as repeats and discard them.
In the case of diploid organisms, one of the classical problems of assemblies is the retention of both haplotypes. This produces characteristic BUSCO / k-mer / assembly size metrics, which can be corrected by removing ("purging") the haplotigs.
This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!
This material is licensed under the Creative Commons Attribution 4.0 International License.