Sorting Reference Genome
Certain tools expect reference genomes to be sorted in lexicographical order. These tools often run downstream of the initial mapping tools, so a sorting problem may only surface in the final analysis tools, after a large investment in the project has already been made. How to avoid this? Always sort your FASTA reference genome dataset at the beginning of a project. Many sources only provide sorted genomes, but double-checking is your own responsibility, and it is easy to do in Galaxy:
- Convert Formats -> FASTA-to-Tabular
- Filter and Sort -> Sort on column: c1, with flavor: Alphabetical, everything in: Ascending order
- Convert Formats -> Tabular-to-FASTA
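Outside of Galaxy, the same three steps (flatten each record to one row, sort on the first column, write FASTA again) can be sketched in a few lines of Python. This is only an illustration, not what the Galaxy tools run internally: the function names (`parse_fasta`, `sort_records`, `is_sorted`, `format_fasta`) and the 60-character line width are assumptions made for this example, and Python compares strings by code point, which may differ from a locale-aware sort for unusual identifiers.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTA record: (title line without the leading '>', full sequence).
Record = Tuple[str, str]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (title, sequence) pairs, like a FASTA-to-Tabular conversion."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sort_records(records: Iterable[Record]) -> List[Record]:
    """Sort records by title in ascending lexicographical order."""
    return sorted(records, key=lambda record: record[0])


def is_sorted(records: Iterable[Record]) -> bool:
    """Return True if the titles are already in lexicographical order."""
    titles = [title for title, _ in records]
    return titles == sorted(titles)


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Write records back as FASTA text, wrapping sequences at `width`."""
    out: List[str] = []
    for title, sequence in records:
        out.append(">" + title)
        for start in range(0, len(sequence), width):
            out.append(sequence[start:start + width])
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    # Tiny made-up input, only to show the resulting order.
    demo = ">chr2\nGGCC\n>chr10\nACGT\n>chr1\nTTAA\n"
    records = list(parse_fasta(demo.splitlines()))
    print(is_sorted(records))
    print(format_fasta(sort_records(records)), end="")
```

Note that lexicographical order places `chr10` before `chr2`, which is the order this sort method produces.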
Note: The above sorting method works for most tools, but not all. In particular, GATK tools have their own tool-specific sort order requirement.