Working with GFF GFT GTF2 GFF3 reference annotation


  • All annotation datatypes have a distinct format and content specification.
    • Data providers may release variations of any, and tools may produce variations.
    • GFF3 data may be labeled as GFF.
    • Content can overlap but is generally not understood by tools that are expecting just one of these specific formats.
  • Best practices
    • The sequence identifiers must exactly match between reference annotation and reference genomes transcriptomes exomes.
    • Most tools expect GFT format unless the tool form specifically notes otherwise.
      • Get the GTF version from the data providers if it is available.
      • If only GFF3 is available, you can attempt to transform it with the tool gffread.
    • Was GTF data detected as GFF during Upload? It probably has headers. -Remove the headers (lines that start with a “#”) with the Select tool using the option “NOT Matching” with the regular expression: ^#
    • UCSC annotation
      • Find annotation under their Downloads area. The path will be similar to: https://hgdownload.soe.ucsc.edu/goldenPath/<database>/bigZips/genes/
      • Copy the URL from UCSC and paste it into the Upload tool, allowing Galaxy to detect the datatype.
Still have questions?
Gitter Chat Support
Galaxy Help Forum
Want to embed this snippet (FAQ) in your GTN Tutorial?
{% snippet  faqs/galaxy/datasets_working_with_reference_annotation.md %}