View markdown source on GitHub

De Bruijn Graph Assembly

Contributors

Authors: AvatarSimon Gladman

Questions

Objectives

last_modification Last modification: Feb 7, 2022

.enlarge120[

De novo Genome Assembly

Part 2: De Bruijn Graph Assembly

]

With thanks to T Seemann, D Bulach, I Cooke and Simon Gladman


.enlarge120[

de Bruijn Graphs

] .pull-left[


.enlarge120[

K-mers? .pull-right[.image-25[Special K Logo]]


To help us get around these problems, we use all k-length subsequences of the reads, these are the k-mers. ] — .enlarge120[

What are K-mers? .pull-right[.image-25[Special K Logo]]]

k=12 kmers show three sequences with length 12 overlapping. For k=5 10 different shorter sequences are shown.


.enlarge120[

K-mers de Bruijn graph .pull-right[.image-25[Wooden blocks with letters A, B, C.]]]

An example of happiness is given with 4 fragments, happi, pine, iness, appin. — .enlarge120[

K-mers de Bruijn graph .pull-right[.image-25[Wooden blocks with letters A, B, C.]]]

all 4-mers are shown from those 5-mers, so happi becomes happ and appi. These are then uniqued. — .enlarge120[

K-mers de Bruijn graph .pull-right[.image-25[Wooden blocks with letters A, B, C.]]]

A graph is drawn based on overlaps between those 4-mers. — .enlarge120[

K-mers de Bruijn graph .pull-right[.image-25[Wooden blocks with letters A, B, C.]]]

The graph is collapsed into the word happiness, success.


.enlarge120[

The problem of repeats .pull-right[.image-25[Repeat icon]]]

A second example with the word mississippi is shown. The fragments are now missis, ssissi, ssippi. — .enlarge120[

The problem of repeats .pull-right[.image-25[Repeat icon]]]

All 4-mers are generated so missis becomes miss, issi, and ssis. These are then uniqued to get 7 unique 4-mers. — .enlarge120[

The problem of repeats .pull-right[.image-25[Repeat icon]]]

A graph is generated but this graph is more complex and has a loop. — .enlarge120[

The problem of repeats .pull-right[.image-25[Repeat icon]]]

This graph is walked and now the results are shown, mississippi, mississississippi, or more could be found by following the loop repeatedly. — .enlarge120[

Different k .pull-right[.image-25[Letter K]]]

The same mississippi example from the start. — .enlarge120[

Different k .pull-right[.image-25[Letter K]]]

Instead of 4-mers, we now produce 5-mers. — .enlarge120[

Different k .pull-right[.image-25[Letter K]]]

The graph is produced, and now there are two disconnected sets of nodes. — .enlarge120[

Different k .pull-right[.image-25[Letter K]]]

When collapsing this graph we see two results, MISSISSIS and SSIPPI.


.enlarge120[

Choose k wisely .pull-right[.image-25[An old man from some movie.]]]

.enlarge120[

Optimum value for k will balance these effects.

]

.enlarge120[

Read errors .pull-right[.image-25[hazard symbol]]]

.image-75[Example 3, happiness again, with fragments happi, iness, aplin, pinet] Blank image — .enlarge120[

Read errors .pull-right[.image-25[hazard symbol]]]

.image-75[Happiness example 5-mers] all 3-mers are produced — .enlarge120[

Read errors .pull-right[.image-25[hazard symbol]]]

.image-75[Happiness example 5-mers] The three mers are collapsed into an overlap graph — .enlarge120[

Read errors .pull-right[.image-25[hazard symbol]]]

.image-75[Happiness example 5-mers] The graph is coloured and six contigs are produced based on overlaps.


.enlarge120[

More coverage .pull-right[.image-50[hazard symbol]]]

.enlarge120[

More coverage depth will help overcome errors! ] — .enlarge120[

Read errors revisited .pull-right[.image-50[hazard symbol]]]

Same happiness example, all three mers are present, but now the counts are shown. One path has many more counts than the other. Text asks which path looks most valid? why?


.enlarge120[

Another parameter - coverage cutoff]

.enlarge120[

Coverage cutoff ]


.enlarge120[

de Bruijn graph assembly process]

.enlarge120[

  1. Select a value for k
  2. “Hash” the reads (make the kmers)
  3. Count the kmers
  4. Make the de Bruijn graph
  5. Perform graph simplification steps - use cov cutoff
  6. Read off contigs from simplified graph

]


.enlarge120[

Graph simplification]

Step 1: Chain merging

Looking at a doubly connected graph with forward/reverse sequences and piles of overlaps. Nodes are connected with lines. One node is labelled as already merged, and two nodes sharing a prefix are labelled as further merging possible.


.enlarge120[

Graph simplification]

Step 2: Tip clipping

a graph with every node doubled for forward/reverse strands is shown, clip tips if the length of the tip of the node is less than 2 times k.


.enlarge120[

Graph simplification]

.pull-left[ Three graphs are shown, B, C, and D. B is the most complex, C combines two nodes B and B prime, while D collapses even more nodes to further simplify the graph. ] .pull-right[

Step 3: Bubble collapsing

.reduce70[Image: Zerbino & Birney 2008] ]


.enlarge120[

Graph simplification

Step 4: Remove low coverage nodes

]

.enlarge120[

Make contigs

]


.enlarge120[

Velvet

Velvet has two separate programs:

But: You need to choose k and c wisely!

]


.enlarge120[

Velvet - Paired end scaffolding

Two boxes A and B are at opposite ends, there are many nodes along a line between them, and some sitting outside of the graph or otherwise weirdly connected.


.enlarge120[

Extensions of the idea

] — .enlarge120[

SPAdes .pull-right[.image-50[spades logo]]]

.enlarge120[

]

.enlarge120[

A move back to OLC]

.pull-left[ .enlarge120[

.pull-right[ image of a minion connected to a computer via a cable. Below a man stands next to a very large machine. ] — .enlarge120[# Bandage


Key Points

Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors! Galaxy Training Network This material is licensed under the Creative Commons Attribution 4.0 International License.