Feedforward neural networks (FNN) Deep Learning - Part 1
Contributors
Authors:
Kaivan Kamali
Questions
What is a feedforward neural network (FNN)?
What are some applications of FNN?
Objectives
Understand the inspiration for neural networks
Learn activation functions & various problems solved by neural networks
Discuss various loss/cost functions and backpropagation algorithm
Learn how to create a neural network using Galaxy’s deep learning tools
Solve a sample regression problem via FNN in Galaxy
Requirements
- Statistics and machine learning
- Introduction to deep learning: tutorial hands-on
Last modification: Jul 9, 2021
What is an artificial neural network?
Speaker Notes
What is an artificial neural network?
Artificial Neural Networks
- ML discipline roughly inspired by how neurons in a human brain work
- Huge resurgence due to availability of data and computing capacity
- Various types of neural networks (Feedforward, Recurrent, Convolutional)
- FNNs are applied to classification, clustering, regression, and association
Inspiration for neural networks
- Neuron a special biological cell with information processing ability
- Receives signals from other neurons through its dendrites
- If the signals received exceed a threshold, the neuron fires
- Transmits signals to other neurons via its axon
- Synapse: the contact between the axon of one neuron and the dendrite of another
- A synapse either enhances or inhibits the signal that passes through it
- Learning occurs by changing the effectiveness of synapses
Cerebral cortex
- Outermost layer of the brain, 2 to 3 mm thick, surface area of 2,200 sq. cm
- Has about 10^11 neurons
- Each neuron connected to 10^3 to 10^4 neurons
- Human brain has 10^14 to 10^15 connections
Cerebral cortex
- Neurons communicate via signals a few milliseconds in duration
- Signal transmission frequency up to several hundred Hertz
- Millions of times slower than an electronic circuit
- Complex tasks like face recognition done within a few hundred ms
- Computation involved cannot take more than 100 serial steps
- The information sent from one neuron to another is very small
- Critical information not transmitted
- But captured by the interconnections
- Distributed computation/representation of the brain
- Allows slow computing elements to perform complex tasks quickly
Perceptron
Learning in Perceptron
- Given a set of input-output pairs (called training set)
- Learning algorithm iteratively adjusts model parameters
- Weights and biases
- So the model can accurately map inputs to outputs
- Perceptron learning algorithm (a minimal sketch follows)
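A minimal sketch of the perceptron learning rule in plain NumPy; the learning rate, epoch count, and toy AND dataset are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Perceptron learning rule: nudge weights/bias whenever a prediction is wrong."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0  # step activation
            error = yi - pred                          # 0 if correct, +1/-1 if wrong
            w += lr * error * xi                       # move the decision boundary
            b += lr * error
    return w, b

# Toy linearly separable problem: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(w, b)
```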
Limitations of Perceptron
- Single layer FNN cannot solve problems in which data is not linearly separable
- E.g., the XOR problem
- Adding one (or more) hidden layers enables an FNN to approximate any continuous function
- Universal Approximation Theorem
- The Perceptron learning algorithm could not be extended to multi-layer FNN
- AI winter
- Backpropagation algorithm in 80’s enabled learning in multi-layer FNN
Multi-layer FNN
- More hidden layers (and more neurons in each hidden layer)
- Can estimate more complex functions
- More parameters increase training time
- Higher likelihood of overfitting
Activation functions
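The common activation functions can be sketched in a few lines of NumPy; this is only an illustration of the usual choices (Sigmoid, tanh, ReLU), not Galaxy-specific code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes input to (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes input to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # keeps positives, zeroes out negatives

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```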
Supervised learning
- Training set of size m: { (x^1,y^1),(x^2,y^2),…,(x^m,y^m) }
- Each pair (x^i,y^i) is called a training example
- x^i is called feature vector
- Each element of feature vector is called a feature
- Each x^i corresponds to a label y^i
- We assume an unknown function y=f(x) maps feature vectors to labels
- The goal is to use the training set to learn or estimate f
- We want the estimate to be close to f(x) not only on the training set
- But also on examples not in the training set
Classification problems
Output layer
- Binary classification
- Single neuron in output layer
- Sigmoid activation function
- Activation > 0.5, output 1
- Activation <= 0.5, output 0
- Multilabel classification
- As many neurons in output layer as number of classes
- Sigmoid activation function
- Activation > 0.5, output 1
- Activation <= 0.5, output 0
Output layer (Continued)
- Multiclass classification
- As many neurons in output layer as number of classes
- Softmax activation function
- Takes the inputs to the neurons in the output layer
- Creates a probability distribution: the outputs sum to 1
- The neuron with the highest probability is the predicted label (see the softmax sketch below)
- Regression problem
- Single neuron in output layer
- Linear activation function
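A small sketch of how a softmax output layer turns the raw inputs (logits) of the output neurons into a probability distribution; the logit values are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)            # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()   # outputs are positive and sum to 1

logits = np.array([2.0, 1.0, 0.1])  # raw inputs to 3 output neurons
probs = softmax(logits)
print(probs, probs.argmax())        # highest-probability neuron is the predicted class
```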
Loss/Cost functions
- During training, for each training example (x^i,y^i), we present x^i to the neural network
- Compare the predicted output with the label y^i
- Need loss function to measure difference between predicted & expected output
- Use Cross entropy loss function for classification problems
- And Quadratic loss function for regression problems
- Quadratic cost function is also called Mean Squared Error (MSE)
Cross Entropy Loss/Cost functions
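For reference, the standard binary cross entropy cost over m training examples, with $a^i$ denoting the predicted output for $x^i$, is:

$$ C = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^i \ln a^i + (1 - y^i)\ln(1 - a^i) \,\right] $$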
Quadratic Loss/Cost functions
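And the quadratic (MSE) cost, up to the conventional constant factor, is:

$$ C = \frac{1}{2m}\sum_{i=1}^{m} \left\| y^i - a^i \right\|^2 $$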
Backpropagation (BP) learning algorithm
- A gradient descent technique
- Find local minimum of a function by iteratively moving in opposite direction of gradient of function at current point
- Goal of learning is to minimize cost function given training set
- Cost function is a function of network weights & biases of all neurons in all layers
- Backpropagation iteratively computes gradient of cost function relative to each weight and bias
- Updates weights and biases in the opposite direction of gradient
- Gradients (partial derivatives) are used to update weights and biases
- To find a local minimum
Backpropagation error
Backpropagation formulas
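For reference, in a common notation ($\delta^l$ the error at layer $l$, $z^l$ the weighted inputs, $a^l$ the activations, $\odot$ the elementwise product, $L$ the output layer), the backpropagation equations read:

$$ \delta^L = \nabla_a C \odot \sigma'(z^L) $$
$$ \delta^l = \left( (W^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) $$
$$ \frac{\partial C}{\partial b^l_j} = \delta^l_j, \qquad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \, \delta^l_j $$

The second (recursive) equation is the one referred to under the vanishing gradient problem below.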
Types of Gradient Descent
- Batch gradient descent
- Calculate gradient for each weight/bias for all samples
- Average gradients and update weights/biases
- Slow if there are many samples
- Stochastic gradient descent
- Update weights/biases based on gradient of each sample
- Fast, but not accurate if a sample's gradient is not representative
- Mini-batch gradient descent
- Middle ground solution
- Calculate gradient for each weight/bias for all samples in batch
- Batch size is much smaller than the training set size
- Average batch gradients and update weights/biases (see the sketch below)
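A minimal NumPy sketch of one epoch of mini-batch gradient descent for a one-parameter model; the `grad_fn` callback, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

def minibatch_gd_epoch(w, X, y, grad_fn, batch_size=32, lr=0.1):
    """One epoch: shuffle the data, then update w once per mini-batch."""
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        grad = grad_fn(w, X[batch], y[batch])  # gradient averaged over the batch
        w = w - lr * grad                      # step opposite the gradient direction
    return w

# Example: fit y = w * x with quadratic loss; true w is 3.0
grad_mse = lambda w, xb, yb: np.mean(2 * (w * xb - yb) * xb)
X = np.linspace(0.0, 1.0, 1000)
y = 3.0 * X
w = 0.0
for epoch in range(50):
    w = minibatch_gd_epoch(w, X, y, grad_mse, batch_size=32, lr=0.1)
print(w)  # converges toward 3.0
```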
Vanishing gradient problem
- The second BP equation is recursive
- It involves the derivative of the activation function
- Calculating the error in the layer just before the output involves 1 multiplication by a derivative value
- Calculating the error two layers before the output involves 2 multiplications by derivative values
- If derivative values are small (e.g. for Sigmoid), product of multiple small values will be a very small value
- Since error values decide updates for biases/weights
- Update to biases/weights in first layers will be very small
- Slowing the learning algorithm to a halt
- This is the reason Sigmoid is not used in deep networks
- And why ReLU is popular in deep networks (see the numerical sketch below)
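A tiny numerical illustration of why repeated multiplication by small Sigmoid derivatives makes the error signal vanish in early layers; the depth and input values are made up for illustration.

```python
import numpy as np

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)  # at most 0.25 (attained at z = 0)

# Error signal propagated back through 10 layers:
# each layer multiplies by one activation-function derivative
error = 1.0
for layer in range(10):
    error *= sigmoid_derivative(0.0)  # best case for Sigmoid: 0.25
print(error)  # 0.25**10 ~ 9.5e-7 -- effectively vanished

# By contrast, the ReLU derivative is 1 for positive inputs,
# so the same product would stay at 1.0
```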
Car purchase price prediction
- Given 5 features of an individual (age, gender, miles driven per day, personal debt, and monthly income)
- And, money they spent buying a car
- Learn an FNN to predict how much someone will spend buying a car
- We evaluate FNN on test dataset and plot graphs to assess the model’s performance
- Training dataset has 723 training examples
- Test dataset has 242 test examples
- Input features scaled to the 0 to 1 range (an equivalent model sketch follows)
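In Galaxy the model is assembled through the deep learning tool forms, but an equivalent Keras sketch looks roughly like the following; the layer sizes, activations, optimizer, and epoch count are illustrative assumptions, not the tutorial's exact configuration.

```python
from tensorflow import keras

# 5 scaled input features -> two hidden layers -> single linear output (regression)
model = keras.Sequential([
    keras.Input(shape=(5,)),
    keras.layers.Dense(12, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="linear"),  # one output neuron, linear activation
])
model.compile(optimizer="adam", loss="mean_squared_error")  # quadratic loss for regression

# X_train: (723, 5) features scaled to [0, 1]; y_train: (723,) purchase amounts
# model.fit(X_train, y_train, epochs=100, batch_size=32)
# y_pred = model.predict(X_test)  # evaluate on the 242 test examples
```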
For references, please see the tutorial's References section
- Galaxy Training Materials (training.galaxyproject.org)
Speaker Notes
- If you would like to learn more about Galaxy, there are a large number of tutorials available.
- These tutorials cover a wide range of scientific domains.
Getting Help
- Help Forum (help.galaxyproject.org)
- Gitter Chat
- Main Chat
- Galaxy Training Chat
- Many more channels (scientific domains, developers, admins)
Speaker Notes
- If you get stuck, there are ways to get help.
- You can ask your questions on the help forum.
- Or you can chat with the community on Gitter.
Join an event
- Many Galaxy events across the globe
- Event Horizon: galaxyproject.org/events
Speaker Notes
- There are frequent Galaxy events all around the world.
- You can find upcoming events on the Galaxy Event Horizon.