A Brief Introduction to Neural Networks
Motivation
Artificial Neurons
Issues
- Architecture: How are neurons connected together?
How do we pick the activation function? ...
- Training: How do we modify the connection
weights so that the network can learn a task?
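As a concrete sketch of what a single artificial neuron computes (a weighted sum of its inputs passed through an activation function; the sigmoid, specific weights, and inputs here are illustrative choices, not anything prescribed by these notes):

```python
import math

def neuron(x, w, b):
    """A single artificial neuron: weighted sum of inputs plus a bias,
    passed through a sigmoid activation function."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b  # pre-activation
    return 1.0 / (1.0 + math.exp(-a))             # sigmoid activation

# Example inputs and weights (arbitrary values for illustration)
out = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.1)
```

A network is then a collection of such units wired together; the two issues above are how to wire them (architecture) and how to set w and b (training).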
Goal
- Suppose you are given a task/behavior to perform.
- To learn this task, you are presented with a set of examples
(x1, t1), ..., (xn, tn), where x is the stimulus and t is the response.
- From these examples you somehow figure out the relationship (i.e., determine
the mapping from x to t).
- Generalization: given an input that you have never seen, you are asked to predict
the appropriate behavior (i.e., given a new x, what is t?).
- Examples:
- Neural networks are unique because they are a method for doing nonparametric
estimation. Given a task (e.g., classifying irises):
- We do not need to know rules underlying the task.
- We do not have to assume a particular functional form for input/output relationship.
- Instead, we "present" the network with a representative set of examples of the
task, and through training the network "learns" the appropriate relationship.
Architecture
How are neurons connected together? The most common approaches are feedforward (layered) and recurrent networks.
Applications
- Regression: linear, nonlinear (Example: Auto Data)
- Classification: linear, nonlinear
(Example: Iris types)
- Unsupervised learning
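A minimal sketch of the first application: linear regression is a one-layer network with an identity activation, and its least-squares weights even have a closed-form solution. The data below is synthetic, standing in for a real dataset like the Auto Data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: t is a noisy linear function of x.
x = rng.uniform(-1, 1, size=(100, 1))
t = 3.0 * x[:, 0] + 0.5 + 0.1 * rng.standard_normal(100)

# Linear regression = one-layer network with identity activation.
# Append a bias column and solve the least-squares problem directly.
X = np.hstack([x, np.ones((100, 1))])
w, *_ = np.linalg.lstsq(X, t, rcond=None)  # w = [slope, intercept]
```

Nonlinear regression and classification replace the identity output with a hidden layer and a nonlinear activation, and then no closed form exists, which is why training becomes an optimization problem.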
Learning Algorithms
- Details depend on the application.
- Early training algorithms were for one-layer networks.
- Most learning algorithms, though, are a form of nonlinear recursive optimization:
- Define a loss (aka cost, objective, energy, error) function that quantifies
how well the network is performing.
- Perform gradient descent to minimize loss function.
- Backpropagation (really just a form of gradient descent)
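The loss-plus-gradient-descent recipe above can be sketched for a one-layer classifier, where backpropagation reduces to a single gradient expression. The two-cluster data, learning rate, and iteration count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification data: two well-separated Gaussian clusters.
x0 = rng.standard_normal((50, 2)) + [-2, -2]   # class 0
x1 = rng.standard_normal((50, 2)) + [2, 2]     # class 1
X = np.vstack([x0, x1])
t = np.array([0] * 50 + [1] * 50)

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(200):
    y = 1 / (1 + np.exp(-(X @ w + b)))  # forward pass: sigmoid output
    # Gradient of the cross-entropy loss w.r.t. w and b; for a
    # one-layer network this is all backpropagation computes.
    grad_w = X.T @ (y - t) / len(t)
    grad_b = (y - t).mean()
    w -= lr * grad_w                    # gradient descent step
    b -= lr * grad_b

acc = (((X @ w + b) > 0) == t).mean()   # training accuracy
```

For multilayer networks the only change is that backpropagation applies the chain rule layer by layer to get the same kind of gradients.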
Loss (aka cost, objective, energy, error) Functions
- The loss function is typically defined as the negative log-likelihood
E = - ln P(t|x,w)
- Regression: P is Gaussian, resulting in the squared-error loss.
- Classification results in E = cross entropy. Two cases:
- P is a binomial distribution; the activation function is the sigmoid.
- P corresponds to one-of-many (multiclass) targets; the activation is the softmax function.
- For details of these see Table.
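The three negative log-likelihood cases above can be written out directly (unit-variance Gaussian assumed in the regression case; the function names are mine, not from the notes):

```python
import math

# Loss functions of the form E = -ln P(t | x, w).

def squared_error(t, y):
    # Gaussian P with unit variance -> E reduces to squared error.
    return 0.5 * (t - y) ** 2

def binary_cross_entropy(t, y):
    # Binomial P: sigmoid output y in (0, 1), target t in {0, 1}.
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))

def softmax_cross_entropy(t, a):
    # One-of-many targets: t is the correct class index, a the activations.
    z = [math.exp(ai - max(a)) for ai in a]  # numerically stable softmax
    p = [zi / sum(z) for zi in z]
    return -math.log(p[t])
```

Each is just -ln P for the corresponding distribution, which is why minimizing these losses is maximum-likelihood estimation of the weights.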
Image Compression