Hello all

I've been trying to implement the classic multilayer perceptron example of recognising handwritten digits from the MNIST training data, based on the video series on 3Blue1Brown's YouTube channel here: https://www.youtube....7000Dx_ZCJB-3pi

I believe I have a feed-forward implementation worked out, but it's the backpropagation that has really confused me. I think I understand the explanation for the most basic case: a single training example for a network with only one neuron in each layer.

But I have some specific problems:

I believe the idea with stochastic gradient descent is to feed a batch of training examples through the network and sum the differences between the desired outputs and the actual outputs in order to direct the backpropagation process. How should I sum these results? Is it a simple average?

And then, to alter the outputs, how do I know whether to change the bias or the weights of a neuron?

I'm quite lost on exactly how to calculate the gradient for the descent step when a neuron has many weights, and on how the result should adjust the weights and biases.
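To make my confusion concrete, here is my current understanding of the gradient for a single sigmoid neuron with several weights and a squared-error cost. All the numbers and names below are made up for illustration, not from my actual code:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One sigmoid neuron with three inputs, cost C = 0.5 * (a - y)^2
inputs = [0.5, 0.1, 0.9]
weights = [0.4, -0.2, 0.7]
bias = 0.3
target = 1.0

z = sum(w * x for w, x in zip(weights, inputs)) + bias
a = sigmoid(z)

# Chain rule: dC/dw_i = (a - y) * sigmoid'(z) * x_i,
# with sigmoid'(z) = a * (1 - a)
delta = (a - target) * a * (1 - a)
grad_w = [delta * x for x in inputs]
grad_b = delta  # dz/db = 1, so the bias gradient is just delta
```

If that's right, then every weight of the neuron shares the same `delta` and only differs by the input activation it multiplies, and the bias gets `delta` on its own.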

I'm hoping, perhaps in vain, that there's someone here who knows about this stuff and would be willing to talk with me about it to help my understanding. I'm aware that there are code implementations out there, like this one: http://neuralnetwork....com/chap1.html, but just copying that code would leave me with no satisfaction. If I could just figure out how to work backpropagation into my attempt to program this, I would be very happy.

Here is my code: https://gist.github....30264a293c1f2c6

# Multi layer perceptron

## 3 Replies - 334 Views - Last Post: 12 June 2020 - 12:22 PM

**Replies To:** Multi layer perceptron

### #2

## Re: Multi layer perceptron

Posted 05 June 2020 - 01:08 PM

Sorry I don't have time at the moment to pull through your code, but will try to get to it after my hike this afternoon (assuming I don't get eaten by wolves and heat exhaustion).

In the meantime, here are a few solid options I had bookmarked:

https://www.youtube....h?v=An5z8lR8asY

https://www.guru99.c...al-network.html

https://visualstudio...on-using-c.aspx

https://www.youtube....h?v=8d6jf7s6_Qs


### #3

## Re: Multi layer perceptron

Posted 05 June 2020 - 04:30 PM

Sorry - not going to dive into your python tonight. Bushed.

The general math is a chain of derivatives for the weights on each layer and for each node. It's honestly been a long time since I've worked through that by hand; for the most part I wave my hand and let the library do it.

This guy breaks it down math-wise if you want to follow along. There's also a series on YouTube by Mandy at deeplizard that is pretty good at breaking the terms down.

https://mattmazur.co...gation-example/

Bleh.. Derivatives are super rusty with me right now.


### #4

## Re: Multi layer perceptron

Posted 12 June 2020 - 12:22 PM

Thank you for the sources; that Matt Mazur page makes things particularly easy for me to understand. I'm still struggling with what to do for the first hidden layer, but I think I have the right method for the output neurons' weights. That page doesn't mention what to do about the biases, but maybe I can get that from your other sources once I've done the previous layers.

So I think the thing to do is this:

You feed forward and backpropagate once for each training example, storing the adjustments to the weights and biases. Then you take the average over the training batch and add it to the weights and biases, multiplied by the learning rate eta.
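A minimal sketch of that batch update, assuming the per-example gradients have been stored as flat lists (the function and variable names here are hypothetical, not from my code). I subtract here because the gradient points uphill; storing negated adjustments and adding would be equivalent:

```python
def apply_batch(params, per_example_grads, eta):
    """Average the stored per-example gradients and take one
    gradient-descent step: params[i] -= eta * mean(grads[i])."""
    n = len(per_example_grads)
    for grads in per_example_grads:
        for i, g in enumerate(grads):
            params[i] -= eta * g / n  # subtract: gradients point uphill
    return params

params = [0.5, -0.3]
batch = [[0.2, 0.4], [0.0, 0.2]]  # gradients from two training examples
apply_batch(params, batch, eta=1.0)
```

With eta = 1.0, each parameter simply moves by the negative mean of its stored gradients.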

This is what I have for the output layer's weights; I hope it's correct:


```python
for x in range(self.output_neuron_count):
    weights = self.weights_and_biases[weights_start : weights_end]
    error_derivative = outputs[x] - desired_outputs[x]
    sigmoid_derivative = outputs[x] * (1 - outputs[x])  # was "-": sigmoid'(z) = a * (1 - a)
    for y in range(self.hidden_layer_neuron_count):
        net_derivative = activations[y]
        nabla[weights_start + y] = error_derivative * sigmoid_derivative * net_derivative
```
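For the hidden layer, the part I'm still stuck on, here is my current understanding sketched with tiny made-up sizes: each hidden neuron's delta is the sum of the output deltas weighted by the connecting weights, times the sigmoid derivative at the hidden neuron, and the bias gradient is just the delta itself. All values and names below are invented for illustration:

```python
# Tiny made-up network slice: 2 hidden neurons feeding 2 output neurons.
hidden_activations = [0.6, 0.4]
output_activations = [0.7, 0.2]
desired = [1.0, 0.0]
# w_out[j][k] = weight from hidden neuron k to output neuron j
w_out = [[0.3, -0.1], [0.5, 0.8]]

# Output deltas: (a - y) * sigmoid'(z), using sigmoid'(z) = a * (1 - a)
out_delta = [(a - y) * a * (1 - a)
             for a, y in zip(output_activations, desired)]

# Hidden deltas: back-propagate the output deltas through the output weights
hidden_delta = []
for k, a in enumerate(hidden_activations):
    upstream = sum(out_delta[j] * w_out[j][k] for j in range(len(out_delta)))
    hidden_delta.append(upstream * a * (1 - a))

# Bias gradients are just the deltas, since dz/db = 1
bias_grads = hidden_delta
```

If this is right, it would also answer my bias question: the bias gradient at every layer is the same delta the weight gradients use, just without the input-activation factor.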
