Classification problem solvers: types, Neural Networks
We want to show an algorithm a whole bunch of training data in the form of handwritten digits, each paired with a label that says which digit the writing represents. The weights and biases are then adjusted to fit the training data, so that when the network is shown a handwritten digit it has never seen before, it predicts the correct label.
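A rough sketch of what one such training example could look like, assuming the usual MNIST format (a 28x28 grayscale image flattened to 784 pixel values plus an integer label); the actual values here are placeholders:

```python
import numpy as np

# One hypothetical MNIST-style training example:
# a 28x28 grayscale image flattened into 784 pixel intensities in [0, 1],
# and the digit it represents as a label.
image = np.random.rand(784)   # stand-in for a real handwritten digit
label = 3                     # "this image shows a 3"

# The label is often one-hot encoded so it can be compared
# with the network's 10 output activations.
target = np.zeros(10)
target[label] = 1.0
```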
Cost functions: to start things off, we can initialize the network with random weights and biases. If we feed an image of a 3 into this network, it will just give a random output. We now need a cost function that tells the network how wrong it was and how it can improve. Mathematically, we take the differences between the output activations and the desired activations, square them, and add them up. This is the cost of a single training example: small if the network classifies the example correctly, large if it classifies it wrongly. The cost is calculated for every available training sample, and the average cost over all of them is a measure of how bad the network is.
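A minimal sketch of that cost, assuming the network's output and the one-hot target are both 10-dimensional vectors:

```python
import numpy as np

def example_cost(output, target):
    # Squared differences between output activations and desired
    # activations, summed up: the cost of a single training example.
    return np.sum((output - target) ** 2)

def total_cost(outputs, targets):
    # Average the per-example cost over the whole training set;
    # this single number is what the network tries to drive down.
    return np.mean([example_cost(o, t) for o, t in zip(outputs, targets)])
```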
The neural network then minimizes this cost function by adjusting the weights and biases.
How does it minimize the cost function? With gradient descent. The algorithm that computes the required gradient efficiently is called backpropagation; gradient descent then uses that gradient. So a network "learning" just means that a cost function is being minimized.
About the gradient vector: the larger a component of the gradient in a given direction (the larger the partial derivative), the more the corresponding weight or bias needs to be adjusted. See the sketch below.
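A minimal sketch of one gradient-descent step, assuming a hypothetical function grad_cost that returns the gradient of the average cost with respect to all parameters (in practice computed by backpropagation); the learning rate is an arbitrary illustrative value:

```python
learning_rate = 0.1  # arbitrary step size, chosen just for illustration

def gradient_descent_step(params, grad_cost):
    # grad_cost(params) is assumed to return the gradient vector:
    # the partial derivative of the average cost w.r.t. every weight and bias.
    grad = grad_cost(params)
    # Move each parameter a small step against its gradient component.
    # Components with large magnitude get the largest adjustment.
    return params - learning_rate * grad
```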
How well does a network with 2 hidden layers of 16 nodes each perform on this dataset? ... result here
Own recap: the network gets a training example as input and the cost is calculated. This is done for all the training data to get the total (average) cost. Then the gradient vector is built by taking the derivative of that cost with respect to each weight and bias, i.e. how the cost changes when they change. With that gradient, gradient descent is run to find a minimum.
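That recap as a sketch in Python, assuming hypothetical helpers forward (one prediction) and cost_gradient (gradient of the average cost, e.g. via backpropagation); this is plain full-batch gradient descent:

```python
import numpy as np

def train(params, images, targets, forward, cost_gradient,
          learning_rate=0.1, steps=1000):
    """Recap of the loop: predict -> total cost -> gradient -> descend."""
    for step in range(steps):
        # Forward pass over all training data, then the average cost.
        outputs = [forward(params, x) for x in images]
        cost = np.mean([np.sum((o - t) ** 2) for o, t in zip(outputs, targets)])
        if step % 100 == 0:
            print(f"step {step}: average cost {cost:.4f}")

        # Gradient of that average cost w.r.t. every weight and bias
        # (backpropagation would compute this efficiently).
        grad = cost_gradient(params, images, targets)

        # One gradient-descent step towards a (local) minimum.
        params = params - learning_rate * grad
    return params
```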
Analyzing the network: how does the network perform on the MNIST dataset with these hidden layers? Around 96%. Looking at the images it messes up on, some of them honestly barely look like a 4. Other neural network types can push this up to about 99.97% (reference to that). We had hoped that the first layer would pick up on points and edges and the second would combine them into digit shapes. To check that, we can visualize the weights (see the sketch below). Also, if we feed in a random-noise image, the network still answers with high confidence. That's because all the network knows are the images in the grid; its whole universe consists of digit images. Why did we claim it would pick up on edges and points when it doesn't do so in the first layers? Because that intuition is relevant to other architectures, not to a plain multilayer perceptron. The concept still needs to be understood to make sense of more detailed modern networks like convolutional NNs or LSTMs.
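One way to do that weight visualization, assuming the first layer's weight matrix W1 has shape (16, 784), one row per hidden neuron; each row can be reshaped back into a 28x28 image:

```python
import matplotlib.pyplot as plt

def show_first_layer_weights(W1):
    # W1: assumed shape (16, 784); each row is the weight pattern one
    # hidden neuron applies to the 784 input pixels.
    fig, axes = plt.subplots(4, 4, figsize=(6, 6))
    for neuron, ax in zip(W1, axes.flat):
        # Reshape the 784 weights back into the 28x28 pixel grid;
        # if the neuron detected an edge, an edge-like pattern would show up.
        ax.imshow(neuron.reshape(28, 28), cmap="gray")
        ax.axis("off")
    plt.show()
```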
Learning more: Michael Nielsen's book about neural networks. Lisha Li also has some pretty good stuff about image learning in neural networks; she had an experiment where the labels were randomly shuffled before training.
Stochastic gradient descent: almost like gradient descent, but instead of carefully walking down the path to find the minimum it looks like a drunk person stumbling down a hill; each step uses only a small random mini-batch instead of the whole training set, which is way more efficient computation-wise.
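A sketch of that stochastic variant, assuming a hypothetical batch_gradient helper that returns the gradient estimated from a mini-batch only; batch size and learning rate here are arbitrary:

```python
import numpy as np

def sgd(params, images, targets, batch_gradient,
        learning_rate=0.1, batch_size=32, epochs=10):
    n = len(images)
    for epoch in range(epochs):
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Gradient estimated from a small random batch only:
            # noisier ("drunk walk downhill"), but far cheaper per step.
            grad = batch_gradient(params,
                                  [images[i] for i in batch],
                                  [targets[i] for i in batch])
            params = params - learning_rate * grad
    return params
```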