I'm reading a guide for back-propagation of a neural net, and it gives the error function as:

Next work out the error for neuron B. The error is What you want – What you actually get, in other words: Error_B = Output_B (1 - Output_B)(Target_B - Output_B)

But then, underneath that it says:

The “Output(1-Output)” term is necessary in the equation because of the Sigmoid Function – if we were only using a threshold neuron it would just be (Target – Output).

I am in fact using the sigmoid function for my net, but I'd like to write things to be as general as possible. I thought of trying something like this:

public List<Double> calcError(List<Double> expectedOutput, UnaryOperator<Double> derivative) {
    List<Double> actualOutput = outputLayer.getActivations();

    List<Double> errors = new ArrayList<>();

    for (int i = 0; i < actualOutput.size(); i++) {
        // Error for neuron i: derivative(output) * (target - output)
        errors.add(
            derivative.apply(actualOutput.get(i)) * (expectedOutput.get(i) - actualOutput.get(i))
        );
    }

    return errors;
}

Here I allow the caller to pass in the derivative (I think that's what it's called), so the error can be calculated for any activation function.
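For example, using the Output(1-Output) term from the guide as the sigmoid's derivative, I imagine a call would look like this (just a sketch of what I have in mind):

List<Double> errors = calcError(expectedOutput, out -> out * (1.0 - out));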

The problem is, I haven't learned calculus yet, and don't even know if this makes sense. Can I have the derivative passed in like this and still have it give accurate results? Or will the error calculation change depending on activation function/derivative used?

I wasn't sure if this should be put on Math SE, as it contains code.

Carcigenicate

1 Answer

Yes, you can pass in the derivative; however, it must be the derivative of your activation function. N.b. the network will fail to learn if the function you pass isn't actually the derivative of the activation function.

The options you have are:

  • statically define your activation function and its derivative
  • statically define a list of activation functions that can be selected (with the matching derivative chosen automatically)
  • parametrize the activation function and its derivative as a pair (see the sketch after this list)
  • parametrize the activation function, and then use some method to calculate the derivative automatically (n.b. some neural network libraries take this approach)
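For instance, the third option could look something like the following. This is only a minimal sketch, and the names are illustrative rather than taken from your code; note that the derivative is written in terms of the neuron's output, since that is what your calcError passes to it.

import java.util.function.UnaryOperator;

/** Pairs an activation function with its derivative so the two can't drift apart. */
public final class Activation {
    public final UnaryOperator<Double> function;
    /** Derivative expressed in terms of the neuron's output, as calcError expects. */
    public final UnaryOperator<Double> derivative;

    public Activation(UnaryOperator<Double> function, UnaryOperator<Double> derivative) {
        this.function = function;
        this.derivative = derivative;
    }

    /** Sigmoid: f(x) = 1/(1+e^-x); its derivative is f(x)(1-f(x)), i.e. out(1-out). */
    public static final Activation SIGMOID = new Activation(
        x -> 1.0 / (1.0 + Math.exp(-x)),
        out -> out * (1.0 - out)
    );
}

With this in place, a call like calcError(expected, Activation.SIGMOID.derivative) can never be paired with the wrong derivative.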

Since rolling your own neural network code really only makes sense as a learning exercise, any of these methods is appropriate. A practical library would want to prevent users from easily misconfiguring the network.

N.b. I assume the 'threshold' neuron mentioned in your quote refers to a neuron with a rectified linear activation function. Its derivative is 1 wherever the output is positive, and 0 where the output is 0. So for positive outputs, leaving out the derivative term gives the same result as including it.
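In the style of the sketch above, such a neuron could be added as another pair (again with the derivative written in terms of the output; the constant name is mine):

/** ReLU: f(x) = max(0, x); derivative in terms of the output: 1 if out > 0, else 0. */
public static final Activation RELU = new Activation(
    x -> Math.max(0.0, x),
    out -> out > 0.0 ? 1.0 : 0.0
);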

esoterik
  • Just a quick note that might help when looking at other activation functions: you are currently calculating the derivative from the node's output, because this is the easiest way of calculating the derivative of the sigmoid function. With other functions it may be more efficient to calculate it from the weighted sum of inputs, so you should probably pass that as well if you want to experiment with other activation functions. – Jules Jul 31 '15 at 00:37