# In The Derivative Of Squared Error The Result Of Matrix Vector Squared Is Not Clear

I do not understand the rules for squaring a matrix-vector product in the following example. Specifically, I do not understand how we get the last term of Eq. 6. We have $(Xw)^\top (Xw)$; how does it turn into…

The only difference is in the final step, where we take the partial derivative of the error.

One Half Mean Squared Error. In Andrew Ng's Machine Learning course there is one small modification to this derivation: we multiply our MSE cost function by 1/2 so that when we take the derivative, the 2s cancel out. Multiplying the cost function by a positive scalar does not affect the location of its minimum, so we can get…

I'm trying to understand the derivatives w.r.t. the softmax arguments when used in conjunction with a squared loss (for example, as the last layer of a neural network). I am using the following not…
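For the matrix-vector question and the 1/2 factor, a small numerical check may help. The "square" of the vector $Xw$ is usually read as the inner product of $Xw$ with itself, and since $(Xw)^\top = w^\top X^\top$, this gives

$$(Xw)^\top (Xw) = w^\top X^\top X w.$$

With the cost written as $J(w) = \tfrac{1}{2}(Xw - y)^\top (Xw - y)$, the gradient is $X^\top (Xw - y)$, because the 2 from differentiating the square cancels the 1/2. The sketch below checks both statements in plain Python. The data `X`, `w`, `y` and all helper names are illustrative assumptions, not taken from the original derivation, and the cost omits any $1/m$ averaging factor.

```python
# Illustrative sketch: X, w, y and the helper names are assumptions.

def matvec(X, w):
    """Return the matrix-vector product Xw as a list."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]

def transpose(X):
    """Return X^T as a list of rows."""
    return [list(col) for col in zip(*X)]

def dot(a, b):
    """Return the inner product a^T b."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def half_squared_error(X, w, y):
    """J(w) = 1/2 * (Xw - y)^T (Xw - y)."""
    r = [p - t for p, t in zip(matvec(X, w), y)]
    return 0.5 * dot(r, r)

def gradient(X, w, y):
    """Analytic gradient X^T (Xw - y); the 1/2 cancels the 2."""
    r = [p - t for p, t in zip(matvec(X, w), y)]
    return matvec(transpose(X), r)

def numerical_gradient(X, w, y, eps=1e-6):
    """Central finite differences, used only to check the analytic form."""
    grad = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        diff = half_squared_error(X, w_plus, y) - half_squared_error(X, w_minus, y)
        grad.append(diff / (2 * eps))
    return grad

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [0.5, -1.0]
y = [1.0, 0.0, 2.0]

Xw = matvec(X, w)
lhs = dot(Xw, Xw)                         # (Xw)^T (Xw)
rhs = dot(w, matvec(transpose(X), Xw))    # w^T X^T X w
analytic = gradient(X, w, y)
numeric = numerical_gradient(X, w, y)
```

Here `lhs` and `rhs` agree, and `analytic` matches `numeric` up to finite-difference error, which is consistent with the two identities above.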

The problem is that I now need to square each value of x to obtain a new vector, say y, that contains the squared values of x.
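Squaring each entry of a vector is an elementwise operation, which is different from the inner product $x^\top x$ that appears in the squared-error derivation. A minimal sketch in plain Python, with illustrative values for `x`:

```python
# Illustrative values; x is an assumption, not data from the question.
x = [1.0, -2.0, 3.5]

# Elementwise square: y has the same length as x.
y = [x_i ** 2 for x_i in x]

# Inner product x^T x: a single scalar, equal to the sum of the entries of y.
x_dot_x = sum(y)

print(y)        # [1.0, 4.0, 12.25]
print(x_dot_x)  # 17.25
```

The elementwise result is a vector, while $x^\top x$ collapses it to one number, so the two should not be used interchangeably.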