I'm reading a book on deep learning and I'm a bit confused about one of the ideas the author mentioned. This is from the book *Deep Learning with Python* by Francois Chollet:

A gradient is the derivative of a tensor operation. It’s the generalization of the concept of derivatives to functions of multidimensional inputs: that is, to functions that take tensors as inputs.

Consider an input vector x, a matrix W, a target y, and a loss function loss. You can use W to compute a target candidate y_pred, and compute the loss, or mismatch, between the target candidate y_pred and the target y:

y_pred = dot(W, x)

loss_value = loss(y_pred, y)

If the data inputs x and y are frozen, then this can be interpreted as a function mapping values of W to loss values:

loss_value = f(W)
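To make this "function of W" idea concrete (this is my own toy illustration, not code from the book), you can freeze x and y inside a closure so that the loss really is a function of W alone. The shapes and the mean-squared-error loss here are arbitrary choices:

```python
import numpy as np

# Frozen data: an input vector x and a target y (example shapes)
x = np.array([1.0, 2.0])
y = np.array([1.0, 0.0, 1.0])

def make_f(x, y):
    """Return f(W) = loss_value, with x and y frozen inside the closure."""
    def f(W):
        y_pred = W @ x                      # y_pred = dot(W, x)
        return np.mean((y_pred - y) ** 2)   # mean squared error as the loss
    return f

f = make_f(x, y)
W0 = np.zeros((3, 2))
print(f(W0))  # a single scalar loss value for this particular W
```

Each candidate W maps to one scalar loss value, which is exactly what lets us talk about the gradient of the loss with respect to W.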

Let’s say the current value of W is W0. Then the derivative of f at the point W0 is a tensor gradient(f)(W0) with the same shape as W, where each coefficient gradient(f)(W0)[i,j] indicates the direction and magnitude of the change in loss_value you observe when modifying W0[i,j]. That tensor gradient(f)(W0) is the gradient of the function f(W)=loss_value at W0.
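One way to see what gradient(f)(W0)[i,j] means (again, my own sketch with an assumed mean-squared-error loss, not code from the book) is to nudge one coefficient of W0 at a time and measure the resulting change in the loss, i.e. a finite-difference approximation:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([1.0, 0.0, 1.0])

def f(W):
    y_pred = W @ x
    return np.mean((y_pred - y) ** 2)

def numerical_gradient(f, W0, eps=1e-6):
    """Approximate gradient(f)(W0): one entry per coefficient of W0."""
    grad = np.zeros_like(W0)
    for i in range(W0.shape[0]):
        for j in range(W0.shape[1]):
            W_shifted = W0.copy()
            W_shifted[i, j] += eps          # nudge a single coefficient
            grad[i, j] = (f(W_shifted) - f(W0)) / eps
    return grad

W0 = np.zeros((3, 2))
g = numerical_gradient(f, W0)
print(g.shape)  # (3, 2): the same shape as W, as the text says
```

A negative entry g[i, j] means that increasing W0[i, j] slightly would decrease the loss, which is precisely the "direction and magnitude" interpretation above.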

You saw earlier that the derivative of a function f(x) of a single coefficient can be interpreted as the slope of the curve of f. Likewise, gradient(f)(W0) can be interpreted as the tensor describing the curvature of f(W) around W0.

For this reason, in much the same way that, for a function f(x), you can reduce the value of f(x) by moving x a little in the opposite direction from the derivative, with a function f(W) of a tensor, you can reduce f(W) by moving W in the opposite direction from the gradient: for example, W1=W0-step*gradient(f)(W0) (where step is a small scaling factor). That means going against the curvature, which intuitively should put you lower on the curve. Note that the scaling factor step is needed because gradient(f)(W0) only approximates the curvature when you’re close to W0, so you don’t want to get too far from W0.
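Putting it together, here is a minimal gradient-descent loop (my own sketch with an assumed mean-squared-error loss and hand-derived gradient, not code from the book) that applies W1 = W0 - step * gradient(f)(W0) repeatedly:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([1.0, 0.0, 1.0])

def f(W):
    y_pred = W @ x
    return np.mean((y_pred - y) ** 2)

def gradient_f(W):
    # Analytical gradient of the mean-squared-error loss above:
    # d loss / dW[i, j] = (2 / len(y)) * (W @ x - y)[i] * x[j]
    return (2.0 / len(y)) * np.outer(W @ x - y, x)

W = np.zeros((3, 2))
step = 0.05                                # the small scaling factor from the text
for _ in range(100):
    W = W - step * gradient_f(W)           # W1 = W0 - step * gradient(f)(W0)
print(f(W))  # the loss should now be far below the starting value f(zeros)
```

Each iteration moves W a small distance against the gradient, and the loss shrinks toward zero; a step that is too large would overshoot, which is why step must stay small.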

I don't understand why we subtract step * gradient(f)(W0) from the weights and not just step, since step * gradient(f)(W0) represents a change in the loss, while step is on the parameter side (i.e. the x value, i.e. a small change in the weight).
