Loss and Cost Functions¶
Shark uses the notion of loss and cost functions to define machine learning tasks.
Loss functions¶
Consider a model (a hypothesis) \(f\) mapping inputs \(x\) to predictions \(y=f(x)\in Y\). Let \(t\in Y\) be the true label of input pattern \(x\). Then a loss function \(L:Y\times Y\to\mathbb{R}^+_0\) measures the quality of the prediction. If the prediction is perfectly accurate, the loss function is zero (\(t=y\Rightarrow L(t, y)=0\)). If not, the loss function measures “how bad” the mistake is. The loss can be interpreted as a penalty or error measure.
For a classification task, a fundamental loss function is the 0-1-loss:

\[ L(t, y) = \begin{cases} 0 & \text{if } y = t \\ 1 & \text{otherwise} \end{cases} \]
For regression, the squared loss is the most popular choice:

\[ L(t, y) = (t - y)^2 \]
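To make the two definitions concrete, here is a purely illustrative sketch in plain C++ (not part of the Shark API; the function names are made up for this example):

```cpp
// 0-1 loss for classification: 0 if the prediction equals the label, 1 otherwise.
double zeroOneLoss(unsigned int t, unsigned int y) {
    return (t == y) ? 0.0 : 1.0;
}

// Squared loss for one-dimensional regression: (t - y)^2.
double squaredLoss(double t, double y) {
    double d = t - y;
    return d * d;
}
```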
Using the concept of a loss function, the goal of supervised learning can be described as finding a model \(f\) minimizing the risk:

\[ \mathcal{R}(f) = \mathbb{E}\left[ L(t, f(x)) \right] \]
Here the expectation \(\mathbb{E}\) is over the joint distribution underlying the observations of inputs and corresponding labels.
Cost functions¶
Now let us consider a collection of observations \(S=\{(x_1,t_1),(x_2,t_2),\dots,(x_N,t_N)\}\in(X\times Y)^N\) and corresponding predictions \(y_1,y_2,\dots,y_N\) by a model \(f\). A cost function \(C\) is a mapping assigning an overall cost value, which can be interpreted as an overall error, to \(\{(y_1,t_1),(y_2,t_2),\dots,(y_N,t_N)\}\in(Y\times Y)^N\). Every loss function induces a cost function, namely the empirical risk:

\[ C\big(\{(y_1,t_1),\dots,(y_N,t_N)\}\big) = \frac{1}{N} \sum_{i=1}^N L(t_i, y_i) \]
The cost function induced by the 0-1-loss is the average misclassification error and the cost function induced by the squared loss is the mean squared error (MSE).
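The following small sketch (again plain C++, independent of Shark; names are illustrative) shows how a loss induces a cost: averaging the squared loss over all observations yields the MSE, while averaging the 0-1-loss instead would yield the misclassification rate:

```cpp
#include <cstddef>
#include <vector>

// Empirical risk induced by the squared loss: the mean squared error.
double meanSquaredError(std::vector<double> const& targets,
                        std::vector<double> const& predictions) {
    double sum = 0.0;
    for (std::size_t i = 0; i != targets.size(); ++i) {
        double d = targets[i] - predictions[i];
        sum += d * d;                 // per-observation squared loss
    }
    return sum / targets.size();      // average over the data set
}
```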
However, there are also cost functions that cannot be decomposed into per-observation losses, for example the area under the ROC curve (AUC). In other words, every loss function induces a cost function, but not every cost function is based on a loss function.
List of Classes¶
See the documentation for Loss functions and Cost functions.
Derivatives¶
When both the loss function and the model are differentiable, the derivative of the empirical risk with respect to the model parameters \(w\) can be computed via the chain rule:

\[ \frac{\partial}{\partial w} \frac{1}{N} \sum_{i=1}^N L\big(t_i, f_w(x_i)\big) = \frac{1}{N} \sum_{i=1}^N \frac{\partial L\big(t_i, f_w(x_i)\big)}{\partial f_w(x_i)} \cdot \frac{\partial f_w(x_i)}{\partial w} \]
This allows gradient descent on the cost function; because the sum decomposes over the individual observations, the gradient computation is embarrassingly parallel. Please see the tutorial Shark Conventions for Derivatives to learn more about the handling of derivatives in Shark.
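To illustrate the chain rule above, here is a minimal sketch (plain C++, not Shark code; the function name and the linear model are assumptions for this example) of the empirical-risk gradient for a one-dimensional linear model \(f_w(x) = wx\) under the squared loss. Each summand depends only on its own observation, which is what makes the computation embarrassingly parallel:

```cpp
#include <cstddef>
#include <vector>

// Gradient of the mean squared error of f_w(x) = w * x with respect to w.
// Chain rule per observation: dL/dw = dL/df * df/dw = 2 * (f_w(x_i) - t_i) * x_i.
double riskGradient(double w,
                    std::vector<double> const& inputs,
                    std::vector<double> const& targets) {
    double grad = 0.0;
    for (std::size_t i = 0; i != inputs.size(); ++i) {
        double prediction = w * inputs[i];
        grad += 2.0 * (prediction - targets[i]) * inputs[i];
    }
    return grad / inputs.size();      // mean over all observations
}
```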
The base class ‘AbstractCost<LabelTypeT,OutputTypeT>’¶
The base class AbstractCost is templatized with respect to both the label and output type. Using batches, that is, collections of input elements, is an important concept in Shark; see the tutorial Data Batches. The proper batch types are inferred from the label and output types:
| Types | Description |
|---|---|
| LabelType | Type of a label \(t_i\) |
| OutputType | Type of a model output \(z_i\) |
| BatchLabelType | Batch of labels; same as Batch<LabelType>::type |
| BatchOutputType | Batch of outputs; same as Batch<OutputType>::type |
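For example, with RealVector outputs the corresponding batch type is a matrix whose rows are the individual elements. A short sketch, assuming the headers and typedefs described in the Data Batches tutorial:

```cpp
#include <shark/Data/BatchInterface.h>
#include <shark/LinAlg/Base.h>

using namespace shark;

// Batch<RealVector>::type is a matrix type: every row holds one element of the batch.
Batch<RealVector>::type outputBatch;      // a batch of model outputs
Batch<unsigned int>::type labelBatch;     // a batch of classification labels
```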
Like all other interfaces in Shark, cost functions have flags indicating their internal capabilities:
| Flag, Accessor function | Description |
|---|---|
| HAS_FIRST_DERIVATIVE, hasFirstDerivative | Can the cost function calculate its first derivative? |
| IS_LOSS_FUNCTION, isLossFunction | Is the cost function a loss in the above sense (i.e., separable)? |
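A hedged sketch of how these flags are typically queried before relying on them; the check-then-downcast pattern anticipates the AbstractLoss section below, and the unsigned int template arguments are only an example:

```cpp
#include <shark/ObjectiveFunctions/AbstractCost.h>
#include <shark/ObjectiveFunctions/Loss/AbstractLoss.h>

using namespace shark;

// Inspect the capabilities of a cost function before using them.
void inspect(AbstractCost<unsigned int, unsigned int> const& cost) {
    if (cost.hasFirstDerivative()) {
        // gradient-based optimization of this cost is possible
    }
    if (cost.isLossFunction()) {
        // the cost decomposes over single observations, so the
        // per-element AbstractLoss interface becomes available
        auto const& loss =
            dynamic_cast<AbstractLoss<unsigned int, unsigned int> const&>(cost);
        double e = loss.eval(1u, 0u);   // loss of one prediction vs. one label
        (void)e;                        // silence unused-variable warnings
    }
}
```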
The interface of AbstractCost reflects the fact that costs can only be evaluated on a complete set of data. The following functions can be used for evaluation of AbstractCost. For brevity, let L be the LabelType and O the OutputType:
| Method | Description |
|---|---|
| double eval(Data<L> const& targets, Data<O> const& predictions) | Returns the mean cost of the predictions \(z_i\) given the labels \(t_i\). |
| double operator()(Data<L> const& targets, Data<O> const& predictions) | Convenience operator returning eval(targets, predictions). |
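As a hedged usage sketch, a concrete cost such as the 0-1-loss can be evaluated on whole Data containers through exactly this interface; the class and header names follow the Shark 3.x tutorials, and the model/dataset setup is only assumed for illustration:

```cpp
#include <shark/Data/Dataset.h>
#include <shark/Models/AbstractModel.h>
#include <shark/ObjectiveFunctions/Loss/ZeroOneLoss.h>

using namespace shark;

// Mean 0-1 loss (misclassification rate) of a classifier on a labeled data set.
double errorRate(AbstractModel<RealVector, unsigned int>& model,
                 LabeledData<RealVector, unsigned int> const& data) {
    Data<unsigned int> predictions = model(data.inputs());  // predictions for all inputs
    ZeroOneLoss<unsigned int> loss;                          // a concrete cost/loss function
    return loss(data.labels(), predictions);                 // operator(): eval(labels, predictions)
}
```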
The base class ‘AbstractLoss<LabelTypeT,OutputTypeT>’¶
The base class AbstractLoss is derived from AbstractCost. It implements
all methods of its base class and offers several additional methods. Shark code is
allowed to read the flag IS_LOSS_FUNCTION
via the public method isLossFunction()
and to downcast an AbstractCost object to an AbstractLoss. This enables the use of the
following much more efficient interface:
| Method | Description |
|---|---|
| double eval(L const& t, O const& z) | Returns the error of the prediction \(z\) given the label \(t\). |
| double eval(BatchLabelType const& T, BatchOutputType const& Z) | Returns the sum of errors of the predictions \(z_i \in Z\) given the labels \(t_i \in T\). |
| double operator()(L const& t, O const& z) | Calls eval(t, z). |
| double operator()(BatchLabelType const& T, BatchOutputType const& Z) | Calls eval(T, Z). |
| double evalDerivative(BatchLabelType const& T, BatchOutputType const& Z, BatchOutputType& gradient) | Returns the error of the predictions \(z_i\) given the labels \(t_i\) and computes \(\frac{\partial}{\partial z_i} L(t_i, z_i)\), stored in gradient. |
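Finally, a hedged sketch of the per-element and derivative interface, using SquaredLoss as a concrete loss; the header, the template defaults, and the concrete numbers are assumptions for illustration:

```cpp
#include <shark/LinAlg/Base.h>
#include <shark/ObjectiveFunctions/Loss/SquaredLoss.h>

using namespace shark;

int main() {
    SquaredLoss<> loss;                 // squared loss on RealVector labels and outputs

    // Single-element interface: one label t and one prediction z.
    RealVector t(1), z(1);
    t(0) = 1.0; z(0) = 0.5;
    double single = loss.eval(t, z);    // squared distance between t and z

    // Batch interface: every row of the matrices is one element of the batch.
    RealMatrix T(2, 1), Z(2, 1);
    T(0, 0) = 1.0; T(1, 0) = 0.0;
    Z(0, 0) = 0.5; Z(1, 0) = 0.5;
    RealMatrix gradient;                // receives the derivatives w.r.t. each z_i
    double batch = loss.evalDerivative(T, Z, gradient);

    (void)single; (void)batch;          // silence unused-variable warnings
}
```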