Shark machine learning library
include/shark/Algorithms/GradientDescent/Rprop.h
/*!
 *
 * \brief Implements different versions of Resilient Backpropagation of error.
 *
 * \author Oswin Krause
 * \date 2010
 *
 *
 * \par Copyright 1995-2017 Shark Development Team
 *
 * <BR><HR>
 * This file is part of Shark.
 * <https://shark-ml.github.io/Shark/>
 *
 * Shark is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published
 * by the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Shark is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with Shark. If not, see <http://www.gnu.org/licenses/>.
 *
 */
#ifndef SHARK_ML_OPTIMIZER_RPROP_H
#define SHARK_ML_OPTIMIZER_RPROP_H

#include <shark/Algorithms/AbstractSingleObjectiveOptimizer.h>

namespace shark{
/*!
 * \brief This class offers methods for the usage of the
 * Resilient-Backpropagation-algorithm with/without weight-backtracking.
 *
 * The Rprop algorithm is an improvement over algorithms with adaptive
 * learning rates, in that the increments used for the update of the
 * weights are independent of the absolute partial derivatives. This
 * makes sense because large flat regions in the search space (plateaus)
 * lead to small absolute partial derivatives, so the increments are
 * chosen small, although large increments are needed to cross the
 * plateau. In contrast, the absolute partial derivatives are very large
 * at the "slopes" of very "narrow canyons", which leads to large
 * increments that skip the minimum lying at the bottom of the canyon,
 * although small increments would be needed to hit the minimum.
 *
 * Therefore, the Rprop algorithm only uses the signs of the partial
 * derivatives and not their absolute values to adapt the parameters. <br>
 * Instead of individual learning rates, it uses the parameter
 * \f$\Delta_i^{(t)}\f$ for weight \f$w_i,\ i = 1, \dots, n\f$ in
 * iteration \f$t\f$, where the parameter is adapted before the
 * weights are changed: <br>
 *
 * \f$
 * \Delta_i^{(t)} = \Bigg\{
 * \begin{array}{ll}
 * min( \eta^+ \cdot \Delta_i^{(t-1)}, \Delta_{max} ), & \mbox{if\ }
 * \frac{\partial E^{(t-1)}}{\partial w_i} \cdot
 * \frac{\partial E^{(t)}}{\partial w_i} > 0 \\
 * max( \eta^- \cdot \Delta_i^{(t-1)}, \Delta_{min} ), & \mbox{if\ }
 * \frac{\partial E^{(t-1)}}{\partial w_i} \cdot
 * \frac{\partial E^{(t)}}{\partial w_i} < 0 \\
 * \Delta_i^{(t-1)}, & \mbox{otherwise}
 * \end{array}
 * \f$
 *
 * The parameters \f$\eta^+ > 1\f$ and \f$0 < \eta^- < 1\f$ control
 * the speed of the adaptation. To stabilize the increments, they are
 * restricted to the interval \f$[\Delta_{min}, \Delta_{max}]\f$. <br>
 * After the adaptation of the \f$\Delta_i\f$, the update for the
 * weights is calculated as
 *
 * \f$
 * \Delta w_i^{(t)} := - \mbox{sign}
 * \left( \frac{\partial E^{(t)}}{\partial w_i}\right) \cdot \Delta_i^{(t)}
 * \f$
 *
 * There are several variants of the algorithm depending on what happens
 * when the optimum is overstepped, i.e. a sign change of the gradient occurs
 * and/or the new objective value is larger than the old one.
 *
 * Weight-backtracking can be used to increase the
 * stability of the method:
 * if \f$\frac{\partial E^{(t-1)}}{\partial w_i} \cdot
 * \frac{\partial E^{(t)}}{\partial w_i} < 0\f$ then
 * \f$\Delta w_i^{(t)} := - \Delta w_i^{(t-1)}\f$.
 * This heuristic can be improved by additionally taking the function value
 * of the last iteration into account: only undo an update if the sign changed
 * and the new function value is worse than the last one. The idea of this
 * modification is that a sign change of the partial derivative
 * \f$\frac{\partial E}{\partial w_i}\f$
 * only indicates that a minimum was skipped, not whether this step
 * led closer to the minimum or not.
 *
 * Furthermore, it has been shown to be beneficial to use gradient freezing
 * when the gradient changes sign, i.e.,
 * if \f$\frac{\partial E^{(t-1)}}{\partial w_i} \cdot
 * \frac{\partial E^{(t)}}{\partial w_i} < 0\f$ then
 * \f$\frac{\partial E^{(t)}}{\partial w_i} := 0\f$.
 * Thus, after an unsuccessful step is performed, delta is not changed
 * for one iteration.
 *
 * Based on this, 4 major algorithms can be derived:
 * Rprop-: (no backtracking, no freezing)
 * IRprop-: (no backtracking, freezing)
 * Rprop+: (gradient based backtracking, freezing)
 * IRprop+: (function value based backtracking, freezing)
 *
 * By default, IRprop+ is chosen.
 *
 * For further information about the algorithm, please refer to: <br>
 *
 * Martin Riedmiller and Heinrich Braun, <br>
 * "A Direct Adaptive Method for Faster Backpropagation Learning: The
 * RPROP Algorithm". <br>
 * In "Proceedings of the IEEE International Conference on Neural Networks",
 * pp. 586-591, <br>
 * Published by IEEE Press in 1993
 *
 * Martin Riedmiller, <br>
 * "Advanced Supervised Learning in Multi-layer Perceptrons -
 * From Backpropagation to Adaptive Learning Algorithms". <br>
 * In "International Journal of Computer Standards and Interfaces", volume 16,
 * no. 5, 1994, pp. 265-278 <br>
 *
 * Christian Igel and Michael Hüsken, <br>
 * "Empirical Evaluation of the Improved Rprop Learning Algorithm". <br>
 * In Neurocomputing Journal, 2002, in press <br>
 * \ingroup gradientopt
 */
template<class SearchPointType = RealVector>
class Rprop : public AbstractSingleObjectiveOptimizer<SearchPointType>
{
public:
	typedef AbstractObjectiveFunction<SearchPointType,double> ObjectiveFunctionType;
	Rprop();

	/// \brief From INameable: return the class name.
	std::string name() const
	{ return "Rprop"; }

	void init(ObjectiveFunctionType const& objectiveFunction, SearchPointType const& startingPoint);
	void init(ObjectiveFunctionType const& objectiveFunction, SearchPointType const& startingPoint, double initDelta);
	using AbstractSingleObjectiveOptimizer<SearchPointType>::init;

	void step(ObjectiveFunctionType const& objectiveFunction);

	virtual void read( InArchive & archive );
	virtual void write( OutArchive & archive ) const;

	//! set decrease factor
	void setEtaMinus(double etaMinus) {
		RANGE_CHECK( etaMinus < 1 );
		RANGE_CHECK( etaMinus > 0 );
		m_decreaseFactor = etaMinus;
	}

	//! set increase factor
	void setEtaPlus(double etaPlus) {
		RANGE_CHECK( etaPlus > 1 );
		m_increaseFactor = etaPlus;
	}

	//! set upper limit on update
	void setMaxDelta(double d) {
		RANGE_CHECK( d > 0 );
		m_maxDelta = d;
	}

	//! set lower limit on update
	void setMinDelta(double d) {
		RANGE_CHECK( d >= 0 );
		m_minDelta = d;
	}

	void setUseOldValue(bool useOldValue){
		m_useOldValue = useOldValue;
		if(m_useOldValue)
			this->m_features |= this->REQUIRES_VALUE;
		else
			this->m_features.reset(this->REQUIRES_VALUE);
	}
	void setUseFreezing(bool useFreezing){
		m_useFreezing = useFreezing;
	}
	void setUseBacktracking(bool useBacktracking){
		m_useBacktracking = useBacktracking;
	}

	//! return the maximal step size component
	double maxDelta() const {
		return *std::max_element(m_delta.begin(), m_delta.end());
	}

	/// \brief Returns the derivative at the current point. Can be used for stopping criteria.
	SearchPointType const& derivative() const{
		return m_derivative;
	}
protected:
	SearchPointType m_derivative;

	//! The increase factor \f$ \eta^+ \f$, set to 1.2 by default.
	double m_increaseFactor;

	//! The decrease factor \f$ \eta^- \f$, set to 0.5 by default.
	double m_decreaseFactor;

	//! The upper limit of the increments \f$ \Delta w_i^{(t)} \f$, set to 1e100 by default.
	double m_maxDelta;

	//! The lower limit of the increments \f$ \Delta w_i^{(t)} \f$, set to 0.0 by default.
	double m_minDelta;
	//! The last function value observed.
	double m_oldValue;

	size_t m_parameterSize;

	//! The last error gradient.
	SearchPointType m_oldDerivative;
	//! The step performed last; used for weight backtracking.
	SearchPointType m_deltaw;

	//! The absolute update values (increments) for all weights.
	SearchPointType m_delta;

	bool m_useFreezing;
	bool m_useBacktracking;
	bool m_useOldValue;
};

extern template class Rprop<RealVector>;
extern template class Rprop<FloatVector>;

}
#endif