shark::KernelBudgetedSGDTrainer< InputType, CacheType > Class Template Reference

Budgeted stochastic gradient descent training for kernel-based models.

#include <shark/Algorithms/Trainers/Budgeted/KernelBudgetedSGDTrainer.h>

Public Types

enum  preInitializationMethod { NONE , RANDOM }
 preinitialization methods
 
typedef AbstractKernelFunction< InputType > KernelType
 
typedef KernelClassifier< InputType > ClassifierType
 
typedef KernelExpansion< InputType > ModelType
 
typedef AbstractLoss< unsigned int, RealVector > LossType
 
typedef ConstProxyReference< typename Batch< InputType >::type const >::type ConstBatchInputReference
 
typedef CacheType QpFloatType
 
typedef LabeledData< InputType, unsigned int >::element_type ElementType
 
typedef KernelMatrix< InputType, QpFloatType > KernelMatrixType
 
typedef PartlyPrecomputedMatrix< KernelMatrixType > PartlyPrecomputedMatrixType
 
- Public Types inherited from shark::AbstractTrainer< KernelClassifier< InputType > >
typedef KernelClassifier< InputType > ModelType
 
typedef ModelType::InputType InputType
 
typedef typename Model::OutputType LabelType
 
typedef LabeledData< InputType, LabelType > DatasetType
 
- Public Types inherited from shark::IParameterizable< VectorType >
typedef VectorType ParameterVectorType
 

Public Member Functions

 KernelBudgetedSGDTrainer (KernelType *kernel, const LossType *loss, double C, bool offset, bool unconstrained=false, size_t budgetSize=500, AbstractBudgetMaintenanceStrategy< InputType > *budgetMaintenanceStrategy=NULL, size_t epochs=1, size_t preInitializationMethod=NONE, double minMargin=1.0f)
 Constructor. Note that no cache size is involved: merging vectors always creates new ones, which makes caching largely pointless.
 
size_t budgetSize () const
 
void setBudgetSize (std::size_t budgetSize)
 
AbstractBudgetMaintenanceStrategy< InputType > * budgetMaintenanceStrategy () const
 
void setBudgetMaintenanceStrategy (AbstractBudgetMaintenanceStrategy< InputType > *budgetMaintenanceStrategy)
 
double minMargin () const
 
void setMinMargin (double minMargin)
 
std::string name () const
 From INameable: return the class name.
 
void train (ClassifierType &classifier, const LabeledData< InputType, unsigned int > &dataset)
 
std::size_t epochs () const
 
void setEpochs (std::size_t value)
 
KernelType * kernel ()
 get the kernel function
 
const KernelType * kernel () const
 get the kernel function
 
void setKernel (KernelType *kernel)
 set the kernel function
 
bool isUnconstrained () const
 
double C () const
 return the value of the regularization parameter
 
void setC (double value)
 set the value of the regularization parameter (must be positive)
 
bool trainOffset () const
 check whether the model to be trained should include an offset term
 
RealVector parameterVector () const
 Returns the vector of hyper-parameters.
 
void setParameterVector (RealVector const &newParameters)
 Sets the vector of hyper-parameters.
 
size_t numberOfParameters () const
 Returns the number of hyper-parameters.
 
- Public Member Functions inherited from shark::AbstractTrainer< KernelClassifier< InputType > >
virtual void train (ModelType &model, DatasetType const &dataset)=0
 Core of the Trainer interface.
 
- Public Member Functions inherited from shark::INameable
virtual ~INameable ()
 
- Public Member Functions inherited from shark::ISerializable
virtual ~ISerializable ()
 Virtual d'tor.
 
virtual void read (InArchive &archive)
 Read the component from the supplied archive.
 
virtual void write (OutArchive &archive) const
 Write the component to the supplied archive.
 
void load (InArchive &archive, unsigned int version)
 Versioned loading of components, calls read(...).
 
void save (OutArchive &archive, unsigned int version) const
 Versioned storing of components, calls write(...).
 
 BOOST_SERIALIZATION_SPLIT_MEMBER ()
 
- Public Member Functions inherited from shark::IParameterizable< VectorType >
virtual ~IParameterizable ()
 

Protected Attributes

KernelType * m_kernel
 pointer to kernel function
 
const LossType * m_loss
 pointer to loss function
 
double m_C
 regularization parameter
 
bool m_offset
 should the resulting model have an offset term?
 
bool m_unconstrained
 should C be stored as log(C) as a parameter?
 
std::size_t m_budgetSize
 
AbstractBudgetMaintenanceStrategy< InputType > * m_budgetMaintenanceStrategy
 
std::size_t m_epochs
 number of training epochs (sweeps over the data), or 0 for default = max(10, C)
 
std::size_t m_preInitializationMethod
 
double m_minMargin
 

Detailed Description

template<class InputType, class CacheType = float>
class shark::KernelBudgetedSGDTrainer< InputType, CacheType >

Budgeted stochastic gradient descent training for kernel-based models.

This is an implementation of the BSGD algorithm developed by Wang, Crammer and Vucetic: Breaking the curse of kernelization: Budgeted stochastic gradient descent for large-scale SVM training, JMLR 2012.

At its core the algorithm is Pegasos, and thus similar to a perceptron. The main difference is that the sparsity of the weight vector is restricted to a (currently predefined) value, the budget. Whenever the budget is full, the trainer has to decide how to add a new vector to the model without exceeding it. Several methods have been proposed for this. The main insight of Wang et al. is to merge two budget vectors (i.e. two vectors in the model). If the first one is selected by the norm of its alpha coefficient, the second one can be found by solving an optimization problem, yielding a roughly optimal pair. Merging this pair frees space in the budget for a new vector. Such strategies are called budget maintenance strategies.

This implementation owes much to the 'reference' implementation in the BudgetedSVM software.

For the documentation of the basic SGD algorithm, please refer to KernelSGDTrainer.h. Note that the special alpha scaling from that class was not taken over, so this class may be numerically less robust than the plain SGD trainer.
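The budget mechanism described above can be illustrated with a small, self-contained sketch. It does not use any Shark types: `BudgetVector`, `alphaNorm`, `smallestAlphaIndex` and `addWithBudget` are hypothetical names introduced only for this example. The sketch selects the vector with the smallest alpha norm (assumed here to be the selection criterion meant by "searched by norm of its alpha coefficient") and, for brevity, simply removes it instead of merging it with a partner, which is a simpler maintenance strategy than the one proposed by Wang et al.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one budget vector: an input point and its
// per-class alpha coefficients. This is not a Shark type.
struct BudgetVector {
    std::vector<double> x;
    std::vector<double> alpha;
};

// Euclidean norm of the alpha coefficients of one budget vector.
double alphaNorm(const BudgetVector& v) {
    double sum = 0.0;
    for (double a : v.alpha) sum += a * a;
    return std::sqrt(sum);
}

// Index of the budget vector with the smallest alpha norm
// (assumed selection rule for the first maintenance candidate).
std::size_t smallestAlphaIndex(const std::vector<BudgetVector>& budget) {
    assert(!budget.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < budget.size(); ++i)
        if (alphaNorm(budget[i]) < alphaNorm(budget[best])) best = i;
    return best;
}

// Add a candidate vector. While the budget is full, run a simplified
// maintenance step that removes the vector with the smallest alpha norm.
// A merging strategy would instead combine that vector with a partner.
void addWithBudget(std::vector<BudgetVector>& budget, std::size_t budgetSize,
                   const BudgetVector& candidate) {
    assert(budgetSize > 0);
    while (budget.size() >= budgetSize) {
        std::size_t victim = smallestAlphaIndex(budget);
        budget.erase(budget.begin() + static_cast<std::ptrdiff_t>(victim));
    }
    budget.push_back(candidate);
}
```

After every call to `addWithBudget` the model holds at most `budgetSize` vectors, which is the invariant a budget maintenance strategy has to preserve.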

Definition at line 97 of file KernelBudgetedSGDTrainer.h.

Member Typedef Documentation

◆ ClassifierType

template<class InputType , class CacheType = float>
typedef KernelClassifier<InputType> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::ClassifierType

Definition at line 102 of file KernelBudgetedSGDTrainer.h.

◆ ConstBatchInputReference

template<class InputType , class CacheType = float>
typedef ConstProxyReference<typename Batch<InputType>::type const>::type shark::KernelBudgetedSGDTrainer< InputType, CacheType >::ConstBatchInputReference

Definition at line 105 of file KernelBudgetedSGDTrainer.h.

◆ ElementType

template<class InputType , class CacheType = float>
typedef LabeledData<InputType, unsigned int>::element_type shark::KernelBudgetedSGDTrainer< InputType, CacheType >::ElementType

Definition at line 107 of file KernelBudgetedSGDTrainer.h.

◆ KernelMatrixType

template<class InputType , class CacheType = float>
typedef KernelMatrix<InputType, QpFloatType> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::KernelMatrixType

Definition at line 109 of file KernelBudgetedSGDTrainer.h.

◆ KernelType

template<class InputType , class CacheType = float>
typedef AbstractKernelFunction<InputType> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::KernelType

Definition at line 101 of file KernelBudgetedSGDTrainer.h.

◆ LossType

template<class InputType , class CacheType = float>
typedef AbstractLoss<unsigned int, RealVector> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::LossType

Definition at line 104 of file KernelBudgetedSGDTrainer.h.

◆ ModelType

template<class InputType , class CacheType = float>
typedef KernelExpansion<InputType> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::ModelType

Definition at line 103 of file KernelBudgetedSGDTrainer.h.

◆ PartlyPrecomputedMatrixType

template<class InputType , class CacheType = float>
typedef PartlyPrecomputedMatrix< KernelMatrixType > shark::KernelBudgetedSGDTrainer< InputType, CacheType >::PartlyPrecomputedMatrixType

Definition at line 110 of file KernelBudgetedSGDTrainer.h.

◆ QpFloatType

template<class InputType , class CacheType = float>
typedef CacheType shark::KernelBudgetedSGDTrainer< InputType, CacheType >::QpFloatType

Definition at line 106 of file KernelBudgetedSGDTrainer.h.

Member Enumeration Documentation

◆ preInitializationMethod

template<class InputType , class CacheType = float>
enum shark::KernelBudgetedSGDTrainer::preInitializationMethod

preinitialization methods

Enumerator
NONE 
RANDOM 

Definition at line 115 of file KernelBudgetedSGDTrainer.h.

Constructor & Destructor Documentation

◆ KernelBudgetedSGDTrainer()

template<class InputType , class CacheType = float>
shark::KernelBudgetedSGDTrainer< InputType, CacheType >::KernelBudgetedSGDTrainer ( KernelType * kernel,
const LossType * loss,
double  C,
bool  offset,
bool  unconstrained = false,
size_t  budgetSize = 500,
AbstractBudgetMaintenanceStrategy< InputType > *  budgetMaintenanceStrategy = NULL,
size_t  epochs = 1,
size_t  preInitializationMethod = NONE,
double  minMargin = 1.0f 
)
inline

Constructor. Note that no cache size is involved: merging vectors always creates new ones, which makes caching largely pointless.

Parameters
[in] kernel                     kernel function to use for training and prediction
[in] loss                       (sub-)differentiable loss function
[in] C                          regularization parameter; always the 'true' value of C, even when unconstrained is set
[in] offset                     whether to train with an offset/bias parameter or not
[in] unconstrained              when a C-value is given via setParameterVector, should it be piped through the exp-function before it is used in the solver?
[in] budgetSize                 size of the budget/model that the final solution will have. Note that the final model may be smaller.
[in] budgetMaintenanceStrategy  object that contains the logic for maintaining the budget size
[in] epochs                     number of epochs the SGD solver should run. If zero is given, the number of iterations is the maximum of 10*datasetsize and C*datasetsize.
[in] preInitializationMethod    the method used to preinitialize the budget
[in] minMargin                  the margin every vector has to obey. Usually this is 1.
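The relationship between C and the unconstrained flag can be sketched as a pair of conversion helpers. This is a minimal illustration of the log(C) encoding described in this reference, not code taken from Shark; `encodeC` and `decodeC` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the value that would be stored in the parameter
// vector for a given regularization parameter C.
double encodeC(double C, bool unconstrained) {
    assert(C > 0.0);  // C must be positive
    return unconstrained ? std::log(C) : C;
}

// Hypothetical helper: the value of C the solver would use for a given
// entry of the parameter vector. In unconstrained mode the entry is
// passed through exp, so any real number maps to a positive C.
double decodeC(double parameter, bool unconstrained) {
    return unconstrained ? std::exp(parameter) : parameter;
}
```

The point of the encoding is that an optimizer tuning the hyper-parameter can move freely over all real numbers without ever producing a non-positive C.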

Definition at line 134 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetMaintenanceStrategy, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_loss, and SHARK_RUNTIME_CHECK.

Member Function Documentation

◆ budgetMaintenanceStrategy()

template<class InputType , class CacheType = float>
AbstractBudgetMaintenanceStrategy< InputType > * shark::KernelBudgetedSGDTrainer< InputType, CacheType >::budgetMaintenanceStrategy ( ) const
inline

Return a pointer to the budget maintenance strategy.

Returns
pointer to the budget maintenance strategy.

Definition at line 185 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetMaintenanceStrategy.

Referenced by shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setBudgetMaintenanceStrategy().

◆ budgetSize()

template<class InputType , class CacheType = float>
size_t shark::KernelBudgetedSGDTrainer< InputType, CacheType >::budgetSize ( ) const
inline

◆ C()

template<class InputType , class CacheType = float>
double shark::KernelBudgetedSGDTrainer< InputType, CacheType >::C ( ) const
inline

return the value of the regularization parameter

Definition at line 437 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_C.

◆ epochs()

template<class InputType , class CacheType = float>
std::size_t shark::KernelBudgetedSGDTrainer< InputType, CacheType >::epochs ( ) const
inline

Return the number of training epochs. A value of 0 indicates that the default of max(10, C) should be used.

Definition at line 400 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_epochs.
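How the epochs setting could translate into a number of SGD iterations can be sketched as follows. The helper `sgdIterations` is hypothetical and not part of Shark; it assumes that one epoch is one sweep of `datasetSize` iterations and that, for the default, C*datasetSize is rounded up, which this reference does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: number of SGD iterations for a given setting.
// epochs != 0: run that many sweeps over the data.
// epochs == 0: use the default of max(10, C) sweeps, i.e. the maximum
//              of 10*datasetSize and C*datasetSize iterations.
std::size_t sgdIterations(std::size_t epochs, double C, std::size_t datasetSize) {
    assert(C > 0.0);
    if (epochs != 0) return epochs * datasetSize;
    // Rounding up is an assumption made for this sketch.
    std::size_t fromC = static_cast<std::size_t>(
        std::ceil(C * static_cast<double>(datasetSize)));
    return std::max(10 * datasetSize, fromC);
}
```

For example, with 100 training points and C = 25, an epochs value of 0 would yield 2500 iterations under these assumptions, while an explicit value of 3 would yield 300.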

◆ isUnconstrained()

template<class InputType , class CacheType = float>
bool shark::KernelBudgetedSGDTrainer< InputType, CacheType >::isUnconstrained ( ) const
inline

Check whether the parameter C is represented as log(C) in the parameter vector, i.e. in a form suitable for unconstrained optimization.

Definition at line 431 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_unconstrained.

◆ kernel() [1/2]

template<class InputType , class CacheType = float>
KernelType * shark::KernelBudgetedSGDTrainer< InputType, CacheType >::kernel ( )
inline

◆ kernel() [2/2]

template<class InputType , class CacheType = float>
const KernelType * shark::KernelBudgetedSGDTrainer< InputType, CacheType >::kernel ( ) const
inline

get the kernel function

Definition at line 418 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel.

◆ minMargin()

template<class InputType , class CacheType = float>
double shark::KernelBudgetedSGDTrainer< InputType, CacheType >::minMargin ( ) const
inline

◆ name()

template<class InputType , class CacheType = float>
std::string shark::KernelBudgetedSGDTrainer< InputType, CacheType >::name ( ) const
inline virtual

From INameable: return the class name.

Reimplemented from shark::INameable.

Definition at line 219 of file KernelBudgetedSGDTrainer.h.

Referenced by main().

◆ numberOfParameters()

template<class InputType , class CacheType = float>
size_t shark::KernelBudgetedSGDTrainer< InputType, CacheType >::numberOfParameters ( ) const
inline virtual

◆ parameterVector()

◆ setBudgetMaintenanceStrategy()

template<class InputType , class CacheType = float>
void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setBudgetMaintenanceStrategy ( AbstractBudgetMaintenanceStrategy< InputType > *  budgetMaintenanceStrategy)
inline

Set the budget maintenance strategy.

Parameters
[in] budgetMaintenanceStrategy  set strategy to given object.

Definition at line 194 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::budgetMaintenanceStrategy(), and shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetMaintenanceStrategy.

◆ setBudgetSize()

template<class InputType , class CacheType = float>
void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setBudgetSize ( std::size_t  budgetSize)
inline

◆ setC()

template<class InputType , class CacheType = float>
void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setC ( double  value)
inline

set the value of the regularization parameter (must be positive)

Definition at line 443 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_C, and RANGE_CHECK.

◆ setEpochs()

template<class InputType , class CacheType = float>
void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setEpochs ( std::size_t  value)
inline

Set the number of training epochs. A value of 0 indicates that the default of max(10, C) should be used.

Definition at line 407 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_epochs.

Referenced by main().

◆ setKernel()

template<class InputType , class CacheType = float>
void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setKernel ( KernelType * kernel)
inline

◆ setMinMargin()

template<class InputType , class CacheType = float>
void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setMinMargin ( double  minMargin)
inline

◆ setParameterVector()

◆ train()

template<class InputType , class CacheType = float>
void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::train ( ClassifierType & classifier,
const LabeledData< InputType, unsigned int > &  dataset 
)
inline

Train routine.

Parameters
[in] classifier  classifier object for the final solution.
[in] dataset     dataset to work with.

Definition at line 229 of file KernelBudgetedSGDTrainer.h.

References shark::KernelExpansion< InputType >::alpha(), shark::createBatch(), shark::Classifier< Model >::decisionFunction(), shark::random::discrete(), shark::Data< Type >::element(), shark::LabeledData< InputT, LabelT >::element(), shark::Data< Type >::elements(), shark::KernelExpansion< InputType >::eval(), shark::random::globalRng, shark::LabeledData< InputT, LabelT >::inputs(), shark::LabeledData< InputT, LabelT >::labels(), shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetMaintenanceStrategy, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetSize, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_C, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_epochs, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_minMargin, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_offset, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_preInitializationMethod, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::NONE, shark::numberOfClasses(), shark::LabeledData< InputT, LabelT >::numberOfElements(), shark::KernelBudgetedSGDTrainer< InputType, CacheType >::RANDOM, shark::KernelExpansion< InputType >::setStructure(), SHARK_ASSERT, and shark::KernelExpansion< InputType >::sparsify().

Referenced by main().

◆ trainOffset()

template<class InputType , class CacheType = float>
bool shark::KernelBudgetedSGDTrainer< InputType, CacheType >::trainOffset ( ) const
inline

check whether the model to be trained should include an offset term

Definition at line 450 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_offset.

Member Data Documentation

◆ m_budgetMaintenanceStrategy

◆ m_budgetSize

◆ m_C

◆ m_epochs

template<class InputType , class CacheType = float>
std::size_t shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_epochs
protected

◆ m_kernel

◆ m_loss

template<class InputType , class CacheType = float>
const LossType* shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_loss
protected

◆ m_minMargin

◆ m_offset

template<class InputType , class CacheType = float>
bool shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_offset
protected

◆ m_preInitializationMethod

template<class InputType , class CacheType = float>
std::size_t shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_preInitializationMethod
protected

◆ m_unconstrained


The documentation for this class was generated from the following file: