Shark Data Containers Quick Reference

Relevant Types

Data, UnlabeledData, LabeledData (also the typedefs ClassificationDataset, CompressedClassificationDataset, RegressionDataset), DataView, DataDistribution, LabeledDataDistribution, CVFolds.

Container / View Creation

Data<T>()

create empty data container

Dataset.h

Data<T>(data)

create shallow copy with content sharing

Dataset.h

Data<T>(N)

create new data container with N batches

Dataset.h

Data<T>(N, elem)

create new data container with N elements, with blueprint elem

Dataset.h

UnlabeledData<T>()

create empty data container

Dataset.h

UnlabeledData<T>(data)

create shallow copy with content sharing

Dataset.h

UnlabeledData<T>(N)

create new data container with N batches

Dataset.h

UnlabeledData<T>(N, elem)

create new data container with N elements, with blueprint elem

Dataset.h

LabeledData<I,L>()

create empty data container

Dataset.h

LabeledData<I,L>(input, labels)

create from input and label containers (shallow copy with content sharing)

Dataset.h

LabeledData<I,L>(N)

create new data container with N batches

Dataset.h

LabeledData<I,L>(N, elem)

create new data container with N elements, with blueprint elem

Dataset.h

DataView<DatasetType>(data)

create view of data for fast random access to elements

DataView.h

createDataFromRange()

create from begin+end iterators, e.g., from std::vector

Dataset.h

createLabeledDataFromRange()

create from two ranges for inputs and labels

Dataset.h

toDataset()

create data container from view

DataView.h
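
The "shallow copy with content sharing" entries above can be illustrated with a small standalone sketch. ToyData is a hypothetical toy class, not Shark's Data<T>; it only assumes that batches are held through shared pointers, so that copying a container copies pointers rather than data, and that a makeIndependent()-style call replaces the shared batches with private copies.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy container, not Shark's Data<T>: batches are held through
// shared pointers, so copying the container copies only the pointers.
template <typename T>
class ToyData {
public:
    using Batch = std::vector<T>;

    ToyData() = default;

    // Create a container with numBatches empty batches.
    explicit ToyData(std::size_t numBatches) {
        for (std::size_t i = 0; i < numBatches; ++i)
            batches_.push_back(std::make_shared<Batch>());
    }

    // The implicit copy constructor is a shallow copy: afterwards both
    // containers point at the same batch storage.

    std::size_t numberOfBatches() const { return batches_.size(); }
    bool empty() const { return batches_.empty(); }

    Batch& batch(std::size_t i) {
        assert(i < batches_.size());
        return *batches_[i];
    }

    // Replace every shared batch with a private deep copy.
    void makeIndependent() {
        for (auto& b : batches_)
            b = std::make_shared<Batch>(*b);
    }

private:
    std::vector<std::shared_ptr<Batch>> batches_;
};
```

In this sketch, a write through a copy is visible in the original until makeIndependent() is called on one of them; after that, the two containers no longer affect each other.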

Batch Access

data.empty()

true iff data.numberOfBatches() == 0

Dataset.h

data.numberOfBatches()

number of batches in the container

Dataset.h

data.batch(i)

(reference to) the i-th batch

Dataset.h

data.batches()

STL-compliant access to batches as a range

Dataset.h
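
Batch-wise processing, as suggested by the accessors above, amounts to an outer loop over batches and an inner loop over the contents of one batch. The following is a minimal sketch on plain standard-library types; Batch, Batches, and batchwiseMean are hypothetical names chosen for illustration and are not part of Shark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a batched container: a vector of batches,
// where each batch is a vector of values.
using Batch = std::vector<double>;
using Batches = std::vector<Batch>;

// Mean over all elements, visiting the data one batch at a time.
double batchwiseMean(const Batches& batches) {
    double sum = 0.0;
    std::size_t count = 0;
    for (const Batch& batch : batches) {   // outer loop: batches
        for (double x : batch) sum += x;   // inner loop: one batch
        count += batch.size();
    }
    assert(count > 0 && "mean of an empty container is undefined");
    return sum / static_cast<double>(count);
}
```

No element index is ever computed in this pattern, so the cost is linear in the total number of elements regardless of how they are distributed over batches.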

Element Access

Warning

Random access to an element by index is a linear-time operation. Never iterate over elements by index; use data.elements() for sequential access, or a DataView for random access.

data.numberOfElements()

number of elements in the container

Dataset.h

data.element(i)

(proxy to) the i-th element

Dataset.h

data.elements()

STL-compliant access to (proxies to) elements as a range

Dataset.h
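
The warning above can be made concrete with a standalone sketch. It assumes a container that stores elements in variable-sized batches: finding element i then means walking past the preceding batches, whereas a view can precompute a (batch, offset) table once and answer each lookup in constant time. Chunk, elementByWalking, and ToyView are hypothetical names; this is not how Shark's DataView is necessarily implemented.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Chunk = std::vector<int>;

// Element lookup by walking the batches: the cost grows with the number
// of batches stored in front of element i.
int elementByWalking(const std::vector<Chunk>& batches, std::size_t i) {
    for (const Chunk& b : batches) {
        if (i < b.size()) return b[i];
        i -= b.size();
    }
    throw std::out_of_range("element index out of range");
}

// Hypothetical view: precomputes (batch, offset) for every element once;
// afterwards each lookup is a constant-time table access.
class ToyView {
public:
    explicit ToyView(const std::vector<Chunk>& batches) : batches_(batches) {
        for (std::size_t b = 0; b < batches.size(); ++b)
            for (std::size_t k = 0; k < batches[b].size(); ++k)
                index_.emplace_back(b, k);
    }

    std::size_t size() const { return index_.size(); }

    int operator[](std::size_t i) const {
        const auto& pos = index_.at(i);  // throws if i is out of range
        return batches_[pos.first][pos.second];
    }

private:
    const std::vector<Chunk>& batches_;  // the view does not own the data
    std::vector<std::pair<std::size_t, std::size_t>> index_;
};
```

The view trades one linear pass and an index table for cheap random access afterwards, which pays off when many lookups follow.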

Further Methods

LabeledData::inputShape()

Shape of the input vectors

Dataset.h

LabeledData::labelShape()

Shape of the label vectors

Dataset.h

Data::shape()

Shape of the data vectors

Dataset.h

swap()

swap container contents (constant time)

Dataset.h

makeIndependent()

make sure data is not shared with other containers

Dataset.h

shuffle()

randomly reorder elements (not only batches)

Dataset.h

append(data)

concatenate containers

Dataset.h

LabeledData::inputs()

underlying container of inputs

Dataset.h

LabeledData::labels()

underlying container of labels

Dataset.h
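
To show how concatenation and splitting at batch boundaries relate to each other, here is a rough sketch on plain vectors. Blocks, appendBatches, and spliceBatches are hypothetical helpers; in particular, the choice that the back part is returned while the front part stays in place is an assumption of this sketch, not a statement about Shark's signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Block = std::vector<int>;
using Blocks = std::vector<Block>;

// Concatenate: move the batches of `tail` to the end of `head`.
void appendBatches(Blocks& head, Blocks tail) {
    for (Block& b : tail) head.push_back(std::move(b));
}

// Split at a batch boundary: batches [k, end) are removed from `data`
// and returned; batches [0, k) stay behind.
Blocks spliceBatches(Blocks& data, std::size_t k) {
    assert(k <= data.size());
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>(k);
    Blocks back(std::make_move_iterator(mid),
                std::make_move_iterator(data.end()));
    data.erase(mid, data.end());
    return back;
}
```

Splitting at batch k and then appending the returned part restores the original sequence of batches, which is the sense in which the two operations undo each other.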

Sizes and Dimensions

numberOfClasses()

number of classes (maximal class label + 1)

Dataset.h

classSizes()

vector of class sizes

Dataset.h

dataDimension()

dimension of vectors in the data set

Dataset.h

inputDimension()

dimension of input vectors in the data set

Dataset.h

labelDimension()

dimension of label vectors in the data set

Dataset.h
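
The convention "maximal class label + 1" has a consequence worth spelling out: a label value that never occurs still counts as a class, with size zero. A minimal sketch, assuming labels are unsigned integers starting at 0 (countClasses and countClassSizes are hypothetical names, not Shark functions):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Number of classes under the convention "maximal class label + 1",
// assuming labels are unsigned integers 0, 1, 2, ...
std::size_t countClasses(const std::vector<unsigned>& labels) {
    if (labels.empty()) return 0;
    const unsigned maxLabel = *std::max_element(labels.begin(), labels.end());
    return static_cast<std::size_t>(maxLabel) + 1;
}

// Histogram of labels: entry c holds the number of elements of class c.
std::vector<std::size_t> countClassSizes(const std::vector<unsigned>& labels) {
    std::vector<std::size_t> sizes(countClasses(labels), 0);
    for (unsigned y : labels) ++sizes[y];
    return sizes;
}
```

For the labels 0, 2, 2, 3 this yields four classes with sizes 1, 0, 2, 1, even though label 1 is absent from the data.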

Subset Creation and Folds for Cross-validation

splitAtElement()

split data into front and back part (often training and test)

Dataset.h

subset()

create indexed subset from DataView

DataView.h

createCVIID()

create folds by i.i.d. assignment of element to folds

CVDatasetTools.h

createCVSameSize()

create folds of roughly equal size

CVDatasetTools.h

createCVSameSizeBalanced()

create folds of roughly equal size, stratifying classes

CVDatasetTools.h

createCVIndexed()

create folds explicitly by index

CVDatasetTools.h

createCVFullyIndexed()

create folds explicitly by index with reordering

CVDatasetTools.h

Data::splice()

split data at batch boundaries (inverse of append)

Dataset.h

indexedSubset()

obtain subset of batches from indices

Dataset.h

selectFeatures()

filter out a subset of features from Data

Dataset.h

selectInputFeatures()

filter out a subset of features from LabeledData

Dataset.h
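
Two of the ideas above, splitting into a front and a back part and forming folds of roughly equal size, are easy to sketch on plain vectors. splitAt and foldSizes are hypothetical helpers for illustration only; they do not reproduce the signatures or the in-place behaviour of the Shark functions listed in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split a sequence into a front part [0, at) and a back part [at, n),
// for example a training part and a test part.
template <typename T>
std::pair<std::vector<T>, std::vector<T>> splitAt(const std::vector<T>& data,
                                                  std::size_t at) {
    assert(at <= data.size());
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>(at);
    return std::make_pair(std::vector<T>(data.begin(), mid),
                          std::vector<T>(mid, data.end()));
}

// Sizes of k folds over n elements; the sizes differ by at most one.
std::vector<std::size_t> foldSizes(std::size_t n, std::size_t k) {
    assert(k > 0);
    std::vector<std::size_t> sizes(k, n / k);
    for (std::size_t i = 0; i < n % k; ++i) ++sizes[i];  // spread the remainder
    return sizes;
}
```

With 10 elements and 3 folds, the fold sizes come out as 4, 3, 3: "roughly equal" here means that the remainder of the division is spread over the first folds.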

Import / Export

importCSV()

import from comma separated values (CSV) file

Csv.h

exportCSV()

export to comma separated values (CSV) file

Csv.h

importSparseData()

import from sparse vector (libSVM) format

SparseData.h

exportSparseData()

export to sparse vector (libSVM) format

SparseData.h

importPGM()

import single PGM image

Pgm.h

importPGMSet()

import set of PGM images

Pgm.h

exportPGM()

export single PGM image

Pgm.h
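
For orientation, the sparse vector (libSVM) text format stores one example per line as a label followed by index:value pairs. The sketch below parses a single such line with the standard library only; SparseRow and parseSparseLine are hypothetical names, and the code is not Shark's importer and performs none of its checks (for example on index ordering).

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct SparseRow {
    double label = 0.0;
    std::vector<std::pair<std::size_t, double>> entries;  // (index, value)
};

// Parse one line of the form "label index:value index:value ...".
// Returns false and leaves `row` untouched if the line is malformed.
bool parseSparseLine(const std::string& line, SparseRow& row) {
    std::istringstream in(line);
    SparseRow parsed;
    if (!(in >> parsed.label)) return false;

    std::string token;
    while (in >> token) {
        const std::size_t colon = token.find(':');
        if (colon == std::string::npos || colon == 0 ||
            colon + 1 == token.size())
            return false;
        try {
            const std::size_t index = std::stoul(token.substr(0, colon));
            const double value = std::stod(token.substr(colon + 1));
            parsed.entries.emplace_back(index, value);
        } catch (const std::exception&) {
            return false;  // index or value was not a number
        }
    }
    row = std::move(parsed);
    return true;
}
```

Only the non-zero features of an example appear on its line, which is what makes the format suitable for sparse data.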