Shark Data Containers Quick Reference
Related tutorials
Data Containers, Label Formats, Importing Data, Creating and Using Subsets of Data, Normalization of Input Data.
Relevant Types
Data, UnlabeledData, LabeledData (also the typedefs ClassificationDataset, CompressedClassificationDataset, RegressionDataset), DataView, DataDistribution, LabeledDataDistribution, CVFolds.
Container / View Creation
Data<T>() | create an empty data container
Data<T>(data) | create a shallow copy that shares content with data
Data<T>(N) | create a new data container with N batches
Data<T>(N, elem) | create a new data container with N elements, using elem as the blueprint
UnlabeledData<T>() | create an empty data container
UnlabeledData<T>(data) | create a shallow copy that shares content with data
UnlabeledData<T>(N) | create a new data container with N batches
UnlabeledData<T>(N, elem) | create a new data container with N elements, using elem as the blueprint
LabeledData<I,L>() | create an empty data container
LabeledData<I,L>(input, labels) | create a shallow copy that shares content with input and labels
LabeledData<I,L>(N) | create a new data container with N batches
LabeledData<I,L>(N, elem) | create a new data container with N elements, using elem as the blueprint
DataView<DatasetType>(data) | create a view of data for fast random access to elements
| create from begin+end iterators, e.g., from std::vector
| create from two ranges for inputs and labels
| create a data container from a view
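Creating a container from a begin+end iterator range amounts to copying the elements into consecutive batches. The standalone sketch below shows that idea. It does not use Shark's API or reproduce its implementation. `Batches` and `makeBatches` are hypothetical names, and the batch size is an explicit parameter here by assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a batched container: a vector of batches,
// where each batch is itself a vector of elements.
template <class T>
using Batches = std::vector<std::vector<T>>;

// Copy the range [first, last) into consecutive batches holding at most
// batchSize elements each; only the last batch may be smaller.
template <class It>
Batches<typename std::iterator_traits<It>::value_type>
makeBatches(It first, It last, std::size_t batchSize) {
    assert(batchSize > 0);
    Batches<typename std::iterator_traits<It>::value_type> result;
    while (first != last) {
        result.emplace_back();            // start a new, empty batch
        auto& batch = result.back();
        for (std::size_t k = 0; k < batchSize && first != last; ++k, ++first)
            batch.push_back(*first);      // fill it until full or the range ends
    }
    return result;                        // an empty range yields zero batches
}
```

For example, five integers with a batch size of 2 produce three batches of sizes 2, 2 and 1.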
Batch Access
data.empty() | true iff data.numberOfBatches() == 0
data.numberOfBatches() | number of batches in the container
data.batch(i) | (reference to) the i-th batch
data.batches() | STL-compliant access to batches as a range
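Batch access supports a simple pattern: loop over the batches, then over the elements inside each batch. Each element is visited once, with no index lookups. The sketch below shows this pattern on a plain vector of batches. `Batches`, `countElements` and `sumAll` are hypothetical helpers, not Shark functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a batched container.
template <class T>
using Batches = std::vector<std::vector<T>>;

// Total element count obtained by summing batch sizes, one visit per batch.
template <class T>
std::size_t countElements(Batches<T> const& batches) {
    std::size_t n = 0;
    for (auto const& batch : batches) n += batch.size();
    return n;
}

// Batch-wise processing: outer loop over batches, inner loop over elements.
inline double sumAll(Batches<double> const& batches) {
    double s = 0.0;
    for (auto const& batch : batches)
        for (double x : batch) s += x;
    return s;
}
```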
Element Access
Warning
Random access to elements is a linear-time operation. Never iterate over elements by index; use a DataView for random access instead.
data.numberOfElements() | number of elements in the container
data.element(i) | (proxy to) the i-th element
data.elements() | STL-compliant access to (proxies to) elements as a range
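Element access by index is linear time because the i-th element can only be found by walking the batches. A view can remove that cost by precomputing a (batch, position) pair for every element once, which makes each later access constant time. The sketch below contrasts the two approaches. It is only meant to convey the idea behind DataView. `elementByWalking` and `IndexView` are hypothetical names, and the sketch makes no claim about how Shark implements DataView.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

template <class T>
using Batches = std::vector<std::vector<T>>;

// Linear time: skip whole batches until index i falls inside one.
template <class T>
T const& elementByWalking(Batches<T> const& batches, std::size_t i) {
    for (auto const& batch : batches) {
        if (i < batch.size()) return batch[i];
        i -= batch.size();
    }
    throw std::out_of_range("element index");
}

// Hypothetical view: build the (batch, position) index once in O(n),
// then answer each access in O(1). The batches must outlive the view.
template <class T>
class IndexView {
public:
    explicit IndexView(Batches<T> const& batches) : batches_(&batches) {
        for (std::size_t b = 0; b < batches.size(); ++b)
            for (std::size_t p = 0; p < batches[b].size(); ++p)
                index_.emplace_back(b, p);
    }
    std::size_t size() const { return index_.size(); }
    T const& operator[](std::size_t i) const {
        auto const& [b, p] = index_[i];
        return (*batches_)[b][p];
    }
private:
    Batches<T> const* batches_;
    std::vector<std::pair<std::size_t, std::size_t>> index_;
};
```

Looping `elementByWalking` over all n indices costs time proportional to n times the number of batches. Building an `IndexView` once costs O(n), and each access afterwards is O(1).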
Batch Access¶
data.empty() |
true iff data.numberOfBatches() == 0 |
|
data.numberOfBatches() |
number of batches in the container |
|
data.batch(i) |
(reference to) the i-th batch |
|
data.batches() |
stl-compliant access to batches as a range |
|
Further Methods
| shape of the input vectors
| shape of the label vectors
| shape of the data vectors
swap() | swap container contents (constant time)
makeIndependent() | make sure data is not shared with other containers
shuffle() | randomly reorder elements (not only batches)
append(data) | concatenate containers
LabeledData::inputs() | underlying container of inputs
LabeledData::labels() | underlying container of labels
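Shallow copies share their content, so a write through one copy is visible in the others until makeIndependent() detaches a private copy. The sketch below reproduces that behavior with a `std::shared_ptr`. It only illustrates the semantics. `SharedBatches` and its members (`appendBatch`, `batch`, `sharesWith`, `makeIndependent`) are hypothetical and do not describe Shark's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical container: copies are shallow and share the same batches.
template <class T>
class SharedBatches {
public:
    SharedBatches() : data_(std::make_shared<std::vector<std::vector<T>>>()) {}

    void appendBatch(std::vector<T> batch) { data_->push_back(std::move(batch)); }
    std::vector<T>& batch(std::size_t i) { return (*data_)[i]; }
    std::size_t numberOfBatches() const { return data_->size(); }
    bool sharesWith(SharedBatches const& other) const { return data_ == other.data_; }

    // Deep-copy the batches if anyone else holds them, so later writes
    // through this object no longer affect the other containers.
    void makeIndependent() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<std::vector<T>>>(*data_);
    }

private:
    std::shared_ptr<std::vector<std::vector<T>>> data_;
};
```

Usage: after `SharedBatches<int> b = a;`, writing `b.batch(0)[0] = 9` also changes `a`. After `b.makeIndependent()`, writes through `b` leave `a` untouched.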
Sizes and Dimensions
| number of classes (maximal class label + 1)
| vector of class sizes (number of elements per class)
| dimension of vectors in the data set
| dimension of input vectors in the data set
| dimension of label vectors in the data set
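The number of classes is defined as the maximal class label plus one, so a label that never occurs below the maximum still counts as a class, with size zero. The sketch below computes both quantities from a plain label vector. It assumes labels are non-negative integers starting at 0. `countClasses` and `countClassSizes` are hypothetical names, not Shark functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of classes = maximal class label + 1 (labels assumed to be 0, 1, 2, ...).
inline std::size_t countClasses(std::vector<unsigned int> const& labels) {
    if (labels.empty()) return 0;
    return static_cast<std::size_t>(*std::max_element(labels.begin(), labels.end())) + 1;
}

// One counter per class; classes with no elements keep a count of 0.
inline std::vector<std::size_t> countClassSizes(std::vector<unsigned int> const& labels) {
    std::vector<std::size_t> sizes(countClasses(labels), 0);
    for (unsigned int y : labels) ++sizes[y];
    return sizes;
}
```

With labels {0, 3}, for instance, this reports four classes with sizes {1, 0, 0, 1}.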
Subset Creation and Folds for Cross-validation
| split data into front and back parts (often training and test)
| create an indexed subset from a DataView
| create folds by i.i.d. assignment of elements to folds
| create folds of roughly equal size
| create folds of roughly equal size, stratifying classes
| create folds explicitly by index
| create folds explicitly by index with reordering
| split data at batch boundaries (the inverse of append)
| obtain a subset of batches from indices
| extract a subset of features from Data
| extract a subset of features from LabeledData
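The three automatic fold schemes listed above differ only in how they assign each element a fold index. The sketch below contrasts them. The i.i.d. scheme draws each fold uniformly at random. The equal-size scheme shuffles the elements and deals them round-robin. The stratified scheme does the same but groups elements by class first, so that every class is spread evenly across the folds. These are plausible readings of the table entries, not Shark's implementation, and `iidFolds`, `sameSizeFolds` and `stratifiedFolds` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// i.i.d.: each element independently gets a uniformly random fold; sizes vary.
inline std::vector<std::size_t> iidFolds(std::size_t n, std::size_t k, std::mt19937& rng) {
    assert(k > 0);
    std::uniform_int_distribution<std::size_t> pick(0, k - 1);
    std::vector<std::size_t> fold(n);
    for (auto& f : fold) f = pick(rng);
    return fold;
}

// Equal size: shuffle the indices, then deal them round-robin, so fold
// sizes differ by at most one.
inline std::vector<std::size_t> sameSizeFolds(std::size_t n, std::size_t k, std::mt19937& rng) {
    assert(k > 0);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);
    std::vector<std::size_t> fold(n);
    for (std::size_t r = 0; r < n; ++r) fold[order[r]] = r % k;
    return fold;
}

// Stratified: shuffle, stable-sort by label (random order is kept within each
// class), then deal round-robin; each class's count per fold differs by at most one.
inline std::vector<std::size_t> stratifiedFolds(std::vector<unsigned int> const& labels,
                                                std::size_t k, std::mt19937& rng) {
    assert(k > 0);
    std::vector<std::size_t> order(labels.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return labels[a] < labels[b]; });
    std::vector<std::size_t> fold(labels.size());
    for (std::size_t r = 0; r < order.size(); ++r) fold[order[r]] = r % k;
    return fold;
}
```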
Import / Export
| import from comma-separated values (CSV) file
| export to comma-separated values (CSV) file
| import from sparse vector (libSVM) format
| export to sparse vector (libSVM) format
| import single PGM image
| import set of PGM images
| export single PGM image
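In the sparse libSVM format, each line holds a label followed by `index:value` pairs, with 1-based feature indices; features that are not listed are zero. The sketch below parses one such line and expands it to a dense vector. It only illustrates the file format, not Shark's importer. `SparseRow`, `parseLibSVMLine` and `toDense` are hypothetical, and the caller is assumed to know the feature dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct SparseRow {
    double label = 0.0;
    std::vector<std::pair<std::size_t, double>> entries;  // (1-based index, value)
};

// Parse one line of the form "label idx:val idx:val ...".
inline SparseRow parseLibSVMLine(std::string const& line) {
    std::istringstream in(line);
    SparseRow row;
    if (!(in >> row.label)) throw std::runtime_error("missing label");
    std::string token;
    while (in >> token) {
        auto colon = token.find(':');
        if (colon == std::string::npos) throw std::runtime_error("bad token: " + token);
        row.entries.emplace_back(std::stoul(token.substr(0, colon)),
                                 std::stod(token.substr(colon + 1)));
    }
    return row;
}

// Expand to a dense vector of length dim, mapping 1-based indices to 0-based slots.
inline std::vector<double> toDense(SparseRow const& row, std::size_t dim) {
    std::vector<double> x(dim, 0.0);
    for (auto const& [idx, val] : row.entries) {
        if (idx == 0 || idx > dim) throw std::out_of_range("feature index");
        x[idx - 1] = val;
    }
    return x;
}
```

For example, the line `1 2:0.5 4:-1` yields label 1 and, with dimension 4, the dense vector (0, 0.5, 0, -1).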