neural_network

Most of the neural_network code is taken directly from scikit-learn. Some modified and additional functions are provided here.

The _backprop function

_backprop(self, X, y, activations, deltas, coef_grads, intercept_grads)

Compute the MLP loss function and its derivatives with respect to each parameter: the weight matrices and bias vectors. The derivative of the squared Wasserstein loss with respect to the softmax output activation is implemented here.

Parameters
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data.

  • y (array-like, shape (n_samples,)) – The target values.

  • activations (list, length = n_layers - 1) – The ith element of the list holds the values of the ith layer.

  • deltas (list, length = n_layers - 1) – The ith element of the list holds the difference between the activations of the i + 1 layer and the backpropagated error. More specifically, the deltas are the gradients of the loss with respect to z in each layer, where z = wx + b is the value of a particular layer before it passes through the activation function.

  • coef_grads (list, length = n_layers - 1) – The ith element contains the amount of change used to update the coefficient parameters of the ith layer in an iteration.

  • intercept_grads (list, length = n_layers - 1) – The ith element contains the amount of change used to update the intercept parameters of the ith layer in an iteration.

Returns

  • loss (float)

  • coef_grads (list, length = n_layers - 1)

  • intercept_grads (list, length = n_layers - 1)

Notes

The derivative of the Wasserstein loss with respect to the softmax output activation is implemented in this function.
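
The exact derivative code is not reproduced in this documentation. As a rough illustration, a minimal sketch of how the output-layer delta could be formed is given below. It assumes the loss is the squared 1-D Wasserstein distance over ordered, unit-spaced bins, that the output activation is the softmax, and that each row of y_true and y_prob is a probability distribution; the helper name _wasserstein_softmax_delta is hypothetical.

    import numpy as np

    def _wasserstein_softmax_delta(y_true, y_prob):
        # Hypothetical sketch: gradient of the squared 1-D Wasserstein loss
        # with respect to the pre-softmax activations z, where y_prob = softmax(z)
        # and W2^2 = sum_k (CDF_pred[k] - CDF_true[k])**2 over unit-spaced bins.
        cdf_diff = np.cumsum(y_prob - y_true, axis=1)

        # dW/dp_i = 2 * sum_{k >= i} cdf_diff[k] (reverse cumulative sum).
        dW_dp = 2.0 * np.flip(np.cumsum(np.flip(cdf_diff, axis=1), axis=1), axis=1)

        # Chain through the softmax Jacobian:
        # dW/dz_j = p_j * (dW/dp_j - sum_i dW/dp_i * p_i).
        inner = np.sum(dW_dp * y_prob, axis=1, keepdims=True)
        return y_prob * (dW_dp - inner)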

The _compute_loss_grad function

_compute_loss_grad(self, layer, n_samples, activations, deltas, coef_grads, intercept_grads)

Compute the gradient of the loss with respect to the coefficients and intercept for the specified layer.

This function performs backpropagation for a single specified layer.

Notes

This code has been modified to allow either L1 or L2 regularization.
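
The modified code is not shown here; the sketch below illustrates one way the per-layer gradient could incorporate either penalty, following the structure of scikit-learn's original _compute_loss_grad. The self.penalty attribute used to select between 'l1' and 'l2' is an assumption, not necessarily the attribute name used in the actual code.

    import numpy as np
    from sklearn.utils.extmath import safe_sparse_dot

    def _compute_loss_grad(self, layer, n_samples, activations, deltas,
                           coef_grads, intercept_grads):
        # Gradient of the data term for this layer.
        coef_grads[layer] = safe_sparse_dot(activations[layer].T, deltas[layer])

        # Hypothetical switch between penalties; 'self.penalty' is assumed.
        if getattr(self, "penalty", "l2") == "l1":
            # Sub-gradient of the L1 penalty: alpha * sign(W).
            coef_grads[layer] += self.alpha * np.sign(self.coefs_[layer])
        else:
            # Gradient of the L2 penalty: alpha * W.
            coef_grads[layer] += self.alpha * self.coefs_[layer]

        coef_grads[layer] /= n_samples
        intercept_grads[layer] = np.mean(deltas[layer], axis=0)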

The softmax function

softmax(X)

Compute the K-way softmax function in place.

Parameters

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data.

Returns

X_new – The transformed data.

Return type

{array-like, sparse matrix}, shape (n_samples, n_features)
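
For reference, a numerically stable in-place softmax consistent with this description (and with scikit-learn's own helper) looks roughly like the following; it assumes X is a dense float array.

    import numpy as np

    def softmax(X):
        # Subtract the row-wise maximum for numerical stability.
        tmp = X - X.max(axis=1)[:, np.newaxis]
        # Exponentiate and normalize each row, reusing X's memory.
        np.exp(tmp, out=X)
        X /= X.sum(axis=1)[:, np.newaxis]
        return X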

The kl_div_loss function

kl_div_loss(y_true, y_pred)

Compute the KL divergence for regression.

Parameters
  • y_true (array-like or label indicator matrix) – Ground truth (correct) values.

  • y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.

Returns

loss – The degree to which the samples are correctly predicted.

Return type

float
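
The formula is not reproduced here. A plausible sketch, assuming each row of y_true and y_pred is a probability distribution and that a small eps guards against log(0) (the eps argument is an assumption, not part of the documented signature), is:

    import numpy as np

    def kl_div_loss(y_true, y_pred, eps=1e-10):
        # Sketch: mean per-sample KL divergence KL(y_true || y_pred).
        # The eps clipping is an assumed safeguard against log(0).
        y_true = np.clip(np.asarray(y_true, dtype=float), eps, None)
        y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, None)
        return np.mean(np.sum(y_true * np.log(y_true / y_pred), axis=1))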

The wasserstein_loss function

wasserstein_loss(y_true, y_pred)

Compute the L2 Wasserstein loss.

Parameters
  • y_true (array-like or label indicator matrix) – Ground truth (correct) values.

  • y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.

Returns

loss – The degree to which the samples are correctly predicted.

Return type

float
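
A minimal sketch of a squared 1-D (L2) Wasserstein loss, assuming each row is a probability distribution over ordered, unit-spaced bins so that the distance reduces to a sum of squared CDF differences, is:

    import numpy as np

    def wasserstein_loss(y_true, y_pred):
        # Sketch: squared 1-D Wasserstein distance per sample, averaged.
        # Assumes rows sum to 1 over ordered, unit-spaced bins.
        cdf_diff = np.cumsum(np.asarray(y_pred, dtype=float)
                             - np.asarray(y_true, dtype=float), axis=1)
        return np.mean(np.sum(cdf_diff ** 2, axis=1))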

The mixed function

mixed(y_true, y_pred)

Compute a mixed Wasserstein and KL-divergence loss.

Parameters
  • y_true (array-like or label indicator matrix) – Ground truth (correct) values.

  • y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.

Returns

loss – The degree to which the samples are correctly predicted.

Return type

float
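
The relative weighting of the two terms is not documented; a sketch using an equal-weight sum of the two losses above would be:

    def mixed(y_true, y_pred):
        # Sketch: equal-weight combination of the Wasserstein and KL losses
        # defined above. The actual weighting used in the code is unknown.
        return wasserstein_loss(y_true, y_pred) + kl_div_loss(y_true, y_pred)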