neural_network

Most of the neural_network code is taken directly from scikit-learn. Some modified and additional functions are provided here.

The _backprop function

_backprop(self, X, y, activations, deltas, coef_grads, intercept_grads)

Compute the MLP loss function and its derivatives with respect to each parameter: the weight matrices and bias vectors. The derivative of the squared Wasserstein loss with respect to the softmax output activation is implemented here.

Parameters
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data.

  • y (array-like, shape (n_samples,)) – The target values.

  • activations (list, length = n_layers - 1) – The ith element of the list holds the values of the ith layer.

  • deltas (list, length = n_layers - 1) – The ith element of the list holds the difference between the activations of the i + 1 layer and the backpropagated error. More specifically, the deltas are the gradients of the loss with respect to z in each layer, where z = wx + b is the value of a particular layer before it passes through the activation function.

  • coef_grads (list, length = n_layers - 1) – The ith element contains the amount of change used to update the coefficient parameters of the ith layer in an iteration.

  • intercept_grads (list, length = n_layers - 1) – The ith element contains the amount of change used to update the intercept parameters of the ith layer in an iteration.

Returns

  • loss (float)

  • coef_grads (list, length = n_layers - 1)

  • intercept_grads (list, length = n_layers - 1)

Notes

The derivative of the Wasserstein loss with respect to the softmax output activation is implemented in this function.
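
The exact derivative code is not reproduced in this documentation. As a rough illustration, a minimal sketch of how the output-layer delta could be formed is given below. It assumes the loss is the squared 1-D Wasserstein distance over ordered, unit-spaced bins, that the output activation is the softmax, and that each row of y_true and y_prob is a probability distribution; the helper name _wasserstein_softmax_delta is hypothetical.

    import numpy as np

    def _wasserstein_softmax_delta(y_true, y_prob):
        # Hypothetical sketch: gradient of the squared 1-D Wasserstein loss
        # with respect to the pre-softmax activations z, where y_prob = softmax(z)
        # and W2^2 = sum_k (CDF_pred[k] - CDF_true[k])**2 over unit-spaced bins.
        cdf_diff = np.cumsum(y_prob - y_true, axis=1)

        # dW/dp_i = 2 * sum_{k >= i} cdf_diff[k] (reverse cumulative sum).
        dW_dp = 2.0 * np.flip(np.cumsum(np.flip(cdf_diff, axis=1), axis=1), axis=1)

        # Chain through the softmax Jacobian:
        # dW/dz_j = p_j * (dW/dp_j - sum_i dW/dp_i * p_i).
        inner = np.sum(dW_dp * y_prob, axis=1, keepdims=True)
        return y_prob * (dW_dp - inner)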

The _compute_loss_grad function

_compute_loss_grad(self, layer, n_samples, activations, deltas, coef_grads, intercept_grads)

Compute the gradient of the loss with respect to the coefficients and intercept for the specified layer.

This function performs backpropagation for a single specified layer.

Notes

This code has been modified to allow either L1 or L2 regularization.
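
The modified code is not shown here; the sketch below illustrates one way the per-layer gradient could incorporate either penalty, following the structure of scikit-learn's original _compute_loss_grad. The self.penalty attribute used to select between 'l1' and 'l2' is an assumption, not necessarily the attribute name used in the actual code.

    import numpy as np
    from sklearn.utils.extmath import safe_sparse_dot

    def _compute_loss_grad(self, layer, n_samples, activations, deltas,
                           coef_grads, intercept_grads):
        # Gradient of the data term for this layer.
        coef_grads[layer] = safe_sparse_dot(activations[layer].T, deltas[layer])

        # Hypothetical switch between penalties; 'self.penalty' is assumed.
        if getattr(self, "penalty", "l2") == "l1":
            # Sub-gradient of the L1 penalty: alpha * sign(W).
            coef_grads[layer] += self.alpha * np.sign(self.coefs_[layer])
        else:
            # Gradient of the L2 penalty: alpha * W.
            coef_grads[layer] += self.alpha * self.coefs_[layer]

        coef_grads[layer] /= n_samples
        intercept_grads[layer] = np.mean(deltas[layer], axis=0)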

The softmax function

softmax(X)

Compute the K-way softmax function in place.

Parameters

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data.

Returns

X_new – The transformed data.

Return type

{array-like, sparse matrix}, shape (n_samples, n_features)
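
For reference, a numerically stable in-place softmax consistent with this description (and with scikit-learn's own helper) looks roughly like the following; it assumes X is a dense float array.

    import numpy as np

    def softmax(X):
        # Subtract the row-wise maximum for numerical stability.
        tmp = X - X.max(axis=1)[:, np.newaxis]
        # Exponentiate and normalize each row, reusing X's memory.
        np.exp(tmp, out=X)
        X /= X.sum(axis=1)[:, np.newaxis]
        return X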

The kl_div_loss function

kl_div_loss(y_true, y_pred)

Compute the KL divergence for regression.

Parameters
  • y_true (array-like or label indicator matrix) – Ground truth (correct) values.

  • y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.

Returns

loss – The degree to which the samples are correctly predicted.

Return type

float
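
The formula is not reproduced here. A plausible sketch, assuming each row of y_true and y_pred is a probability distribution and that a small eps guards against log(0) (the eps argument is an assumption, not part of the documented signature), is:

    import numpy as np

    def kl_div_loss(y_true, y_pred, eps=1e-10):
        # Sketch: mean per-sample KL divergence KL(y_true || y_pred).
        # The eps clipping is an assumed safeguard against log(0).
        y_true = np.clip(np.asarray(y_true, dtype=float), eps, None)
        y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, None)
        return np.mean(np.sum(y_true * np.log(y_true / y_pred), axis=1))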

The wasserstein_loss function

wasserstein_loss(y_true, y_pred)

Compute the L2 Wasserstein loss.

Parameters
  • y_true (array-like or label indicator matrix) – Ground truth (correct) values.

  • y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.

Returns

loss – The degree to which the samples are correctly predicted.

Return type

float
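
A minimal sketch of a squared 1-D (L2) Wasserstein loss, assuming each row is a probability distribution over ordered, unit-spaced bins so that the distance reduces to a sum of squared CDF differences, is:

    import numpy as np

    def wasserstein_loss(y_true, y_pred):
        # Sketch: squared 1-D Wasserstein distance per sample, averaged.
        # Assumes rows sum to 1 over ordered, unit-spaced bins.
        cdf_diff = np.cumsum(np.asarray(y_pred, dtype=float)
                             - np.asarray(y_true, dtype=float), axis=1)
        return np.mean(np.sum(cdf_diff ** 2, axis=1))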

The mixed function

mixed(y_true, y_pred)

Compute a mixed Wasserstein and KL-divergence loss.

Parameters
  • y_true (array-like or label indicator matrix) – Ground truth (correct) values.

  • y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.

Returns

loss – The degree to which the samples are correctly predicted.

Return type

float
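
The relative weighting of the two terms is not documented; a sketch using an equal-weight sum of the two losses above would be:

    def mixed(y_true, y_pred):
        # Sketch: equal-weight combination of the Wasserstein and KL losses
        # defined above. The actual weighting used in the code is unknown.
        return wasserstein_loss(y_true, y_pred) + kl_div_loss(y_true, y_pred)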