neural_network¶
Most of the neural_network code is taken directly from scikit-learn; the modified and additional functions are documented here.
The _backprop function¶
_backprop(self, X, y, activations, deltas, coef_grads, intercept_grads)¶
Compute the MLP loss function and its corresponding derivatives with respect to each parameter: weight matrices and bias vectors. The derivative of the squared Wasserstein loss with respect to the softmax output activation is implemented here.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data.
y (array-like, shape (n_samples,)) – The target values.
activations (list, length = n_layers - 1) – The ith element of the list holds the values of the ith layer.
deltas (list, length = n_layers - 1) – The ith element of the list holds the difference between the activations of the i + 1 layer and the backpropagated error. More specifically, deltas are the gradients of the loss with respect to z in each layer, where z = wx + b is the value of a layer before it passes through the activation function.
coef_grads (list, length = n_layers - 1) – The ith element contains the amount of change used to update the coefficient parameters of the ith layer in an iteration.
intercept_grads (list, length = n_layers - 1) – The ith element contains the amount of change used to update the intercept parameters of the ith layer in an iteration.
- Returns
loss (float)
coef_grads (list, length = n_layers - 1)
intercept_grads (list, length = n_layers - 1)
Notes
The derivative of the Wasserstein loss with respect to the softmax output activation is implemented here.
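As an illustration only (not this module's actual code), the output-layer delta under such a loss can be sketched as follows. It assumes a 1-D formulation in which each row is a discrete distribution over ordered, equispaced bins and the loss is the sum of squared differences between the cumulative distributions; the helper name wasserstein_softmax_delta and that formulation are assumptions:

    import numpy as np

    def wasserstein_softmax_delta(y_true, y_prob):
        # Difference of the cumulative distributions along the bin axis.
        cdf_diff = np.cumsum(y_prob - y_true, axis=1)
        # dL/dp_j = 2 * sum_{i >= j} cdf_diff_i, i.e. a reversed cumulative sum.
        grad_p = 2.0 * np.cumsum(cdf_diff[:, ::-1], axis=1)[:, ::-1]
        # Chain rule through softmax: dL/dz = p * (g - sum_k p_k * g_k).
        inner = np.sum(y_prob * grad_p, axis=1, keepdims=True)
        return y_prob * (grad_p - inner)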
The _compute_loss_grad function¶
_compute_loss_grad(self, layer, n_samples, activations, deltas, coef_grads, intercept_grads)¶
Compute the gradient of the loss with respect to the coefficients and intercept of the specified layer.
This function performs the backpropagation step for that single layer.
Notes
This code is modified to allow either L1 or L2 regularization.
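A rough sketch of that modification, written as a standalone helper rather than the method itself (the penalty argument and the signature are assumptions), might look like:

    import numpy as np

    def compute_loss_grad(layer, n_samples, activations, deltas,
                          coefs, alpha, penalty="l2"):
        # Data term: gradient of the loss w.r.t. this layer's weights.
        coef_grad = activations[layer].T @ deltas[layer]
        # Penalty term: alpha * sign(W) for L1, alpha * W for L2.
        if penalty == "l1":
            coef_grad += alpha * np.sign(coefs[layer])
        else:
            coef_grad += alpha * coefs[layer]
        coef_grad /= n_samples
        # Bias gradient is the mean delta over the batch.
        intercept_grad = np.mean(deltas[layer], axis=0)
        return coef_grad, intercept_grad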
The softmax function¶
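No docstring is extracted for this function; for reference, a minimal sketch of the usual numerically stable row-wise softmax (subtracting the per-row maximum before exponentiating) is:

    import numpy as np

    def softmax(X):
        # Shift each row by its max so np.exp cannot overflow.
        tmp = X - X.max(axis=1, keepdims=True)
        e = np.exp(tmp)
        # Normalize each row to sum to 1.
        return e / e.sum(axis=1, keepdims=True)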
The kl_div_loss function¶
kl_div_loss(y_true, y_pred)[source]¶
Compute the KL divergence for regression.
- Parameters
y_true (array-like or label indicator matrix) – Ground truth (correct) values.
y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.
- Returns
loss – The KL divergence between y_true and y_pred; lower values indicate better predictions.
- Return type
float
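A minimal sketch of such a loss; the mean-over-samples reduction and the clipping constant are assumptions, not necessarily this module's exact choices:

    import numpy as np

    def kl_div_loss(y_true, y_pred, eps=1e-12):
        # Clip to avoid log(0) and division by zero.
        p = np.clip(np.asarray(y_true, dtype=float), eps, None)
        q = np.clip(np.asarray(y_pred, dtype=float), eps, None)
        # KL(p || q) per sample, averaged over the batch.
        return float(np.mean(np.sum(p * np.log(p / q), axis=1)))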
The wasserstein_loss function¶
wasserstein_loss(y_true, y_pred)[source]¶
Compute the L2 Wasserstein loss.
- Parameters
y_true (array-like or label indicator matrix) – Ground truth (correct) values.
y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.
- Returns
loss – The Wasserstein loss between y_true and y_pred; lower values indicate better predictions.
- Return type
float
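A minimal sketch under the same 1-D assumption used above, where each row is a distribution over ordered, equispaced bins and the loss is the summed squared CDF difference; this simplification is an assumption, not necessarily the module's exact formula:

    import numpy as np

    def wasserstein_loss(y_true, y_pred):
        # Squared difference of cumulative distributions, summed over
        # bins and averaged over samples.
        cdf_diff = np.cumsum(np.asarray(y_pred) - np.asarray(y_true), axis=1)
        return float(np.mean(np.sum(cdf_diff ** 2, axis=1)))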
The mixed function¶
mixed(y_true, y_pred)[source]¶
Compute a mixed Wasserstein and KL-divergence loss.
- Parameters
y_true (array-like or label indicator matrix) – Ground truth (correct) values.
y_pred (array-like or label indicator matrix) – Predicted values, as returned by a regression estimator.
- Returns
loss – The combined Wasserstein and KL-divergence loss; lower values indicate better predictions.
- Return type
float
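Since the docstring does not specify how the two terms are combined, here is a hedged sketch as a simple convex combination, reusing the sketched losses above; the weight parameter and the 50/50 default are hypothetical:

    def mixed(y_true, y_pred, weight=0.5):
        # Convex combination of the two sketched losses above.
        return (weight * wasserstein_loss(y_true, y_pred)
                + (1.0 - weight) * kl_div_loss(y_true, y_pred))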