bet.calculateP package¶

Submodules¶

bet.calculateP.calculateError module¶

This module provides methods for calculating error estimates of the probability measure for calculate probability measures. See Butler et al. 2015. <http://arxiv.org/pdf/1407.3851>.

cell_connectivity_exact calculates the connectivity of cells.
boundary_sets calculates which cells are on the boundary and strictly interior for contour events.
sampling_error is for calculating error estimates due to sampling.
model_error is for calculating error estimates due to error in solution of QoIs

bet.calculateP.calculateError.boundary_sets(disc, nei_list)¶

Calculates the the neighboring Voronoi cells for each cell.

Parameters

disc (bet.sample.discretization) – An object containing the discretization information.
nei_list (list) – list of lists defining contour events of neighboring cells.

Return type

tuple

Returns

(\(B_N, C_N\)) where B_N are the cells strictly on the interior of a contour event and C_N are the cells on the boundary of a contour eventas defined in Butler et al. 2015. <http://arxiv.org/pdf/1407.3851>

bet.calculateP.calculateError.cell_connectivity_exact(disc)¶

Calculates contour events of the cells and its neighbors.

Parameters: disc (bet.sample.discretization) – An object containing the discretization information.
Return type: list
Returns: list of lists of neighboring cells

class bet.calculateP.calculateError.model_error(disc)¶

Bases: object

A class for calculating the error due to numerical error for a discretization.

calculate_for_contour_events()¶

Calculate the numerical error for each contour event.

Return type: list
Returns: er_list, a list of the error estimates for each contour event.

calculate_for_sample_set_region(s_set, region, emulated_set=None)¶

Calculate the numerical error estimate for a region of the input space defined by a sample set object.

Parameters

s_set (bet.sample.sample_set_base) – sample set for which to calculate error
region (int) – region of s_set for which to calculate error
emulated_set – sample set for volume emulation

Return type

float

Returns

er_est, the numerical error estimate for the region

calculate_for_sample_set_region_mc(s_set, region)¶

Calculate the numerical error estimate for a region of the input space defined by a sample set object, using the MC assumption.

Parameters

s_set (bet.sample.sample_set_base) – sample set for which to calculate error
region (int) – region of s_set for which to calculate error

:rtype float :returns: er_est, the numerical error estimate for the region

disc¶: bet.sample.discretiztion defining the problem

disc_new¶: bet.sample.discretiztion from adding error estimates

num¶: number of inputs and outputs

class bet.calculateP.calculateError.sampling_error(disc, exact=True)¶

Bases: object

A class for calculating the error due to sampling for a discretization.

B_N¶: dictionaries of interior and boundary sets

C_N¶: dictionaries of interior and boundary sets

calculate_for_contour_events()¶

Calculate the sampling error bounds for each contour event. If volumes are already calculated these are used. If not, volume emulation is used if an emulated input set exists. Otherwise the MC assumption is made.

Return type: tuple
Returns: (up_list, low_list) where up_list is a list of the upper bounds for each contour event and low_list is a list of the lower bounds.

calculate_for_sample_set_region(s_set, region, emulated_set=None)¶

Calculate the sampling error bounds for a region of the input space defined by a sample set object which defines an event \(A\).

Parameters

s_set (bet.sample.sample_set_base) – sample set for which to calculate error
region (int) – region of s_set for which to calculate error
emulated_set (bet.sample_set_base) – sample set for volume emulation

Return type

tuple

Returns

(upper_bound, lower_bound) the upper and lower bounds for the error.

disc¶: bet.sample.discretization that defines the problem

num¶: number of inputs and outputs

exception bet.calculateP.calculateError.wrong_argument_type¶

Bases: Exception

Exception for when the argument is not one of the acceptible types.

bet.calculateP.calculateP module¶

This module provides methods for calculating the probability measure \(P_{\Lambda}\).

prob_on_emulated_samples provides a skeleton class and calculates the probability for a set of emulation points.
prob estimates the probability based on pre-defined volumes.
prob_with_emulated estimates the probability using volume emulation.
prob_from_sample_set estimates the probability based on probabilities from another

sample set on the same space.

bet.calculateP.calculateP.prob(discretization, globalize=True)¶

Calculates \(P_{\Lambda}(\mathcal{V}_{\lambda_{samples}})\), the probability associated with a set of cells defined by the model solves at \((\lambda_{samples})\) where the volumes of these cells are provided.

Parameters

discretization (class:bet.sample.discretization) – An object containing the discretization information.
globalize (bool) – Makes local variables global.

bet.calculateP.calculateP.prob_from_discretization_input(disc, set_new)¶

Calculates \(P_{\Lambda}(\mathcal{V}_{\lambda_{samples_new}})\) from \(P_{\Lambda}(\mathcal{V}_{\lambda_{samples_old}})\) where \(\lambda_{samples_old}\) come from an input discretization.

Parameters

disc (discretization) – Discretiztion on which probabilities have already been calculated
set_new (sample_set_base) – Sample set for which probabilities will be calculated.

bet.calculateP.calculateP.prob_from_sample_set(set_old, set_new)¶

Calculates \(P_{\Lambda}(\mathcal{V}_{\lambda_{samples_new}})\) from \(P_{\Lambda}(\mathcal{V}_{\lambda_{samples_old}})\) using the MC assumption with respect to set_old.

Parameters

set_old (sample_set_base) – Sample set on which probabilities have already been calculated
set_new (sample_set_base) – Sample set for which probabilities will be calculated.

bet.calculateP.calculateP.prob_from_sample_set_with_emulated_volumes(set_old, set_new, set_emulate=None)¶

Calculates \(P_{\Lambda}(\mathcal{V}_{\lambda_{samples_new}})\) from \(P_{\Lambda}(\mathcal{V}_{\lambda_{samples_old}})\) using a set of emulated points are distributed with respect to the volume measure.

Parameters

set_old (sample_set_base) – Sample set on which probabilities have already been calculated
set_new (sample_set_base) – Sample set for which probabilities will be calculated.
set_emulate (sample_set_base) – Sample set for volume emulation

bet.calculateP.calculateP.prob_on_emulated_samples(discretization, globalize=True)¶

Calculates \(P_{\Lambda}(\mathcal{V}_{\lambda_{emulate}})\), the probability associated with a set of Voronoi cells defined by num_l_emulate iid samples \((\lambda_{emulate})\). This is added to the emulated input sample set object.

Parameters

discretization (class:bet.sample.discretization) – An object containing the discretization information.
globalize (bool) – Makes local variables global.

bet.calculateP.calculateP.prob_with_emulated_volumes(discretization)¶

Calculates \(P_{\Lambda}(\mathcal{V}_{\lambda_{samples}})\), the probability associated with a set of cells defined by the model solves at \((\lambda_{samples})\) where the volumes are calculated with the given emulated input points.

Parameters

discretization (class:bet.sample.discretization) – An object containing the discretization information.
globalize – Makes local variables global.

bet.calculateP.calculateR module¶

This module contains functions for the density-based approach that utilizes a ratio of observed to predicted densities to update an initial density on the parameter space.

generate_output_kdes() generates KDEs on output sets.
invert_to_kde() solves SIP for weighted KDEs.
invert_to_gmm() solves SIP for a Gaussian Mixture Model.
invert_to_multivariate_gaussian() solves SIP for a multivariate Gaussian.
invert_to_random_variable() solves SIP for random variables.
invert_rejection_sampling() solves SIP with rejection sampling.

bet.calculateP.calculateR.generate_output_kdes(discretization, bw_method=None)¶

Generate Kernel Density Estimates on predicted and observed output sample sets.

Parameters

discretization (bet.sample.discretization) – Discretization used to calculate KDes
bw_method (str) – bandwidth method for scipy.stats.gaussian_kde. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

Returns

prediction set, prediction kdes, observation set, observation kdes, number of clusters

Return type

bet.discretization.sample_set, list, bet.discretization.sample_set, list, int

bet.calculateP.calculateR.invert(discretization, bw_method=None)¶

Solve the data consistent stochastic inverse problem, solving for input sample weights.

Parameters

discretization (bet.sample.discretization) – Discretization on which to perform inversion.
bw_method (str) – bandwidth method for scipy.stats.gaussian_kde. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

Returns

acceptance rate, mean acceptance rate, pointers for samples to clusters

Return type

list, np.ndarray, list

bet.calculateP.calculateR.invert_rejection_sampling(discretization, bw_method=None)¶

Solve the data consistent stochastic inverse problem by rejection sampling.

Parameters

discretization (bet.sample.discretization) – Discretization on which to perform inversion.
bw_method (str) – bandwidth method for scipy.stats.gaussian_kde. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

Returns

sample set containing samples

Return type

bet.sample.sample_set

bet.calculateP.calculateR.invert_to_gmm(discretization, bw_method=None)¶

Solve the data consistent stochastic inverse problem, solving for a Gaussian mixture model.

Parameters

discretization (bet.sample.discretization) – Discretization on which to perform inversion.
bw_method (str) – bandwidth method for scipy.stats.gaussian_kde. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

Returns

means, covariances, and weights for Gaussians

Return type

list, list, list

bet.calculateP.calculateR.invert_to_kde(discretization, bw_method=None)¶

Solve the data consistent stochastic inverse problem, solving for a weighted kernel density estimate.

Parameters

discretization (bet.sample.discretization) – Discretization on which to perform inversion.
bw_method (str) – bandwidth method for scipy.stats.gaussian_kde. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

Returns

marginal probabilities and cluster weights

Return type

list, np.ndarray

bet.calculateP.calculateR.invert_to_multivariate_gaussian(discretization, bw_method=None)¶

Solve the data consistent stochastic inverse problem, solving for a multivariate Gaussian.

Parameters

discretization (bet.sample.discretization) – Discretization on which to perform inversion.
bw_method (str) – bandwidth method for scipy.stats.gaussian_kde. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

Returns

marginal probabilities and cluster weights

Return type

list, np.ndarray

bet.calculateP.calculateR.invert_to_random_variable(discretization, rv, num_reweighted=10000, bw_method=None)¶

Solve the data consistent stochastic inverse problem, fitting a random variable.

rv can take multiple types of formats depending on type of distribution.

A string is used for the same distribution with default parameters in each dimension. ex. rv = ‘uniform’ or rv = ‘beta’

A list or tuple of length 2 is used for the same distribution with fixed user-defined parameters in each dimension as a dictionary. ex. rv = [‘uniform’, {‘floc’:-2, ‘fscale’:5}] or rv = [‘beta’, {‘fa’: 2, ‘fb’:5, ‘floc’:-2, ‘fscale’:5}]

A list of length dim which entries of lists or tuples of length 2 is used for different distributions with fixed user-defined parameters in each dimension as a dictionary. ex. rv = [[‘uniform’, {‘floc’:-2, ‘fscale’:5}],

[‘beta’, {‘fa’: 2, ‘fb’:5, ‘floc’:-2, ‘fscale’:5}]]

Parameters

discretization (bet.sample.discretization) – Discretization on which to perform inversion.
rv (str, list, or tuple) – Type and parameters for continuous random variables.
num_reweighted (int) – number of reweighted samples for fitting
bw_method (str) – bandwidth method for scipy.stats.gaussian_kde. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

Returns

marginal probabilities and cluster weights

Return type

list, np.ndarray

bet.calculateP.simpleFunP module¶

This module provides methods for creating simple function approximations to be used by calculateP. These simple function approximations are returned as bet.sample.sample_set objects.

bet.calculateP.simpleFunP.check_inputs(data_set, Q_ref)¶: Checks inputs to methods.

bet.calculateP.simpleFunP.check_inputs_no_reference(data_set)¶: Checks inputs to methods.

bet.calculateP.simpleFunP.check_type(val, data_set=None)¶: Add support for different data types that can be passed as keyword arguments. Attempt to infer dimension and set it correctly.

bet.calculateP.simpleFunP.infer_Q(data_set)¶: Attempt to infer reference value around which to define a sample set.

bet.calculateP.simpleFunP.normal_partition_normal_distribution(data_set, Q_ref=None, std=1, M=1, num_d_emulate=1000000.0)¶

Creates a simple function approximation of \(\rho_{\mathcal{D},M}\) where \(\rho_{\mathcal{D},M}\) is a multivariate normal probability density centered at Q_ref with standard deviation std using M bins sampled from the given normal distribution.

Parameters

data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
M (int) – Defines number M samples in D used to define \(\rho_{\mathcal{D},M}\) The choice of M is something of an “art” - play around with it and you can get reasonable results with a relatively small number here like 50.
num_d_emulate (int) – Number of samples used to emulate using an MC assumption
Q_ref (ndarray of size (mdim,)) – \(Q(\lambda_{reference})\)
std (ndarray of size (mdim,)) – The standard deviation of each QoI

Return type

voronoi_sample_set

Returns

sample_set object defining simple function approximation

bet.calculateP.simpleFunP.regular_partition_uniform_distribution_rectangle_domain(data_set, rect_domain, cells_per_dimension=1)¶

Creates a simple function appoximation of \(\rho_{\mathcal{D},M}\) where \(\rho{\mathcal{D}, M}\) is a uniform probablity density over the hyperrectangular domain specified by rect_domain.

Since \(\rho_\mathcal{D}\) is a uniform distribution on a hyperrectangle we are able to represent it exactly.

Parameters

data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
rect_domain (numpy.ndarray of shape (2, mdim)) – The domain overwhich \(\rho_\mathcal{D}\) is uniform.
cells_per_dimension (list) – number of cells per dimension

Return type

rectangle_sample_set

Returns

sample_set object defining simple function approximation

bet.calculateP.simpleFunP.regular_partition_uniform_distribution_rectangle_scaled(data_set, Q_ref=None, rect_scale=1, cells_per_dimension=1)¶

Creates a simple function approximation of \(\rho_{\mathcal{D},M}\) where \(\rho_{\mathcal{D},M}\) is a uniform probability density centered at Q_ref (or the reference_value of a sample set. If Q_ref is not given the reference value is used.) with rect_scale of the width of D.

Since rho_D is a uniform distribution on a hyperrectanlge we are able to represent it exactly.

Parameters

data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
rect_scale (double or list) – The scale used to determine the width of the uniform distribution as rect_size = (data_max-data_min)*rect_scale
Q_ref (ndarray of size (mdim,)) – \(Q(\lambda_{reference})\)
cells_per_dimension (list) – number of cells per dimension

Return type

rectangle_sample_set

Returns

sample_set object defining simple function approximation

bet.calculateP.simpleFunP.regular_partition_uniform_distribution_rectangle_size(data_set, Q_ref=None, rect_size=None, cells_per_dimension=1)¶

Creates a simple function approximation of \(\rho_{\mathcal{D},M}\) where \(\rho_{\mathcal{D},M}\) is a uniform probability density centered at Q_ref (or the reference_value of a sample set. If Q_ref is not given the reference value is used) with rect_size of the width of a hyperrectangle.

Since rho_D is a uniform distribution on a hyperrectanlge we can represent it exactly.

Parameters

rect_size (double or list) – The size used to determine the width of the uniform distribution
data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
Q_ref (ndarray of size (mdim,)) – \(Q(\lambda_{reference})\)
cells_per_dimension (list) – number of cells per dimension.

Return type

rectangle_sample_set

Returns

sample_set object defining simple function approximation

bet.calculateP.simpleFunP.uniform_partition_normal_distribution(data_set, Q_ref=None, std=1, M=1, num_d_emulate=1000000.0)¶

Creates a simple function approximation of \(\rho_{\mathcal{D},M}\) where \(\rho_{\mathcal{D},M}\) is a multivariate normal probability density centered at Q_ref with standard deviation std using M bins sampled from a uniform distribution with a size 4 standard deviations in each direction.

Parameters

data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
M (int) – Defines number M samples in D used to define \(\rho_{\mathcal{D},M}\) The choice of M is something of an “art” - play around with it and you can get reasonable results with a relatively small number here like 50.
num_d_emulate (int) – Number of samples used to emulate using an MC assumption
Q_ref (ndarray of size (mdim,)) – \(Q(\lambda_{reference})\)
std (ndarray of size (mdim,)) – The standard deviation of each QoI

Return type

voronoi_sample_set

Returns

sample_set object defininng simple function approximation

bet.calculateP.simpleFunP.uniform_partition_uniform_distribution_data_samples(data_set)¶

Creates a simple function approximation of \(\rho_{\mathcal{D},M}\) where \(\rho_{\mathcal{D},M}\) is a uniform probability density over the entire data_domain. Here the data_domain is the union of Voronoi cells defined by data. In other words we assign each sample the same probability, so M = len(data) or rather len(d_distr_samples) == len(data). The purpose of this method is to approximate uniform distributions over irregularly shaped domains.

Parameters: data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
Return type: voronoi_sample_set
Returns: sample_set object defininng simple function approximation

bet.calculateP.simpleFunP.uniform_partition_uniform_distribution_rectangle_domain(data_set, rect_domain, M=50, num_d_emulate=1000000.0)¶

Creates a simple function approximation of \(\rho_{\mathcal{D}}\) where \(\rho_{\mathcal{D}}\) is a uniform probability density on a generalized rectangle defined by rect_domain. The simple function approximation is then defined by determining M Voronoi cells (i.e., “bins”) partitioning \(\mathcal{D}\). These bins are only implicitly defined by M ``samples in :math:`\mathcal{D}`. Finally, the probabilities of each of these bins is computed by sampling from :math:`\rho{\mathcal{D}}` and using nearest neighbor searches to bin these samples in the ``M implicitly defined bins. The result is the simple function approximation denoted by \(\rho_{\mathcal{D},M}\).

Note that all computations in the measure-based approach that follow from this are for the fixed simple function approximation \(\rho_{\mathcal{D},M}\).

Parameters

M (int) – Defines number M samples in D used to define \(\rho_{\mathcal{D},M}\) The choice of M is something of an “art” - play around with it and you can get reasonable results with a relatively small number here like 50.
rect_domain (double or list) – The support of the density
num_d_emulate (int) – Number of samples used to emulate using an MC assumption
data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
Q_ref (ndarray of size (mdim,)) – \(Q(\)lambda_{reference})`

Return type

voronoi_sample_set

Returns

sample_set object defininng simple function approximation

bet.calculateP.simpleFunP.uniform_partition_uniform_distribution_rectangle_scaled(data_set, Q_ref=None, rect_scale=0.2, M=50, num_d_emulate=1000000.0)¶

Creates a simple function approximation of \(\rho_{\mathcal{D}}\) where \(\rho_{\mathcal{D}}\) is a uniform probability density on a generalized rectangle centered at Q_ref or the reference_value of a sample set. If Q_ref is not given the reference value is used.. The support of this density is defined by rect_scale, which determines the size of the generalized rectangle by scaling the circumscribing generalized rectangle of \(\mathcal{D}\). The simple function approximation is then defined by determining M `` Voronoi cells (i.e., "bins") partitioning :math:`\mathcal{D}`. These bins are only implicitly defined by ``M samples in \(\mathcal{D}\). Finally, the probabilities of each of these bins is computed by sampling from \(\rho{\mathcal{D}}\) and using nearest neighbor searches to bin these samples in the M implicitly defined bins. The result is the simple function approximation denoted by \(\rho_{\mathcal{D},M}\).

Note that all computations in the measure-based approach that follow from this are for the fixed simple function approximation \(\rho_{\mathcal{D},M}\).

Parameters

M (int) – Defines number M samples in D used to define \(\rho_{\mathcal{D},M}\) The choice of M is something of an “art” - play around with it and you can get reasonable results with a relatively small number here like 50.
rect_scale (double or list) – The scale used to determine the support of the uniform distribution as rect_size = (data_max-data_min)*rect_scale
num_d_emulate (int) – Number of samples used to emulate using an MC assumption
data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
Q_ref (ndarray of size (mdim,)) – \(Q(\)lambda_{reference})`

Return type

voronoi_sample_set

Returns

sample_set object defininng simple function approximation

bet.calculateP.simpleFunP.uniform_partition_uniform_distribution_rectangle_size(data_set, Q_ref=None, rect_size=None, M=50, num_d_emulate=1000000.0)¶

Creates a simple function approximation of \(\rho_{\mathcal{D}}\) where \(\rho_{\mathcal{D}}\) is a uniform probability density on a generalized rectangle centered at Q_ref or the reference_value of a sample set. If Q_ref is not given the reference value is used. The support of this density is defined by rect_size, which determines the size of the generalized rectangle. The simple function approximation is then defined by determining M Voronoi cells (i.e., “bins”) partitioning \(\mathcal{D}\). These bins are only implicitly defined by M samples in \(\mathcal{D}\). Finally, the probabilities of each of these bins is computed by sampling from \(\rho{\mathcal{D}}\) and using nearest neighbor searches to bin these samples in the M implicitly defined bins. The result is the simple function approximation denoted by \(\rho_{\mathcal{D},M}\).

Note

data_set is only used to determine dimension.

Note that all computations in the measure-based approach that follow from this are for the fixed simple function approximation \(\rho_{\mathcal{D},M}\).

Parameters

M (int) – Defines number M samples in D used to define \(\rho_{\mathcal{D},M}\) The choice of M is something of an “art” - play around with it and you can get reasonable results with a relatively small number here like 50.
rect_size (double or list) – Determines the size of the support of the uniform distribution on a generalized rectangle
num_d_emulate (int) – Number of samples used to emulate using an MC assumption
data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
Q_ref (ndarray of size (mdim,)) – \(Q(\)lambda_{reference})`

Return type

voronoi_sample_set

Returns

sample_set object defininng simple function approximation

bet.calculateP.simpleFunP.user_partition_user_distribution(data_set, data_partition_set, data_distribution_set)¶

Creates a user defined simple function approximation of a user defined distribution. The simple function discretization is specified in the data_partition_set, and the set of i.i.d. samples from the distribution is specified in the data_distribution_set.

Parameters

data_set (discretization or sample_set or ndarray) – Sample set that the probability measure is defined for.
data_partition_set (discretization or sample_set or ndarray) – Sample set defining the discretization of the data space into Voronoi cells for which a simple function is defined upon.
data_distribution_set (discretization or sample_set or ndarray) – Sample set containing the i.i.d. samples from the distribution on the data space that are binned within the Voronoi cells implicitly defined by the data_discretization_set.

Return type

voronoi_sample_set

Returns

sample_set object defininng simple function approximation

exception bet.calculateP.simpleFunP.wrong_argument_type¶

Bases: Exception

Exception for when the argument for data_set is not one of the acceptible types.

Module contents¶

This subpackage provides classes and methods for calculating the probability measure \(P_{\Lambda}\).

calculateP provides methods for approximating probability densities in the measure-based approach.
simpleFunP provides methods for creating simple function approximations of probability densities for the measure-based approach.
calculateR provides methods for density-based approach.
calculateError provides methods for approximating numerical and sampling errors.