On the generation of direction numbers for Sobol sequences and their application to Quasi Monte Carlo methods

Ilya Manyakin, Andres Oliva Denis, Gregory Kell, Adrien Papaioannou
September 5, 2020

Monte Carlo (MC) methods lie at the heart of many numeric computing tasks, ranging from modelling how high velocity ions impinge on an atomic lattice to the pricing of financial instruments. While being generic and scaling independently of problem dimension, practical applications of Monte Carlo suffer from the slow error convergence rate of $O(N^{-1/2})$ as the number of simulations, $N$, increases. Thus, achieving high precision estimates with Monte Carlo often requires large numbers of simulations. Quasi-Monte Carlo methods attempt to accelerate this by replacing the pseudo-random sequences in MC with deterministic `low discrepancy` sequences that have greater uniformity and produce error convergence rates of order $O(N^{-1})$ to $O(N^{-3/2})$ on suitable problem types. While the asymptotically faster convergence rates of QMC methods may at first be appealing, use of QMC carries with it a set of preconditions that must be met in order to achieve such results in practice. Moreover, a naive implementation that merely replaces the pseudo-random numbers with low discrepancy sequences may in fact produce worse results than MC. In this report, we provide an introduction to applying Quasi-Monte Carlo to numeric integration and address the issues of low dimensional projections of Sobol sequences and sequence randomization. We present the tkrg-a-ap5 set of Sobol' sequence direction numbers satisfying uniformity property A for 195,000 dimensions and A' for every 5 adjacent dimensions. Additionally, we have assembled a comprehensive set of benchmark test functions from the literature that we hope will provide a useful reference for evaluating low discrepancy sequences.

Keywords Monte Carlo · Quasi-random number · Sobol sequences

1 Introduction

Monte Carlo methods are a class of algorithms that utilize random numbers for solving problems in numerical computing such as simulation of complex systems, optimization or integration. In the context of finance, Monte Carlo simulations are often utilized for pricing financial instruments and scenario analysis, whereby Monte Carlo is used to generate possible realizations of a model by varying the input model parameters drawn from predefined distributions. In scenario analysis, Monte Carlo is often used to generate possible future trajectories for quantities of interest. After generating multiple scenarios, one can combine them to compute aggregated metrics such as (but not limited to) means and variances at different time points, from which risk estimates can be calculated.
Both the task of generating scenarios and computing aggregated metrics can be recast as solutions to integration problems, whereby we attempt to evaluate an integrand $f(x)$ over an integration domain $\Omega$, with $x \in \Omega$ representing the model parameters, $f(\cdot)$ representing the model and $I$ representing the result of the integration:
I = \int_\Omega f(x) dx
While analytic solutions to some integrals exist and can be derived by hand, this endeavour becomes impractical for even moderate increases in integral dimensionality or integrand complexity. In such cases, users may resort to symbolic mathematics software to derive analytic solutions, or to numeric approximations. Numeric techniques for evaluating integrals are termed quadrature methods - these attempt to compute a numeric estimate $\hat{I}$ from a set of evaluations of the integrand at different locations in the integration domain $\Omega$. The set of integrand evaluations $f(x_i)$ is then combined in a weighted sum, with weights determined by the quadrature rule, to give the resulting estimate:
\hat{I} = \sum_i \omega_i f(x_i)
The points $x_i \in \Omega$ at which the integrand $f(x)$ is evaluated are termed abscissa, while the weights of each evaluation's contribution to the final result are denoted $\omega_i$. The specification of the abscissa $x_i$ and weights $\omega_i$ acts to define a given quadrature method. One of the most well known low dimensional examples of a quadrature method is the trapezium rule, which approximates a definite integral:
\begin{align} I &= \int_a^b f(x) dx \\ \hat{I} &= \frac{b-a}{N} \sum_{i=1}^{N}\frac{f(x_{i-1})+f(x_i)}{2}\end{align}
For uniformly spaced abscissa $x_i = a + \frac{b-a}{N} i$, the leading-order behaviour of the error $I - \hat{I}$ as we increase the number of abscissa $N$ is given by:
I - \hat{I} = -\frac{(b-a)^2}{12N^2}\left[f'(b) - f'(a)\right] + O(N^{-3})
The estimate error in this case thus scales as $O(N^{-2})$ [12] with the number of points and depends also on the gradient $f'(x)$ of the function. Generalizing to $d$ dimensions, the above rate scales as $O(N^{-2/d})$ - slowing down and thus requiring more integrand evaluations to achieve the same level of accuracy. Combined with the exponentially increasing number of abscissa in higher dimensions, this limits the applicability of this technique to high dimensional integrals. To understand why, consider using only 10 abscissa per dimension. In five dimensions ($d=5$), this would require $10^5$ integrand evaluations, while in $d=20$ dimensions this becomes $10^{20}$ - an impractical number.
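To make the one-dimensional case concrete, the following minimal sketch (assuming numpy; the exponential integrand is an arbitrary smooth example) applies the composite trapezium rule and prints the estimate error, which should shrink roughly as $N^{-2}$:

```python
import numpy as np

def trapezium(f, a, b, n):
    """Composite trapezium rule with n uniform sub-intervals."""
    x = np.linspace(a, b, n + 1)   # n+1 uniformly spaced abscissa
    y = f(x)
    return (b - a) / n * ((y[:-1] + y[1:]) / 2).sum()

# Example: integral of exp(x) over [0, 1]; the exact value is e - 1.
exact = np.e - 1
for n in (10, 100, 1000):
    err = abs(trapezium(np.exp, 0.0, 1.0, n) - exact)
    print(f"n={n:5d}  error={err:.2e}")  # error falls ~100x per 10x more points
```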
This motivates the development of stochastic quadrature techniques where, instead of using deterministic abscissa, the points at which the integrand is evaluated are selected at random. Counter-intuitively, such a random selection of points results in a generic procedure allowing a much broader class of integrals to be evaluated, with the precision of the estimate determined only by the number of points, scaling as $O(N^{-1/2})$ independently of the dimensionality. Furthermore, only some minor smoothness assumptions are placed on the integrand.

Monte Carlo methods

Monte Carlo integration is a stochastic quadrature technique utilizing random numbers to perform the integration. Similarly to the deterministic quadratures above, the Monte Carlo method estimates a $d$-dimensional integral. A key feature of MC is that the $N$ abscissa are selected at random throughout the domain and each evaluation of the integrand is given an equal weighting, $\forall i \; \omega_i = 1/N$:
\hat{I} = V \left( \frac{1}{N} \sum_{i=1}^{N} f(x_i) \right) = V \langle f \rangle
Here the `volume` of the integration domain is represented by $V = \int_\Omega dx^n$ while $\langle f \rangle$ represents the average of all integrand evaluations. The choice of abscissa to be independent and identically distributed (IID) across the domain allows us to reframe the problem as an evaluation of the expectation:
I = \mathbb{E}[f(x)] = \int_\Omega f(x) u(x) dx
Where $u(x)$ denotes the uniform density on the domain. Imposing an additional assumption of a bounded integrand with finite variance $Var[f(x)] = \sigma^2 < \infty$ guarantees that the MC estimator $\hat{I}$ converges to the true value $I$ as the number of abscissa tends to infinity ($N\rightarrow \infty$), a result known as the Law of Large Numbers (LLN):
\mathbb{P}\left(\left| \int_\Omega f(x) dx - \frac{1}{N}\sum_i f(x_i) \right| \geq \epsilon \right) \leq \frac{\sigma^2}{N \epsilon^2}
This is the key result that enables one to be confident in the results produced by MC integration, providing both an estimate of the value as well as a bound on the error associated with the estimate - obtained by computing the variance $Var[f(x)]$. Finally, applying the Central Limit Theorem (CLT), the standard error of the MC estimate can be seen to scale as $O(N^{-1/2})$. This $O(N^{-1/2})$ scaling of estimator error, independently of dimension, is a key benefit of Monte Carlo. Additionally, unlike in a deterministic quadrature rule, the number of points $N$ need not be known beforehand - we can continually improve our estimate by simply evaluating more points without re-discretizing the grid (of abscissa) from scratch - giving the MC algorithm the any-time property. Through using this technique we rely not on the unpredictability of the random numbers but on the uniformity of their distribution within the integration volume and the rate at which we can generate the random numbers. For the purposes of performing this calculation, we therefore do not need true random number generators - instead we can use pseudo-random number generators (PRNG). Such generators mimic the properties of true RNGs but are fully deterministic algorithms, where an initial seed (number) controls the resulting sequence, indistinguishable from truly random sequences for our application. A widely used instance of a PRNG is the Mersenne-Twister algorithm [13], which is implemented in many software libraries used for numeric computing.
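A minimal sketch of the MC estimator and its CLT standard error, assuming numpy (whose default generator is PCG64-based); the product integrand is an arbitrary example whose exact integral over the unit hypercube is 1 in any dimension:

```python
import numpy as np

def mc_integrate(f, d, n, rng):
    """Monte Carlo estimate of the integral of f over [0,1)^d,
    together with the CLT standard error sigma / sqrt(n)."""
    x = rng.random((n, d))              # n IID uniform abscissa, weight 1/n each
    y = f(x)
    return y.mean(), y.std(ddof=1) / np.sqrt(n)

rng = np.random.default_rng(42)
f = lambda x: 2.0 ** x.shape[1] * np.prod(x, axis=1)  # exact integral is 1
estimate, stderr = mc_integrate(f, 5, 100_000, rng)
print(f"estimate = {estimate:.4f} +/- {stderr:.4f}")
```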
While such generators are a workhorse for modern numeric computing, one may ask if it is possible to attain faster convergence using sequences with improved uniformity properties, without attempting to emulate random numbers at all? The answer to this is yes! From the early 1960s, such methods, termed Quasi-Monte Carlo, have been developed and applied for exactly this purpose - accelerating convergence of numeric integration. In this setting, the PRNG is replaced with so-called low discrepancy sequence generators, designed to produce equidistributed number sequences. This allows for both deterministic error convergence analysis and improved rates, of order $O(\log(N)^d/N)$ [11]. We discuss these sequences next.

2 Quasi-Random Numbers

fig1/
Figure 1: These plots show an example of a two-dimensional distribution of $n=64$ points on a unit square, divided into an $8\times8$ grid. The `Grid` and `LDS` schemes can be seen to stratify the unit square, with each of the sub-regions containing a single point; the `RNG` points on the other hand exhibit greater non-uniformity and show clustering of points, with some regions containing more than one point, while others contain none. The histograms ($32$ bins) below each plot demonstrate how the given 2-d sequence projects onto a lower-dimensional space, with the `Grid` scheme having multiple points project onto a single $x$-coordinate. Comparatively, the `LDS` sequence points do not, thus providing a greater `effective` point count.

An intuitive explanation for why Monte-Carlo works for integration is that in the limit as the number of points $N\rightarrow \infty$, the point set fills the entire domain $\Omega$ uniformly and averages the integrand together to get an estimate of the integral. A key requirement for this to work is that the points fill the space uniformly. Formally, this criterion is defined using a measure called discrepancy (denoted $D^*_N$), which in turn can be used to define uniformity.

Definition 1 Uniformly distributed sequence
A sequence $x_1,x_2,...,x_N$ is uniformly distributed if $D^*_N(x_1,...,x_N) \rightarrow 0$ as $N \rightarrow \infty$

To simplify discussion, we will consider the integration domain to be a $d$-dimensional unit hypercube $\Omega = [0,1)^d$, with a lower bound including the origin and the upper bound of unity. Before defining the star-discrepancy $D^*_N$ used above, we, following Owen [23], consider local discrepancy:
\delta ( a; x_1, ..., x_N ) = \frac{1}{N} \sum_{i=1}^{N} 1_{x_i \in [0,a)} - \prod_{j=1}^d a_j
The local discrepancy is the difference between the fraction of points from the sequence $x_i$ that fall into the interval $[0,a)$ (the term $\frac{1}{N} \sum_{i=1}^{N} 1_{x_i \in [0,a)}$) and the volume occupied by this interval (the term $\prod_{j=1}^d a_j$) relative to the entire domain $\Omega = [0,1)^d$. In this way, the local discrepancy measures the deviation of the observed fraction of points in an interval from the volume occupied by that interval. The star-discrepancy $D^*_N$ is then the maximum absolute local discrepancy over all possible intervals $[0,a)$ in the unit hypercube:

Definition 2 Star-Discrepancy
D^*_N(x_1,...,x_N) = \sup_{a \in [0,1)^d} |\delta ( a; x_1, ..., x_N )|

While in practice there are a multitude of other discrepancy measures, we will not discuss them here, noting only that for practical sequences their exact computation is often an NP-hard problem [14], [15] - something hinted at by the definition of the star-discrepancy, which requires the evaluation of all possible sub-volumes of the unit cube. Practical techniques thus attempt to provide bounds on the discrepancies instead of computing them exactly.
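As an illustration, the brute-force sketch below (assuming numpy) evaluates the local discrepancy and lower-bounds $D^*_N$ by maximising $|\delta|$ over randomly sampled anchors $a$; the supremum over all boxes is exactly the part that is computationally hard:

```python
import numpy as np

def local_discrepancy(a, points):
    """delta(a; x_1..x_N): fraction of points inside [0, a) minus the box volume."""
    inside = np.all(points < a, axis=1).mean()
    return inside - np.prod(a)

def star_discrepancy_lower_bound(points, rng, trials=10_000):
    """Crude lower bound on D*_N obtained by sampling candidate boxes at random."""
    d = points.shape[1]
    return max(abs(local_discrepancy(rng.random(d), points)) for _ in range(trials))

rng = np.random.default_rng(0)
points = rng.random((256, 2))           # pseudo-random point set for comparison
print(star_discrepancy_lower_bound(points, rng))
```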

Koksma-Hlawka inequality


The key reason for defining the discrepancy measures is their application to deriving theoretical bounds on the convergence of the error. A central result of this analysis is the Koksma-Hlawka (KH) inequality, relating a sequence's discrepancy to the approximation error produced when using the sequence for numeric integration. This acts as the analogue of the central limit theorem (CLT) for Monte-Carlo, but has the additional constraint that the function being integrated is required to be Riemann integrable. Unlike the CLT, the bound is deterministic (not probabilistic) - it always holds and applies for finite $N$ (unlike the CLT, which holds in the limit as $N\rightarrow \infty$). The inequality relates the error of approximating the integral $\int_{[0,1)^d}f(x)dx$ with a quadrature approximation $\frac{1}{N}\sum_{i=1}^{N}f(x_i)$, and gives a bound on the absolute error as the product of $V(f)$, describing the variation of the integrand $f(x)$, and the discrepancy $D^*(x_1,...,x_N)$ of the sequence of numbers $x_1,...,x_N$ at which we evaluate the integrand.
\epsilon = \left|\frac{1}{N} \sum_{i=1}^{N} f(x_i) - \int_{[0,1)^d}f(x) dx \right| \leq V(f)D^*(x_1,...,x_N)
In practice, it is often overly pessimistic and difficult to apply due to the computational complexity of estimating the Hardy-Krause (HK) variation $V(f)$ of the function. This is often more computationally demanding than the computation of the integral itself, requiring the calculation of partial derivatives of $f$. In the one-dimensional case, this reduces to:
V(f) = \int_\Omega |f'(x)|dx
We rarely compute this bound in practice, but it does provide the key connection between the error convergence and the discrepancy $D^*_N$ of the point set used to evaluate the integral. Thus, one can hope to do better than the MC estimate by using low discrepancy sequences instead. A sequence is considered to be low discrepancy if it satisfies the following:

Definition 3 Low discrepancy sequence
A sequence $x_1,x_2,...$ of points in $[0,1)^d$ is a low discrepancy sequence if for any $N>1$
D^*(x_1,...,x_N) \leq C_d \frac{\log(N)^d}{N}
where $C_d$ is a constant depending only on the dimension

Combining this definition with the KH inequality, under the assumption that the function's variation $V(f)$ is finite (and $f(x)$ is Riemann integrable), we obtain the upper bound of the error convergence for a given number of function evaluations $N$ with the integral of dimension $d$:
\epsilon \sim O\left(\frac{\log(N)^d}{N} \right) \sim O(N^{-1})
This compares favourably with the MC bound of $\epsilon \sim O(N^{-1/2})$ and scales with dimension $d$ better than the quadrature bound of $O(N^{-2/d})$ for high dimensions. However, it relies on the central assumption of finite $V(f)$: an assumption that can break when we consider transforming the unit hypercube $[0,1)^d$. An example of such a transformation is the inverse Gaussian cumulative distribution function, which maps the limits of the interval $(0,1) \rightarrow (-\infty, \infty)$. This mapping produces a function with infinite Hardy-Krause variation, $V(f) = \infty$. While of concern in theory, practical applications of low discrepancy sequences still demonstrate good performance. Finally, practical applications should consider that the asymptotic convergence rates may kick in only for sufficiently large sample counts. An example of this is reported by Owen [23], who derived a bound $n \geq b^{2s}$ for an $s$-dimensional sequence in base $b$.

Low discrepancy sequences

There are many techniques for generating low discrepancy sequences. A common theme among them is partitioning the unit interval and generating suitably stratified points by utilizing an $n$ digit (number of positions after the decimal point) base $b$ representation of each number $x_i$ in the sequence:
x_i = \sum_{j=1}^n v_{ij} b^{-j} = (0.v_{i1}v_{i2}v_{i3}...)_b
The process of generating the sequence then reduces to generating a set of coefficients $v_{ij}$ for each $x_i$ in the sequence $x_1,x_2,...$ such that the resulting sequence has the desired low discrepancy property. Given a base $b$, representing each value in the sequence by its set of coefficients $v_{i1}v_{i2}v_{i3}...$ allows us to work using integer arithmetic instead of a floating point format.
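For instance, a short sketch of moving between the digit representation and the value (here in base 2):

```python
def digits_to_value(digits, base=2):
    """(0.v1 v2 v3 ...)_b  ->  float in [0, 1)."""
    return sum(v * base ** -(j + 1) for j, v in enumerate(digits))

def digits_to_integer(digits, base=2):
    """The same digit string read as an integer, enabling exact integer arithmetic."""
    x = 0
    for v in digits:
        x = x * base + v
    return x

print(digits_to_value([1, 0, 1]))    # 0.625 = (0.101)_2
print(digits_to_integer([1, 0, 1]))  # 5     = (101)_2
```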

Sobol sequences

Sobol sequences are a base 2 low discrepancy sequence, making them comparatively faster to generate than sequences in another base, with the generation rate being comparable to that of PRNGs. To generate a Sobol sequence, we need a primitive polynomial and a set of direction numbers; these define and initialize a recurrence relation that, given the current point $x_i$, generates the next point $x_{i+1}$ for every $x_i$ in the sequence. The first of these, the primitive polynomial, has the form given by:
p(x) = x^s + a_1 x^{s - 1} + a_2 x^{s - 2} + \cdots + a_{s - 1} x + 1
Here, $s$ denotes the degree of the polynomial, while the coefficients $a_k$ can take values from the field $GF(2) = \{0,1\}$. By treating the coefficients of the polynomials as binary digits, we view each polynomial as a binary number, defining operations of $(+,\times)$ (and their inverses) on the set of all such polynomials, forming a field. The polynomial is then considered primitive if it is minimal and cannot be factored into a product of any two other polynomials, i.e. it is irreducible. Given a primitive polynomial, we can define a recurrence relation that generates each of the digits of the representation $x_i = (0.v_{i1}v_{i2}v_{i3}...)_b$ [6]:
v_{i,j} = \left( \bigoplus_{k=1}^{s-1} a_k v_{i-k,j} \right) \oplus v_{i-s,j} \oplus \left[ v_{i-s,j}/2^s \right] \text{ for } i > s
This recurrence allows us to generate the next term for the given digit, and thereby construct the sequence $x_i$. However, to initialize this recurrence relation, we need the first $v_{1,j}, \dots, v_{s,j}$ values; these are the direction numbers. Owing to the parallels between Sobol sequences and linear congruential random number generators, we can view these direction numbers as analogous to random seeds. Nonetheless, unlike seeds, we must be careful with the choice of direction numbers, as these can drastically impact the quality of the sequences, particularly for high dimensional sequences, where we need one set of direction numbers for every dimension. Thus, two central questions to the practical application of Sobol sequences (a minimal generator sketch follows these questions) are:

  • How do we find good direction numbers?

  • Given a set of direction numbers, how do we quantify what we mean by good?
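Before turning to these questions, the sketch below generates a single dimension of a Sobol sequence using the common Gray-code ordering; the degree-3 primitive polynomial $x^3 + x + 1$ and initial values $m = (1, 3, 7)$ are illustrative choices only (production code would read direction numbers from a table such as new-joe-kuo-6.21201):

```python
def sobol_dimension(n, s, a, m, bits=32):
    """First n points of one Sobol dimension.

    s : degree of the primitive polynomial
    a : [a_1 .. a_{s-1}] polynomial coefficients in GF(2)
    m : [m_1 .. m_s] initial odd integers encoding the direction numbers
    """
    m = list(m)
    for i in range(s, bits):                 # extend m_i via the recurrence
        new = m[i - s] ^ (m[i - s] << s)     # the v_{i-s} and v_{i-s}/2^s terms
        for k in range(1, s):
            new ^= a[k - 1] * (m[i - k] << k)
        m.append(new)
    v = [m[j] << (bits - j - 1) for j in range(bits)]  # v_j = m_j / 2^j as integers
    points, x = [], 0
    for i in range(1, n + 1):
        c = (i & -i).bit_length() - 1        # index of the lowest set bit of i
        x ^= v[c]                            # Gray-code update: one XOR per point
        points.append(x / 2.0**bits)
    return points

print(sobol_dimension(8, 3, [0, 1], [1, 3, 7]))  # 0.5, 0.25, 0.75, ...
```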

(t,m,s)-nets

As Sobol sequences consist of $N$ generated points $x_1,\dots, x_N$, they can be described as point sets $P_N = \{x_1,\dots,x_N\}$ [20], [19]. When the base of the set is chosen to be $b$ and $N = b^m$, each dimension $j$ can be partitioned into $b^{q_j}$ equal parts for some $q_j>0$. In other words, the unit hypercube $[0,1)^s$ is partitioned into $b^{q_1+\dots+q_s}$ rectangles of equal size. If each rectangle contains exactly the same number of points from $P_N$, then $P_N$ is $(q_1, q_2,\dots,q_s)$-equidistributed in base $b$. Ultimately, if the equidistribution property holds for larger values of $q_j$, then the point set can be seen as more uniformly distributed. In order to explore the uniformity properties of $P_N$ further, a $(t,m,s)$-net is defined as follows:

Definition 4 (t,m,s)-net
$P_N$ is a $(t,m,s)$-net if and only if it is $(q_1, q_2,\dots,q_s)$-equidistributed whenever $q_1+\dots+q_s=m-t$.

The $t$-value of $P_N$ is thus the smallest value of $t$ for which this equidistribution condition holds. Given that a smaller $t$-value for a given $m$ permits a larger $q_1+\dots+q_s$, the $t$-value can be seen as a uniformity measure of a point set, where a lower $t$-value indicates greater uniformity.

(t,s)-sequences

In addition to being $(t,m,s)$-nets, Sobol sequences can be referred to as $(t,s)$-sequences: sequences of points that possess certain stratification properties, which describe how the points in the sequence fill sub-regions of a unit hypercube. More specifically, a $(t,s)$-sequence is defined as:

Definition 5 (t,s)-sequence
A $(t,s)$-sequence constructed in base $b$ can be considered an infinite series of $(t,m,s)$-nets for every positive integer $m$, each net being a set of $b^m$ points, with each elementary interval of volume $1/b^m$ containing at most $b^t$ points.

While being a key feature of Sobol and other low discrepancy sequences, such stratification properties are not unique to them. To illustrate, consider Figure [1], which divides the unit square into an $8\times 8$ grid. The left-most plot shows that deterministic quadratures like the trapezium rule discussed above also stratify the unit square. Comparatively, the lack of stratification of (pseudo)random sequences can be seen on the right, where some of the boxes have an excess number of points while others are empty. Another feature of the plot is the projections of the sequences onto one dimension, shown as histograms at the bottom of each figure. Here, we observe the benefit of the `LDS` sequence relative to the `Grid` scheme, where all points in a given axis project onto the same coordinate in one dimension, thus reducing the `effective` number of points. The `LDS` points, on the other hand, have distinct values and thus do not suffer from this problem, as can be seen by the uniform histogram.

fig2/
Figure 2: Points for a two dimensional Sobol sequence satisfying properties A (left) and A' (right). For property A, each point lies in one of the binary intervals obtained by subdividing each dimension into $[0,\frac{1}{2}) \cup [\frac{1}{2},1) = [0,1)$. Similarly, for property A' the unit square is subdivided into $[0,\frac{1}{4}) \cup [\frac{1}{4},\frac{1}{2}) \cup [\frac{1}{2},\frac{3}{4}) \cup [\frac{3}{4},1) = [0,1)$

Properties A and A'

Before considering numeric properties calculated from Sobol sequences, we can consider properties of the direction numbers used to initialize a given sequence. As the direction numbers used to initialize the recurrence relations determine how the sequence is generated, their properties can influence the quality of the resulting sequence. Initially formulated in [17], properties A and A' represent uniformity conditions [16] that determine how the points in the sequence stratify the unit hypercube.

Definition 6 Property A [6]
A d-dimensional Sobol sequence with $2^d$ points satisfies property A if any binary segment of the unit cube, obtained by dividing the unit hypercube in half along each dimension, contains exactly one point from the sequence.

Definition 7 Property A' [6]
A d-dimensional Sobol sequence with $4^d$ points satisfies property A' if any binary segment of the unit cube, obtained by dividing the unit hypercube into four along each dimension, contains exactly one point from the sequence.

Points from a two-dimensional Sobol sequence satisfying both properties A and A' are shown in Figure [2], where one can observe that there is a single point in each subinterval. Thus, properties A and A' can be related to stratified sampling schemes, used for variance reduction [23].
Property A
A d-dimensional Sobol' sequence possesses Property A if and only if:
\det(V_d) \equiv 1 \:\: (\text{mod } 2)
Where the matrix $V_d \in \{0,1\}^{d \times d}$ is given by:
V_d = \left[ {\begin{array}{ccccc} 1 & 0 & 0 & \cdots & 0 \\ 1 & v_{2,2,1} & v_{3,2,1} & \cdots & v_{d,2,1} \\ 1 & v_{2,3,1} & v_{3,3,1} & \cdots & v_{d,3,1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & v_{2,d,1} & v_{3,d,1} & \cdots & v_{d,d,1} \\ \end{array} } \right]
Here, the binary values $v_{i,j,k}$ denote the $k^{th}$ binary digit of the $i^{th}$ direction number for dimension $j$, where the entire direction number is represented by $v_{i,j} = (0.v_{i,j,1}v_{i,j,2}...)_2$.
Property A'
A d-dimensional Sobol' sequence possesses Property A' if and only if:
\det(U_d) \equiv 1 \:\: (\text{mod } 2)
Where the matrix $U_d \in \{0,1\}^{2d \times 2d}$ is given by:
U_d = \left[ {\begin{array}{ccccccc} v_{1,1,\textcolor{red}{1}} & v_{1,1,\textcolor{blue}{2}} & v_{2,1,\textcolor{red}{1}} & v_{2,1,\textcolor{blue}{2}} & \cdots & v_{d,1,\textcolor{red}{1}} & v_{d,1,\textcolor{blue}{2}} \\ v_{1,2,\textcolor{red}{1}} & v_{1,2,\textcolor{blue}{2}} & v_{2,2,\textcolor{red}{1}} & v_{2,2,\textcolor{blue}{2}} & \cdots & v_{d,2,\textcolor{red}{1}} & v_{d,2,\textcolor{blue}{2}} \\ v_{1,3,\textcolor{red}{1}} & v_{1,3,\textcolor{blue}{2}} & v_{2,3,\textcolor{red}{1}} & v_{2,3,\textcolor{blue}{2}} & \cdots & v_{d,3,\textcolor{red}{1}} & v_{d,3,\textcolor{blue}{2}} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ v_{1,2d,\textcolor{red}{1}} & v_{1,2d,\textcolor{blue}{2}} & v_{2,2d,\textcolor{red}{1}} & v_{2,2d,\textcolor{blue}{2}} & \cdots & v_{d,2d,\textcolor{red}{1}} & v_{d,2d,\textcolor{blue}{2}} \\ \end{array} } \right]
Similarly to Property A, the binary values $v_{i,j,k}$ represent the $k^{th}$ binary digit of the $(i,j)^{th}$ direction number $v_{i,j}$, given by $v_{i,j} = (0.v_{i,j,1}v_{i,j,2}...)_2$.
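A sketch of the Property A test, assuming the first binary digits of the direction numbers are available as a numpy array and using the `galois` library (introduced in Section [Determinants in GF(2)]) for the determinant; the `digit1` layout is a hypothetical convention chosen for this illustration:

```python
import numpy as np
import galois  # pip install galois

GF2 = galois.GF(2)

def satisfies_property_A(digit1):
    """Property A check following the determinant condition above.

    digit1[i, j] holds v_{i+1, j+1, 1}, the first binary digit of the
    (i+1)-th direction number for dimension j+1 (0-based numpy indexing).
    V_d has entry (row j, column i) = v_{i,j,1}, i.e. the transpose of
    digit1's leading d-by-d block.
    """
    d = digit1.shape[1]
    V_d = GF2(np.asarray(digit1[:d, :d].T, dtype=np.uint8))
    return np.linalg.det(V_d) == 1   # determinant over GF(2) must equal 1
```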
Properties A and A' in adjacent dimensions
In the case when the dimensionality $d$ of the Sobol sequence is large, we may choose to impose properties A and A' on a subset of the $\{1,...,d\}$ dimensions. One such choice is subsets of variables forming a consecutive sequence, such that $\forall i \in \{1,...,d-k\}$ the dimensions $\{i,i+1,...,i+k\}$ satisfy $A$ and/or $A'$. These `adjacency` properties generalize $A$ and $A'$ into $A_k$ and $A'_k$, with the original properties being the special cases where $k=d$. Such a generalization is useful when attempting to construct Sobol sequences in practice, reducing the search algorithm's time complexity by satisfying $A$ and/or $A'$ only in adjacent dimensions. For a more in-depth discussion of $A_k$ and $A'_k$, see [1].
Given that the conditions for properties A and A' can be satisfied independently, a given sequence can satisfy both, one, or neither of these properties. As mentioned above, when searching for practical sequences, we may choose to utilize any combination of $A_k$ and $A'_k$, as well as potentially different values of $k$. This affects the algorithmic complexity through the size of the determinants of direction numbers we need to compute.

Low dimensional projections of Sobol sequences

fig3/
Figure 3: Two dimensional projections of a multidimensional Sobol sequence for $2^{10}$ points. The left-most projection displays good distribution over the unit square, while the middle and right-most show pathological cases where large gaps arise in the sequences.

Although the properties described above are useful theoretically, we must also consider practical sequence equidistribution properties for finite numbers of samples. While univariate Sobol sequences are always well distributed, higher dimensional sequences may exhibit significant co-dependence between individual dimensions for a finite number of samples. This can be illustrated by considering two dimensional projections of multidimensional Sobol sequences (Figure [3]), where we can contrast a well distributed two dimensional projection in Figure [3]a with two highly co-dependent projections shown in Figure [3]b and [3]c. Aside from filling the space non-uniformly, the poor equidistribution of the points leads to integration errors, as some regions in the integration domain are completely empty. While increasing the number of points will asymptotically fill in such regions, this may require impractically large sample counts. Thus, it is desirable to design sequences without such pathologies in the first place. One practical measure of the quality of the two dimensional projections is the Pearson correlation coefficient, which we can compute after transforming each one-dimensional sequence through the inverse cumulative Gaussian distribution. Using this, we can quickly estimate the quality of the two-dimensional projections in the multidimensional sequence. From this estimate we can construct a metric to minimize, such as the maximum absolute correlation, that can be used when searching for direction number sets with the aim of generating sequences with improved projections.
While the two-dimensional projection evaluated at low sample counts may show high correlations, a simple approach to reducing them, without searching for an alternative low discrepancy sequence, is to simply increase the number of points sampled - in the limit as the number of points $n\rightarrow \infty$ the sequence discrepancy tends to zero, the space is filled uniformly, and thus the correlations go to zero as well.
An alternative approach to reducing, but not eliminating, correlations is to scramble the individual one dimensional sequences using a method like Owen's scrambling (see Section [3]). This preserves the discrepancy of the sequence and shifts the distribution of correlations towards lower values by permuting the elements of a given univariate sequence (Figure [7]). Being based on permutations of the unit interval, the scrambling method is thus unlikely to completely eliminate all pathological two-dimensional projections, but one would expect it to improve the overall quality of the multidimensional sequence.

Mathematical operations for low discrepancy sequences

In this section, we describe two mathematical operations that need to be performed when generating low discrepancy sequences: direction number generation and determinant calculation. Direction number generation involves mapping any integer to a direction number list, while determinant calculation is used to ascertain whether the direction numbers satisfy properties A and A'. We provide the theory and mathematical formulation for these concepts here and refer to them in section [Sequence: tkrg-a-ap5 (this work)] when searching for direction numbers.

Direction number generation

To understand the process of direction number generation, consider that the valid direction number sets form an ordered sequence. A consequence of this is that there exists a mapping that converts an index $\in \mathbb{N}$ into the corresponding set of direction numbers. To help explain this, we begin by considering integer sequences and their representation in an arbitrary positional system like binary or decimal. When using such positional systems, a number is written as a sequence of digits in some base. The base assigns the number of possible values that a given digit can take. For example, every binary digit can take on either 0 or 1, and every decimal digit can take on any value in $\{0,1,2,...,9\}$.
For common systems each digit has the same number of values, so the digit denoting the number of 10's (in base 10) can take on the same number of values as the digit denoting the number of millions ($10^6$). For a given base $a$, we denote $a_i$ as the number of combinations for the $i^{th}$ digit, with $i \geq 1$. A given number with $n$ digits in base $a$ then consists of the following digit combination counts:
a_1 \; a_2 \; a_3 \; a_4 \; a_5 \; a_6 \; \dots \; a_n
The total number of possible combinations this digit string can represent is $\prod_{i=1}^{n} a_i$. In the example of base 2 with $n=10$ bits, we get $\prod_{i=1}^{10} 2 = 2^{10}$. The number of combinations $a_i$ relates to how a number $x$ can be represented in the base. Consider the binary system again:
x = x_1 2^0 + x_2 2^1 + x_3 2^2 + \dots + x_n 2^{n-1} = \sum_{i=1}^{n} x_i 2^{i-1}
Denoting the powers $2^{i-1}$ as $b_i$ allows the sum to be represented as:
x = \sum_{i=1}^{n} x_i b_i
Where $b_i = 2^{i-1}$. For the case of the binary system, this means that the $n^{th}$ term is:
b_n = \prod_{i=0}^{n-1} a_i
Where we define $a_0 = 1$ to make $b_1 = 1$. In this way, we can represent a given integer $x$ using a non-constant base system, where the base of digit $i$ is $a_i$. Using this, we can then recover the coefficients $x_n$ of the number $x = \sum_{i=1}^{n} x_i b_i$ by computing:
x_n = \lfloor x / b_n \rfloor \bmod a_n
Counting with direction numbers
Consider the recurrence relation 3.6 of [1]. To be able to construct a given Sobol sequence using direction numbers $v_{i,j}$ (where $i$ denotes the position of the direction number in the dimension's sequence, and $j$ denotes the dimension), we need a total of $s_j$ initialization direction numbers: $v_{1,j}, ..., v_{s_j,j}$. These direction numbers can be obtained from their integer representation $m_{i,j} = v_{i,j} 2^i$. For a given position $i$, the corresponding integer $m_{i,j}$ must satisfy:

  • $m_{i,j} \geq 1$

  • $m_{i,j} \leq 2^i$

  • $m_{i,j}$ is odd.

Thus, for the position $i$, there are $2^{i-1}$ possible direction numbers $m_{i,j}$. The number of combinations $a_i$ for digit $i$ is therefore given by:
a_i = 2^{i-1}
Hence, we can view the generation of direction numbers as decomposing a given number $x$ into a non-uniformly sized digit representation with the number of combinations per digit given by $a_i$. Using this method, we can construct functions that map a given number $x$ onto a corresponding direction number list by treating each of the $s_j$ direction numbers as a digit.
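A sketch of this mixed-radix decomposition, using $a_i = 2^{i-1}$ admissible odd values per position as derived above:

```python
def index_to_direction_numbers(x, s):
    """Map an integer index x to a list of s initialization integers m_1..m_s.

    Position i admits a_i = 2**(i-1) odd values 1, 3, ..., 2**i - 1, so
    digit value t encodes m_i = 2*t + 1."""
    m = []
    for i in range(1, s + 1):
        a_i = 2 ** (i - 1)     # number of admissible m_i for position i
        t = x % a_i            # the i-th digit of the mixed-radix expansion
        x //= a_i
        m.append(2 * t + 1)    # the t-th odd number in [1, 2**i)
    return m

# The indices enumerate all valid initializations: s=4 gives 1*2*4*8 = 64 lists.
print(index_to_direction_numbers(0, 4))   # [1, 1, 1, 1]
print(index_to_direction_numbers(63, 4))  # [1, 3, 7, 15]
```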

Determinants in GF(2)

The Galois field GF(2) consists of only two elements, $\{0, 1\}$, and defines the following addition ($+$) and multiplication ($\times$) operators:

$A$  $B$  $A + B$  $A \times B$
0    0    0        0
0    1    1        0
1    0    1        0
1    1    0        1

While this appears to be a trivial modification to standard addition and multiplication defined on the real numbers, the consequences for the calculation of determinants in high dimensions and on computers are profound. The first of these is the ability to use one bit per matrix element, giving a $64\times$ per-element memory reduction compared to a 64-bit floating point number. This allows us to represent a matrix with $8\times$ more rows and columns in GF(2) using the same memory. The second set of consequences relates to the algorithms for calculating determinants; we must modify these to account for the new arithmetic. In python, this can be done by casting numpy matrices to contain elements in GF(2) using the `galois` python library. There are other aspects of GF(2) matrices that are more mathematical. The first of these is that in GF(2) there are vectors which are self-orthogonal, i.e. $\langle v_i,v_i \rangle = 0$ with $v_i \neq 0$. To illustrate, consider the following:
\begin{bmatrix} 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1 \times 1 + 1 \times 1 = 1 + 1 = 0
In addition, for a random $n \times n$ matrix of elements, the probability that the matrix is non-singular (has a determinant of 1) is given by [24]:
\prod_{i=1}^{n} \left(1 - 2^{i-1-n}\right)
Taking the limit as $n\rightarrow \infty$, the probability of finding a non-singular matrix at random tends to $0.28878$ [24]. This contrasts with the case for random matrices with elements in $\mathbb{R}$, where the probability of finding a singular matrix tends to $0$; high dimensional random vectors become approximately orthogonal. Thus, the problem of finding a non-singular matrix is comparatively harder in GF(2) than in $\mathbb{R}$. However, it is still possible to find a non-singular matrix within a tractable number of attempts.
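A quick numeric check of this product (a sketch using only the standard library):

```python
import math

# Probability that a random n-by-n GF(2) matrix is non-singular;
# converges to ~0.28878 as n grows.
for n in (4, 16, 64):
    p = math.prod(1 - 2.0 ** (i - 1 - n) for i in range(1, n + 1))
    print(n, round(p, 5))
```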

3 Randomized Quasi Monte-Carlo

Low discrepancy sequences (LDS), unlike random numbers, are deterministic and thus unable to provide a standard error on the estimate, due to the non-applicability of the CLT. This is problematic, as we are then unable to quantify the uncertainty around the estimate. Additionally, problems might arise when a `bad` number of points in a given sequence is used. While we could simply add a random number to each point of the sequence, this would increase the discrepancy of the resulting sequence. Hence, the challenge is to restore `randomness` (for the purposes of applying the CLT) without sacrificing the low discrepancy of the sequences (and by proxy - the convergence rate of the estimator). This can be achieved with `Randomized Quasi Monte-Carlo`, where we apply a `scrambling` operation to the low discrepancy sequences. A good scrambling operation (parameterized by a `seed`) will map the input LDS to another LDS with sufficient `randomness` to allow the computation of a standard error estimate, since scrambling the sequence with different seeds allows the application of the CLT. Moreover, a good scrambling algorithm may also help reduce the non-uniformity of the Sobol sequences.

fig6a/
fig6b/
Figure 4: Effect of scrambling on two dimensions of sequences generated using the new-joe-kuo-6.21201 direction numbers.

Cranley-Patterson rotation

The simplest method for scrambling a low discrepancy sequence is to add a uniform random number separately to each of the $d$ dimensions. This is known as a Cranley-Patterson (CP) rotation [23].
x_{ij} = z_{ij} + u_j\; mod\; 1
The rotation preserves the low discrepancy nature of a sequence (see the bound below), but does not eliminate pathological non-uniform regions in the sequences; it merely shifts them. As the shifts are continuous, the CP rotation also does not preserve the stratification properties of the digital sequences achieved by subdividing the unit hypercube.
D_n(x_1,...,x_n) \leq 2^d D_n(a_1,...,a_n)

Digital scrambling

Digital scrambling is a similar operation to the CP rotation described above. However, here we work with the integer representation of the low discrepancy sequences, performing instead a summation in base-b between values in each dimension of the sequence and the uniform number:
x_{ij} = z_{ij} \oplus u_j
Representing both the low discrepancy value $z$ and the scrambling value $u$ in base $b$, an arbitrary value $y$ can be written as:
y = \sum_{k=0}^p y_k b^{-k-1}
The per-digit scrambling operation can then be expressed as:
x_k = z_k + u_k \; mod \; b
For the case of Sobol sequences, the base-2 summation is the XOR operation, making it fast to compute on modern computers. Similar to the CP rotation, a key downside of digital shift scrambling is that it does not randomize the sequence sufficiently, and thus does not achieve a significant improvement in estimate variance properties.
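Both operations are one-liners in practice; the following sketch (assuming numpy, with points stored both as floats and as 32-bit integer representations) contrasts the CP rotation with the base-2 digital shift:

```python
import numpy as np

def cranley_patterson(z, rng):
    """CP rotation: add one uniform shift per dimension, modulo 1."""
    return (z + rng.random(z.shape[1])) % 1.0

def digital_shift(z_int, rng, bits=32):
    """Base-2 digital shift: XOR each dimension's integer representation
    with a fixed random bit pattern (digit-wise addition mod 2)."""
    u = rng.integers(0, 2**bits, size=z_int.shape[1], dtype=np.uint64)
    return z_int ^ u

rng = np.random.default_rng(7)
z = rng.random((8, 3))                 # stand-in for LDS points in [0,1)^3
z_int = (z * 2**32).astype(np.uint64)  # 32-bit integer representation
print(cranley_patterson(z, rng)[0])
print(digital_shift(z_int, rng)[0] / 2**32)
```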

Owen's scrambling

A more powerful approach to randomization is Owen's scrambling, also known as nested uniform scrambling. Like the previous scrambling methods, Owen scrambling applies to each dimension separately. Within each dimension, the nested uniform scrambling can be viewed as an extension of digital scrambling where the bits of a number are XOR-ed with the bits of a scrambling sequence. However, unlike digital scrambling, where the scrambling sequence is constant, here the scrambling sequence is sampled in a path-dependent way from a set of random bits generated for the scrambling. To illustrate with an example, consider the following binary sequences of length $m=4$:
\begin{align} \color{blue}{z_1 = 0110} \\ \color{violet}{z_2 = 0010}\end{align}
Generating scrambling sequences for these requires $2^m - 1 = 15$ random values, which can be interpreted as forming a binary tree, as shown in Figure [5]. Given this binary tree, we can generate scrambling sequences for $z_1$ and $z_2$ by traversing the binary tree, taking the left branch when we encounter a $0$ in the value of $z_1$ (or $z_2$) and the right branch when we encounter a $1$. Examples of this sort of tree traversal are shown in Figure [6].

fig7/
Figure 5: An interpretation of a linear array as a binary tree by suitable indexing into the array.

The scrambling sequences generated by traversing the binary tree for $z_1$ and $z_2$ are:
\begin{align} \color{blue}{u_1 = 0101}\\ \color{violet}{u_2 = 0111}\end{align}
The path through the binary tree used to obtain these scrambling sequences is shown in Figure [6], which highlights the nodes of the binary tree that make up the digits of each value of $u$. The edges of the binary tree are annotated with the digits of the scrambled value ($z_1$/$z_2$) to show how this path through the tree was obtained.

fig8/
Figure 6: Generating a value-dependent scrambling sequence by traversing the binary tree; the highlighted nodes provide the digits of each scrambling value $u$.

Having generated the scrambling sequences we can now calculate the resulting scrambled values using digital scrambling:
\begin{align} \color{blue}{x_1 = z_1 \oplus u_1 = 0110 \oplus 0101 = 0011}\\ \color{violet}{x_2 = z_2 \oplus u_2 = 0010 \oplus 0111 = 0101} \end{align}
Formally, nested uniform scrambling can be viewed as a path-dependent permutation of the intervals. In this way, the scrambling rearranges the sub-intervals, but does not change the number of points in each sub-interval, thus preserving the uniformity properties. In this context, the source of randomness is the binary tree - defining the permutations to be performed. A key limitation of Owen's scrambling is that, for a d-dimensional Sobol sequence with a scrambling depth of $m$ digits in base $b$, the scrambling requires $O(d\times b^m)$ values in order to store the binary trees for each dimension. Thus, fully scrambling 32-bit Sobol sequences requires $\approx 512$ MiB/dimension. A practical technique for reducing memory usage is to limit the scrambling depth to $m \le 32$ bits and apply a digital scramble to the remaining digits. An alternative is to approximate the binary tree with a hash function that scrambles the bits.
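A sketch of the explicit-tree version for a single dimension (assuming numpy; values are m-bit integers, and the worked example values $z_1$, $z_2$ from above are reused):

```python
import numpy as np

def owen_scramble_1d(z_int, m, rng):
    """Nested uniform scrambling of one dimension, explicit-tree version.

    The 2**m - 1 random bits form a flat binary tree (root at index 0,
    children of node i at 2i+1 and 2i+2); the bits of each value trace a
    root-to-leaf path selecting which scrambling bit to XOR in.  Memory
    grows as O(2**m), which is why m is limited in practice."""
    tree = rng.integers(0, 2, size=2**m - 1)
    out = np.empty_like(z_int)
    for idx, z in enumerate(z_int):
        node, x = 0, 0
        for k in range(m - 1, -1, -1):       # most significant digit first
            bit = (int(z) >> k) & 1
            x = (x << 1) | (bit ^ tree[node])
            node = 2 * node + 1 + bit        # left child for 0, right for 1
        out[idx] = x
    return out

rng = np.random.default_rng(1)
z = np.array([0b0110, 0b0010])               # the worked example values z1, z2
print(owen_scramble_1d(z, 4, rng))
```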

fig18/
Figure 7: Owen scrambling of a single dimension for $N=48$ points, where the initial unscrambled sequence forms the x-coordinate while the result of scrambling forms the y-coordinate.

Hash-based Owen scrambling

Hash-based Owen scrambling replaces the explicit binary tree used to calculate the scrambling bits with an implicit traversal - a hash function. Here, instead of traversing down the tree, we note that, for a given bit, the previous bits form a path, or address, of the next random bit to XOR with the bit being scrambled. Thus, to scramble a given bit $z_i$ of $z = 0.z_1z_2z_3...z_k$, we use the preceding bits $v_i = z_1z_2...z_{i-1}$ as the address passed into the scrambling function. Replacing the binary tree with a hash function achieves the same effect - producing the next random bit. To emulate randomness, we replace the RNG used to generate the binary tree with a random seed, also passed into the hash function. Hence, to compute the random bit $u_i$ that is then XOR-ed with the input bit to produce the scrambling, we perform:
\begin{aligned} x_i &= z_i \oplus u_i \\ u_i &= hash(v_i,seed) = hash(z_1z_2...z_{i-1}, seed) \end{aligned}
A naive implementation would iteratively apply this formula to perform a nested scrambling. However, applying a per-bit hash function is computationally expensive. An alternative approach, described in [8], is based on the Laine-Karras (LK) hash function [9]. This hash function approximates the set of bit-wise hash calculations and vector XOR operations in one go, improving efficiency. Finally, the LK-hash must be applied to Sobol values in reverse bit order, starting from the least significant bit. With this, LK-hashing becomes a good approximation to Owen scrambling, permuting the elementary intervals and retaining the good stratification properties [8].
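A sketch of this scheme for 32-bit values; the multiply-XOR constants below are those commonly quoted for the original LK hash [9] (treat the exact constants as an assumption - [8] derives improved variants):

```python
MASK = 0xFFFFFFFF

def reverse_bits32(x):
    """Reverse the 32 bits of x; the LK hash must see the least significant bit first."""
    return int(f"{x & MASK:032b}"[::-1], 2)

def laine_karras_hash(x, seed):
    """LK-style hash: each multiply-XOR step only lets lower bits perturb
    higher bits, emulating the path-dependence of the Owen tree in one pass."""
    x = (x + seed) & MASK
    for c in (0x6C50B47C, 0xB82F1E52, 0xC7AFE638, 0x8D22F6E6):
        x ^= (x * c) & MASK
    return x & MASK

def hash_owen_scramble(z, seed):
    """Approximate Owen scrambling of one 32-bit Sobol value:
    reverse the bits, hash, reverse back."""
    return reverse_bits32(laine_karras_hash(reverse_bits32(z), seed))

print(hash_owen_scramble(0x80000000, seed=12345) / 2**32)
```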

fig17/
Figure 8: Effect of shuffling on the discrepancy of a 5-dimensional Sobol sequence showing that while shuffling is effective at eliminating bad two dimensional projections it also degrades the discrepancy scaling of the sequences, making it similar to that of the pcg64 pseudo-random generator.

Shuffling

While scrambling acts on the values of the points in the sequence, shuffling acts on the index (the order), reordering points without changing their values. This approach, described in [8], is aimed solely at reducing the correlations between dimensions, as opposed to scrambling, which aims to improve sequence properties. While [8] described a `nested uniform shuffle`, our implementation utilizes a random shuffle to remove the correlations between individual one-dimensional Sobol sequences within the multidimensional sequence. As the one-dimensional points remain unchanged by the shuffling, the distribution of the points remains unchanged. However, as shown in Figure [8], the shuffling degrades the scaling of discrepancy with increasing point count for higher dimensions. The relation, on a log-log scale, between the centered discrepancy and the number of points $n$ is described by the formula:
\log_{10}(CD_n) = \alpha \log_{10}(n) + \beta
Prior to shuffling, the gradients $\alpha$ of the tkrg-a-ap5 and new-joe-kuo-6.21201 sequences were $-1.63$ and $-1.67$ respectively. Meanwhile, after shuffling, the new values of $\alpha$ are $-1.03$ and $-1.05$, significantly closer to that of the pcg64 generator, at $-1.01$. This illustrates that by shuffling the sequence we can decorrelate the individual dimensions, but at the cost of reducing the high dimensional equidistribution properties. An example of how scrambling and shuffling influence the two dimensional projections is given in Figure [9], which illustrates that we can decorrelate individual dimensions by shuffling and scrambling, and thus achieve both randomization and decorrelation by combining the techniques.
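A sketch of the random shuffle used here (assuming numpy); each dimension after the first gets an independent permutation of the point order:

```python
import numpy as np

def shuffle_dimensions(points, rng):
    """Independently permute the point order within each dimension.

    One-dimensional marginals are untouched; only the pairing between
    dimensions changes, removing cross-dimensional correlations at the
    cost of the multidimensional net structure."""
    n, d = points.shape
    out = points.copy()
    for j in range(1, d):
        out[:, j] = points[rng.permutation(n), j]
    return out
```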

fig19/
Figure 9: Effects of shuffling and scrambling on two dimensional projections of a multidimensional Sobol sequence for the first 256 points of the new-joe-kuo-6.21201 sequence. The top left plot shows the projection of the `Original`, unscrambled and unshuffled multidimensional Sobol sequence onto two dimensions, where we select only the $1^{st}$ (x-axis) and $90^{th}$ (y-axis) dimensions. As evident from the plot, the two dimensional projection suffers from poor equidistribution on the unit square. Randomly shuffling the sequence index produces the `Shuffled` plot, while applying hash-based Owen scrambling produces the `Scrambled` plot. Combining the two techniques results in the `Scrambled & Shuffled` plot.

4 Experimental results

To evaluate the quality of the direction numbers generated by the search procedure outlined above, we compare the performance of the generated Sobol sequences with the new-joe-kuo-6.21201 direction numbers described in [22], [4], using pseudo-random sequences generated by pcg64 [10] as a baseline. In the following section, we present the key findings of our benchmarking, while the full set of test integrals and discrepancy measures is included in Appendix [C]. We note that low discrepancy sequences are expected to outperform pseudo-random numbers in the case when the test problem has a low `effective` dimension (distinguishing it from the `nominal` dimension of a problem - the number of terms in the integral). Thus, we plot the mean dimension of the test problem alongside the test metric where applicable, to allow the reader to verify the suitability of applying low discrepancy sequences. Discussion of mean dimension alongside the relevant formulae is presented in Appendix [A].

Sequence: new-joe-kuo-6.21201 [4]

The approach employed by Joe and Kuo for searching for new direction numbers accounts for the fact that property A is not sufficient to prevent correlations between pairs of dimensions (we denote the dimension as $d$). Joe and Kuo therefore take advantage of the fact that a Sobol sequence can be viewed as a $(t, d)$-sequence in base 2, where every block of $2^m$ points forms a $(t, m, d)$-net. Hence, the $t$-value, given by $t(j, d; m) = \sum^d_{j=1}(s_j - 1)$, can measure the uniformity of point sets; smaller $t$-values lead to finer partitions, and hence more uniformly distributed points. Using the aforementioned concepts, Joe and Kuo adopted the following method:

  • Search for direction numbers that satisfy property A for dimensions up to 1,111 by assigning polynomials for each dimension up to degree 13.

  • After 1,111 dimensions, search for direction numbers that optimize the t-values of the two-dimensional projections, reaching 21,201 dimensions.

Joe and Kuo made the following assumptions when defining their search criterion: direction numbers were already chosen up to dimension $d-1$, and $m$ is given and fixed. They would thus aim to choose direction numbers for dimension $d$ so as to minimize the $t$-value.

Sequence: tkrg-a-ap5 (this work)

Similarly to [1], our search strategy to generate tkrg-a-ap5 was to search for Sobol sequences satisfying the following properties:

  • Property A for all 195,000 dimensions ($A_{k=195,000}$)

  • Property A' for 5 adjacent dimensions ($A'_{k=5}$)

The reason for choosing to search only for $A_{k=195,000}$ is Lemma 2 of [1], which states that it is not possible to construct a Sobol sequence for $d\geq 4$ satisfying $A_k$ for all $k\leq d$. In order to obtain Sobol sequences which satisfy properties A ($A_{k=195,000}$) and A' ($A'_{k=5}$), we needed to find sets of direction numbers whose digits satisfy the determinant conditions for properties A and A' given above. The proposal in [1] appeared to formulate this as a set of constraints, in effect transforming the search into a satisfiability problem - solvable by automated theorem provers. A potential benefit of this approach is the ability to generate all possible direction numbers satisfying the given constraints. However, such boolean satisfiability is in general NP-hard. Instead, the key to our approach is the efficient calculation of the determinants of high-dimensional matrices in the Galois field GF(2) (Section [Determinants in GF(2)]).
One of the factors that could affect the uniformity properties of tkrg-a-ap5 is the correlation between dimensions. To investigate the effect of the number of sampled points in the sequence, we compare the correlation coefficient histograms in two dimensions for the first 1,000 ([10]a) and 10,000 ([10]b) dimensions of the new-joe-kuo-6.21201 and tkrg-a-ap5 sequences. We show the heat maps of both sequences in Figure [10]. In both cases, we observe that as the number of points `n` increases, the distribution of correlation coefficients shifts towards zero. This is most apparent for the new-joe-kuo-6.21201 direction numbers in the case of 1,000 dimensions. The figures also help to illustrate that choosing an `n` arbitrarily, instead of an integer power of 2, produces unbalanced point sets - increasing the number of highly correlated dimensions unnecessarily.

correlation_evolution1000/
(a) A logarithmic-scale heatmap of correlation coefficients for 1,000 dimensions.
correlation_evolution10000/
(b) A logarithmic-scale heatmap of correlation coefficients for 10,000 dimensions.
Figure 10: A logarithmic-scale (log10) heatmap of correlation coefficients for 1,000 (a) and 10,000 (b) dimensions is plotted for the tkrg-a-ap5 and new-joe-kuo-6.21201 sequences. The heatmap illustrates that increasing the number of points reduces the number of dimension pairs with a high correlation coefficient. For the first 1,000 dimensions we see that the new-joe-kuo-6.21201 sequence exhibits better correlation decay, while for the 10,000 dimension case tkrg-a-ap5 has an advantage, particularly for high ($\geq 0.7$) correlation values. The plot also underscores the need to choose the number of points `n` to be a power of 2 - otherwise the `unbalanced` point sets produce high correlation between dimensions.

Spurious variance

Here, we examine the spurious variance of a multidimensional Sobol sequence. This quantity is a measure of the average pairwise correlation between dimensions of a Sobol sequence (see Appendix Section [B] for a full description) and is given by:
\rho_s = 1 + \frac{1}{d}\sum_{i=1}^{d}\sum_{j \neq i}^d \rho_{ij}
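A sketch of the computation (assuming numpy and scipy; as for the projection-quality metric earlier, each dimension is first mapped through the inverse Gaussian CDF, so points must lie strictly inside $(0,1)$):

```python
import numpy as np
from scipy.stats import norm

def spurious_variance(points):
    """rho_s = 1 + (1/d) * sum of all off-diagonal pairwise correlations."""
    z = norm.ppf(points)                # Gaussianize each 1-d sequence
    rho = np.corrcoef(z, rowvar=False)  # d-by-d correlation matrix
    d = rho.shape[0]
    return 1.0 + (rho.sum() - np.trace(rho)) / d
```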
Plotting $\rho_s - 1$ against dimensions $d$ in Figure [11]a, we observe that the tkrg-a-ap5 direction numbers exhibit substantially less deviation from zero than the new-joe-kuo-6.21201 sequences. This may be attributed to the increased uniformity of the tkrg-a-ap5 sequence, which gives rise to a more symmetric distribution of correlation coefficients, in turn resulting in a smaller maximum deviation of the spurious variance. In contrast, the new-joe-kuo-6.21201 sequences have consecutive sets of correlation coefficients whose distribution is highly asymmetric (around zero), thus resulting in the spurious variance drifting away from zero until approximately $d\approx 1{,}000$. This behaviour of having highly biased distributions of correlation coefficients between dimensions appears to be a feature of new-joe-kuo-6.21201, as other sequences such as sobol-levitan-lemieux or jaeckel do not exhibit such deviations; such a structure may have arisen as a result of the direction number search strategy used in [4] to generate the sequence.
An effective strategy to decorrelate and symmetrize the correlation coefficient distribution is to scramble the underlying low discrepancy sequence, before recomputing the spurious variance for each scrambled sequence. The effect of this is shown in Figure [11]b, where the means, obtained by averaging 30 spurious variance traces, of both the tkrg-a-ap5 and new-joe-kuo-6.21201 sequences are shown - with the latter no longer exhibiting the large drift away from zero present in Figure [11]a.

fig9a/
fig9b/
Figure 11: a) Spurious variance component plotted against the number of dimensions for 1,024 simulations for several low discrepancy sequence generators, where tkrg-a-ap5 does not exhibit the large spurious variance component manifesting in both new-joe-kuo sequences. b) Shows the effect of using Owen's scrambling and sequence shuffling to reduce the spurious variance component, along with the standard deviation of the spurious variance estimate for the tkrg-a-ap5 sequence.

Integral convergence tests

A second method for assessing quality of the low discrepancy sequences is to evaluate numerical integrals, whose value can be calculated analytically, and comparing the results obtained (see Appendix [C] for a full comparison of test integrals). Here, one considers the absolute error between the numeric and analytic result against the number of points used, as well as other parameters like dimension of the integral or other terms in its definition that can affect its difficulty. Unlike with Monte-Carlo, in the case of QMC we cannot rely on the central limit theorem, and thus must know analytic results to measure convergence. Using randomization (RQMC), we can attempt to recover the ability to estimate the error using the CLT. However, we must compute multiple scrambled versions of the sequence - increasing the computation time.
Consider first the integral I_1 below. Its integrand is a product over the individual dimensions x_i, with d being the nominal integral dimension and c_i a constant term parameterized by the dimension index, c_i = i^k. The choice of c_i determines the size of the denominator relative to the numerator of each product term, and thereby scales the integral's difficulty, i.e. its effective dimension (see Appendix [A]).
I_1 = \int_{[0,1]^d} \prod_{i=1}^d \frac{|4x_i - 2 | + c_i}{1+c_i} dx_i
In [21], the authors discuss three variants for c_i: a constant c_i = 1, a quadratic c_i = i^2 and a partial power dependence c_i = i^{1/3}. In all cases the analytic solution is I_1 = 1. However, the integrals have different effective dimensions, with the quadratic scaling c_i = i^2 having the lowest effective dimension for a given nominal dimension d. In Figure [12], we consider the first case, c_i = 1 with d = 50, where we show that the error of QMC does not appear to decay at a faster rate than that of MC, represented by the pcg64 line. This may be attributed to the form of the product terms, where each term contributes equally to the effective dimension - albeit with a scaling coefficient of approximately 0.06, so that each nominal dimension contributes only slightly to the effective dimensionality. Of the three cases considered, convergence benefits of low discrepancy sequences for test integral I_1 are observed only for c_i = i^2, where the effective dimension tends to approximately 1.016 as the nominal dimension d \rightarrow \infty.
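The experiment behind Figure [12a] can be reproduced along the following lines - a minimal sketch assuming SciPy's stats.qmc module, with SciPy's built-in direction numbers standing in for the sequences compared in the figure:

```python
import numpy as np
from scipy.stats import qmc

def integrand_i1(x, c):
    """I1 integrand: prod_i (|4x_i - 2| + c_i) / (1 + c_i); analytic value 1."""
    return np.prod((np.abs(4.0 * x - 2.0) + c) / (1.0 + c), axis=1)

d = 50
c = np.ones(d)               # the c_i = 1 variant; try np.arange(1, d + 1)**2
rng = np.random.default_rng(0)
for m in range(4, 15):       # n = 2^4 ... 2^14, powers of 2 for balanced sets
    qmc_pts = qmc.Sobol(d, scramble=True, seed=m).random_base2(m)
    mc_pts = rng.random((2**m, d))
    print(2**m,
          abs(integrand_i1(qmc_pts, c).mean() - 1.0),   # QMC absolute error
          abs(integrand_i1(mc_pts, c).mean() - 1.0))    # MC absolute error
```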

sobol1_test_integral/
(a) Absolute numeric integration error for I1I_1.
sobol1_mean_effective_dimension/
(b) Mean effective (superposition) dimension for I1I_1.
Figure 12: a) Shows the absolute error of the numeric estimate for I_1 with c_i = 1 for all i and d = 50. For this integral there is little benefit to using Sobol sequences, as the pseudo-random pcg64 appears to converge at the same rate as both of the low discrepancy sequences. b) The effective dimension (multiplicative superposition sense) for the integral I_1 with c_i = 1. The effective and nominal dimension relationship is approximately linear with a gradient of 0.06; thus every nominal dimension contributes approximately 0.06 to the mean effective dimension.

A second test integral I_2, also considered in [21], is defined as:
I_2 = \int_{[0,1]^d} \prod_{i=1}^d \frac{i + 2x_i}{i + 1} dx_i
Here, the contribution of each nominal dimension to the mean effective dimension (in the multiplicative superposition sense) decays in a manner similar to the case of I_1 with c_i = i; the contribution of successive nominal dimensions decays to zero with increasing dimension index. Consequently, the effective dimension for I_2 tends to an asymptotic value of approximately 1.084 as d \rightarrow \infty, as shown in Figure [13b]. This small effective dimension gives rise to the fast onset of the O(n^{-1}) convergence rate shown in Figure [13a].

sobol2_test_integral/
(a) Absolute numeric integration error for I2I_2.
sobol2_mean_effective_dimension/
(b) Mean effective (superposition) dimension for I2I_2.
Figure 13: a) Shows the absolute error of the numeric estimate for integral I_2. b) The effective dimension (multiplicative superposition) for the integral I_2 reaches approximately 1.08 in the limit d \rightarrow \infty, with higher dimensions being dominated by the index i in both the numerator and denominator - resulting in an almost constant value of the higher order terms in the product.

Discrepancy measures

Figure [14] shows plots of centred discrepancy versus number of points for the tkrg-a-ap5, new-joe-kuo-6.21201 and pseudorandom sequences at dimensions 5, 10 and 30. For all dimensions and sample sizes, the performance of tkrg-a-ap5 and new-joe-kuo-6.21201 is similar. The pseudorandom sequence is the worst performing at dimensions 5 and 10, while its convergence rate at dimension 30 is similar to that of tkrg-a-ap5 and new-joe-kuo-6.21201. A sketch for computing these quantities is given below.
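A minimal sketch of such a measurement, assuming SciPy's stats.qmc module and its centred discrepancy implementation (point counts and seeds are illustrative):

```python
import numpy as np
from scipy.stats import qmc

for d in (5, 10, 30):
    sobol_pts = qmc.Sobol(d, scramble=True, seed=1).random_base2(m=10)
    random_pts = np.random.default_rng(1).random((2**10, d))
    print(d,
          qmc.discrepancy(sobol_pts, method='CD'),    # centred discrepancy (LDS)
          qmc.discrepancy(random_pts, method='CD'))   # centred discrepancy (MC)
```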

d5/
(a) Centred discrepancy for d=5d=5.
d10/
(b) Centred discrepancy for d=10d=10.
d30/
(c) Centred discrepancy for d=30d=30.
Figure 14: Centred discrepancy for dimensions a) d=5, b) d=10, c) d=30. In all cases the plots show the expected reduction of discrepancy with increasing point count, with the pseudo-random pcg64 performing worse than the low discrepancy sequences for d=5 and d=10. For d=30 this difference appears smaller. This may be attributed to the difficulty of estimating the discrepancy, as well as to the number of points required to see improvements in discrepancy increasing with dimension, as described by Owen.

5 Conclusion

In this report, we reviewed the basic theory of low discrepancy sequences and the conditions under which they can be applied to Monte-Carlo numeric integration. When applied to suitable problems, Quasi-Monte Carlo methods achieve estimator convergence rates of O(n^{-1}), faster than conventional Monte-Carlo, which exhibits an O(n^{-1/2}) rate. We then considered Sobol sequences, which are base-2 low discrepancy sequences defined using recurrence relations from an initial set of direction numbers and primitive polynomials. We showed that when sampling it is critical to keep the number of points a power of 2: this maintains `balanced` sequences, preserving the improved convergence rate and minimizing bad two-dimensional projections, quantified by computing correlation coefficients. We then presented properties A and A`, relating them to properties of the direction numbers, and outlined a strategy for finding such direction numbers by performing determinant calculations in GF(2) - allowing us to generate the tkrg-a-ap5 sequence. Finally, we covered Randomized QMC, introducing three different scrambling techniques, and presented a comprehensive set of benchmarks for evaluating low discrepancy sequences in both low and high dimension.

References
[1]
Ilya M. Sobol`, Danil I. Asotsky, Alexander Kreinin, and Sergei S. Kucherenko. Construction and comparison of high-dimensional Sobol` generators. Wilmott, pages 64-79, 2011.
[2]
S. Kucherenko and D. Asotsky. Comparison of different Sobol` sequence generators.
[3]
Ladislav Kocis and William J. Whiten. Computational investigations of low-discrepancy sequences. ACM Transactions on Mathematical Software, pages 266-294, 1997.
[4]
Stephen Joe and Frances Y. Kuo. Constructing Sobol sequences with better two-dimensional projections. SIAM Journal on Scientific Computing, pages 2635-2654, 2008.
[5]
Emanouil I. Atanassov, Sofiya Ivanovska, and Aneta Karaivanova. Optimization of the direction numbers of the Sobol sequences. 2019.
[6]
Stephen Joe and Frances Y. Kuo. Remark on Algorithm 659: Implementing Sobol`s quasirandom sequence generator. ACM Transactions on Mathematical Software, pages 49-57, 2003.
[7]
Art B. Owen. The dimension distribution and quadrature test functions. Statistica Sinica, 2003.
[8]
Brent Burley. Practical hash-based Owen scrambling. Journal of Computer Graphics Techniques (JCGT), 2020.
[9]
Samuli Laine and Tero Karras. Stratified sampling for stochastic transparency. pages 1197-1204, 2011.
[10]
Melissa E. O`Neill. PCG: A family of simple fast space-efficient statistically good algorithms for random number generation. ACM Transactions on Mathematical Software, 2014.
[11]
Josef Dick, Frances Y. Kuo, and Ian H. Sloan. High-dimensional integration: the quasi-Monte Carlo way. Acta Numerica, pages 133-288. Cambridge University Press, 2013.
[12]
Kristian Stormark. An introduction to Quasi Monte Carlo methods. Project report, 2005.
[13]
Makoto Matsumoto and Takuji Nishimura. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation (TOMACS), pages 3-30, 1998.
[14]
Michael Gnewuch, Anand Srivastav, and Carola Winzen. Finding optimal volume subintervals with k points and calculating the star discrepancy are NP-hard problems. Journal of Complexity, pages 115-127, 2009.
[15]
William Chen, Anand Srivastav, Giancarlo Travaglini, et al. A Panorama of Discrepancy Theory. Springer, 2014.
[16]
Ilya M. Sobol`, Danil Asotsky, Alexander Kreinin, and Sergei Kucherenko. Construction and comparison of high-dimensional Sobol` generators. Wilmott, pages 64-79, 2011.
[17]
Il`ya M. Sobol`. On the distribution of points in a cube and the approximate evaluation of integrals. Zhurnal Vychislitel`noi Matematiki i Matematicheskoi Fiziki, pages 784-802, 1967.
[18]
Alan Genz. Testing multidimensional integration routines. pages 81-94. Elsevier North-Holland, Inc., 1984.
[19]
William Chen, Anand Srivastav, and Giancarlo Travaglini. A Panorama of Discrepancy Theory. Springer, 2014.
[20]
Pierre L`Ecuyer. Randomized quasi-Monte Carlo: An introduction for practitioners. pages 29-52, 2018.
[21]
Stephen Joe and Frances Y. Kuo. Remark on Algorithm 659: Implementing Sobol`s quasirandom sequence generator. ACM Transactions on Mathematical Software (TOMS), pages 49-57, 2003.
[22]
Stephen Joe and Frances Y. Kuo. Constructing Sobol sequences with better two-dimensional projections. SIAM Journal on Scientific Computing, pages 2635-2654, 2008.
[23]
Art B. Owen. Monte Carlo theory, methods and examples. https://artowen.su.domains/mc/, 2013.
[24]
Gregory Bard. Algebraic Cryptanalysis. Springer Science & Business Media, 2009.

A Effective dimension

While the `nominal` dimension of the integration domain \Omega can give an indication of the difficulty of an integration problem, some problems have an `effective` dimension that is significantly lower than this nominal value. In such cases, the majority of the contribution lies in a lower dimensional manifold, and sampling from it allows convergence to the target solution at a rate substantially faster than predicted by the nominal dimension. This class of problems is the most suitable for QMC integration. Calculating the effective dimension in practice is challenging, but analytic expressions exist for integrands with certain functional forms. Two such forms, as discussed in [7], are given below; examples of functions with these forms are shown in [3].
In the subsequent sections, we use the following terms:

  • s: the dimensionality of the function;

  • g_j: a subfunction applied to the jth dimension;

  • x_j: the jth element of a given random sample.

Additive effective dimension

Integrands in the additive form can be represented as a sum of simpler terms:
f(x) = \sum^s_{j=1}g_j(x_j)
The mean effective dimension of the additive function in the superposition sense is one:
d_{sup} = 1
The mean effective dimension of the additive function in the truncation sense is given by:
d_{trunc} = \frac{1}{\sigma^2}\sum^s_{j=1}j\gamma^2_j
where \gamma^2_j denotes the variance of g_j(x_j) and \sigma^2 = \sum^s_{j=1}\gamma^2_j is the total variance of f.

Multiplicative effective dimension

Integrands in the multiplicative form can be represented as a product of simpler terms as follows:
f(x) = \prod^s_{j=1}g_j(x_j)
The mean effective dimension of the multiplicative function in the superposition sense is:
d_{sup} = \frac{\sum^s_{j=1}\gamma^2_j/(\gamma^2_j + \mu^2_j)}{1 - \prod^s_{j=1}\mu^2_j/(\gamma^2_j + \mu^2_j)}
Here, \mu_j denotes the mean of g_j(x_j). The mean effective dimension of the multiplicative function in the truncation sense is not defined.
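As a concrete check on the values quoted in the integral convergence tests, the sketch below evaluates d_sup for the integrand of I_2, using the closed-form moments \mu_j = 1 and \gamma^2_j = 1/(3(j+1)^2) that follow from g_j(x) = (j + 2x)/(j + 1) with x \sim U(0,1):

```python
import numpy as np

def multiplicative_d_sup(mu, gamma2):
    """Mean effective dimension (superposition sense) for f = prod_j g_j(x_j)."""
    frac = gamma2 / (gamma2 + mu**2)              # per-dimension variance ratio
    return frac.sum() / (1.0 - np.prod(1.0 - frac))

j = np.arange(1, 51)                              # nominal dimension d = 50
mu = np.ones_like(j, dtype=float)                 # E[(j + 2x)/(j + 1)] = 1
gamma2 = 1.0 / (3.0 * (j + 1.0) ** 2)             # Var[(j + 2x)/(j + 1)]
print(multiplicative_d_sup(mu, gamma2))           # approaches ~1.08 as d grows
```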

B Spurious variance

In this section, we describe the `spurious variance` test used to evaluate the quality of low discrepancy sequences. We consider independent and identically distributed random variables X_i \sim Normal(0,1). Summing over d such variables, we obtain a new random variable Y:
Y = \sum_{i=1}^d X_i
In the case when the random variables X_i satisfy the IID assumption, the resulting variance of Y reduces to:
Var(Y) = \sum_{i=1}^d Var(X_i) = d
While this equality will not hold exactly for finite sample sizes, it should hold approximately and can thus be used to check whether the generated samples are approximately independent and identically distributed. These assumptions may fail in practice when the underlying number generator exhibits pairwise correlations between dimensions. Such correlations give rise to an additional unwanted or `spurious` variance term, causing the sample variance of Y to deviate from the theoretically predicted value. Thus, by calculating this spurious variance term, we can assess the quality of a multidimensional low discrepancy sequence, owing to the one-to-one mapping between the random variables X_i and the dimensions i of the sequence.

Variance of d-dimensional standard normal distribution

Generalizing the above sum to d variables and applying the bilinearity of covariance:
Cov\left[\sum_{i}a_iX_i, \sum_j b_j Z_j\right] = \sum_i \sum_j a_i b_j Cov[X_i,Z_j]
where a_i and b_j are scaling factors, while X_i and Z_j are random variables drawn from a Gaussian distribution.
We can obtain the following expression for the variance of the sum Y:
Var(Y) = d + \sum_{i=1}^{d}\sum_{j \neq i}^d \rho_{ij}
Here, the second term is the pairwise sum of correlation coefficients over all unequal indices j \neq i, where we have again applied the reduction \sigma_i = 1 for all i \in [1,\cdots,d]. When the individual random variables X_i are independent and identically distributed, the correlations \rho_{ij} = 0 for all i \neq j. However, when the two-dimensional projections of the sequence exhibit non-zero correlations, these directly impact the variance Var(Y), as Cov(X_i, X_j) \neq 0. Thus, by computing the variance of Y and measuring its deviation from the ideal case \rho_{ij} = 0 for i \neq j, we can detect problematic two-dimensional projections in a d-dimensional sequence.
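A minimal numeric check of this identity, assuming SciPy's stats.qmc module (the clipping of the uniforms only keeps the inverse normal CDF finite):

```python
import numpy as np
from scipy.stats import qmc, norm

d, m = 50, 12
u = qmc.Sobol(d, scramble=True, seed=7).random_base2(m)  # 2^12 points in [0,1)^d
y = norm.ppf(np.clip(u, 1e-12, 1 - 1e-12)).sum(axis=1)   # Y = sum_i X_i
print(y.var(ddof=1) - d)  # deviation from d estimates the spurious variance term
```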

C Test integrals

One way of comparing low discrepancy sequences is to estimate various integrals and compare the estimates to the analytic solutions of the same integrals. Given that the analytic solution of a test integral is known, the approximation error and its rate of convergence (or lack thereof) can be used to assess the quality of the low discrepancy sequence, or of another MC/QMC approach. In general, the rate of convergence depends both on the sample generation method and on the integral being estimated. By plotting the estimate error against an increasing number of samples on a log-log plot and extracting the gradient, one can estimate the convergence rate. This is the topic of this appendix.

Classes of test integrals

The first set of integrals we examine are those that can be categorised as either additive or multiplicative. These categories refer to the means of decomposing the integrals: either as a summation or as a product of several sub-integrals (one sub-integral per dimension). We provide the mathematical formulations of each category, as well as possible instantiations of the two classes, in the following two tables. These integrals have analytic solutions against which the estimation error can be calculated. The sources of the test integrals are given in the right-most column of the second table. Additionally, d is defined as the number of dimensions, g_j as the sub-integrand for dimension j and c(d) as the scaling factor. A code sketch of two representative integrands follows the tables.

Table 2: Possible classes of test integrals used to evaluate the quality of random number generators

| Name | Functional form | Expected value | Domain | Reference |
| Additive | c(d)\sum_{j=1}^d g_j(x_j) | c(d)\sum_{j=1}^d\int^1_0 g_j(z)dz | [0,1]^d | [7] |
| Multiplicative | c(d)\prod_{j=1}^d g_j(x_j) | c(d)\prod_{j=1}^d\int^1_0 g_j(z)dz | [0,1]^d | [7] |

| Name | Integral class | c(d) | Functional form | Parameter values | Expected value | Reference |
| hellekalek_function | Multiplicative | 1 | g(x_j)=\frac{x_j^\alpha-1}{\alpha+1} | \alpha\in[1,3] | \mu=(-\frac{\alpha}{(\alpha+1)^2})^d | [7] |
| sobol_1 | Multiplicative | 1 | g(x_j)=\frac{|4x_j-2|+a_j}{a_j+1} | a_j\geq 0 | \mu=1 | [7] |
| sobol_2 | Multiplicative | 1 | g(x_j)=\frac{j+2x_j}{j+1} | - | \mu=1 | [7] |
| owens_example | Multiplicative | 12^{d/2} | g(x_j)=x_j-\frac{1}{2} | - | \mu=0 | [7] |
| roos_and_arnold_1 | Additive | \frac{1}{d} | g(x_j)=|4x_j-2| | - | \mu=1 | [7] |
| roos_and_arnold_2 | Multiplicative | 1 | g(x_j)=|4x_j-2| | - | \mu=1 | [7] |
| roos_and_arnold_3 | Multiplicative | 1 | g(x_j)=\frac{\pi}{2}\sin(\pi x_j) | - | \mu=1 | [7] |
| genzs_example | Multiplicative | 1 | g(x_j)=a_j^{-2}+(x_j-u_j)^2 | u_j\sim U(0,1), a_j>0 | \mu=\prod_{j=1}^d (a_j^{-2}+\frac{1}{3}-u_j+u_j^2) | [7] |
| subcube_volume | Multiplicative | 1 | g(x_j)=H(a-x_j), with H(x)=1 for x\geq 0 and 0 for x<0 | - | \mu=a^d | [1] |
| high_dim_1 | Multiplicative | 1 | g(x_j)=1+0.01(x_j-0.5) | - | \mu=1 | [1] |
| high_dim_2 | Multiplicative | 1 | g(x_j)=1+\frac{0.01}{j}(x_j-0.5) | - | \mu=1 | [1] |
| high_dim_3 | Multiplicative | \sqrt{\frac{1}{d+1}} | g(x_j)=x_j^{\lambda_j-1} | \lambda_j=\sqrt{\frac{j}{j+1}} | \mu=\prod_{j=1}^d\frac{1}{\lambda_j} | [1] |
| high_dim_4 | Multiplicative | \frac{1}{(d-\frac{1}{2})^d} | g(x_j)=d-x_j | - | \mu=1 | [2] |
| joe_kuo_1 | Multiplicative | 1 | g(x_j)=\frac{|4x_j-2|+j^{1/3}}{1+j^{1/3}} | - | \mu=1 | [6] |
| lds_investigations_f1 | Additive | \sqrt{\frac{12}{d}} | g(x_j)=x_j-\frac{1}{2} | - | \mu=0 | [3] |
| lds_investigations_f2 | Additive | \sqrt{\frac{45}{4d}} | g(x_j)=x_j^2-\frac{1}{3} | - | \mu=0 | [3] |
| lds_investigations_f3 | Additive | \sqrt{\frac{18}{d}} | g(x_j)=x_j^{1/2}-\frac{2}{3} | - | \mu=0 | [3] |
| lds_investigations_f4 | Multiplicative | 1 | g(x_j)=-1 for x_j<0.5; 1 for x_j\geq 0.5 | - | \mu=0 | [3] |
| lds_investigations_f5 | Multiplicative | 1 | g(x_j)=K\frac{\exp(30x_j-15)-1}{\exp(30x_j-15)+1} | K=\left(\frac{15\exp(15)+15}{13\exp(15)+17}\right)^{1/2} | \mu=0 | [3] |
| lds_investigations_f6 | Multiplicative | 1 | g(x_j)=-2.4\sqrt{7}(x_j-1/2)+8\sqrt{7}(x_j-1/2)^3 | - | \mu=0 | [3] |
| lds_investigations_f7 | Multiplicative | 1 | g(x_j)=2\sqrt{3}(x_j-1/2) | - | \mu=0 | [3] |
| lds_investigations_f8 | Additive | \sqrt{\frac{2}{d(d-1)}} | f(x_i)f(x_j), j>i | f(x)=-1 for x\in(1/6,4/6); 0 for x\in\{1/6,4/6\}; 1 for x\in[0,1/6)\cup(4/6,1] | \mu=0 | [3] |
| lds_investigations_f9 | Additive | \sqrt{\frac{2}{d(d-1)}} | f(x_i)f(x_j), j>i | f(x)=27.20917094x^3-36.19250850x^2+8.983337562x+0.7702079855 | \mu=0 | [3] |
| optimization_f1 | Additive | \frac{4}{d-1} | g(x_j)=x_j x_{j+1} | - | \mu=1 | [5] |
| optimization_f2 | Additive | \frac{8}{d-1} | g(x_j)=x_j x_{j+1} x_{j+2} | - | \mu=1 | [5] |
| optimization_f3 | Multiplicative | 1 | g(x_j)=4\min(x_j,1-x_j) | - | \mu=1 | [5] |
| optimization_f4 | Multiplicative | 1 | g(x_j)=\exp(-a_j^2(x_j-u_j)^2) | u_j\sim U(0,1), a_j>0 | \mu=\prod^d_{j=1}\frac{\sqrt{\pi}(\text{erf}(a_j u_j)+\text{erf}(a_j(1-u_j)))}{2a_j} | [5] |
| optimization_f5 | Multiplicative | 1 | g(x_j)=\exp(-a_j|x_j-u_j|) | u_j\sim U(0,1), a_j>0 | \mu=\prod^d_{j=1}\frac{2-e^{-a_j u_j}-e^{-a_j(1-u_j)}}{a_j} | [5] |
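As referenced above, a minimal sketch of two representative integrands from the table (the function names mirror the table rows and are otherwise illustrative):

```python
import numpy as np

def sobol_1(x, a):
    """Multiplicative, c(d) = 1: prod_j (|4x_j - 2| + a_j)/(a_j + 1); mu = 1."""
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (a + 1.0), axis=-1)

def lds_investigations_f1(x):
    """Additive, c(d) = sqrt(12/d): sum_j (x_j - 1/2); mu = 0."""
    d = x.shape[-1]
    return np.sqrt(12.0 / d) * (x - 0.5).sum(axis=-1)

# Example usage on a small pseudo-random batch (n = 8 points, d = 4 dimensions).
x = np.random.default_rng(0).random((8, 4))
print(sobol_1(x, np.ones(4)), lds_investigations_f1(x))
```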

Non-decomposable test integrals

discont_surface/
(a) 3D projection of surface of discontinuous function.
discont_estimate_errors/
(b) Absolute numeric estimate error for discontinuous function.
Figure 15: a) 3D projection of the surface of the discontinuous function. The discontinuity is induced by the indicator functions, which result in the transition from 0 to exp(-ax) along the z-axis. b) Absolute estimate error for the discontinuous function. The gradient of tkrg-a-ap5 is the greatest, while that of pcg64 is the lowest.

This section contains test integrals that are neither additive nor multiplicative, i.e. whose integrands cannot be decomposed into sums or products of sub-functions over the nominal dimensions. Such test integrals are useful because a decomposition of an integrand into sums or products is not guaranteed in general. Thus, in this section, we present non-decomposable integrals and discuss their characteristics. In addition to the absolute numeric estimate errors, we provide 3D projections of the surfaces, from which the difficulty of the integral can be assessed qualitatively; since there is no simple method for calculating the effective dimension here, the 3D projections act as a substitute.

atanassov_surface/
(a) 3D projection of surface of Atanassov function.
atanassov_estimate_errors/
(b) Absolute numeric estimate error for Atanassov function.
Figure 16: a) 3D projection of surface of Atanassov function. Even though the function cannot be decomposed, it is smooth and continuous. b) The smoothness and continuity allows the LDS generators to effectively compete with the pseudo-random generator. Both the tkrg-a-ap5 and new-joe-kuo-6.21201 errors form a trough during their decrease.

Genz`s discontinuous function

We first examine the function introduced by Genz [18], visualized in Figure [15]. This function is of interest due to its discontinuity, which may prove challenging for numerical integration methods; the uniformity properties of the sequences become especially important here. Given random parameters u_j (drawn from the uniform distribution U(0,1)) and a_j (a_j > 0), the integrand is given by:
f(x) = \exp\left(-\sum_{j=1}^d a_j x_j\right)1_{x_1>u_1}1_{x_2>u_2}
For the purposes of illustration, both a_j and u_j are fixed at 0.5. The expected value of the discontinuous function for d = 50 is 1.201e-6 (3 d.p.). Figures [15a] and [15b] show the 3D surface projection and the absolute numeric estimate errors, respectively. A steady decrease in the estimate errors is observed for all three generators. Furthermore, the gradient of tkrg-a-ap5 is the greatest, while that of pcg64 is the lowest. The tkrg-a-ap5 errors are also less noisy than the others, indicating that its estimates are less sensitive to the discontinuity visible in Figure [15a].
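A sketch of this integrand and its QMC estimate, assuming SciPy's stats.qmc module (parameter defaults follow the fixed values used above):

```python
import numpy as np
from scipy.stats import qmc

def genz_discontinuous(x, a=0.5, u=0.5):
    """exp(-sum_j a*x_j), truncated to zero outside {x_1 > u, x_2 > u}."""
    indicator = (x[:, 0] > u) & (x[:, 1] > u)
    return np.exp(-a * x.sum(axis=1)) * indicator

pts = qmc.Sobol(50, scramble=True, seed=3).random_base2(m=14)
print(genz_discontinuous(pts).mean())   # compare against ~1.201e-6 for d = 50
```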

Atanassov function

Next, we investigate the integrand provided by Atanassov et al. [5]. Although the function is smooth, as seen in Figure [16], it cannot be decomposed into sub-integrands:
f(x) = \left(1+\frac{1}{d}\right)\left(\prod^d_{i=1}x_i\right)^{\frac{1}{d}}
The expected value of the Atanassov function for d = 50 is 0.379 (3 d.p.). While the 1/d power prevents the function from being decomposed, its smoothness and continuity allow for successful numerical estimation of the integral (Figure [16a]). These features also allow the LDS generators to effectively compete with the pseudo-random generator.
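A sketch of the integrand (computed via logarithms, since the raw product of many uniforms underflows long before the d-th root is taken):

```python
import numpy as np

def atanassov(x):
    """(1 + 1/d) * (prod_i x_i)^(1/d); the geometric mean is taken in log space."""
    d = x.shape[-1]
    return (1.0 + 1.0 / d) * np.exp(np.log(x).mean(axis=-1))

# The analytic value is (1 + 1/d)^(1 - d), approximately 0.379 for d = 50.
x = np.random.default_rng(0).random((2**14, 50))
print(atanassov(x).mean())
```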

fig16/
Figure 17: Value of c_N for each sample size N. c_N is lower for pseudorandom sequences than for LDS for sample sizes less than 10^4. tkrg-a-ap5 has a lower c_N than new-joe-kuo-6.21201 for all sample sizes, and the lowest c_N overall for sample sizes greater than 10^4.

Improper integrals

As noted in [16], using low discrepancy sequences allows one to evaluate improper integrals - those which cannot be computed using conventional quadrature techniques owing to a singularity of the integrand at the origin x = 0. As the singularity occurs only at the point x = 0, we can obtain successively better approximations to the integral's value by using an increasing number of points. As the point count increases, the points fill the [0,1] interval with greater uniformity, shrinking the interval [0, c_N] around the singularity, where c_N is the position of the point coordinate closest to x = 0 along any of the dimensions. The evolution of c_N with increasing N can thus be used as a measure of the uniformity of the point distribution; it is defined by:
c_N = \min_{1 \leq k \leq N}\ \min_{1 \leq j \leq d} x_{k,j}
Here, x_{k,j} is the jth coordinate of the kth point in the d-dimensional sequence; in effect, c_N is the smallest distance of any coordinate from zero. The values of c_N for different sample sizes N are shown in Figure [17] for pcg64, new-joe-kuo-6.21201 and tkrg-a-ap5. Larger sample sizes lead to lower values of c_N and thus more uniformly distributed sequences. While pcg64 dominates for sample sizes of less than 10^4, tkrg-a-ap5 is the most uniformly distributed for greater sample sizes. A sketch of this measurement is given below.
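A minimal sketch, assuming SciPy's stats.qmc module (the zero filter only guards against an unscrambled sequence that starts at the origin):

```python
import numpy as np
from scipy.stats import qmc

def c_n(points):
    """Smallest non-zero coordinate over all points and all dimensions."""
    return points[points > 0].min()

for m in (8, 10, 12, 14):
    pts = qmc.Sobol(5, scramble=True, seed=2).random_base2(m)
    print(2**m, c_n(pts))
```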

roos_arnold2_test_integral/
(a) Absolute numeric error for roos_and_arnold_2, d=50.
roos_arnold2_mean_effective_dimension/
(b) Mean effective (superposition) dimension for roos_and_arnold_2.
Figure 18: a) Shows the absolute error of the numeric estimate for the integral roos_and_arnold_2. The estimate errors associated with all three sequence generators increase with respect to sample size. The variance of pcg64 appears to be the greatest, while that of tkrg-a-ap5 is the smallest. The noisiness of the estimate errors is likely because the test integral is a product of modulus functions, which are non-smooth. b) The effective dimension (multiplicative superposition) for the integral roos_and_arnold_2. The effective and nominal dimension relationship is approximately linear with a gradient of 0.22; thus every nominal dimension contributes approximately 0.22 to the mean effective dimension.

Experiments

In the following section, we compare the generated sequences on the test integrals roos_and_arnold_2, roos_and_arnold_3 and joe_kuo_1 (Section [Classes of test integrals]). These are selected to illustrate the effect of integral properties such as effective dimension and discontinuities on the convergence of the estimators. All convergence plots are shown for the d = 50 versions of the integrals. The examples illustrate that low discrepancy sequences may produce comparable or better results, as well as exhibit different convergence properties, relative to pseudo-random sequences - and that this depends not only on the sequence but also on the integral. In the first of these experiments, shown in Figure [18], we consider the roos_and_arnold_2 function:
f(x) = \prod_{j=1}^{d} |4x_j - 2|
The function is formed as a product of linear `v`-type components in each dimension, yielding a highly non-smooth integrand whose mass concentrates on the edges of the [0,1]^d domain. These sharp features concentrate the mass in progressively smaller regions of the domain as dimensionality increases, causing large jumps in the estimate when a point `gets lucky` and lands in a high-value region, as shown in Figure [18a]. Another feature is that the effective dimension scales linearly with nominal dimension (Figure [18b]). As a result, low discrepancy sequences (QMC) do not produce improved convergence compared with pseudo-random sequences (MC). The second experiment uses the roos_and_arnold_3 function (Figure [19]):
f(x) = \prod_{j=1}^{d} \frac{\pi}{2} \sin(\pi x_j)
In contrast to roos_and_arnold_2, this function concentrates its mass around the midpoint of the unit interval (x_j = 0.5) and does not contain sharp transitions. Consequently, we see smoother variation and a gradual reduction in estimator error. As in the previous case, the effective dimension grows linearly with increasing nominal dimension, but with a lower gradient (0.18 for roos_and_arnold_3 compared with 0.22 for roos_and_arnold_2). This linear increase in effective dimension again potentially explains the similar performance of the QMC and MC estimators. It should, however, be noted that for certain sample counts n the QMC estimate error is orders of magnitude smaller than the MC estimate, potentially explained by the sequence being `balanced` at those counts. Finally, joe_kuo_1 is a function similar to roos_and_arnold_3, but it weights each dimension's contribution to the integral's value by means of the j^{1/3} term present in both numerator and denominator:

roos_arnold3_test_integral/
(a) Absolute numeric error for roos_and_arnold_3 d=50.
roos_arnold3_mean_effective_dimension/
(b) Mean effective (superposition) dimension for roos_and_arnold_3.
Figure 19: a) Shows the absolute error of the numeric estimate for the integral roos_and_arnold_3. The estimate errors of all three generators oscillate around 10^{-1}, although a gradual decrease is observed with respect to sample size. The estimate errors for new-joe-kuo-6.21201 and tkrg-a-ap5 appear to be periodic, exhibiting peaks and troughs; this could be related to the fact that the test integral is a product of periodic, sinusoidal functions. b) The effective dimension (multiplicative superposition) for the integral roos_and_arnold_3. The effective and nominal dimension relationship is approximately linear with a gradient of 0.18; thus every nominal dimension contributes approximately 0.18 to the mean effective dimension.

f(x) = \prod_{j=1}^{d} \frac{|4x_j - 2| + j^{1/3}}{1+j^{1/3}}
This decay in the contribution of higher dimensions causes the effective dimension of the integral to be low, reaching only approximately 1.7, compared to the nominal dimension (d = 50) of the integral, as shown in Figure [20]. While the effective dimension is low, the estimate convergence rate of QMC does not appear to improve over MC for the considered range of n, unlike the integral I_2 shown in Figure [13]. This may be attributed to the functional form of the components of this integral, which again concentrate mass around the edges of the [0,1]^d integration domain. Consequently, the improved convergence of QMC relative to MC likely manifests only at sample counts n greater than 10^5. This underlines the importance of considering all available information about the integral - not only the effective dimension - when using QMC.

joekuo1_test_integral/
(a) Absolute numeric error for the integral joe_kuo_1, d=50.
joekuo1_mean_effective_dimension/
(b) Mean effective (superposition) dimension for joe_kuo_1.
Figure 20: a) Shows the absolute error of the numeric estimate for the integral joe_kuo_1. The estimate errors associated with all three generators begin in the range [0.1, 1] and decrease at roughly the same rate. b) The mean effective (superposition) dimension increases at a roughly logarithmic rate, reaching approximately 1.6 at nominal dimension 40. The slow increase of mean effective dimension with respect to nominal dimension could contribute to the steady decrease of estimate errors of all three generators with respect to sample size.

