Title: | Miscellaneous Useful Functions for Statistics |
---|---|
Description: | A series of functions in some way considered useful to the author. These include methods for subsetting tables and generating indices for arrays, conditioning and intervening in probability distributions, generating combinations, fast transformations, and more... |
Authors: | Robin Evans [aut, cre], Mathias Drton [ctb] |
Maintainer: | Robin Evans <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.13.0 |
Built: | 2025-02-02 04:55:40 UTC |
Source: | https://github.com/rje42/rje |
Fast but loose implementations of AND and OR logical operators.
and0(x, y)
and0(x, y)
x , y
|
logical or numerical vectors |
Returns pairwise application of logical operators AND and OR. Vectors are recycled as usual.
A logical vector of length max(length(x), length(y))
with
entries x[1] & x[2]
etc.; each entry of x
or y
is
TRUE
if it is non-zero.
These functions should only be used with well understood vectors, and may not deal with unusual cases correctly.
and0(c(0,1,0), c(1,1,0)) ## Not run: set.seed(1234) x = rbinom(5000, 1, 0.5) y = rbinom(5000, 1, 0.5) # 3 to 4 times improvement over `&` system.time(for (i in 1:5000) and0(x,y)) system.time(for (i in 1:5000) x & y) ## End(Not run)
and0(c(0,1,0), c(1,1,0)) ## Not run: set.seed(1234) x = rbinom(5000, 1, 0.5) y = rbinom(5000, 1, 0.5) # 3 to 4 times improvement over `&` system.time(for (i in 1:5000) and0(x,y)) system.time(for (i in 1:5000) x & y) ## End(Not run)
Allows use of an Armijo rule or coarse line search as part of minimisation (or maximisation) of a differentiable function of multiple arguments (via gradient descent or similar). Repeated application of one of these rules should (hopefully) lead to a local minimum.
armijo( fun, x, dx, beta = 3, sigma = 0.5, grad, maximise = FALSE, searchup = TRUE, adj.start = 1, ... ) coarseLine(fun, x, dx, beta = 3, maximise = FALSE, ...)
armijo( fun, x, dx, beta = 3, sigma = 0.5, grad, maximise = FALSE, searchup = TRUE, adj.start = 1, ... ) coarseLine(fun, x, dx, beta = 3, maximise = FALSE, ...)
fun |
a function whose first argument is a numeric vector |
x |
a starting value to be passed to |
dx |
numeric vector containing feasible direction for search; defaults
to |
beta |
numeric value (greater than 1) giving factor by which to adjust step size |
sigma |
numeric value (less than 1) giving steepness criterion for move |
grad |
numeric gradient of |
maximise |
logical: if set to |
searchup |
logical: if set to |
adj.start |
an initial adjustment factor for the step size. |
... |
other arguments to be passed to |
coarseLine
performs a stepwise search and tries to find the integer
minimising
where
Note
may be negative. This is genearlly quicker and dirtier
than the Armijo rule.
armijo
implements an Armijo rule for moving, which is to say that
This has better convergence guarantees than a simple line search, but may be slower in practice. See Bertsekas (1999) for theory underlying the Armijo rule.
Each of these rules should be applied repeatedly to achieve convergence (see example below).
A list comprising
best |
the value of the function at the final point of evaluation |
adj |
the constant in the step, i.e.
|
move |
the final move; i.e. |
code |
an integer indicating the result of the function; 0 = returned
OK, 1 = very small move suggested, may be at minimum already, 2 = failed to
find minimum: function evaluated to |
coarseLine()
: Coarse line search
Robin Evans
Bertsekas, D.P. Nonlinear programming, 2nd Edition. Athena, 1999.
# minimisation of simple function of three variables x = c(0,-1,4) f = function(x) ((x[1]-3)^2 + sin(x[2])^2 + exp(x[3]) - x[3]) tol = .Machine$double.eps mv = 1 while (mv > tol) { # or replace with coarseLine() out = armijo(f, x, sigma=0.1) x = out$x mv = sum(out$move^2) } # correct solution is c(3,0,0) (or c(3,k*pi,0) for any integer k) x
# minimisation of simple function of three variables x = c(0,-1,4) f = function(x) ((x[1]-3)^2 + sin(x[2])^2 + exp(x[3]) - x[3]) tol = .Machine$double.eps mv = 1 while (mv > tol) { # or replace with coarseLine() out = armijo(f, x, sigma=0.1) x = out$x mv = sum(out$move^2) } # correct solution is c(3,0,0) (or c(3,k*pi,0) for any integer k) x
Returns a matrix containing each possible combination of one entry from vectors of the lengths provided.
combinations(p) powerSetMat(n)
combinations(p) powerSetMat(n)
p |
vector of non-negative integers. |
n |
non-negative integer. |
Returns a matrix, each row being one possible combination of integers from
the vectors , for
between 1 and
length(p)
.
Based on bincombinations
from package e1071
, which provides
the binary case.
powerSetMat
is just a wrapper for combinations(rep(2, n))
.
A matrix with number of columns equal to the length of p
, and
number of rows equal to ,
each row corresponding to a different combination. Ordering is
reverse-lexographic.
Robin Evans
combinations(c(2,3,3)) powerSetMat(3)
combinations(c(2,3,3)) powerSetMat(3)
Given a numeric array or matrix (of probabilities), calculates margins of some dimensions conditional on particular values of others.
conditionMatrix( x, variables, condition = NULL, condition.value = NULL, dim = NULL, incols = FALSE, undef = NaN ) conditionTable( x, variables, condition = NULL, condition.value = NULL, undef = NaN, order = TRUE ) conditionTable2(x, variables, condition, undef = NaN)
conditionMatrix( x, variables, condition = NULL, condition.value = NULL, dim = NULL, incols = FALSE, undef = NaN ) conditionTable( x, variables, condition = NULL, condition.value = NULL, undef = NaN, order = TRUE ) conditionTable2(x, variables, condition, undef = NaN)
x |
A numeric array. |
variables |
An integer vector containing the margins of interest from
|
condition |
An integer vector containing the dimensions of |
condition.value |
An integer vector or list of the same length as
|
dim |
Integer vector containing dimensions of variables. Assumed all binary if not specified. |
incols |
Logical specifying whether not the distributions are stored as the columns in the matrix; assumed to be rows by default. |
undef |
if conditional probability is undefined, what should the value be given as |
order |
logical - if |
conditionTable
calculates the marginal distribution over the
dimensions in variables
for each specified value of the dimensions in
condition
. Single or multiple values of each dimension in
condition
may be specified in condition.value
; in the case of
multiple values, condition.value
must be a list.
The sum over the dimensions in variables
is normalized to 1 for each
value of condition
.
conditionTable2
is just a wrapper which returns the conditional
distribution as an array of the same dimensions and ordering as the original
x
. Values are repeated as necessary.
conditionMatrix
takes a matrix whose rows (or columns if incols
= TRUE
) each represent a separate multivariate probability distribution and
finds the relevant conditional distribution in each case. These are then
returned in the same format. The order of the variables under
conditionMatrix
is always as in the original distribution, unlike for
conditionTable
above.
The probabilities are assumed in reverse lexicographic order, as in a flattened R array: i.e. the first value changes fastest: (1,1,1), (2,1,1), (1,2,1), ..., (2,2,2).
condition.table
and condition.table2
are identical to
conditionTable
and conditionTable2
.
conditionTable
returns an array whose first
length(variables)
corresponds to the dimensions in variables
,
and the remainder (if any) to dimensions in condition
with a
corresponding entry in condition.value
of length > 1.
conditionTable2
always returns an array of the same dimensions as
x
, with the variables in the same order.
conditionMatrix()
: Conditioning in matrix of distributions
conditionTable2()
: Conditioning whilst preserving all dimensions
Mathias Drton, Robin Evans
marginTable
, margin.table
,
interventionTable
x = array(1:16, rep(2,4)) x = x/sum(x) # probability distribution on 4 binary variables x1, x2, x3, x4. # distribution of x2, x3 given x1 = 1 and x4=2. conditionTable(x, c(2,3), c(1,4), c(1,2)) # x2, x3 given x1 = 1,2 and x4 = 2. conditionTable(x, c(2,3), c(1,4), list(1:2,2)) # complete conditional of x2, x3 given x1, x4 conditionTable(x, c(2,3), c(1,4)) # conditionTable2 leaves dimensions unchanged tmp = conditionTable2(x, c(2,3), c(1,4)) aperm(tmp, c(2,3,1,4)) #### set.seed(2314) # set of 10 2x2x2 probability distributions x = rdirichlet(10, rep(1,8)) conditionMatrix(x, 3, 1) conditionMatrix(x, 3, 1, 2)
x = array(1:16, rep(2,4)) x = x/sum(x) # probability distribution on 4 binary variables x1, x2, x3, x4. # distribution of x2, x3 given x1 = 1 and x4=2. conditionTable(x, c(2,3), c(1,4), c(1,2)) # x2, x3 given x1 = 1,2 and x4 = 2. conditionTable(x, c(2,3), c(1,4), list(1:2,2)) # complete conditional of x2, x3 given x1, x4 conditionTable(x, c(2,3), c(1,4)) # conditionTable2 leaves dimensions unchanged tmp = conditionTable2(x, c(2,3), c(1,4)) aperm(tmp, c(2,3,1,4)) #### set.seed(2314) # set of 10 2x2x2 probability distributions x = rdirichlet(10, rep(1,8)) conditionMatrix(x, 3, 1) conditionMatrix(x, 3, 1, 2)
Cube Helix is a colour scheme designed to be appropriate for screen display of intensity images. The scheme is intended to be monotonically increasing in brightness when displayed in greyscale. This might also provide improved visualisation for colour blindness sufferers.
cubeHelix(n, start = 0.5, r = -1.5, hue = 1, gamma = 1)
cubeHelix(n, start = 0.5, r = -1.5, hue = 1, gamma = 1)
n |
integer giving the number of colours in the scale |
start |
numeric: start gives the initial angle (in radians) of the helix |
r |
numeric: number of rotations of the helix over the scale; can be negative |
hue |
numeric controling the saturation of colour: 0 gives pure greyscale, defaults to 1 |
gamma |
numeric which can be used to emphasise lower or higher intensity values, defaults to 1 |
The function evaluates a helix which moves through the RGB "cube", beginning at black (0,0,0) and finishing at white (1,1,1). Evenly spaced points on this helix in the cube are returned as RGB colours. This provides a colour palette in which intensity increases monotonically, which makes for good transfer to greyscale displays or printouts. This also may have advantages for colour blindeness sufferers. See references for further details.
Vector of RGB colours (strings) of length 'n'.
Dave Green
Robin Evans
Green, D. A., 2011, A colour scheme for the display of astronomical intensity images. Bulletin of the Astronomical Society of India, 39, 289. https://ui.adsabs.harvard.edu/abs/2011BASI...39..289G/abstract
See Dave Green's page at https://www.mrao.cam.ac.uk/~dag/CUBEHELIX/ for other details.
rainbow
(for other colour palettes).
cubeHelix(21) ## Not run: cols = cubeHelix(101) plot.new() plot.window(xlim=c(0,1), ylim=c(0,1)) axis(side=1) for (i in 1:101) { rect((i-1)/101,0,(i+0.1)/101,1, col=cols[i], lwd=0) } ## End(Not run) ## Not run: require(grDevices) # comparison with other palettes n = 101 cols = cubeHelix(n) heat = heat.colors(n) rain = rainbow(n) terr = terrain.colors(n) plot.new() plot.window(xlim=c(-0.5,1), ylim=c(0,4)) axis(side=1, at=c(0,1)) axis(side=2, at=1:4-0.5, labels=1:4, pos=0) for (i in 1:n) { rect((i-1)/n,3,(i+0.1)/n,3.9, col=cols[i], lwd=0) rect((i-1)/n,2,(i+0.1)/n,2.9, col=heat[i], lwd=0) rect((i-1)/n,1,(i+0.1)/n,1.9, col=rain[i], lwd=0) rect((i-1)/n,0,(i+0.1)/n,0.9, col=terr[i], lwd=0) } legend(-0.6,4,legend=c("4. cube helix", "3. heat", "2. rainbow", "1. terrain"), box.lwd=0) ## End(Not run)
cubeHelix(21) ## Not run: cols = cubeHelix(101) plot.new() plot.window(xlim=c(0,1), ylim=c(0,1)) axis(side=1) for (i in 1:101) { rect((i-1)/101,0,(i+0.1)/101,1, col=cols[i], lwd=0) } ## End(Not run) ## Not run: require(grDevices) # comparison with other palettes n = 101 cols = cubeHelix(n) heat = heat.colors(n) rain = rainbow(n) terr = terrain.colors(n) plot.new() plot.window(xlim=c(-0.5,1), ylim=c(0,4)) axis(side=1, at=c(0,1)) axis(side=2, at=1:4-0.5, labels=1:4, pos=0) for (i in 1:n) { rect((i-1)/n,3,(i+0.1)/n,3.9, col=cols[i], lwd=0) rect((i-1)/n,2,(i+0.1)/n,2.9, col=heat[i], lwd=0) rect((i-1)/n,1,(i+0.1)/n,1.9, col=rain[i], lwd=0) rect((i-1)/n,0,(i+0.1)/n,0.9, col=terr[i], lwd=0) } legend(-0.6,4,legend=c("4. cube helix", "3. heat", "2. rainbow", "1. terrain"), box.lwd=0) ## End(Not run)
Produces a matrix whose rows correspond to an orthogonal binary design matrix.
designMatrix(n)
designMatrix(n)
n |
integer containing the number of elements in the set. |
An integer matrix of dimension 2^n by 2^n containing 1 and -1.
The output matrix has orthogonal columns and is symmetric, so (up to a constant) is its own inverse. Operations with this matrix can be performed more efficiently using the fast Hadamard transform.
Robin Evans
designMatrix(3)
designMatrix(3)
Density function and random generation for Dirichlet distribution with
parameter vector alpha
.
ddirichlet(x, alpha, log = FALSE, tol = 1e-10) rdirichlet(n, alpha)
ddirichlet(x, alpha, log = FALSE, tol = 1e-10) rdirichlet(n, alpha)
x |
vector (or matrix) of points in sample space. |
alpha |
vector of Dirichlet hyper parameters. |
log |
logical; if TRUE, natural logarithm of density is returned. |
tol |
tolerance of vectors not summing to 1 and negative values. |
n |
number of random variables to be generated. |
If x
is a matrix, each row is taken to be a different point whose
density is to be evaluated. If the number of columns in (or length of, in
the
alpha
, the
vector sum to 1.
The k-dimensional Dirichlet distribution has density
assuming that and
, and zero otherwise.
If the sum of row entries in x
differs from 1 by more than
tol
,
is assumed to be
rdirichlet
returns a matrix, each row of which is an
independent draw
alpha
.
ddirichlet
returns a vector, each entry being the density of the
corresponding row of x
. If x
is a vector, then the output
will have length 1.
Robin Evans
https://en.wikipedia.org/wiki/Dirichlet_distribution
x = rdirichlet(10, c(1,2,3)) x # Find densities at random points. ddirichlet(x, c(1,2,3)) # Last column to be inferred. ddirichlet(x[,c(1,2)], c(1,2,3)) ddirichlet(x, matrix(c(1,2,3), 10, 3, byrow=TRUE))
x = rdirichlet(10, c(1,2,3)) x # Find densities at random points. ddirichlet(x, c(1,2,3)) # Last column to be inferred. ddirichlet(x[,c(1,2)], c(1,2,3)) ddirichlet(x, matrix(c(1,2,3), 10, 3, byrow=TRUE))
Functions to take the expit and logit of numerical vectors.
expit(x) logit(x)
expit(x) logit(x)
x |
vector of real numbers; for |
logit
implements the usual logit function, which is
and expit
its inverse:
It is assumed
that logit(0) = -Inf
and logit(1) = Inf
, and correspondingly
for expit
.
A real vector corresponding to the expits or logits of x
logit()
: logit function
Choosing very large (positive or negative) values to
apply to expit
may result in inaccurate inversion (see example
below).
Robin Evans
x = c(5, -2, 0.1) y = expit(x) logit(y) # Beware large values! logit(expit(100))
x = c(5, -2, 0.1) y = expit(x) logit(y) # Beware large values! logit(expit(100))
Passes vector through Hadamard orthogonal design matrix. Also known as the Fast Walsh-Hadamard transform.
fastHadamard(x, pad = FALSE)
fastHadamard(x, pad = FALSE)
x |
vector of values to be transformed |
pad |
optional logical asking whether vector not of length |
This is equivalent to multiplying by designMatrix(log2(length(x)))
but should run much faster
A vector of the same length as x
Robin Evans
fastHadamard(1:8) fastHadamard(1:5, pad=TRUE)
fastHadamard(1:8) fastHadamard(1:5, pad=TRUE)
Uses the fast method of Kennes and Smets (1990) to obtain Moebius and inverse Moebius transforms.
fastMobius(x, pad = FALSE) invMobius(x, pad = FALSE)
fastMobius(x, pad = FALSE) invMobius(x, pad = FALSE)
x |
vector to transform |
pad |
logical, should vector not of length 2^k be padded with zeroes? |
These are respectively equivalent to multiplying abs(subsetMatrix(k))
and subsetMatrix(k)
by x
, when x
has length , but is
much faster if
is large.
invMobius()
: inverse transform
x <- c(1,0,-1,2,4,3,2,1) M <- subsetMatrix(3) M %*% abs(M) %*% x invMobius(fastMobius(x))
x <- c(1,0,-1,2,4,3,2,1) M <- subsetMatrix(3) M %*% abs(M) %*% x invMobius(fastMobius(x))
Faster highly stripped down version of sapply()
fsapply(x, FUN)
fsapply(x, FUN)
x |
a vector (atomic or list) or an expression object. |
FUN |
the function to be applied to each element of |
This is just a wrapper for unlist(lapply(x, FUN))
, which will behave
as sapply
if FUN
returns an atomic vector of length 1 each
time.
Speed up over sapply is not dramatic, but can be useful in time critical code.
A vector of results of applying FUN
to x
.
Very loose version of sapply
which should really
only by used if you're confident about how FUN
is applied to each
entry in x
.
Robin Evans
x = list(1:1000) tmp = fsapply(x, sin) ## Not run: x = list() set.seed(142313) for (i in 1:1000) x[[i]] = rnorm(100) system.time(for (i in 1:100) sapply(x, function(x) last(x))) system.time(for (i in 1:100) fsapply(x, function(x) last(x))) ## End(Not run)
x = list(1:1000) tmp = fsapply(x, sin) ## Not run: x = list() set.seed(142313) for (i in 1:1000) x[[i]] = rnorm(100) system.time(for (i in 1:100) sapply(x, function(x) last(x))) system.time(for (i in 1:100) fsapply(x, function(x) last(x))) ## End(Not run)
Just a wrapper for comparing numerical values, for use with quicksort.
greaterThan(x, y)
greaterThan(x, y)
x |
A numeric vector. |
y |
A numeric vector. |
Just returns -1
if x
is less than y
, 1
if
x
is greater, and 0
if they are equal (according to
==
). The vectors wrap as usual if they are of different lengths.
An integer vector.
Robin Evans
`<`
for traditional Boolean operator.
greaterThan(4,6) # Use in sorting algorithm. quickSort(c(5,2,9,7,6), f=greaterThan) order(c(5,2,9,7,6))
greaterThan(4,6) # Use in sorting algorithm. quickSort(c(5,2,9,7,6), f=greaterThan) order(c(5,2,9,7,6))
Get inclusion maximal subsets from a list
inclusionMax(x, right = FALSE)
inclusionMax(x, right = FALSE)
x |
list containing the subsets |
right |
logical indicating whether right-most entry is always inclusion maximal |
Returns the inclusion maximal elements of x
. The
indicator right
may be set to TRUE
in order to indicate
that the right-most entry is always an inclusion maximal set over all earlier
sets.
letlist <- list(LETTERS[1:2], LETTERS[2:4], LETTERS[1:3]) inclusionMax(letlist)
letlist <- list(LETTERS[1:2], LETTERS[2:4], LETTERS[1:3]) inclusionMax(letlist)
Determines the relative vector positions of entries which are adjacent in an array.
indexBox(upp, lwr, dim)
indexBox(upp, lwr, dim)
upp |
A vector of non-negative integers, giving the distance in the positive direction from the centre in each co-ordinate. |
lwr |
A vector of non-positive integers, giving the negative distance from the centre. |
dim |
integer vector of array dimensions. |
Given a particular cell in an array, which are the entries within (for example) 1 unit in any direction? This function gives the (relative) value of such indices. See examples.
Indices may be repeated if the range exceeds the size of the array in any dimension.
An integer vector giving relative positions of the indices.
Robin Evans
arr = array(1:144, dim=c(3,4,3,4)) arr[2,2,2,3] # which are entries within 1 unit each each direction of 2,2,2,3? inds = 89 + indexBox(1,-1,c(3,4,3,4)) inds = inds[inds > 0 & inds <= 144] arrayInd(inds, c(3,4,3,4)) # what about just in second dimension? inds = 89 + indexBox(c(0,1,0,0),c(0,-1,0,0),c(3,4,3,4)) inds = inds[inds > 0 & inds <= 144] arrayInd(inds, c(3,4,3,4))
arr = array(1:144, dim=c(3,4,3,4)) arr[2,2,2,3] # which are entries within 1 unit each each direction of 2,2,2,3? inds = 89 + indexBox(1,-1,c(3,4,3,4)) inds = inds[inds > 0 & inds <= 144] arrayInd(inds, c(3,4,3,4)) # what about just in second dimension? inds = 89 + indexBox(c(0,1,0,0),c(0,-1,0,0),c(3,4,3,4)) inds = inds[inds > 0 & inds <= 144] arrayInd(inds, c(3,4,3,4))
Alternate between sets and integers representing sets of integers via bits
int2set(n, index = 1, simplify = FALSE) set2int(x, index = 1)
int2set(n, index = 1, simplify = FALSE) set2int(x, index = 1)
n |
integer respresenting a set |
index |
integer to start from |
simplify |
logical: return a single list if |
x |
list of sets |
Converts an integer into its binary representation and interprets this as a set of integers. Cannot handle sets with more than 31 elements.
For int2set
a list of sets one for each integer
supplied, for set2int
a vector of the same length as the number
of sets supplied.
set2int()
: Convert sets to integers
Calculate interventional distributions from a probability table or matrix of multivariate probability distributions.
interventionMatrix(x, variables, condition, dim = NULL, incols = FALSE) interventionTable(x, variables, condition)
interventionMatrix(x, variables, condition, dim = NULL, incols = FALSE) interventionTable(x, variables, condition)
x |
An array of probabilities. |
variables |
The margin for the intervention. |
condition |
The dimensions to be conditioned upon. |
dim |
Integer vector containing dimensions of variables. Assumed all binary if not specified. |
incols |
Logical specifying whether not the distributions are stored as the columns in the matrix; assumed to be rows by default. |
This just divides the joint distribution by
, where
is
variables
and is
condition
.
Under certain causal assumptions this is the interventional distribution
(i.e. if the direct causes of
are precisely
.)
intervention.table()
is identical to interventionTable()
.
A numerical array of the same dimension as .
interventionMatrix()
: Interventions in matrix of distributions
Robin Evans
Pearl, J., Causality, 2nd Edition. Cambridge University Press, 2009.
set.seed(413) # matrix of distributions p = rdirichlet(10, rep(1,16)) interventionMatrix(p, 3, 2) # take one in an array ap = array(p[1,], rep(2,4)) interventionTable(ap, 3, 2)
set.seed(413) # matrix of distributions p = rdirichlet(10, rep(1,16)) interventionMatrix(p, 3, 2) # take one in an array ap = array(p[1,], rep(2,4)) interventionTable(ap, 3, 2)
Determines whether one vector contains all the elements of another.
is.subset(x, y) x %subof% y
is.subset(x, y) x %subof% y
x |
vector. |
y |
vector. |
Determines whether or not every element of x
is also found in
y
. Returns TRUE
if so, and FALSE
if not.
A logical of length 1.
x %subof% y
: operator version
Robin Evans
is.subset(1:2, 1:3) is.subset(1:2, 2:3) 1:2 %subof% 1:3 1:2 %subof% 2:3
is.subset(1:2, 1:3) is.subset(1:2, 2:3) 1:2 %subof% 1:3 1:2 %subof% 2:3
Checks whether a numeric value is integral, up to machine or other specified prescision.
is.wholenumber(x, tol = .Machine$double.eps^0.5)
is.wholenumber(x, tol = .Machine$double.eps^0.5)
x |
numeric vector to be tested. |
tol |
The desired precision. |
A logical vector of the same length as x
, containing the
results of the test.
Robin Evans
x = c(0.5, 1, 2L, 1e-20) is.wholenumber(x)
x = c(0.5, 1, 2L, 1e-20) is.wholenumber(x)
Kronecker power of a matrix or vector
kronPower(x, n)
kronPower(x, n)
x |
matrix or vector |
n |
integer containing power to take |
This computes x %x% ... %x% x
for n
instances of x
.
Returns the last element of a list or vector.
last(x)
last(x)
x |
a list or vector. |
Designed to be faster than using tail()
or rev()
, and cleaner
than writing x[length(x)]
.
An object of the same type as x
of length 1 (or empty if
x
is empty).
Robin Evans
last(1:10)
last(1:10)
Computes the margin of a contingency table given as an array, by summing out over the dimensions not specified.
marginTable(x, margin = NULL, order = TRUE) marginMatrix(x, margin, dim = NULL, incols = FALSE, order = FALSE)
marginTable(x, margin = NULL, order = TRUE) marginMatrix(x, margin, dim = NULL, incols = FALSE, order = FALSE)
x |
a numeric array |
margin |
integer vector giving margin to be calculated (1 for rows, etc.) |
order |
logical - should indices of output be ordered as in the vector
|
dim |
Integer vector containing dimensions of variables. Assumed all binary if not specified. |
incols |
Logical specifying whether not the distributions are stored as the columns in the matrix; assumed to be rows by default. |
With order = TRUE
this is the same as the base function
margin.table()
, but faster.
With order = FALSE
the function is even faster, but the indices in
the margin are returned in their original order, regardless of the way they
are specified in margin
.
propTable()
returns a renormalized contingency table whose entries
sum to 1. It is equivalent to prop.table()
, but faster.
The relevant marginal table. The class of x
is copied to the
output table, except in the summation case.
Original functions are margin.table
and
prop.table
.
m <- matrix(1:4, 2) marginTable(m, 1) marginTable(m, 2) propTable(m, 2) # 3-way example m <- array(1:8, rep(2,3)) marginTable(m, c(2,3)) marginTable(m, c(3,2)) marginTable(m, c(3,2), order=FALSE) #' set.seed(2314) # set of 10 2x2x2 probability distributions x = rdirichlet(10, rep(1,8)) marginMatrix(x, c(1,3)) marginMatrix(t(x), c(1,3), incols=TRUE)
m <- matrix(1:4, 2) marginTable(m, 1) marginTable(m, 2) propTable(m, 2) # 3-way example m <- array(1:8, rep(2,3)) marginTable(m, c(2,3)) marginTable(m, c(3,2)) marginTable(m, c(3,2), order=FALSE) #' set.seed(2314) # set of 10 2x2x2 probability distributions x = rdirichlet(10, rep(1,8)) marginMatrix(x, c(1,3)) marginMatrix(t(x), c(1,3), incols=TRUE)
Match rows in a matrix with duplicates to set of unique values
match_rows(x, y, nomatch = NA_integer_)
match_rows(x, y, nomatch = NA_integer_)
x |
matrix with unique rows |
y |
matrix to be matched |
nomatch |
value to insert when there is no match |
Recreate patterns for collapsed arrays
patternRepeat(x, which, n, careful = TRUE, keep.order = FALSE) patternRepeat0(which, n, careful = TRUE, keep.order = FALSE)
patternRepeat(x, which, n, careful = TRUE, keep.order = FALSE) patternRepeat0(which, n, careful = TRUE, keep.order = FALSE)
x |
A vector to be repeated. |
which |
Which indices of the implicit array are given in |
n |
Dimensions of implicit array. |
careful |
logical indicating whether to check vailidty of arguments, but therefore slow things down. |
keep.order |
logical indicating whether to respect the ordering of the
entries in the vector |
These functions allow for the construction of complex repeating patterns
corresponding to those obtained by unwrapping arrays. Consider an array
with dimensions n
; then for each value of the dimensions in
which
, this function returns a vector which places the corresponding
entry of x
into every place which would match this pattern when the
full array is unwrapped.
For example, if a full 4-way array has dimensions 2*2*2*2 and we consider the margin of variables 2 and 4, then the function returns the pattern c(1,1,2,2,1,1,2,2,3,3,4,4,3,3,4,4). The entries 1,2,3,4 correspond to the patterns (0,0), (1,0), (0,1) and (1,1) for the 2nd and 4th indices.
In patternRepeat()
the argument x
is repeated according to the
pattern, while patternRepeat0()
just returns the indexing pattern.
So patternRepeat(x,which,n)
is effectively equivalent to
x[patternRepeat0(which,n)]
.
The length of x
must be equal to prod(n[which])
.
Both return a vector of length prod(n)
;
patternRepeat()
one containing suitably repeated and ordered elements
of x
, for patternRepeat0()
it is always the integers from 1 up
to prod(n[which])
.
patternRepeat0()
: Stripped down version that just gives indices
Robin Evans
patternRepeat(1:4, c(1,2), c(2,2,2)) c(array(1:4, c(2,2,2))) patternRepeat0(c(1,3), c(2,2,2)) patternRepeat0(c(2,3), c(2,2,2)) patternRepeat0(c(3,1), c(2,2,2)) patternRepeat0(c(3,1), c(2,2,2), keep.order=TRUE) patternRepeat(letters[1:4], c(1,3), c(2,2,2))
patternRepeat(1:4, c(1,2), c(2,2,2)) c(array(1:4, c(2,2,2))) patternRepeat0(c(1,3), c(2,2,2)) patternRepeat0(c(2,3), c(2,2,2)) patternRepeat0(c(3,1), c(2,2,2)) patternRepeat0(c(3,1), c(2,2,2), keep.order=TRUE) patternRepeat(letters[1:4], c(1,3), c(2,2,2))
Produces the power set of a vector.
powerSet(x, m, rev = FALSE) powerSetCond(x, y, m, rev = FALSE, sort = FALSE)
powerSet(x, m, rev = FALSE) powerSetCond(x, y, m, rev = FALSE, sort = FALSE)
x |
vector of elements (the set). |
m |
maximum cardinality of subsets |
rev |
logical indicating whether to reverse the order of subsets. |
y |
set to condition on |
sort |
logical: should sets be sorted? |
Creates a list containing every subset
of the elements of the vector x
.
powerSet
returns subsets up to size m
(if this is specified).
powerSetCond
includes some non-empty subset of x
in every set.
A list of vectors of the same type as x
.
With rev = FALSE
(the default) the list is ordered such that all
subsets containing the last element of x
come after those which do
not, and so on.
powerSetCond()
: Add sets that can't be empty
Robin Evans
powerSet(1:3) powerSet(letters[3:5], rev=TRUE) powerSet(1:5, m=2) powerSetCond(2:3, y=1)
powerSet(1:3) powerSet(letters[3:5], rev=TRUE) powerSet(1:5, m=2) powerSetCond(2:3, y=1)
Prints percentage (or alternatively just a count) of loop or similar process which has been completed to the standard output.
printPercentage(i, n, dp = 0, first = 1, last = n, prev = i - 1)
printPercentage(i, n, dp = 0, first = 1, last = n, prev = i - 1)
i |
the number of iterations completed. |
n |
total number of iterations. |
dp |
number of decimal places to display. |
first |
number of the first iteration for which this percentage was displayed |
last |
number of the final iteration for which this percentage will be displayed |
prev |
number of the previous iteration for which this percentage was displayed |
printPercentage
will use cat
to print the proportion of loops
which have been completed (i.e. i/n
) to the standard output. In
doing so it will erase the previous such percentage, except when i =
first
. A new line is added when i = last
, assuming that the loop is
finished.
NULL
This will fail to work nicely if other information is printed to the standard output
Robin Evans
x = numeric(100) for (i in 1:100) { x[i] = mean(rnorm(1e5)) printPercentage(i,100) } i = 0 repeat { i = i+1 if (runif(1) > 0.99) { break } printCount(i) } print("\n")
x = numeric(100) for (i in 1:100) { x[i] = mean(rnorm(1e5)) printPercentage(i,100) } i = 0 repeat { i = i+1 if (runif(1) > 0.99) { break } printCount(i) } print("\n")
Implements the quicksort algorithm for partial orderings based on pairwise comparisons.
quickSort(x, f = greaterThan, ..., random = TRUE)
quickSort(x, f = greaterThan, ..., random = TRUE)
x |
A list or vector of items to be sorted. |
f |
A function on two arguments for comparing elements of |
... |
other arguments to |
random |
logical - should a random pivot be chosen? (this is recommended) Otherwise middle element is used. |
Implements the usual quicksort algorithm, but may return the same positions
for items which are incomparable (or equal). Does not test the validity of
f
as a partial order.
If x
is a numeric vector with distinct entries, this behaves just
like rank
.
Returns an integer vector giving each element's position in the order (minimal element(s) is 1, etc).
Output may not be consistent for certain partial orderings (using random pivot), see example below. All results will be consistent with a total ordering which is itselft consistent with the true partial ordering.
f
is not checked to see that it returns a legitimate partial order,
so results may be meaningless if it is not.
Robin Evans
https://en.wikipedia.org/wiki/Quicksort.
set.seed(1) quickSort(powerSet(1:3), f=subsetOrder) quickSort(powerSet(1:3), f=subsetOrder) # slightly different answers, but both correposnding # to a legitimate total ordering.
set.seed(1) quickSort(powerSet(1:3), f=subsetOrder) quickSort(powerSet(1:3), f=subsetOrder) # slightly different answers, but both correposnding # to a legitimate total ordering.
Row-wise minima and maxima
rowMins(x) rowMaxs(x)
rowMins(x) rowMaxs(x)
x |
a numeric (or logical) matrix or data frame |
The function coerces x
to be a data frame and then
uses pmin
(pmax
) on it. This is the same as
apply(x, 1, min)
but generally faster if the number of rows
is large.
numeric vector of length nrow(x)
giving the row-wise minima (or maxima) of x
.
Wrapper functions to quickly generate discrete joint (or conditional) distributions using Dirichlets
rprobdist(dim, d, cond, alpha = 1)
rprobdist(dim, d, cond, alpha = 1)
dim |
the joint dimension of the probability table |
d |
number of dimensions |
cond |
optionally, vertices to condition upon |
alpha |
Dirichlet hyper parameter, defaults to 1 (flat density). |
rprobdist
gives an array of dimension dim
(recycled as necessary to have length d
, if this is
supplied) whose entries are probabilities drawn from a Dirichlet
distribution whose parameter vector has entries equal to
alpha
(appropriately recycled).
an array of appropriate dimensions
Uses as many gamma random variables as cells in the table, so will alter the random seed accordingly.
Robin Evans
rprobdist(2, 4) # 2x2x2x2 table rprobdist(c(2,3,2)) # 2x3x2 table rprobdist(2, 4, alpha=1/16) # using unit information prior # get variables 2 and 4 conditioned upon rprobdist(2, 4, cond=c(2,4), alpha=1/16)
rprobdist(2, 4) # 2x2x2x2 table rprobdist(c(2,3,2)) # 2x3x2 table rprobdist(2, 4, alpha=1/16) # using unit information prior # get variables 2 and 4 conditioned upon rprobdist(2, 4, cond=c(2,4), alpha=1/16)
Obtain generalized Schur complement
schur(M, x, y, z)
schur(M, x, y, z)
M |
symmetric positive definite matrix |
x , y , z
|
indices of M to calculate with (see below) |
Calculates , which
(if M is a Gaussian covariance matrix) is the covariance between
x and y after conditioning on z.
y defaults to equal x, and z to be the complement of .
Series of functions extending existing vector operations to lists of vectors.
setmatch(x, y, nomatch = NA_integer_) setsetequal(x, y) setsetdiff(x, y) subsetmatch(x, y, nomatch = NA_integer_) supersetmatch(x, y, nomatch = NA_integer_)
setmatch(x, y, nomatch = NA_integer_) setsetequal(x, y) setsetdiff(x, y) subsetmatch(x, y, nomatch = NA_integer_) supersetmatch(x, y, nomatch = NA_integer_)
x |
list of vectors. |
y |
list of vectors. |
nomatch |
value to be returned in the case when no match is found. Note that it is coerced to integer. |
'setmatch' checks whether each vector in the list 'x' is also contained in the list 'y', and if so returns position of the first such vector in 'y'. The ordering of the elements of the vector is irrelevant, as they are considered to be sets.
'subsetmatch' is similar to 'setmatch', except vectors in 'x' are searched to see if they are subsets of vectors in 'y'. Similarly 'supersetmatch' consideres if vectors in 'x' are supersets of vectors in 'y'.
'setsetdiff' is a setwise version of 'setdiff', and 'setsetequal' a setwise version of 'setequal'.
'setmatch' and 'subsetmatch' return a vector of integers of length the same as the list 'x'.
'setsetdiff' returns a sublist 'x'.
'setsetequal' returns a logical of length 1.
setsetequal()
: Test for equality of sets
setsetdiff()
: Setdiff for lists
subsetmatch()
: Test for subsets
supersetmatch()
: Test for supersets
These functions are not recursive, in the sense that they cannot be used to test lists of lists. They also do not reduce to the vector case.
Robin Evans
x = list(1:2, 1:3) y = list(1:4, 1:3) setmatch(x, y) subsetmatch(x, y) setsetdiff(x, y) x = list(1:3, 1:2) y = list(2:1, c(2,1,3)) setsetequal(x, y)
x = list(1:2, 1:3) y = list(1:4, 1:3) setmatch(x, y) subsetmatch(x, y) setsetdiff(x, y) x = list(1:3, 1:2) y = list(2:1, c(2,1,3)) setsetequal(x, y)
Check list of sets is nested
sets_nested(x)
sets_nested(x)
x |
list containing collection of sets |
If the sets are nested it returns an ordering, otherwise 'NA'.
Produces a matrix whose rows indicate what subsets of a set are included in which other subsets.
subsetMatrix(n)
subsetMatrix(n)
n |
integer containing the number of elements in the set. |
This function returns a matrix, with each row and column corresponding to a
subset of a hypothetical set of size n
, ordered lexographically. The
entry in row i
, column j
corresponds to whether or not the
subset associated with i
is a superset of that associated with
j
.
A 1 or -1 indicates that i
is a superset of j
, with the sign
referring to the number of fewer elements in j
. 0 indicates that
i
is not a superset of j
.
An integer matrix of dimension 2^n by 2^n.
The inverse of the output matrix is just abs(subsetMatrix(n))
.
Robin Evans
combinations
, powerSet
, designMatrix
.
subsetMatrix(3)
subsetMatrix(3)
A wrapper for is.subset
which returns set inclusions.
subsetOrder(x, y)
subsetOrder(x, y)
x |
A vector. |
y |
A vector of the same type as |
If x
is a subset of y
, returns -1
, for the reverse
returns 1
. If sets are equal or incomparable, it returns 0
.
A single integer, 0, -1 or 1.
Robin Evans
subsetOrder(2:4, 1:4) subsetOrder(2:4, 3:5)
subsetOrder(2:4, 1:4) subsetOrder(2:4, 3:5)
More flexible calls of [
on an array.
subtable(x, variables, levels, drop = TRUE) subarray(x, levels, drop = TRUE) subtable(x, variables, levels) <- value subarray(x, levels) <- value
subtable(x, variables, levels, drop = TRUE) subarray(x, levels, drop = TRUE) subtable(x, variables, levels) <- value subarray(x, levels) <- value
x |
An array. |
variables |
An integer vector containing the dimensions of |
levels |
A list or vector containing values to retain. |
drop |
Logical indicating whether dimensions with only 1 retained
should be dropped. Defaults to |
value |
Value to assign to entries in table. |
Essentially just allows more flexible calls of [
on an array.
subarray
requires the values for each dimension should be specified,
so for a array
x
,
subarray(x, list(1,2,1:2))
is just x[1,2,1:2]
.
subtable
allows unspecified dimensions to be retained automatically.
Thus, for example subtable(x, c(2,3), list(1, 1:2))
is
x[,1,1:2]
.
Returns an array of dimension sapply(value, length)
if
drop=TRUE
, otherwise specified dimensions of size 1 are
dropped. Dimensions which are unspecified in subtable
are never
dropped.
subarray()
: Flexible subsetting
subtable(x, variables, levels) <- value
: Assignment in a table
subarray(x, levels) <- value
: Assignment in an array
Mathias Drton, Robin Evans
x = array(1:8, rep(2,3)) subarray(x, c(2,1,2)) == x[2,1,2] x[2,1:2,2,drop=FALSE] subarray(x, list(2,1:2,2), drop=FALSE) subtable(x, c(2,3), list(1, 1:2))
x = array(1:8, rep(2,3)) subarray(x, c(2,1,2)) == x[2,1,2] x[2,1:2,2,drop=FALSE] subarray(x, list(2,1:2,2), drop=FALSE) subtable(x, c(2,3), list(1, 1:2))