Introduction to Jaatha

Paul Staab

jaatha 3.2.2

Jaatha is an estimation method that can use computer simulations to produce maximum-likelihood estimates even when the likelihood function can not be evaluated directly. It can be applied whenever it is feasible to conduct many simulations, but works best when the data is at least approximately possion distributed.

Jaatha was originally designed for demographic inference in evolutionary biology. Please also refer to the vignette

vignette("jaatha-evolution", package = jaatha)

if you are interested in this application.

Before we start, we need to load the package and set a seed to ensure that jaatha’s results are reproducible:

library(jaatha)
set.seed(112233)

The “real” data

Imagine that we have observed the following data

data_obs <- c(2, 8, 0, 6, 1, 3, 2, 2, 0, 7)
data_obs
##  [1] 2 8 0 6 1 3 2 2 0 7

and we assume that the data are independent samples from two Poisson distributions with parameters p1 and p2, respectively. The odd positions of the vector are samples from the first distribution, and the even positions are samples taken from the second distribution.

In order to run jaatha, we need first formalize this model and convert the data into a format that jaatha can work with.

Creating a model

The usual way to describe the data generating model for jaatha is trough a simulation function. The function takes model parameters as input, and simulates data according to the model. In our example of the mixed samples from Poisson distributions, we can use the function

sim_func <- function(x) rpois(10, x)
sim_func(c(p1 = 1, p2 = 10))
##  [1]  4 10  0 10  3  6  1  9  2  9

Simulation functions for jaatha must have exactly one argument, which is a vector of model parameters for which the simulation is conducted. There are no requirements on the return format of a simulation function from jaatha’s site, any R objects work with Jaatha. Here, the function returns an vector of ten integers.

Jaatha does not use the simulated data directly, but works on a number of summary statistics instead. Summary statistics are transformations of the data that capture an aspect of the data. The transformation is again described by a function that takes the simulation results as input, and returns a fix number of Poisson distributed values. Here, this already applies to the simulation results, and we can just use them:

sum_stats <- list(create_jaatha_stat("id", function(x, opts) x))

Note that we create a list containing our statistic. In our example, we’ll use just one statistic, but it is possible to add more than one statistic to this list. Please refer to the documentation for create_jaatha_stat for additional information, in particular if you can not generate Poisson distributed statistics from the simulation results easily.

Next, we need to define the possible values for the model parameters. These range should cover all reasonable values that the parameters can take:

par_ranges <- matrix(c(0.1, 0.1, 10, 10), 2, 2)
rownames(par_ranges) <- c("x", "y")
colnames(par_ranges) <- c("min", "max")
par_ranges
##   min max
## x 0.1  10
## y 0.1  10

This three components – a simulation function, parameter ranges and a list of summary statistics – are required to describe an probabilistic framework within witch jaatha can fit parameters. Since we have the pieces together now, we can use the create_jaatha_model function to combine them into a formal model that we can pass to the jaatha function later:

jaatha_model <- create_jaatha_model(sim_func, par_ranges, sum_stats)
## A simulation takes less than a second

The function performs a test simulation to ensure that the components fit together. Again, the documentation for create_jaatha_model provides additional details.

Preparing the Data

Use the create_jaatha_data function to prepare the observed data for being used in Jaatha. The function expected the data to be in the format that is also returned by simulation function. In our example, sim_func returns a numeric vector of length 10, and data_obs already has the same format. So we can import it

jaatha_data <- create_jaatha_data(data_obs, jaatha_model)

Running Jaatha

Now that we have prepared model and data, we can use the jaatha function to estimate parameters:

estimates <- jaatha(jaatha_model, jaatha_data, 
                    sim = 50, repetitions = 2, verbose = FALSE)

Here, were are conducting 50 simulations in each step of the optimization procedure and repeat the complete optimization two times from different starting positions. For real applications, higher values are recommended.

In this simple toy example, the above values work quite well:

estimates
##        x        y 
## 1.098902 5.059388