Minimally processes the data so that the result can be passed as an argument to the fitting function

create_data(
  data,
  min_number = 0,
  variable = "number",
  time = "year",
  date = "doy",
  asymmetric_model = TRUE,
  mu = ~1,
  sigma = ~1,
  covar_data = NULL,
  est_sigma_re = TRUE,
  est_mu_re = TRUE,
  tail_model = "student_t",
  family = "lognormal",
  max_theta = 10,
  share_shape = TRUE,
  nu_prior = c(2, 10),
  beta_prior = c(2, 1)
)

Arguments

data

A data frame

min_number

A minimum threshold to use, defaults to 0

variable

A character string of the name of the variable in 'data' that contains the response (e.g. counts)

time

A character string of the name of the variable in 'data' that contains the time variable (e.g. year)

date

A character string of the name of the variable in 'data' that contains the date variable (e.g. day of year). The column itself should be numeric, for example the result of lubridate::yday(x)
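As a minimal sketch of preparing such a column, the snippet below derives a numeric year and day of year from a Date column using base R. The data frame `raw` and its `sample_date` column are hypothetical, and `format(x, "%j")` is used here as a dependency-free stand-in for lubridate::yday(x).

```r
# Hypothetical raw observations with a Date column (not part of the package data)
raw <- data.frame(
  sample_date = as.Date(c("2020-06-01", "2020-06-15", "2021-07-04")),
  number = c(12, 40, 7)
)

# "%Y" and "%j" return the year and day of year as strings; convert to integers
raw$year <- as.integer(format(raw$sample_date, "%Y"))
raw$doy <- as.integer(format(raw$sample_date, "%j"))
```

The resulting `year` and `doy` columns match the default values of the time and date arguments.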

asymmetric_model

Boolean, whether to let the model be asymmetric (e.g. run timing before the peak has a different shape than run timing after the peak)
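To make the idea of asymmetry concrete, here is an illustrative sketch of a two-sided Gaussian curve with a different width on each side of the peak. The function `asym_curve` and its arguments are hypothetical and are not the package's internal parameterization; they only show what "a different shape before and after the peak" can look like.

```r
# Illustrative two-sided Gaussian curve; NOT the package's internal code.
# mu is the peak day; sigma_left and sigma_right are hypothetical widths
# applied before and after the peak, respectively.
asym_curve <- function(doy, mu, sigma_left, sigma_right) {
  sigma <- ifelse(doy < mu, sigma_left, sigma_right)
  exp(-0.5 * ((doy - mu) / sigma)^2)
}

doy <- 100:300
symmetric <- asym_curve(doy, mu = 200, sigma_left = 20, sigma_right = 20)
asymmetric <- asym_curve(doy, mu = 200, sigma_left = 10, sigma_right = 30)
```

With equal widths the curve is mirror-symmetric around day 200; with unequal widths the rise is steeper than the decline.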

mu

An optional formula allowing the mean to be a function of covariates. Random effects are not included in the formula but are specified with the est_mu_re argument

sigma

An optional formula allowing the standard deviation to be a function of covariates. For asymmetric models, each side of the distribution is allowed a different set of covariates. Random effects are not included in the formula but are specified with the est_sigma_re argument

covar_data

A data frame containing covariates specific to each time step. These are used in the formulas mu and sigma
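The sketch below shows one way such a covariate table might be built, with one row per time step, and how a one-sided formula maps onto it. The `years` vector and the covariate name `nyear` are hypothetical, and using `model.matrix()` to inspect the formula is an assumption made for illustration, not a statement about how the package builds its design matrices.

```r
# Hypothetical observation years; in practice these would be the unique
# values of the time variable in 'data'
years <- c(2018, 2019, 2020)

# One row per time step, with a trend covariate measured from the first year
# (the name `nyear` is illustrative)
cov_dat <- data.frame(nyear = years - min(years))

# Base R view of the design matrix implied by a one-sided formula such as ~ nyear
X <- model.matrix(~ nyear, data = cov_dat)

# Assumed call pattern, left commented because it needs the package and its data:
# datalist <- create_data(data, mu = ~ nyear, sigma = ~ nyear, covar_data = cov_dat)
```

Every variable named in the mu and sigma formulas needs a matching column in covar_data.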

est_sigma_re

Whether to estimate random effects by year in the sigma parameter controlling the tail of the distribution. Defaults to TRUE

est_mu_re

Whether to estimate random effects by year in the mu parameter controlling the location of the distribution. Defaults to TRUE

tail_model

Which tail model to fit: Gaussian ("gaussian"), Student-t ("student_t"), or generalized normal ("gnorm"). Defaults to "student_t"

family

Response family for the observation model. Options are "gaussian", "poisson", "negbin", "binomial", and "lognormal". The default ("lognormal") is not a true lognormal distribution, but a normal distribution on the log scale, in that it assumes log(y) ~ Normal()
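One way to read the distinction between a true lognormal and log(y) ~ Normal() is through the log-likelihood: the two differ only by the Jacobian term log(y). The sketch below checks this with base R density functions. The values of `y`, `mu`, and `sigma` are hypothetical, and whether the package's likelihood omits the Jacobian in exactly this way is an assumption.

```r
y <- c(3, 12, 50)   # hypothetical positive responses
mu <- 2             # hypothetical mean on the log scale
sigma <- 0.8        # hypothetical standard deviation on the log scale

# Normal density evaluated on log(y), i.e. log(y) ~ Normal(mu, sigma)
ll_normal_log <- dnorm(log(y), mean = mu, sd = sigma, log = TRUE)

# True lognormal density evaluated on y
ll_lognormal <- dlnorm(y, meanlog = mu, sdlog = sigma, log = TRUE)

# The two log-densities differ only by the Jacobian term log(y)
jacobian_gap <- ll_normal_log - ll_lognormal
```

Because the gap does not depend on mu or sigma, both forms give the same parameter estimates; only the reported likelihood values differ.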

max_theta

Maximum value of log(pred) when limits=TRUE. Defaults to 10

share_shape

Boolean argument for whether asymmetric Student-t and generalized normal distributions should share the shape parameter (nu for the Student-t; beta for the generalized normal). Defaults to TRUE

nu_prior

Two-element vector (optional) for the penalized prior on the Student-t degrees of freedom. Defaults to a Gamma(shape=2, scale=10) distribution

beta_prior

Two-element vector (optional) for the penalized prior on the generalized normal beta. Defaults to a Normal(2, 1) distribution
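To get a feel for what the default priors imply, the sketch below evaluates them on a few values with base R density functions. The grids are hypothetical, and reading the second element of beta_prior as a standard deviation is an assumption (with the default value of 1 the standard deviation and variance coincide).

```r
nu_prior <- c(2, 10)    # Gamma(shape = 2, scale = 10) on the Student-t df
beta_prior <- c(2, 1)   # Normal(2, 1) on the generalized normal beta

# Log prior density of nu at a few hypothetical values
nu_grid <- c(2, 10, 20, 50)
nu_logdens <- dgamma(nu_grid, shape = nu_prior[1], scale = nu_prior[2], log = TRUE)

# Log prior density of beta at a few hypothetical values
# (second element treated as the standard deviation, an assumption)
beta_grid <- c(1, 2, 3)
beta_logdens <- dnorm(beta_grid, mean = beta_prior[1], sd = beta_prior[2], log = TRUE)

# Mean of a Gamma distribution is shape * scale
nu_prior_mean <- nu_prior[1] * nu_prior[2]
```

Under these defaults the prior on nu has its mode at 10 and its mean at 20, and the prior on beta is centered at 2, the value at which a generalized normal reduces to a Gaussian.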

Examples

data(fishdist)
datalist <- create_data(fishdist,
  min_number = 0, variable = "number", time = "year",
  date = "doy", asymmetric_model = TRUE, family = "gaussian"
)