Julia simGH: simulating times to event from a general hazard structure

The `simGH` command.

The simGH command from the HazReg.jl Julia package allows one to simulate times to event from the following models:

General Hazard (GH) model [2] [3].
Accelerated Failure Time (AFT) model [4].
Proportional Hazards (PH) model [5].
Accelerated Hazards (AH) model [6].

A description of these hazard models is presented below as well as the available baseline hazards.

General Hazard model

The GH model is formulated in terms of the hazard structure

\[h(t; \alpha, \beta, \theta, {\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

where ${\bf x}\in{\mathbb R}^p$ are the covariates that affect the hazard level; $\tilde{\bf x} \in {\mathbb R}^q$ are the covariates the affect the time level (typically $\tilde{\bf x} \subset {\bf x}$); $\alpha \in {\mathbb R}^q$ and $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

This hazard structure leads to an identifiable model as long as the baseline hazard is not a hazard associated to a member of the Weibull family of distributions [2].

Accelerated Failure Time (AFT) model

The AFT model is formulated in terms of the hazard structure

\[h(t; \beta, \theta, {\bf x}) = h_0\left(t \exp\{{\bf x}^{\top}\beta\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

where ${\bf x}\in{\mathbb R}^p$ are the available covariates; $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

Proportional Hazards (PH) model

The PH model is formulated in terms of the hazard structure

\[h(t; \beta, \theta, {\bf x}) = h_0\left(t ; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

Accelerated Hazards (AH) model

The AH model is formulated in terms of the hazard structure

\[h(t; \alpha, \theta, \tilde{\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) .\]

where $\tilde{\bf x}\in{\mathbb R}^q$ are the available covariates; $\alpha \in {\mathbb R}^q$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

Available baseline hazards

The current version of the simGH command implements the following parametric baseline hazards for the models discussed in the previous section.

Power Generalised Weibull (PGW) distribution.
Exponentiated Weibull (EW) distribution.
Generalised Gamma (GenGamma) distribuiton.
Gamma (Gamma) distribution.
Lognormal (LogNormal) distribution.
Log-logistic (LogLogistic) distribution.
Weibull (Weibull) distribution. (only for AFT, PH, and AH models)

Illustrative example: Julia code

In this example, we simulate $n=1,000$ times to event from the GH, PH, AFT, and AH models with PGW baseline hazards, using the Julia simGH command from the HazReg package. We censor these samples at a fixed value, and fit the corresponding models using the Julia package HazReg.

PGW-GH model

# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))
des_t = rand(dist, n)


#----------------------------
# PGW-GH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
alpha0 = 0.5
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = des_t,
      theta = theta0, alpha = alpha0, beta = beta0,
      hstr = "GH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWGH = GHMLE(init = fill(0.0, 3 + 1 + size(des)[2]), times = simdat,
            status = status, hstr = "GH", dist = "PGW",
            des = des, des_t = des_t, method = "NM", maxit = 1000)

MLEPGWGH = [exp(OPTPGWGH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWGH, OPTPGWGH[1].minimizer[4:end])

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,alpha0,beta0),MLEPGWGH);
COMP = DataFrame(COMP, :auto);

rename!( COMP, ["True", "MLE"] );
println(COMP)

6×2 DataFrame
 Row │ True     MLE        
     │ Float64  Float64    
─────┼─────────────────────
   1 │    0.1    0.0987151
   2 │    2.0    2.00311
   3 │    5.0    5.06478
   4 │    0.5    0.498934
   5 │   -0.5   -0.489382
   6 │    0.75   0.760255

PGW-PH model

# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-PH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = nothing,
      theta = theta0, alpha = nothing, beta = beta0,
      hstr = "PH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWPH = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "PH", dist = "PGW",
            des = des, des_t = nothing, method = "NM", maxit = 1000)

MLEPGWPH = [exp(OPTPGWPH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWPH, OPTPGWPH[1].minimizer[4:end])

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWPH);
COMP = DataFrame(COMP, :auto);

rename!( COMP, ["True", "MLE"] );
println(COMP)

5×2 DataFrame
 Row │ True     MLE        
     │ Float64  Float64    
─────┼─────────────────────
   1 │    0.1    0.0987943
   2 │    2.0    2.0111
   3 │    5.0    5.08918
   4 │   -0.5   -0.49086
   5 │    0.75   0.764558

PGW-AFT model

# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-AFT simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = nothing,
      theta = theta0, alpha = nothing, beta = beta0,
      hstr = "AFT", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWAFT = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "AFT", dist = "PGW",
            des = des, des_t = nothing, method = "NM", maxit = 1000)

MLEPGWAFT = [exp(OPTPGWAFT[1].minimizer[j]) for j in 1:3]
append!(MLEPGWAFT, OPTPGWAFT[1].minimizer[4:end])

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWAFT);
COMP = DataFrame(COMP, :auto);

rename!( COMP, ["True", "MLE"] );
println(COMP)

5×2 DataFrame
 Row │ True     MLE       
     │ Float64  Float64   
─────┼────────────────────
   1 │    0.1    0.100572
   2 │    2.0    1.97949
   3 │    5.0    4.96817
   4 │   -0.5   -0.482975
   5 │    0.75   0.764053

PGW-AH model

# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des_t = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-AH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
alpha0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = nothing, des_t = des_t,
      theta = theta0, alpha = alpha0, beta = nothing,
      hstr = "AH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWAH = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "AH", dist = "PGW",
            des = nothing, des_t = des_t, method = "NM", maxit = 1000)

MLEPGWAH = [exp(OPTPGWAH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWAH, OPTPGWAH[1].minimizer[4:end])

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWAH);
COMP = DataFrame(COMP, :auto);

rename!( COMP, ["True", "MLE"] );
println(COMP)

5×2 DataFrame
 Row │ True     MLE       
     │ Float64  Float64   
─────┼────────────────────
   1 │    0.1    0.103402
   2 │    2.0    1.94664
   3 │    5.0    4.85217
   4 │   -0.5   -0.484067
   5 │    0.75   0.732805

[2]: Y. Chen and N. Jewell. On a general class of semiparametric hazards regression models. Biometrika 88, 687–702 (2001).
[3]: F. Rubio, L. Remontet, N. Jewell and A. Belot. On a general structure for hazard-based regression models: an application to population-based cancer research. Statistical Methods in Medical Research 28, 2404–2417 (2019).
[4]: J. Kalbfleisch and R. Prentice. The statistical analysis of failure time data (John Wiley & Sons, 2011).
[5]: D. Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202 (1972).
[6]: Y. Chen and M. Wang. Analysis of accelerated hazards models. Journal of the American Statistical Association 95, 608–618 (2000).