Julia simGH: simulating times to event from a general hazard structure

The simGH command.

The simGH command from the HazReg.jl Julia package allows one to simulate times to event from the following models:

  • General Hazard (GH) model [2] [3].

  • Accelerated Failure Time (AFT) model [4].

  • Proportional Hazards (PH) model [5].

  • Accelerated Hazards (AH) model [6].

A description of these hazard models is presented below as well as the available baseline hazards.

General Hazard model

The GH model is formulated in terms of the hazard structure

\[h(t; \alpha, \beta, \theta, {\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

where ${\bf x}\in{\mathbb R}^p$ are the covariates that affect the hazard level; $\tilde{\bf x} \in {\mathbb R}^q$ are the covariates the affect the time level (typically $\tilde{\bf x} \subset {\bf x}$); $\alpha \in {\mathbb R}^q$ and $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

This hazard structure leads to an identifiable model as long as the baseline hazard is not a hazard associated to a member of the Weibull family of distributions [2].

Accelerated Failure Time (AFT) model

The AFT model is formulated in terms of the hazard structure

\[h(t; \beta, \theta, {\bf x}) = h_0\left(t \exp\{{\bf x}^{\top}\beta\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

where ${\bf x}\in{\mathbb R}^p$ are the available covariates; $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

Proportional Hazards (PH) model

The PH model is formulated in terms of the hazard structure

\[h(t; \beta, \theta, {\bf x}) = h_0\left(t ; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

where ${\bf x}\in{\mathbb R}^p$ are the available covariates; $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

Accelerated Hazards (AH) model

The AH model is formulated in terms of the hazard structure

\[h(t; \alpha, \theta, \tilde{\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) .\]

where $\tilde{\bf x}\in{\mathbb R}^q$ are the available covariates; $\alpha \in {\mathbb R}^q$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

Available baseline hazards

The current version of the simGH command implements the following parametric baseline hazards for the models discussed in the previous section.

Illustrative example: Julia code

In this example, we simulate $n=1,000$ times to event from the GH, PH, AFT, and AH models with PGW baseline hazards, using the Julia simGH command from the HazReg package. We censor these samples at a fixed value, and fit the corresponding models using the Julia package HazReg.

See also:

PGW-GH model

# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))
des_t = rand(dist, n)


#----------------------------
# PGW-GH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
alpha0 = 0.5
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = des_t,
      theta = theta0, alpha = alpha0, beta = beta0,
      hstr = "GH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWGH = GHMLE(init = fill(0.0, 3 + 1 + size(des)[2]), times = simdat,
            status = status, hstr = "GH", dist = "PGW",
            des = des, des_t = des_t, method = "NM", maxit = 1000)

MLEPGWGH = [exp(OPTPGWGH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWGH, OPTPGWGH[1].minimizer[4:end])

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,alpha0,beta0),MLEPGWGH);
COMP = DataFrame(COMP, :auto);

rename!( COMP, ["True", "MLE"] );
println(COMP)
6×2 DataFrame
 Row  True     MLE         Float64  Float64    
─────┼─────────────────────
   1 │    0.1    0.0987151
   2 │    2.0    2.00311
   3 │    5.0    5.06478
   4 │    0.5    0.498934
   5 │   -0.5   -0.489382
   6 │    0.75   0.760255

PGW-PH model

# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-PH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = nothing,
      theta = theta0, alpha = nothing, beta = beta0,
      hstr = "PH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWPH = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "PH", dist = "PGW",
            des = des, des_t = nothing, method = "NM", maxit = 1000)

MLEPGWPH = [exp(OPTPGWPH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWPH, OPTPGWPH[1].minimizer[4:end])

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWPH);
COMP = DataFrame(COMP, :auto);

rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
 Row  True     MLE         Float64  Float64    
─────┼─────────────────────
   1 │    0.1    0.0987943
   2 │    2.0    2.0111
   3 │    5.0    5.08918
   4 │   -0.5   -0.49086
   5 │    0.75   0.764558

PGW-AFT model

# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-AFT simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = nothing,
      theta = theta0, alpha = nothing, beta = beta0,
      hstr = "AFT", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWAFT = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "AFT", dist = "PGW",
            des = des, des_t = nothing, method = "NM", maxit = 1000)

MLEPGWAFT = [exp(OPTPGWAFT[1].minimizer[j]) for j in 1:3]
append!(MLEPGWAFT, OPTPGWAFT[1].minimizer[4:end])

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWAFT);
COMP = DataFrame(COMP, :auto);

rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
 Row  True     MLE        Float64  Float64   
─────┼────────────────────
   1 │    0.1    0.100572
   2 │    2.0    1.97949
   3 │    5.0    4.96817
   4 │   -0.5   -0.482975
   5 │    0.75   0.764053

PGW-AH model

# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des_t = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-AH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
alpha0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = nothing, des_t = des_t,
      theta = theta0, alpha = alpha0, beta = nothing,
      hstr = "AH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWAH = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "AH", dist = "PGW",
            des = nothing, des_t = des_t, method = "NM", maxit = 1000)

MLEPGWAH = [exp(OPTPGWAH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWAH, OPTPGWAH[1].minimizer[4:end])

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWAH);
COMP = DataFrame(COMP, :auto);

rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
 Row  True     MLE        Float64  Float64   
─────┼────────────────────
   1 │    0.1    0.103402
   2 │    2.0    1.94664
   3 │    5.0    4.85217
   4 │   -0.5   -0.484067
   5 │    0.75   0.732805
[2]
Y. Chen and N. Jewell. On a general class of semiparametric hazards regression models. Biometrika 88, 687–702 (2001).
[3]
F. Rubio, L. Remontet, N. Jewell and A. Belot. On a general structure for hazard-based regression models: an application to population-based cancer research. Statistical Methods in Medical Research 28, 2404–2417 (2019).
[4]
J. Kalbfleisch and R. Prentice. The statistical analysis of failure time data (John Wiley & Sons, 2011).
[5]
D. Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202 (1972).
[6]
Y. Chen and M. Wang. Analysis of accelerated hazards models. Journal of the American Statistical Association 95, 608–618 (2000).