Julia simGH: simulating times to event from a general hazard structure
The simGH
command.
The simGH command from the HazReg.jl
Julia package allows one to simulate times to event from the following models:
Accelerated Failure Time (AFT) model [4].
Proportional Hazards (PH) model [5].
Accelerated Hazards (AH) model [6].
A description of these hazard models is presented below as well as the available baseline hazards.
General Hazard model
The GH model is formulated in terms of the hazard structure
\[h(t; \alpha, \beta, \theta, {\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]
where ${\bf x}\in{\mathbb R}^p$ are the covariates that affect the hazard level; $\tilde{\bf x} \in {\mathbb R}^q$ are the covariates the affect the time level (typically $\tilde{\bf x} \subset {\bf x}$); $\alpha \in {\mathbb R}^q$ and $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.
This hazard structure leads to an identifiable model as long as the baseline hazard is not a hazard associated to a member of the Weibull family of distributions [2].
Accelerated Failure Time (AFT) model
The AFT model is formulated in terms of the hazard structure
\[h(t; \beta, \theta, {\bf x}) = h_0\left(t \exp\{{\bf x}^{\top}\beta\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]
where ${\bf x}\in{\mathbb R}^p$ are the available covariates; $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.
Proportional Hazards (PH) model
The PH model is formulated in terms of the hazard structure
\[h(t; \beta, \theta, {\bf x}) = h_0\left(t ; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]
where ${\bf x}\in{\mathbb R}^p$ are the available covariates; $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.
Accelerated Hazards (AH) model
The AH model is formulated in terms of the hazard structure
\[h(t; \alpha, \theta, \tilde{\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) .\]
where $\tilde{\bf x}\in{\mathbb R}^q$ are the available covariates; $\alpha \in {\mathbb R}^q$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.
Available baseline hazards
The current version of the simGH
command implements the following parametric baseline hazards for the models discussed in the previous section.
Power Generalised Weibull (PGW) distribution.
Exponentiated Weibull (EW) distribution.
Generalised Gamma (GenGamma) distribuiton.
Gamma (Gamma) distribution.
Lognormal (LogNormal) distribution.
Log-logistic (LogLogistic) distribution.
Weibull (Weibull) distribution. (only for AFT, PH, and AH models)
Illustrative example: Julia code
In this example, we simulate $n=1,000$ times to event from the GH, PH, AFT, and AH models with PGW baseline hazards, using the Julia simGH
command from the HazReg
package. We censor these samples at a fixed value, and fit the corresponding models using the Julia package HazReg
.
See also:
PGW-GH model
# Required packages
using HazReg
using Distributions
using Random
using DataFrames
# Sample size
n = 10000
# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))
des_t = rand(dist, n)
#----------------------------
# PGW-GH simulation
#----------------------------
# True parameters
theta0 = [0.1,2.0,5.0]
alpha0 = 0.5
beta0 = [-0.5,0.75]
# censoring
cens = 10
# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = des_t,
theta = theta0, alpha = alpha0, beta = beta0,
hstr = "GH", dist = "PGW")
# status variable
status = collect(Bool,(simdat .< cens))
# Inducing censoring
simdat = min.(simdat, cens)
# Model fit
OPTPGWGH = GHMLE(init = fill(0.0, 3 + 1 + size(des)[2]), times = simdat,
status = status, hstr = "GH", dist = "PGW",
des = des, des_t = des_t, method = "NM", maxit = 1000)
MLEPGWGH = [exp(OPTPGWGH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWGH, OPTPGWGH[1].minimizer[4:end])
# True parameter values vs. MLE
COMP = hcat(vcat(theta0,alpha0,beta0),MLEPGWGH);
COMP = DataFrame(COMP, :auto);
rename!( COMP, ["True", "MLE"] );
println(COMP)
6×2 DataFrame
Row │ True MLE
│ Float64 Float64
─────┼─────────────────────
1 │ 0.1 0.0987151
2 │ 2.0 2.00311
3 │ 5.0 5.06478
4 │ 0.5 0.498934
5 │ -0.5 -0.489382
6 │ 0.75 0.760255
PGW-PH model
# Required packages
using HazReg
using Distributions
using Random
using DataFrames
# Sample size
n = 10000
# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))
#----------------------------
# PGW-PH simulation
#----------------------------
# True parameters
theta0 = [0.1,2.0,5.0]
beta0 = [-0.5,0.75]
# censoring
cens = 10
# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = nothing,
theta = theta0, alpha = nothing, beta = beta0,
hstr = "PH", dist = "PGW")
# status variable
status = collect(Bool,(simdat .< cens))
# Inducing censoring
simdat = min.(simdat, cens)
# Model fit
OPTPGWPH = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
status = status, hstr = "PH", dist = "PGW",
des = des, des_t = nothing, method = "NM", maxit = 1000)
MLEPGWPH = [exp(OPTPGWPH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWPH, OPTPGWPH[1].minimizer[4:end])
# True parameter values vs. MLE
COMP = hcat(vcat(theta0,beta0),MLEPGWPH);
COMP = DataFrame(COMP, :auto);
rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
Row │ True MLE
│ Float64 Float64
─────┼─────────────────────
1 │ 0.1 0.0987943
2 │ 2.0 2.0111
3 │ 5.0 5.08918
4 │ -0.5 -0.49086
5 │ 0.75 0.764558
PGW-AFT model
# Required packages
using HazReg
using Distributions
using Random
using DataFrames
# Sample size
n = 10000
# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))
#----------------------------
# PGW-AFT simulation
#----------------------------
# True parameters
theta0 = [0.1,2.0,5.0]
beta0 = [-0.5,0.75]
# censoring
cens = 10
# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = nothing,
theta = theta0, alpha = nothing, beta = beta0,
hstr = "AFT", dist = "PGW")
# status variable
status = collect(Bool,(simdat .< cens))
# Inducing censoring
simdat = min.(simdat, cens)
# Model fit
OPTPGWAFT = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
status = status, hstr = "AFT", dist = "PGW",
des = des, des_t = nothing, method = "NM", maxit = 1000)
MLEPGWAFT = [exp(OPTPGWAFT[1].minimizer[j]) for j in 1:3]
append!(MLEPGWAFT, OPTPGWAFT[1].minimizer[4:end])
# True parameter values vs. MLE
COMP = hcat(vcat(theta0,beta0),MLEPGWAFT);
COMP = DataFrame(COMP, :auto);
rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
Row │ True MLE
│ Float64 Float64
─────┼────────────────────
1 │ 0.1 0.100572
2 │ 2.0 1.97949
3 │ 5.0 4.96817
4 │ -0.5 -0.482975
5 │ 0.75 0.764053
PGW-AH model
# Required packages
using HazReg
using Distributions
using Random
using DataFrames
# Sample size
n = 10000
# Simulated design matrices
Random.seed!(123)
dist = Normal()
des_t = hcat(rand(dist, n), rand(dist, n))
#----------------------------
# PGW-AH simulation
#----------------------------
# True parameters
theta0 = [0.1,2.0,5.0]
alpha0 = [-0.5,0.75]
# censoring
cens = 10
# Data simulation
simdat = simGH(seed = 1234, n = n, des = nothing, des_t = des_t,
theta = theta0, alpha = alpha0, beta = nothing,
hstr = "AH", dist = "PGW")
# status variable
status = collect(Bool,(simdat .< cens))
# Inducing censoring
simdat = min.(simdat, cens)
# Model fit
OPTPGWAH = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
status = status, hstr = "AH", dist = "PGW",
des = nothing, des_t = des_t, method = "NM", maxit = 1000)
MLEPGWAH = [exp(OPTPGWAH[1].minimizer[j]) for j in 1:3]
append!(MLEPGWAH, OPTPGWAH[1].minimizer[4:end])
# True parameter values vs. MLE
COMP = hcat(vcat(theta0,beta0),MLEPGWAH);
COMP = DataFrame(COMP, :auto);
rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
Row │ True MLE
│ Float64 Float64
─────┼────────────────────
1 │ 0.1 0.103402
2 │ 2.0 1.94664
3 │ 5.0 4.85217
4 │ -0.5 -0.484067
5 │ 0.75 0.732805
- [2]
- Y. Chen and N. Jewell. On a general class of semiparametric hazards regression models. Biometrika 88, 687–702 (2001).
- [3]
- F. Rubio, L. Remontet, N. Jewell and A. Belot. On a general structure for hazard-based regression models: an application to population-based cancer research. Statistical Methods in Medical Research 28, 2404–2417 (2019).
- [4]
- J. Kalbfleisch and R. Prentice. The statistical analysis of failure time data (John Wiley & Sons, 2011).
- [5]
- D. Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202 (1972).
- [6]
- Y. Chen and M. Wang. Analysis of accelerated hazards models. Journal of the American Statistical Association 95, 608–618 (2000).