Exam Overview
The Advanced Statistics for Actuarial Modeling (ASTAM) exam tests candidates on the advanced statistical techniques used in actuarial work, focusing on model selection, model validation, and advanced regression methods. The exam is 3 hours and 15 minutes long, with a mix of multiple-choice and written-answer questions.
Linear Models and Regression Analysis
Multiple Linear Regression
The fundamental equation:
y = Xβ + ε
Where:
- y is the n×1 vector of responses
- X is the n×p design matrix
- β is the p×1 vector of parameters
- ε is the n×1 vector of errors
Parameter estimation (OLS):
β̂ = (X'X)^(-1)X'y
Variance of parameter estimates:
Var(β̂) = σ²(X'X)^(-1)
Residual variance estimate (the residual standard error is its square root):
σ̂² = RSS/(n-p)
Where RSS = Σ(yi - ŷi)²
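The OLS formulas above can be verified numerically. A minimal sketch in Python/NumPy, using a small synthetic, noise-free dataset (an assumption for illustration) so β̂ recovers the true coefficients exactly:

```python
import numpy as np

# Synthetic noise-free data: y = 2 + 3*x, so OLS recovers beta exactly
X = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 4.0])])
y = 2.0 + 3.0 * X[:, 1]

# beta_hat = (X'X)^(-1) X'y via the normal equations;
# np.linalg.solve is numerically safer than forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat
rss = float(residuals @ residuals)

n, p = X.shape
sigma2_hat = rss / (n - p)  # residual variance estimate, RSS/(n-p)

print(beta_hat)   # ≈ [2. 3.]
print(sigma2_hat) # ≈ 0, since the data are noise-free
```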
Model Evaluation Metrics
R-squared:
R² = 1 - RSS/TSS
Where TSS = Σ(yi - ȳ)²
Adjusted R-squared:
R²_adj = 1 - (RSS/(n-p))/(TSS/(n-1))
Akaike Information Criterion (AIC):
AIC = -2ln(L) + 2p
Bayesian Information Criterion (BIC):
BIC = -2ln(L) + p×ln(n)
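A quick numerical check of R² and adjusted R², sketched in Python with small illustrative vectors (the values are made up for the example):

```python
import numpy as np

def r_squared(y, y_hat):
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - rss / tss          # R^2 = 1 - RSS/TSS

def adjusted_r_squared(y, y_hat, p):
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    # R^2_adj = 1 - (RSS/(n-p)) / (TSS/(n-1)); penalizes extra parameters
    return 1.0 - (rss / (n - p)) / (tss / (n - 1))

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y, y_hat))                 # 0.98: RSS=0.10, TSS=5.0
print(adjusted_r_squared(y, y_hat, p=2))   # 0.97, slightly below R^2
```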
Generalized Linear Models (GLMs)
Model Components
- Random Component: Y ~ Distribution from exponential family
- Systematic Component: η = Xβ
- Link Function: g(μ) = η
Common Link Functions
Logistic Regression:
g(μ) = ln(μ/(1-μ))
Poisson Regression:
g(μ) = ln(μ)
Gamma Regression:
g(μ) = 1/μ or ln(μ)
Deviance
Deviance = 2[l(y;y) - l(μ̂;y)]
Scaled Deviance:
D* = D/φ
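For a concrete instance of the deviance formula: in the Poisson case it reduces to D = 2Σ[yᵢ ln(yᵢ/μ̂ᵢ) - (yᵢ - μ̂ᵢ)], with the log term taken as 0 when yᵢ = 0. A sketch with illustrative numbers:

```python
import numpy as np

def poisson_deviance(y, mu_hat):
    """Poisson deviance 2*[l(y;y) - l(mu_hat;y)].

    The y*log(y/mu) term is defined as 0 when y == 0.
    """
    term = np.zeros_like(y)
    mask = y > 0
    term[mask] = y[mask] * np.log(y[mask] / mu_hat[mask])
    return 2.0 * np.sum(term - (y - mu_hat))

y = np.array([0.0, 1.0, 3.0, 2.0])
mu = np.array([0.5, 1.2, 2.5, 2.0])
print(poisson_deviance(y, mu))  # small positive number

# A saturated fit (mu_hat == y) has zero deviance
print(poisson_deviance(np.array([1.0, 2.0]), np.array([1.0, 2.0])))  # 0.0
```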
Parameter Estimation
Maximum Likelihood Estimation through Iteratively Reweighted Least Squares (IRLS):
β̂_(t+1) = (X'W_tX)^(-1)X'W_tz_t
Where:
- W_t is the diagonal weight matrix at iteration t, with entries 1/[V(μ_i)g'(μ_i)²]
- z_t is the working response, with entries z_i = η_i + (y_i - μ_i)g'(μ_i)
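IRLS can be sketched in a few lines. The example below fits a Poisson GLM with log link (so W = diag(μ) and z = η + (y-μ)/μ) on synthetic noise-free data, an assumption chosen so the iterations recover the true coefficients exactly:

```python
import numpy as np

def irls_poisson(X, y, n_iter=25):
    """Fit a Poisson GLM (log link) via IRLS.

    Each step solves the weighted least-squares system
    beta_new = (X' W X)^(-1) X' W z, with W = diag(mu)
    and working response z = eta + (y - mu) / mu.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        W = mu                      # Poisson/log-link weights
        z = eta + (y - mu) / mu     # working response
        XtW = X.T * W               # broadcasts W across rows of X'
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Noise-free check: y generated exactly from exp(0.5 + 0.3*x)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = np.exp(0.5 + 0.3 * x)
print(irls_poisson(X, y))  # ≈ [0.5, 0.3]
```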
Time Series Analysis
Stationarity Tests
Augmented Dickey-Fuller (ADF) test regression:
ΔY_t = α + βt + γY_{t-1} + δ_1ΔY_{t-1} + … + δ_{p-1}ΔY_{t-p+1} + ε_t
The null hypothesis of a unit root is γ = 0; the test statistic is the t-ratio γ̂/SE(γ̂), compared against Dickey-Fuller critical values.
ARIMA Models
ARIMA(p,d,q) model:
φ(B)(1-B)^d Yt = θ(B)εt
Where:
- φ(B) is the AR polynomial
- θ(B) is the MA polynomial
- B is the backshift operator
Forecasting
One-step-ahead forecast:
Ŷ_t(1) = E(Y_{t+1} | Y_t, Y_{t-1}, …)
h-step-ahead forecast:
Ŷ_t(h) = E(Y_{t+h} | Y_t, Y_{t-1}, …)
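For a zero-mean stationary AR(1), Y_t = φY_{t-1} + ε_t, the conditional expectation gives Ŷ_t(h) = φ^h Y_t, so forecasts decay geometrically toward the mean. A sketch with illustrative values of φ and Y_t:

```python
import numpy as np

def ar1_forecast(y_last, phi, h):
    """h-step forecast for a zero-mean AR(1): E(Y_{t+h} | Y_t) = phi**h * Y_t."""
    return (phi ** h) * y_last

phi = 0.8   # illustrative AR coefficient (|phi| < 1 for stationarity)
y_t = 2.0   # illustrative last observation
print([ar1_forecast(y_t, phi, h) for h in (1, 2, 3)])  # [1.6, 1.28, 1.024]
```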
Advanced Regression Techniques
Principal Component Analysis (PCA)
Eigenvalue decomposition:
Σ = PΛP'
Where:
- Σ is the covariance matrix
- Λ is diagonal matrix of eigenvalues
- P is matrix of eigenvectors
Principal components:
Z = XP
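The decomposition Σ = PΛP' and the scores Z = XP can be checked directly: the sample variance of each principal component equals the corresponding eigenvalue. A sketch on random synthetic data (centering before PCA is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Xc = X - X.mean(axis=0)            # center the data before PCA

Sigma = np.cov(Xc, rowvar=False)   # 3x3 sample covariance matrix

# Eigendecomposition Sigma = P Lambda P'; sort eigenvalues descending
eigvals, P = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

Z = Xc @ P                         # principal component scores

# The variance of each component equals its eigenvalue
print(np.allclose(np.var(Z, axis=0, ddof=1), eigvals))  # True
```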
Ridge Regression
Parameter estimation:
β̂_ridge = (X'X + λI)^(-1)X'y
Where λ is the regularization parameter
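The ridge estimator has a closed form, so it is easy to check that λ = 0 recovers OLS and that larger λ shrinks the coefficients. A sketch on a tiny illustrative dataset (note the intercept is penalized here for simplicity; in practice it usually is not):

```python
import numpy as np

def ridge(X, y, lam):
    # beta_ridge = (X'X + lambda*I)^(-1) X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])   # exactly y = 1 + x

b0 = ridge(X, y, 0.0)    # lambda = 0 is ordinary least squares
b1 = ridge(X, y, 10.0)   # larger lambda shrinks toward zero
print(b0)                                       # ≈ [1. 1.]
print(np.linalg.norm(b1) < np.linalg.norm(b0))  # True
```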
Lasso Regression
Objective function:
min_β {RSS + λΣ|βj|}
Elastic Net
Objective function:
min_β {RSS + λ[(1-α)Σβj² + αΣ|βj|]}
Where:
- α controls the mix of ridge and lasso penalties
- λ controls overall regularization strength
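Unlike ridge, the lasso has no closed-form solution; coordinate descent with soft-thresholding is a standard fitting approach (and the one glmnet uses). A sketch under the convention min_β {½RSS + λΣ|βj|} (note the ½, which rescales λ relative to the objective above), on synthetic data with a sparse true coefficient vector:

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_beta 0.5*||y - X beta||^2 + lam*sum|beta_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every predictor's effect except x_j
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j
            beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j])
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
beta_true = np.array([3.0, 0.0, 0.0, 1.5, 0.0])   # sparse truth
y = X @ beta_true + 0.1 * rng.normal(size=100)

beta_hat = lasso_cd(X, y, lam=5.0)
print(beta_hat)  # large coefficients retained, the rest shrunk toward 0
```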
Model Validation Techniques
Cross-Validation
K-fold CV error:
CV_k = (1/k)Σ_i=1^k MSE_i
Leave-one-out CV error:
LOOCV = (1/n)Σ_i=1^n (yi - ŷ_i^(-i))²
Where ŷ_i^(-i) is the prediction for observation i from the model fit with observation i left out
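K-fold CV can be sketched generically: split the indices into k folds, fit on k-1 of them, score on the held-out fold, and average the fold MSEs. The example below plugs in OLS as the model being validated, on synthetic data with noise standard deviation 0.5 (so the CV MSE should land near the noise variance, 0.25):

```python
import numpy as np

def kfold_cv_mse(X, y, fit, predict, k=5, seed=0):
    """K-fold CV error: CV_k = (1/k) * sum of the k held-out-fold MSEs."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    mses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        y_hat = predict(model, X[test])
        mses.append(np.mean((y[test] - y_hat) ** 2))
    return np.mean(mses)

# OLS fit/predict as the model under validation (illustrative choice)
fit = lambda X, y: np.linalg.solve(X.T @ X, X.T @ y)
predict = lambda beta, X: X @ beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=100)
print(kfold_cv_mse(X, y, fit, predict))  # near 0.25, the noise variance
```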
Bootstrap Methods
Bootstrap estimate:
θ̂_boot = (1/B)Σ_b=1^B θ̂*_b
Bootstrap standard error:
SE_boot = √[(1/(B-1))Σ_b=1^B (θ̂*_b - θ̂_boot)²]
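The two bootstrap formulas translate directly into code: resample with replacement B times, recompute the statistic each time, then take the mean and standard deviation of the replicates. A sketch estimating the SE of a sample mean on synthetic data (where theory predicts SE ≈ σ/√n = 2/√100 = 0.2):

```python
import numpy as np

def bootstrap_se(data, stat, B=2000, seed=0):
    """Bootstrap estimate and standard error of a statistic."""
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = np.array([stat(rng.choice(data, size=n, replace=True))
                     for _ in range(B)])
    theta_boot = reps.mean()       # (1/B) * sum of theta*_b
    se_boot = reps.std(ddof=1)     # sqrt[(1/(B-1)) * sum (theta*_b - theta_boot)^2]
    return theta_boot, se_boot

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=100)

theta, se = bootstrap_se(data, np.mean)
print(theta, se)  # mean near 5, SE near 2/sqrt(100) = 0.2
```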
Advanced Statistical Concepts
Mixed Effects Models
Linear Mixed Model:
y = Xβ + Zu + ε
Where:
- β are fixed effects
- u are random effects
- Z is the random effects design matrix
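The roles of X, Z, and u are easiest to see by simulating data from the model. A sketch of a random-intercept model (illustrative group sizes and variance components), where Z is an indicator matrix mapping each observation to its group's random intercept:

```python
import numpy as np

# Simulate y = X beta + Z u + eps for a random-intercept model:
# y_ij = x_ij' beta + u_i + eps_ij, groups i = 1..10, 20 obs each
rng = np.random.default_rng(0)
n_groups, per_group = 10, 20
n = n_groups * per_group

group = np.repeat(np.arange(n_groups), per_group)
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed-effects design
beta = np.array([1.0, 2.0])                             # fixed effects

Z = (group[:, None] == np.arange(n_groups)).astype(float)  # random-effects design
u = rng.normal(scale=1.5, size=n_groups)                   # random intercepts
eps = rng.normal(scale=0.5, size=n)

y = X @ beta + Z @ u + eps
# Observations within a group share u_i, which induces intra-group correlation
print(y.shape)  # (200,)
```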
Survival Analysis
Hazard function:
h(t) = f(t)/S(t)
Survival function:
S(t) = exp(-∫_0^t h(u)du)
Cox Proportional Hazards:
h(t|X) = h₀(t)exp(Xβ)
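These three formulas can be checked against each other in the constant-hazard (exponential) special case, where h(t) = λ, S(t) = exp(-λt), and f(t) = λexp(-λt), so f(t)/S(t) = λ as required. A numerical sketch with an illustrative λ:

```python
import numpy as np

lam = 0.3                        # illustrative constant hazard rate
t = np.linspace(0.0, 10.0, 101)

S = np.exp(-lam * t)             # S(t) = exp(-integral of h) = exp(-lam*t)
f = lam * np.exp(-lam * t)       # density of the exponential distribution
h = f / S                        # hazard h(t) = f(t)/S(t)

print(np.allclose(h, lam))       # True: constant hazard recovered

# Cox PH: a covariate multiplies the baseline hazard by exp(x*beta)
beta, x = 0.5, 1.0               # illustrative coefficient and covariate
h_x = h * np.exp(x * beta)
print(np.allclose(h_x / h, np.exp(0.5)))  # True: proportional hazards
```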
Model Selection Techniques
Stepwise Selection
Forward Selection:
Add variables based on F-statistic or p-value
Backward Elimination:
Remove variables based on F-statistic or p-value
Information Criteria Comparison
Choose model that minimizes:
- AIC = -2ln(L) + 2p
- BIC = -2ln(L) + pln(n)
- HQIC = -2ln(L) + 2pln(ln(n))
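A worked comparison makes the trade-off concrete. Below, two hypothetical fits (the log-likelihoods and parameter counts are invented for illustration) are scored with all three criteria; model B's three extra parameters buy only a small log-likelihood gain, so every criterion prefers A:

```python
import math

def aic(loglik, p):      return -2.0 * loglik + 2.0 * p
def bic(loglik, p, n):   return -2.0 * loglik + p * math.log(n)
def hqic(loglik, p, n):  return -2.0 * loglik + 2.0 * p * math.log(math.log(n))

n = 100
models = {"A": (-250.0, 3), "B": (-248.5, 6)}  # (log-likelihood, # parameters)

for name, (ll, p) in models.items():
    print(name, aic(ll, p), bic(ll, p, n), hqic(ll, p, n))

# A: AIC 506.0, BIC ≈ 513.82, HQIC ≈ 509.16
# B: AIC 509.0, BIC ≈ 524.63, HQIC ≈ 515.33 -> all three criteria prefer A
```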
Study Strategies
- Understanding Theoretical Foundations
- Focus on assumptions behind each model
- Know when each model is appropriate
- Understand relationships between different techniques
- Practical Application
- Practice interpreting model outputs
- Learn to identify violation of assumptions
- Develop intuition for model selection
- Common Pitfalls to Avoid
- Overlooking multicollinearity
- Ignoring model assumptions
- Misinterpreting significance tests
Essential R Functions
While you won't be coding in the exam, understanding these R functions helps connect the concepts to practice:
# Linear Models
lm(y ~ x1 + x2)
summary(model)
# GLMs
glm(y ~ x, family = binomial)
glm(y ~ x, family = poisson)
# Time Series
arima(ts_data, order = c(p, d, q))
forecast(model, h = 10)   # from the forecast package
# Advanced Regression
prcomp(X, scale. = TRUE)  # PCA (the formal argument name is `scale.`)
glmnet(X, y, alpha = 0)   # Ridge (glmnet package)
glmnet(X, y, alpha = 1)   # Lasso
Exam Tips
- Time Management
- Read questions carefully
- Prioritize questions you’re confident about
- Leave time for checking work
- Calculation Strategy
- Write out formulas before plugging in numbers
- Show intermediate steps
- Check units and scaling
- Conceptual Understanding
- Explain why you chose specific methods
- Consider practical implications
- Reference assumptions when relevant