Distributional Changes Between MIB and NAIC Data

Author

Philip Adams

1 Background and Summary of Findings

The ILEC dataset for experience years 2009-2019 has nearly 55.4 million rows. In 2017, the row count stood at nearly 5.3 million, and this ballooned to 8.1-8.2 million in 2018-2019, about 2 million of which is due to the addition of the MIB_Flag variable. The MIB_Flag variable indicates data which came from companies believed to have been common to the MIB data.

The ILEC and VBT working groups asked whether predictive analytics methods could be used to validate whether and how the two datasets might differ. This problem can be expressed as two questions:

  1. What, if any, differences are there between the MIB and NAIC datasets? What tools can we use to detect differences that go beyond natural patterns of drift?
  2. How, if at all, does mortality differ between the datasets? What tools can we use to detect such differences that go beyond the natural trends of mortality?

The second question is answered in another analysis. This document provides technical analysis for the first question.

To answer the first question, this analysis applies vine copula models, which are explained in more detail later in this document. This analysis views the exposure distributions as probability distributions, both by count and by amount. Vine copulas were chosen over other methods due to their explainability and computational tractability. Other AI methods would not have been explainable and may not have been computationally tractable. Traditional methods of manually exploring the data were deemed too laborious given the extremely large number of combinations to check.

This analysis considers copula models stratified by experience year, with models for each of experience years 2019, 2018, and the combination of 2016 and 2017. Further, within each year group, models are separately fit by count and amount. Each copula model produces a family of best-fit dependency graphs and their associated best-fit copulas. The analysis displays the results of these models and discusses salient points at each level. Also included is a brief discussion of changes in the univariate marginal distributions.

Stratifying the models into three experience-year groups makes it challenging to tell whether and to what extent experience year interacts with other variables. This can be inferred by comparing the three models. However, we also fit a vine copula model against all of the data, where the experience years are grouped into “MIB” for experience years 2016-2017 and “NAIC” for 2018-2019.

To understand differences, we look for differences in the dependency graphs and associated copulas. The degree to which they differ across experience years, if at all, is used to indicate the degree to which the underlying exposures differ across experience years.

To ensure that nothing is missed, we also fit a vine copula model by count where the source of the data (MIB or NAIC) is included as a variable.

1.1 Notable Differences between Experience Years

For the count-based exposure model, there are several differences:

  1. Among the marginal distributions, there are some slight shifts among all variables, although the most prominent shift is due to a marked increase in juvenile risks mostly carrying unismoke risk classes, Perm insurance plans, and lower face amounts. There was also a marked increase in the “Other” insurance plan category.
  2. Dependency structures tended to be quite stable over time, with a notable exception. The copula models detected a correction which NAIC implemented for small face amount preferred class information. When drilling into the data, the shift in distributions is noticeable, but it is not obvious at higher levels.

For the amount-based exposure model, there are more notable differences in the dependency structure.

  1. Among the marginal distributions, the most notable shifts are the increased prevalence of Term and its associated variables, such as higher face amounts, age basis, and preferred class structures.
  2. The two way interactions are stable over the experience years.
  3. Higher order interactions are more prominent, with distributions by face amount shifting in certain places. The most frequent of these appear to relate to smoker status.

However, higher order interactions should be viewed with caution. Amount-based models tend to overfit when using the likelihood-based model fitting criteria that copulas use. The details of the discussion point out a case where the copula is fitting a nearly degenerate case where there are significant imbalances in the structure of the data.

After assessing the vine copula models, we can feel confident in the following statements:

  1. As measured by policies exposed, the dependencies among variables from year to year are quite similar, with the most prominent difference between the MIB and NAIC being the remediation of small face amount preferred information undertaken by NAIC.
  2. As measured by amount exposed, the dependencies among variables from year to year are close to those of the by-count analysis, except that the calibrated copulas are often of a different type and potentially even spurious due to the exaggeration of amount-based weights.
  3. The vine copula model which explicitly includes Source fails to detect a qualitatively meaningful interaction between Source and other variables. While this generally reinforces the previous points, this vine copula model only weakly detected the NAIC’s remediation efforts. This suggests that the simplifying assumption of vine copulas (described in Section 2.1) may have difficulty holding for relationships that are confined to relatively small subsets of the data. Stratifying on other predictors or including regression capabilities in the vine models might improve the detection power of the methods.

Overall, this analysis provides strong (though not absolute) evidence that the data received from NAIC is, at a minimum, as good as what we previously received from MIB.

Code
suppressPackageStartupMessages(
  {
    library(arrow)
    library(data.table)
    library(tidyverse)
    library(rvinecopulib)
    library(ggplot2)
    library(doParallel)
    library(flextable)
    library(patchwork)
  }
)

source("plot.ilec.cop.contour.R")
source("support_fns.R")
source("depgraphplot.R")

bUseCache <- TRUE

2 Understanding Probability Distributions with Copulas

2.1 Background

The problem of understanding the distribution of exposures and how it may change over time can be dealt with by viewing the exposures as a multivariate probability distribution. This viewpoint opens the door to all of the modern approaches to modeling probability distributions. At the same time, we want to make sure that whatever approach we take can be explained to a wider audience.

Because of the need for explainability to a wide audience, we opted not to use unsupervised AI approaches. Vine copulas have the benefit of being explainable and build on the audience’s previous knowledge base.

Vine copulas start with copulas, which have been studied extensively. The study begins with Sklar’s Theorem, which principally states that every multivariate cumulative distribution function can be expressed as a function of its univariate marginal CDFs. That special function is called a copula, and it happens to be a probability distribution in its own right on the unit hypercube. The copula function is unique for continuous distributions, yet one should note that it is not guaranteed to be unique for discrete random variables.

As an equation, the distribution decomposes as follows:

\[ F_{X_1,...,X_n}\left (x_1,...,x_n \right ) = C \left ( F_{X_1} \left ( x_1 \right ),..., F_{X_n} \left (x_n \right ) \right ) \]

When every distribution has a density, the decomposition takes the following form, where \(c\) is the copula density. The situation for discrete variables is analogous, taking finite differences instead of derivatives.

\[ f_{X_1,...,X_n}\left (x_1,...,x_n \right ) = c \left ( F_{X_1} \left ( x_1 \right ),..., F_{X_n} \left (x_n \right ) \right ) \times f_{X_1} \left ( x_1 \right ) \times ... \times f_{X_n} \left (x_n \right ) \]
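
Because every variable in this analysis is discrete, the finite-difference version is worth writing out. As a sketch for two variables, let \(x_i^-\) denote the value immediately below \(x_i\) in its ordering, with \(F_{X_i}(x_i^-) = 0\) for the lowest category. This notation is introduced here for illustration and is not part of the original analysis. The joint probability mass is then the copula measure of the rectangle between consecutive marginal CDF values:

\[ \begin{eqnarray*} P\left( X_1 = x_1, X_2 = x_2 \right) & = & C\left( F_{X_1}\left( x_1 \right), F_{X_2}\left( x_2 \right) \right) - C\left( F_{X_1}\left( x_1^- \right), F_{X_2}\left( x_2 \right) \right) \\ & & - C\left( F_{X_1}\left( x_1 \right), F_{X_2}\left( x_2^- \right) \right) + C\left( F_{X_1}\left( x_1^- \right), F_{X_2}\left( x_2^- \right) \right) \end{eqnarray*} \]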

This theorem has some nice consequences.

  • The dependencies between random variables can be analyzed separately from their marginal univariate distributions.
  • It is possible to mix and match copulas and marginals.
  • Any probability distribution can be decomposed into a messy, non-unique representation using only bivariate copulas.
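
The density form of Sklar’s Theorem can be checked numerically. The following is a minimal sketch in base R, assuming a bivariate standard normal distribution with correlation 0.4. The function names `dbvnorm` and `gauss_copula_density` are illustrative helpers written for this example and are not part of the analysis code.

```r
# Bivariate standard normal density with correlation rho, written out by hand.
dbvnorm <- function(x1, x2, rho) {
  q <- (x1^2 - 2 * rho * x1 * x2 + x2^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sqrt(1 - rho^2))
}

# Gaussian copula density evaluated at (u1, u2) on the unit square.
gauss_copula_density <- function(u1, u2, rho) {
  a <- qnorm(u1)
  b <- qnorm(u2)
  exp(-(rho^2 * (a^2 + b^2) - 2 * rho * a * b) / (2 * (1 - rho^2))) /
    sqrt(1 - rho^2)
}

rho <- 0.4
x1 <- 0.3
x2 <- -1.1

# Left side: the joint density evaluated directly.
joint <- dbvnorm(x1, x2, rho)

# Right side: copula density at the marginal CDF values, times the marginal densities.
via_sklar <- gauss_copula_density(pnorm(x1), pnorm(x2), rho) * dnorm(x1) * dnorm(x2)

print(all.equal(joint, via_sklar))
```

Swapping `dnorm` and `pnorm` for another pair of marginals while keeping the same copula density illustrates the mix-and-match property listed above.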

There are many bivariate copulas, including the Gaussian, t, Gumbel, Clayton, Frank, BBx series (combinations of copulas), and Joe copulas. However, there are far fewer higher dimensional copulas, and only a limited subset can be composed into higher dimensional copulas directly.

The insight from vine copulas starts with that messy decomposition of a three-variable density:

\[ \begin{eqnarray*} f\left( x_1,x_2,x_3 \right) & = & c_{13;2}\left( F_{1|2}\left( x_1|x_2 \right), F_{3|2}\left( x_3|x_2 \right); x_2 \right) \\ & & \times c_{23}\left( F_2\left( x_2 \right), F_3\left( x_3 \right) \right) \times c_{12}\left( F_1\left( x_1 \right), F_2\left( x_2 \right) \right) \\ & & \times f_1\left( x_1 \right) \times f_2\left( x_2 \right) \times f_3\left( x_3 \right) \end{eqnarray*} \]

This is not unique, since we could have done the decomposition with the variables in a different order.

The complicating term here is the first copula density

\[ c_{13;2}\left( \cdot, \cdot;x_2 \right) \]

The copula density depends on conditioning variables which can make building up densities in this way difficult. Since it is an obstacle, the simplifying assumption of vine copulas is to drop the dependency. To deal with the uniqueness issue, connect random variables via a copula if they have the highest measure of dependency between them. This can be Kendall’s \(\tau\) or Spearman’s \(\rho\), among others.
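
Written out for the three-variable case, the simplifying assumption removes the explicit \(x_2\) argument from the conditional copula density. The conditioning variable still enters through the conditional CDFs that are passed to it:

\[ \begin{eqnarray*} f\left( x_1,x_2,x_3 \right) & \approx & c_{13;2}\left( F_{1|2}\left( x_1|x_2 \right), F_{3|2}\left( x_3|x_2 \right) \right) \\ & & \times c_{23}\left( F_2\left( x_2 \right), F_3\left( x_3 \right) \right) \times c_{12}\left( F_1\left( x_1 \right), F_2\left( x_2 \right) \right) \\ & & \times f_1\left( x_1 \right) \times f_2\left( x_2 \right) \times f_3\left( x_3 \right) \end{eqnarray*} \]

In the continuous case, the conditional CDFs themselves come from the first-level copulas by differentiation, for example \( F_{1|2}\left( x_1|x_2 \right) = \partial C_{12}\left( u_1,u_2 \right) / \partial u_2 \) evaluated at \(u_1 = F_1(x_1)\) and \(u_2 = F_2(x_2)\).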

Armed with this setup, we have the following algorithm to build up a vine copula representation of a dataset:

  1. For each variable, if necessary, replace the variable with its transform to the uniform distribution. See the probability integral transform for more info. This replaces the \(x_i\) with \(u_i\).
  2. Compute pairwise measures of association between these variables, often using Kendall’s \(\tau\).
  3. Using these measures, build a graph of dependencies between the variables meeting certain conditions, namely that the graph forms a tree (no loops).
  4. Each point represents a variable with associated data on the uniform scale, and each edge represents the dependency relationship.
  5. For each edge in the graph, find a bivariate copula, \(C_{ij}(u_i,u_j)\), which best fits the associated pair of data vectors.
  6. Applying the conditional distribution functions (h-functions) of the fitted copula to the associated data now creates new data on that edge. For example, \(u_{1|2}=\partial C_{12}(u_1,u_2)/\partial u_2\) is a new vector of data for the edge connecting \(u_1\) and \(u_2\).
  7. Each edge now has a vector of (uniformly distributed) data, and there is one less column than before.
  8. Apply the procedure on this derived data.
  9. Continue until there are no more pairs of data to model.
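
Steps 1 through 3 can be sketched in base R. This is a simplified illustration under stated assumptions: the toy data are continuous and simulated, the helper `max_spanning_tree` is written for this example, and the tree is chosen by a plain Prim-style search on absolute Kendall’s \(\tau\). The fitting routine in rvinecopulib handles discrete variables, weights, and copula selection, none of which appear here.

```r
set.seed(1)
n <- 200
z <- rnorm(n)

# Toy data: A and B share a common driver, C is independent noise.
toy <- data.frame(
  A = z + rnorm(n, sd = 0.5),
  B = z + rnorm(n, sd = 1.0),
  C = rnorm(n)
)

# Step 1: rank-based transform to the uniform scale (pseudo-observations).
u <- as.data.frame(lapply(toy, function(x) rank(x) / (length(x) + 1)))

# Step 2: pairwise Kendall's tau; absolute values serve as edge weights.
tau <- abs(cor(u, method = "kendall"))

# Step 3: grow a maximum spanning tree one edge at a time (Prim's algorithm).
max_spanning_tree <- function(w) {
  vars <- colnames(w)
  in_tree <- vars[1]
  edges <- data.frame(from = character(0), to = character(0), tau = numeric(0))
  while (length(in_tree) < length(vars)) {
    out_tree <- setdiff(vars, in_tree)
    sub <- w[in_tree, out_tree, drop = FALSE]
    best <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    new_edge <- data.frame(
      from = in_tree[best[["row"]]],
      to = out_tree[best[["col"]]],
      tau = max(sub)
    )
    edges <- rbind(edges, new_edge)
    in_tree <- c(in_tree, new_edge$to)
  }
  edges
}

tree1 <- max_spanning_tree(tau)
print(tree1)
```

The first edge joins A and B because they share the common driver, and C is attached by whichever weak association happens to be larger. This mirrors the "random attachments" of near-independent variables.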

2.2 Interpreting Vine Copula Models

The output of vine copula models can be very challenging to interpret. Let’s start with the simplest possible case, which is a vine copula with three variables, labeled A, B, and C. Up to the ordering of the variables, there is only one vine structure for three variables, and that is the canonical vine or C-vine, where the initial dependency graph is a straight line. The joining copulas at the first level are BB7 copulas rotated by 180 degrees (as specified in the code below), having Kendall’s \(\tau\) of 0.62, which indicates a strong upper tail dependency and weak to moderate lower tail dependency. The second level joining the first two copulas is Gaussian with correlation 0.4 and Kendall’s \(\tau\) of 0.26.

Code
cop.example <- vinecop_dist(
  pair_copulas = list(
    list(
      bicop_dist("bb7",180,c(1.5,3)),
      bicop_dist("bb7",180,c(1.5,3))
    ),
    list(bicop_dist("gaussian",0,.4))
  ),
  structure = cvine_structure(1:3)
)


cop.example$names <- c("A","B","C")
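
The tail behavior and the Gaussian Kendall’s \(\tau\) quoted for this example can be checked with closed-form expressions. The sketch below uses base R only and assumes the two BB7 parameters in the example are ordered as (theta, delta) = (1.5, 3). The helper names are illustrative and are not rvinecopulib functions.

```r
# Tail dependence coefficients of the BB7 (Joe-Clayton) copula.
# Only rotations of 0 and 180 degrees are handled in this sketch.
bb7_tail_dependence <- function(theta, delta, rotation = 0) {
  stopifnot(rotation %in% c(0, 180))
  lower <- 2^(-1 / delta)
  upper <- 2 - 2^(1 / theta)
  if (rotation == 180) {
    # A 180-degree rotation (survival copula) swaps the two tails.
    c(lower = upper, upper = lower)
  } else {
    c(lower = lower, upper = upper)
  }
}

# Kendall's tau implied by a Gaussian copula with correlation rho.
gaussian_tau <- function(rho) 2 / pi * asin(rho)

td <- bb7_tail_dependence(theta = 1.5, delta = 3, rotation = 180)
print(round(td, 3))
print(round(gaussian_tau(0.4), 2))
```

Under these assumptions the rotated copula has upper tail dependence near 0.79 and lower tail dependence near 0.41, and the Gaussian copula with correlation 0.4 has Kendall’s \(\tau\) of about 0.26.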

The dependency graphs can be plotted as follows:

Code
plot(cop.example,
     tree=1,
     edge_labels = "family_tau",
     var_names="use"
)

Code
cat('\n\n')
Code
cat('### Level 2\n\n')
Code
plot(cop.example,
     tree=2,
     edge_labels = "family_tau",
     var_names="use"
)

Code
cat('\n\n')

The second level can be confusing. The (one) copula here models the dependency between A and B, conditioned on C. That is, the dependency between A and B, adjusted for C, is modeled with a Gaussian copula. The variable names in common become the “conditioned on” variables, in this case C.

The copulas themselves can be plotted and are often more informative than the edges of the dependency graph. By default, rvinecopulib plots copulas with normal marginals. This is easier to interpret than using uniform marginals, since readers are more likely to be familiar with contour plots of bivariate distributions that tend to be Gaussian.

Code
contour(cop.example)

3 Fitting and Reviewing Copula Models

3.1 Data Gathering

Code
ia.band.cuts <- c(-1,17,seq(25,105,5))
dur.band.breaks <- c(0,1,2,3,seq(5,25,5),30,40,120)

ilec.dat <- arrow::open_dataset(
  sources="../Data/ilecdata_20240429"
)

factor.cols <- data.table(
  Factor=c("Sex","Smoker_Status",
                 "Insurance_Plan",
                 "Face_Amount_Band",
                 "Issue_Age_Band",
                 "Duration_Band",
                 "Age_Ind",
                 "SOA_Antp_Lvl_TP",
                 "SOA_Guar_Lvl_TP",
                 "SOA_Post_Lvl_Ind",
                 "Number_of_Pfd_Classes",
                 "Preferred_Class"),
  PrettyLabels=c(
    "Sex","Smoker Status",
                 "Insurance Plan",
                 "Face Amount Band",
                 "Issue Age Band",
                 "Duration Band",
                 "Age Ind",
                 "SOA Antp. Lvl Term Period",
                 "SOA Guar. Lvl Term Period",
                 "SOA Post Lvl Ind.",
                 "Number of Preferred Classes",
                 "Preferred Class"
  )
)

# Rebuild the datasets unless caching is enabled and every cached file exists
if(!bUseCache || !all(file.exists(c('dat.v.fit.rds','dat.v.fit.yr.rds','dat.v.rds')))) {

  dat.v <- ilec.dat %>%
    filter(Observation_Year >= 2016) %>%
    group_by(Observation_Year,
             Sex,
             Smoker_Status,
             Issue_Age,
             Duration,
             Insurance_Plan,
             Face_Amount_Band,
             Age_Ind,
             SOA_Antp_Lvl_TP,
             SOA_Guar_Lvl_TP,
             SOA_Post_Lvl_Ind,
             Number_of_Pfd_Classes,
             Preferred_Class,
             Issue_Year,
             MIB_Flag) %>%
    summarize(Policies_Exposed=sum(Policies_Exposed),
              Amount_Exposed=sum(Amount_Exposed),
              Death_Count=sum(Death_Count),
              Death_Claim_Amount=sum(Death_Claim_Amount)) %>%
    collect() %>%
    data.table()
  

  
  
  dat.v[,
        `:=`(Issue_Age_Band=cut(Issue_Age,
                            breaks=ia.band.cuts,
                            labels=paste(
                              ia.band.cuts[-length(ia.band.cuts)]+1,
                              ia.band.cuts[-1],
                              sep="-"
                            ),
                            ordered_result=T
                            ),
             Duration_Band=cut(Duration,
                                breaks=dur.band.breaks,
                                labels=paste(
                                  dur.band.breaks[-length(dur.band.breaks)]+1,
                                  dur.band.breaks[-1],
                                  sep="-"
                                ),
                                ordered_result=T
             )
             )]
  
  dat.v[,
        Number_of_Pfd_Classes:=as.character(Number_of_Pfd_Classes)]
  dat.v[is.na(Number_of_Pfd_Classes) | Number_of_Pfd_Classes == "NA",
        Number_of_Pfd_Classes:="U"]
  
  dat.v[,
        Preferred_Class:=as.character(Preferred_Class)]
  dat.v[is.na(Preferred_Class) | Preferred_Class == "NA",
        Preferred_Class:="U"]
  
  dat.v[Issue_Year < 1980,
        Smoker_Status:="U"]
  # Both columns are character at this point, so assign "1" rather than numeric 1
  dat.v[Issue_Year < 1980,
        Number_of_Pfd_Classes:="1"]
  dat.v[Issue_Year < 1980,
        Preferred_Class:="1"]
  
  
  dat.v <- dat.v %>%
    group_by(Observation_Year,
             Sex,
             Smoker_Status,
             Issue_Age_Band,
             Duration_Band,
             Insurance_Plan,
             Face_Amount_Band,
             Age_Ind,
             SOA_Antp_Lvl_TP,
             SOA_Guar_Lvl_TP,
             SOA_Post_Lvl_Ind,
             Number_of_Pfd_Classes,
             Preferred_Class,
             MIB_Flag) %>%
    summarize(Policies_Exposed=sum(Policies_Exposed),
              Amount_Exposed=sum(Amount_Exposed),
              Death_Count=sum(Death_Count),
              Death_Claim_Amount=sum(Death_Claim_Amount)) %>%
    collect() %>%
    data.table()
  
  
  dat.v[,
        (c(factor.cols$Factor)):=lapply(.SD,ordered),
        .SDcols=c(factor.cols$Factor)]
  
  dat.v[,Batch:=1]
  dat.v[Observation_Year==2018,Batch:=2]
  dat.v[Observation_Year==2019,Batch:=3]
  
  dat.v.fit <- dat.v[,
               .(Policies_Exposed=sum(Policies_Exposed),
                 Amount_Exposed=sum(Amount_Exposed)),
               keyby=setdiff(names(dat.v),
                             c("Observation_Year",
                               "Policies_Exposed",
                               "Amount_Exposed",
                               "Death_Count",
                               "Death_Claim_Amount",
                               "MIB_Flag"))]
  
  dat.v.fit %>%
    mutate(
      Source=ordered(ifelse(Batch==1,"MIB","NAIC")),
      .before = Policies_Exposed
    ) %>%
    select(
      -Batch
    ) %>%
    group_by(
      across(where(is.factor))
    ) %>%
    summarize(
      across(
        Policies_Exposed:Amount_Exposed,
        sum
      )
    ) %>%
    data.table() ->
    dat.v.fit.yr

  
  saveRDS(dat.v,"dat.v.rds")
  saveRDS(dat.v.fit,"dat.v.fit.rds")
  saveRDS(dat.v.fit.yr,"dat.v.fit.yr.rds")

} else {
  dat.v <- readRDS("dat.v.rds")
  dat.v.fit <- readRDS("dat.v.fit.rds")
  dat.v.fit.yr <- readRDS("dat.v.fit.yr.rds")
}

dat.v.fit  %>%
  head(10) %>%
  flextable()

| Sex | Smoker_Status | Issue_Age_Band | Duration_Band | Insurance_Plan | Face_Amount_Band | Age_Ind | SOA_Antp_Lvl_TP | SOA_Guar_Lvl_TP | SOA_Post_Lvl_Ind | Number_of_Pfd_Classes | Preferred_Class | Batch | Policies_Exposed | Amount_Exposed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F | NS | 0-17 | 1-1 | Other | 01: 0 - 9,999 | ALB | N/A (Not Term) | N/A (Not Term) | N/A | 3 | 3 | 3 | 1.0109589 | 2,277.70969 |
| F | NS | 0-17 | 1-1 | Other | 01: 0 - 9,999 | ALB | N/A (Not Term) | N/A (Not Term) | N/A | U | U | 3 | 0.8547945 | 11.96712 |
| F | NS | 0-17 | 1-1 | Other | 01: 0 - 9,999 | ANB | N/A (Not Term) | N/A (Not Term) | N/A | 4 | 4 | 2 | 0.0000000 | 9,680.65723 |
| F | NS | 0-17 | 1-1 | Other | 01: 0 - 9,999 | ANB | N/A (Not Term) | N/A (Not Term) | N/A | 4 | 4 | 3 | 0.0000000 | 4,402.10133 |
| F | NS | 0-17 | 1-1 | Other | 01: 0 - 9,999 | ANB | N/A (Not Term) | N/A (Not Term) | N/A | U | U | 3 | 0.3342466 | 66.84931 |
| F | NS | 0-17 | 1-1 | Other | 02: 10,000 - 24,999 | ANB | N/A (Not Term) | N/A (Not Term) | N/A | 4 | 4 | 2 | 0.0000000 | 14,344.37280 |
| F | NS | 0-17 | 1-1 | Other | 02: 10,000 - 24,999 | ANB | N/A (Not Term) | N/A (Not Term) | N/A | 4 | 4 | 3 | 0.0000000 | 438.35617 |
| F | NS | 0-17 | 1-1 | Other | 03: 25,000 - 49,999 | ANB | N/A (Not Term) | N/A (Not Term) | N/A | 4 | 4 | 3 | 0.0000000 | 14,186.26074 |
| F | NS | 0-17 | 1-1 | Perm | 01: 0 - 9,999 | ALB | N/A (Not Term) | N/A (Not Term) | N/A | 3 | 3 | 3 | 0.0000000 | 46,862.67201 |
| F | NS | 0-17 | 1-1 | Perm | 01: 0 - 9,999 | ALB | N/A (Not Term) | N/A (Not Term) | N/A | U | U | 1 | 1.9945360 | 5,994.53595 |

First, we collect the data from the ILEC dataset. The version of the dataset is as of April 29, 2024.

The steps are as follows for the models stratified by experience year:

  1. Extract data for observation years 2016-2019, summarizing for deaths and exposures.
  2. Create issue age and duration bands, and summarize out individual issue ages and durations.
  3. Adjust the risk class information, setting null fields to “U” (unknown) and setting any pre-1980 business to unismoke.
  4. The vine copula framework requires ordered factors for discrete variables, so we set the fields as ordered factors.
  5. Tag the 2016 and 2017 observation years as batch 1, year 2018 as batch 2, and year 2019 as batch 3.
  6. For the vine copula fitting, summarize out observation year, and summarize on exposure measures.

For the model which includes experience year, add the step of combining the batches into MIB and NAIC categories, and summarize out the batch variable.
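
The banding in step 2 can be illustrated on a handful of ages. This sketch reuses the issue age break points and label construction from the data gathering code. The sample ages are chosen for illustration only.

```r
# Break points copied from the data gathering code.
ia.band.cuts <- c(-1, 17, seq(25, 105, 5))

# Labels run from (lower break + 1) to the upper break, e.g. "18-25".
band_labels <- paste(
  ia.band.cuts[-length(ia.band.cuts)] + 1,
  ia.band.cuts[-1],
  sep = "-"
)

# Illustrative ages sitting on either side of the first few band edges.
ages <- c(0, 17, 18, 25, 26, 105)
bands <- cut(ages, breaks = ia.band.cuts, labels = band_labels, ordered_result = TRUE)

print(data.frame(Issue_Age = ages, Issue_Age_Band = bands))
```

Because `cut()` uses right-closed intervals by default, age 17 falls in the 0-17 band and age 25 in the 18-25 band. Any value above the last break would be returned as `NA`.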

3.2 Copula Fitting

Next is to fit the copulas. We do this for each batch by count and amount. Additionally, we ask that the vine fitting routine truncate copulas based on a specific statistical test. This will allow the trees to be smaller and retain ostensibly significant non-trivial copulas. Any copula beyond the truncation point is assumed to be the independence copula.

Note that these models take some time to fit. Each vine copula takes approximately 36 hours to fit on 13 cores. This is because the dynamic truncation testing requires iterative re-fitting.

Code
vinecops.ct <- list()
vinecops.amt <- list()
vinecops.ct.yr  <- list()

### Count - all data
# Fit unless caching is enabled and the cached fit exists
if(!bUseCache || 
   !file.exists("vinecops.ct.rds")
) {
  vinecops.ct <- foreach(i=1:3, .packages=c("rvinecopulib","data.table")) %do% {
    vine(
      data=dat.v.fit[Batch==i  & Policies_Exposed > 0,-c("Policies_Exposed","Amount_Exposed",
                                 "Batch","Death_Count","Death_Claim_Amount")],
      copula_controls = list(family_set="par", 
                             trunc_lvl=NA,
                             threshold=NA,
                             keep_data=TRUE),
      weights=dat.v.fit[Batch==i  & Policies_Exposed > 0,Policies_Exposed],
      cores=16
    )
  }
  
  saveRDS(vinecops.ct,"vinecops.ct.rds")
} else {
  vinecops.ct <- readRDS("vinecops.ct.rds")
}

### Amount - all data
# Fit unless caching is enabled and the cached fit exists
if(!bUseCache || 
   !file.exists("vinecops.amt.rds")
) {
  vinecops.amt <- foreach(i=1:3, .packages=c("rvinecopulib","data.table")) %do% {
    vine(
      data=dat.v.fit[Batch==i  & Amount_Exposed > 0,-c("Policies_Exposed","Amount_Exposed",
                                 "Batch","Death_Count","Death_Claim_Amount")],
      copula_controls = list(family_set="par", 
                             trunc_lvl=NA,
                             threshold=NA,
                             keep_data=TRUE),
      weights=dat.v.fit[Batch==i  & Amount_Exposed > 0,Amount_Exposed],
      cores=16
    )
  }
  
  saveRDS(vinecops.amt,"vinecops.amt.rds")
} else {
  vinecops.amt <- readRDS("vinecops.amt.rds")
}

### Count - all data + experience year
# Fit unless caching is enabled and the cached fit exists
if(!bUseCache || 
   !file.exists("vinecops.ct.yr.rds")
) {
  vinecops.ct.yr <- vine(
    data=dat.v.fit.yr[Policies_Exposed > 0,-c("Policies_Exposed","Amount_Exposed")],
    copula_controls = list(family_set="par", 
                           trunc_lvl=NA,
                           threshold=NA,
                           keep_data=F,
                           show_trace=T),
    weights=dat.v.fit.yr[Policies_Exposed > 0,Policies_Exposed],
    cores=16
  )
  
  saveRDS(vinecops.ct.yr,"vinecops.ct.yr.rds")
} else {
  vinecops.ct.yr <- readRDS("vinecops.ct.yr.rds")
}
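
The three fit-or-load blocks share one pattern, which could be factored into a small helper. The following is one possible sketch rather than the code used for this analysis. The name `with_cache` and its arguments are hypothetical, and the demonstration uses a temporary file and a trivial computation in place of a 36-hour fit.

```r
# Hypothetical helper: return the cached object when caching is allowed and the
# file exists; otherwise evaluate `expr`, save the result, and return it.
with_cache <- function(path, expr, use_cache = TRUE) {
  if (use_cache && file.exists(path)) {
    return(readRDS(path))
  }
  value <- expr  # lazy evaluation: `expr` is only computed on this branch
  saveRDS(value, path)
  value
}

cache_file <- tempfile(fileext = ".rds")

# First call: no cache yet, so the expression is evaluated and saved.
first <- with_cache(cache_file, sum(1:10))

# Second call: the cache exists, so the expression is never evaluated.
second <- with_cache(cache_file, stop("not evaluated when the cache exists"))

print(c(first, second))
```

Because R evaluates function arguments lazily, the expensive expression is only run when the cached file is missing or caching is switched off.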

3.3 Marginal Distributions by Count

3.3.1 Distribution Evolution by Experience Year (Category Dominant)

3.3.2 Distribution Evolution by Experience Year (Year Dominant)

3.3.3 Notable Patterns by Count

Notable changes from 2016-2017 to 2018 and 2019 in the marginal distributions include:

  1. Sex: Slight shift toward females (+0.5%)
  2. Smoker Status: Slight shift toward unismoke (+3-3.5%)
  3. Insurance Plan: 0.5-1% shifts within plans, with notable increase in “Other” category
  4. Face Amount Band: 1-3% increase in face amounts < 25,000, offset by decreases in face amounts 50,000+
  5. Issue Age: +1.5% for juvenile ages
  6. Duration: Decreases in durations 6-15 (-0.5%) and increases in durations 31-40 (+1.5-2%)
  7. Age Indicator: ALB increase of 2-2.5%
  8. SOA Anticipated Term Period: Unknown category increased 2.3%
  9. SOA Guaranteed Term Period: Unknown category increased 2.3%
  10. SOA Post Level Indicator: 1.8% increase in “Unknown Level Term”, 0.8-0.9% increase in PLT, fluctuations in WLT
  11. Number of Preferred Classes: 1.1-1.8% increase in U category
  12. Preferred Class: 1.1-1.8% increase in U category, 1.5% decline in category 1
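
Shifts of this kind are differences in exposure shares between batches. The sketch below shows the calculation in base R on a small made-up table. The numbers are hypothetical and are not ILEC data; only the column names follow the analysis dataset.

```r
# Hypothetical toy exposures; the values are illustrative, not ILEC data.
toy <- data.frame(
  Batch = c(1, 1, 2, 2, 3, 3),
  Sex = c("F", "M", "F", "M", "F", "M"),
  Policies_Exposed = c(40, 60, 41, 59, 45, 55)
)

# Exposure totals by category and batch, then shares within each batch.
totals <- xtabs(Policies_Exposed ~ Sex + Batch, data = toy)
shares <- prop.table(totals, margin = 2)

# Change in share relative to batch 1 (2016-2017), in percentage points.
shift <- 100 * (shares[, c("2", "3")] - shares[, "1"])
print(round(shift, 1))
```

Replacing `Policies_Exposed` with `Amount_Exposed` in the formula would give the amount-weighted view of the same marginal.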

3.4 Marginal Distributions by Amount

3.4.1 Distribution Evolution by Experience Year (Category Dominant)

3.4.2 Distribution Evolution by Experience Year (Year Dominant)

3.4.3 Notable Patterns by Amount

Compared to the view by count, shifts are muted by amount. Notable changes from 2016-2017 to 2018 and 2019 in the marginal distributions include:

  1. Sex: Slight shift toward females (+0.5%)
  2. Smoker Status: Slight shift toward unismoke (+0.5%)
  3. Insurance Plan: Small, steady shift to Term (+1.5%) and ULSG (+1.2%)
  4. Face Amount Band: Increase in average face amounts, with 1 million+ face amount bands increasing and the others declining
  5. Duration: Increases in durations 11+, with decreases in durations 1-10. Durations 6-10 had the largest decrease (-2.9%), and durations 16-20 had the largest increase (+3.1%)
  6. Age Indicator: +1.5% shift toward ANB
  7. SOA Anticipated Term Period: Unknown category increased 1.7%, 20 year term increased 0.8%
  8. SOA Guaranteed Term Period: Unknown category increased 1.7%, 20 year term increased 0.8%
  9. SOA Post Level Indicator: 1.1% increase in “Unknown Level Term”, 0.5%
  10. Number of Preferred Classes: 1.3% decline in U category, 0.9% increase in 4-class, 1.2% increase in 3-class, 0.5% decline in 2-class

3.5 Reviewing Dependency Structures

The algorithm for fitting a vine copula provides two objects to study: a collection of dependency graphs, and a collection of associated copulas for each edge of the dependency graphs.

In the fitting section, we generated six stratified vine copula models: one for each of the year groups 2016-2017, 2018, and 2019, weighted by count and, separately, by amount.

3.5.1 Count-Weighted Copulas

3.5.1.1 Level 1

Excepting the independence copula of Sex with whatever it happens to be attached to, the graphs are identical. The copulas on each edge of the graph are also very close, with Kendall’s \(\tau\) typically very close.

Studying these first dependency graphs reveals the dominant two-way dependency structure. We discuss the surprising connections below in combination with the associated copulas:

  1. Age_Ind x Preferred_Class: There is much more “unknown” preferred class in ALB than ANB, about 63-64% in ALB and 39-40% in ANB.
  2. SOA_Post_Lvl_Ind x Preferred_Class: This is a simplified dependency, where permanent plans (Perm, xL) favor ALB and Term favors ANB.
  3. Face_Amount_Band x SOA_Post_Lvl_Ind: This again appears to be a simplified dependency, where permanent and “other” plans (appearing in “NA”) favor lower face amounts and Term and xL favor higher face amounts.
  4. Issue_Age_Band x Smoker_Status: Most of unismoke (approximately 55%) come from juvenile policies, with some juvenile policies also labeled as smoker. Smokers also tend to favor younger issue ages.
  5. Insurance_Plan x Smoker_Status: Perm is about half non-smoker, “other” is about 65% unismoke or unknown smoker, and the other plans are mostly non-smoker, ranging from 85-86% for UL/VL, 91% for Term, 92% for VLSG, and 95% for ULSG.

3.5.1.1.1 Dependency Graphs

Code
# tree.level is not defined in the code shown; it is assumed to hold the tree level for this section (1 here)
if(tree.level <= length(vinecops.ct[[1]]$copula$pair_copulas)) {
  p1 <- depgraphplot(vinecops.ct[[1]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p1 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p1 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.ct[[2]]$copula$pair_copulas)) {
  p2 <- depgraphplot(vinecops.ct[[2]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p2 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p2 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.ct[[3]]$copula$pair_copulas)) {
  p3 <- depgraphplot(vinecops.ct[[3]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p3 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p3 +
  theme(title=element_blank())

3.5.1.1.2 Copula Plots of Dependencies
3.5.1.1.2.1 Observation Years 2016-2017
3.5.1.1.2.2 Observation Years 2018
3.5.1.1.2.3 Observation Years 2019

3.5.1.2 Level 2

The next level of copulas, where associations are now implicitly conditioned on a third variable, is much simpler and in most cases uses the independence copula or copulas which are very close to it. The graphs are very nearly identical (up to the random attachments for independence copulas), with two somewhat interesting non-independence copulas.

  1. Duration_Band x Smoker_Status | Face_Amount_Band: The dependency of these two variables appears to further depend on face amount band. Higher face amount bands are associated with early durations, as noted in the two-way dependencies, yet the degree of association differs by smoker status. For example, and unsurprisingly, over half of the unismoke with face amount under 100,000 has duration 36+, whereas for face amounts 100,000+, less than 4% of exposures is duration 36+. Similarly, smokers have higher distributions at later durations than non-smokers for face amounts 100,000+. Moreover, this dependency weakened slightly moving from the MIB to NAIC dataset.
  2. SOA_Post_Lvl_Ind x SOA_Antp_Lvl_TP | SOA_Guar_Lvl_TP: This is picking up on the slight differences between anticipated and guaranteed level periods and whether they are in the PLT period or not. For example, for 10-year guaranteed term period, about 42% of the 10-year anticipated LT is PLT, and about 5% of the 20-year anticipated LT is PLT. For 15-year guaranteed LT, approximately 40% of the 15-year anticipated LT is PLT, while none of the 20-year anticipated LT is PLT.

3.5.1.2.1 Dependency Graphs

Code
# tree.level is not defined in the code shown; it is assumed to hold the tree level for this section (2 here)
if(tree.level <= length(vinecops.ct[[1]]$copula$pair_copulas)) {
  p1 <- depgraphplot(vinecops.ct[[1]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p1 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p1 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.ct[[2]]$copula$pair_copulas)) {
  p2 <- depgraphplot(vinecops.ct[[2]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p2 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p2 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.ct[[3]]$copula$pair_copulas)) {
  p3 <- depgraphplot(vinecops.ct[[3]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p3 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p3 +
  theme(title=element_blank())

3.5.1.2.2 Copula Plots of Dependencies
3.5.1.2.2.1 Observation Years 2016-2017
3.5.1.2.2.2 Observation Years 2018
3.5.1.2.2.3 Observation Years 2019

3.5.1.3 Level 3

At this level, there are shifts in dependency relationships.

  1. Preferred_Class x Smoker_Status | Face_Amount_Band, SOA_Post_Lvl_Ind: There is a modest dependency in the 2016-2017 data which vanishes in the 2018 and 2019 data. Inspecting the data showed marked differences in risk class distribution for face amounts under 50,000 when moving from the MIB to NAIC data. For face amounts 50,000+, there are smaller but notable distribution shifts among the underwriting classes for the xL plans.
  2. Insurance_Plan x SOA_Post_Lvl_Ind | Face_Amount_Band, Smoker_Status: A dependency emerges suddenly in 2019. For face amounts under 50,000, prevalence of ULT within Term shifts from approximately 45-46% to 71-72%. This occurs across all smoker statuses, although to differing degrees per status. This shift is not observed for face amounts 50,000 and higher.

3.5.1.3.1 Dependency Graphs

Code
# tree.level is not defined in the code shown; it is assumed to hold the tree level for this section (3 here)
if(tree.level <= length(vinecops.ct[[1]]$copula$pair_copulas)) {
  p1 <- depgraphplot(vinecops.ct[[1]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p1 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p1 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.ct[[2]]$copula$pair_copulas)) {
  p2 <- depgraphplot(vinecops.ct[[2]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p2 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p2 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.ct[[3]]$copula$pair_copulas)) {
  p3 <- depgraphplot(vinecops.ct[[3]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p3 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p3 +
  theme(title=element_blank())

3.5.1.3.2 Copula Plots of Dependencies
3.5.1.3.2.1 Observation Years 2016-2017
3.5.1.3.2.2 Observation Years 2018
3.5.1.3.2.3 Observation Years 2019

3.5.1.4 Level 4

At this level, the only non-trivial copula to emerge is for Preferred_Class x Smoker_Status | Sex, Face_Amount_Band, SOA_Post_Lvl_Ind. The dependency relationship resembles the analogous one at the previous level, but with Sex as an additional conditioning variable, so that the distributions vary by sex.

3.5.1.4.1 Dependency Graphs
Code
if(tree.level <= length(vinecops.ct[[1]]$copula$pair_copulas)) {
  p1 <- depgraphplot(vinecops.ct[[1]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p1 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p1 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.ct[[2]]$copula$pair_copulas)) {
  p2 <- depgraphplot(vinecops.ct[[2]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p2 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p2 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.ct[[3]]$copula$pair_copulas)) {
  p3 <- depgraphplot(vinecops.ct[[3]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p3 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p3 +
  theme(title=element_blank())

3.5.1.4.2 Copula Plots of Dependencies
3.5.1.4.2.1 Observation Years 2016-2017
3.5.1.4.2.2 Observation Years 2018
3.5.1.4.2.3 Observation Years 2019

3.5.2 Amount-Weighted Copulas

3.5.2.1 Level 1

While the first-level dependency graphs are reassuringly similar across observation years, they are not identical to their by-count analogues. This makes it difficult to compare the by-count and by-amount dependency models directly.

  1. Age_Ind x Face_Amount_Band: ANB tends to be associated with higher face amounts, and ANB with lower.
  2. Number_of_Pfd_Classes x Preferred_Class: This is picking up on the degeneracy of the U/U combination.
  3. Duration x Face_Amount_Band: Much higher durations are associated with the lowest face amounts.
  4. Sex x Face_Amount_Band: Males tend to be associated with higher face amounts, females with lower.
  5. SOA_Guar_Lvl_TP x SOA_Antp_Lvl_TP: There are extremely tight dependencies between these variables.
  6. Insurance_Plan x SOA_Post_Lvl_Ind: The latter is only associated with Term plans, and this is picking up on that tight relationship.
  7. SOA_Antp_Lvl_TP x SOA_Post_Lvl_Ind: This copula reflects the same situation as in the previous pair.
  8. Issue_Age_Band x Smoker_Status: This copula is picking up on the tight relationship between juvenile risks and U smoker status.
  9. Preferred_Class x Face_Amount_Band: This copula is picking up on the tight connection between “U” preferred class and low face amounts.
  10. SOA_Post_Lvl_Ind x Smoker_Status: This reflects the tight connection between Perm and U smoker status.
  11. Smoker_Status x Face_Amount_Band: This copula models the close connection between U smoker status and low face amount bands.
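
To see why the by-amount dependencies need not match the by-count ones, the sketch below builds the same joint distribution under both weightings. All numbers are hypothetical; the `policies` and `avg_face` columns are assumptions made for illustration and are not drawn from the ILEC data.

```r
# Hypothetical exposures: policy counts and an assumed average face amount.
expo <- data.frame(
  Sex              = c("F", "F", "M", "M"),
  Face_Amount_Band = c("low", "high", "low", "high"),
  policies         = c(400, 100, 300, 200),
  avg_face         = c(25000, 500000, 25000, 500000)
)
expo$amount <- expo$policies * expo$avg_face

# Joint distribution of Sex x Face_Amount_Band under each weighting.
by_count  <- prop.table(xtabs(policies ~ Sex + Face_Amount_Band, data = expo))
by_amount <- prop.table(xtabs(amount ~ Sex + Face_Amount_Band, data = expo))

print(round(by_count, 3))
print(round(by_amount, 3))
```

In this toy example the high face amount cells carry a minority of the policies but most of the amount, so the amount-weighted distribution concentrates its mass differently and any copula fitted to it can differ from its by-count analogue.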
3.5.2.1.1 Dependency Graphs
Code
if(tree.level <= length(vinecops.amt[[1]]$copula$pair_copulas)) {
  p1 <- depgraphplot(vinecops.amt[[1]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p1 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p1 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.amt[[2]]$copula$pair_copulas)) {
  p2 <- depgraphplot(vinecops.amt[[2]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p2 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p2 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.amt[[3]]$copula$pair_copulas)) {
  p3 <- depgraphplot(vinecops.amt[[3]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p3 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p3 +
  theme(title=element_blank())

3.5.2.1.2 Copula Plots of Dependencies
3.5.2.1.2.1 Observation Years 2016-2017
3.5.2.1.2.2 Observation Years 2018
3.5.2.1.2.3 Observation Years 2019

3.5.2.2 Level 2

The dependency graphs for 2018 and 2019 are very nearly identical, and both differ from the 2016-2017 graph. The difference, however, is minor. In the 2016-2017 graph, the level 1 copula connecting SOA_Antp_Lvl_TP and SOA_Post_Lvl_Ind joins with the level 1 copula for SOA_Post_Lvl_Ind and Smoker_Status. The joining level 2 copula is very weak, with a Kendall’s \(\tau\) of -0.03. In 2018 and 2019, it instead joins with the level 1 copula for Insurance_Plan and SOA_Post_Lvl_Ind through a very weak Frank copula with a Kendall’s \(\tau\) of -0.02.
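
To give a sense of how weak a Frank copula with Kendall’s \(\tau\) of -0.02 is, the sketch below inverts the standard relationship \(\tau = 1 - \frac{4}{\theta}\left(1 - D_1(\theta)\right)\), where \(D_1\) is the Debye function of order one, to recover the Frank parameter \(\theta\). The helper names `debye1`, `frank_tau`, and `frank_theta` are hypothetical and written in base R for illustration; they are not part of the fitting code used in this analysis.

```r
# Debye function of order one for x > 0:
# D1(x) = (1 / x) * integral from 0 to x of t / (exp(t) - 1) dt.
debye1 <- function(x) {
  integrand <- function(t) ifelse(t == 0, 1, t / expm1(t))
  integrate(integrand, lower = 0, upper = x, rel.tol = 1e-8)$value / x
}

# Kendall's tau of a Frank copula with parameter theta.
# Tau is an odd function of theta, so only |theta| is integrated.
frank_tau <- function(theta) {
  if (theta == 0) return(0)
  a <- abs(theta)
  sign(theta) * (1 - 4 / a * (1 - debye1(a)))
}

# Numerical inversion of tau -> theta, valid for 0 < |tau| < 0.9.
frank_theta <- function(tau) {
  root <- uniroot(function(th) frank_tau(th) - abs(tau),
                  interval = c(1e-6, 50), tol = 1e-10)$root
  sign(tau) * root
}

theta <- frank_theta(-0.02)
print(theta)
```

For small dependence, \(\tau \approx \theta / 9\), so the recovered parameter sits close to zero, where the Frank copula approaches the independence copula.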

At the level of three-variable interactions, there are more copulas of interest than in the by-count models. The interesting interactions are as follows:

  1. SOA_Post_Lvl_Ind x Face_Amount_Band | Smoker_Status: The Kendall’s \(\tau\) values are close (0.147-0.148), but the copulas change slightly between 2016-2017 and 2018. The tail dependency of lower face amounts associated with perm plans, depending on smoker status, is common to both. The tail dependency of higher face amounts associated with Term, depending on smoker status, weakens though.
  2. Sex x Smoker_Status | Face_Amount_Band: Typically, males tend to have a higher percentage of the distribution than females. This is not always true. The gap narrows as face amount decreases, and it inverts at lower face amounts for female non-smokers.
  3. Issue_Age_Band x Face_Amount_Band | Smoker_Status: As in the by-count copulas, this is picking up on the juveniles in the S and U smoker categories at low face amounts, along with the shift in issue age distribution for smokers.
3.5.2.2.1 Dependency Graphs
Code
if(tree.level <= length(vinecops.amt[[1]]$copula$pair_copulas)) {
  p1 <- depgraphplot(vinecops.amt[[1]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p1 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p1 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.amt[[2]]$copula$pair_copulas)) {
  p2 <- depgraphplot(vinecops.amt[[2]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p2 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p2 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.amt[[3]]$copula$pair_copulas)) {
  p3 <- depgraphplot(vinecops.amt[[3]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p3 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p3 +
  theme(title=element_blank())

3.5.2.2.2 Copula Plots of Dependencies
3.5.2.2.2.1 Observation Years 2016-2017
3.5.2.2.2.2 Observation Years 2018
3.5.2.2.2.3 Observation Years 2019

3.5.2.3 Level 3

At this level, most pair copulas are independent or nearly so. Of the surviving relationships, Duration x Smoker_Status | Sex, Face_Amount_Band seems the most notable. In the 2016-2017 data, there is a clear interaction between duration, smoker status, sex, and face amount band. This appears to weaken in the 2018 and 2019 data, especially for smokers, where there are more early-duration exposures at the lower face amounts for females.

3.5.2.3.1 Dependency Graphs
Code
if(tree.level <= length(vinecops.amt[[1]]$copula$pair_copulas)) {
  p1 <- depgraphplot(vinecops.amt[[1]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p1 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p1 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.amt[[2]]$copula$pair_copulas)) {
  p2 <- depgraphplot(vinecops.amt[[2]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p2 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p2 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.amt[[3]]$copula$pair_copulas)) {
  p3 <- depgraphplot(vinecops.amt[[3]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p3 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p3 +
  theme(title=element_blank())

3.5.2.3.2 Copula Plots of Dependencies
3.5.2.3.2.1 Observation Years 2016-2017
3.5.2.3.2.2 Observation Years 2018
3.5.2.3.2.3 Observation Years 2019

3.5.2.4 Level 4

At this level, only the 2019 data has any interesting dependencies. Since these are amount models, higher-level dependencies should be viewed with caution. Amount weighting can skew the BIC-based model selection used to truncate the copula models. Since these connections don’t appear in the by-count models, they may be spurious. For example, the most prominent of these, SOA_Guar_Lvl_TP x Smoker_Status | Insurance_Plan, SOA_Post_Lvl_Ind, SOA_Antp_Lvl_TP, is probably not valid. The level period and post-level period flags only apply to Term. Moreover, term lengths do not apply to the ULT and NLT post-level indicators. For any given guaranteed term length, there are only certain associated anticipated term lengths.
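
One way in which weights could distort a BIC comparison is sketched below on a toy 2x2 table. It assumes, purely for illustration, that the weights enter the log-likelihood as if they were observation counts while the BIC penalty still uses the number of policies. This is not a description of how the fitting software treats weights, and the cell counts and the helper `bic_compare` are hypothetical.

```r
# Hypothetical 2x2 table of policy counts with a very weak dependence.
cells <- data.frame(
  x = c(0, 0, 1, 1),
  y = c(0, 1, 0, 1),
  n = c(260, 240, 240, 260)
)

# BIC of an independence model (2 parameters) and a saturated dependent
# model (3 parameters), given cell weights w and a penalty sample size n_obs.
bic_compare <- function(w, n_obs) {
  p_dep <- w / sum(w)
  px <- tapply(w, cells$x, sum) / sum(w)
  py <- tapply(w, cells$y, sum) / sum(w)
  p_ind <- unname(px[as.character(cells$x)] * py[as.character(cells$y)])
  ll_dep <- sum(w * log(p_dep))
  ll_ind <- sum(w * log(p_ind))
  c(independent = -2 * ll_ind + 2 * log(n_obs),
    dependent   = -2 * ll_dep + 3 * log(n_obs))
}

n_policies <- sum(cells$n)

# By count: each policy has weight 1.
bic_count <- bic_compare(cells$n, n_policies)

# By amount: each policy is assumed to carry 100 units of face amount.
bic_amount <- bic_compare(cells$n * 100, n_policies)

print(names(which.min(bic_count)))
print(names(which.min(bic_amount)))
```

In this toy setting the count-weighted comparison prefers the independence model, while inflating the weights makes the same weak dependence look worth an extra parameter. This is the kind of effect that motivates treating the higher-level amount-weighted copulas with caution.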

3.5.2.4.1 Dependency Graphs
Code
if(tree.level <= length(vinecops.amt[[1]]$copula$pair_copulas)) {
  p1 <- depgraphplot(vinecops.amt[[1]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p1 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p1 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.amt[[2]]$copula$pair_copulas)) {
  p2 <- depgraphplot(vinecops.amt[[2]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p2 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p2 +
  theme(title=element_blank())

Code
if(tree.level <= length(vinecops.amt[[3]]$copula$pair_copulas)) {
  p3 <- depgraphplot(vinecops.amt[[3]]$copula,
             var_names = "use",
             edge_labels = "family_tau",
             tree=tree.level)[[1]]
} else {
  p3 <- ggplot() + 
    annotate("text", x = 4, y = 25, size=8, label = "Independent") + 
    theme_void()
}

p3 +
  theme(title=element_blank())

3.5.2.4.2 Copula Plots of Dependencies
3.5.2.4.2.1 Observation Years 2016-2017
3.5.2.4.2.2 Observation Years 2018
3.5.2.4.2.3 Observation Years 2019

3.5.3 Count-Weighted Copula Model Including Source

One weakness of stratifying the analysis into three models, one for each observation year group, is that the dependency on experience year is not explicit in the models. To that end, we calibrate an additional vine copula model by count which includes a source variable, either “MIB” or “NAIC”.

Since we have already learned that the dependency structure by count is quite similar from year to year, excepting a known improvement relating to preferred information for smaller face amounts, we do not dive into the dependency structures. However, we can look for all fitted interactions and their conditioning variables by examining the table of copulas.

In the table below, we filter the copula list to include only those which mention Source as a “conditioned” variable. All but three of the copulas are chosen to be the independence copula, implying no detectable interaction for the given pair of variables. For the three that are not deemed independent, the associated Kendall’s \(\tau\) is very close to 0, suggesting very weak dependence.

Code
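# Position of the Source variable among the model's variable names
src.idx <- which(vinecops.ct.yr$names == "Source")

summary(vinecops.ct.yr)$copula %>%
  data.table() %>%
  # Keep only the pair copulas whose conditioned pair includes Source
  filter(sapply(conditioned, \(x) src.idx %in% unlist(x))) %>%
  mutate(
    # Translate variable indices into variable names for display
    conditioned=sapply(conditioned,
                       \(x) {
                         paste(vinecops.ct.yr$names[unlist(x)],collapse=", ")
                       }),
    conditioning=sapply(conditioning,
                        \(x) {
                          paste(vinecops.ct.yr$names[unlist(x)],collapse=", ")
                        })
  ) %>%
  select(
    tree,
    conditioned,
    conditioning,
    family,
    rotation,
    tau
  ) %>%
  flextable() %>%
  colformat_double(
    j="tau",
    digits=3
  ) %>%
  valign(valign="top") %>%
  # Make the rendered HTML table scrollable
  set_table_properties(opts_html=list(
    scroll=list(
      add_css="max-height: 500px;"
    )
  ))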

| tree | conditioned | conditioning | family | rotation | tau |
|---|---|---|---|---|---|
| 1 | Source, Sex | | indep | 0 | 0.000 |
| 2 | Source, Number_of_Pfd_Classes | Sex | indep | 0 | 0.000 |
| 3 | Source, Preferred_Class | Number_of_Pfd_Classes, Sex | frank | 0 | 0.014 |
| 4 | Source, SOA_Post_Lvl_Ind | Preferred_Class, Number_of_Pfd_Classes, Sex | indep | 0 | 0.000 |
| 5 | Source, SOA_Guar_Lvl_TP | SOA_Post_Lvl_Ind, Preferred_Class, Number_of_Pfd_Classes, Sex | bb8 | 0 | 0.021 |
| 6 | Source, SOA_Antp_Lvl_TP | SOA_Guar_Lvl_TP, SOA_Post_Lvl_Ind, Preferred_Class, Number_of_Pfd_Classes, Sex | indep | 0 | 0.000 |
| 7 | Source, Face_Amount_Band | SOA_Antp_Lvl_TP, SOA_Guar_Lvl_TP, SOA_Post_Lvl_Ind, Preferred_Class, Number_of_Pfd_Classes, Sex | indep | 0 | 0.000 |
| 8 | Source, Age_Ind | Face_Amount_Band, SOA_Antp_Lvl_TP, SOA_Guar_Lvl_TP, SOA_Post_Lvl_Ind, Preferred_Class, Number_of_Pfd_Classes, Sex | indep | 0 | 0.000 |
| 9 | Source, Duration_Band | Age_Ind, Face_Amount_Band, SOA_Antp_Lvl_TP, SOA_Guar_Lvl_TP, SOA_Post_Lvl_Ind, Preferred_Class, Number_of_Pfd_Classes, Sex | joe | 0 | 0.006 |
| 10 | Source, Smoker_Status | Duration_Band, Age_Ind, Face_Amount_Band, SOA_Antp_Lvl_TP, SOA_Guar_Lvl_TP, SOA_Post_Lvl_Ind, Preferred_Class, Number_of_Pfd_Classes, Sex | indep | 0 | 0.000 |
| 11 | Source, Issue_Age_Band | Smoker_Status, Duration_Band, Age_Ind, Face_Amount_Band, SOA_Antp_Lvl_TP, SOA_Guar_Lvl_TP, SOA_Post_Lvl_Ind, Preferred_Class, Number_of_Pfd_Classes, Sex | indep | 0 | 0.000 |
| 12 | Source, Insurance_Plan | Issue_Age_Band, Smoker_Status, Duration_Band, Age_Ind, Face_Amount_Band, SOA_Antp_Lvl_TP, SOA_Guar_Lvl_TP, SOA_Post_Lvl_Ind, Preferred_Class, Number_of_Pfd_Classes, Sex | indep | 0 | 0.000 |

4 Conclusion

After assessing the vine copula models, we can be confident in the following statements:

  1. As measured by policies exposed, the dependencies among variables are quite similar from year to year, with the most prominent difference between the MIB and NAIC data being the remediation of small face amount preferred information undertaken by NAIC.
  2. As measured by amount exposed, the dependencies among variables from year to year are close to those of the by-count analysis, except that the calibrated copulas are often of a different type and potentially even spurious due to the exaggerating effect of amount-based weights.
  3. The vine copula model which explicitly includes Source fails to detect a qualitatively meaningful interaction between Source and the other variables. While this generally reinforces the previous points, this vine copula model only weakly detected the NAIC’s remediation efforts. This suggests that the simplifying assumption as described above may have difficulty holding when a relationship is confined to a relatively small subset of the data. Stratifying on other predictors or including regression capabilities in the vine models might improve the detection power of the methods.
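
The dilution effect described in the third statement can be illustrated with a small constructed example. The data below are entirely hypothetical: a binary source indicator and a binary flag (standing in for something like the availability of preferred information) are perfectly aligned within a small subset and unrelated elsewhere. The variable names are assumptions for illustration only.

```r
# Hypothetical data set of 400 records, half from each source (0 = MIB, 1 = NAIC).

# Small subset (20 records): the flag is perfectly aligned with the source.
subset_src  <- rep(c(0, 1), each = 10)
subset_flag <- subset_src

# Remainder (380 records): the flag is split evenly within each source.
rest_src  <- rep(c(0, 1), each = 190)
rest_flag <- rep(c(0, 1, 0, 1), each = 95)

src_all  <- c(subset_src, rest_src)
flag_all <- c(subset_flag, rest_flag)

# Kendall's tau (tau-b, since the data are heavily tied) within the subset
# and across the whole data set.
tau_subset <- cor(subset_src, subset_flag, method = "kendall")
tau_all    <- cor(src_all, flag_all, method = "kendall")

print(c(subset = tau_subset, overall = tau_all))
```

Here the subset shows a Kendall’s \(\tau\) of 1 while the overall value is only 0.05, because the subset makes up just 5% of the records. A relationship of this kind can therefore register only weakly in a model fitted to the full data.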

Overall, this analysis provides strong (though not absolute) evidence that the data received from NAIC is, at a minimum, as good as what we previously received from MIB.