stanbkt.models.MultiBKT

class stanbkt.models.MultiBKT(fit_method=FitMethod.MCMC, verbose=VerbosityLevel.INFO, stan_compile_kwargs=None, cpp_compile_kwargs=None)

Bases: BKTModelBase

Grouped Bayesian Knowledge Tracing model.

Extends the standard BKT model to allow group-specific parameters. Each student is assigned to a group via a group_id column in the data, and each group receives its own BKT parameters (pi_know, learn, forget, guess, slip).

The same Stan model (BKT_model.stan) is reused. StandardBKT collapses it to a single group; MultiBKT lets it run with the full group structure present in the data.
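To make the group structure concrete, here is a minimal pure-Python sketch of a filtered BKT recursion in which each student's parameters are looked up by group. It is not the library's Stan model: the `GroupParams` class, the `filtered_mastery` helper, the parameter values, and the update order (observe, then learn or forget) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupParams:
    """Hypothetical container for one group's BKT parameters."""
    pi_know: float  # P(known) before the first response
    learn: float    # P(unknown -> known) between responses
    forget: float   # P(known -> unknown) between responses
    guess: float    # P(correct | unknown)
    slip: float     # P(incorrect | known)


def filtered_mastery(responses, p):
    """Return P(known) before each response, given the earlier responses."""
    p_know = p.pi_know
    out = []
    for correct in responses:
        out.append(p_know)
        # Bayes update on the observed response.
        if correct:
            num = p_know * (1.0 - p.slip)
            den = num + (1.0 - p_know) * p.guess
        else:
            num = p_know * p.slip
            den = num + (1.0 - p_know) * (1.0 - p.guess)
        posterior = num / den
        # Learning / forgetting transition to the next opportunity.
        p_know = posterior * (1.0 - p.forget) + (1.0 - posterior) * p.learn
    return out


# Made-up values: one parameter set per group_id.
params_by_group = {
    "A": GroupParams(pi_know=0.2, learn=0.3, forget=0.0, guess=0.2, slip=0.1),
    "B": GroupParams(pi_know=0.5, learn=0.1, forget=0.05, guess=0.25, slip=0.1),
}
student_group = {"s1": "A", "s2": "B"}

responses = [1, 0, 1]
traj_s1 = filtered_mastery(responses, params_by_group[student_group["s1"]])
traj_s2 = filtered_mastery(responses, params_by_group[student_group["s2"]])
```

The same response sequence yields different mastery trajectories for `s1` and `s2` because only the group lookup differs, which is the effect group-specific parameters are meant to capture.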

Parameters:
  • fit_method (FitMethod) – The method to use for fitting the Stan model.

  • verbose (VerbosityLevel) – Verbosity level for logging.

  • stan_compile_kwargs (dict | None) – Additional Stan compilation options.

  • cpp_compile_kwargs (dict | None) – Additional C++ compilation options.

check_data_contains_fitted_kcs(kcs)

Check if data contains any KC that was fitted. Raises an error if data contains KCs that were not fitted.

Return type:

None

Parameters:

kcs (set[str])

evaluate(**kwargs)

Evaluate model performance (not yet implemented).

Returns:

Evaluation results (implementation pending).

Return type:

dict[str, Any]

Raises:

NotImplementedError – This method is not yet implemented.

fit(data, priors=None, column_mapping=None, stan_fit_options=None, overwrite_kcs=False)

Fit the BKT model to data. Each KC is fitted independently with its own model. Additional KCs can be fitted by calling fit again with new data.

Parameters:
  • data (DataFrame) – DataFrame containing the training data. Must include columns for: Student ID, Problem ID, and Correctness (0/1). If the KC column is absent, all interactions are assumed to belong to a single knowledge component.

  • priors (Union[dict[str, PriorsBase], PriorsBase, None]) – Prior specifications for the model parameters. Can be provided as:
      - A single BayesianPriors object applied to all KCs.
      - A dictionary mapping KC IDs to their specific BayesianPriors.
    If None, default priors are used for all KCs.

  • column_mapping (Union[Mapping[ColumnNames, str], Mapping[str, str], Mapping[ColumnNames | str, str], None]) – Mapping of expected column names. Keys should be ‘student_id’, ‘problem_id’, ‘correct’, and ‘kc_id’. If None, default column names are used.

  • stan_fit_options (Union[MCMCFitOptions, VBFitOptions, MLEFitOptions, PFFitOptions, dict[str, Any], None]) – Additional options for the Stan fitting method. If a dict is passed, it is forwarded as-is to the CmdStanPy fit method; the typed StanFitOptions are recommended for better type checking and validation. The accepted options depend on the chosen fit method, for example:
      - MCMC parameters (e.g., iter_sampling, chains, seed)
      - VB parameters (e.g., iter, tol_rel_obj)
    If None, the default fitting options for the chosen fit method are used.

  • overwrite_kcs (bool) – Whether to overwrite existing fits for KCs that are already fitted. If False, an error will be raised if attempting to fit a KC that already has a fit. If True, existing fits for the same KCs will be overwritten with the new fits.

Returns:

The fitted BKT model instance.

Return type:

BKTModelBase

Raises:

ValueError – If data validation fails, or if cpp_compile_kwargs and stan_fit_options are incompatible.
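The interaction between column_mapping and the optional KC column can be sketched without the library. The helper below is hypothetical, and it assumes that the default column names equal the mapping keys; the library's actual defaults and validation may differ.

```python
REQUIRED = ("student_id", "problem_id", "correct")
OPTIONAL = ("kc_id",)


def resolve_columns(available, column_mapping=None):
    """Map each expected key to a column name and check the required ones."""
    # Assumed defaults: each key maps to a column of the same name.
    mapping = {key: key for key in REQUIRED + OPTIONAL}
    mapping.update(column_mapping or {})
    missing = [key for key in REQUIRED if mapping[key] not in available]
    if missing:
        raise ValueError(f"missing required columns for: {missing}")
    # The KC column is optional: None stands for "treat all rows as one KC".
    if mapping["kc_id"] not in available:
        mapping["kc_id"] = None
    return mapping


columns = {"user", "item", "is_correct"}
resolved = resolve_columns(
    columns,
    {"student_id": "user", "problem_id": "item", "correct": "is_correct"},
)
```

Here `resolved["kc_id"]` ends up as `None`, mirroring the documented fallback to a single knowledge component when no KC column is present.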

get_kcs_in_fitted_kcs(kcs)

Return the set of KCs in the data that were fitted previously.

Return type:

set[str]

Parameters:

kcs (set[str])
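Conceptually, this lookup is a set intersection between the KCs found in the data and the KCs that already have a fit. The sketch below uses hypothetical helper names and made-up KC IDs; it illustrates the idea only and is not the method's actual implementation.

```python
def kcs_in_fitted(kcs, fitted_kcs):
    """KCs from the data that already have a fit."""
    return set(kcs) & set(fitted_kcs)


def kcs_not_fitted(kcs, fitted_kcs):
    """KCs from the data that have no fit yet."""
    return set(kcs) - set(fitted_kcs)


fitted = {"fractions", "decimals"}
in_data = {"decimals", "ratios"}

overlap = kcs_in_fitted(in_data, fitted)
unfitted = kcs_not_fitted(in_data, fitted)
```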

log(msg, level=VerbosityLevel.INFO)

Log a message if verbosity level permits.

Parameters:
  • msg (str) – Message to log.

  • level (VerbosityLevel) – Verbosity level of this message. Message is printed if self.verbose >= level. Lower enum values = higher verbosity.

predict(data=None, column_mapping=None, point_estimate='mean', parallel=True, fast_math=True)

Predict hidden states using point-estimate parameters from fitted posteriors.

Return type:

DataFrame

predict_posterior_draws(data, column_mapping=None, stan_output=None, backend='stan')

Return draw-level posterior prediction DataFrames.

Parameters:
  • data (DataFrame) – Student interaction data used to remap Stan indices to original IDs.

  • column_mapping (Union[Mapping[ColumnNames, str], Mapping[str, str], Mapping[ColumnNames | str, str], None]) – Column name mapping. Defaults to the standard ColumnNames defaults.

  • stan_output (Optional[dict[str, CmdStanGQ]]) – Pre-computed output from predict_posterior_stan. When provided, the Stan generated-quantities step is skipped.

  • backend (Literal['stan', 'numba']) – Backend used to produce posterior draws.
      - stan: use Stan generated quantities output (current behavior).
      - numba: run deterministic hidden-state recursion for each posterior parameter draw.

Returns:

Mapping from KC ID to draw-level DataFrames. Pass to stanbkt.utils.posterior_summary to obtain summary statistics.

Return type:

dict[str, DataFrame]

predict_posterior_stan(data, column_mapping=None)

Run Stan generated quantities for posterior state prediction.

Returns:

Mapping from KC ID to raw CmdStanGQ fit objects. Pass the result to predict_posterior_draws to obtain draw-level DataFrames.

Return type:

dict[str, CmdStanGQ]

predict_posterior_summary(data, column_mapping=None, quantiles=[0.025, 0.975], stan_output=None, n_cores=1, backend='stan')

Return per-observation posterior summaries without materializing all draws.

Parameters:
  • data (DataFrame) – Student interaction data used to remap Stan indices to original IDs.

  • column_mapping (Union[Mapping[ColumnNames, str], Mapping[str, str], Mapping[ColumnNames | str, str], None]) – Column name mapping. Defaults to the standard ColumnNames defaults.

  • quantiles (list[float]) – Quantiles to include in the returned posterior summary.

  • stan_output (Optional[dict[str, CmdStanGQ]]) – Pre-computed output from predict_posterior_stan. When provided, the Stan generated-quantities step is skipped.

  • n_cores (int) – Number of concurrent KC jobs to run. Use -1 to use all available CPU cores.

  • backend (Literal['stan', 'numba']) – Backend used to produce posterior summaries.
      - stan: summarize Stan generated quantities output.
      - numba: generate draw-level predictions via numba and summarize them.

Returns:

Per-observation posterior summary statistics for the overlapping fitted KCs.

Return type:

DataFrame

Warning

Setting n_cores to -1 or to any value greater than 1 can substantially increase peak memory usage and is likely to cause out-of-memory failures when the dataset is large.
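As a rough picture of what a per-observation summary contains, the sketch below reduces a handful of made-up draws per observation to a mean and the requested quantiles. The helper names, the output keys, and the linear-interpolation quantile rule are assumptions; the library's column names and quantile method may differ, and this sketch does hold the draws in memory.

```python
from statistics import fmean


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an ascending list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def summarize(draws_by_obs, quantiles=(0.025, 0.975)):
    """Reduce each observation's draws to a mean and the given quantiles."""
    rows = []
    for obs_id, draws in draws_by_obs.items():
        ordered = sorted(draws)
        row = {"obs": obs_id, "mean": fmean(ordered)}
        for q in quantiles:
            row[f"q{q}"] = quantile(ordered, q)
        rows.append(row)
    return rows


# Made-up draws of P(known) for two observations.
draws_by_obs = {0: [0.1, 0.2, 0.3, 0.4, 0.5], 1: [0.9, 0.7, 0.8, 0.6, 1.0]}
summary = summarize(draws_by_obs, quantiles=(0.25, 0.75))
```

Each row has one entry per requested quantile, so wider or additional intervals only add columns rather than rows.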

predict_smoothed(data=None, column_mapping=None, point_estimate='mean', parallel=True, fast_math=True)

Predict smoothed hidden states using point-estimate parameters.

Return type:

DataFrame

predict_smoothed_posterior_draws(data, column_mapping=None, stan_output=None, backend='stan')

Return draw-level smoothed posterior prediction DataFrames.

Parameters:
  • data (DataFrame) – Student interaction data used to remap Stan indices to original IDs.

  • column_mapping (Union[Mapping[ColumnNames, str], Mapping[str, str], Mapping[ColumnNames | str, str], None]) – Column name mapping. Defaults to the standard ColumnNames defaults.

  • stan_output (Optional[dict[str, CmdStanGQ]]) – Pre-computed output from predict_smoothed_posterior_stan. When provided, the Stan generated-quantities step is skipped.

  • backend (Literal['stan', 'numba']) – Backend used to produce posterior draws.
      - stan: use Stan generated quantities output (current behavior).
      - numba: run deterministic hidden-state recursion for each posterior parameter draw.

Returns:

Mapping from KC ID to draw-level DataFrames. Pass to stanbkt.utils.posterior_summary to obtain summary statistics.

Return type:

dict[str, DataFrame]
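Assuming "smoothed" carries its usual hidden-Markov-model meaning (the probability of each hidden state given the whole response sequence, not only the responses so far), the recursion for a single parameter draw can be sketched with a two-state forward-backward pass. The function below is an illustrative stand-in, not the library's Stan or numba code, and it treats the state at step t as the one that produces response t.

```python
def smoothed_mastery(responses, pi_know, learn, forget, guess, slip):
    """Return P(known at step t | all responses) for every step t."""
    n = len(responses)
    # trans[i][j] = P(next state j | current state i); 0 = unknown, 1 = known.
    trans = [[1.0 - learn, learn], [forget, 1.0 - forget]]

    def emit(state, correct):
        p_correct = (1.0 - slip) if state == 1 else guess
        return p_correct if correct else 1.0 - p_correct

    # Forward pass: normalized P(state_t | responses up to and including t).
    alpha = []
    prior = [1.0 - pi_know, pi_know]
    for t, correct in enumerate(responses):
        if t > 0:
            prev = alpha[-1]
            prior = [prev[0] * trans[0][s] + prev[1] * trans[1][s] for s in (0, 1)]
        a = [prior[s] * emit(s, correct) for s in (0, 1)]
        z = a[0] + a[1]
        alpha.append([a[0] / z, a[1] / z])

    # Backward pass: likelihood of the later responses given state_t, rescaled.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = responses[t + 1]
        b = [
            sum(trans[s][k] * emit(k, nxt) * beta[t + 1][k] for k in (0, 1))
            for s in (0, 1)
        ]
        z = b[0] + b[1]
        beta[t] = [b[0] / z, b[1] / z]

    # Combine both passes and normalize to get the smoothed probability.
    out = []
    for t in range(n):
        g0 = alpha[t][0] * beta[t][0]
        g1 = alpha[t][1] * beta[t][1]
        out.append(g1 / (g0 + g1))
    return out


# Made-up parameter values for one draw.
smoothed = smoothed_mastery([1, 1, 0], pi_know=0.3, learn=0.2, forget=0.05, guess=0.2, slip=0.1)
```

Running this once per posterior parameter draw would give draw-level smoothed trajectories; at the final step the smoothed and filtered probabilities coincide because no later responses remain.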

predict_smoothed_posterior_stan(data, column_mapping=None)

Run Stan generated quantities for smoothed posterior state prediction.

Returns:

Mapping from KC ID to raw CmdStanGQ fit objects. Pass the result to predict_smoothed_posterior_draws to obtain draw-level DataFrames.

Return type:

dict[str, CmdStanGQ]

predict_smoothed_posterior_summary(data, column_mapping=None, quantiles=[0.025, 0.975], stan_output=None, n_cores=1, backend='stan')

Return per-observation smoothed posterior summaries without materializing all draws.

Parameters:
  • data (DataFrame) – Student interaction data used to remap Stan indices to original IDs.

  • column_mapping (Union[Mapping[ColumnNames, str], Mapping[str, str], Mapping[ColumnNames | str, str], None]) – Column name mapping. Defaults to the standard ColumnNames defaults.

  • quantiles (list[float]) – Quantiles to include in the returned posterior summary.

  • stan_output (Optional[dict[str, CmdStanGQ]]) – Pre-computed output from predict_smoothed_posterior_stan. When provided, the Stan generated-quantities step is skipped.

  • n_cores (int) – Number of concurrent KC jobs to run. Use -1 to use all available CPU cores.

  • backend (Literal['stan', 'numba']) – Backend used to produce posterior summaries.
      - stan: summarize Stan generated quantities output.
      - numba: generate draw-level predictions via numba and summarize them.

Returns:

Per-observation smoothed posterior summary statistics for the overlapping fitted KCs.

Return type:

DataFrame

Warning

Setting n_cores to -1 or to any value greater than 1 can substantially increase peak memory usage and is likely to cause out-of-memory failures when the dataset is large.

save(save_base_location)

Save fitted model artifacts to a compressed archive.

Parameters:

save_base_location (str | PathLike[str]) – Archive path where fitted model artifacts should be saved.

Raises:

RuntimeError – If the model has not been fitted yet.

Return type:

None

set_verbosity(level)

Set the verbosity level for logging.

Parameters:

level (VerbosityLevel) – New verbosity level.

Raises:

ValueError – If level is not a valid VerbosityLevel.

summary(kcs=None, percentiles=(2.5, 97.5), column_mapping={}, clear_cache=False)

Get summary statistics for model parameters.

Parameters:
  • kcs (Union[list[str], str, None]) – KCs to summarize. Can be a single KC string, a list of KCs, or None. If None, summarizes all fitted KCs.

  • percentiles (Tuple[float, float]) – Percentiles to include in the summary. Values should be in the range [1, 99]. Ignored when the fit method is MLE, because MLE produces only a point estimate.

  • clear_cache (bool) – Whether to refresh the cached summaries.

  • column_mapping (dict[str, str])

Returns:

Summary statistics.

Return type:

Any

Raises:

RuntimeError – If the model has not been fitted yet.