stanbkt.utils.sim_grouped_BKT

stanbkt.utils.sim_grouped_BKT(n_students=10, n_problems=20, n_kcs=1, n_groups=2, prior=0.1, learn=0.01, forget=0.05, guess=0.2, slip=0.1, rng_seed=None, kc_sequence=None, group_sequence=None, frac=1.0)

Simulate student problem responses under a grouped BKT model.

Generates a synthetic dataset by sampling problem responses from a Bayesian Knowledge Tracing (BKT) model whose parameters can vary by student group and by knowledge component (KC).

Parameters:
  • n_students (int) – Number of students to simulate.

  • n_problems (int) – Number of problems to simulate.

  • n_kcs (int) – Number of knowledge components (KCs).

  • n_groups (int) – Number of student groups with distinct BKT parameters.

  • prior (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Initial knowledge probability. Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).

  • learn (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Learning (mastery) probability. Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).

  • forget (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Forgetting probability. Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).

  • guess (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Guessing probability (correct response without knowledge). Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).

  • slip (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Slipping probability (incorrect response despite knowledge). Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).

  • rng_seed (int or None, optional) – Random seed for reproducibility.

  • kc_sequence (array-like of int or None, optional) – KC assignment for each problem. If None, randomly sampled.

  • group_sequence (array-like of int or None, optional) – Group assignment for each student (0-indexed). If None, students are evenly distributed across groups.

  • frac (float, default 1.0) – Fraction of rows to include in the output dataset. This simulates missing data, or students not completing all problems, by randomly dropping rows after simulation.

Returns:

Simulated dataset with columns: student_id, problem_id, correct, kc_id, group_id, timestamp.

Return type:

DataFrame

Raises:

ValueError – If any parameter shape, kc_sequence, or group_sequence is invalid.
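The effect of frac can be illustrated on a small hand-built frame. This is a minimal sketch rather than the library's code: the helper drop_rows is hypothetical, the toy values are illustrative only, and the use of pandas DataFrame.sample is an assumption about one reasonable way to drop rows at random.

```python
import pandas as pd

# Toy stand-in for a simulated dataset (values are illustrative only).
full = pd.DataFrame({
    "student_id": [0, 0, 0, 0, 1, 1, 1, 1],
    "problem_id": [0, 1, 2, 3, 0, 1, 2, 3],
    "correct":    [0, 1, 1, 1, 0, 0, 1, 1],
})


def drop_rows(df, frac, seed=None):
    """Keep a random fraction of rows, preserving the original row order.

    Hypothetical helper: sim_grouped_BKT may implement `frac` differently.
    """
    return df.sample(frac=frac, random_state=seed).sort_index()


# Keep 75% of the rows; the remaining rows act as missing responses.
partial = drop_rows(full, frac=0.75, seed=0)
```

With frac=1.0 every row is kept, so the result equals the input frame.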

Notes

Parameters can be specified per-group by providing lists/arrays of length n_groups, or per-(group, KC) by providing a 2D array with shape (n_groups, n_kcs). For example:

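from stanbkt.utils import sim_grouped_BKT

sim_grouped_BKT(
    n_students=20,
    n_kcs=2,  # must match the number of columns in the 2D parameters
    n_groups=2,
    prior=[[0.2, 0.1], [0.5, 0.4]],  # rows=groups, cols=KCs
    learn=[[0.01, 0.02], [0.05, 0.06]],
)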
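The three accepted parameter formats can all be reduced to one (n_groups, n_kcs) table. The sketch below shows one way to do that with NumPy broadcasting. The helper expand_param is hypothetical and is not part of stanbkt; how the library normalizes its inputs internally is an assumption.

```python
import numpy as np


def expand_param(value, n_groups, n_kcs, name="param"):
    """Expand a scalar, (n_groups,), or (n_groups, n_kcs) input to a full table.

    Hypothetical helper for illustration; not the library's implementation.
    """
    arr = np.asarray(value, dtype=float)
    if arr.ndim == 0:
        pass  # scalar: the same value for every group and KC
    elif arr.shape == (n_groups,):
        arr = arr[:, None]  # per-group: add a KC axis so rows repeat across KCs
    elif arr.shape != (n_groups, n_kcs):
        raise ValueError(
            f"{name} has shape {arr.shape}; expected a scalar, "
            f"({n_groups},), or ({n_groups}, {n_kcs})"
        )
    # broadcast_to returns a read-only view, so copy to get a writable array
    return np.broadcast_to(arr, (n_groups, n_kcs)).copy()


prior_scalar = expand_param(0.1, n_groups=2, n_kcs=3)
prior_by_group = expand_param([0.2, 0.5], n_groups=2, n_kcs=3)
prior_full = expand_param([[0.2, 0.1], [0.5, 0.4]], n_groups=2, n_kcs=2)
```

After expansion, the value for group g and KC k is always table[g, k], regardless of which format the caller used.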

Each student is assigned to a group, and knowledge states are tracked independently per (student, KC). At each problem, the transition and emission probabilities are looked up by the student's group and the KC of that problem.
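The per-(student, KC) bookkeeping described above can be sketched in plain Python. This is an illustration of a standard two-state BKT simulation, not the library's implementation: the function simulate_student is hypothetical, and emitting the response before applying the learn/forget transition is an assumption about ordering.

```python
import random


def simulate_student(kc_sequence, group, prior, learn, forget, guess, slip, seed=None):
    """Simulate one student's responses; each parameter is indexed as param[group][kc].

    Hypothetical sketch for illustration only.
    """
    rng = random.Random(seed)
    known = {}  # knowledge state per KC, drawn from the prior on first encounter
    responses = []
    for kc in kc_sequence:
        if kc not in known:
            known[kc] = rng.random() < prior[group][kc]
        # Emission: a student who knows the KC answers correctly unless they slip;
        # otherwise they can only answer correctly by guessing.
        p_correct = 1.0 - slip[group][kc] if known[kc] else guess[group][kc]
        responses.append(int(rng.random() < p_correct))
        # Transition (assumed to happen after the attempt): forget or learn.
        if known[kc]:
            known[kc] = rng.random() >= forget[group][kc]
        else:
            known[kc] = rng.random() < learn[group][kc]
    return responses


# Parameters laid out as rows=groups, cols=KCs.
prior = [[0.2, 0.1], [0.5, 0.4]]
learn = [[0.01, 0.02], [0.05, 0.06]]
forget = [[0.05, 0.05], [0.05, 0.05]]
guess = [[0.2, 0.2], [0.2, 0.2]]
slip = [[0.1, 0.1], [0.1, 0.1]]

responses = simulate_student(
    [0, 1, 0, 0, 1], group=1,
    prior=prior, learn=learn, forget=forget, guess=guess, slip=slip, seed=42,
)
```

Because the state dictionary is keyed by KC, practice on one KC never changes the student's knowledge of another.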