stanbkt.utils.sim_grouped_BKT
- stanbkt.utils.sim_grouped_BKT(n_students=10, n_problems=20, n_kcs=1, n_groups=2, prior=0.1, learn=0.01, forget=0.05, guess=0.2, slip=0.1, rng_seed=None, kc_sequence=None, group_sequence=None, frac=1.0)
Simulate student problem responses under a grouped BKT model.
Generates a synthetic dataset by sampling problem responses from a Bayesian Knowledge Tracing (BKT) model in which the BKT parameters can vary by student group and by knowledge component (KC).
- Parameters:
n_students (int) – Number of students to simulate.
n_problems (int) – Number of problems to simulate.
n_kcs (int) – Number of knowledge components (KCs).
n_groups (int) – Number of student groups with distinct BKT parameters.
prior (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Initial knowledge probability. Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).
learn (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Learning (mastery) probability. Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).
forget (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Forgetting probability. Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).
guess (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Guessing probability (correct response without knowledge). Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).
slip (Union[float, Sequence[float], Sequence[Sequence[float]]]) – Slipping probability (incorrect response despite knowledge). Accepted formats are scalar, shape (n_groups,), or shape (n_groups, n_kcs).
rng_seed (int or None, optional) – Random seed for reproducibility.
kc_sequence (array-like of int or None, optional) – KC assignment for each problem. If None, randomly sampled.
group_sequence (array-like of int or None, optional) – Group assignment for each student (0-indexed). If None, students are evenly distributed across groups.
frac (float, default 1.0) – Fraction of rows to include in the output dataset. This simulates missing data, or students not completing all problems, by randomly dropping rows after simulation.
- Returns:
Simulated dataset with columns: student_id, problem_id, correct, kc_id, group_id, timestamp.
- Return type:
- Raises:
ValueError – If parameter shapes are invalid, if kc_sequence is invalid, or if group_sequence is invalid.
Notes
Parameters can be specified per-group by providing lists/arrays of length n_groups, or per-(group, KC) by providing a 2D array with shape (n_groups, n_kcs). For example:
    sim_grouped_BKT(
        n_students=20,
        n_groups=2,
        n_kcs=2,  # must match the number of columns in the 2D arrays below
        prior=[[0.2, 0.1], [0.5, 0.4]],  # rows=groups, cols=KCs
        learn=[[0.01, 0.02], [0.05, 0.06]],
    )
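The three accepted formats (scalar, per-group, per-(group, KC)) can all be thought of as shorthand for a full (n_groups, n_kcs) array. The following is a minimal sketch of that expansion, not the library's actual code: `expand_param` is a hypothetical helper name, and the range check on probabilities is an assumption about what "invalid" might include.

```python
import numpy as np


def expand_param(value, n_groups, n_kcs):
    """Hypothetical helper: expand a BKT parameter to shape (n_groups, n_kcs)."""
    arr = np.asarray(value, dtype=float)
    if arr.ndim == 0:
        # Scalar: the same value for every group and every KC.
        out = np.full((n_groups, n_kcs), float(arr))
    elif arr.shape == (n_groups,):
        # One value per group, repeated across all KCs.
        out = np.repeat(arr[:, None], n_kcs, axis=1)
    elif arr.shape == (n_groups, n_kcs):
        # Already one value per (group, KC) pair.
        out = arr.copy()
    else:
        raise ValueError(
            f"expected scalar, ({n_groups},) or ({n_groups}, {n_kcs}); got {arr.shape}"
        )
    # Assumption: parameters are probabilities, so they must lie in [0, 1].
    if np.any((out < 0.0) | (out > 1.0)):
        raise ValueError("probabilities must lie in [0, 1]")
    return out


# Per-group values are repeated across the KC axis.
prior = expand_param([0.2, 0.5], n_groups=2, n_kcs=3)
print(prior.shape)
```

With this layout, the value for a given student and problem is a single lookup such as `prior[group_id, kc_id]`.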
Each student is assigned a group, and knowledge states are tracked independently per (student, KC). Transition and emission probabilities are chosen from that student’s group and the active KC.
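To make the per-(student, KC) state tracking concrete, here is a rough sketch of how one student's responses could be sampled. It is an illustration under stated assumptions rather than the package's implementation: `simulate_student` is a hypothetical function, its parameter arrays are assumed to be the student's group row (one value per KC), and the order "emit a response, then apply the learn/forget transition" follows the usual BKT convention, which this page does not confirm.

```python
import numpy as np


def simulate_student(kc_sequence, prior, learn, forget, guess, slip, rng):
    """Hypothetical sketch: sample one student's 0/1 responses.

    prior, learn, forget, guess, slip are 1D sequences indexed by KC,
    already selected for this student's group.
    """
    prior = np.asarray(prior, dtype=float)
    # Independent initial knowledge state for each KC.
    known = rng.random(len(prior)) < prior
    responses = []
    for kc in kc_sequence:
        # Emission: knowing the KC gives 1 - slip, otherwise guess.
        p_correct = (1.0 - slip[kc]) if known[kc] else guess[kc]
        responses.append(int(rng.random() < p_correct))
        # Transition: only the active KC's state can change.
        if known[kc]:
            known[kc] = rng.random() >= forget[kc]  # stays known unless forgotten
        else:
            known[kc] = rng.random() < learn[kc]  # becomes known if learned
    return responses


rng = np.random.default_rng(0)
kc_sequence = [0, 1, 0, 1, 0]
responses = simulate_student(
    kc_sequence,
    prior=[0.2, 0.1],
    learn=[0.01, 0.02],
    forget=[0.05, 0.05],
    guess=[0.2, 0.2],
    slip=[0.1, 0.1],
    rng=rng,
)
print(responses)
```

Because `known` holds one entry per KC, practice on one KC never changes the student's state on another, which is the independence described above.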