Kyle's Reinforcement Learning and Machine Learning Blog

# The Multi-Armed Bandit Problem (MAB)

Reinforcement Learning Class Notes 1 / 2

April 22, 2026, 7 min read, 1,334 characters

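Introductory notes on the multi-armed bandit problem, covering the problem definition, value estimation, incremental updates, and the epsilon-greedy, UCB, and Thompson Sampling algorithms.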

reinforcement learning, bandit, machine learning, epsilon-greedy, UCB, Thompson Sampling