Posts in the 'ReinforcementLearning/Useful Concepts' category

on-policy Learning vs off-policy Learning

This post examines and summarizes how on-policy and off-policy updates differ in reinforcement learning. We begin with some intuitive background. What is the criterion that separates on-policy from off-policy?

Q-learning (off-policy):
\begin{equation} Q(a, s) \leftarrow Q(a, s)+\alpha \cdot\left(r_s+\gamma \max _{a^{\prime}} Q\left(a^{\prime}, s^{\prime}\right)-Q(a, s)\right) \end{equation}

Sarsa (on-policy):
\begin{equation} Q(a, s) \leftarrow Q(a, s)+\alpha \cdot\left(r_s+\gamma \cd..

ReinforcementLearning/Useful Concepts · 2025. 4. 17.
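
The two update rules differ only in the bootstrap term, which the following minimal Python sketch tries to make concrete. It assumes a tabular Q stored as a dict keyed by (action, state) to mirror the Q(a, s) notation; the function names and the two-state example are hypothetical illustrations, not taken from the post. Because the Sarsa formula in the excerpt is cut off, the Sarsa update here uses the standard Sarsa target r + γQ(a', s'), where a' is the next action the behavior policy actually takes.

```python
import random

# Hypothetical tabular setup: Q is a dict keyed by (action, state), mirroring Q(a, s).


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy update: bootstrap from the greedy (max) action in s_next."""
    target = r + gamma * max(Q[(a2, s_next)] for a2 in actions)
    Q[(a, s)] += alpha * (target - Q[(a, s)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from a_next, the action the behavior policy actually chose."""
    target = r + gamma * Q[(a_next, s_next)]
    Q[(a, s)] += alpha * (target - Q[(a, s)])


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Behavior policy: random action with probability epsilon, otherwise greedy."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a2: Q[(a2, s)])


def make_q(states, actions):
    """All-zero Q table over every (action, state) pair."""
    return {(a, s): 0.0 for s in states for a in actions}


actions = ["left", "right"]
Q_ql = make_q([0, 1], actions)
Q_sa = make_q([0, 1], actions)
for Q in (Q_ql, Q_sa):
    Q[("right", 1)] = 1.0  # in state 1, "right" currently looks best

# Same transition for both learners: s=0, a="right", r=0, s'=1.
q_learning_update(Q_ql, s=0, a="right", r=0.0, s_next=1,
                  actions=actions, alpha=0.5, gamma=0.9)
# Suppose exploration made the behavior policy pick "left" in s'.
sarsa_update(Q_sa, s=0, a="right", r=0.0, s_next=1,
             a_next="left", alpha=0.5, gamma=0.9)

print(Q_ql[("right", 0)])  # 0.45: target uses max over a', i.e. Q("right", 1) = 1.0
print(Q_sa[("right", 0)])  # 0.0: target uses Q("left", 1) = 0.0
```

Given the same transition, Sarsa backs up the value of the action the behavior policy will actually take (here an exploratory "left"), while Q-learning backs up the greedy value regardless of what the behavior policy does next. That is exactly the sense in which Q-learning learns about a policy other than the one generating the data.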

Additional Notes on the Bellman Equation in Reinforcement Learning

1. State Value Function
$$ V^\pi(s)=\mathbb{E}_\pi\left[G_t \mid s_t=s\right]=\mathbb{E}_\pi\left[\sum_{i=0}^{\infty} \gamma^i r_{t+1+i} \mid s_t=s\right] $$
If you are studying reinforcement learning, you have probably seen the state value function many times.
$$ V^\pi(s)=\sum_{a, s^{\prime}} \pi(a \mid s) P_{s s^{\prime}}^a\left[R\left(s, a, s^{\prime}\right)+\gamma V^\pi\left(s^{\prime}\right)\right] $$
In the end it can be written in the form of the Bellman equation; the derivation, in both equations and diagrams, of why this is possible..

ReinforcementLearning/Useful Concepts · 2025. 4. 14.
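
One practical consequence of the Bellman form is that V^π can be computed by repeatedly applying the right-hand side as an update until it stops changing (iterative policy evaluation). The sketch below is a minimal illustration under stated assumptions: the two-state MDP, the uniform policy, and the names `policy_evaluation`, `P`, `R`, and `pi` are hypothetical, chosen so the answer can be checked by hand. `P[(s, a)]` maps s' to P^a_{ss'}, and `pi[(a, s)]` stands for π(a | s).

```python
# Hypothetical two-state MDP: states "A", "B"; actions "stay", "move".
states = ["A", "B"]
actions = ["stay", "move"]

# P[(s, a)] maps s' -> P^a_{ss'} (deterministic transitions here).
P = {
    ("A", "stay"): {"A": 1.0},
    ("A", "move"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0},
    ("B", "move"): {"A": 1.0},
}

# pi[(a, s)] = pi(a | s): a uniform random policy.
pi = {(a, s): 0.5 for s in states for a in actions}


def R(s, a, s_next):
    """Reward R(s, a, s'): +1 for landing in B, 0 otherwise."""
    return 1.0 if s_next == "B" else 0.0


def policy_evaluation(states, actions, pi, P, R, gamma, tol=1e-10, max_iters=10_000):
    """Apply V(s) <- sum_{a,s'} pi(a|s) P^a_{ss'} [R(s,a,s') + gamma V(s')] until convergence."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        # Synchronous backup: every new value is computed from the previous V.
        new_V = {
            s: sum(
                pi[(a, s)] * p * (R(s, a, s_next) + gamma * V[s_next])
                for a in actions
                for s_next, p in P[(s, a)].items()
            )
            for s in states
        }
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    return V


V = policy_evaluation(states, actions, pi, P, R, gamma=0.9)
print(V)
```

By symmetry V(A) = V(B) = V, and the Bellman equation reduces to V = 0.5 + 0.9V, so both printed values should approach 0.5 / (1 - 0.9) = 5.0. Because γ < 1 the backup is a contraction, which is why the loop converges from any starting V.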

Markov Decision Process (MDP)

ReinforcementLearning/Useful Concepts · 2025. 4. 8.
