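Reinforcement Learning (RL): REINFORCE (Policy Gradient) Formula

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right]
$$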
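
To make the formula concrete, here is a minimal sketch of a single-episode Monte Carlo estimate of the gradient. It rests on several assumptions not stated in the source: the policy is a tabular softmax over logits `theta[s][a]`, `G_t` is the return from step `t` onward, and `gamma` is an optional discount factor (the default of 1.0 gives the undiscounted return). The function names `discounted_returns`, `softmax`, and `reinforce_gradient` are hypothetical, chosen for illustration only.

```python
import math


def discounted_returns(rewards, gamma=1.0):
    """Return G_t = sum_{k >= t} gamma^(k - t) * r_k for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(theta, episode, gamma=1.0):
    """Single-episode estimate of grad_theta J(theta).

    theta   -- list of per-state logit lists (assumed tabular softmax policy)
    episode -- list of (state, action, reward) tuples for t = 0..T
    """
    grad = [[0.0] * len(row) for row in theta]
    returns = discounted_returns([r for _, _, r in episode], gamma)
    for (s, a, _), g in zip(episode, returns):
        probs = softmax(theta[s])
        for b in range(len(probs)):
            # For a softmax policy:
            # d/d theta[s][b] log pi(a|s) = 1[a == b] - pi(b|s)
            indicator = 1.0 if b == a else 0.0
            grad[s][b] += (indicator - probs[b]) * g
    return grad


if __name__ == "__main__":
    # One state, two actions, uniform policy, one step with reward 1.
    print(reinforce_gradient([[0.0, 0.0]], [(0, 0, 1.0)]))  # [[0.5, -0.5]]
```

The expectation over trajectories in the formula would be approximated by averaging this estimate over many sampled episodes. A gradient ascent step then moves `theta` in the direction of the averaged gradient.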