Louisville, Kentucky
Today
Building and deploying enterprise-scale decision systems. Hands-on experience implementing policy gradient methods (PPO, A3C), value-based approaches (DQN, Q-learning), and off-policy algorithms. Deep familiarity with the Bellman equation, reward shaping, the exploration-exploitation tradeoff, constraint mapping, and the common failure points of real-world reinforcement learning systems. Ability to diagnose policy learning problems such as policy collapse, credit assignment issues, and distributional
Third Party, Contract
$50 - $55