Policy iteration: because who needs human intuition?
Because who needs human policy when you have gradients?