Text this: Optimal tracking controllers with Off-policy Reinforcement Learning Algorithm in Quadrotor