Scale-aware Gaussian mixture loss for crowd localization transformers