MT-CMVAD: A Multi-Modal Transformer Framework for Cross-Modal Video Anomaly Detection