Adaptive Graph Learning with Multimodal Fusion for Emotion Recognition in Conversation