Objectivization of Audio-Visual Correlation Analysis