Temporal Synchronization and Normalization of Speech Videos for Face Recognition