A software pipeline for systematizing machine learning of speech data