There is a growing trend in the deep learning community to unify methodologies for solving problems in different areas (language, vision, speech, etc.). This generalized approach requires less domain knowledge and fewer problem-specific inductive biases, and it encourages models to learn useful knowledge almost entirely from their training data. …