Transformers have become the dominant architecture for many of these sequence modeling tasks because the underlying attention mechanism ... audio and images, and other providers are ...