Transformers have become the dominant architecture for many of these sequence ... encoder representations from transformers (BERT), which could be considered one of the first LLMs ...