The second new model that Microsoft released today, Phi-4-multimodal, is an upgraded version of Phi-4-mini with 5.6 billion parameters. It can process not only text but also images, audio and video.
Transformers have really become the dominant architecture for many of these ... such as the GPT family, are decoder-only. Encoder-decoder models combine both components, making them useful ...
Microsoft is expanding its Phi line of open-source language models with two new algorithms optimized for multimodal ...