Multimodal Transformer Model

16d

Microsoft releases new Phi models optimized for multimodal processing, efficiency

The second new model that Microsoft released today, Phi-4-multimodal, is an upgraded version of Phi-4-mini with 5.6 billion parameters. It can process not only text but also images, audio and video.

Interesting Engineering on MSN 12d

Watch: UBTech achieves world’s first multi-humanoid robot coordination feat

A Chinese robotics company claims to have a breakthrough in multi-humanoid robot collaboration. UBTech achieved the world’s ...

13d

Diffusion LLMs Arrive: Is This the End of Transformer Large Language Models (LLMs)?

Discover how Mercury’s diffusion-based LLMs are 10x faster than Transformers, reshaping AI for text, image, and video ...

InfoWorld 16d

Microsoft’s Phi-4-multimodal AI model handles speech, text, and video

The new small language model can help developers build multimodal AI applications for lightweight computing devices, ...

VentureBeat 28d

A look under the hood of transformers, the engine driving AI model evolution

Depending on the application, a transformer model follows an encoder-decoder ... the most exciting applications of transformer models are multimodal models. OpenAI’s GPT-4o, for instance ...

Unlock Open Multimodality with Microsoft’s Phi-4 Series AI Models

Microsoft's Phi-4 Series delivers cutting-edge multimodal AI with compact design, local deployment, and advanced ...

10d

AI Voice Model Wows, Worries Users with Uncanny Realism

A new AI voice model from startup Sesame has astonished users with its near-human realism, sparking both admiration and unease.
