The new small language model can help developers build multimodal AI applications for lightweight computing devices, ...
The second new model that Microsoft released today, Phi-4-multimodal, is an upgraded version of Phi-4-mini with 5.6 billion parameters. It can process not only text but also images, audio and video.
A new AI voice model from startup Sesame has astonished users with its near-human realism, sparking both admiration and unease.
Discover how Mercury’s diffusion-based LLMs are 10x faster than Transformers, reshaping AI for text, image, and video ...
Interesting Engineering on MSN. Watch: UBTech achieves world's first multi-humanoid robot coordination feat. UBTech's Walker S1 humanoid robots achieve a breakthrough, collaborating on complex tasks at Zeekr's 5G smart factory using ...
Microsoft's Phi-4 Series delivers cutting-edge multimodal AI with compact design, local deployment, and advanced ...
Depending on the application, a transformer model follows an encoder-decoder ... the most exciting applications of transformer models are multimodal models. OpenAI’s GPT-4o, for instance ...
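The core operation shared by the encoder and decoder blocks mentioned above is scaled dot-product attention: each query vector is compared against every key, the scaled similarities are turned into weights via a softmax, and those weights mix the value vectors. Below is a minimal pure-Python sketch of that single operation for illustration only; the function names, toy matrices, and dimensions are this sketch's own assumptions, not the implementation used by GPT-4o or any production transformer.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: the building block reused by
    both encoder and decoder layers in a transformer."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # weights sum to 1 across the keys
        # Each output row is a weighted mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy example (hypothetical numbers): 2 queries attend over 3 key/value pairs.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(attention(Q, K, V))
```

In a full encoder-decoder model this operation is applied many times: self-attention within the encoder, masked self-attention within the decoder, and cross-attention from decoder queries to encoder keys and values; multimodal models extend the same mechanism to tokens derived from images or audio.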
In late February, Sesame released a demo for the company's new Conversational Speech Model (CSM) that appears to cross over what many consider the "uncanny valley" of AI-generated speech, with some ...
Predicting patient trajectories is a complex task due to several factors, including data non-stationarity, the vast number of ...