In December 2024, Microsoft launched Phi-4, a small language model (SLM) with state-of-the-art performance in its class. Today, Microsoft is expanding the Phi-4 family with two new models: Phi-4-multimodal and Phi-4-mini.
The new Phi-4-multimodal model supports speech, vision, and text simultaneously, while Phi-4-mini is focused on text-based tasks.
Phi-4-multimodal is a 5.6B-parameter model and is Microsoft's first multimodal language model to integrate speech, vision, and text processing into a single, unified architecture. Compared to other existing state-of-the-art omni models, including Google's Gemini 2.0 Flash and Gemini 2.0 Flash Lite, Phi-4-multimodal achieves better performance on several benchmarks, as you can see in the table below.
In speech-related tasks, Phi-4-multimodal outperforms specialized speech models like WhisperV3 and SeamlessM4T-v2-Large in both automatic speech recognition (ASR) and speech translation (ST). Microsoft states that this model has achieved the top position on the Hugging Face OpenASR leaderboard with a word error rate of 6.14%.
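For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and this is not the leaderboard's own scoring code, which may normalize text differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative helper (hypothetical name); splits on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.2%}")
```

On this scale, a WER of 6.14% corresponds to roughly six word errors for every 100 words in the reference transcripts.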
In vision-related tasks, Phi-4-multimodal achieved strong performance in mathematics and science reasoning. In general multimodal capabilities, such as document and chart understanding, OCR, and visual science reasoning, this new model matches or exceeds popular models like Gemini-2-Flash-lite-preview and Claude-3.5-Sonnet.
Phi-4-mini is a 3.8B-parameter model that outperforms several popular larger LLMs in text-based tasks, including reasoning, math, coding, instruction-following, and function-calling.
To ensure the security and safety of these new models, Microsoft conducted testing with internal and external security experts, using strategies crafted by the Microsoft AI Red Team (AIRT). Both Phi-4-mini and Phi-4-multimodal can be deployed on-device when further optimized with ONNX Runtime for cross-platform availability, making them suitable for low-cost and low-latency scenarios.
Both Phi-4-multimodal and Phi-4-mini are now available to developers in Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog. Developers can read the technical paper for an outline of the models' recommended uses and limitations.
These new Phi-4 models represent significant advancements in efficient AI, bringing powerful multimodal and text-based capabilities to a variety of AI applications.