In this age of constant technological evolution, the collaboration between VMware and Nvidia calls to mind Albert Einstein's observation that "the only source of knowledge is experience." Together, the two companies have announced an innovative solution that draws on their extensive expertise, promising to redefine how organizations harness the potential of artificial intelligence (AI). This article explores the implications of VMware Private AI Foundation with Nvidia, highlighting its capacity to empower enterprises, navigate compliance challenges, and chart a new course for AI investments.
AI Deployment Challenges: The Need for Transformation
Deploying generative AI models has posed significant challenges for companies. Traditionally, organizations have had to grapple with the limitations of commercial platforms such as OpenAI's, which often require sending sensitive data to the cloud. This not only raises compliance concerns but also incurs substantial costs. Alternatively, downloading and running AI models locally demands a deep understanding of model fine-tuning, vector database setup, and operationalization.
Enter the transformative partnership between VMware and Nvidia, designed to address these challenges head-on. Together, they offer a fully integrated, ready-to-use generative AI platform that empowers organizations to deploy AI models on their premises, within colocation facilities, or within private clouds. This groundbreaking platform, VMware Private AI Foundation with Nvidia, encompasses large language models like Llama 2, a vector database for real-time data integration, and a powerful combination of generative AI software and accelerated computing from Nvidia. Built on the robust foundation of VMware Cloud Foundation, this solution is optimized for AI, promising enhanced efficiency and security.
The Urgency of AI Transformation
The need for a solution like VMware Private AI Foundation with Nvidia is compelling, as evidenced by recent industry trends. Lucidworks’ global generative AI benchmark study revealed that a staggering 96% of AI decision-makers prioritize generative AI investments. Moreover, 93% of companies intend to increase their AI spending in the coming year, highlighting the growing significance of AI in organizational strategies.
However, risk management remains a paramount concern. The rapidly evolving regulatory landscape has significantly influenced AI investment decisions, with 77% of CEOs expressing apprehension, according to a KPMG survey. Protecting personal data and addressing privacy concerns top the priority list at 63%, followed closely by cybersecurity at 62%.
Deploying large language models within enterprise-controlled environments, such as on-premises data centers or private clouds, presents an opportunity to alleviate these concerns significantly. Bradley Shimmin, Chief Analyst for AI Platforms, Analytics, and Data Management at Omdia, emphasizes the potential of running models locally, stating, “Having the option to run a model locally can open many doors for companies that were simply prohibited from using publicly hosted models, even if they were hosted in a virtual public cloud.”
This level of control and security is particularly crucial for heavily regulated sectors like finance and government, where data residency concerns and compliance are paramount.
The Paradigm Shift: AI Meets Data Gravity
The paradigm shift facilitated by locally run AI models is akin to the concept of “bringing the model to the data.” This approach recognizes the gravitational pull of data within organizations and the necessity of aligning AI deployment with data locations.
Manish Goyal, Global AI and Analytics Leader at IBM Consulting, underscores the advantages of locally run, open-source models. Not only do they offer lower latency and reduced costs, but they also grant organizations greater control over their AI environments. The ability to customize and fine-tune these models to specific business needs is a game-changer, allowing companies to unlock the true potential of AI without compromising data security.
VMware’s Innovative Solution: Meeting the Challenge
VMware’s new offering is poised to capitalize on this paradigm shift in AI deployment. During the VMware Explore 2023 conference, VMware and Nvidia are demonstrating how organizations can download free, open-source Llama 2 models, customize them, and deploy production-grade generative AI within VMware environments. However, there is one catch—VMware Private AI Foundation won’t be available until early next year.
Paul Turner, Vice President of Product Management for vSphere and Cloud Platform at VMware, elaborates on how this solution works. Enterprises can take models like Meta’s Llama 2, place them in their data centers alongside their data, optimize and fine-tune these models, and create new business offerings. Turner emphasizes the simplicity and comprehensiveness of VMware Private AI Foundation, stating, “We want to make it simple for our customers.”
VMware Private AI Foundation represents the complete stack, starting with foundation models such as Llama 2, Falcon, or models from Nvidia's NeMo framework. Leveraging existing models streamlines the process, making it far more efficient than training new foundation models from scratch. Fine-tuned models also require access to up-to-date information, which is facilitated by a built-in vector database: PostgreSQL with the pgvector extension.
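VMware hasn't published the details of its integration, but the building blocks are standard: pgvector adds a vector column type and similarity operators to PostgreSQL. The minimal sketch below, with an illustrative connection string, table name, and 384-dimension embeddings, shows the storage and nearest-neighbor retrieval pattern involved:

```python
# Minimal sketch of PostgreSQL + pgvector as a retrieval store.
# Connection string, table name, and 384-dim embeddings are illustrative;
# VMware has not published its integration details.
import psycopg2

def to_vec(values):
    # pgvector's text representation: "[v1,v2,...]"
    return "[" + ",".join(str(v) for v in values) + "]"

conn = psycopg2.connect("dbname=aidb user=postgres")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""CREATE TABLE IF NOT EXISTS documents (
                   id bigserial PRIMARY KEY,
                   content text,
                   embedding vector(384));""")

# Store a document next to its embedding (computed by a separate model).
cur.execute("INSERT INTO documents (content, embedding) VALUES (%s, %s)",
            ("Q3 network outage postmortem ...", to_vec([0.01] * 384)))

# "<->" is pgvector's Euclidean-distance operator; this returns the five
# stored documents closest to the query embedding.
cur.execute("""SELECT content FROM documents
               ORDER BY embedding <-> %s LIMIT 5;""",
            (to_vec([0.02] * 384),))
print([row[0] for row in cur.fetchall()])
conn.commit()
```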
The platform goes beyond simplicity to deliver high performance, scaling from a single GPU up to 16 GPUs to meet the demands of AI workloads. Storage is optimized with a direct path from GPU to storage that bypasses the CPU. Additionally, Dell, HPE, and Lenovo have partnered with VMware to deliver the rest of the stack, giving customers flexibility in their hardware choices.
VMware Private AI Foundation will be available through various channels, including VMware’s OEM partners, distributors, and over 2,000 MSP partners. Pricing will be GPU-based, reflecting the value it brings to customers.
For those who can’t wait until next year, reference architectures are already available for customers to build their own solutions. However, the fully integrated single-suite product will debut in early 2024.
Fine-Tuning Generative AI for Business Value
According to Justin Boitano, Vice President of Enterprise Computing at Nvidia, generative AI is the most transformational technology of our times. These models offer a natural language interface to businesses’ systems, presenting unparalleled power and versatility. Boitano envisions AI becoming an integral part of every business within the next decade.
The challenge lies in the fact that off-the-shelf models possess limited knowledge, primarily relying on publicly available data. General-purpose offerings like ChatGPT, while versatile, may not excel at company-specific tasks. Boitano emphasizes the necessity of customizing models against private business information, such as call center records or IT tickets, without exposing this data to the public.
This is where open-source models like Llama 2 come into play. Organizations can download these models, fine-tune them using their own data, and combine them seamlessly with proprietary information. This customization ensures that the model comprehends and addresses the specific needs of the business.
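As a concrete illustration, a minimal local-inference sketch using the openly downloadable Llama 2 chat checkpoint from Hugging Face (access is gated behind Meta's license) might look like the following; the prompt and hardware settings are placeholders rather than anything VMware prescribes:

```python
# Minimal local-inference sketch with an open Llama 2 chat checkpoint.
# The model ID is real but gated behind Meta's license on Hugging Face;
# dtype and device placement will vary by environment.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "Summarize our VPN reset procedure for new hires:"  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the weights run entirely inside the organization's own infrastructure, nothing in this loop leaves the data center.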
VMware Private AI Foundation simplifies this fine-tuning process. It comes equipped with prepackaged models, training frameworks, and an AI workbench, facilitating a smooth transition from individual laptops to data centers, where the bulk of AI computing and inference occurs. Fine-tuning a 40-billion-parameter model can be accomplished in as little as eight hours using eight GPUs. The vector database ensures access to real-time data, enabling use cases that were previously out of reach.
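VMware hasn't detailed exactly which training frameworks the platform bundles, so as a rough sketch of what such a run involves, the example below uses the open-source Hugging Face transformers, peft, and datasets libraries to attach LoRA adapters (one common parameter-efficient technique, not necessarily the platform's) to a Llama 2 checkpoint; the data file and hyperparameters are placeholders:

```python
# Hypothetical sketch: parameter-efficient fine-tuning of Llama 2 with LoRA.
# Uses the open-source transformers/peft/datasets stack; VMware's packaged
# workbench may differ. Model ID hyperparameters and data file are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # gated; requires an accepted license
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Wrap the frozen base model with small trainable LoRA adapters.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# Placeholder corpus: one JSON object per line with a "text" field,
# e.g. anonymized call-center transcripts or IT tickets.
data = load_dataset("json", data_files="tickets.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                     max_length=1024),
                remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-tickets-lora",
                           per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Adapter-style approaches like this train only a small fraction of the weights, which is part of why fine-tuning timelines measured in hours rather than weeks are plausible.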
The platform supports Nvidia’s A100 AI chip, the H100 chip, and the upcoming L40S chip. The L40S chip is expected to offer 1.2 times more generative AI inference performance and 1.7 times more training performance compared to the A100. Its versatility extends beyond generative AI, making it suitable for virtual desktops and rendering.
The Significance of Llama 2 from Meta
Within the realm of generative AI models, Llama 2, developed by Meta, stands out as a prominent player. Meta released Llama 2 in July, offering it free for commercial use under an open license, albeit with certain limitations for companies with over 700 million monthly active users. This move has reshaped the landscape of open foundation models, making them commercially licensable and accessible.
Today, Llama 2 variants dominate the Hugging Face Open LLM Leaderboard, signaling their widespread adoption within the industry. Companies can download these models, tailor them to their specific requirements, and give them access to up-to-date information through embeddings and retrieval.
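In practice, real-time data access through embeddings usually means retrieval augmentation: embed the user's question, look up nearby private documents, and prepend them to the prompt. The sketch below assumes a hypothetical `search` callable (for example, the pgvector query shown earlier) and an arbitrary open-source embedding model; none of it is a published VMware or Meta interface:

```python
# Illustrative retrieval-augmented prompt builder. The embedding model,
# the `search` callable (e.g., the pgvector query shown earlier), and the
# prompt format are assumptions, not details VMware or Meta have published.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim vectors

def build_prompt(question, search):
    """Embed the question, fetch nearby private documents, and prepend
    them so the model answers from current data rather than stale
    training knowledge."""
    query_embedding = encoder.encode(question).tolist()
    context = "\n".join(search(query_embedding, k=5))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```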
Llama 2 comes in several sizes, with 7-billion, 13-billion, and 70-billion-parameter variants, allowing organizations to strike a balance between capability and hardware requirements. The smaller variants enable deployment of AI models on relatively low-powered hardware, expanding accessibility and utility.
The Future of AI Deployment
VMware’s groundbreaking collaboration with Nvidia marks a pivotal moment in the evolution of AI deployment within enterprises. By offering a comprehensive, integrated, and efficient AI platform, organizations can harness the full potential of generative AI while maintaining control and compliance. The ability to run AI models locally and fine-tune them to specific business needs opens up new possibilities for innovation and differentiation.
As AI continues to permeate every facet of business operations, solutions like VMware Private AI Foundation with Nvidia empower organizations to adapt, evolve, and thrive in an increasingly AI-driven world. While the launch is still months away, the promise of a transformative AI future is already within reach. VMware and Nvidia's partnership exemplifies a commitment to shaping that future, one where AI is not just a tool but an integral part of organizational success.