Snowflake’s open-source Arctic LLM to take on Llama 3, Grok, Mistral, and DBRX

Cloud-based data warehouse company Snowflake has developed an open-source large language model (LLM), Arctic, to take on the likes of Meta’s Llama 3, Mistral’s family of models, xAI’s Grok-1, and Databricks’ DBRX.

Arctic is aimed at enterprise tasks such as SQL generation, code generation, and instruction following, Snowflake said Wednesday.

It can be accessed via Snowflake’s managed machine learning and AI service, Cortex, for serverless inference via its Data Cloud offering, as well as through model providers such as Hugging Face, Lamini, AWS, Azure, Nvidia, Perplexity, and Together AI, among others, the company said. Enterprise users can download it from Hugging Face and get inference and fine-tuning recipes from Snowflake’s GitHub repository.

Snowflake Arctic versus other LLMs

Fundamentally, Snowflake’s Arctic is similar to many other open-source LLMs in that it uses the mixture-of-experts (MoE) architecture; DBRX, Grok-1, and Mixtral, among others, take the same approach.

The MoE architecture builds an AI model from smaller models trained on different datasets; these smaller models are later combined into one model that excels at solving different kinds of problems. Arctic is a combination of 128 smaller expert models.
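To make the routing idea concrete, here is a minimal sketch of top-2 mixture-of-experts routing in Python with NumPy. It is not Snowflake’s implementation: the layer sizes, the gating scheme, and every name in it are assumptions chosen for illustration.

```python
import numpy as np

# Minimal top-2 MoE routing sketch (illustrative only). A gating network
# scores all experts per token, and only the two best-scoring experts run.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 128, 2   # Arctic reportedly routes to 2 of 128 experts

gate_w = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-2 experts."""
    logits = x @ gate_w                      # one gating score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the two best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only 2 of the 128 experts execute, so most parameters stay idle per token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)                # (64,)
```

The key property is in the last line of moe_layer: 128 experts exist, but only two matrix multiplications run per token, which is how an MoE model can hold far more parameters than it spends compute on for any single input.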

One exception among the open-source models on the market is Meta’s Llama 3, which uses a dense transformer architecture, an evolution of the encoder-decoder architecture Google introduced in 2017 for translation.

The difference between the two architectures, according to Scott Rozen-Levy, director of technology practice at digital services firm West Monroe, is that an MoE model can be trained more efficiently because it uses less compute.

“The jury is still out on the right way to compare complexity and its implications on quality of LLMs, whether MoE models or fully dense models,” Rozen-Levy said.

Snowflake claims that its Arctic model outperforms most open-source models, and even a few closed-source ones, despite using fewer active parameters and less compute power to train.

“Arctic activates roughly 50% less parameters than DBRX, and 75% less than Llama 3 70B during inference or training,” the company said, adding that the model uses only two of its 128 expert models at a time, or about 17 billion of its 480 billion parameters.

DBRX and Grok-1, which have 132 billion and 314 billion parameters respectively, likewise activate only a subset of their parameters on any given input: Grok-1 uses two of its eight experts, while DBRX activates just 36 billion of its 132 billion parameters.
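A quick back-of-the-envelope check in Python, using only the figures cited above, shows how those activation claims line up; the numbers are the article’s, and the script is purely illustrative.

```python
# Active vs. total parameters (in billions), as reported in the article.
models = {
    "Arctic":      {"total": 480, "active": 17},
    "DBRX":        {"total": 132, "active": 36},
    "Llama 3 70B": {"total": 70,  "active": 70},   # dense: all parameters active
}

for name, p in models.items():
    share = p["active"] / p["total"]
    print(f"{name}: {p['active']}B of {p['total']}B active ({share:.0%})")

# Arctic's active-parameter count vs. the others, matching the quoted claims:
print(f"vs DBRX: {1 - 17 / 36:.0%} fewer")         # ~53%, i.e. "roughly 50% less"
print(f"vs Llama 3 70B: {1 - 17 / 70:.0%} fewer")  # ~76%, i.e. "75% less"
```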

However, Dylan Patel, chief analyst at semiconductor research firm SemiAnalysis, said that Llama 3 is still significantly better than Arctic by at least one measure.

“Cost wise, the 475-billion-parameter Arctic model is better on FLOPS, but not on memory,” Patel said, referring to the computing capacity and memory required by Arctic.

Additionally, Patel said, Arctic is well suited to offline inferencing rather than online inferencing.

Offline inferencing, otherwise known as batch inferencing, is a process in which predictions are run ahead of time, stored, and later served on request. In contrast, online inferencing, also known as dynamic inferencing, generates predictions in real time.
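A small sketch of the difference, with a hypothetical generate() function standing in for the actual model call (nothing here is a real Snowflake, Arctic, or Cortex API):

```python
# Illustrative contrast between offline (batch) and online (real-time) inference.

def generate(prompt: str) -> str:
    return f"<completion for: {prompt}>"   # stand-in for an actual LLM call

# Offline/batch inferencing: predictions are computed ahead of time
# (e.g., in a nightly job), stored, and served on request.
prompts = ["summarize Q1 sales", "draft SQL for the churn report"]
cache = {p: generate(p) for p in prompts}

def serve_offline(prompt: str) -> str:
    return cache[prompt]                   # lookup only; no model call at request time

# Online/dynamic inferencing: the prediction is generated in real time,
# putting the model call on the request path.
def serve_online(prompt: str) -> str:
    return generate(prompt)

print(serve_offline("summarize Q1 sales"))
print(serve_online("a brand-new question"))
```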

Benchmarking the benchmarks

Arctic outperforms open-source models such as DBRX and Mixtral-8x7B on coding and SQL generation benchmarks such as HumanEval+, MBPP+, and Spider, according to Snowflake, but it fails to outperform many models, including Llama 3 70B, on general language understanding (MMLU), MATH, and other benchmarks.

Experts claim that this is where the extra parameters in other models such as Llama 3 are likely to add benefit.

“The fact that Llama 3 70B does so much better than Arctic on GSM8K and MMLU benchmarks is a good indicator of where Llama 3 used all those extra neurons, and where this version of Arctic might fail,” said Mike Finley, CTO of AnswerRocket, an analytics software provider.

“To understand how well Arctic really works, an enterprise should put one of their own model loads through the paces rather than relying on academic tests,” Finley said, adding that it is worth testing whether Arctic will perform well on an enterprise’s specific schemas and SQL dialects, even though it performs well on the Spider benchmark.

Enterprise users, according to Omdia chief analyst Bradley Shimmin, shouldn’t focus too much on the benchmarks to compare models.

“The only relatively objective score we have at the moment is LMSYS Arena Leaderboard, which gathers data from actual user interactions. The only true measure remains the empirical evaluation of a model in situ within the context of its prospective use case,” Shimmin said.

Why is Snowflake offering Arctic under the Apache 2.0 license?

Snowflake is offering Arctic and its other text embedding models along with code templates and model weights under the Apache 2.0 license, which allows commercial usage without any licensing costs.

In contrast, Meta’s Llama family of models carries a more restrictive license for commercial use.

The strategy to go completely open source might be beneficial for Snowflake across many fronts, analysts said.

“With this approach, Snowflake gets to keep the logic that is truly proprietary while still allowing other people to tweak and improve on the model outputs. In AI, the model is an output, not source code,” said Hyoun Park, chief analyst at Amalgam Insights.

“The true proprietary methods and data for AI are the training processes for the model, the training data used, and any proprietary methods for optimizing hardware and resources for the training process,” Park said.

The other upside that Snowflake might see is more developer interest, according to Paul Nashawaty, practice lead of modernization and application development at Futurum Research.

“Open-sourcing components of its model can attract contributions from external developers, leading to enhancements, bug fixes, and new features that benefit Snowflake and its users,” the analyst explained, adding that being open source might also win the company market share through “sheer goodwill.”

West Monroe’s Rozen-Levy agreed with Nashawaty but pointed out that being pro open source doesn’t necessarily mean Snowflake will release everything it builds under the same license.

“Perhaps Snowflake has more powerful models that they are not planning on releasing in open source. Releasing LLMs in a fully open-source fashion is perhaps a moral and/or PR play against the full concentration of AI by one institution,” the analyst explained.

Snowflake’s other models

Earlier this month, the company released a family of five text-embedding models in different parameter sizes, claiming that they performed better than other embedding models.

LLM providers are increasingly releasing multiple variants of their models so that enterprises can choose between latency and accuracy, depending on the use case. While a model with more parameters can be relatively more accurate, one with fewer parameters requires less computation, takes less time to respond, and therefore costs less.

“The models give enterprises a new edge when combining proprietary datasets with LLMs as part of a retrieval augmented generation (RAG) or semantic search service,” the company wrote in a blog post, adding that these models were a result of the technical expertise and knowledge it gained from the Neeva acquisition last May.
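As a rough sketch of the semantic-search pattern the company describes, the snippet below embeds a few documents and retrieves the closest matches for a query; in a RAG service, those matches would then be passed to an LLM as grounding context. The embed() function is a hypothetical stand-in for a real embedding model (such as one of Snowflake’s models downloaded from Hugging Face), and every name in the snippet is an assumption.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Hypothetical stand-in for a real embedding model: derives a
    # deterministic pseudo-random unit vector from the text.
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

documents = [
    "Q3 revenue grew 12% year over year.",
    "The warehouse auto-suspends after 10 minutes idle.",
    "Arctic is available under the Apache 2.0 license.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def search(query: str, k: int = 2):
    """Return the k documents most similar to the query by cosine score."""
    scores = doc_vectors @ embed(query)   # dot product of unit vectors = cosine
    top = np.argsort(scores)[::-1][:k]
    return [(documents[i], float(scores[i])) for i in top]

print(search("What license is Arctic under?"))
```

With a real embedding model, the top hit for that query would be the Apache 2.0 document; the stand-in only demonstrates the retrieval mechanics.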

The five embedding models, too, are open source and available on Hugging Face for immediate use; access via Cortex is currently in preview.

Copyright © 2024 IDG Communications, Inc.
