ML-Based Forecasting and Anomaly Detection in Snowflake Cortex

Historically, only a few AI experts within an organization could develop insights using machine learning (ML) and predictive analytics. In this new wave of AI, making ML accessible to more data teams is crucial, and for Snowflake SQL users, it’s now a reality.

With the general availability (GA) of ML-based forecasting and anomaly detection functions in Snowflake Cortex, data analysts and other SQL users can now build more accurate forecasts and identify outliers in their time-series data in Snowflake. They can do all of this without learning Python, developing expertise in ML algorithms, or standing up and managing infrastructure. These functions are already delivering value to organizations worldwide. Examples include SpartanNash, a large U.S. grocery retailer and wholesaler that manages more than 180 stores and 25 distribution centers, and Laybuy, a rapidly growing buy-now, pay-later service based in New Zealand.

Read on to learn how these functions work, how customers are using them today, and what’s new and improved with this GA launch.

What are Snowflake Cortex functions?

Announced at Snowday 2023, Snowflake Cortex is a new, intelligent, fully managed service that helps organizations analyze data and quickly build AI applications—all within Snowflake. 

Using the Snowflake Cortex ML-based forecasting and anomaly detection functions is easy: simply call them wherever you access your Snowflake data today, whether in Snowsight or your favorite SQL editor. (Note: LLM-based functions are still in private preview; reach out to your account team to gain access.)

ML functions help analysts get more value from ever-growing volumes of data across sales, costs, inventory, demand and more. They can derive more accurate insights faster, which ultimately helps their teams make better decisions.

Forecasting and anomaly detection

You can accomplish forecasting and anomaly detection in two simple steps: First, train an ML model. Second, predict new time periods for forecasting, or detect outliers on a new set of data for anomaly detection. 

Because training and prediction are separate steps, you can reuse your model to repeatedly generate forecasts over new time horizons or detect anomalies in new data. 
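The following minimal Python sketch makes the train-then-predict split concrete. It is not Snowflake's implementation. `TrendModel` and `train` are hypothetical names, and a straight-line least-squares trend stands in for whatever model Cortex actually fits. The point is only that training produces a reusable object, which can then forecast any horizon without retraining.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrendModel:
    """Hypothetical stand-in for a trained forecasting model: a straight-line trend."""
    slope: float
    intercept: float
    n_train: int  # number of training points, so forecasts continue right after them

    def forecast(self, periods: int) -> List[float]:
        # Step 2 (predict): extend the fitted line past the end of the training data.
        start = self.n_train
        return [self.intercept + self.slope * t for t in range(start, start + periods)]


def train(values: List[float]) -> TrendModel:
    # Step 1 (train): ordinary least-squares fit of value = intercept + slope * t,
    # where t = 0, 1, 2, ... is the position of each observation.
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return TrendModel(slope=slope, intercept=y_mean - slope * t_mean, n_train=n)


daily_sales = [100.0, 102.0, 104.0, 106.0, 108.0]  # hypothetical history
model = train(daily_sales)   # train once...
print(model.forecast(3))     # ...reuse for a 3-day horizon: [110.0, 112.0, 114.0]
print(model.forecast(7))     # ...and again for a 7-day horizon, with no retraining
```

In Snowflake, the analogous reuse is calling the trained model's `FORECAST` method again with a different `forecasting_periods` value.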

Forecasting

The forecasting function helps you build more accurate time-series forecasts with:

  • Automated seasonality consideration: Do sales at your retail locations vary by day of the week or by month? Forecasting automatically takes these day-of-week or monthly patterns into account for simpler, more accurate predictions.
  • Automated trend detection: Capture trends in your data, such as whether electricity demand is increasing rapidly or staying relatively constant with important fluctuations. 
  • Other automations: Train models and generate predictions across regions or sales categories by adding just one parameter to your SQL. You can boost your model’s accuracy by including additional features in training and get evaluation metrics with a single line of SQL. You can also inspect how much each automatically generated or manually selected feature contributes to your model’s predictive power.
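Snowflake does not describe here how the function detects seasonality. The Python sketch below therefore illustrates only one common technique, autocorrelation, using hypothetical helper names (`autocorrelation`, `detect_season_length`, `seasonal_profile`). It tries a few candidate cycle lengths and keeps the one where the series best matches a copy of itself shifted by that many steps. It then averages the values at each position of the chosen cycle, such as the typical Saturday.

```python
from typing import List, Sequence


def autocorrelation(values: Sequence[float], lag: int) -> float:
    # Sample autocorrelation: how closely the series tracks itself shifted by `lag` steps.
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0  # a constant series has no seasonal pattern
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom


def detect_season_length(values: Sequence[float], candidates: Sequence[int] = (7, 30)) -> int:
    # Keep only cycle lengths shorter than the series, then pick the most autocorrelated one.
    usable = [lag for lag in candidates if 0 < lag < len(values)]
    if not usable:
        raise ValueError("series is too short for every candidate season length")
    return max(usable, key=lambda lag: autocorrelation(values, lag))


def seasonal_profile(values: Sequence[float], season_length: int) -> List[float]:
    # Average value at each position in the cycle (e.g. average Monday, average Tuesday, ...).
    totals = [0.0] * season_length
    counts = [0] * season_length
    for i, v in enumerate(values):
        totals[i % season_length] += v
        counts[i % season_length] += 1
    return [t / c for t, c in zip(totals, counts)]


# Four weeks of hypothetical daily sales, starting on a Monday, with a weekend peak.
week = [100, 95, 98, 102, 110, 160, 150]
sales = [float(v) for v in week * 4]
period = detect_season_length(sales, candidates=(3, 5, 7))
print(period)                            # 7: the weekly cycle wins
print(seasonal_profile(sales, period))   # the typical value for each day of the week
```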

Here’s how to train a model and generate forecasts—in just a few lines of SQL: 

    -- Train your model
    CREATE SNOWFLAKE.ML.FORECAST my_model(
      input_data => SYSTEM$REFERENCE('TABLE', 'my_data'),
      timestamp_colname => 'my_timestamp',
      target_colname => 'daily_sales'
    );

    -- Generate forecasts for the next 7 periods
    CALL my_model!FORECAST(forecasting_periods => 7);

Anomaly detection 

Similarly, detecting anomalies takes just two steps: train a model in SQL on a subset of your data, then detect outliers in other data sets. You don’t need to worry about ML frameworks or training and inference infrastructure, and you don’t need expertise in data science tools or Python.

    -- Train your model
    CREATE SNOWFLAKE.ML.ANOMALY_DETECTION my_model(
      input_data => SYSTEM$REFERENCE('TABLE', 'my_data'),
      timestamp_colname => 'my_timestamp',
      target_colname => 'daily_sales',
      label_colname => 'past_anomalies'
    );

    -- Detect anomalies
    CALL my_model!DETECT_ANOMALIES(
      input_data => SYSTEM$REFERENCE('TABLE', 'my_data'),
      timestamp_colname => 'my_timestamp',
      target_colname => 'daily_sales'
    );
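Conceptually, one simple way to detect anomalies is to learn a range of normal values during training and flag any new point that falls outside it. The Python sketch below does this with a band of mean ± 3 standard deviations. It is a deliberate simplification with hypothetical names (`train_interval`, `detect_anomalies`), not the Cortex algorithm. A real model would let the interval move with trend and seasonality. This sketch also ignores the optional labels supplied through `label_colname`.

```python
import statistics
from typing import List, Sequence, Tuple


def train_interval(values: Sequence[float], z: float = 3.0) -> Tuple[float, float]:
    # Train: summarize "normal" behavior as mean +/- z population standard deviations.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean - z * std, mean + z * std


def detect_anomalies(values: Sequence[float], bounds: Tuple[float, float]) -> List[bool]:
    # Detect: flag every new observation that falls outside the learned interval.
    lower, upper = bounds
    return [not (lower <= v <= upper) for v in values]


training = [100, 102, 98, 101, 99, 100, 103, 97]  # hypothetical "normal" history
bounds = train_interval(training)                  # roughly (94.4, 105.6)
new_data = [101, 96, 130, 99, 80]
flags = detect_anomalies(new_data, bounds)
print(flags)  # [False, False, True, False, True]
```

Keeping training and detection as separate functions mirrors the two SQL steps: the bounds are learned once and can be applied to any later batch of data.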

Unlocking more value: from faster, more accurate forecasts to better decision-making 

The GA release of these functions also includes several enhancements that make them even easier to use and help you tackle larger, more complex problems:

  • More automated data preparation: Spend less time preparing data before using these functions—even if timestamps aren’t regularly spaced or data has some null values.
  • Larger data sets: Train models on up to 100 million rows with higher-memory compute using Snowpark-optimized warehouses.
  • Longer timeout windows: Run jobs with large data sets or hundreds of individual forecasts for up to six hours without timing out.
  • Evaluation metrics: Use forecasting accuracy metrics, including mean absolute error, mean absolute percentage error and mean squared error, to determine whether your predictions are accurate enough for the problem you’re solving. These metrics also make it easier to explain the results to your organization when you’re ready to share them. Evaluation metrics are coming soon to anomaly detection.
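The three forecasting metrics named above have standard definitions. Mean absolute error (MAE) averages |actual − predicted|. Mean squared error (MSE) averages the squared error. Mean absolute percentage error (MAPE) averages |error / actual|. The Python function below computes all three over a held-out period. The function name is hypothetical. It reports MAPE as a fraction rather than a percentage, which may differ from how Snowflake presents it.

```python
from typing import Dict, Sequence


def evaluation_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    # MAPE divides by each actual value, so it is undefined if any actual value is zero.
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n                            # average error size, in data units
    mse = sum(e * e for e in errors) / n                             # penalizes large misses more heavily
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n       # relative error, as a fraction
    return {"MAE": mae, "MAPE": mape, "MSE": mse}


# Hypothetical actuals vs. forecasts for four held-out days.
metrics = evaluation_metrics([100, 200, 300, 400], [110, 190, 300, 420])
print(metrics)
```

With these numbers, MAE is 10 units and MSE is 150. MAPE is about 0.05, meaning the forecasts miss by roughly 5% on average.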

Snowflake Cortex ML-based functions in action 

Customers such as SpartanNash and Laybuy are using Snowflake Cortex ML-based functions to increase efficiency and deliver more accurate forecasts to the business. 

SpartanNash manages more than 180 stores and 25 distribution centers. The team uses the Snowflake Cortex forecasting function to more efficiently and accurately predict daily sales across their many locations. SpartanNash’s Team Lead for Business Intelligence Jeff Magnuson says, “We’ve been using Snowflake’s ML-based forecasting function for three months now and have saved hours of effort while generating more accurate forecasts than our previous process. Given our experience, we’re exploring more ways to use Snowflake Cortex functions throughout SpartanNash.” 

Laybuy has also experienced similar benefits. “Snowflake’s forecasting function has simplified how we produce time-series forecasts,” says Laybuy’s Head of Data Analytics Dean Sequeira. “It eliminates the need for multiple Python packages and scripts that we previously used and constantly maintained. Instead, a single SQL script can generate a time-series forecast, even for multi-series data, which makes forecasting for multiple regions and products a breeze with just a single task. This simplification has made it easier and faster to obtain a usable forecast and present it to business users when needed. Since it’s SQL-based, the forecasting function also enables our citizen-data analysts, in addition to the data engineering squad, to explore ML functions, ensuring that data is always at the center of decision-making.” 

Where can I learn more? 

Tune in to our BUILD session to see how to use forecasting and anomaly detection to monitor data pipelines. In this demonstration, we’ll walk you through how to build forecasts and multi-series anomaly detection, and how to schedule both with Snowflake Tasks.

What’s next? 

Classification, our next ML-based Snowflake Cortex function, is now in private preview. It classifies data into groups based on patterns observed in similar data. This makes it particularly helpful for predicting churn, scoring leads, predicting credit default, detecting manufacturing faults and more. You can expect the same easy-to-use experience: just a few lines of SQL help you predict key decision variables and act confidently.

To learn more about Snowflake Cortex ML functions, visit the Snowflake documentation or try out this Quickstart.
