Using Snowflake to Optimize Global Support

Snowflake is a fast-growing company, and our global Support organization ensures our team scales accordingly to align with expanding customer needs. In just three short years, we grew from roughly 50 technical support staff to 350 today, and we are on track to top 400 people by year-end. How do you manage such high growth and onboard that many new people while striving for ever-higher customer satisfaction within the support experience? By innovating in that support experience and, as this blog post describes, by using our own products. In this “Snowflake on Snowflake” story, I’ll tell you how we used Snowpark to solve a critical global Support challenge, empowering our cloud support engineers to find relevant information and resolve customer issues faster and more effectively.

First, let’s set the stage. What is Snowpark? Recently announced at our Summit 2022 event, Snowpark is a framework for Snowflake that allows data scientists and developers to expedite application development and deployment by using their preferred language, including Java, Scala, and Python (currently in public preview), to build data pipelines, machine learning (ML) workflows, and data applications quickly, securely, and from a single platform. In particular, our engineers have loved using Snowpark to quickly build and deploy advanced data processing pipelines that deliver real-world results.
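To give a feel for what that looks like in practice, here is a minimal Snowpark for Python sketch: connecting a session and expressing a transformation that runs entirely inside Snowflake. The connection placeholders and the SUPPORT_CASES table are illustrative assumptions, not objects from our environment.

```python
# Minimal Snowpark for Python sketch: connect and run a DataFrame
# transformation that executes inside the warehouse.
# Connection parameters below are placeholders, not real account values.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

# The logic is written in Python but pushed down to Snowflake for execution.
cases = session.table("SUPPORT_CASES")  # hypothetical table
open_cases = cases.filter(col("STATUS") == "OPEN").select("CASE_ID", "SUBJECT")
open_cases.show()
```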

Getting to the root of an incident by optimizing search capabilities

The use case here is our search capabilities. In particular, search times for troubleshooting incidents were increasing, not because the content wasn’t in the knowledge base, but because the search index wasn’t well enough integrated with it to surface the most relevant, accurate results.

Engineers would enter terms into our search index based on what they initially understood to be the problem, often using the customer’s own terms to describe the situation, and that search index used the common “keyword in content” convention. You can see the problem here: By its nature, the support experience usually starts without a final diagnosis—that’s what we’re trying to get to—so if you’re entering terms that are too broad, or even off base, you will get content that’s too broad or off base. Worse, you may have the preliminary diagnosis correct but, because of how our search index was set up prior to this project, if that term wasn’t included in the most relevant content for the issue, the article failed to come up. In our fast-growing organization, the number of people who were new to Snowflake added to the complexity.

Our Customer Experience Engineering (CXE) team within the global Support organization created a new search index using Snowpark, natural language processing, ML, Snowflake Tasks, Streams, Stored Procedures, and the SQL API. The result is a knowledge base search index that improved the accuracy of the third-party search index we use by 68%, measured against real-world historical search terms that Snowflake Cloud Support Engineers curated as a test set, along with their expected results.

Like all support organizations, over the years we’ve accumulated and curated a wealth of technical content, including product documentation, knowledge articles, historical support cases, Confluence wiki pages, community posts, and more. This growth has been great for ensuring we have proper coverage of content, but it has led to another problem: As the breadth of technical content has grown, homing in on the most relevant content has become more difficult. Building on the existing search index has greatly improved our support engineers’ experience and, by extension, sped up time-to-resolution (TTR) to the benefit of our customers.

We’ve also optimized the search to achieve much greater relevancy. To do this, we focused on the use case of troubleshooting, defined as being able to search for content by describing the symptoms and issues a customer is facing, and getting results relevant to that search query. To prove that the existing indexes didn’t perform well for the troubleshooting use case, CXE worked with cloud support engineers (CSEs) to create a test set of real-world searches performed in the past by CSEs, along with the expected knowledge base articles. When run against the old (unmodified) index, this test set returned only 41% of the expected results in the top five hits.
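The metric itself is simple. The sketch below shows one way such a top-five hit rate could be computed from a curated test set; the run_search callable and the test-set shape are hypothetical stand-ins for the third-party index and our curated data.

```python
# Sketch: measuring top-5 search relevance against a curated test set.
# `run_search` stands in for a call to the search index and is hypothetical.
from typing import Callable, Dict, List

def top5_hit_rate(test_set: Dict[str, str],
                  run_search: Callable[[str], List[str]]) -> float:
    """test_set maps a real-world search query to the expected article ID."""
    hits = 0
    for query, expected_article in test_set.items():
        results = run_search(query)          # ranked article IDs
        if expected_article in results[:5]:  # expected article in the top 5?
            hits += 1
    return hits / len(test_set)

# A rate of 0.41 corresponds to the 41% baseline described above.
```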

The CXE team dove into the data on troubleshooting incidents and found that many of the keywords that CSEs were searching on weren’t in the resulting content being indexed. This is because the common convention of “keyword in content” was guiding the search. This means that only the keywords found in the content are stored in the indexes (with minor modifications such as storing term synonyms). For content to be returned for a search query, that search query needs to include query terms that are contained in the indexed content itself. Unfortunately, CSEs and customers commonly search using their “problem,” as they define it, as the keyword. So they may not even identify the knowledge articles most likely to help them, especially when they don’t yet know the root cause of their issue. They only know what they are experiencing. In fact, it’s possible the best knowledge article for their case doesn’t even use the keyword they would use to describe their problem.
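A toy example makes the limitation obvious: if the index only contains the words an article itself uses, a symptom-phrased query can miss the best article entirely. The article text and query below are made up for illustration.

```python
# Toy illustration of the "keyword in content" limitation: only words that
# appear in an article are indexed, so a symptom-based query can miss the
# best article entirely. The article and query are made up.
article = {
    "id": "KB-1234",
    "text": "Resolve statement timeouts by adjusting STATEMENT_TIMEOUT_IN_SECONDS",
}

# Build a simple keyword index from the article's own words.
indexed_terms = set(article["text"].lower().split())

# A customer describes the symptom, not the diagnosis.
query_terms = set("my query hangs forever".lower().split())

# No overlap between the symptom vocabulary and the indexed vocabulary,
# so this article would never be returned.
print(indexed_terms.intersection(query_terms))  # -> set()
```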

The solution

The engineers in CXE hypothesized that a better index could be crafted by extracting relationships between historical support cases and the technical content that was used to solve those support cases. Support cases contain the words and phrases that customers and CSEs naturally use to describe the issues they are facing. If those words and phrases are combined with the words that already exist in the indexed technical content, there’s a much higher probability that a customer or CSE will perform a search using keywords that match the indexed keywords.
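As a rough sketch of that hypothesis in Snowpark for Python terms, the job below joins case-derived terms to the articles that resolved those cases and aggregates them into extra keywords per article. All table and column names are hypothetical.

```python
# Sketch of the core idea: enrich each article's indexable keywords with the
# words customers and CSEs actually used in cases that the article resolved.
# Table and column names are hypothetical.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import array_agg

def build_enriched_index(session: Session):
    # Words and phrases taken from support-case conversations.
    case_terms = session.table("CASE_TERMS")          # CASE_ID, TERM
    # Which article ultimately resolved which case.
    resolutions = session.table("CASE_RESOLUTIONS")   # CASE_ID, ARTICLE_ID

    # Attach the customers' own vocabulary to the article that solved the case.
    enriched = (
        case_terms.join(resolutions, case_terms["CASE_ID"] == resolutions["CASE_ID"])
        .group_by(resolutions["ARTICLE_ID"])
        .agg(array_agg(case_terms["TERM"]).alias("EXTRA_KEYWORDS"))
    )
    # Persist for the downstream job that pushes keywords to the search index.
    enriched.write.save_as_table("ARTICLE_EXTRA_KEYWORDS", mode="overwrite")
```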

The result is an industry-first integration between Snowflake and a popular third-party search index, which, as referenced above, was found to improve knowledge accuracy by 68%. All the innovation in the Snowflake Support team is analyzed through a lens of whether it can be productized for customer benefit. Because we know our customers can use this to solve their customers’ problems, the Snowflake Support interface will be used to bring such capabilities to customers in the future.

Snowflake’s Customer Support team always considers whether the outcomes, such as this integration, could possibly be productized for customer benefit. In our efforts to create a continuous learning environment, this project is just one stride forward in our quest to ensure that the global Support team is exposed to anything and everything concerning our products and customers.  

How we did it: A deeper dive 

The first problem was figuring out how to determine which support cases were solved by which pieces of technical content. Support case objects don’t have a reliable field to extract what content was used in resolving the support case. Instead, we scanned all of the communication text on each support case, looking for references to known technical content. This was done in a Snowpark job called Entity Extraction, which scanned the communication text for all kinds of entities, such as URLs, QueryIDs, and stack traces.
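The sketch below illustrates what such an entity-extraction pass might look like in Snowpark for Python. The table names and regular expressions are illustrative, not the exact ones our Entity Extraction job uses, and in production this logic would more likely run as a UDF or UDTF rather than be collected to the client.

```python
# Sketch of an entity-extraction pass over support-case communication text.
# Table/column names and patterns are illustrative only.
import re
from snowflake.snowpark import Session

# Patterns for a few of the entity types mentioned above.
ENTITY_PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "QUERY_ID": re.compile(
        r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
    ),
    "STACK_TRACE": re.compile(r"^\s+at \S+\(.*\)$", re.MULTILINE),
}

def extract_entities(text: str):
    """Return (entity_type, value) pairs found in a block of case text."""
    found = []
    for entity_type, pattern in ENTITY_PATTERNS.items():
        for match in pattern.findall(text):
            found.append((entity_type, match))
    return found

def run(session: Session):
    rows = session.table("CASE_COMMENTS").select("CASE_ID", "BODY").collect()
    extracted = [
        (row["CASE_ID"], etype, value)
        for row in rows
        for etype, value in extract_entities(row["BODY"])
    ]
    session.create_dataframe(
        extracted, schema=["CASE_ID", "ENTITY_TYPE", "ENTITY_VALUE"]
    ).write.save_as_table("CASE_ENTITIES", mode="overwrite")
```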

We also had to tackle a data cleansing issue to allow us to distinguish between content that’s technically relevant to the support case and all kinds of content that gets captured but is not relevant, such as text about scheduling meetings, exchanging greetings, asking if the outcome seems satisfactory to the customer, and so on. To weed out content not related to the technical problem, an ML model was built using Snowpark that could classify whether a sentence in a support case was technically relevant.
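The actual classifier was built with a Java NLP library and Weka, as described below; the sketch here uses scikit-learn purely as a Python stand-in to show the shape of the problem: labeled sentences, bag-of-words features, and a Naive Bayes classifier.

```python
# Sketch of the sentence-relevance classifier. scikit-learn is used here only
# as a stand-in; the team's model was built with Weka (Java).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Manually labeled sentences: 1 = technically relevant, 0 = not relevant.
sentences = [
    "The query fails with a compilation error on the join clause",   # relevant
    "Thanks, let's schedule a call for Tuesday to discuss",          # not relevant
    "Increasing the warehouse size resolved the spilling issue",     # relevant
    "Please let us know if you are satisfied with the resolution",   # not relevant
]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
model.fit(sentences, labels)

print(model.predict(["The stack trace points to an out of memory error"]))  # -> [1]
```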

A training set of thousands of manually labeled sentences was curated. A common open source NLP library was used to split content into sentences and lemmatize keywords. Weka, a popular open source Java ML library, provided a Naive Bayes implementation that was ideal for this kind of text classification problem. After fine-tuning the feature set and developing some hooks to enable Weka to be used from within Snowpark code, a model was created that was measured at 92% accuracy using 10-fold cross-validation. This model was then packaged and deployed to a stage location from which a Snowpark task and user-defined function (UDF), in public preview, could read it in order to classify sentences from support cases.
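The deployment pattern is worth illustrating. The sketch below shows how a model file staged in Snowflake could be read from inside a Snowpark Python UDF via the imports mechanism; the stage path, file, and function names are hypothetical, and our actual model was a Weka (Java) classifier invoked through custom hooks rather than a pickled Python model.

```python
# Sketch of the deployment pattern: stage a trained model and classify case
# sentences from a UDF that reads the staged file. All names are hypothetical.
import os
import sys
import pickle

from snowflake.snowpark import Session
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import BooleanType, StringType

def register_relevance_udf(session: Session):
    @udf(
        name="is_technically_relevant",
        session=session,
        imports=["@ml_models/sentence_relevance.pkl"],  # hypothetical stage path
        packages=["scikit-learn"],
        input_types=[StringType()],
        return_type=BooleanType(),
        replace=True,
    )
    def is_technically_relevant(sentence: str) -> bool:
        # Files listed in `imports` are made available in this directory.
        # (In practice the loaded model would be cached, not re-read per call.)
        import_dir = sys._xoptions.get("snowflake_import_directory")
        with open(os.path.join(import_dir, "sentence_relevance.pkl"), "rb") as f:
            model = pickle.load(f)
        return bool(model.predict([sentence])[0])
```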

The Entity Extraction Snowpark task and the non-technical content cleansing task were strung together in a pipeline to join technical words from support cases with the technical content used to solve issues. This joined data set is output to a CXE-owned database in Snowhouse, a Snowflake account that we use internally and that has identical functionality to what our customers use. Establishing a connection to sync index data from a Snowflake table to the search index was the final step required. As referenced above, that improvement was found to be a 68% increase in search relevance, defined as relevant content coming up in the first five hits, and the result is an industry-first integration between Snowflake and our third-party search index.
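Here is a simplified sketch of that final pipeline step, again with hypothetical object names: join the extracted entities with the sentences the classifier marks as relevant, and persist the feed that the search-index sync picks up. The actual push to the third-party index (via the SQL API) is outside this snippet.

```python
# Sketch of the final pipeline step: join extracted entities with technically
# relevant case sentences and persist the result for the search-index sync.
# Object names are hypothetical.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import call_udf, col

def build_index_feed(session: Session):
    entities = session.table("CASE_ENTITIES")     # CASE_ID, ENTITY_TYPE, ENTITY_VALUE
    sentences = session.table("CASE_SENTENCES")   # CASE_ID, SENTENCE

    # Keep only sentences the classifier marks as technically relevant.
    relevant = sentences.filter(call_udf("is_technically_relevant", col("SENTENCE")))

    feed = entities.join(relevant, entities["CASE_ID"] == relevant["CASE_ID"]).select(
        entities["CASE_ID"], col("ENTITY_VALUE"), col("SENTENCE")
    )
    # Output lands in a CXE-owned database; a Snowflake Task runs this on a
    # schedule, and a downstream job syncs the rows to the search index.
    feed.write.save_as_table("CXE_DB.SEARCH.INDEX_FEED", mode="overwrite")
```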

Continued investment in CX

Ultimately, the team’s greatest achievement here isn’t the integration we created; it’s the enthusiastic and unrelenting investment we’re making in our customer experience. Our senior leadership team is firmly intent on building one of the best support organizations in the world, ensuring that as Snowflake scales, we continue to put customers first and always lead with that value. Customer support is a well-known brand differentiator. The number of customers willing to reference our support experience is growing, and we are just getting started. By innovating in the support space, aided by our own products, we expect that our ongoing investment in this area will be a key part of building Snowflake as a brand you can continue to trust.
