Key to product search is a dynamic scientific taxonomy
Scientists lead search demand
When Label Insight first began working with the FDA in 2008, our core use case was powering more detailed search. Fast forward 10 years: e-commerce and robust search have become critically important, and, surprisingly, FDA scientists are still leading the way when it comes to demand for deep search.
Our 10 years of experience with FDA scientists, along with the deep search capabilities we have developed, has given us a window into what future search will require. Our key learning has been that sophisticated search is all about the background ontology* behind the data. (Going forward, I will use the more industry-familiar term “taxonomy”; see the end of this post for a quick exploration of the difference between these two terms.) By exploring taxonomy and search, our goal is to innovate toward the future of product search.
Before we do, let's take a step back for some context.
Robust scientific search
When the FDA first came to Label Insight looking for help with search, they had a particular problem in mind. They had recently issued a new regulation on trans fat labeling but had no effective way to measure the impact it was having on the market. They had been collecting information from physical products in stores and analyzing ingredients to understand whether the prevalence of trans fat ingredients had changed under the new regulation.
The challenge they faced was that no single ingredient name clearly identified trans fat on a label. In fact, many different ingredients could contain trans fat (such as partially hydrogenated oils), and identifying the products that contained any one of them was a huge challenge.
Fortunately, our core competency was the ingredient taxonomy we had created, which enabled us to identify individual ingredients instantly. In simple terms, we had created a map of every single ingredient and linked all the different spellings and versions of each ingredient to it, so that we knew what everything was.
With this core technology, we helped the FDA quickly create an attribute that identified every product containing a trans fat-related ingredient, no matter which ingredient it was, how it was spelled, or which version of its name was used.
Simply put, FDA scientists needed a reliable way to identify products that met specific search criteria, and our ingredient taxonomy helped them achieve that goal.
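To make the idea concrete, here is a minimal Python sketch of how a same-as map can power a derived attribute like the trans fat flag described above. It assumes a tiny, hypothetical dictionary (`SAME_AS`) from label spellings to canonical ingredients and a hypothetical set of trans fat sources (`TRANS_FAT_SOURCES`); the names, entries, and helper functions are illustrative only, not Label Insight's actual taxonomy or code.

```python
from typing import List, Optional

# Hypothetical same-as map: variant spellings found on labels -> canonical ingredient.
# Entries are illustrative only, not an actual production taxonomy.
SAME_AS = {
    "partially hydrogenated soybean oil": "partially hydrogenated soybean oil",
    "part. hydrog. soybean oil": "partially hydrogenated soybean oil",
    "partially-hydrogenated soybean oil": "partially hydrogenated soybean oil",
    "partially hydrogenated cottonseed oil": "partially hydrogenated cottonseed oil",
    "cane sugar": "sugar",
    "sugar": "sugar",
}

# Hypothetical group of canonical ingredients that indicate trans fat.
TRANS_FAT_SOURCES = {
    "partially hydrogenated soybean oil",
    "partially hydrogenated cottonseed oil",
}

def normalize(raw: str) -> Optional[str]:
    """Map a raw label ingredient to its canonical name, or None if unknown."""
    key = " ".join(raw.lower().split())  # case- and whitespace-insensitive lookup
    return SAME_AS.get(key)

def contains_trans_fat(ingredients: List[str]) -> bool:
    """Derived attribute: True if any ingredient maps to a trans fat source."""
    return any(normalize(i) in TRANS_FAT_SOURCES for i in ingredients)

products = {
    "cracker A": ["Enriched Flour", "Part. Hydrog. Soybean Oil", "Salt"],
    "cracker B": ["Enriched Flour", "Canola Oil", "Cane Sugar"],
}
flagged = [name for name, ings in products.items() if contains_trans_fat(ings)]
print(flagged)  # ['cracker A']
```

The attribute never looks at raw spellings directly; it only asks whether an ingredient resolves to a member of the group, which is why new spellings can be handled by extending the map rather than rewriting the search.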
Why a taxonomy is critical to search
To understand the importance of a taxonomy to search, I think about Legos™. Performing search without an underlying taxonomy is like building with only black and white Legos. It is possible to build a structure in black and white, but without the colors it's impossible to build something sophisticated. A search taxonomy is the equivalent of using all of the colors in a Lego set: you can see and organize all the red pieces, all the blue pieces, and so on.
A layer in a taxonomy is another level of information that can be used to organize the information further. In the Lego example, the size, shape, and number of dots can all be layers that can be used to find and use Lego pieces more effectively.
A taxonomy enables you to organize an unstructured dataset so that you can search, filter, and ultimately use its data elements more effectively. A taxonomy is a background mapping that organizes all of the data elements by their similarities. Lego pieces can be organized by color, shape, size, or function. This layered taxonomy enables simple yet robust search parameters. For instance, one could easily perform a multivariate search for rectangular, 6-dot, red Lego pieces. The more layers are added to a taxonomy, the more sophisticated the search parameters can be.
Transformation of data elements to high-order attributes
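The Lego analogy maps naturally onto a filter over attributes. In this minimal sketch, each field of a hypothetical `Piece` record plays the role of one taxonomy layer, and `search` (also a hypothetical helper) returns only the pieces that match every requested layer value, i.e. a simple multivariate AND query.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Piece:
    color: str   # layer 1
    shape: str   # layer 2
    dots: int    # layer 3

# Illustrative inventory only.
inventory = [
    Piece("red", "rectangular", 6),
    Piece("red", "rectangular", 4),
    Piece("blue", "rectangular", 6),
    Piece("red", "square", 4),
    Piece("red", "rectangular", 6),
]

def search(pieces: List[Piece], **criteria) -> List[Piece]:
    """Return pieces whose value on every requested layer matches (AND filter)."""
    return [p for p in pieces
            if all(getattr(p, layer) == value for layer, value in criteria.items())]

matches = search(inventory, color="red", shape="rectangular", dots=6)
print(len(matches))  # 2
```

Adding a layer here is just adding a field, which mirrors the point of the analogy: every new layer becomes another dimension a query can combine with the others.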
We call the process of organizing unstructured data the “transformation” of product data to create high-order attributes. High-order attributes are the result of our deconstruction, organization, and reconstruction of product data elements into searchable form.
When it comes to an underlying search taxonomy for product data, we have created many more layers than in the Lego example above. For instance, when we built our ingredient taxonomy, our first layer separated out all individual ingredients. Our second layer grouped together all ingredients that were the same but had different names. Yellow food coloring, for instance, has over 1,250 different names, so to truly find all products that contain this ingredient, one needs to find the products that contain any of those 1,250 same-as names. Beyond this, we've added hundreds of layers that group ingredients by different properties, enabling highly sophisticated search with a high degree of precision and accuracy.
The point is, the more layers of organization we add to our data elements, the more granular product search criteria can become. We can search for all the products that contain an added color, or an added yellow coloring, or that specifically contain the ingredient “Color Yellow”, or instead the ingredient “Tartrazine”, both of which are among the 1,250 different names for “color yellow”.
The dynamic taxonomy that powers our high-order attributes makes it possible to search the market of products in an almost infinite number of ways. This is critical for the search experience that consumers increasingly expect today, which I explored in my previous post, Grocery E-commerce 2.0.
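Layered ingredient search can be sketched as a walk up a hierarchy. The sketch below assumes a hypothetical child-to-parent map (`PARENT`) in which the label names “Tartrazine” and “Color Yellow” point to a canonical node “color yellow (canonical)”, which sits under “added yellow coloring” and then “added color”. “Red 40” is included only as an illustrative second branch; the structure and helper names are assumptions, not the production taxonomy. A product matches a query if any of its ingredients is the query term or falls under it, so the same data answers both broad and narrow questions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layered taxonomy: each node points to its parent layer (None = top).
PARENT: Dict[str, Optional[str]] = {
    "added color": None,
    "added yellow coloring": "added color",
    "added red coloring": "added color",
    "color yellow (canonical)": "added yellow coloring",  # same-as group
    "color red (canonical)": "added red coloring",
    # Label names (only a couple of the many same-as names shown)
    "Color Yellow": "color yellow (canonical)",
    "Tartrazine": "color yellow (canonical)",
    "Red 40": "color red (canonical)",
}

def ancestors(node: str) -> Set[str]:
    """Return the node itself plus every layer above it."""
    seen: Set[str] = set()
    current: Optional[str] = node
    while current is not None:
        seen.add(current)
        current = PARENT.get(current)  # unknown ingredients simply stop here
    return seen

def search(products: Dict[str, List[str]], term: str) -> List[str]:
    """Products with at least one ingredient that is, or falls under, the term."""
    return sorted(name for name, ings in products.items()
                  if any(term in ancestors(i) for i in ings))

products = {
    "soda": ["Water", "Sugar", "Tartrazine"],
    "candy": ["Sugar", "Red 40"],
    "cake mix": ["Flour", "Color Yellow"],
}
print(search(products, "added color"))            # ['cake mix', 'candy', 'soda']
print(search(products, "added yellow coloring"))  # ['cake mix', 'soda']
print(search(products, "Tartrazine"))             # ['soda']
```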
Scientific Search in 2018
Today, we are powering this type of taxonomy-driven, granular product search not just for FDA scientists, but for USDA scientists and much of the industry. The demand for robust search has gone far beyond the ingredient taxonomy where we began. The need for sophisticated search has evolved from searching across all the information on a package to executing multivariate searches that require both wider taxonomies and more layers within each taxonomy.
In response to this evolution, we've expanded our taxonomies into new dimensions, including marketing claims, romance copy, certifications, title, subtitle and brand, warnings, supply chain statements, ethical statements, and more. Across all of these taxonomies, we have developed and organized many layers of mapping. Together, these comprehensive taxonomies and their detailed layers power sophisticated multivariate search across all of the taxonomies, and effectively across all of the product data.
Over time, scientific research has continued to evolve, and it now includes derived data elements: data that is determined from the information provided for a product but is not explicitly contained in that dataset. For instance, we have worked with the scientific community to develop a layer of ingredient classification that aligns with global standards. We have also drawn on the scientific community to develop a granular product type categorization.
These two layers alone combine to provide a level of analysis and scientific scrutiny that is unique in product data search, and can benefit consumer search. It's a great example of the scientific community leading when it comes to product data search.
Grocery e-comm search in 2018
Although the underlying technology and data now exist to power more sophisticated search, most implementations fall short of a truly engaging search experience. Most consumer search takes place in e-commerce environments such as Amazon, Instacart, Peapod, FreshDirect, and Relay's; on retailer sites like Walmart, Target, and Kroger; or on pick-up and delivery services like MyWebGrocer.
Regardless of where search takes place, the number one challenge is consistent, up-to-date product data. The entire CPG product search ecosystem continues to struggle to provide comprehensive, current product data that meets consumers' most basic information needs, let alone powers any kind of sophisticated search.
An example of a sophisticated, taxonomy driven search experience is Raley’s Shelf Guide online. The user experience features the Raley’s Wellness attributes - such as Minimally Processed, Nutrient Dense, No Added Sugar, and Non-GMO - which are leveraged to help shoppers quickly interpret whether a product meets their needs based on their specific search criteria.
The health criteria are available throughout the shopping experience and can be used to filter lists of products quickly and conveniently. When the shopper drills down to a product page, all the relevant wellness attributes are displayed for that product, along with conventional product information such as the ingredients and nutrition panel.
The Raley’s online shopping experience is one of the leading grocery user experiences in the market. Once you shop online this way, it spoils you, and it's tough to go back to shopping with aisles, shelves, and keyword search alone!
Flexible taxonomy makes search work
Search is one of the most important challenges facing the grocery e-commerce user experience today. Raley’s Supermarkets is a good example of a best-in-class implementation that demonstrates the potential of attributes driven by a dynamic taxonomy.
In this series of posts, we will be exploring the state of other search implementations in grocery e-commerce and opportunities for furthering the practice of shopper engagement both online and in-store.
* Taxonomy vs ontology
The difference between taxonomy and ontology is admittedly academic; what can I say, I'm a geek! Outside the academic community there is no real agreement on the difference. I like this post, which covers some of the differences in definition.
A useful way to understand the difference between ontology and taxonomy: a taxonomy is like a tree, in that every level is nested within the one before it, whereas an ontology is more like a forest, in which both nesting and cross-referencing take place in something that more closely resembles a web.
Our underlying technology more closely resembles the latter.
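The tree-versus-web distinction can also be shown in a few lines. In this hypothetical sketch, the taxonomy gives each node exactly one parent, while the ontology-style graph lets “tartrazine” sit under both “yellow coloring” and “synthetic ingredient”, so a traversal reaches more related concepts. The node names and helper functions are illustrative assumptions only.

```python
from typing import Dict, List, Set

# Taxonomy (tree): every node has exactly one parent. Illustrative nodes only.
TREE_PARENT: Dict[str, str] = {
    "tartrazine": "yellow coloring",
    "yellow coloring": "added color",
}

# Ontology-style graph (web): a node may have several parents,
# so it can be reached from more than one branch.
GRAPH_PARENTS: Dict[str, List[str]] = {
    "tartrazine": ["yellow coloring", "synthetic ingredient"],
    "yellow coloring": ["added color"],
    "synthetic ingredient": [],
    "added color": [],
}

def tree_ancestors(node: str) -> List[str]:
    """Follow the single parent chain upward, nearest ancestor first."""
    chain: List[str] = []
    while node in TREE_PARENT:
        node = TREE_PARENT[node]
        chain.append(node)
    return chain

def graph_ancestors(node: str) -> Set[str]:
    """Collect every node reachable through parent links (iterative DFS)."""
    found: Set[str] = set()
    stack = list(GRAPH_PARENTS.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(GRAPH_PARENTS.get(parent, []))
    return found

print(tree_ancestors("tartrazine"))           # ['yellow coloring', 'added color']
print(sorted(graph_ancestors("tartrazine")))  # ['added color', 'synthetic ingredient', 'yellow coloring']
```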
Key to search challenge is dynamic taxonomies (this post)
Search challenge is breaking data standards (coming soon)
360 product view vs single source of truth (coming soon)
About Anton Xavier
Anton Xavier is a Co-Founder of Label Insight. With experience in management, operations, and marketing, Anton has led the Label Insight team from its inception in Australia, through its subsequent move to the US, to its current position as a market-leading, cloud-based product data engine. After completing postgraduate degrees in Australia, Anton gained invaluable management and marketing experience working with a variety of firms in Asia.