Google’s search ecosystem has shown impressive strength. The virtuous cycle, in which publishers optimize content for Google and users search on Google to find that optimized content, is still going strong.
The weak spot of Google, or in fact of any web search engine, is “dark” content: content that is not crawlable, is locked in proprietary formats or databases, or is purposely hidden from search engines. Finding innovative ways to access and search structured content was my focus as a partner development manager at Bing.
One area of innovation was structured content submitted by publishers as feeds or through APIs. For the shopping search vertical we had developed a feed ingestion pipeline, which became the starting point for Bing’s massively scalable structured data ingestion infrastructure.
We began exploring how a structured data ecosystem could be created and operated, both through a recipe vertical and through efforts targeted at enterprises with commercial websites. As part of Office’s acquisition of Fast, a team of search engineers in Oslo joined my group to strengthen the enterprise-focused effort.
The second area of investment centered on using structured data to compute answers and to let people explore search results through sophisticated refinements. Our approach was to describe the data with a dynamic model, which an engine embedded in the search stack executed at retrieval time to compute results.
The effort to bring computational power into the search stack was accelerated through a partnership with Wolfram|Alpha (WA), whose technology stack serves knowledge computed from expertly curated data. The WA results were used to enrich Bing’s results in select areas such as nutrition, health, and advanced mathematics.
The technical challenge of integrating a computational engine with unbounded execution time into a web search stack that must respond within milliseconds was tremendous and required a heroic effort from everyone involved.
It is interesting to follow the recent efforts of Google and Bing to provide answers and knowledge based on structured data. However, without a functioning publisher ecosystem these efforts will not scale, and without innovation in how structured data is ranked and disambiguated, the results will suffer from low relevance.