Google is Dead? Long Live Perplexity and Metaphor?

Move over GPT, there are new AI search engines in town. Can they scare Larry and Sergey

Jan 24, 2024

Hey everyone 👋 — I’m Avirath. I learn a lot from analysing new products, thinking about new product strategies and my opinion on the markets/products for the future. I wanted to share this knowledge with my readers - that’s the motive behind this newsletter.

This article covers a breakdown of Perplexity and Metaphor System’s foray into creating AI search engines and as a whole covering the opportunities and risks of new age AI search engines vs incumbents primarily Google.

I am also looking for writers in colleges in the US to join me in breaking down companies - I have an exciting product coming in this space.

Read till the end for a surprise!

Join the fun on Twitter.

Actionable Insights

If you're short on time, here's what you need to know about Perplexity AI, Metaphor Systems, Google and the evolving landscape of AI-driven search engines.

Innovative Search Engines: Perplexity and Metaphor. These AI-powered search engines are revolutionizing information retrieval by focusing on context, relevance, and reducing misinformation ('hallucinations'). Perplexity AI, founded by ex OpenAI and Google DeepMind scientist Aravind Srinivas, is building a search experience that prioritizes accuracy and context. Metaphor Systems, meanwhile, specializes in site-specific indexing and link prediction for a more targeted search experience.
Challenging the Status Quo. The traditional search engine model, dominated by Google, is under threat. These new engines address the limitations of current search technologies by providing more direct, relevant results, especially for subjective and nuanced queries. They're redefining the user's search experience by offering more meaningful and context-aware responses.
Key Features and Opportunities. Perplexity and Metaphor stand out with their advanced content matching, real-time information updates, unique indexing methods, and API offerings. These features represent significant opportunities in specific verticals like subjective search and contextual follow-up search, potentially capturing market share from Google.
Economic Viability. These AI-driven search engines face the challenge of high operational costs compared to traditional search engines. The cost per query with AI models can be high, which might necessitate a hybrid model of monetization or a focus on subscription models for niche, high-value verticals.
Facing Giants. Despite their innovative approaches, Perplexity and Metaphor must contend with the entrenched dominance and technical prowess of Google, especially with its own advancements in AI through Gemini. The Lindy Effect suggests that well-established services like Google tend to persist, making it challenging for new entrants to disrupt the market.
A Fragmented AI Search Landscape. The field is becoming crowded with multiple players like Perplexity, Metaphor, and others, each vying for a share of the search engine market. This fragmentation might benefit Google but also presents a unique opportunity for newcomers to specialize and excel in specific niches.
Potential for Open-Source AI Integration. Developments in open-source AI models, like Meta’s Llama 3, could lower operational costs for these search engines, allowing them to scale more effectively and compete with established players.
The Way Forward. For Perplexity and Metaphor, success lies in starting small, focusing on specific verticals, and gradually expanding. Their ability to synthesize and contextualize information rapidly is key to chipping away at Google's dominance in specific sectors.

If you were to pick the best way to access information in the 20th century, libraries would be hot spots.

For researchers and the curious alike, libraries loomed large as havens of knowledge, amassing an extensive assortment of books, journals, and newspapers. They provided a pivotal platform for individuals to plunge into a plethora of subjects, aided by the expertise of librarians and meticulous catalog systems. For students, scholars, and the inquisitive, libraries offered a thorough, albeit sometimes labyrinthine, pathway to information gathering.

With time, libraries laid the groundwork for knowledge and around them were built some of the finest educational institutions that sit strong till date.

If you were to stack rank the most transformative tools for accessing information in the 21st century, search engines would undoubtedly be near the top.

For developers and content creators, early search engines like AltaVista and Google revolutionized the discovery and indexing of information, transforming the vast and chaotic World Wide Web into a navigable and useful resource. For users, they offered a more convenient and efficient way to find information on any topic imaginable, from the latest news to historical archives, often with just a simple query.

With time, search engines centralized the tools and workflows of information seekers.

You can see their impact every time you type a query into that search bar on your screen.

Origins of knowledge being available to humans in a bundled form can be traced back to directories like Yellow Pages, catalogues and books like the Encyclopædia Britannica. The progress from data being available on those sources to information being available on the Internet and indexed on search engines can be thought of as solving an optimisation problem to shorten effort and distance between query and answer.

With time, our dependence on engines for searching for the smart, weird, funny, profane and inane has proven to be quite magnanimous; magnanimous to the extent that the world searches 70,000 times every second and voice searches a billion times every month.

Several companies and technologies have competed to reach the enviable position of being at the top of the search engine food-chain. However, the foundations of the modern apex predator search engine a la Google were probably laid in 1996 when Larry Page and Sergey Brin documented the PageRank algorithm for the first time. While internet search engines had existed up till then, none had attempted to categorise information based on the quantity of quality backlinks a source had.

One can only understate the effect Larry and Sergey have had on information retrieval. Today, Google holds 83% of the desktop search engine market share worldwide and a whopping 95% of the mobile search engine market share worldwide. The company generated $283 billion in revenue in 2022, with search contributing 57% of that figure.

As the company has grown into a trillion dollar giant, its ecosystem of being home to the largest index of maps, videos and webpages has only grown. As a student of computer science, I’ve often heard that software engineering is not about “knowing things”, it’s about “googling” well. Google has always had all the answers. It’s been the go-to library in the digital world!

However, as the years have rolled on, getting fitting and non SEOed/non ad-oriented results has become analogous to traversing a maze. Though Google indexes sites like Twitter, Reddit and Substack, finding high signal information or curated opinions from such sites has become an odd chance event. At the same time, Google’s focus to monetise through ads has led to minimalistic distinction between non-ad and ad results.

History of Google Ad Labeling — Google’s ads and non-ad results are practically the same colour.

While several engines like Neeva and DuckDuckGo tried to attack Google’s dominion on the privacy angle, none could command sizeable user share to mount a threatening challenge.

But for the first time, the waters of Google’s empire seem choppy. Tech personalities and ex-Googlers are clamouring for Google’s disruption in less than a couple of years; analysts are calling for the composed Sundar Pichai to be replaced by a wartime general; Google has internally panicked with a Code Red. There seems to be an East wind coming!

All of this trouble for Google boils down to rapid advancement in artificial intelligence, specifically the birth of production level LLM(large language model) products like ChatGPT. As GPT took the world by storm becoming the fastest ever product to find a 100 million users, it seemed to make search and information retrieval simple again.

A few problems did arise though. Knowledge cutoffs, hallucinations, and the design of such models(next word probability attribution) meant that the information from such models may not always be true.

The solution to these troubles started seeming obvious to an intelligent few. To prevent the limitations above, they started using data on the Internet or knowledge bases/search indexes as the source of truth on top of which they experimented with modern LLMs and AI to tailor results to what a user actually needs.

Perplexity AI and Metaphor Systems have made the most encouraging attempts so far.

Product

Note(Understanding embeddings can help understand this section better): Embeddings are a crucial concept in AI and machine learning. Imagine if you had to explain the meaning of a word not by using a dictionary definition, but by showing how it relates to other words and concepts. Embeddings do this in mathematical form. They transform words or phrases into vectors (a series of numbers) in a way that captures their meanings, nuances, and relationships with other words. A better example is: Imagine you're at a party where you know a few people. Your understanding of each person is based not just on their name but on how they relate to others and their characteristics. Embeddings work similarly in AI.

Imagine an orchestra where each musician expertly plays their part, creating a harmonious symphony. This is the essence of Perplexity AI. Founded by Aravind Srinivas, an ex OpenAI and Google DeepMind scientist, Perplexity AI aims to provide a radically improved search experience prioritising context and relevance, delivering more accurate and hallucination-free search results.

Perplexity AI is a new AI chat tool that acts as an extremely powerful search engine. When a user inputs a question, the model scours the internet to give an answer. In addition, it has ability to display the source of the information it provides with close to zero hallucination which leads to the conclusion that it fetches information from a well organised search index like Google’s.

The best way to understand Perplexity’s product and appeal would be drawing parallels with current dominating browsers:

Advanced Content Matching: Unlike conventional search engines, Perplexity AI doesn't rely solely on keyword matching where high word frequency in specific HTML elements means the website is ranked higher. It uses a sophisticated system that employs existing search indexes to organize web data. On top of this, it adds a layer of embeddings that enhances the ability to match the actual context/content of a site closely with the user's query. This leads to retrieving data that more directly answers the query rather than have a high frequency of keywords from the query. Basically, better quality of content rather than quantity of keywords mentioned leads to those sites being used in Perplexity’s answer generation.
Imagine searching for "apple pie recipes without sugar." A traditional keyword-based search might lead you to pages filled with "apple," "pie," and "sugar," often including regular recipes that don't match your specific sugar-free requirement. In contrast, an embedding search, like Perplexity AI, understands the context – it recognizes you're looking for a sugar-free version. It fetches results that genuinely offer apple pie recipes without sugar, even if those exact words aren't frequently used on the page.
Real-time Information and Zero Hallucination: The platform ensures that the information it retrieves and presents is current and accurate, thanks to its continuously updating index. This approach significantly reduces the chances of providing outdated or incorrect information, a common issue known as 'hallucination' in AI parlance.
Developing a Unique Search Index: Perplexity AI is also in the process of creating its own search index. This involves developing a proprietary version of the PageRank algorithm, indicating a shift towards valuing context and relevance over simple keyword optimization. This method is reminiscent of how neural networks prioritize different parameters, emphasizing the importance of the content's quality and pertinence to the query.
API: Perplexity also has an API that allows users to integrate its AI capabilities into their own applications. The API draws from a model fine-tuned on Meta’s Llama and Mistral AI models.

Metaphor Systems operates in a similar way to Perplexity but offers a unique approach to search, focusing on site-specific indexing and the innovative concept of link prediction.

Site-Specific Indexing: Metaphor Systems indexes websites like GitHub, Twitter, and Wikipedia separately. This specialization allows for more accurate and relevant search results within specific comment sections or posts, ensuring that users find the most pertinent/opinionated information for their queries on sites of their liking.
Link Prediction Approach: Diverging from the conventional word prediction models, Metaphor Systems focuses on link prediction. This method involves determining the most suitable URL that aligns with a user's partial query, thereby ensuring that the search results are closely matched with the user's intent.
Natural Language Optimization in Prompts: The platform excels in refining search queries to reflect the way people naturally describe and share links online. This focus on optimizing prompts for natural internet language significantly enhances the accuracy and relevance of the search results.

While Metaphor does provide a free limited search UI option, it also provides an API to fetch links and also upto 1000 tokens worth of scraped text data from linked websites.

Key Features/Opportunities

More Direct Results

Having worked in product roles, the one thing PMs try to optimise for is reducing friction and actions needed to be taken to complete a certain task. AI systems like Metaphor and Perplexity organise their indexes and use LLMs in such a way that they can analyse content, understand its direct usefulness to the end-user and filter out spammy or indirect websites. The image below is a perfect example of how Metaphor cuts out intermediary steps of having to traverse through a blog to find the best startups in AI agents.

A comparison of the directness of Metaphor vs Google Search

Building in niche verticals:

In a February 2023 interview, Satya Nadella said “he wanted to make Google dance” as Microsoft integrated GPT in their search experience with Bing. Three months prior, OpenAI had released ChatGPT. In the days ensuing, if Twitter were your only source of information, you would have pronounced Google dead. Former loyalists to the Google empire like Paul Bucheitt(founder of Gmail) predicted that frontier AI technology would render Google obsolete in just a couple of years.

However, after several months of usage and observing search behaviour, I strongly believe the opportunity and market for search engines built on top of LLM technology should be based off these verticals:

Subjective Search: Talia Goldberg, of Bessemer, dictates “subjective search” to be the queries that aren’t searching for facts or specific results. They don’t have cut and dry answers. They are Ideas. Opinions. Recommendations. Advice. Conversations. “Subjective search” encompasses everything from restaurant or recipe recommendations, to commentary on politics or new technology, to fitness tips and product reviews.

Subjective search doesn’t live in SEO-optimised websites, biased media publications or politically diplomatic articles. It lives in the comments of brutal feedback provided to product launches on HackerNews, the twenty different ways of managing context and state in React on StackOverflow, the wildest takes on Sam Altman being ousted from OpenAI on Reddit, scouring for “cute date-night dresses” on Pinterest and countless other site-specific searches.
Most of these sites implement middling forms of search, similar to Google’s keyword matching functionality. Newer engines can look to optimise indexing at such atomic level where such hidden information can be retrieved faster. Metaphor does atomic indexing but there are miles to go to improve the quality of links returned. More importantly, the marketing can be modified to promote the fact they are better subjective search engines that curate well and chip away vertically rather than trying to compete with Google as a whole.
In 2014, Eric Schmidt said this about Google’s competition:
“Many people think our main competition is Bing or Yahoo. But, really, our biggest search competitor is Amazon,” Eric Schmidt, currently serving as Google’s executive chairman, told a crowd in Berlin. “They are obviously more focused on the commerce side of the equation, but, at their roots, they are answering users’ questions and searches, just as we are.”
One of the biggest risks to Google’s ad business, especially those with commercial intent, increasingly start on other platforms. This might include:
- Product searches on Amazon/Etsy
- Event searches on Partiful/Eventbrite
- Restaurant searches on Yelp/Corner
- Fashion/Clothes searches on Pinterest
This danger has existed for some time, and Google has effectively managed it, usually by responding with features like Google Flights and Google Reviews along with ads, or by capitalizing on its advantage of neutrality by showing Amazon listed products in its ad recommendations to note an example.
If Metaphor or Perplexity can create a tailored experience for detailed searching within a vertical which is 10x superior to the largely one-sized fits all approach of Google, along with a RLHF like process for upvoting/downvoting results, it could definitely be a start to vertically stealing market share.
Contextual or Follow-Up Search: In 2022, users reformulated their queries 17.9% of the time on desktop and 29.3% of the time on mobile. Cutting away a few percentage points, we could still estimate that about 10-15% of the time search engines don’t contextually process the user’s query. Eating up even 10% of search activity amounts to 200 billion searches every year. While Perplexity implements conversational style contextual search well and Metaphor seems to have gotten well a ways with niche indexing, a combination of these services could entice users to switch loyalties away from Google.
Cost Perspective: Further, the traditional monetisation model championed by Google, focusing on higher link optimization/clickthrough and ad revenue, stands in stark contrast to the subscription model approaches of Perplexity AI and Metaphor Systems. While Google’s strategy to rely heavily on maximizing user engagement with links and increasing exposure to advertisements has started to prove intrusive in nature, it’s still economically viable. The company’s substantial costs of constantly re-indexing the index and running servers to hold the gargantuan amount of information can be thought of as fixed in nature where every request does not trigger an additional variable cost in opposition to Perplexity and Metaphor where every answer racks up an inference cost.
The cost of processing a query in a LLM-driven search bot like ChatGPT, estimated at around 3 cents per query, scales to astronomical figures at Google’s volume of approximately 10 billion queries per day. This would translate to an unsustainable expense of around $110 billion annually. Just to put it in scale, Uber has to 3.1x their revenue to reach that metric.
Considering this, two options seem viable:
- A hybrid model might emerge. In this scenario, deterministic queries, those with straightforward, factual answers, would still be directed through traditional search engines like Google or Bing. On the other hand, more subjective, nuanced queries for sites like Reddit or Stack Overflow could be addressed by LLM-based search engines.
- Vertical chatbots for industry wide spaces like medicine, law, politics could actually help champion the subscription model where aficionados in those spaces would actually pay $20 - $100 price monthly for the high value a fine-tuned or fine-trained model would create.

Key Challenges

Google’s longevity and AI team

All of our concerns about Google dropping off would be even more heightened if they didn't have such a technically gifted AI team in the Deepmind division. To counter OpenAI’s GPT, Google released Bard which they recently updated with their GPT-4 counterpart - Gemini. Gemini does beat GPT-4 on several benchmark tests and certainly performs equally well in its responses for tasks like code generation, creative tasks and text-based understanding/summarisation.

Packy McCormick, in Excel Never Dies, states that one of the reason Excel doesn’t taper off is the longer something lasts, the longer it can be expected to last. Something that has been around for a year is expected to be around for another year, but something that has been around for 100 years is expected to be around for another 100 years. That is the Lindy Effect at work and it applies to Google too.

The Lindy Effect is true for a few reasons:

Quality and Simplicity: Cream rises to the top, and only the strong survive. Part of the Lindy Effect can just be explained by the fact that some things are higher-quality than others, that people recognize and appreciate quality, and that over time, higher-quality things tend to outlast lower-quality things. If we were to hypothetically place Freud's foundational works on a perpetual library shelf beside a teenager’s ramblings on abandonment, love and loneliness, Freud's insights would likely be continually chosen and revered generation over generation.

There are parts of the world that haven’t even exposed themselves to advanced Googling methods since most of their search activity is very deterministic and easily findable. Statistics help prove this:

75% of searchers only look at the first page of search results
Over 27% of clicks go to the first Google search result.

Google has been serving these customers well for a couple of decades. Only in the upper echelons of Tech Twitter do we find a “Google is Dead. Long Live GPT-tech” attitude.

Google gets the job done for the majority of the world’s searchers.

Power Curve showing behaviours of user distribution for Google Search

Here is another example reiterating how answering the simple questions helps Google be a Lindy software. Look at the world’s twenty most popular Google searches by keyword:

How many questions could Perplexity or Metaphor respond to at Google’s speed, and in what areas would it surpass Google? ChatGPT lacks the ability to provide real-time information such as location details, weather updates, or current news. It can generate some website links and possibly has an advantage in translation tasks. However, that's roughly the extent of its capabilities. Newer engines shine in solving the complex but don’t attempt the simple.

There are smaller inadequacies, too. New engines do not attempt to link requests to potential actions - we are used to a restaurant recommendation being appended by a link to request a reservation, a business/location search has a Google Maps extension attached, sports games have live scores being updated play by play.

Metaphor or Perplexity are unlikely to build mapping, booking, or shopping features soon. Though it could integrate with other players, doing so may introduce friction for users.

Network Effects and Distribution: As a product or concept proves its worth and endures over time, people gain confidence in adding to it, enhancing its chances of continued existence. More than 80% of consumers find local information and locations using search engines. The amount of “near me” searches has increased by 500% in recent years. Though ads can be intrusive, it levels the playing field for small stores and restaurants with giants in terms of visibility. For the same reason, the ROI for ads sit at a sweet 700% and personalised ads boost sales by 65%. Numbers also show consumers actually want to look at ads to make purchasing decisions.

The flywheel of using Google’s distribution through Maps, YouTube, Search and adding to it with more information and well targeted ads is a much bigger moat than what seems visible.

However, a case can be made for newer search engines supporting more focused and contextual conversations for future generations with increasing literacy rates and overall employment. A census study shows that the percentage of adults in the U. S. between the ages of 25 to 64 with college degrees, certificates, or industry-recognized certifications, has increased from 37.9% in 2009 to 53.7% in 2021, a gain of nearly 16 percentage points. This means more academics, more engineers, more doctors, more lawyers and more people becoming really good in their respective spheres. They can make a sizeable market for vertical or specific answer engines - a blue ocean for the incoming generation.

Intense competition among new players

One of the key ways the British took over India in the 1800s was they used the fragmentation of religious kingdoms such as the Rajputs in the North, the Mughals, the Cholas, the Nawabs to foster the divide between them. Together they posed a threat, but divided the British were too strong. Slowly, most of these majestic empires dissolved.

The AI race must be having a deja-vu moment. Companies like Perplexity, Metaphor, You, Neeva and several others are trying to vie for the top-spot. This can prove beneficial to Google who may find these young companies fighting for user activity and in the process losing out on the bigger picture. There is a quote attributed to the great Ralph Waldo Emerson - If you swing at the king, you best not miss. Currently, all of these fledgling startups are aiming at Google but their capital, resource and know-how is divided making it easier for Google to dodge these threats.

At the same time, another AI race is brewing that can boost the economics of such companies. While breaking down Hugging Face in another newsletter, we found that the recent AI-hype is also brewing a battle between open-source AI and commercial organisations. Recently, Zuckerberg dropped a bombshell announcing Meta’s Llama 3 would be open-sourced. Their previous models had already made huge headways in competing with OpenAI’s GPT-4 and Anthropic’s Claude.

Competent open-source models would lead to Perplexity and Metaphor optimising their compute cost(not paying a markup on request costs that OpenAI/Anthropic probably charge), using more flexible caching mechanisms opening up the possibility to make it economically viable to scale number of user queries for their search engines.

Execution

As discussed above, successfully re-imagining the search engine effectively means competing with Google/Microsoft — and winning.

That takes a strong team.

Thankfully both teams have strong founders and engineers. Here are a few notable ones:

Perplexity AI

Aravind Srinivas - Founder and CEO - Aravind was a Research Scientist at Google, Google DeepMind and OpenAI before he co-founded Perplexity.
Denis Yarats - Founder and CTO - Denis was a PhD student at NYU(Yann LeCun teaches at NYU), ML Engineer at Quora and AI researcher at Facebook before he co-founded Perplexity.
Jonny Ho - Founder and Chief Strategy Officer - Jonny is a Harvard grad who did engineering at Quora, quantitative trading at Tower Research before he joined Aravind and Denis at Perplexity.
Andy Konwinski - Co-Founder - Andy is a former PhD student from Berkeley, co-founder of Databricks and an instructor at UC Berkeley in addition to working with the Perplexity team.

Metaphor Systems

Will Bryk - Founder and CEO - Will interned at Zoom and worked as an engineer at Cresta before going on to found Metaphor.
Jeffrey Wang - Founder - Jeffrey interned and worked full time at Plaid before founding Metaphor.
Kudzo Ahegbebu - Founder and CTO - Kudzo had roles at OpenAI as a Research Scientist and Research Fellow, Facebook as an engineer before becoming CTO at Metaphor.

Conclusion

Search engines are the most important piece of software in the world today.

They centralise and amplify the world’s collective information. The internet as we know it wouldn’t exist without them.

For Metaphor and Perplexity, meaningfully improving the engine means starting small and chipping off vertically at specific sectors and sites. Perplexity and Metaphor's talent to quickly synthesize, create, and merge information could be particularly beneficial.

Let the games begin.

If you have any feedback or suggestion, do loop it in the comments section. I’ll be sure to take a look at it and even reach out to discuss if possible. You can reach me at avirathtibrewala@gmail.com for any other queries or discussions.

As a part of my newsletter, I wanted to affect some moral good to the society we live in. Every time I publish a post, I’ve decided to share an artefact or cool thing on kindness and benevolence.

This week, listen to this amazing song on the importance of time from my favourite band Pink Floyd.

Akash

Jan 25, 2024

Just a brain dump of some thoughts I had while reading. A lot of Google's strength just comes from marketing/branding. Just the fact that genericization occurred and people use Google as a verb is proof of that. Funny how of the top 10 google searched things, 5/10 (assuming google weather widget - which has to be the biggest), are Google products. One of Google's, as a company, biggest safeguards against the rise of LLM search is it's diversification. YouTube competes with Netflix in terms of video while Youtube is also one of the biggest social media. Gmail has dominant email market share. Google maps is by far the biggest map software. Google suite and drive is the biggest share of text editors / spreadsheet / file storage only really competing with Office which I'd argue is a different vertical since office is paid. They have all this dominant market share, especially compared to second, in areas that are critical in everyones life. Only place where I'm pretty bearish is that GCP is worse than Azure & AWS. Also pretty interestingly, google made it free to transfer data when exiting google cloud. Almost forgot about Chrome, Android, and previously Kubernetes. Also most of these LLM Search apps will live on the web which means they'll use Google Ads unless they have a subscription based model. Generally don't think any of these LLM search apps will scale unless they have a usable free option. The average person won't suddenly pay for a core service like search when it historically is free and a unpaid, good service like google will continue to exist. Also kinda funny that I've heard nothing about gemini recently. Also have actually seen people using other search engines like duckduckgo and bing where that would have been unthinkable 5 years ago. Also curious if there's a way for perplexity to monetize the data from LLM searches in the same way google monetizes all the data from their products for ads.

Expand full comment

1 reply by Avirath Tibrewala

Chinar Dankhara

Excellent writeup as always. A few things in my mind:

Lindy effect is partially a result of inertia. For me to switch from Google means switching not only my searches but my flight bookings, password management, and more. A competitor would need to match and exceed Google across these dimensions for anyone to switch. There is value in inertia and being a one-stop shop.

Google ranking systems have gotten much more complicated than PageRank. Their own blog mentions almost regular use of BERT for embeddings, entity resolution systems to build and query a knowledge graph for their knowledge panels, intent recognition for special results around bookings, and more. Ofcourse, its hard to individual apps to devote so many resources to search - they can't compete against Google unless there is a sea change in technology and UX, which is what we see now with Perplexity and others.

2 more comments...

Avirath’s Newsletter

Discussion about this post