I Built a Serverless AI Search Engine for Over 400,000 Quotes After Attending a Summit
After attending the Google Web AI Summit, I was inspired to build a new kind of search engine. This is the story of that project, and what it reveals about the future of Web AI.
“Do interesting things and interesting things will happen to you.” - John Hegarty
I found that quote using an app I built in a day. This article is the story of why and how I built it, and what it taught me about the future of AI on the web.
The Spark: A Glimpse into the Future of Web AI
A few days ago, I attended the Google Web AI Summit, and it was a genuine “peek into the future” moment. While the entire event was packed with insights, one concept stood out to me: the rapidly growing power of running AI models directly in the browser.
The idea isn’t new, but the maturity of the tools is. We’re no longer talking about tiny, trivial models. We’re talking about sophisticated transformers capable of impressive tasks, running locally on your machine. This eliminates the need for a server, enhances privacy, and opens up a world of possibilities for applications that work offline. Inspired, I felt that familiar itch to build something tangible.
The Problem: An Indie Creator’s Itch
Having recently left Google to become an indie creator, my workflow has changed. A new ritual I'd like to adopt is finding a compelling quote to anchor each article I write. However, my search for the perfect tool was frustrating. Most quote websites are cluttered with ads, require logins, or rely on simple keyword matching that misses the semantic nuance of what I’m looking for.
This is where the summit’s inspiration clicked with my personal need. What if I could build my own quote search engine? One that was:
Free: No subscriptions or hidden costs.
Fast: Instant results without network latency.
Private: All searches happen on my device.
Semantic: It understands the meaning of my query, not just the words.
By building it as a “Web AI” application, I could deploy it for free on a platform like Hugging Face Spaces. For an indie creator managing costs, eliminating the server bill is a massive win.
The Result: A Quick Look at QuoteSearch
So, I built it. You can try it here: https://huggingface.co/spaces/ruidiao/QuoteSearch
It’s a simple interface that allows you to search over 400,000 quotes. After a one-time data download, it’s lightning-fast and even works completely offline.
The “How”: A Peek Under the Hood
I built the app in about a day—you could say it was completely “vibe coded”—and it’s fully open-source. This rapid development was only possible thanks to the incredible tools available. Here’s a quick look at the architecture.
The Stack:
Library: Transformers.js was the star of the show. It’s a powerful library that brings Hugging Face’s popular Transformers ecosystem to JavaScript, letting you run pretrained models directly in the browser (there’s a short snippet after this list).
Model: I used nomic-ai/nomic-embed-text-v1.5, a popular and effective model for generating text embeddings.
Data: The core dataset is a collection of over 400,000 quotes from archive.org.
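To give a sense of how little code this takes, here’s roughly what generating an embedding looks like with Transformers.js. Treat it as a minimal sketch rather than the app’s exact code; one detail worth knowing is that the nomic-embed models expect a task prefix such as search_query: or search_document: on every input.

```javascript
import { pipeline } from '@huggingface/transformers';

// Load the embedding model (downloaded and cached on first use).
const extractor = await pipeline(
  'feature-extraction',
  'nomic-ai/nomic-embed-text-v1.5'
);

// nomic-embed models expect a task prefix on each input.
const output = await extractor('search_query: courage in the face of failure', {
  pooling: 'mean',  // average token embeddings into a single vector
  normalize: true,  // unit length, so dot product == cosine similarity
});

console.log(output.dims); // [1, 768] for this model
```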
The Process (The Offline/Online Split):
The magic lies in splitting the work between offline preparation and online execution.
Offline Prep: The raw data was a bit messy. I wrote a script to perform some simple data cleaning, which removed about 16% of the quotes. While more sophisticated strategies could probably preserve more data, this was a quick and effective first pass. I then used the embedding model to convert all the remaining quotes into numerical vectors (embeddings). To further reduce the final data size, I also used quantization to reduce the precision of the numbers in each vector before saving everything into a single binary file.
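Here’s what that offline step can look like in Node. The cleaning helper, file name, and binary layout below are illustrative assumptions rather than the app’s actual format; the quantization shown is simple symmetric int8 scalar quantization (each vector stored as signed bytes plus one Float32 scale), which shrinks the embedding data to roughly a quarter of its Float32 size.

```javascript
import { pipeline } from '@huggingface/transformers';
import { writeFileSync } from 'node:fs';

const extractor = await pipeline(
  'feature-extraction',
  'nomic-ai/nomic-embed-text-v1.5'
);

// Quantize one normalized Float32 vector to int8 plus a scale factor.
function quantize(vec) {
  const scale = Math.max(...vec.map(Math.abs)) / 127 || 1;
  return { bytes: Int8Array.from(vec, (v) => Math.round(v / scale)), scale };
}

const quotes = loadCleanedQuotes(); // hypothetical helper: your cleaned quote list
const dim = 768; // nomic-embed-text-v1.5 output size
const matrix = new Int8Array(quotes.length * dim);
const scales = new Float32Array(quotes.length);

for (let i = 0; i < quotes.length; i++) {
  const out = await extractor(`search_document: ${quotes[i]}`, {
    pooling: 'mean',
    normalize: true,
  });
  const { bytes, scale } = quantize(Array.from(out.data));
  matrix.set(bytes, i * dim);
  scales[i] = scale;
}

// Illustrative layout: all int8 vectors first, then all scales.
writeFileSync('embeddings.bin', Buffer.concat([
  Buffer.from(matrix.buffer),
  Buffer.from(scales.buffer),
]));
```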
Online Execution: When you visit the webpage, the app is just a simple static site. The first time you search, it triggers the download of the AI model (~180MB) and the pre-computed quote data (~140MB). Once the data is loaded, the model takes the user’s query and converts it into an embedding. It then computes the cosine similarity between the query embedding and all the pre-computed quote embeddings. Finally, it shows the top results (the most similar quotes) to the user.
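Since the vectors were normalized before quantization, the cosine similarity between the query and each quote is just the rescaled dot product, so ranking the whole corpus is a single pass over the byte matrix. Here’s a sketch of that search step, reusing the assumed layout and names from the snippet above:

```javascript
// Score every quote against an embedded query and return the top k.
// `matrix` is the Int8Array of quantized vectors, `scales` their Float32 scales.
function topKQuotes(queryVec, matrix, scales, quotes, dim, k = 5) {
  const scored = [];
  for (let i = 0; i < quotes.length; i++) {
    let dot = 0;
    const offset = i * dim;
    for (let j = 0; j < dim; j++) {
      dot += queryVec[j] * matrix[offset + j];
    }
    // Undo the int8 scaling; since the query is unit length,
    // this approximates the cosine similarity.
    scored.push({ quote: quotes[i], score: dot * scales[i] });
  }
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}

// Embed the query with the matching prefix, then rank.
const out = await extractor('search_query: doing interesting things', {
  pooling: 'mean',
  normalize: true,
});
const results = topKQuotes(out.data, matrix, scales, quotes, 768);
```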
The User Experience Touches:
I knew that asking users to download over 300MB of data was a big ask. To make this manageable, I added the following (a sketch of how two of these work follows the list):
A clear heads-up message stating the total download size before the process begins.
A progress bar to track the download of the model and data files.
A “Clear Cache” button so users can easily reclaim their local storage space.
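Transformers.js makes two of these touches straightforward: the pipeline factory accepts a progress_callback that reports per-file download progress (the progress bar), and in the browser the downloaded files are stored via the Cache API, so the “Clear Cache” button is essentially one call. A rough sketch, assuming the library’s default cache name ('transformers-cache') and a hypothetical updateProgressBar UI hook:

```javascript
import { pipeline } from '@huggingface/transformers';

// Surface download progress in the UI while the model files stream in.
const extractor = await pipeline(
  'feature-extraction',
  'nomic-ai/nomic-embed-text-v1.5',
  {
    progress_callback: (info) => {
      if (info.status === 'progress') {
        // info has file, loaded, total, and progress (0-100) fields.
        updateProgressBar(info.file, info.progress); // hypothetical UI hook
      }
    },
  }
);

// "Clear Cache": drop the browser cache Transformers.js stores files in.
async function clearModelCache() {
  await caches.delete('transformers-cache'); // assumed default cache name
}
```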
My Key Learnings: The Good, The Bad, and The Surprising
This small project was incredibly insightful.
Learning #1: It’s surprisingly easy to get started. High-level libraries like Transformers.js abstract away enormous complexity. I didn’t have to be a machine learning expert to build a powerful semantic search app.
Learning #2: The upfront cost is real. This is the biggest trade-off. A ~320MB initial download is not trivial. For some use cases, this could be a dealbreaker. However, for others—especially tools you use regularly—it’s a one-time cost for permanent offline access.
Learning #3: Model selection is a practical choice. Transformers.js supports many models, but not every model on the Hugging Face Hub. I simply picked one of the most downloaded compatible models, and it worked beautifully. You don’t always need a niche, specialized model to get great results.
Learning #4: Mobile browsers can handle heavy lifting. I’ll admit I was curious how the mobile experience would hold up. The idea of loading a ~180MB model and searching ~140MB of data on a phone is a serious stress test for any browser. The result was a pleasant surprise: the app works perfectly on my mobile browser, a powerful testament to how capable the modern mobile web platform has become. This opens the door for truly powerful, private AI tools that run on the devices we use most.
Conclusion: The Future is Local (Sometimes)
Building QuoteSearch convinced me that Web AI is a powerful new paradigm. It’s not a replacement for server-side AI, but it’s a brilliant alternative for specific applications. You trade server costs and complexity for a one-time download on the user’s end.
The benefits are clear: unmatched privacy, zero network latency after setup, offline functionality, and no server bills.
While my app has a large data footprint, many other use cases might involve a small model with little or no data, making the Web AI approach an even better fit. As browsers become more powerful and internet speeds increase, this upfront “cost” will become less and less of an issue. For now, it represents a fascinating new frontier for developers, hobbyists, and indie creators.
The era of powerful, local, in-browser AI is just beginning.
Thanks for reading. For those who are more active on other platforms, you can also find me on LinkedIn or X.