Web search

Running web searches to get current information

Web searches are especially important for questions whose answer may have changed between when the language model was trained and today. For example:

  • What was the weather on this date?

  • What is the market cap of Google?

  • Who is the president of the United States?

If you run the last question using the question-answerer, you might get an answer like:

The current president of the United States is Donald Trump.

Let’s start by simply providing the list of search results as additional context before answering a question. To do this, let’s write a helper function that uses SerpAPI to retrieve the search results. (You could similarly use the Bing API. In either case you need an API key.)

Running web searches

search_json.py
import httpx

from fvalues import F

from ice.recipe import recipe


def make_qa_prompt(context: str, question: str) -> str:
    return F(
        f"""
Background text: "{context}"

Answer the following question about the background text above:

Question: "{question}"
Answer: "
"""
    ).strip()


async def search(query: str = "Who is the president of the United States?") -> dict:
    async with httpx.AsyncClient() as client:
        # hl/gl select the result language and country;
        # replace api_key with your own SerpAPI key
        params = {"q": query, "hl": "en", "gl": "us", "api_key": "e29...b4c"}
        response = await client.get("https://serpapi.com/search", params=params)
        return response.json()


recipe.main(search)

Running python search_json.py returns a large JSON object.
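Most of that object is metadata we don't need. The part we will use below is the organic_results list, where each entry includes a title, a link, and a snippet. Here is an abridged, illustrative sketch of the shape (not actual output):

```python
# Illustrative, abridged sketch of the SerpAPI response shape (not real output)
search_result = {
    "search_metadata": {"status": "Success"},  # request bookkeeping
    "search_parameters": {"q": "Who is the president of the United States?"},
    "organic_results": [  # the part we care about
        {
            "position": 1,
            "title": "President of the United States - Wikipedia",
            "link": "https://en.wikipedia.org/wiki/President_of_the_United_States",
            "snippet": "The president of the United States is the head of state ...",
        },
        # ...more results, plus many other top-level fields...
    ],
}
```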

Rendering search results to prompts

We add a function that renders the search results to a string (remember to update the code below with your own API key).
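A minimal sketch of such a function, assuming the organic_results shape returned by SerpAPI (plain f-strings are used here; the recipe code wraps strings in F from fvalues so they show up nicely in traces):

```python
def render_results(data: dict) -> str:
    """Render a SerpAPI response as a list of titled snippets."""
    if not data or not data.get("organic_results"):
        return "No results found"
    results = []
    for result in data["organic_results"]:
        title = result.get("title")
        link = result.get("link")
        snippet = result.get("snippet")
        if not title or not link or not snippet:
            continue  # skip results that lack any of the three fields
        results.append(f"{title} ({link})\n{snippet}")
    return "\n\n".join(results)
```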

Now the results are much more manageable.

Answering questions given search results

Now all we need to do is stick the search results into the Q&A prompt (remember to update the code below with your own API key).
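A sketch of how the pieces might fit together. The search, render_results, and complete functions are stubbed placeholders here so the sketch runs offline; in the actual recipe, search would call SerpAPI as above and complete would be the language-model call (e.g. recipe.agent().complete):

```python
import asyncio


def make_qa_prompt(context: str, question: str) -> str:
    # Same prompt as above, without the F wrapper for self-containedness.
    return (
        f'Background text: "{context}"\n\n'
        "Answer the following question about the background text above:\n\n"
        f'Question: "{question}"\n'
        'Answer: "'
    )


async def search(query: str) -> dict:
    # Stubbed so the sketch runs offline; the real version calls SerpAPI as above.
    return {
        "organic_results": [
            {
                "title": "White House",
                "link": "https://www.whitehouse.gov",
                "snippet": "Joe Biden is the 46th President of the United States.",
            }
        ]
    }


def render_results(data: dict) -> str:
    return "\n\n".join(
        f"{r['title']} ({r['link']})\n{r['snippet']}"
        for r in data.get("organic_results", [])
    )


async def complete(prompt: str) -> str:
    # Placeholder for the language-model call in an ICE recipe.
    return "(model answer)"


async def answer_by_search(question: str) -> str:
    results = await search(question)
    context = render_results(results)
    prompt = make_qa_prompt(context, question=question)
    return await complete(prompt)


print(asyncio.run(answer_by_search("Who is the president of the United States?")))
# prints "(model answer)"
```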

If we run this file, the model now answers using the search results as context. Much better!

Execution trace (view online)

Choosing better queries

There’s still something unsatisfying—we’re directly searching for the question, but it could be better to let the model control what search terms we use. This is especially true for complex questions that we don’t expect to get a full answer to through Google, like:

Based on the weather on Sep 14th 2022, how many people do you think went to the beach in San Francisco?

Here it’s probably better to research just the weather on that date using Google, not to enter the whole question. So let’s introduce a choose_query function (remember to update the code below with your own API key).
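One way choose_query might look, again with a stub standing in for the model call. The prompt wording and the stub’s return value are illustrative assumptions, not the primer’s exact code:

```python
import asyncio


def make_search_query_prompt(question: str) -> str:
    # Ask the model for a good Google query instead of searching the
    # question verbatim. The wording here is a hypothetical example.
    return (
        f'You want to answer the question "{question}" using Google. '
        "What is the best short query to get the information you need?\n\n"
        'Query: "'
    )


async def complete(prompt: str) -> str:
    # Placeholder for the language-model call in an ICE recipe.
    return "weather san francisco september 14 2022"


async def choose_query(question: str) -> str:
    return await complete(make_search_query_prompt(question))


query = asyncio.run(
    choose_query(
        "Based on the weather on Sep 14th 2022, how many people do you think "
        "went to the beach in San Francisco?"
    )
)
print(query)  # prints "weather san francisco september 14 2022"
```

In the question-answering recipe, you would then search for await choose_query(question) rather than searching for the question itself.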

If we run our question, the model first chooses a search query and then answers based on the results for that query. The query chosen by the model was “beach weather san francisco september 12th 2022”. The results here may differ on each run. For another example, see this trace:

Execution trace (view online)

Exercises

  1. It’s nice to look at search results, but often the results are in the actual web pages. Extend the recipe to add the text of the first web page.

  2. Use the model to decide which of the search results to expand.

Get feedback on exercise solutions

If you want feedback on your exercise solutions, submit them through this form. We—the team at Ought—are happy to give our quick take on whether you missed any interesting ideas.

