Interpreters

Executing code for more accurate computation

Sometimes the limitation isn’t factual knowledge, but the ability to do computation.

For example, if we ask the basic question-answerer “What is 578921 days * 12312 miles/day?”:

python qa_simple.py --question "What is 578921 days * 12312 miles/day?"

we get:

7223849252 miles

This is close to the correct answer, 7127675352 miles, but it is not the same.
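The exact product is easy to confirm, since Python integers have arbitrary precision. A quick check (the variable names are illustrative, not part of the recipe):

```python
# Python integers are exact, so this product has no rounding error.
days = 578921
miles_per_day = 12312
total_miles = days * miles_per_day

# The answer returned by the basic question-answerer above.
model_answer = 7223849252
error = model_answer - total_miles

print(total_miles)  # 7127675352
print(error)        # 96173900
```

The model's answer has the right number of digits but is off by roughly 96 million miles.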

Evaluating Python expressions

Let’s add a method for evaluating Python expressions:

eval_direct.py
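from fvalues import F

from ice.recipe import recipe


def eval_python(expression: str) -> str:
    # Evaluate the string as a Python expression. Note that eval runs
    # arbitrary code, so only use this with trusted input.
    try:
        result = eval(expression)
    except Exception as e:
        # Return the error as text so the caller always gets a string.
        result = F(f"Error: {e}")
    return str(result)


async def answer_by_computation(question: str):
    # Treat the question itself as the expression to evaluate.
    return eval_python(question)


recipe.main(answer_by_computation)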

This works as expected for expressions that are literally Python code:
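For instance, here is a minimal standalone sketch of the same function, with the `fvalues` and `ice` imports dropped so that it runs on its own. That simplification and the sample expressions are ours, not taken from the original recipe:

```python
def eval_python(expression: str) -> str:
    # Same logic as eval_direct.py, minus the F wrapper (an assumption made
    # so this sketch has no third-party dependencies).
    try:
        result = eval(expression)
    except Exception as e:
        result = f"Error: {e}"
    return str(result)


print(eval_python("1 + 1"))           # 2
print(eval_python("578921 * 12312"))  # 7127675352
```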

Of course, it doesn’t work for natural language questions that benefit from compute:
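A standalone sketch of that failure, again without the `fvalues` and `ice` imports (our simplification): the question is not valid Python, so `eval` raises a `SyntaxError` and the function returns the error text instead of a number. The exact message varies by Python version, so none is shown here.

```python
def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        # SyntaxError is a subclass of Exception, so it is caught here.
        result = f"Error: {e}"
    return str(result)


question = "What is 578921 days * 12312 miles/day?"
answer = eval_python(question)
print(answer)  # a string beginning with "Error:"
```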

So, we need to choose what to evaluate.

Choosing what to evaluate

We make a prompt that asks the model what expression to enter into a Python interpreter to answer the question. We’ll also print out the result of evaluating this expression:
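The shape of this step can be sketched without any model access. In the sketch below, the prompt wording, the function names, and `fake_complete` (a canned stand-in for the language model call) are all assumptions made for illustration; the actual recipe would send the prompt to a model instead.

```python
from typing import Callable, Tuple


def make_computation_choice_prompt(question: str) -> str:
    # Hypothetical prompt wording; the trailing ">>>" invites the model to
    # continue with a single interpreter input.
    return (
        f'You\'ve been asked to answer the question "{question}".\n\n'
        "You have access to a Python interpreter.\n\n"
        "Enter an expression that will help you answer the question.\n"
        ">>>"
    )


def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        result = f"Error: {e}"
    return str(result)


def eval_selected(question: str, complete: Callable[[str], str]) -> Tuple[str, str]:
    # `complete` stands in for the language model: prompt in, completion out.
    prompt = make_computation_choice_prompt(question)
    expression = complete(prompt).strip()
    return expression, eval_python(expression)


def fake_complete(prompt: str) -> str:
    # Canned completion for demonstration only; a real model may answer differently.
    return " 578921 * 12312"


expression, result = eval_selected(
    "What is 578921 days * 12312 miles/day?", fake_complete
)
print(expression, "=", result)
```

Passing the completion function as a parameter keeps the evaluation logic testable without a model.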

If we run this on our example…

…we get:

This is a helpful expression and result!


Using the results of evaluation

Now all we need to do is provide this expression and result as additional context for the basic question-answerer.
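One way to pass that context along is to show the interpreter exchange in the prompt before asking the question. This is a sketch only: the function name `make_compute_qa_prompt` and the prompt wording are assumptions, not necessarily what the recipe uses.

```python
def make_compute_qa_prompt(question: str, expression: str, result: str) -> str:
    # Hypothetical wording: show the interpreter exchange first, then ask.
    return (
        "A recording of a Python interpreter session:\n\n"
        f">>> {expression}\n"
        f"{result}\n\n"
        "Answer the following question, using the Python session if helpful:\n\n"
        f'Question: "{question}"\n'
        'Answer: "'
    )


prompt = make_compute_qa_prompt(
    question="What is 578921 days * 12312 miles/day?",
    expression="578921 * 12312",
    result="7127675352",
)
print(prompt)
```

Ending the prompt with an open quotation mark lets the caller stop generation at the closing quote.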

Rerunning our test case…

…we get the correct answer:

Another example:

If I have $500 and get 3.7% interest over 16 years, what do I have at the end?

Running this:

We get:

In contrast, the basic question-answerer says “You would have $1,034,957.29 at the end.”
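A quick sanity check shows how far off that figure is. The sketch below assumes the 3.7% is an annual rate compounded once per year, which the question does not state explicitly:

```python
# Assumption: 3.7% is an annual rate, compounded once per year.
principal = 500.0
annual_rate = 0.037
years = 16

# Standard compound-interest formula: P * (1 + r) ** n
final_balance = principal * (1 + annual_rate) ** years
print(f"${final_balance:,.2f}")
```

Under that assumption the balance comes to a little under $900, so the basic question-answerer's figure is more than a thousand times too large.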


Exercises

  1. Many questions can only be answered using longer algorithms in Python. Extend the code above to support multi-line Python programs (example).

  2. Another approach to (1) is to let the model “enter” multiple expressions into the interpreter. Extend the recipe to support this.

Get feedback on exercise solutions

If you want feedback on your exercise solutions, submit them through this form. We—the team at Ought—are happy to give our quick take on whether you missed any interesting ideas.
