
Interpreters

Executing code for more accurate computation

Sometimes the limitation isn’t factual knowledge, but the ability to do computation.

For example, if we ask the basic question-answerer “What is 578921 days * 12312 miles/day?”:

python qa_simple.py --question "What is 578921 days * 12312 miles/day?"

we get:

7223849252 miles

This is close to the correct answer, 7127675352 miles, but not the same.

Evaluating Python expressions

Let’s add a method for evaluating Python expressions:

eval_direct.py
from fvalues import F

from ice.recipe import recipe


def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        result = F(f"Error: {e}")
    return str(result)


async def answer_by_computation(question: str):
    return eval_python(question)


recipe.main(answer_by_computation)

This works as expected for expressions that are literally Python code:
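
python eval_direct.py --question "578921 * 12312"

we get:

7127675352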

Of course, it doesn’t work for natural language questions that benefit from computation:
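
python eval_direct.py --question "What is 578921 days * 12312 miles/day?"

we get an error (the exact message depends on your Python version), something like:

Error: invalid syntax (<string>, line 1)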

So, we need to choose what to evaluate.


Choosing what to evaluate

We make a prompt that asks the model what expression to enter into a Python interpreter to answer the question. We’ll also print out the result of evaluating this expression:
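
Here is a minimal sketch; the eval_selective.py filename, the prompt wording, and the newline stop sequence are just one possible choice. It reuses eval_python from eval_direct.py above and calls the agent with the same recipe.agent().complete call as the basic question-answerer:

eval_selective.py
from fvalues import F

from ice.recipe import recipe

from eval_direct import eval_python


def make_computation_choice_prompt(question: str) -> str:
    return F(
        f"""You need to answer the following question:

Question: "{question}"

You have access to a Python interpreter. Enter a single Python expression that will help you answer the question.

Expression:"""
    )


async def choose_computation(question: str) -> str:
    prompt = make_computation_choice_prompt(question)
    # Stop at the end of the line so that we get back a single expression.
    expression = await recipe.agent().complete(prompt=prompt, stop="\n")
    return expression.strip()


async def eval_selective(question: str = "What is 578921 days * 12312 miles/day?"):
    expression = await choose_computation(question)
    result = eval_python(expression)
    return expression, result


recipe.main(eval_selective)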

If we run this on our example…
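
python eval_selective.py --question "What is 578921 days * 12312 miles/day?"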

…we get:
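
('578921 * 12312', '7127675352')

(The exact expression depends on the model, but evaluating 578921 * 12312 does give 7127675352.)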

This is a helpful expression and result!

Execution trace (view online)

Using the results of evaluation

Now all we need to do is provide this expression and result as additional context to the basic question-answerer.
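
Here is a sketch of the combined recipe; the answer_by_computation.py filename and the prompt wording are again placeholders, and it reuses eval_python and choose_computation from the files above:

answer_by_computation.py
from fvalues import F

from ice.recipe import recipe

from eval_direct import eval_python
from eval_selective import choose_computation


def make_computation_qa_prompt(question: str, expression: str, result: str) -> str:
    return F(
        f"""Answer the following question:

Question: "{question}"

Here is a computation that may help: entering {expression} into a Python interpreter returns {result}.

Answer: \""""
    )


async def answer_by_computation(question: str = "What is 578921 days * 12312 miles/day?") -> str:
    expression = await choose_computation(question)
    result = eval_python(expression)
    prompt = make_computation_qa_prompt(question, expression, result)
    # The prompt opens a quote after "Answer:", so we stop at the closing quote.
    answer = await recipe.agent().complete(prompt=prompt, stop='"')
    return answer.strip()


recipe.main(answer_by_computation)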

Rerunning our test case…
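
python answer_by_computation.py --question "What is 578921 days * 12312 miles/day?"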

…we get the correct answer:
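
7127675352 miles

(The phrasing depends on the model, but the number now matches the correct answer.)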

Another example:

If I have $500 and get 3.7% interest over 16 years, what do I have at the end?

Running this:
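
python answer_by_computation.py --question 'If I have $500 and get 3.7% interest over 16 years, what do I have at the end?'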

We get:
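
You would have about $894.19 at the end.

(Again the phrasing varies, but with an expression like 500 * 1.037 ** 16 the interpreter returns roughly 894.19.)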

In contrast, the basic question-answerer says “You would have $1,034,957.29 at the end.”

Execution trace (view online)

Exercises

  1. Many questions can only be answered using longer algorithms in Python. Extend the code above to support multi-line Python programs (example).

  2. Another approach to (1) is to let the model “enter” multiple expressions into the interpreter. Extend the recipe to support this.

Get feedback on exercise solutions

If you want feedback on your exercise solutions, submit them through this form. We, the team at Ought, are happy to give our quick take on whether you missed any interesting ideas.
