Interpreters
Executing code for more accurate computation
Sometimes the limitation isn’t factual knowledge, but the ability to do computation.
For example, if we ask the basic question-answerer “What is 578921 days * 12312 miles/day?”:
python qa_simple.py --question "What is 578921 days * 12312 miles/day?"
we get:
7223849252 miles
This is similar to the correct answer 7127675352 miles, but not the same.
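As a quick check of the arithmetic, the product can be computed directly in Python. The variable names here are only illustrative:

```python
# Exact product, computed with Python's arbitrary-precision integers.
correct = 578921 * 12312
print(correct)  # 7127675352

# The answer the basic question-answerer gave above.
model_answer = 7223849252

# Absolute and relative size of the model's error.
error = model_answer - correct
print(error)
print(f"{error / correct:.2%}")
```

The model's answer has the right number of digits but is off by a little over one percent, which is easy to miss without checking.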
Evaluating Python expressions
Let’s add a method for evaluating Python expressions:
from fvalues import F

from ice.recipe import recipe


def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        result = F(f"Error: {e}")
    return str(result)


async def answer_by_computation(question: str):
    return eval_python(question)


recipe.main(answer_by_computation)

This works as expected for expressions that are literally Python code:
Of course, it doesn’t work for natural language questions that benefit from compute:
So, we need to choose what to evaluate.
Evaluating arbitrary expressions is dangerous. Don’t use this approach outside of highly experimental code.
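To make the risk concrete, here is a minimal sketch of a more restrictive alternative to `eval`. It is not part of the recipe above: `eval_arithmetic` is a hypothetical helper that parses the expression with the standard-library `ast` module and only accepts numeric literals and a small whitelist of arithmetic operators. It is still not a sandbox (for example, a huge exponent can consume a lot of time and memory), so treat it as an illustration rather than a security boundary.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_arithmetic(expression: str) -> str:
    """Evaluate a purely arithmetic expression, or return an error string."""

    def _eval(node: ast.AST):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Only plain int/float literals (this excludes bools and strings).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except Exception as e:
        return f"Error: {e}"


print(eval_arithmetic("578921 * 12312"))
print(eval_arithmetic("__import__('os').getcwd()"))
```

The first call returns the product as a string. The second is rejected because a function call is not on the whitelist, whereas plain `eval` would execute it.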
Choosing what to evaluate
We make a prompt that asks the model what expression to enter into a Python interpreter to answer the question. We’ll also print out the result of evaluating this expression:
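A minimal sketch of this step might look like the following. The prompt wording and the names `make_computation_choice_prompt` and `choose_and_evaluate` are assumptions for illustration, and the language-model call is abstracted as a hypothetical async `complete` callable so that the sketch runs without any model. In a real recipe you would pass in whatever agent call your setup provides.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

# Hypothetical stand-in type for a language-model completion call.
Complete = Callable[[str], Awaitable[str]]


def make_computation_choice_prompt(question: str) -> str:
    # Ends with the interpreter prompt so the model continues with an expression.
    return f"""You've been asked to answer the question "{question}".

You have access to a Python interpreter.

Enter an expression that will help you answer the question.
>>>"""


def eval_python(expression: str) -> str:
    # Same idea as eval_python above; unsafe for untrusted input.
    try:
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"


async def choose_and_evaluate(question: str, complete: Complete) -> Tuple[str, str]:
    prompt = make_computation_choice_prompt(question)
    expression = (await complete(prompt)).strip()
    result = eval_python(expression)
    print(f"{expression} = {result}")
    return expression, result


async def fake_complete(prompt: str) -> str:
    # Canned reply used here in place of a real model.
    return " 578921 * 12312"


asyncio.run(
    choose_and_evaluate("What is 578921 days * 12312 miles/day?", fake_complete)
)
```

Keeping the model call behind a plain callable makes the prompt construction and evaluation easy to test on their own.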
If we run this on our example…
…we get:
This is a helpful expression and result!

Using the results of evaluation
Now all we need to do is provide this expression and its result as additional context for the basic question-answerer.
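One way to do this is to show the model the expression and its result as a short interpreter transcript ahead of the question. The function name `make_compute_qa_prompt` and the exact wording are assumptions for illustration, not a fixed template:

```python
def make_compute_qa_prompt(question: str, expression: str, result: str) -> str:
    # Present the computation as an interpreter session, then ask the question.
    return f"""A recording of a Python interpreter session:

>>> {expression}
{result}

Answer the following question, using the Python session if helpful:

Question: "{question}"
Answer:"""


prompt = make_compute_qa_prompt(
    question="What is 578921 days * 12312 miles/day?",
    expression="578921 * 12312",
    result="7127675352",
)
print(prompt)
```

The resulting string would then be sent to the model in place of the basic question-answering prompt.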
Rerunning our test case…
…we get the correct answer:
Another example:
If I have $500 and get 3.7% interest over 16 years, what do I have at the end?
Running this:
We get:
In contrast, the basic question-answerer says “You would have $1,034,957.29 at the end.”
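For reference, the underlying arithmetic can be checked by hand. The sketch below assumes the question means 3.7% annual interest, compounded once per year, which the question itself does not state:

```python
principal = 500
annual_rate = 0.037
years = 16

# Compound interest: balance = principal * (1 + rate) ** years
final_balance = principal * (1 + annual_rate) ** years
print(f"${final_balance:,.2f}")
```

Under that assumption the balance comes to a little under $900, several orders of magnitude below the basic question-answerer's figure.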

Exercises
1. Many questions can only be answered using longer algorithms in Python. Extend the code above to support multi-line Python programs (example).
2. Another approach to (1) is to let the model “enter” multiple expressions into the interpreter. Extend the recipe to support this.