Interpreters
Executing code for more accurate computation
Sometimes the limitation isn’t factual knowledge but the ability to do computation.
For example, if we ask the basic question-answerer “What is 578921 days * 12312 miles/day?”:
python qa_simple.py --question "What is 578921 days * 12312 miles/day?"
we get:
7223849252 miles
This is similar to the correct answer, 7127675352 miles, but not the same.
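To see how far off the model is, we can do the multiplication directly in Python. This is only a quick check of the numbers quoted above; the variable names are ours.

```python
days = 578921
miles_per_day = 12312

exact = days * miles_per_day
model_answer = 7223849252  # what the basic question-answerer returned above

print(exact)                 # 7127675352
print(model_answer - exact)  # 96173900, i.e. off by roughly 1.3%
```

The model gets the order of magnitude right but not the digits, which is exactly the kind of gap an interpreter can close.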
Evaluating Python expressions
Let’s add a method for evaluating Python expressions:
from fvalues import F

from ice.recipe import recipe

def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        result = F(f"Error: {e}")
    return str(result)

async def answer_by_computation(question: str):
    return eval_python(question)

recipe.main(answer_by_computation)
This works as expected for expressions that are literally Python code:
python eval_direct.py --question "1 + 1"
2
Of course, it doesn’t work for natural language questions that benefit from compute:
python eval_direct.py --question "What is 578921 days * 12312 miles/day?"
Error: invalid syntax (<string>, line 1)
So, we need to choose what to evaluate.
Evaluating arbitrary expressions is dangerous. Don’t use this approach outside of highly experimental code.
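If you only need arithmetic, one way to reduce the risk is to parse the expression with Python's `ast` module and evaluate only a whitelist of numeric operations. The sketch below is not part of the recipe; `eval_arithmetic`, the operator whitelist, and the exponent cap are all illustrative assumptions, and it is a narrower evaluator rather than a general sandbox.

```python
import ast
import operator

# Whitelisted operators (an illustrative choice, not from the original recipe).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST):
    # Plain int/float literals only (bool and str are rejected).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        # Rough guard against huge powers; the limit is arbitrary.
        if isinstance(node.op, ast.Pow) and abs(right) > 1000:
            raise ValueError("exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # Names, calls, attribute access, etc. are all refused.
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def eval_arithmetic(expression: str) -> str:
    """Evaluate a purely arithmetic expression, returning the result or an error string."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        result = _eval_node(tree.body)
    except Exception as e:
        result = f"Error: {e}"
    return str(result)


print(eval_arithmetic("578921 * 12312"))             # 7127675352
print(eval_arithmetic("__import__('os').getcwd()"))  # Error: unsupported syntax: Call
```

Swapping this in for `eval_python` trades flexibility for safety: anything beyond numbers and basic operators comes back as an error string.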
Choosing what to evaluate
We make a prompt that asks the model what expression to enter into a Python interpreter to answer the question. We’ll also print out the result of evaluating this expression:
from fvalues import F

from ice.recipe import recipe

def make_computation_choice_prompt(question: str) -> str:
    return F(
        f"""You've been asked to answer the question "{question}".
You have access to a Python interpreter.
Enter an expression that will help you answer the question.
>>>"""
    )

def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        result = F(f"Error: {e}")
    return str(result)

async def choose_computation(question: str) -> str:
    prompt = make_computation_choice_prompt(question)
    answer = await recipe.agent().complete(prompt=prompt, stop='"')
    return answer

async def eval_selective(question: str):
    expression = await choose_computation(question)
    result = eval_python(expression)
    return (expression, result)

recipe.main(eval_selective)
If we run this on our example…
python eval_selective.py --question "What is 578921 days * 12312 miles/day?"
…we get:
('578921 * 12312', '7127675352')
This is a helpful expression and result!
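The model won't always return a single clean expression; it may keep going and imitate the interpreter's output or a new prompt line. A small, hypothetical helper like `first_expression` below (not part of ICE) could keep only the first non-empty line before passing it to the evaluator. Whether you need it depends on the agent and stop sequence you use.

```python
def first_expression(completion: str) -> str:
    """Return the first non-empty line of a completion, without a leading '>>>'."""
    for line in completion.splitlines():
        line = line.strip()
        if line.startswith(">>>"):
            line = line[3:].strip()
        if line:
            return line
    return ""


# A completion that continues past the expression we asked for.
print(first_expression(" 578921 * 12312\n7127675352\n>>> "))  # 578921 * 12312
```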

Using the results of evaluation
Now all we need to do is provide this expression and result as additional context for the basic question-answerer.
from fvalues import F

from ice.recipe import recipe

def make_computation_choice_prompt(question: str) -> str:
    return F(
        f"""You've been asked to answer the question "{question}".
You have access to a Python interpreter.
Enter an expression that will help you answer the question.
>>>"""
    )

def make_compute_qa_prompt(question: str, expression: str, result: str) -> str:
    return F(
        f"""A recording of a Python interpreter session:
>>> {expression}: {result}
Answer the following question, using the Python session if helpful:
Question: "{question}"
Answer: "
"""
    ).strip()

def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        result = F(f"Error: {e}")
    return str(result)

async def choose_computation(question: str) -> str:
    prompt = make_computation_choice_prompt(question)
    answer = await recipe.agent().complete(prompt=prompt, stop='"')
    return answer

async def answer_by_computation(question: str):
    expression = await choose_computation(question)
    result = eval_python(expression)
    prompt = make_compute_qa_prompt(question, expression, result)
    answer = await recipe.agent().complete(prompt=prompt, stop='"')
    return answer

recipe.main(answer_by_computation)
Rerunning our test case…
python answer_by_computation.py --question "What is 578921 days * 12312 miles/day?"
…we get the correct answer:
7127675352 miles
Another example:
If I have $500 and get 3.7% interest over 16 years, what do I have at the end?
Running this:
python answer_by_computation.py --question "If I have \$500 and get 3.7% interest over 16 years, what do I have at the end?"
We get:
If you have $500 and get 3.7% interest over 16 years, you will have $894.19 at the end.
In contrast, the basic question-answerer says “You would have $1,034,957.29 at the end.”
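We can sanity-check the $894.19 figure by hand. The sketch below assumes the question means 3.7% interest compounded once per year, which is the reading that reproduces the recipe's answer; the variable names are ours.

```python
principal = 500
rate = 0.037   # 3.7% per year, assumed to compound annually
years = 16

final = principal * (1 + rate) ** years
print(f"${final:.2f}")  # $894.19
```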

Exercises
1. Many questions can only be answered using longer algorithms in Python. Extend the code above to support multi-line Python programs (example).
2. Another approach to (1) is to let the model “enter” multiple expressions into the interpreter. Extend the recipe to support this.
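As one possible starting point for the first exercise, `eval` can be swapped for `exec`, capturing whatever the program prints. This is a rough sketch, not a reference solution: `exec_python` is a hypothetical name, the prompt would also need to ask the model for a program that prints its answer, and the earlier warning applies in full, since `exec` runs arbitrary code and an empty globals dict is not a sandbox.

```python
import contextlib
import io


def exec_python(program: str) -> str:
    """Run a multi-line program and return what it printed, or an error string."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(program, {})  # fresh globals per run; still NOT a sandbox
    except Exception as e:
        return f"Error: {e}"
    return buffer.getvalue().strip()


program = """
total = 0
for i in range(1, 11):
    total += i
print(total)
"""
print(exec_python(program))  # 55
```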