```shell
python qa_simple.py --question "What is 578921 days * 12312 miles/day?"
```
we get:
```
7223849252 miles
```
This is close to the correct answer, 7127675352 miles, but not the same.
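To see where the correct figure comes from, check the product directly in a Python shell:

```python
>>> 578921 * 12312
7127675352
```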
## Evaluating Python expressions
Let’s add a method for evaluating Python expressions:
eval_direct.py
```python
from fvalues import F

from ice.recipe import recipe


def eval_python(expression: str) -> str:
    # WARNING: eval runs arbitrary code; see the note below.
    try:
        result = eval(expression)
    except Exception as e:
        result = F(f"Error: {e}")
    return str(result)


async def answer_by_computation(question: str):
    return eval_python(question)


recipe.main(answer_by_computation)
```
This works as expected for expressions that are literally Python code:
```shell
python eval_direct.py --question "1 + 1"
```
```
2
```
Of course, it doesn't work for natural-language questions that would benefit from computation:
```shell
python eval_direct.py --question "What is 578921 days * 12312 miles/day?"
```
```
Error: invalid syntax (<string>, line 1)
```
So, we need to choose what to evaluate.
> **Warning:** Evaluating arbitrary expressions is dangerous. Don't use this approach outside of highly experimental code.
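If you want something less risky even in experiments, one option is to parse the expression yourself and only evaluate basic arithmetic. Below is a minimal sketch; `safe_eval` is a hypothetical helper of ours, not part of the recipes in this chapter:

```python
import ast
import operator

# Only numeric literals and these arithmetic operators are allowed.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def safe_eval(expression: str):
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"Unsupported syntax: {ast.dump(node)}")

    return ev(ast.parse(expression, mode="eval"))
```

With this, `safe_eval("578921 * 12312")` returns 7127675352, while something like `safe_eval("__import__('os')")` raises instead of executing.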
## Choosing what to evaluate
We make a prompt that asks the model what expression to enter into a Python interpreter to answer the question. We’ll also print out the result of evaluating this expression:
eval_selective.py
```python
from fvalues import F

from ice.recipe import recipe


def make_computation_choice_prompt(question: str) -> str:
    return F(
        f"""You've been asked to answer the question "{question}".

You have access to a Python interpreter.

Enter an expression that will help you answer the question.

>>>"""
    )


def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        result = F(f"Error: {e}")
    return str(result)


async def choose_computation(question: str) -> str:
    # Ask the model what to type into the interpreter.
    prompt = make_computation_choice_prompt(question)
    answer = await recipe.agent().complete(prompt=prompt, stop='"')
    return answer


async def eval_selective(question: str):
    expression = await choose_computation(question)
    result = eval_python(expression)
    return (expression, result)


recipe.main(eval_selective)
```
If we run this on our example…
```shell
python eval_selective.py --question "What is 578921 days * 12312 miles/day?"
```
…we get:
```
('578921 * 12312', '7127675352')
```
This is a helpful expression and result!
## Using the results of evaluation
Now all we need to do is provide this expression and its result as additional context to the basic question-answerer.
answer_by_computation.py
```python
from fvalues import F

from ice.recipe import recipe


def make_computation_choice_prompt(question: str) -> str:
    return F(
        f"""You've been asked to answer the question "{question}".

You have access to a Python interpreter.

Enter an expression that will help you answer the question.

>>>"""
    )


def make_compute_qa_prompt(question: str, expression: str, result: str) -> str:
    return F(
        f"""A recording of a Python interpreter session:

>>> {expression}: {result}

Answer the following question, using the Python session if helpful:

Question: "{question}"
Answer: "
"""
    ).strip()


def eval_python(expression: str) -> str:
    try:
        result = eval(expression)
    except Exception as e:
        result = F(f"Error: {e}")
    return str(result)


async def choose_computation(question: str) -> str:
    prompt = make_computation_choice_prompt(question)
    answer = await recipe.agent().complete(prompt=prompt, stop='"')
    return answer


async def answer_by_computation(question: str):
    # First pick an expression, evaluate it, then answer with that context.
    expression = await choose_computation(question)
    result = eval_python(expression)
    prompt = make_compute_qa_prompt(question, expression, result)
    answer = await recipe.agent().complete(prompt=prompt, stop='"')
    return answer


recipe.main(answer_by_computation)
```
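To make the data flow concrete, here is roughly what the final QA prompt looks like for our test question, assuming the model again chooses `578921 * 12312`:

```
A recording of a Python interpreter session:

>>> 578921 * 12312: 7127675352

Answer the following question, using the Python session if helpful:

Question: "What is 578921 days * 12312 miles/day?"
Answer: "
```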
Rerunning our test case…
```shell
python answer_by_computation.py --question "What is 578921 days * 12312 miles/day?"
```
…we get the correct answer:
```
7127675352 miles
```
Another example:
> If I have $500 and get 3.7% interest over 16 years, what do I have at the end?
Running this:
```shell
python answer_by_computation.py --question "If I have \$500 and get 3.7% interest over 16 years, what do I have at the end?"
```
We get:
```
If you have $500 and get 3.7% interest over 16 years, you will have $894.19 at the end.
```
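This run doesn't show the intermediate expression, but a compound-interest formula reproduces the figure. The expression below is a hypothetical reconstruction, not the model's logged output:

```python
# $500 at 3.7% interest, compounded annually over 16 years:
print(round(500 * (1 + 0.037) ** 16, 2))  # -> 894.19
```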
In contrast, the basic question-answerer says “You would have $1,034,957.29 at the end.”
## Exercises
1. Many questions can only be answered using longer algorithms in Python. Extend the code above to support multi-line Python programs (example). One possible starting point is sketched after this list.
2. Another approach to (1) is to let the model "enter" multiple expressions into the interpreter. Extend the recipe to support this.
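As a starting point for exercise 1, here is a sketch that swaps `eval` for `exec` and captures printed output. `exec_python` is a name we're inventing here, and the same "experimental code only" warning applies:

```python
import contextlib
import io


def exec_python(program: str) -> str:
    """Run a multi-line program and return whatever it prints."""
    output = io.StringIO()
    try:
        # Capture stdout so the program's print() calls become the result.
        with contextlib.redirect_stdout(output):
            exec(program)  # same safety caveats as eval
    except Exception as e:
        return f"Error: {e}"
    return output.getvalue()
```

You'd also need to change the prompt so it asks the model for a program that prints its answer, rather than a single expression.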
## Get feedback on exercise solutions
If you want feedback on your exercise solutions, submit them through this form. We—the team at Ought—are happy to give our quick take on whether you missed any interesting ideas.