Loading paper text

Loading papers as structured data

ICE has built-in functionality for parsing and loading papers, and includes some example papers that you can download. Here’s a minimal recipe that loads a paper and prints out the first paragraph (often the abstract):

paper_hello.py
from ice.paper import Paper
from ice.recipe import recipe


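async def answer_for_paper(paper: Paper):
    # A paper is a list of paragraphs; return the first one (often the abstract).
    return paper.paragraphs[0]


# Run the recipe; the --paper command-line argument supplies the `paper` parameter.
recipe.main(answer_for_paper)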

You can run the recipe as follows, passing the path to the downloaded paper via the --paper command-line argument:

python paper_hello.py --paper path_to_downloaded_paper/keenan-2018.pdf

You’ll see a result like this:

Paragraph(sentences=['We hypothesized that mass distribution of a broad-spectrum antibiotic agent to preschool children would reduce mortality in areas of sub-Saharan Africa that are currently far from meeting the Sustainable Development Goals of the United Nations.'], sections=[Section(title='Abstract', number=None)], section_type='abstract')

Note that:

  • Papers are represented as lists of paragraphs.

  • Paragraphs are represented as lists of sentences.

  • Each paragraph has information about which section it’s from.
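To make the structure described in these notes concrete, here is a minimal, self-contained sketch. The dataclasses below are hypothetical stand-ins that only mirror the field names visible in the printed result (sentences, sections, section_type); they are not ICE's actual class definitions, and the helper paragraphs_in_section and the sample sentences are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins mirroring the printed result; not ICE's real classes.
@dataclass
class Section:
    title: str
    number: Optional[str] = None


@dataclass
class Paragraph:
    sentences: List[str]
    sections: List[Section] = field(default_factory=list)
    section_type: Optional[str] = None


@dataclass
class Paper:
    paragraphs: List[Paragraph]


def paragraph_text(paragraph: Paragraph) -> str:
    """Join a paragraph's sentences into a single string."""
    return " ".join(paragraph.sentences)


def paragraphs_in_section(paper: Paper, title: str) -> List[Paragraph]:
    """Illustrative helper: keep paragraphs that belong to a section with this title."""
    return [
        p for p in paper.paragraphs
        if any(section.title == title for section in p.sections)
    ]


# Placeholder content, not taken from any real paper.
paper = Paper(
    paragraphs=[
        Paragraph(
            sentences=["First abstract sentence.", "Second abstract sentence."],
            sections=[Section(title="Abstract")],
            section_type="abstract",
        ),
        Paragraph(
            sentences=["A sentence from the methods section."],
            sections=[Section(title="Methods", number="2")],
            section_type="main",
        ),
    ]
)

first = paper.paragraphs[0]
print(paragraph_text(first))
print([s.title for s in first.sections])
print(len(paragraphs_in_section(paper, "Methods")))
```

With the real Paper object that ICE passes to a recipe, the same kind of access (indexing into paragraphs, reading sentences and sections) should apply, but check the ICE source for the exact attributes and methods available.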

Try it with your own PDF papers!

