
Loading paper text

Loading papers as structured data


Last updated 2 years ago

ICE has built-in functionality for parsing and loading papers, and includes some example papers that you can download. Here’s a minimal recipe that loads a paper and prints out the first paragraph (often the abstract):

paper_hello.py
from ice.paper import Paper
from ice.recipe import recipe


async def answer_for_paper(paper: Paper):
    return paper.paragraphs[0]


recipe.main(answer_for_paper)

You can run the recipe as follows, passing the path to the downloaded paper through the --paper flag, which matches the name of the recipe function's paper argument:

python paper_hello.py --paper path_to_downloaded_paper/keenan-2018.pdf

You’ll see a result like this:

Paragraph(sentences=['We hypothesized that mass distribution of a broad-spectrum antibiotic agent to preschool children would reduce mortality in areas of sub-Saharan Africa that are currently far from meeting the Sustainable Development Goals of the United Nations.'], sections=[Section(title='Abstract', number=None)], section_type='abstract')

Note that:

  • Papers are represented as lists of paragraphs.

  • Paragraphs are represented as lists of sentences.

  • Each paragraph has information about which section it’s from.
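
To make that structure concrete, here is a minimal sketch that can be run without ICE installed. The Section, Paragraph, and Paper classes below are hypothetical stand-ins whose field names are copied from the printed result above; they are not ICE's own classes, which may carry more fields and behavior. The helper functions paragraph_text and section_titles are likewise made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins mirroring the fields seen in the printed result.
# These are NOT ICE's classes; they only illustrate the nesting.
@dataclass
class Section:
    title: str
    number: Optional[str] = None


@dataclass
class Paragraph:
    sentences: List[str]
    sections: List[Section] = field(default_factory=list)
    section_type: Optional[str] = None


@dataclass
class Paper:
    paragraphs: List[Paragraph]


def paragraph_text(paragraph: Paragraph) -> str:
    """Join a paragraph's sentences into a single string."""
    return " ".join(paragraph.sentences)


def section_titles(paragraph: Paragraph) -> List[str]:
    """Return the titles of the sections a paragraph belongs to."""
    return [section.title for section in paragraph.sections]


# A tiny made-up paper with one abstract paragraph of two sentences.
example_paper = Paper(
    paragraphs=[
        Paragraph(
            sentences=["First sentence.", "Second sentence."],
            sections=[Section(title="Abstract")],
            section_type="abstract",
        )
    ]
)

first = example_paper.paragraphs[0]
print(paragraph_text(first))
print(section_titles(first))
```

In a real recipe you would apply the same kind of access to the Paper object that ICE passes in, for example joining the sentences of paper.paragraphs[0] to get the paragraph as plain text.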

Try it with your own PDF papers!
