Loading paper text

Loading papers as structured data
ICE has built-in functionality for parsing and loading papers, and includes some example papers that you can download. Here’s a minimal recipe that loads a paper and prints out the first paragraph (often the abstract):
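from ice.paper import Paper
from ice.recipe import recipe


# Return the first paragraph of the loaded paper (often the abstract)
async def answer_for_paper(paper: Paper):
    return paper.paragraphs[0]


# Register the function as the recipe's entry point so it can be run from the command line
recipe.main(answer_for_paper)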
Save the recipe to a file (paper_hello.py is used here only as an example name) and run it, passing the path to the downloaded paper with the --paper keyword argument:
python paper_hello.py --paper path_to_downloaded_paper/keenan-2018.pdf
You’ll see a result like this:
Paragraph(sentences=['We hypothesized that mass distribution of a broad-spectrum antibiotic agent to preschool children would reduce mortality in areas of sub-Saharan Africa that are currently far from meeting the Sustainable Development Goals of the United Nations.'], sections=[Section(title='Abstract', number=None)], section_type='abstract')
Note that:
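  • A paper is represented as a list of paragraphs (paper.paragraphs).
  • Each paragraph is represented as a list of sentences (its sentences field).
  • Each paragraph also records which section it comes from (sections, a list of Section objects with a title) and what kind of section that is (section_type, for example 'abstract').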
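The structure described in these notes can be illustrated with plain Python. This is a minimal sketch that assumes only the fields visible in the printed output above: sentences, sections and section_type on a paragraph, and title and number on a section. The dataclasses are hypothetical stand-ins, not ICE's own classes, and the helper names paragraph_text, paragraphs_by_section and abstract_text are made up for illustration. Inside a real recipe you would pass the Paper that ICE loads instead, assuming its objects expose the same attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins mirroring the fields shown in the printed Paragraph;
# these are NOT ICE's own classes.
@dataclass
class Section:
    title: str
    number: Optional[str] = None  # assumed type; the example output only shows None


@dataclass
class Paragraph:
    sentences: list[str]
    sections: list[Section] = field(default_factory=list)
    section_type: str = ""


@dataclass
class Paper:
    paragraphs: list[Paragraph]


def paragraph_text(paragraph: Paragraph) -> str:
    """Join a paragraph's sentences into a single string."""
    return " ".join(paragraph.sentences)


def paragraphs_by_section(paper: Paper) -> dict[str, list[str]]:
    """Group paragraph text by its section path, e.g. 'Methods > Design'."""
    grouped: dict[str, list[str]] = {}
    for paragraph in paper.paragraphs:
        if paragraph.sections:
            key = " > ".join(section.title for section in paragraph.sections)
        else:
            key = "(no section)"
        grouped.setdefault(key, []).append(paragraph_text(paragraph))
    return grouped


def abstract_text(paper: Paper) -> str:
    """Return the text of every paragraph whose section_type is 'abstract'."""
    return "\n".join(
        paragraph_text(p) for p in paper.paragraphs if p.section_type == "abstract"
    )


if __name__ == "__main__":
    # Toy data for illustration only; "main" is an illustrative value,
    # not necessarily one that ICE uses.
    paper = Paper(
        paragraphs=[
            Paragraph(
                sentences=["First sentence.", "Second sentence."],
                sections=[Section(title="Abstract")],
                section_type="abstract",
            ),
            Paragraph(
                sentences=["Methods text."],
                sections=[Section(title="Methods")],
                section_type="main",
            ),
        ]
    )
    print(abstract_text(paper))  # First sentence. Second sentence.
    print(paragraphs_by_section(paper))
```

Keying the groups by the full section path, rather than by a single title, avoids assuming whether the first or last entry in sections is the innermost heading.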
Try it with your own PDF papers!