Loading paper text

Loading papers as structured data

ICE has built-in functionality for parsing and loading papers, and includes some example papers that you can download. Here’s a minimal recipe that loads a paper and prints out the first paragraph (often the abstract):

paper_hello.py

from ice.paper import Paper
from ice.recipe import recipe


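async def answer_for_paper(paper: Paper):
    # Return the first paragraph of the parsed paper (often the abstract).
    return paper.paragraphs[0]


# Expose the recipe as a command-line entry point.
recipe.main(answer_for_paper)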

You can run the recipe as follows, passing the path to the downloaded paper through the --paper argument, which corresponds to the paper parameter of answer_for_paper:

python paper_hello.py --paper path_to_downloaded_paper/keenan-2018.pdf

You’ll see a result like this:

Paragraph(sentences=['We hypothesized that mass distribution of a broad-spectrum antibiotic agent to preschool children would reduce mortality in areas of sub-Saharan Africa that are currently far from meeting the Sustainable Development Goals of the United Nations.'], sections=[Section(title='Abstract', number=None)], section_type='abstract')

Note that:

Papers are represented as lists of paragraphs.
Paragraphs are represented as lists of sentences.
Each paragraph has information about which section it’s from.
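
To make this structure concrete, here is a minimal sketch of how you might work with it. It does not import ICE: the Section, Paragraph, and Paper classes below are stand-ins that copy only the fields visible in the printed result above, and paragraph_text and paragraphs_of_type are hypothetical helpers, not part of ICE. The sentences and the "main" section type are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-in classes mirroring the fields seen in the printed result above.
# They are NOT ICE's own classes; they only make this sketch runnable.
@dataclass
class Section:
    title: str
    number: Optional[str] = None


@dataclass
class Paragraph:
    sentences: List[str]
    sections: List[Section] = field(default_factory=list)
    section_type: Optional[str] = None


@dataclass
class Paper:
    paragraphs: List[Paragraph]


def paragraph_text(paragraph: Paragraph) -> str:
    """Join a paragraph's sentences into a single string (hypothetical helper)."""
    return " ".join(paragraph.sentences)


def paragraphs_of_type(paper: Paper, section_type: str) -> List[Paragraph]:
    """Return the paragraphs whose section_type matches (hypothetical helper)."""
    return [p for p in paper.paragraphs if p.section_type == section_type]


# Placeholder data: the sentences and the "main" section type are invented.
demo_paper = Paper(
    paragraphs=[
        Paragraph(
            sentences=["First abstract sentence.", "Second abstract sentence."],
            sections=[Section(title="Abstract")],
            section_type="abstract",
        ),
        Paragraph(
            sentences=["A sentence from the methods."],
            sections=[Section(title="Methods", number="2")],
            section_type="main",
        ),
    ]
)

# Walk the paper: each paragraph carries its sentences and its section info.
for paragraph in demo_paper.paragraphs:
    titles = [section.title for section in paragraph.sections]
    print(titles, paragraph_text(paragraph))
```

Filtering on section_type, as paragraphs_of_type does, is one way to find the abstract without relying on it being the first paragraph.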

Try it with your own PDF papers!

