Loading paper text
Loading papers as structured data
ICE has built-in functionality for parsing and loading papers, and includes some example papers that you can download. Here’s a minimal recipe that loads a paper and prints out the first paragraph (often the abstract):
paper_hello.py
from ice.paper import Paper
from ice.recipe import recipe


async def answer_for_paper(paper: Paper):
    # Return the first paragraph of the parsed paper (often the abstract).
    return paper.paragraphs[0]


recipe.main(answer_for_paper)
You can run the recipe as follows, providing the path to the downloaded paper as a keyword argument:
python paper_hello.py --paper path_to_downloaded_paper/keenan-2018.pdf
You’ll see a result like this:
Paragraph(sentences=['We hypothesized that mass distribution of a broad-spectrum antibiotic agent to preschool children would reduce mortality in areas of sub-Saharan Africa that are currently far from meeting the Sustainable Development Goals of the United Nations.'], sections=[Section(title='Abstract', number=None)], section_type='abstract')
Note that:
- Papers are represented as lists of paragraphs.
- Paragraphs are represented as lists of sentences.
- Each paragraph has information about which section it’s from.
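To make the structure above concrete, here is a minimal sketch that mirrors the fields visible in the printed result (`sentences`, `sections`, `section_type`, `title`, `number`). The dataclasses below are hypothetical stand-ins written for illustration, not ICE's actual classes, and the helper `sentences_in_section` and the sample sentences are likewise assumptions. The sketch shows one way to collect all sentences that belong to a given section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins mirroring the fields shown in the printed result.
# These are NOT ICE's real classes; they only illustrate the shape of the data.
@dataclass
class Section:
    title: str
    number: Optional[str] = None


@dataclass
class Paragraph:
    sentences: List[str]
    sections: List[Section] = field(default_factory=list)
    section_type: Optional[str] = None


@dataclass
class Paper:
    paragraphs: List[Paragraph]


def sentences_in_section(paper: Paper, title: str) -> List[str]:
    """Collect every sentence from paragraphs tagged with the given section title."""
    return [
        sentence
        for paragraph in paper.paragraphs
        if any(section.title == title for section in paragraph.sections)
        for sentence in paragraph.sentences
    ]


# Placeholder content, made up for this sketch.
example = Paper(
    paragraphs=[
        Paragraph(
            sentences=["First abstract sentence.", "Second abstract sentence."],
            sections=[Section(title="Abstract")],
            section_type="abstract",
        ),
        Paragraph(
            sentences=["A methods sentence."],
            sections=[Section(title="Methods", number="2")],
            section_type="main",
        ),
    ]
)

print(sentences_in_section(example, "Abstract"))
```

Inside a real recipe, the same kind of loop would run over the `paper.paragraphs` of the `Paper` passed to your function, assuming the attributes match those shown in the printed result.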
Try it with your own PDF papers!