Document processing
Important
Serverless functions are free while in beta. Community users get 10 calls per day; PRO users get 100 calls per day.
You can use Ploomber Cloud’s API to process PDFs and have them ready for LLM processing. First, install the client:
pip install ploomber-cloud --upgrade
Important
Ensure you’re running the latest version of ploomber-cloud, since the API will change over the beta period.
Set your API key, and then you’ll be able to use the API:
from ploomber_cloud import functions
# pass the path to the pdf file
result = functions.pdf_to_text("document.pdf")
Tip
functions.pdf_to_text only works with native PDFs. If you have a scanned PDF, use functions.pdf_scanned_to_text instead.
result will be a list with the extracted text, one item per page in the PDF.
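Since the result is a per-page list, a common next step before sending it to an LLM is to merge the pages and split the text into smaller pieces. The following is a minimal sketch, assuming each item in the list is a plain string. The helpers join_pages and chunk_text are hypothetical names that are not part of ploomber-cloud, and the sample list stands in for a real API response.

```python
from typing import List


def join_pages(pages: List[str], separator: str = "\n\n") -> str:
    """Concatenate per-page text into one string, skipping blank pages."""
    return separator.join(page.strip() for page in pages if page.strip())


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into consecutive chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# Stand-in for the list returned by functions.pdf_to_text("document.pdf");
# assumes one string per page.
result = ["First page text. ", "", "Second page text."]

document = join_pages(result)
chunks = chunk_text(document, max_chars=20)
```

Splitting by character count is the simplest option; depending on the model, splitting on sentences or tokens may work better.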
By default, functions.pdf_to_text waits until processing is done. Pass block=False to return immediately:
jobid = functions.pdf_to_text("document.pdf", block=False)
Then, you can get the result:
result = functions.get_result(jobid)
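One use for the non-blocking mode is to submit several documents first and fetch the results afterwards. The following is a rough sketch of that pattern, not an official recipe. submit_all and collect_all are hypothetical helpers that take the submit and fetch functions as arguments, so they make no assumptions about the client beyond the two calls shown above. This page does not say what functions.get_result does when a job is still running, so the sketch does not handle that case.

```python
from typing import Any, Callable, Dict, Iterable


def submit_all(paths: Iterable[str], submit: Callable[[str], Any]) -> Dict[str, Any]:
    """Start one job per file and return a mapping of path -> job id."""
    return {path: submit(path) for path in paths}


def collect_all(job_ids: Dict[str, Any], fetch: Callable[[Any], Any]) -> Dict[str, Any]:
    """Fetch the result of every submitted job, keyed by the original path."""
    return {path: fetch(job_id) for path, job_id in job_ids.items()}


# With the real client this might look like the following (not run here):
#   from ploomber_cloud import functions
#   jobs = submit_all(
#       ["a.pdf", "b.pdf"],
#       lambda path: functions.pdf_to_text(path, block=False),
#   )
#   results = collect_all(jobs, functions.get_result)
```

Passing the client functions in as arguments keeps the helpers easy to test with stand-ins and limits the changes needed if the API shifts during the beta.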