Table of Contents
Reading and Extracting Text from PDFs
Writing and Creating PDFs
Merging and Splitting PDFs
Adding Password Protection and Encryption
How to work with PDF files in Python

Sep 20, 2025 am 04:44 AM

PyPDF2, pdfplumber, and FPDF are the core Python libraries for working with PDFs. PyPDF2 handles text extraction, merging, splitting, and encryption: for example, open a file with PdfReader and call extract_text() on each page. pdfplumber is better suited to layout-aware text extraction and table recognition, with extract_tables() returning the table data. FPDF (the fpdf2 package is recommended) generates new PDFs, building documents with add_page(), set_font(), and cell() and saving them with output(). To merge PDFs, use PdfWriter's append() method to combine multiple files; to split one, write each page to its own file. Encryption is applied by setting user and owner passwords with writer.encrypt(). Picking the right tool makes reading, modifying, and creating PDFs efficient.

Python is commonly used to read, write, merge, split, and extract data from PDF files. Several libraries support this, with PyPDF2, pdfplumber, and FPDF among the most popular. Here is how to handle each kind of operation.

Reading and Extracting Text from PDFs

To extract text from a PDF file, PyPDF2 and pdfplumber are both good choices. PyPDF2 covers basic text extraction, while pdfplumber does a better job of preserving layout.

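  • Install PyPDF2: pip install PyPDF2 (PyPDF2 is no longer maintained; its successor, pypdf, provides the same PdfReader and PdfWriter classes)
  • Create a PdfReader object from a file path or from a file opened in read-binary mode
  • Loop through reader.pages and call extract_text() on each page to get its content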

Example using PyPDF2:

    from PyPDF2 import PdfReader

    reader = PdfReader("example.pdf")
    for page in reader.pages:
        text = page.extract_text()
        print(text)
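Scanned or image-only pages typically yield no text from extract_text(), so it can help to record which pages came back empty. The following is a minimal sketch of that idea, not part of PyPDF2 itself: collect_page_text and FakePage are hypothetical names introduced here, and the helper assumes only that each page object has an extract_text() method.

```python
from typing import Iterable, List, Tuple


def collect_page_text(pages: Iterable) -> Tuple[List[str], List[int]]:
    """Return (texts, empty_page_numbers) for any iterable of page objects.

    Each page only needs an extract_text() method, which is what the
    objects in reader.pages provide.
    """
    texts: List[str] = []
    empty_pages: List[int] = []
    for number, page in enumerate(pages, start=1):
        # Treat a None result the same as an empty string.
        text = (page.extract_text() or "").strip()
        texts.append(text)
        if not text:
            empty_pages.append(number)  # possibly a scanned or image-only page
    return texts, empty_pages


class FakePage:
    """Hypothetical stand-in for a PDF page, used only for this demo."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


demo_texts, demo_empty = collect_page_text(
    [FakePage("Intro\n"), FakePage(""), FakePage(None)]
)
print(demo_texts)  # ['Intro', '', '']
print(demo_empty)  # [2, 3]
```

With a real file, the same helper could be called as collect_page_text(PdfReader("example.pdf").pages); pages reported as empty may need OCR, which is outside the scope of these libraries.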

For tables and precise text positioning, use pdfplumber:

    import pdfplumber

    with pdfplumber.open("example.pdf") as pdf:
        for page in pdf.pages:
            text = page.extract_text()      # page text as a single string
            tables = page.extract_tables()  # list of tables, each a list of rows
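Each table returned by extract_tables() is a list of rows, and each row is a list of cell strings (or None for an empty cell). The sketch below shows one way to turn such a table into dictionaries and CSV text using only the standard library. The helper names table_to_records and records_to_csv are hypothetical, the sample table is made up for illustration, and the code assumes the first row holds unique column headers.

```python
import csv
import io
from typing import Dict, List, Optional

Table = List[List[Optional[str]]]


def table_to_records(table: Table) -> List[Dict[str, str]]:
    """Treat the first row as the header and return one dict per data row."""
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        # Empty cells arrive as None; normalize them to empty strings.
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


def records_to_csv(records: List[Dict[str, str]]) -> str:
    """Serialize the records to CSV text, with the dict keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up data in the same shape as one extracted table.
sample_table = [["Item", "Qty"], ["Widget", "4"], ["Gadget", None]]
rows = table_to_records(sample_table)
print(rows)
print(records_to_csv(rows))
```

In the pdfplumber loop, each entry of tables could be passed to table_to_records. Tables with repeated or blank header cells would need extra handling, since duplicate keys overwrite each other in a dict.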

Writing and Creating PDFs

To generate new PDF files from scratch, use FPDF. It is lightweight and easy to use for simple documents.

  • Install FPDF: pip install fpdf2 (the maintained version, still imported as fpdf)
  • Create a PDF object, add a page, set font, and write content
  • Output the file to disk

Example:

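    from fpdf import FPDF

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=12)  # built-in core font
    # new_x/new_y move the cursor to the next line (replaces the deprecated ln=True in fpdf2)
    pdf.cell(0, 10, "Hello, this is a generated PDF!", new_x="LMARGIN", new_y="NEXT")
    pdf.output("output.pdf")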

Merging and Splitting PDFs

Use PyPDF2 to combine multiple PDFs into one or split a large PDF into smaller ones.

To merge:

    from PyPDF2 import PdfWriter

    merger = PdfWriter()
    for filename in ["file1.pdf", "file2.pdf"]:
        merger.append(filename)  # append() accepts a path or an open binary file

    with open("merged.pdf", "wb") as output_file:
        merger.write(output_file)

To split:

    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader("large.pdf")
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        # Name the output files page_1.pdf, page_2.pdf, ...
        with open(f"page_{i + 1}.pdf", "wb") as out:
            writer.write(out)
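Splitting does not have to produce one file per page. As a rough sketch, the same PdfReader and PdfWriter calls can write fixed-size chunks instead. The function names chunk_ranges and split_pdf, the prefix parameter, and the output file naming are assumptions made for this example, not part of PyPDF2.

```python
from typing import List, Tuple


def chunk_ranges(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) page index ranges covering all pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(path: str, chunk_size: int, prefix: str = "part") -> List[str]:
    """Write consecutive chunks of `chunk_size` pages; return the file names."""
    # Imported here so chunk_ranges() stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(path)
    outputs = []
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    for number, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        out_name = f"{prefix}_{number}.pdf"
        with open(out_name, "wb") as out:
            writer.write(out)
        outputs.append(out_name)
    return outputs


print(chunk_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Calling split_pdf("large.pdf", 4) on a 10-page file would write part_1.pdf through part_3.pdf, with the last file holding the remaining two pages. Keeping the range arithmetic in its own function makes it easy to check without touching any PDF.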

Adding Password Protection and Encryption

PyPDF2 can encrypt a PDF by setting a user password, and optionally an owner password, before writing the file.

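    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader("example.pdf")
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)  # copy every page into the new file

    writer.encrypt(user_password="user123", owner_password="admin456")
    with open("protected.pdf", "wb") as f:
        writer.write(f)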

The user password is required to open the file, while the owner password governs permissions such as editing and printing.
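The passwords in the example are placeholders. One possible approach, sketched here under stated assumptions, is to generate random passwords with the standard library's secrets module and pass them to writer.encrypt(). The helper names generate_passwords and encrypt_pdf are hypothetical, and how the generated passwords are stored or shared is left to the reader.

```python
import secrets
from typing import Tuple


def generate_passwords(nbytes: int = 16) -> Tuple[str, str]:
    """Return a (user_password, owner_password) pair of random URL-safe strings."""
    return secrets.token_urlsafe(nbytes), secrets.token_urlsafe(nbytes)


def encrypt_pdf(src: str, dst: str, user_password: str, owner_password: str) -> None:
    """Copy every page of `src` into `dst` and protect it with both passwords."""
    # Imported here so generate_passwords() works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.encrypt(user_password=user_password, owner_password=owner_password)
    with open(dst, "wb") as f:
        writer.write(f)


user_pw, owner_pw = generate_passwords()
print(len(user_pw), len(owner_pw))
```

A call such as encrypt_pdf("example.pdf", "protected.pdf", user_pw, owner_pw) would then produce the protected file. Using different values for the two passwords keeps the ability to open the file separate from the ability to change its permissions.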

Working with PDFs in Python is straightforward once you pick the right tool for the job: PyPDF2 for manipulation, pdfplumber for detailed text and table analysis, and FPDF for generating new documents. Most tasks come down to reading, modifying, and writing PDF streams with these libraries.
