How to Create and Edit PDF Documents in Python

November 30, 2022

In our previous tutorial, we learned how to read PDF documents in Python and discussed the basics of the PyPDF2 library. While some projects require you to extract information from PDF documents, it is also very common to need to create a PDF of your own for things like automatic invoice generation or reservation confirmation.

One excellent library that you can use to create and edit PDF documents in Python is PyPDF2. The library has a large feature set that lets you do all kinds of things, such as extracting text, images and metadata from a PDF document, which we covered in the previous tutorial. You can also create and edit a PDF document, perform encryption and decryption, add or remove annotations, and more.

In this tutorial, our focus will be on creating and editing PDF documents. Let’s get started.

Creating PDF Documents

We use the PdfReader class to read and extract content from a PDF document, and we use the PdfWriter class to create new PDF files. One limitation of PyPDF2 is that you can only use the library to create new PDF files from existing PDF files.

We’ll begin by creating a blank page for our PDF file, which requires us to instantiate an object of the PdfWriter class. This class has a method called add_blank_page() that creates a blank page with the specified dimensions and appends it to the current object.

The dimensions of the page are specified in default user space units, where 72 units are equal to 1 inch. Keeping that in mind, we can create an A4-size page by multiplying 8.27 by 72 to get the page width and 11.69 by 72 to get the page height.
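As a quick check of that arithmetic, here is a minimal sketch of the conversion. The helper name inches_to_units is hypothetical and not part of PyPDF2; it only assumes the 72-units-per-inch rule described above and rounds down to a whole number.

```python
import math

UNITS_PER_INCH = 72  # default user space units per inch


def inches_to_units(inches):
    """Convert inches to whole user space units (hypothetical helper)."""
    return math.floor(inches * UNITS_PER_INCH)


# A4 paper is roughly 8.27 x 11.69 inches.
a4_width = inches_to_units(8.27)
a4_height = inches_to_units(11.69)
print(a4_width, a4_height)
```

The same helper could be reused for other paper sizes, for example 8.5 by 11 inches for US Letter.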

I used the following code to create a blank PDF document using PyPDF2:

import math
from PyPDF2 import PdfWriter

my_pdf_pages = PdfWriter()

# 72 user space units per inch, rounded down to whole units
page_width = math.floor(8.27 * 72)
page_height = math.floor(11.69 * 72)

my_pdf_pages.add_blank_page(page_width, page_height)

with open('doc.pdf', 'wb+') as file:
    my_pdf_pages.write(file)

It is important to use integer values for the width and height of the page. Otherwise, you end up with a PDF document with incorrect dimensions. I have used the open() function in Python and specified a file name along with the opening mode. The value wb+ means that I will be opening the binary file for writing and updating.

After that, I use the write() method to write the contents of the my_pdf_pages object to the doc.pdf file. Granted, you will only see a blank page if you open the file now, but we were able to create it using the library.

Remember how we read different pages from a PDF document in the previous tutorial using the pages property? The pages property stores all the pages of the document as a list of Page objects. We can extract a specific set of pages and then embed them into our newly created PDF using the add_page() method.

Here is an example in which I read the content of two different PDF books and write some of their pages to a new file sequentially:

from PyPDF2 import PdfReader, PdfWriter

my_pdf_pages = PdfWriter()

with open('secret-doctrine-01.pdf', 'rb') as book_a:
    with open('secret-doctrine-02.pdf', 'rb') as book_b:
        with open('excerpts.pdf', 'wb+') as file:
            book_a_pages = PdfReader(book_a).pages
            book_b_pages = PdfReader(book_b).pages
            # Indexes 1 to 9 are the second through tenth pages
            for i in range(1, 10):
                book_a_page = book_a_pages[i]
                my_pdf_pages.add_page(book_a_page)
                book_b_page = book_b_pages[i]
                my_pdf_pages.add_page(book_b_page)
            my_pdf_pages.write(file)

A lot of the code here is similar to the previous example. The only difference is that instead of the add_blank_page() method, we are using the add_page() method to add a Page object to our document. We iterate over the pages with index 1 to 9 and add them to our PdfWriter object called my_pdf_pages one at a time, alternating between the two books. Once all the pages have been added, we write them to our file called excerpts.pdf.
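Since the pages list is zero-indexed, indexes 1 to 9 correspond to the second through tenth pages, and indexing past the last page of a shorter book will fail. Here is a minimal sketch of one way to guard against that. The helper safe_indices is hypothetical and not part of PyPDF2.

```python
def safe_indices(start, stop, page_count):
    """Return the zero-based indexes in [start, stop) that actually exist
    in a document with page_count pages (hypothetical helper)."""
    return list(range(start, min(stop, page_count)))


# A 300-page book: all of the indexes 1 to 9 are available.
print(safe_indices(1, 10, 300))
# A 5-page book: only indexes 1 to 4 exist.
print(safe_indices(1, 10, 5))
```

In the example above, you could loop over safe_indices(1, 10, len(book_a_pages)) instead of range(1, 10) if you are not sure how long each book is.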

A few months back, I downloaded a book that I wanted to read. However, it could only be downloaded one chapter at a time, and I wanted to merge all the chapters into a single document. I did it with a third-party service back then, but we can do it just as easily with a few lines of code.

Instead of reading a file one page at a time and then appending that page to our document, we can also append the whole file at once using the append_pages_from_reader() method. This method also accepts a second parameter, which is a callback function that is called after each page is appended.

from PyPDF2 import PdfReader, PdfWriter

my_pdf_doc = PdfWriter()

# Chapter files are named lemh101.pdf to lemh106.pdf
for i in range(101, 107):
    chapter_name = "lemh" + str(i) + '.pdf'

    with open(chapter_name, 'rb') as chapter:
        chapter_reader = PdfReader(chapter)
        my_pdf_doc.append_pages_from_reader(chapter_reader)

        # Rewrite the output while the chapter file is still open
        with open('ebook.pdf', 'wb+') as file:
            my_pdf_doc.write(file)
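The example above does not use the callback parameter. As a minimal sketch of what such a callback could look like, here is one that simply counts pages. The names make_page_counter and on_page_appended are hypothetical, and the sketch assumes that PyPDF2 calls the callback with a reference to the page that was just appended.

```python
def make_page_counter():
    """Build a callback that counts appended pages (hypothetical helper)."""
    counts = {"pages": 0}

    def on_page_appended(page):
        # Assumed to receive the page that was just added to the writer.
        counts["pages"] += 1

    return on_page_appended, counts


on_page_appended, counts = make_page_counter()

# With PyPDF2, the callback would be passed as the second argument:
#     my_pdf_doc.append_pages_from_reader(chapter_reader, on_page_appended)
# Here it is called by hand to imitate three appended pages.
for fake_page in ("page 1", "page 2", "page 3"):
    on_page_appended(fake_page)

print(counts["pages"])
```

A counter like this could be used to report progress while merging long chapters.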

Slicing, Insertion and Concatenation of PDF Documents

There is another class called PdfMerger in the PyPDF2 library that you can use to create a PDF document in Python. This class offers more advanced functionality compared to the PdfWriter class. There are two important methods that we will cover here: append() and merge().

Let’s begin with append(). In the previous section, we used the append_pages_from_reader() method from the PdfWriter class to append the chapters of our book one after the other. The advantage of using append() is that it gives you more options and flexibility.

from PyPDF2 import PdfMerger

my_pdf_doc = PdfMerger()

with open('ebook.pdf', 'wb+') as file:
    for i in range(101, 107):
        chapter_name = "lemh" + str(i) + '.pdf'
        # append() accepts a file name directly, no PdfReader needed
        my_pdf_doc.append(chapter_name)
    my_pdf_doc.write(file)

As you can see, this code is much shorter than what I wrote above to accomplish the same task. The important difference is that we didn’t have to instantiate a PdfReader object in order to append the chapters. The append() method from the PdfMerger class just needs a file name or a file object.

The append() method accepts four different parameters. The first one is the file name, as we saw above.

The second parameter is a string that identifies a bookmark to be applied at the beginning of the included file. We could use it to add the chapter number as a bookmark in our generated document.

The third parameter allows you to add only a specific set of pages to the book instead of the whole chapter. It can be a (start, stop[, step]) tuple that specifies the start index, the stop index and the step between pages.

from PyPDF2 import PdfMerger

my_pdf_doc = PdfMerger()

with open('bookmarked.pdf', 'wb+') as file:
    for i in range(101, 107):
        chapter_name = "lemh" + str(i) + '.pdf'
        outline_name = "Chapter " + str(i - 100)
        # Bookmark each chapter and keep only its first 10 pages
        my_pdf_doc.append(chapter_name, outline_name, (0, 10))
    my_pdf_doc.write(file)

When I executed the above code, it created a PDF document that had bookmarks for each chapter. It also had only the first 10 pages from each chapter.
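To see which pages a given tuple selects, here is a minimal sketch that expands it into page indexes. It assumes the tuple is interpreted like the arguments to Python's built-in range(), and the helper selected_page_indices is hypothetical rather than part of PyPDF2.

```python
def selected_page_indices(page_range):
    """Expand a (start, stop[, step]) tuple into zero-based page indexes,
    assuming it behaves like the arguments to range()."""
    return list(range(*page_range))


# (0, 10) covers indexes 0 to 9, the first ten pages.
print(selected_page_indices((0, 10)))
# (0, 10, 2) takes every other page among the first ten.
print(selected_page_indices((0, 10, 2)))
```

Note that the stop index itself is not included, which is why (0, 10) yields exactly ten pages.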

Let's say you have a bunch of books that don't have an index or preface at the beginning. The author gives you the index as a separate PDF document. How do you prepend it to the beginning of the books? The append() method will not be of much help here, especially if you also want to add some content somewhere in the middle of the book. Luckily, another similar method called merge() can be helpful here.

my_pdf_doc.merge(0, 'lemh1ps.pdf')
my_pdf_doc.write(file)

The first line above adds the index document at the beginning of our PdfMerger object, while the second line writes all the merged data back to our PDF file.
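Putting the two methods together, here is a minimal sketch of the whole flow. The function build_book is hypothetical. It takes the merger as an argument and only assumes that it offers append() and merge() in the way PdfMerger is used in this tutorial.

```python
def build_book(merger, chapter_names, index_name):
    """Append the chapters in order, then insert the index at position 0.

    merger is expected to behave like PyPDF2's PdfMerger.
    """
    for chapter_name in chapter_names:
        merger.append(chapter_name)
    # Position 0 places the index before the first page already merged.
    merger.merge(0, index_name)
    return merger


# Possible usage with PyPDF2, reusing the file names from this tutorial:
#     from PyPDF2 import PdfMerger
#     chapters = ["lemh" + str(i) + ".pdf" for i in range(101, 107)]
#     my_pdf_doc = build_book(PdfMerger(), chapters, 'lemh1ps.pdf')
#     with open('ebook.pdf', 'wb+') as file:
#         my_pdf_doc.write(file)
```

Passing a position other than 0 would place the extra document somewhere in the middle of the book instead.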

Adding Bookmarks to a PDF Document

It’s entirely possible that you will need to add bookmarks for specific pages of a PDF document for easy access. One useful method that you can use to add bookmarks is called add_outline_item(). This method is available in both the PdfWriter class and the PdfMerger class. Its two required parameters specify the title and the page number for the bookmark. The title should be a string and the page number should be an integer.

You can also specify a parent outline item as the third parameter in order to create nested bookmark items. The next three parameters determine the font color, weight and style for the bookmark. Here is an example that uses the first two parameters to create a bookmark to the abstract of Chapter 1.

my_pdf_doc.add_outline_item("Chapter 1 (Abstract)", 52)
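For nested bookmarks, here is a minimal sketch that adds a chapter bookmark and then places section bookmarks under it. The function add_chapter_with_sections is hypothetical. It assumes that add_outline_item() returns a reference to the new bookmark and that this reference can be passed back through the parent parameter.

```python
def add_chapter_with_sections(doc, title, page_number, sections):
    """Add a chapter bookmark and nest one bookmark per section under it.

    doc is expected to behave like a PyPDF2 PdfWriter or PdfMerger;
    sections is a list of (title, page_number) pairs.
    """
    chapter_item = doc.add_outline_item(title, page_number)
    for section_title, section_page in sections:
        # parent makes the new bookmark a child of the chapter bookmark.
        doc.add_outline_item(section_title, section_page, parent=chapter_item)
    return chapter_item


# Possible usage with the my_pdf_doc object from this tutorial
# (the titles and page numbers are made up for illustration):
#     add_chapter_with_sections(
#         my_pdf_doc, "Chapter 1", 0,
#         [("Chapter 1 (Exercises)", 40), ("Chapter 1 (Abstract)", 52)],
#     )
```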

Final Thoughts

In this tutorial, we learned how to create a PDF document in Python and how to add content to the document by appending individual pages or a group of pages. We also learned how to add content at particular locations in our PDF document using the PdfMerger class from the PyPDF2 library.
