In our previous tutorial, we learned how to read PDF documents in Python and covered the basics of the PyPDF2 library. While some tasks require you to extract information from PDF documents, it is also quite common that you need to create a PDF of your own for things like automated invoice generation or reservation confirmation.
One excellent library that you can use to create and edit PDF documents in Python is PyPDF2. The library has a large feature set that lets you do all kinds of things, such as extracting text, images and metadata from a PDF document, which we covered in the previous tutorial. You can also create and edit a PDF document, perform encryption and decryption, add or remove annotations, and more.
In this tutorial, our focus will be on creating and modifying PDF documents. Let's get started.
Creating PDF Documents
We use the PdfReader class to read and extract content from a PDF document, and we use the PdfWriter class to create new PDF files. One limitation of PyPDF2 is that you can only use the library to create new PDF files from existing PDF files.
We'll begin by creating a blank page for our PDF file, which requires us to instantiate an object of the PdfWriter class. This class has a method called add_blank_page() which creates a blank page with the specified dimensions and appends it to the current object.
The dimensions of the page are specified in default user space units, where 72 units equal 1 inch. Keeping that in mind, we can create an A4 size page by multiplying 8.27 by 72 to get the page width and 11.69 by 72 to get the page height.
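To make the unit arithmetic concrete, here is a small sketch that converts a size in inches to whole user space units. The helper name inches_to_units and the constant UNITS_PER_INCH are my own illustrative names and are not part of PyPDF2.

```python
import math

# 72 default user space units equal 1 inch.
UNITS_PER_INCH = 72


def inches_to_units(inches):
    """Convert a length in inches to whole user space units (rounded down)."""
    return math.floor(inches * UNITS_PER_INCH)


# A4 paper measures roughly 8.27 x 11.69 inches.
a4_width = inches_to_units(8.27)
a4_height = inches_to_units(11.69)
print(a4_width, a4_height)  # 595 841
```

The same helper works for other paper sizes, for example 8.5 x 11 inches for US Letter.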
I used the following code to create a blank PDF document using PyPDF2:

    import math
    from PyPDF2 import PdfWriter

    my_pdf_pages = PdfWriter()

    # A4 is 8.27 x 11.69 inches, and 72 user space units equal 1 inch.
    page_width = math.floor(8.27 * 72)
    page_height = math.floor(11.69 * 72)

    my_pdf_pages.add_blank_page(page_width, page_height)

    with open('doc.pdf', 'wb+') as file:
        my_pdf_pages.write(file)

It is important to use integer values for the width and height of the page. Otherwise, you end up with a PDF document with incorrect dimensions. I have used the open() function in Python and specified a file name along with the opening mode. The value wb+ means that I am opening the file in binary mode for writing and updating.
After that, I use the write() method to write the contents of the my_pdf_pages object to the doc.pdf file. Granted, you will only see a blank page if you open the file now, but we were able to create it using the library.
Remember how we read different pages from a PDF document in the previous tutorial using the pages property? The pages property stores all the pages of the document as a list of Page objects. We can extract a specific set of pages and then embed them into our newly created PDF using the add_page() method.
Here is an example in which I read the content of two different PDF books and write some of their pages to a new file sequentially:

    from PyPDF2 import PdfReader, PdfWriter

    my_pdf_pages = PdfWriter()

    with open('secret-doctrine-01.pdf', 'rb') as book_a:
        with open('secret-doctrine-02.pdf', 'rb') as book_b:
            with open('excerpts.pdf', 'wb+') as file:
                book_a_pages = PdfReader(book_a).pages
                book_b_pages = PdfReader(book_b).pages

                # Alternate between the two books for page indexes 1 to 9.
                for i in range(1, 10):
                    book_a_page = book_a_pages[i]
                    my_pdf_pages.add_page(book_a_page)
                    book_b_page = book_b_pages[i]
                    my_pdf_pages.add_page(book_b_page)

                # Write while both source files are still open.
                my_pdf_pages.write(file)

A lot of the code here is similar to the previous example. The only difference is that instead of the add_blank_page() method, we are using the add_page() method to add a Page object to our document. We iterate over the pages with index 1 to 9 and add them to our PdfWriter object called my_pdf_pages one by one. Once all the pages have been added, we write them to our file called excerpts.pdf.
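Because the indexes are zero-based, index 1 is the second page of each book. The sketch below models the order in which the loop adds pages, using plain Python with no PDF files involved. The function name excerpt_order and the labels 'A' and 'B' are made up for illustration.

```python
def excerpt_order(indexes):
    """Return (book, page_index) pairs in the order the loop adds them."""
    order = []
    for i in indexes:
        order.append(('A', i))  # page i from the first book
        order.append(('B', i))  # page i from the second book
    return order


order = excerpt_order(range(1, 10))
print(len(order))  # 18
print(order[:4])   # [('A', 1), ('B', 1), ('A', 2), ('B', 2)]
```

So the resulting file should contain 18 pages that alternate between the two books.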
A few months back, I downloaded a book that I wanted to read. However, it could only be downloaded one chapter at a time and I wanted to merge all the chapters into a single document. I did it with a third-party service back then, but we can do it just as easily with a few lines of code.
Instead of reading a file one page at a time and then appending that page to our document, we can also append the whole file at once using the append_pages_from_reader() method. This method also accepts a second parameter, which is a callback function that you want to call each time a page is appended.

    from PyPDF2 import PdfReader, PdfWriter

    my_pdf_doc = PdfWriter()

    for i in range(101, 107):
        chapter_name = 'lemh' + str(i) + '.pdf'
        # Passing the file name (instead of a file object that gets closed
        # after the loop body) keeps the chapter data available for write().
        chapter_reader = PdfReader(chapter_name)
        my_pdf_doc.append_pages_from_reader(chapter_reader)

    with open('ebook.pdf', 'wb+') as file:
        my_pdf_doc.write(file)

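The callback parameter mentioned above can be used, for example, to count pages as they are appended. This is a minimal sketch, assuming the keyword is named after_page_append and that the callback receives the page that was just appended, which is how I understand the PyPDF2 documentation. PageCounter and append_with_count are hypothetical names of my own.

```python
class PageCounter:
    """Callable that counts how many times it has been invoked."""

    def __init__(self):
        self.count = 0

    def __call__(self, page):
        # `page` is assumed to be the page that was just appended.
        self.count += 1


def append_with_count(writer, reader, counter):
    """Append every page of `reader` to `writer`, notifying `counter`."""
    # Assumption: PdfWriter.append_pages_from_reader accepts the callback
    # through the `after_page_append` keyword.
    writer.append_pages_from_reader(reader, after_page_append=counter)
    return counter.count
```

With PyPDF2 installed, calling append_with_count(PdfWriter(), PdfReader('lemh101.pdf'), PageCounter()) should return the number of pages copied from the first chapter.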
Slicing, Insertion and Concatenation of PDF Documents
There is another class called PdfMerger in the PyPDF2 library that you can use to create a PDF document in Python. This class offers more advanced functionality compared to the PdfWriter class. There are two important methods that we will cover here: append() and merge().
Let's begin with append(). In the previous section, we used the append_pages_from_reader() method of the PdfWriter class to append the chapters of our book one after the other. The advantage of using append() is that it gives you more options and flexibility.

    from PyPDF2 import PdfMerger

    my_pdf_doc = PdfMerger()

    with open('ebook.pdf', 'wb+') as file:
        for i in range(101, 107):
            chapter_name = 'lemh' + str(i) + '.pdf'
            my_pdf_doc.append(chapter_name)

        my_pdf_doc.write(file)

As you can see, this code is much shorter than what I wrote above to accomplish the same task. The important difference is that we didn't have to instantiate a PdfReader object in order to append the chapters. The append() method of the PdfMerger class just needs a file name or a file object.
The append() method accepts four different parameters. The first one is the file name, as we saw above.
The second parameter is a string that identifies a bookmark to be applied at the beginning of the included file. We could use it to add the chapter number as a bookmark in our generated document.
The third parameter allows you to add only a specific set of pages to the book instead of the whole chapter. It can be a (start, stop[, step]) tuple that specifies the start index, the stop index and the step, which lets you skip pages at a regular interval.

    from PyPDF2 import PdfMerger

    my_pdf_doc = PdfMerger()

    with open('bookmarked.pdf', 'wb+') as file:
        for i in range(101, 107):
            chapter_name = 'lemh' + str(i) + '.pdf'
            outline_name = 'Chapter ' + str(i - 100)
            # Append only the first 10 pages (indexes 0 to 9) of each chapter.
            my_pdf_doc.append(chapter_name, outline_name, (0, 10))

        my_pdf_doc.write(file)

When I executed the above code, it created a PDF document that had bookmarks for each chapter. It also had only the first 10 pages from each chapter.
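The page tuple can be thought of as the arguments to Python's built-in range(). The following sketch only models which page indexes a given tuple selects, on the assumption that PyPDF2 interprets the tuple the same way range() does. The helper selected_indexes is hypothetical and not a library function.

```python
def selected_indexes(pages):
    """Return the page indexes selected by a (start, stop[, step]) tuple."""
    return list(range(*pages))


print(selected_indexes((0, 10)))     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(selected_indexes((0, 10, 2)))  # [0, 2, 4, 6, 8]
print(selected_indexes((5, 8)))      # [5, 6, 7]
```

Note that the stop index itself is excluded, which is why (0, 10) yields exactly ten pages.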
Let's say you have a bunch of books but they don't have an index or preface at the beginning. The author gives you the index as a separate PDF document. How do you prepend it to the beginning of the books? The append() method won't be of much help here, especially if you also want to add some content somewhere in the middle of the book. Luckily, another similar method called merge() can be useful here.

    my_pdf_doc.merge(0, 'lemh1ps.pdf')
    my_pdf_doc.write(file)

The first line above adds the index document at the beginning of our PdfMerger object, while the second line writes all the merged data back to our PDF file.
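The first argument of merge() is the position at which the new pages are inserted. As a rough mental model, with plain lists standing in for pages rather than PyPDF2's actual implementation, inserting at position 0 puts the index before everything else, while a larger position places the content in the middle. The function insert_pages and the page labels are invented for this illustration.

```python
def insert_pages(existing, position, new_pages):
    """Return a new page list with `new_pages` inserted at `position`."""
    return existing[:position] + new_pages + existing[position:]


book = ['ch1-p1', 'ch1-p2', 'ch2-p1']
index = ['index-p1', 'index-p2']

print(insert_pages(book, 0, index))
# ['index-p1', 'index-p2', 'ch1-p1', 'ch1-p2', 'ch2-p1']

print(insert_pages(book, 2, ['errata']))
# ['ch1-p1', 'ch1-p2', 'errata', 'ch2-p1']
```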
Adding Bookmarks to a PDF Document
It is entirely possible that you might be required to add bookmarks for some specific pages of a PDF document for easy access. One useful method that you can use to add bookmarks is called add_outline_item(). This method is available in both the PdfWriter class and the PdfMerger class. Its two required parameters specify the title and the page number for the bookmark. The title has to be a string and the page number has to be an integer.
You can also specify a parent outline item as the third parameter in order to create nested bookmarks. The next three parameters determine the font color, weight and style of the bookmark. Here is an example that uses the first two parameters to create a bookmark to the abstract of Chapter 1.

    my_pdf_doc.add_outline_item("Chapter 1 (Abstract)", 52)

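For nested bookmarks, the value returned by add_outline_item() can be passed as the parent of later items. This is a minimal sketch, assuming the method returns a reference to the new outline item and accepts a parent keyword. The function add_chapter_bookmarks and the page numbers are hypothetical, and writer can be any PdfWriter or PdfMerger object that already contains the pages.

```python
def add_chapter_bookmarks(writer, title, page_number, sections):
    """Add a chapter bookmark with nested section bookmarks beneath it.

    `sections` is a list of (title, page_number) pairs.
    """
    # The return value is assumed to be usable as a parent reference.
    chapter = writer.add_outline_item(title, page_number)
    for section_title, section_page in sections:
        writer.add_outline_item(section_title, section_page, parent=chapter)
    return chapter
```

For example, add_chapter_bookmarks(my_pdf_doc, 'Chapter 1', 0, [('Abstract', 52)]) would create a "Chapter 1" bookmark with an "Abstract" bookmark nested under it.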
Final Thoughts
In this tutorial, we learned how to create a PDF document in Python and how to add content to the document by appending individual pages or a group of pages. We also learned how to add content at particular locations in our PDF document using the PdfMerger class from the PyPDF2 library.