Resizing all pages in a multi-page PDF document to A4 size with ImageMagick

While many iPhone users regularly utilize the Notes app for jotting down ideas or making lists, fewer are aware of its powerful document scanning capabilities. This feature allows you to use your iPhone camera to scan documents, automatically detecting edges and transforming images into clean, rectangular forms akin to a traditional scanner’s output.

However, those who use this function for multi-page documents might notice a frustrating issue: the resulting PDFs often have inconsistently sized pages. Although the Notes app lets you compile these into a single file, it doesn’t offer a way to standardize page sizes, such as adjusting all to A4 dimensions.

For macOS users, the Preview app provides a manual workaround: it can resize pages to a specific paper size when printing or saving as a new PDF. But what if you need to process numerous files?

This blog post explores a batch-processing solution using convert from ImageMagick, a powerful image manipulation tool, and is aimed at readers comfortable with scripting and basic command-line operations. I will guide you through a bash script that resizes all pages of every PDF in a folder to a uniform size. As written, the script saves each result under the original file name in the same folder, so use its backup option if you want to keep the originals. Because it runs unattended, the method suits large volumes of documents.

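At the heart of our script is the convert command, tailored to adjust page size and quality. The -density setting is placed before the input file so that ImageMagick rasterizes the PDF at 300 DPI rather than at its 72 DPI default:

convert -density 300 input.pdf -resize 2480x3508 \
                  -gravity center -extent 2480x3508 \
                  -colorspace Gray -quality 100 output_a4_300.pdf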

Note the relationship between the density and the width and height of the output document: the pixel dimensions are the A4 size in inches (8.27 × 11.69) multiplied by the density. Changing the density must therefore be accompanied by a proportional change to width and height. For example, with -density 150 the width becomes 2480 * 150/300 = 1240 and the height 3508 * 150/300 = 1754, which gives the command:

convert -density 150 input.pdf -resize 1240x1754 \
                  -gravity center -extent 1240x1754 \
                  -colorspace Gray -quality 100 output_a4_150.pdf
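
The arithmetic above can be sketched as a small bash function. This is only an illustration: the name a4_pixels is hypothetical and not part of the script in this post, and it assumes the standard A4 size of 210 mm × 297 mm and 25.4 mm per inch.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): prints the A4 pixel
# dimensions as WIDTHxHEIGHT for a given density in DPI.
a4_pixels() {
    local dpi=$1
    # pixels = mm * dpi / 25.4, done in integer arithmetic and rounded
    # to the nearest whole pixel (hence the factor 10 and the +127).
    local width=$(( (210 * dpi * 10 + 127) / 254 ))
    local height=$(( (297 * dpi * 10 + 127) / 254 ))
    echo "${width}x${height}"
}

a4_pixels 300   # prints 2480x3508
a4_pixels 150   # prints 1240x1754
```

The output can be passed directly to -resize and -extent, for example -resize "$(a4_pixels 200)".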

A bash script for batch resizing of PDFs to A4:

The following script uses ImageMagick to resize all pages in all PDF files in a given folder to A4 size. It combines -resize and -extent with -density and -gravity: -resize scales each page to fit within the A4 pixel dimensions while preserving its aspect ratio, and -extent with -gravity center pads the result to exactly A4 while keeping the original image centered.

I added options to run the script in quiet mode (-q), to keep backups of the original files with the .bak extension (-b), and to process only files with a specific prefix within a given folder.

Compared with other tools, the script lets us process files in bulk. It is free and open source, which gives you the possibility to analyze it, understand it, and make sure there is nothing fishy about it. It also gives power users the option to tweak and improve it according to their needs.

The script

#!/bin/bash

# Function to display usage instructions
usage() {
    echo "Usage: $0 [-q] [-b] <folder> [prefix]"
    echo "Options:"
    echo "  -q    Quiet mode (less output)"
    echo "  -b    Backup original PDF files"
    echo "  -h    Show this help message"
    exit 1
}

# Function to ensure ImageMagick is installed
check_dependencies() {
    if ! command -v convert &> /dev/null; then
        echo "ImageMagick is not installed. Please install it to use this script."
        exit 5
    fi
}

# Function to print verbose messages
VERBOSE=1
verbose_echo() {
    if [ "$VERBOSE" -ne 0 ]; then
        echo "$@"
    fi
}

# Improved backup functionality
backup_file() {
    local file=$1
    local backup_dir="$TARGET_FOLDER/backups"
    mkdir -p "$backup_dir"
    local timestamp=$(date +"%Y%m%d_%H%M%S")
    cp "$file" "$backup_dir/$(basename "$file" .pdf)_$timestamp.bak"
    verbose_echo "Backup of $file created in $backup_dir."
}

# Function to handle Ctrl-C
handle_sigint() {
    echo "Script interrupted by user. Exiting..."
    exit 2
}

# Set a trap for Ctrl-C
trap handle_sigint SIGINT

# check dependencies
check_dependencies

# Parse optional flags
BACKUP=0
while getopts ":qhb" option; do
    case $option in
        b)
            BACKUP=1
            ;;
        q)
            VERBOSE=0
            ;;
        h)
            usage
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            usage
            ;;
    esac
done
shift $((OPTIND -1))

# Check if the correct number of arguments are provided
if [ "$#" -lt 1 ]; then
    usage
fi

# Get the target folder and optional prefix from the command-line arguments
TARGET_FOLDER="${1%/}"
PREFIX="${2:-}"

if [ ! -d "$TARGET_FOLDER" ]; then
    echo "The selected folder does not exist: $TARGET_FOLDER"
    exit 3
fi

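# Check for PDF files in the folder with the given prefix.
# nullglob makes an unmatched pattern expand to nothing instead of to itself,
# so the "no files found" check below works as intended.
shopt -s nullglob
if [ -n "$PREFIX" ]; then
    PDF_FILES=("$TARGET_FOLDER"/"$PREFIX"*.pdf)
else
    PDF_FILES=("$TARGET_FOLDER"/*.pdf)
fi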

TOTAL_FILES=${#PDF_FILES[@]}
if [ "$TOTAL_FILES" -eq 0 ] || [ -z "${PDF_FILES[0]}" ]; then
    verbose_echo "No PDF files found in the folder."
    exit 4
else
    verbose_echo "Number of PDF files to be processed: $TOTAL_FILES"
fi

# Process each PDF in the folder
CURRENT_FILE=0
for pdf in "${PDF_FILES[@]}"; do
    CURRENT_FILE=$((CURRENT_FILE + 1))
    base_name=$(basename "$pdf" .pdf)
    
    # Backup original PDF if backup option is set
    if [ "$BACKUP" -eq 1 ]; then
        backup_file "$pdf"
    fi
    
    verbose_echo "Processing $pdf ($CURRENT_FILE of $TOTAL_FILES)..."

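    # -density precedes the input so the PDF is rasterized at 300 DPI.
    # The output overwrites the original file; stop if convert fails.
    if ! convert -density 300 "$pdf" -resize 2480x3508 -gravity center \
            -extent 2480x3508 -colorspace Gray -quality 100 \
            "$TARGET_FOLDER/${base_name}.pdf"; then
        echo "Failed to process $pdf" >&2
        exit 6
    fi

    verbose_echo "Completed processing for $pdf. Saved to $TARGET_FOLDER/$base_name.pdf"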
done

verbose_echo "All files processed successfully!"

Script Breakdown

The script defines several functions. They are modular and can be reused in other scripts:

  • usage(): Displays the correct way to use the script and exits. This function is called when the user passes -h, an invalid option, or no folder argument.
  • verbose_echo(): A custom echo function that only outputs text if verbose mode is enabled. This helps in controlling the script’s output.
  • handle_sigint(): Catches and handles the interrupt signal (Ctrl-C), allowing the script to exit gracefully.
  • check_dependencies(): Ensures required dependencies, like ImageMagick, are installed.
  • backup_file(): Manages the creation of backup files, placing them in a dedicated directory with a timestamp to prevent overwriting.

Signal Handling: The line trap handle_sigint SIGINT sets up a trap for the interrupt signal that calls the handle_sigint function, so that the script prints a message and exits with a defined status when interrupted.

Option Parsing: The script uses a getopts loop to handle flags such as -q for quiet mode and -b for backup mode and to set the corresponding variables. Afterwards, shift removes the parsed flags so that the folder and the optional prefix become the first positional arguments.
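
The getopts pattern can be tried in isolation with a minimal sketch like the one below. The function name parse_flags and its output format are hypothetical and exist only for this demonstration; the flags mirror the -q and -b options of the script.

```shell
#!/bin/bash
# Hypothetical demo function (not in the original script): parses -q and -b
# and reports the resulting settings plus the remaining arguments.
parse_flags() {
    local OPTIND=1 option        # reset so the function can be called repeatedly
    local verbose=1 backup=0
    while getopts ":qb" option; do
        case $option in
            q) verbose=0 ;;
            b) backup=1 ;;
            \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))        # drop the parsed flags; "$@" holds the rest
    echo "verbose=$verbose backup=$backup args=$*"
}

parse_flags -q -b ./scans invoice_   # prints verbose=0 backup=1 args=./scans invoice_
parse_flags ./scans                  # prints verbose=1 backup=0 args=./scans
```

The leading colon in ":qb" puts getopts into silent mode, which is why an unknown flag reaches the \? branch with the offending character in OPTARG.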

Input Validation: The script checks that at least one argument was given and that the specified folder exists. If no folder is given, it calls the usage function and exits; if the folder does not exist, it prints an error and exits with status 3. It exits with status 4 when the folder contains no matching PDF files.
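
One detail worth knowing when checking for matching files: by default, bash leaves a glob pattern that matches nothing unchanged, so an array built from it contains one element (the pattern itself) rather than zero. The sketch below shows one way to count matches reliably with the nullglob shell option. The function name count_pdfs is hypothetical and not part of the script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): counts the PDF files in a
# folder, optionally restricted to those starting with a prefix.
count_pdfs() {
    local folder=${1%/} prefix=${2:-}
    local restore
    restore=$(shopt -p nullglob)   # remember the current nullglob setting
    shopt -s nullglob              # unmatched patterns now expand to nothing
    local files=("$folder"/"$prefix"*.pdf)
    $restore                       # put the previous setting back
    echo "${#files[@]}"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/scan_1.pdf" "$demo_dir/scan_2.pdf" "$demo_dir/notes.txt"
count_pdfs "$demo_dir"             # prints 2
count_pdfs "$demo_dir" invoice_    # prints 0
rm -r "$demo_dir"
```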

Main Loop: For each PDF, the loop first backs up the file if backup is enabled and then calls convert, which performs the actual resizing.

Backup Handling: The script includes an option to create backups of the original PDF files before processing. These backups are stored in a backups subdirectory of the target folder and are timestamped so that repeated runs do not overwrite earlier backups.
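
To see what a backup file name looks like without copying anything, the naming scheme can be sketched as a pure function. The name backup_path and its three-argument form are hypothetical; the path layout follows the backup_file function of the script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): prints the backup path
# for a given PDF, target folder, and timestamp, without touching any files.
backup_path() {
    local file=$1 folder=${2%/} timestamp=$3
    echo "$folder/backups/$(basename "$file" .pdf)_${timestamp}.bak"
}

backup_path /tmp/scans/report.pdf /tmp/scans 20240101_120000
# prints /tmp/scans/backups/report_20240101_120000.bak

# With the current time, in the same format the script uses:
backup_path /tmp/scans/report.pdf /tmp/scans "$(date +"%Y%m%d_%H%M%S")"
```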

Usage Instructions

To use the script, navigate to the directory containing the script and run:

./pdf_resize_a4.sh [-q] [-b] <folder> [prefix]

Options:

  • -q: Quiet mode (suppresses most of the output)
  • -b: Backup original files before processing
  • <folder>: The target folder containing PDF files
  • [prefix]: (Optional) A prefix to filter which PDF files to process

Example:

./pdf_resize_a4.sh ~/Documents/Experiment/Resized_PDFs/

Conclusion

This script provides a robust, flexible solution for batch resizing PDF files to a standard A4 format using ImageMagick. It is particularly useful for users looking to automate the processing of numerous documents with varying page sizes. The inclusion of options for quiet operation and file backup adds to its utility in diverse environments.
