How I Turned a Quran Dataset into a Bilingual PDF Using Python

English

yahdiinformatika

Read Time:11 Minute, 13 Second

Have you ever thought about creating a personalized PDF of the Quran—complete with Arabic text and English translation—using nothing but data and code?

That’s exactly what I set out to do, and after a few unexpected challenges, it worked. In this post, I’ll walk you through how I transformed a raw Quran dataset into a fully formatted, bilingual PDF using Python. Whether you’re a student, a developer, or simply curious about combining faith and tech, this post offers insights, resources, and actionable steps to try it yourself.

Why Generate a Quran PDF Programmatically?

You might ask—why not just download a pre-made PDF of the Quran?

Good question. There are plenty of PDFs out there, but generating one programmatically gives you full control and flexibility. Here’s what you can do with it:

Customize the layout, fonts, and translation.
Automatically generate specific surahs or ajza for study or teaching.
Integrate it into learning apps or digital tools.
Build a foundation for more advanced features like audio-sync, tafsir annotations, or interactive Quran study platforms.

For me, it started as a small experiment and quickly turned into a useful, modular tool.

Step 1: The Dataset

I used a CSV file with 6,236 entries—one row per ayah. The dataset included:

Surah number and names (in Arabic and English)
Ayah number within surah and overall
Arabic text (ayah_ar)
English translation (ayah_en)
Juz, manzil, ruku, sajdah details, and more

This structured format made it ideal for programmatic PDF generation.

Step 2: First Attempt with `fpdf2` (and Why It Failed)

My first choice was the popular fpdf2 library. It’s simple to use and has a fairly intuitive API. I expected it to handle Arabic just fine—especially since it claims Unicode support.

But I quickly hit a wall: UnicodeEncodeError.

Turns out, my Python environment had both fpdf (the old version) and fpdf2 installed. These two share the same namespace (fpdf) and don’t play well together. Even though I had fpdf2, Python was importing the legacy version, which lacks proper Unicode and RTL (right-to-left) support.

I uninstalled both and reinstalled only fpdf2. Still, issues remained when rendering Arabic.

Step 3: Switching to `reportlab` with Proper Arabic Support

Eventually, I shifted to a more robust combination:

reportlab for PDF generation
arabic-reshaper to shape Arabic letters correctly
python-bidi to display Arabic in proper right-to-left order

This stack gave me full control over layout, styling, and text handling. It also worked seamlessly with custom fonts like Amiri-Regular.ttf, an excellent open-source Arabic font.

Here’s the core logic for displaying Arabic in the PDF:

import arabic_reshaper
from bidi.algorithm import get_display

reshaped_text = arabic_reshaper.reshape("قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ")
bidi_text = get_display(reshaped_text)
canvas.drawRightString(x, y, bidi_text)

Once this was working, I looped through the dataset and drew each ayah, alternating between Arabic (right-aligned) and English (left-aligned).

Checkpoint Success: It Works

After resolving the package conflicts and switching to the reportlab stack, everything clicked.

With each page, the script rendered:

The Surah title (Arabic and English)
Each ayah, formatted clearly in Arabic and English
Proper RTL layout using bidi and reshaper

The result was a beautifully formatted, high-quality Quran PDF—fully generated from structured data, and I was genuinely thrilled to see it come to life. To keep things manageable for a first test, I chose to render the last five surahs (110 to 114), and watching them appear line by line, in both Arabic and English, felt like witnessing code and spirituality converge in the most rewarding way.

Tools You’ll Need

To replicate this setup, here’s your checklist:

Required Python Packages

pip install reportlab arabic-reshaper python-bidi

Required Files

A structured CSV dataset of the Quran
An Arabic font like Amiri-Regular.ttf (place it in the working directory)

Breaking Down the Code: How It All Works

Now let’s walk through the full script that generated the Quran PDF — line by line. The magic here isn’t just in the output, but in how each component comes together to handle Arabic script, layout, and design beautifully.

1. Importing the Essentials

import pandas as pd
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase import pdfmetrics
import arabic_reshaper
from bidi.algorithm import get_display

We begin by importing all the necessary libraries:

pandas: for loading and filtering the Quran dataset (CSV format).
reportlab: for creating and drawing content on PDF pages.
arabic_reshaper and bidi: critical for correctly displaying Arabic text, which needs both reshaping and right-to-left (RTL) processing.
TTFont and pdfmetrics: used to register custom fonts, especially the Amiri Arabic font we’ll use.

2. Loading and Filtering the Dataset

df = pd.read_csv("The Quran Dataset.csv")
df = df[df["surah_no"] >= 110]
df = df.sort_values(by=["surah_no", "ayah_no_surah"])

We load the dataset from a CSV file into a DataFrame. To keep the scope small for testing, we filter to include only the last five surahs (Surah 110 to 114). These are short, familiar surahs—perfect for validating formatting and layout before generating the full Quran. Feel free to change df = df[df[“surah_no”] >= 110] part.

Sorting by surah_no and ayah_no_surah ensures each verse appears in the correct order.

3. Setting Up the PDF Canvas

c = canvas.Canvas("quran_last_5_surahs.pdf", pagesize=A4)
width, height = A4
y_position = height - 50

Here, we initialize the PDF using A4 dimensions. canvas.Canvas is our drawing surface. We also define y_position as a vertical tracker that moves down the page as we print each verse.

4. Registering and Setting the Arabic Font

pdfmetrics.registerFont(TTFont('Amiri', 'Amiri-Regular.ttf'))
c.setFont("Amiri", 16)

This registers the Amiri Arabic font, a beautifully typeset, open-source font well-suited for Quranic text. Make sure the .ttf file is in your project directory. We set it as the default font for most of the Arabic content.

5. Iterating Through Each Ayah

current_surah = ""
for _, row in df.iterrows():

We loop through each verse in the DataFrame. The current_surah variable helps us detect when a new surah starts so we can add a header.

6. Adding Surah Titles

if surah_en != current_surah:
    current_surah = surah_en
    y_position -= 30
    if y_position < 100:
        c.showPage()
        y_position = height - 50
        c.setFont("Amiri", 16)

    c.setFont("Helvetica-Bold", 14)
    c.drawString(50, y_position, f"Surah {surah_en} / {surah_ar}")
    y_position -= 25
    c.setFont("Amiri", 16)

When we detect a new surah, we:

Drop the vertical position
Start a new page if we’re too close to the bottom
Print the Surah name in English and Arabic
Then reset the font back to Arabic for the verses

This makes the structure easy to navigate and visually clean.

7. Reshaping and Displaying the Arabic Ayah

reshaped_text = arabic_reshaper.reshape(f"{ayah_no}. {ayah_ar}")
bidi_text = get_display(reshaped_text)
c.drawRightString(width - 50, y_position, bidi_text)
y_position -= 20

Here’s where things get exciting. Arabic text, especially from raw datasets, can’t just be printed directly.

arabic_reshaper.reshape connects the Arabic letters properly.
get_display from the bidi library ensures the text is rendered right-to-left.
drawRightString aligns the verse to the right margin, as expected for Arabic script.

8. Adding the English Translation

c.setFont("Helvetica", 12)
c.drawString(50, y_position, ayah_en)
y_position -= 30
c.setFont("Amiri", 16)

After each Arabic ayah, we display its English translation using a clean, readable font like Helvetica. Then we reset the font to Amiri for the next Arabic ayah.

9. Handling Page Breaks

if y_position < 100:
    c.showPage()
    y_position = height - 50
    c.setFont("Amiri", 16)

To avoid overlapping or cutting off verses at the bottom, we start a new page when the cursor gets too low on the current one. We also reset the font each time a new page begins.

10. Saving the PDF

c.save()

Finally, we export everything to a file named quran_last_5_surahs.pdf.

At this point, the output is a clean, multi-page PDF with beautifully rendered Arabic and English Quranic text.

Final Thoughts

This script might look simple at first glance, but it’s doing a lot under the hood—handling right-to-left Arabic script, reshaping glyphs for proper rendering, dynamically managing PDF layout, and maintaining both elegance and readability. If you’ve never created a bilingual document programmatically before, this is a powerful and meaningful place to start.

More than just a technical exercise, this project reminded me how code can deepen our interaction with sacred texts. By blending structured data, thoughtful design, and a bit of creativity, we can build tools that are not only functional but also spiritually enriching.

You now have the foundation to generate personalized Quran PDFs for teaching, learning, or sharing. And this is just the beginning. With a few additions—like Juz dividers, Basmala headers, tafsir annotations, or even audio integration—this script can evolve into a more dynamic and impactful resource.

If you’re a developer exploring Quran-related projects or educational tools, I encourage you to start here. The core idea is simple, the tools are accessible, and the results are deeply rewarding. Interested in getting the full code or contributing to this project? I’d love to collaborate and see where this can go next.

Found this post helpful? Share it with someone who’s exploring Islamic tech, working on a Quran app, or simply learning Python through meaningful projects.

Have questions or feedback? Drop a comment or connect—I’d love to hear from you.

Full Code :

import pandas as pd
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase import pdfmetrics
import arabic_reshaper
from bidi.algorithm import get_display

# Load dataset
df = pd.read_csv("The Quran Dataset.csv")
df = df[df["surah_no"] >= 110]
df = df.sort_values(by=["surah_no", "ayah_no_surah"])

# Setup PDF
c = canvas.Canvas("quran_last_5_surahs.pdf", pagesize=A4)
width, height = A4
y_position = height - 50

# Register Arabic font (make sure .ttf is in same folder)
pdfmetrics.registerFont(TTFont('Amiri', 'Amiri-Regular.ttf'))

# Set default font
c.setFont("Amiri", 16)

current_surah = ""
for _, row in df.iterrows():
    surah_en = row["surah_name_en"]
    surah_ar = row["surah_name_ar"]
    ayah_no = row["ayah_no_surah"]
    ayah_ar = row["ayah_ar"]
    ayah_en = row["ayah_en"]

    # Add new surah title
    if surah_en != current_surah:
        current_surah = surah_en
        y_position -= 30
        if y_position < 100:
            c.showPage()
            y_position = height - 50
            c.setFont("Amiri", 16)

        # English title
        c.setFont("Helvetica-Bold", 14)
        c.drawString(50, y_position, f"Surah {surah_en} / {surah_ar}")
        y_position -= 25
        c.setFont("Amiri", 16)

    # Prepare Arabic (reshaped + RTL)
    reshaped_text = arabic_reshaper.reshape(f"{ayah_no}. {ayah_ar}")
    bidi_text = get_display(reshaped_text)

    # Add Arabic ayah
    c.drawRightString(width - 50, y_position, bidi_text)
    y_position -= 20

    # Add English ayah
    c.setFont("Helvetica", 12)
    c.drawString(50, y_position, ayah_en)
    y_position -= 30
    c.setFont("Amiri", 16)

    # New page if needed
    if y_position < 100:
        c.showPage()
        y_position = height - 50
        c.setFont("Amiri", 16)

# Save PDF
c.save()

Jangan lewatkan artikel penting! Langganan newsletter dosensibuk.com sekarang.

yahdiinformatika

"I'm a data analyst who loves diving into data to find insights and bringing creativity to life through 3D design. Proud to be SciVal certified, adding research analytics to my toolkit for making smarter decisions."

Author Posts