Unlocking the Quran Through Data: A First Look at the Quran Dataset


As someone passionate about both technology and Islamic studies, I recently came across a structured dataset of the Quran and decided to explore it through the lens of data analysis. What I discovered was a beautifully organized structure of the Holy Book—revealing patterns and insights that bridge spiritual depth and data clarity.

First, I load the Quran dataset from a CSV file, display basic information about it, and show the first few rows for an initial inspection.

import pandas as pd

# Load the dataset
file_path = "The Quran Dataset.csv"
df = pd.read_csv(file_path)

# Display basic information and first few rows
df_info = df.info()  # info() prints its summary directly and returns None
df_head = df.head()

df_info, df_head

This is the result:


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6236 entries, 0 to 6235
Data columns (total 19 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   surah_no             6236 non-null   int64  
 1   surah_name_en        6236 non-null   object 
 2   surah_name_ar        6236 non-null   object 
 3   surah_name_roman     6236 non-null   object 
 4   ayah_no_surah        6236 non-null   int64  
 5   ayah_no_quran        6236 non-null   int64  
 6   ayah_ar              6236 non-null   object 
 7   ayah_en              6236 non-null   object 
 8   ruko_no              6236 non-null   int64  
 9   juz_no               6236 non-null   int64  
 10  manzil_no            6236 non-null   int64  
 11  hizb_quarter         6236 non-null   int64  
 12  total_ayah_surah     6236 non-null   int64  
 13  total_ayah_quran     6236 non-null   int64  
 14  place_of_revelation  6236 non-null   object 
 15  sajah_ayah           6236 non-null   bool   
 16  sajdah_no            15 non-null     float64
 17  no_of_word_ayah      6236 non-null   int64  
 18  list_of_words        6236 non-null   object 
dtypes: bool(1), float64(1), int64(10), object(7)
memory usage: 883.2+ KB
(None,
    surah_no surah_name_en surah_name_ar surah_name_roman  ayah_no_surah  \
 0         1    The Opener       الفاتحة       Al-Fatihah              1   
 1         1    The Opener       الفاتحة       Al-Fatihah              2   
 2         1    The Opener       الفاتحة       Al-Fatihah              3   
 3         1    The Opener       الفاتحة       Al-Fatihah              4   
 4         1    The Opener       الفاتحة       Al-Fatihah              5   
 
    ayah_no_quran                                   ayah_ar  \
 0              1    بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ   
 1              2     ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ   
 2              3                   ٱلرَّحْمَٰنِ ٱلرَّحِيمِ   
 3              4                   مَٰلِكِ يَوْمِ ٱلدِّينِ   
 4              5  إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ   
 
                                              ayah_en  ruko_no  juz_no  \
 0  In the Name of Allah—the Most Compassionate, M...        1       1   
 1        All praise is for Allah—Lord of all worlds,        1       1   
 2             the Most Compassionate, Most Merciful,        1       1   
 3                     Master of the Day of Judgment.        1       1   
 4  You ˹alone˺ we worship and You ˹alone˺ we ask ...        1       1   
 
    manzil_no  hizb_quarter  total_ayah_surah  total_ayah_quran  \
 0          1             1                 7              6236   
 1          1             1                 7              6236   
 2          1             1                 7              6236   
 3          1             1                 7              6236   
 4          1             1                 7              6236   
 
   place_of_revelation  sajah_ayah  sajdah_no  no_of_word_ayah  \
 0              Meccan       False        NaN                4   
 1              Meccan       False        NaN                4   
 2              Meccan       False        NaN                2   
 3              Meccan       False        NaN                3   
 4              Meccan       False        NaN                4   
 
                                 list_of_words  
 0    [بِسْمِ,ٱللَّهِ,ٱلرَّحْمَٰنِ,ٱلرَّحِيمِ]  
 1     [ٱلْحَمْدُ,لِلَّهِ,رَبِّ,ٱلْعَٰلَمِينَ]  
 2                   [ٱلرَّحْمَٰنِ,ٱلرَّحِيمِ]  
 3                   [مَٰلِكِ,يَوْمِ,ٱلدِّينِ]  
 4  [إِيَّاكَ,نَعْبُدُ,وَإِيَّاكَ,نَسْتَعِينُ]  )

What Is the Quran Dataset?

The dataset contains 6,236 rows, each representing an individual ayah (verse) of the Quran. It’s organized into 19 columns, covering not only the text of each verse but also contextual and structural metadata.

Highlights of the Dataset:

Surah Information: Includes the number and names of each surah in English, Arabic, and Romanized form.
Ayah Content: Arabic text (ayah_ar), English translation (ayah_en), and a tokenized list of words (list_of_words).
Structural Metadata: Information such as juz_no, ruko_no, manzil_no, and hizb_quarter.
Contextual Metadata: Includes the place of revelation (Meccan or Medinan) and whether the verse includes a prostration instruction (sajah_ayah).
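
Because list_of_words has dtype object, it most likely arrives from the CSV as a plain string such as "[w1,w2,...]" rather than as a Python list. The sketch below shows one way to split it, assuming the bracketed, comma-separated format seen in the df.head() output. The three sample rows are copied from that output, and parse_word_list is a hypothetical helper, not part of the dataset or of pandas.

```python
from collections import Counter

import pandas as pd

# Sample rows copied from the df.head() output; in the real workflow these
# columns would come from pd.read_csv("The Quran Dataset.csv").
sample = pd.DataFrame({
    "ayah_no_quran": [1, 2, 3],
    "no_of_word_ayah": [4, 4, 2],
    "list_of_words": [
        "[بِسْمِ,ٱللَّهِ,ٱلرَّحْمَٰنِ,ٱلرَّحِيمِ]",
        "[ٱلْحَمْدُ,لِلَّهِ,رَبِّ,ٱلْعَٰلَمِينَ]",
        "[ٱلرَّحْمَٰنِ,ٱلرَّحِيمِ]",
    ],
})


def parse_word_list(raw):
    """Turn the string '[w1,w2,...]' into ['w1', 'w2', ...] (hypothetical helper)."""
    inner = raw.strip().strip("[]")
    return [word.strip() for word in inner.split(",") if word.strip()]


sample["words"] = sample["list_of_words"].apply(parse_word_list)

# Sanity check: the parsed length should match the dataset's own word count.
lengths_match = (sample["words"].str.len() == sample["no_of_word_ayah"]).all()

# Count how often each surface form appears across the sample.
word_counts = Counter(word for words in sample["words"] for word in words)
print(lengths_match)
print(word_counts.most_common(3))
```

These counts are over surface forms with full diacritics, so two spellings of the same root are counted separately.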

Initial Observations

After loading the dataset and performing a quick exploration, a few key points stood out:

Data Integrity: Every column except sajdah_no is fully populated (6,236 non-null values). sajdah_no is filled in for only 15 verses, the ones that call for prostration, and is empty everywhere else.
Well-Structured Format: Each verse is traceable to its corresponding surah, juz, and other organizational divisions of the Quran, allowing for granular analysis.
Ready for Text Analysis: With tokenized words available, this dataset is well-suited for natural language processing tasks such as word frequency analysis, semantic search, and clustering.
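
The integrity claim can be checked in a few lines. This is a minimal sketch on a toy DataFrame with made-up rows that reuses the dataset's column names. On the real data you would run the same two checks on df. The assumption being tested is that sajdah_no is filled exactly where sajah_ayah is True.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real DataFrame: same column names, made-up rows.
toy = pd.DataFrame({
    "ayah_no_quran": [1, 2, 3, 4],
    "sajah_ayah": [False, True, False, False],
    "sajdah_no": [np.nan, 1.0, np.nan, np.nan],
})

# Missing values per column; on the real data only sajdah_no should be non-zero.
missing_per_column = toy.isna().sum()

# sajdah_no should be present exactly on the rows flagged as prostration verses.
flags_agree = (toy["sajdah_no"].notna() == toy["sajah_ayah"]).all()

print(missing_per_column)
print("sajdah columns consistent:", flags_agree)
```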

Example: The First Verse

Here is a quick look at the structure of the very first verse:

Column	Value
Surah	1 – Al-Fatihah (الفاتحة)
Ayah	بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
Translation	In the Name of Allah—the Most Compassionate, Most Merciful
Juz	1
Place of Revelation	Meccan
Number of Words	4
Tokenized Words	['بِسْمِ', 'ٱللَّهِ', 'ٱلرَّحْمَٰنِ', 'ٱلرَّحِيمِ']

Meccan vs Medinan Revelations

The column place_of_revelation allows for a fascinating comparison between Meccan and Medinan verses. Traditionally, Meccan surahs focus on foundational beliefs and the afterlife, while Medinan surahs address social and legal aspects. This dataset provides a framework for investigating those differences quantitatively.
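
As a starting point for that kind of comparison, here is a small sketch that summarizes verses by place of revelation. The rows and word counts in the toy DataFrame are made up for illustration. Only the column names come from the dataset, and the choice of summary statistics is mine.

```python
import pandas as pd

# Toy rows with made-up word counts; swap in the DataFrame loaded from the CSV.
toy = pd.DataFrame({
    "place_of_revelation": ["Meccan", "Meccan", "Medinan", "Medinan"],
    "surah_no": [1, 1, 2, 2],
    "no_of_word_ayah": [4, 2, 10, 20],
})

# One row per place of revelation: verse count, surah count, mean verse length.
summary = toy.groupby("place_of_revelation").agg(
    ayah_count=("no_of_word_ayah", "count"),
    surah_count=("surah_no", "nunique"),
    mean_words_per_ayah=("no_of_word_ayah", "mean"),
)
print(summary)
```

Mean words per ayah is only a rough proxy for style. It says nothing about themes, which would need the text itself.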

Next Steps

This exploration is just the beginning. With this dataset, I plan to:

Visualize the distribution of ayahs across surahs and juz.
Analyze linguistic patterns and frequently used roots.
Explore thematic progression across the timeline of revelation.
Build interactive tools for Quranic study and learning.

Visualize the distribution of ayahs across surahs and juz.

I visualized the distribution of ayahs across Surahs and Juz in the Quran dataset by creating bar charts to show the number of verses in each Surah and Juz.

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset again
file_path = "The Quran Dataset.csv"
df = pd.read_csv(file_path)

# Visualize the distribution of ayahs across Surahs
plt.figure(figsize=(10, 6))
surah_ayah_counts = df.groupby('surah_name_en')['ayah_no_surah'].count()
surah_ayah_counts.plot(kind='bar', color='skyblue')
plt.title('Distribution of Ayahs Across Surahs')
plt.xlabel('Surah Name')
plt.ylabel('Number of Ayahs')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

# Visualize the distribution of ayahs across Juz
plt.figure(figsize=(10, 6))
juz_ayah_counts = df.groupby('juz_no')['ayah_no_quran'].count()
juz_ayah_counts.plot(kind='bar', color='lightgreen')
plt.title('Distribution of Ayahs Across Juz')
plt.xlabel('Juz Number')
plt.ylabel('Number of Ayahs')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

Distribution of Ayahs Across Surahs

This chart shows how the number of ayahs varies across the surahs of the Quran. The range is wide: the shortest surahs have only 3 ayahs, while the longest, Al-Baqarah, has 286.
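
One thing to watch for: grouping by surah_name_en sorts the bars alphabetically by English name, not in the order of the surahs. The sketch below shows a possible alternative that groups by surah_no first. The toy rows are truncated, so the counts are not the real ones, and "The Cow" as the English name of surah 2 is my assumption about the dataset's labels.

```python
import pandas as pd

# Toy stand-in: a few rows per surah, not the real verse counts.
toy = pd.DataFrame({
    "surah_no": [1, 1, 1, 2, 2],
    "surah_name_en": ["The Opener"] * 3 + ["The Cow"] * 2,
})

# Grouping by name alone sorts alphabetically, so "The Cow" comes first.
by_name = toy.groupby("surah_name_en").size()

# Grouping by surah_no first keeps the canonical order of the surahs.
by_number = toy.groupby(["surah_no", "surah_name_en"]).size()

# The longest surahs only, which gives a chart with readable labels.
longest = by_number.sort_values(ascending=False).head(5)

print(by_name)
print(by_number)
print(longest)
```

Either by_number or longest can be passed to .plot(kind='bar') in the same way as surah_ayah_counts.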

Distribution of Ayahs Across Juz

This bar chart illustrates the distribution of ayahs across the 30 Juz (sections) of the Quran.

Meccan vs Medinan Revelations

I visualized the distribution of Meccan and Medinan revelations in the Quran dataset using a pie chart, displaying the percentage of verses revealed in Mecca and Medina.

# Plot Meccan vs Medinan revelations using a pie chart
plt.figure(figsize=(7, 7))
revelation_counts = df['place_of_revelation'].value_counts()
revelation_counts.plot(kind='pie', autopct='%1.1f%%', colors=['lightblue', 'lightcoral'], startangle=90, legend=False)
plt.title('Meccan vs Medinan Revelations')
plt.ylabel('')
plt.tight_layout()
plt.show()

And here is the plot:

The Quran dataset serves as a powerful bridge between data science and spiritual study. Whether you’re a student of religion, a technologist, or both, this type of analysis offers a unique perspective on one of the most influential texts in history.

Dataset URL: https://www.kaggle.com/datasets/imrankhan197/the-quran-dataset
