textproc: Typesetting holy content

Kyrre Nygard kyrreny at broadpark.no
Fri May 26 09:36:02 PDT 2006


Hello!

I hope this is not too off topic.

I'm involved in some studies here on the authority of holy scriptures.

I am trying to transcribe The Noble Qur'an, said by some to be the most
elegant book ever written, into LaTeX format. That way I can format it the
way I wish and study it on my own premises.

I began by running: wget -m http://www.usc.edu/dept/MSA/quran/

This gave me the files 001.qmt.html through 114.qmt.html.

Next, I ran this:

for i in $(find -s . -name "*.html"); do
    w3m -dump "$i" > "${i%.html}.txt"
    echo "${i%.html}.txt"
done

This left me with 001.qmt.txt through 114.qmt.txt.

Then, I took 001.qmt.txt, which looked like this:

--

USC
USC
Compendium of Muslim Texts

Fundamentals
Allah
Muhammad
Qur'an
Sunnah
Pillars

Special Topics
Economics
History
Human Relations
Law
Misconceptions About Islam
Politics

Tools
Qur'an Search
Hadeeth Search
Glossary

Translations of the Qur'an, Chapter 1:

                             AL-FATIHA (THE OPENING)

Total Verses: 7
Revealed At: MAKKA
Maududi's introduction

-------------------------------------------------------------------------------

001.001
YUSUFALI: In the name of Allah, Most Gracious, Most Merciful.
PICKTHAL: In the name of Allah, the Beneficent, the Merciful.
SHAKIR: In the name of Allah, the Beneficent, the Merciful.

001.002
YUSUFALI: Praise be to Allah, the Cherisher and Sustainer of the worlds;
PICKTHAL: Praise be to Allah, Lord of the Worlds,
SHAKIR: All praise is due to Allah, the Lord of the Worlds.

001.003
YUSUFALI: Most Gracious, Most Merciful;
PICKTHAL: The Beneficent, the Merciful.
SHAKIR: The Beneficent, the Merciful.

001.004
YUSUFALI: Master of the Day of Judgment.
PICKTHAL: Master of the Day of Judgment,
SHAKIR: Master of the Day of Judgment.

001.005
YUSUFALI: Thee do we worship, and Thine aid we seek.
PICKTHAL: Thee (alone) we worship; Thee (alone) we ask for help.
SHAKIR: Thee do we serve and Thee do we beseech for help.

001.006
YUSUFALI: Show us the straight way,
PICKTHAL: Show us the straight path,
SHAKIR: Keep us on the right path.

001.007
YUSUFALI: The way of those on whom Thou hast bestowed Thy Grace, those whose
(portion) is not wrath, and who go not astray.
PICKTHAL: The path of those whom Thou hast favoured; Not the (path) of those
who earn Thine anger nor of those who go astray.
SHAKIR: The path of those upon whom Thou hast bestowed favors. Not (the path)
of those upon whom Thy wrath is brought down, nor of those who go astray.

Sponsored by the MSA.

--

I then converted it by hand into LaTeX:

--

\documentclass[11pt,a4paper,oneside,english]{book}
\title{The Noble Qur'an}
\begin{document}

\maketitle
\tableofcontents

\chapter{AL-FATIHA (THE OPENING)}

001.001 In the name of Allah, Most Gracious, Most Merciful.

001.002 Praise be to Allah, the Cherisher and Sustainer of the worlds;

001.003 Most Gracious, Most Merciful;

001.004 Master of the Day of Judgment.

001.005 Thee do we worship, and Thine aid we seek.

001.006 Show us the straight way,

001.007 The way of those on whom Thou hast bestowed Thy Grace, those whose (portion) is not wrath, and who go not astray.

\end{document}

--

Basically, what I did manually on the first file is what I intend to do
automatically with all the other files. The format is the same throughout;
only the amount of text differs.
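Since every file becomes one chapter, the preamble and \end{document} should appear only once in the finished book. A minimal sketch of one way to arrange that, assuming each chapter has already been converted into its own fragment named NNN.qmt.tex (a hypothetical naming scheme) holding only the \chapter line and its verses: a POSIX sh function, mkbook (also a hypothetical name), that writes a master file pulling the fragments in with \input.

```sh
# mkbook: hypothetical helper. Writes a master LaTeX file to stdout that
# includes every chapter fragment NNN.qmt.tex in the current directory.
# Assumption: fragments contain no preamble and no \end{document}.
mkbook() {
    printf '%s\n' \
        '\documentclass[11pt,a4paper,oneside,english]{book}' \
        "\\title{The Noble Qur'an}" \
        '\begin{document}' \
        '\maketitle' \
        '\tableofcontents' \
        ''
    # The shell sorts glob matches, and the zero-padded names keep
    # the chapters in numeric order (001 before 002 ... before 114).
    for f in [0-9][0-9][0-9].qmt.tex; do
        [ -e "$f" ] || continue    # no fragments: the pattern stays unexpanded
        printf '\\input{%s}\n' "$f"
    done
    printf '%s\n' '' '\end{document}'
}
```

For example, mkbook > quran.tex (hypothetical output name) followed by running pdflatex on quran.tex twice; the second run is what fills in the table of contents.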

The process, to be applied to each of the *.txt files, would look
something like this:

1
Cut out everything before line 27.

2
Take line 27, strip its surrounding whitespace, and wrap it in a \chapter command. So if line 27 says "HELLO", it will become:

\chapter{HELLO}

3
Cut out everything following line 27 until a NNN.NNN (verse number)
appears.

4
Join the NNN.NNN line with the line below it and cut out the "YUSUFALI:" prefix.

5
Join all lines below the "YUSUFALI:" line onto it, until the "PICKTHAL:" line appears.

6
Delete the "PICKTHAL:" line and all lines below it until the next NNN.NNN appears.

The reason is that the University of Southern California compilation shows
three different English translations, and I am only interested in the
first one.

For instance, this:

--

004.054
YUSUFALI: Or do they envy mankind for what Allah hath given them of his bounty?
but We had already given the people of Abraham the Book and Wisdom, and
conferred upon them a great kingdom.
PICKTHAL: Or are they jealous of mankind because of that which Allah of His
bounty hath bestowed upon them? For We bestowed upon the house of Abraham (of
old) the Scripture and wisdom, and We bestowed on them a mighty kingdom.
SHAKIR: Or do they envy the people for what Allah has given them of His grace?
But indeed We have given to Ibrahim's children the Book and the wisdom, and We
have given them a grand kingdom.

--

Would simply become this, in one line:

--

004.054 Or do they envy mankind for what Allah hath given them of his bounty? but We had already given the people of Abraham the Book and Wisdom, and conferred upon them a great kingdom.

--

7
When the next NNN.NNN appears, treat it the same way.
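The seven steps above map fairly directly onto a small awk state machine. What follows is a sketch under the assumptions the steps themselves make (title on line 27, each verse introduced by a bare NNN.NNN line, YUSUFALI first and PICKTHAL next), wrapped in a POSIX sh function with the hypothetical name qmt2tex. Escaping LaTeX special characters is an added assumption: the sample chapter contains none, but other chapters might.

```sh
# qmt2tex: hypothetical helper. Reads one w3m dump (e.g. 001.qmt.txt) on
# stdin and writes a LaTeX chapter to stdout, keeping only YUSUFALI.
qmt2tex() {
    awk '
    # Put a backslash before LaTeX special characters (assumption:
    # some translations may contain them).
    function esc(s,    out, i, c) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (index("&%$#_{}", c) > 0) out = out "\\" c
            else out = out c
        }
        return out
    }
    # Print the collected verse on one line, then a blank line so that
    # LaTeX starts a new paragraph for every verse.
    function flush() {
        if (verse != "") { print esc(verse); print ""; verse = "" }
    }
    NR < 27 { next }                              # step 1: drop the site header
    NR == 27 {                                    # step 2: line 27 is the title
        sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")
        print "\\chapter{" esc($0) "}"; print ""
        next
    }
    /^[0-9][0-9][0-9]\.[0-9][0-9][0-9][ \t]*$/ {  # steps 3 and 7: new verse number
        flush(); num = $1; keep = 0
        next
    }
    /^YUSUFALI:/ {                                # step 4: number plus text
        sub(/^YUSUFALI:[ \t]*/, "")
        verse = num " " $0; keep = 1
        next
    }
    /^(PICKTHAL|SHAKIR):/ { keep = 0; next }      # step 6: skip other translations
    keep && NF > 0 { verse = verse " " $0 }       # step 5: join continuation lines
    END { flush() }
    '
}
```

For example: for f in *.qmt.txt; do qmt2tex < "$f" > "${f%.txt}.tex"; done, which would turn 001.qmt.txt into 001.qmt.tex and so on. The blank line after each verse matters because LaTeX joins consecutive source lines into a single paragraph.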

Thank you, really, for bearing with me so far!

Indeed, this is what I wish to achieve.

I realize now, after writing all this down, that it might be too much for
some. I hope that is not the case with whoever has been endowed with the
ability to help me.

All the best,
Kyrre



More information about the freebsd-questions mailing list