Plain Text Files in the Dunyazad Digital Library

I provide plain text versions of the Dunyazad Library books for four cases:

– if PDF is not a good or valid choice for the reading device that you use;

– if you want to perform complex search operations, using regular expressions, that are not supported by your e-book reader;

– if you want to convert the files to HTML, ePub or Mobi (see below);

– if you want to save the money or the trouble of obtaining a Library Card.

The Dunyazad Plain Text Format

I use a tagged text format which is deliberately kept very simple – for lack of a better name, let’s call it Dunyazad Text Format. It is not a markup language, and it is not intended to support typographical or layout niceties. Its main purpose is to be readable “as is” in plain text, without demanding any learning from the reader, while still providing the few essential formatting options that help with reading the simply structured text usually found in fiction.

A line of text is a paragraph.

A blank line is a blank line.

A line consisting only of one ~ character is a separator within a chapter (scene break).

» This is heading level 1.

»» This is heading level 2.

»»» This is heading level 3.

The first line is the title; it has to be a level 1 heading.

Level 1 headings (except for the title) are preceded by two blank lines and followed by one blank line.

Level 2 and 3 headings are preceded by single blank lines.

Quotes and lyrics are preceded and followed by single blank lines.

A line before the first blank line that begins with "by" (case sensitive, followed by a space) contains the name of the author.
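The line rules above can be sketched in a few lines of Python. This is only an illustration, not the code used by any Dunyazad tool: the function names classify and read_header are made up here, and the sketch assumes that exactly one space separates the » marks from the heading text, as in the examples.

```python
import re

# One to three » characters, a space, then the heading text (assumed layout).
HEADING = re.compile(r"^(»{1,3}) (.+)$")


def classify(line):
    """Return a (kind, text) pair for one line of Dunyazad Text."""
    if line == "":
        return ("blank", "")
    if line == "~":
        return ("scene_break", "")
    match = HEADING.match(line)
    if match:
        return ("h%d" % len(match.group(1)), match.group(2))
    return ("paragraph", line)


def read_header(lines):
    """Read title and author from the lines before the first blank line."""
    title, author = None, None
    for index, line in enumerate(lines):
        if line == "":
            break
        kind, text = classify(line)
        if index == 0 and kind == "h1":
            title = text
        elif line.startswith("by "):  # case sensitive, followed by a space
            author = line[3:]
    return title, author
```

For example, read_header(["» The Book", "by Jane Doe", "", "Text"]) yields the title "The Book" and the author "Jane Doe".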

The underscore character toggles normal/italic text. Line (paragraph) ends do not reset italic to normal.
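Because the italic state carries over from one paragraph to the next, a converter has to remember it between lines. The following Python sketch shows one way to do that when producing HTML. The function name italics_to_html is hypothetical, the choice of the i tag is an assumption, and the sketch does not handle underscores inside URLs.

```python
def italics_to_html(paragraphs):
    """Wrap italic runs in <i>...</i>, keeping the toggle state across paragraphs."""
    italic = False
    converted = []
    for paragraph in paragraphs:
        out = []
        for index, piece in enumerate(paragraph.split("_")):
            if index > 0:
                italic = not italic  # every underscore flips the state
            if italic and piece:
                out.append("<i>" + piece + "</i>")
            else:
                out.append(piece)
        converted.append("".join(out))
    return converted
```

Each paragraph gets its own opening and closing tag, so an italic passage that spans two paragraphs stays well-formed in both.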

Footnote references are enclosed in curly brackets { }.

Footnotes are enclosed in curly brackets, beginning with the reference number or symbol followed by a colon and a space. A footnote may have several paragraphs.

Footnotes immediately follow the paragraphs in which they are referenced, or, in the case of poetry, the poems.
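To tell footnote references apart from the footnotes themselves, a program can look for the colon and space after the reference. The Python sketch below is an illustration under two assumptions that the format description does not state: reference numbers or symbols contain no spaces or colons, and the closing bracket of a footnote is the last character of a line. The names find_references and collect_footnotes are invented for this example.

```python
import re

REFERENCE = re.compile(r"\{([^{}:\s]+)\}")        # e.g. {1} or {*}
FOOTNOTE_START = re.compile(r"^\{([^{}:\s]+): ")  # e.g. "{1: " at line start


def find_references(paragraph):
    """Return the reference labels that occur in a paragraph."""
    return REFERENCE.findall(paragraph)


def collect_footnotes(lines):
    """Map each footnote label to the list of its paragraphs."""
    notes = {}
    label = None  # label of the footnote currently open, if any
    for line in lines:
        if label is None:
            match = FOOTNOTE_START.match(line)
            if not match:
                continue  # ordinary text, not part of a footnote
            label = match.group(1)
            notes[label] = []
            line = line[match.end():]
        if line.endswith("}"):
            notes[label].append(line[:-1])
            label = None  # footnote closed
        elif line:
            notes[label].append(line)
    return notes
```

A footnote with several paragraphs simply stays open until a line ends with the closing bracket.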

Character set

The character set is Windows 1252, but, unless the text contains non-standard (e.g. German, French or Spanish) characters, only four symbols are used that are not also part of the standard ISO-8859-1 character set: typographic quotation marks and apostrophes. If your system does not display these characters correctly, you can easily replace them using any text editor.

Em dashes are represented as double hyphens, and ellipses as three dots. If your system supports Windows 1252, you can easily restore the typographically correct symbols. With monospaced fonts, double hyphens and three dots improve readability.
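Both replacements can also be scripted. The following Python sketch decodes a file's bytes as Windows 1252 (Python calls this codec cp1252), then either replaces the four typographic quotation marks with plain ones or restores dashes and ellipses. The function names are made up for this example.

```python
# The four typographic quote characters and their plain equivalents.
QUOTES_TO_PLAIN = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def decode_book(raw_bytes):
    """Decode the raw bytes of a book file as Windows 1252."""
    return raw_bytes.decode("cp1252")


def plain_quotes(text):
    """Replace typographic quotes and apostrophes with plain ones."""
    return text.translate(QUOTES_TO_PLAIN)


def restore_typography(text):
    """Turn double hyphens and three dots back into dash and ellipsis symbols."""
    return text.replace("--", "\u2014").replace("...", "\u2026")
```

To process a whole file, read it in binary mode, pass the bytes to decode_book, and write the result back in whatever encoding your reading device prefers.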

Symbols outside the scope of Windows 1252 are replaced by standard Latin characters. Text in non-Latin alphabets (e.g. Greek) is lost, as are illustrations, and some typographical or layout details.

Converting to HTML

You can download a tool to convert Dunyazad Text files to HTML. dthtm.exe has to be run from the Windows command line. You will find the details in the file dthtm.txt that is included in the download zip file.

Download dthtm (exe, doc and source files). No installation required, just run dthtm.exe with the appropriate arguments. You can safely ignore or delete the two source code files.

Current version: 1.01 (06/08/2015)

Requirements: Any not too ancient version of Windows, and a basic knowledge of how to use the command line.

You may use dthtm freely for non-commercial purposes, but you may not re-distribute the program without my permission.

If you want to create your own Dunyazad-formatted text files, please note:

For headings you can use one of the symbols .$*#= instead of »

For scene breaks you can use * * * instead of ~

A URL gets converted to a link if it is a line by itself. It can start with "http://" or "www.", must not contain spaces, but may contain underscore characters.
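The URL rule can be expressed as a single regular expression. The Python sketch below is not taken from dthtm; the function name url_to_link and the exact link markup are assumptions, including the choice to put "http://" in front of addresses that start with "www.".

```python
import re

# The whole line must be the URL: a known prefix followed by non-space characters.
URL_LINE = re.compile(r"^(?:http://|www\.)\S+$")


def url_to_link(line):
    """Return an HTML link for a URL-only line, or None for any other line."""
    if not URL_LINE.match(line):
        return None
    href = line if line.startswith("http://") else "http://" + line
    return '<a href="%s">%s</a>' % (href, line)
```

A line such as "See www.example.com" is left alone, because the URL is not a line by itself.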

Converting to ePub or Mobi

With a Library Card you can download ePub and Mobi versions of the books in the Library, but you can also create them yourself. Starting from the HTML file (see above), you can use a tool such as calibre to convert it to any of your favorite e-book formats, e.g. ePub or Mobi. When configured correctly (using //h:h1 and //h:h2 for heading levels 1 and 2), the resulting e-book will have a properly working TOC.

From the command line, you can use these commands (replace "source" and "target" with the file names):

ebook-convert source.htm target.epub --epub-inline-toc --level1-toc //h:h1 --level2-toc //h:h2 --page-breaks-before //h:h1 --language en

ebook-convert source.htm target.mobi --level1-toc //h:h1 --level2-toc //h:h2 --page-breaks-before //h:h1 --language en
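If you want to convert several books, commands like these can be assembled by a small script. The Python sketch below only builds the argument list and hands it to calibre's ebook-convert, which is assumed to be installed and on your PATH. The function names build_command and convert are made up for this example.

```python
import subprocess


def build_command(source, target, language="en"):
    """Build the ebook-convert argument list for one HTML file."""
    command = ["ebook-convert", source, target]
    if target.lower().endswith(".epub"):
        command.append("--epub-inline-toc")  # only meaningful for ePub output
    command += [
        "--level1-toc", "//h:h1",
        "--level2-toc", "//h:h2",
        "--page-breaks-before", "//h:h1",
        "--language", language,
    ]
    return command


def convert(source, target):
    """Run ebook-convert and raise an error if it fails."""
    subprocess.run(build_command(source, target), check=True)
```

Calling convert("source.htm", "target.epub") runs the same conversion as the first command above.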

You can, of course, use the calibre GUI instead of ebook-convert.exe. You can also create an ePub file with LibreOffice, once you have installed the writer2epub extension:

Create a new text document, insert the .htm file, and export to ePub after setting the appropriate metadata and preferences.


Any HTML code can be embedded in the text, except heading and paragraph tags.

Footnotes do not get linked, they appear as and where they are in the text.

Feel free to contact me with any questions, suggestions or comments you may have.


