Difference between revisions of "Treating the Traité"

From Mondothèque

Revision as of 13:52, 5 February 2015

Scan Tailor

Tomislav Medak from the Public Library project spends two days with us to demonstrate a workflow for digitizing books. I use the opportunity to look at the Traité through the lens of Scan Tailor, "an interactive post-processing tool for scanned pages"[1].

First, I import the image files exported from the pdf into Scan Tailor and let it treat the Traité with all options set to 'automatic'. Some interesting artifacts:


Printing the Traité

The Traité de documentation : le livre sur le livre, théorie et pratique is an almost hypertextual book on documentation, written in the 1930s by Paul Otlet. It has many cross-references, tables and illustrations; at times it is written in an encyclopedic style, at others it turns into a passionate manifesto, speculative fiction, or a practical manual for librarians. The pdf I have is badly OCR-ed and too heavy to read comfortably on a digital device. So this morning I transformed the digital version into something that I can print at a copy shop.

I started by extracting the images from the pdf with the help of the ImageMagick convert command:

$ mkdir spreads

$ convert Traite\ de\ documentation\ -\ Paul\ Otlet.pdf spreads/%03d.jpg

Next I removed the front and back covers (they will be treated separately), and also 113.jpg (pages 118-119 are repeated), then cut each spread in half:

$ mkdir pages

$ convert spreads/*.jpg -crop 2x1@ pages/%03d.jpg
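The removal that precedes the cut is not shown as a command. A minimal sketch of how it could be done, assuming that the covers are the first and last files in sorted order and that the duplicate spread is 113.jpg; the function name and the `skipped` folder are hypothetical, and files are moved aside rather than deleted so the covers remain available for separate treatment:

```shell
# Sketch only: move unwanted spreads into a "skipped" subfolder.
# Assumption: front and back cover are the first and last *.jpg in sorted order.
drop_unwanted_spreads() {
    dir=$1
    mkdir -p "$dir/skipped"

    # Collect the sorted list of spreads in the positional parameters.
    set -- "$dir"/*.jpg
    first=$1
    shift $(( $# - 1 ))
    last=$1

    # Move the two covers and the repeated spread, skipping missing files.
    for f in "$first" "$last" "$dir/113.jpg"; do
        if [ -f "$f" ]; then
            mv "$f" "$dir/skipped/"
        fi
    done
}

# Usage (run before cutting the spreads in half):
#   drop_unwanted_spreads spreads
```

Because `spreads/*.jpg` does not descend into subfolders, the files in `spreads/skipped` are left out of the crop step.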

The properties of the original pdf mention a paper size of 200 × 260 mm, and also that the file was created with ABBYY FineReader on Monday, December 3, 2007, 16:25:51 CET (this file is already 6 years old ...). I am not sure whether the measurements refer to the size of the spread or of a single page, but from the detailed description in the catalog of the Universiteitsbibliotheek Gent [2] I gather that the pages are 26 cm high and will fit comfortably on an A4 sheet: 431, [12], viii p. : illus. ; 26 cm.
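The fit on A4 can be checked with a little arithmetic. The sketch below assumes the 200 × 260 mm from the pdf properties describe a single page (which, as noted, is not certain) and uses the standard A4 size of 210 × 297 mm; the function name `a4_margins` is made up for this illustration:

```shell
# Sketch only: print the space left over (in mm) when a page of the given
# width and height is placed on a portrait A4 sheet (210 x 297 mm).
# Returns non-zero if the page does not fit.
a4_margins() {
    width=$1
    height=$2
    if [ "$width" -gt 210 ] || [ "$height" -gt 297 ]; then
        return 1
    fi
    echo "$(( 210 - width )) $(( 297 - height ))"
}

# Usage:
#   a4_margins 200 260
```

For a 200 × 260 mm page this leaves 10 mm in width and 37 mm in height, so even the less favourable reading of the measurements still fits.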

I then simply put all images back into a new pdf:

$ convert pages/*.jpg traite.pdf
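Before taking the file to the copy shop, it may be worth checking that the cut produced exactly two pages per spread. This check is not part of the original workflow; it is a small sketch with hypothetical function names, assuming the `spreads` and `pages` folders used above:

```shell
# Sketch only: count the *.jpg files directly inside a folder.
count_jpgs() {
    set -- "$1"/*.jpg
    # If nothing matched, the pattern stays literal and does not exist.
    if [ ! -e "$1" ]; then
        echo 0
        return
    fi
    echo $#
}

# Succeeds when the pages folder holds twice as many images as the spreads folder.
check_split() {
    spreads=$(count_jpgs "$1")
    pages=$(count_jpgs "$2")
    [ "$pages" -eq $(( spreads * 2 )) ]
}

# Usage:
#   check_split spreads pages && echo "page count looks right"
```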

Tomorrow I'll have the document printed and bound. Can't wait.

Transcribing the Traité

In progress on Wikisource: http://fr.wikisource.org/wiki/Livre:Otlet_-_Trait%C3%A9_de_documentation,_1934.djvu

https://github.com/PaulOtlet/traite

http://traite.czam.de/en/latest/otlet_traite_1934_FR.html#i-buts-de-la-documentation

Sources

Original scans http://lib.ugent.be/fulltxt/handle/1854/5612/Traite_de_documentation_ocr.pdf

OCR https://archive.org/details/OtletTraitDocumentationUgent
  1. http://scantailor.org/
  2. http://lib.ugent.be/catalog/rug01:000990276#reference-details