I am heading off to Boston for the Northeast Regional Conference of the Social Studies (NERC) on March 18 and 19. I thought it might be appropriate to look at open source “primary source documents,” which in history teaching means writings or other documents created at the time of an event, usually by a participant, rather than written later by scholars.

The Internet is opening up opportunities to access primary source documents without having to visit a major research library, where getting access may still be difficult because of the fragile nature of the materials.

Google, the Internet search giant, is making waves in this effort with Book Search, scanning thousands of books and making snippet previews available even when a book is still under copyright. Many university libraries have joined the project, giving Google permission to scan their collections. By all means, go and look up your interests there.

What can you do to get personally involved in making more FULL TEXT primary sources available?

Project Gutenberg is a longstanding effort to convert public domain books into ebook format. It was the first project started with ebooks as its goal, and it predates the World Wide Web. The books included in the project are generally in the public domain because their copyright has expired. Copyrights run out every year, and as they do, more books become eligible to join the project.

Originally, I think the process involved people transcribing a book from paper to computer and then saving it in text format. More recently, a new effort has begun to draw on the collaborative spirit of the Internet. The Distributed Proofreaders project aims to spread out the burden of preparing books for Project Gutenberg. Instead of relying on one person to transcribe a book, the project brings electronic tools into play:

A scanner – converts the paper pages into images
OCR software – Optical Character Recognition software attempts to turn the images of words into text

OCR conversion from the images is prone to error, so proofreading is needed. That’s where we humans come in. The Distributed Proofreaders project calls on us to proofread page by page. A single session need not accomplish more than that, and work on a tricky page can even be suspended midway. The effort is entirely voluntary, but a rough guideline is to set a personal goal of a page a day.

You could offer your services scanning books, but the main volunteer effort is proofreading. The software runs entirely on the project’s servers, and you reach the pages through your browser. You only need to enable JavaScript, cookies, and popup windows for the site. (Popups let you keep a window specially set up for proofreading while the regular browser stays available for something like the FAQ page.)

Proofreading progresses through several stages, with more than one person checking each text. The idea is to let an ad hoc team work through the task of converting a book. I am still a novice, but some people dedicate a great deal of time to the project and become involved in the late stages of the conversion process, making final edits and submitting the books to Project Gutenberg, where they become universally available.