Transcription of Souvenir History of Pella

Here are the steps to work on this project. Thanks so much for contributing.

  1. Currently, pdf files have been created for the first 74 pages. If you wish to work on this project, please email the coordinator (see footer) and let him know. He will then suggest a range of pages for you to take, and a note will be added here indicating which pages are left.
  2. Go to http://babel.hathitrust.org/cgi/pt?id=njp.32101078164074;view=1up;seq=5 to find the digitized book online. Toward the top center there is a box to enter a page number and then "Jump To". Enter your starting page there and press Go.
  3. You should then see the page number confirmed in the header of the book (unless the page is at the beginning of a chapter). On the left of the screen, click on "Download this page PDF" (or perhaps Command-click or Control-click so that the result opens in a new window). (Yes, it would be nice to download the whole book at once, but we don't have access.) After a bit, your screen will show the pdf of the entire page. There should be a way to save the file as a pdf. In Chrome, there are icons toward the bottom, or there may be menu items, depending on how you save pdfs within your browser. Save the page as simply "pxx.pdf" where xx is the page number. Save it to your computer in a separate folder. Later, I will want all these files, as well as the ones generated from them.
  4. The next step is to transcribe the page. While in the past this was literally done by hand, we now have tools available to us. The site http://www.newocr.com/ is a free website that does a fairly good job of applying OCR (optical character recognition) to the pdf file. You can use this web site for X minutes before you need to take a 5-minute break and then resume. Once at this site, click on the button that says "Choose File" and navigate to where you stored your file(s). (Note: you may wish to download a batch of pages and then OCR the batch, or you can do them one at a time.) Once the file has been chosen, click on the blue Upload button. This will take a few seconds. Then click on the blue OCR button. Again, this will take a few seconds. Below the image of the page, a rectangle will then appear containing the text of the page, to the best of the software's ability to convert the collection of pixels into readable letters. Photographs will largely be ignored, though they may produce some stray characters.
  5. The next step is to copy the text just produced out of the browser's rectangle and paste it into a text file. If you are a Mac user, please use Text Edit or another text editor (but not Word). If you are a PC user, please use NotePad or WordPad (but not Word). Paste into your text editor and save the file as "pxx.txt" where xx is the page number from which the text came. At this point, if you are ambitious, it would be great to do some proofreading and error correction. The OCR will mostly do fairly well, but there are some issues. In particular, you can fix any hyphenations caused by end-of-line breaks. Scan the text for other obvious mistakes. Remove the header at the top completely. Do not correct spelling errors that are in the book itself, but rather reproduce them as they are. (However, if the book really does have a spelling error, it would be OK to add something like [sic, actual_spelling], in brackets and starting with sic.)
  6. When done, email the files (both txt and pdf) to the coordinator's email address below. THANKS!
  7. If you find that you need clarifications here, please let me know and I will add them here as well.
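
If you are comfortable with a little scripting, part of the cleanup in step 5 and the check before step 6 can be done by a short program. The Python sketch below is only a suggestion and not part of the required process. It assumes your "pxx.pdf" and "pxx.txt" files sit together in one folder, and the function names (join_hyphenated_line_breaks and pages_missing_text) are made up for this example. Note that it joins every word split by a hyphen at the end of a line, so a word that really is hyphenated in the book (such as "well-known" broken after the hyphen) will be joined too. Please still proofread the result by hand.

```python
import re
from pathlib import Path

# A word fragment ending in a hyphen at the end of a line,
# followed by the rest of the word at the start of the next line.
_LINE_END_HYPHEN = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def join_hyphenated_line_breaks(text):
    """Join words split across a line break, e.g. 'settle-\\nment' -> 'settlement'.

    Assumption: every hyphen at the end of a line is a line-break hyphen.
    Genuine hyphenated compounds split at the hyphen will also be joined.
    """
    return _LINE_END_HYPHEN.sub(r"\1\2", text)


def pages_missing_text(folder):
    """Return the names of pxx.pdf files in `folder` that have no matching pxx.txt."""
    root = Path(folder)
    return sorted(
        pdf.name
        for pdf in root.glob("p*.pdf")
        if not pdf.with_suffix(".txt").exists()
    )
```

For example, you could read "p75.txt" with Path("p75.txt").read_text(), pass the text to join_hyphenated_line_breaks, and save the result back, then call pages_missing_text on your folder to see which pages still need a text file before you email everything.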