Shortcovers adds 1.8 million titles from Internet Archive

Posted by Kobo December 14, 2009

If you’ve seen the release, you’ll know we are adding 1.8 million titles to the Shortcovers catalog through a partnership with the Internet Archive. Through their own scanning efforts and through relationships with other public domain repositories, they’ve amassed an incredible collection of free public domain works. We want to see those works in as many places and on as many devices as possible, so we fired up the development team and got to work. For those of you really interested in the details, here are some extra tidbits:

What Books, From Where?
* These titles have been scanned from 120 libraries in 5 countries.
* 180 languages
* Roughly 400 million pages
* Adding about 1,000 new titles every day

What Formats?
This week (tomorrow, if all goes well) we’ll have all of the PDF and downloadable ePUB titles available, so that you can download them for desktop or eInk device reading. Mobile access through the Shortcovers app is going to take a little longer — Jan/Feb is the plan right now — as we make some tweaks to our library-in-the-cloud to support downloading/archiving from Internet Archive.

The PDFs are especially cool because they’re the page scans for the books. Some of them are beautiful, some are very old, very weird, or both. You can see marginalia and notes from library patrons long gone, as if you’d just pulled the title off the shelf. (On the other hand, they are big files: 30-40 MB!)

The downloadable ePUB files are OCR’d versions of the original scans. Like all OCR endeavors, much depends on the quality of the source material. Plenty of them are great. Some of them are a little odd. Pushing a 17th century wood-cut type book through an OCR program, no matter how good, is going to result in a bit of weirdness. If you download an ePUB that is difficult to read, check out the PDF — you may find there is a good reason for it.

What Can You See Through Shortcovers?
Tomorrow, we should have everything in English loaded up. Over the next few weeks, we’ll be adding the other languages as well. Our search engine needs a little tweaking to properly index non-English titles and authors.

DRM?
Nope. The PDFs are straight PDFs — read ‘em wherever you like. Same for the ePUB files.

Browsability
We’re working on it. For the English titles, we need to construct a usable Library of Congress-to-BISAC subject code mapping*. Not an easy thing, but definitely do-able. The non-English is going to be a bit trickier, but one way or another, we’ll find a way to make the majority of them perusable. We’ll keep you posted.
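For the curious, here is a rough sketch of what the simplest form of that mapping could look like: a lookup keyed on the class letters at the front of a Library of Congress call number, falling back from the most specific prefix to the broadest. The table entries and the `lcc_to_bisac` function name are illustrative assumptions, not our actual mapping, and a real table would need many more rows plus the number ranges after the letters.

```python
import re

# Illustrative entries only (an assumption, not a complete or official mapping).
# Keys are Library of Congress class letters; values are BISAC "General" codes.
LCC_TO_BISAC = {
    "D": "HIS000000",   # World history      -> HISTORY / General
    "K": "LAW000000",   # Law                -> LAW / General
    "M": "MUS000000",   # Music              -> MUSIC / General
    "Q": "SCI000000",   # Science            -> SCIENCE / General
    "QA": "MAT000000",  # Mathematics        -> MATHEMATICS / General
    "R": "MED000000",   # Medicine           -> MEDICAL / General
}


def lcc_to_bisac(call_number, table=LCC_TO_BISAC):
    """Return a BISAC code for a call number, or None if nothing matches."""
    match = re.match(r"\s*([A-Za-z]{1,3})", call_number)
    if not match:
        return None
    letters = match.group(1).upper()
    # Try the longest prefix first ("QA"), then fall back to the broad class ("Q").
    for end in range(len(letters), 0, -1):
        code = table.get(letters[:end])
        if code is not None:
            return code
    return None


print(lcc_to_bisac("QA76.73 .P98"))
print(lcc_to_bisac("QH308"))
print(lcc_to_bisac("Z1001"))
```

Titles that come back as `None` would still be findable through search; they just would not show up under a browse category until the table covers them.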

Why?
Because, at the end of the day, we’re sold on this idea that you should be able to find and read every book, no matter where you are. Some of them we hope to sell you. Others are free and you should be able to access them quickly and easily. We’d like to be the place you come for both.

Technical Backstory
We started working with Internet Archive after I presented at their conference “Making Books Apparent” in October, when Peter Brantley and Brewster Kahle unveiled the BookServer project. We’re fans of OPDS and the work that BookServer is doing in terms of enhancing discoverability for books, so when Peter said that they had the whole collection in ePUB and ready to go, it seemed like a match made in heaven.

We are ingesting an OPDS feed that provides us with the catalog and updates. Internet Archive holds the files and serves them up on an as-needed basis. We index their catalog and merge it in with ours. As implementations go, it was the easiest 1.8M titles we’ve ever picked up. (Lawyers and agreements probably took longer than development ;-) ) The IA-to-mobile side is a bit trickier, but we’re on the case and wanted to make sure that people could start getting their hands on the books in the meantime.
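To give a flavor of the ingest step, here is a minimal sketch of reading an OPDS feed and merging its entries into a catalog. It is not our production code. It assumes the feed is plain Atom, that download links use the `http://opds-spec.org/acquisition` relation from the OPDS spec, and that the catalog is just a dictionary keyed by entry id. The sample feed, the `example.org` URLs, and the `parse_entries` and `merge_into_catalog` names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed link relation for downloadable files, per the OPDS spec.
ACQUISITION_PREFIX = "http://opds-spec.org/acquisition"

# Hypothetical feed used only for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example catalog</title>
  <entry>
    <id>urn:example:book-1</id>
    <title>An Example Book</title>
    <author><name>A. Writer</name></author>
    <link rel="http://opds-spec.org/acquisition"
          type="application/epub+zip"
          href="https://example.org/book-1.epub"/>
    <link rel="http://opds-spec.org/acquisition"
          type="application/pdf"
          href="https://example.org/book-1.pdf"/>
  </entry>
</feed>
"""


def parse_entries(feed_xml):
    """Turn each Atom entry into a small record with its download links."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.iter(ATOM + "entry"):
        files = {}
        for link in entry.findall(ATOM + "link"):
            if link.get("rel", "").startswith(ACQUISITION_PREFIX):
                # Store only the URL; the file itself stays with the provider.
                files[link.get("type")] = link.get("href")
        records.append({
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "authors": [a.findtext(ATOM + "name")
                        for a in entry.findall(ATOM + "author")],
            "files": files,
        })
    return records


def merge_into_catalog(catalog, records):
    """Add or update records in the catalog, keyed by entry id."""
    for record in records:
        catalog[record["id"]] = record
    return catalog


catalog = merge_into_catalog({}, parse_entries(SAMPLE_FEED))
print(catalog["urn:example:book-1"]["files"]["application/epub+zip"])
```

Because the catalog keeps links rather than copies, re-running the merge on an updated feed is enough to pick up new or changed titles.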

We’re excited about this because we get to meet a number of objectives at the same time. Most importantly, Shortcovers readers get another 1.8 million titles to read. But we also reinforce the importance of open standards in ebooks, support Internet Archive in their goal of preserving digital resources, and showcase OPDS as a means of making ebooks discoverable. Smiles all around.

Enjoy!
We’re constantly working to bring you more great books. If you have any thoughts, feedback, or concerns, don’t hesitate to drop me a note here in the forums or directly at “mt at shortcovers dot com” or @mtamblyn via Twitter.


* This is the point at which the librarians in the house fall off their chairs laughing.

 
