The Bookmaking Robot

Friday, November 6th, 2009 by Harry Lewis

I have been negligent in not commenting on the Harvard Book Store’s marvelous print-on-demand engine, dubbed Paige N. Gutenborg. For those of you in the area, the store is right across Massachusetts Avenue from Harvard Yard, and the press is right on the main floor — just keep walking straight ahead to the back of the store. In a few minutes, you can have any public domain book printed that is available via Google Books. Some copyrighted works are available too, but the big buzz is over the access to copies of old books, many in foreign languages, of which only a few libraries may have copies. What you get is just what you want — a printed book, on good paper, bound and trimmed, and with a full-color soft cover in the original design. The machine prints, binds, and trims, in only a few minutes. And for only $8 per book.

While I was watching this process a couple of days ago, the book being printed was an old French text — a professor had ordered copies for his class. A nearby shelf has a variety of other samples.

The first book printed on this press was a copy of the first book printed in North America — the 1640 hymnal, The Bay Psalm Book. In a wonderful loop of history, it had been printed in Cambridge, only steps from where the Harvard Book Store printed the copy 369 years later.

Paige is fascinating to watch. Even more than its (her?) marvelous automation, it is simply energizing to witness the bits, coming from heaven knows where, becoming atoms in front of your eyes. The imagination runs wild. Maybe, if I ever move, I’ll just throw all my books out and have new copies printed of anything I discover I actually want. I can’t find half the books I own anyway. If you don’t like that fantasy, come up with a better one of your own.

I have to congratulate Jeff Mayersohn, the president of the store. He has seen independent book stores die, one after the next. Even the Harvard Book Store, which offers outstanding service and a knowledgeable staff and is operating in a book-loving community if any such still exists, must have felt threatened. He’s decided to make the technology work for him rather than kill him. Good for him and good for the store. I wish them the best.

Bonus for those of you able to drop in: the trimmed edges are there for the taking. They are bound booklets of blank pages, an inch or so tall and six or eight inches wide. Kids can create their own books by writing or drawing on the pages. How neat is that?

Do It Yourself Book Scanning

Saturday, October 10th, 2009 by Harry Lewis

I am just back from D is for Digitize, James Grimmelmann’s conference on the Google Books Settlement at the New York Law School, where he teaches. It was an excellent meeting, about which I will report more in a followup post. But there was one clear star in the day-and-a-half of panels, each with four or five speakers. The prize goes to Daniel Reetz of — that is, Do It Yourself Book Scanner. Reetz, a book freak and mechanical genius, figured out how to make a book scanner out of stuff you can find in dumpsters, or buy cheaply, including off-the-shelf, cheap digital cameras. He has put the instructions online so we can all build our own. It’s amazing to see — Reetz demonstrated it, and then in just a few minutes folded it up and put it into a bag small enough to take on an airplane. It works fast — and it’s really well designed. The slowest part is turning the pages, which you do by hand.

This is the equivalent for books of the tape recorder for music and the VCR for movies. We can all digitize our own books and throw them away now.

Or make them publicly available. I said “can,” not “may” or “should.” But the existence of the device has the potential to raise lots of the same kinds of questions those other duplicating technologies raised. It empowers individuals, and enough empowered individuals could produce a Wikipedian digital library, collectively assembled, imperfect and incomplete, but growing and expanding.

While everyone else at the conference was ruminating about whether Google had a library monopoly or whether Amazon or Microsoft might imaginably be able to compete, along comes this dude with his Rube Goldberg contraption and says, hey, let’s all just start doing it, and we’ll catch up eventually.

Astonishing idea. At the conference, it took about thirty seconds for an author to ask Daniel why he should ever write another book, since the first person who bought it could instantly make it available to the entire world. Of course, Daniel replied something like what he has on the Web:

I love books. There is some truly fantastic knowledge and information hidden out there in hard to find, rare, and not commercially viable books. I find that I want my books with me everywhere. But that’s where the problems begin. Buying, moving, storing, and preserving books means environmental costs… and when I loan a book to a friend, I no longer have access to it.

Digital books change the landscape. After suffering through scanning many of my old, rare, and government issue books, I decided to create a book scanner that anybody could make, for around $300. And that’s what this instructable is all about. A greener future with more books rather than fewer books. More access to information, rather than less access to information. And maybe, years from now, a reformed publishing/distribution model (but I’m not holding my breath…).

Check it out. And if Daniel comes to a show near you, go see him. He’s cool in the way many a game-changing technologist has been cool.

Added October 11: I’ve received two interesting pointers since posting the above. First, an account of how Google’s scanner works; and second, a pointer to Snapster, commercial software for using your digital camera as a document scanner. The point is that there are a lot of things that are possible in this space, in mechanical design, image analysis, and coding, and it’s going to be interesting to see if Reetz can build an open community around his scanner, contributing both engineering and content.

A Harvard Skirmish in the Copyright Wars

Sunday, September 27th, 2009 by Harry Lewis

Andrew Magliozzi, who graduated from Harvard College in 2006, runs the FinalsClub website. It hosts lecture notes and study groups for Harvard courses. At the moment you can get the nickel precis of Harvard librarian and history professor Robert Darnton’s course on the history of the book, as written up, lecture by lecture, by a pseudonymous note taker.

The Arlington Advocate has a good story about the site, motivated by the fact that Magliozzi is a graduate of the Arlington (MA) high school. (He also happens to be the son of Ray Magliozzi of Car Talk fame.) Magliozzi portrays the site as a nonprofit devoted to open access education.

Problem is, there’s an argument that what professors say in class is their intellectual property. After all, if they just read their own lecture notes, then their words have been “fixed in a tangible medium,” to quote the Copyright Act. So the professor automatically holds the copyright, and Magliozzi, or his note-taking helpers, are violating it. By that logic, it’s the same thing as listening to a song being sung, transcribing it, and posting the notes and lyrics on your web site. Copyright violation.

Magliozzi, according to the Arlington Advocate story, seems to be counting on leniency because his site is a non-profit.

Harvard has not been helpful to Mr. Magliozzi. According to a Crimson story from February, the university’s Office of the General Counsel informed him that “under the federal Copyright Act of 1976, a lecture is automatically copyrighted as long as the professor prepared some tangible expression of the content—notes, an outline, a script, a video or audio recording.”

This all reminds me a bit of the birth of Facebook. There was discussion within the Harvard administration and information technology office of a project to create an online version of the printed facebooks that Harvard had had for decades. By the time the discussions had advanced very far, some students had just gone ahead and created one.

It is the sort of wrinkle in the law and technological evolution that will keep lawyers and programmers both busy for several years. Follow the logic a little further and you get where the University of Texas is, advising its faculty thus:

Licensing Students to Create a Derivative Work

Many students probably create a work that would infringe a faculty member’s copyright, that is, they base their notes on and incorporate her particular expression rather than just state facts and ideas she articulates in more detail. Faculty members have always permitted this kind of activity without actually talking about it. They “implicitly” license students to create a “derivative work” from the lecture. The license is implied through academic tradition — students are expected to take notes. Now faculty may wish to make the implied license explicit and add some restrictions. Written and verbal instructions at the beginning of class could look something like this:

“My lectures are protected by state common law and federal copyright law. They are my own original expression and I record them at the same time that I deliver them in order to secure protection. Whereas you are authorized to take notes in class thereby creating a derivative work from my lecture, the authorization extends only to making one set of notes for your own personal use and no other use. You are not authorized to record my lectures, to provide your notes to anyone else or to make any commercial use of them without express prior permission from me.”

A limited license to take notes could be very important to protecting the intellectual content of lecture materials that embody the faculty member’s unfixed lecture and unpublished research, among other things.

Doesn’t sound much like a temple of the free exchange of ideas.

At least a few of us at Harvard are going to the opposite extreme. In a project being mounted in honor of the centennial of the Harvard Extension School, several of Harvard’s popular courses, including my own Bits course, will be, in large measure, given away free. (The Dean of Continuing Education, Michael Shinagel, announced this at the Centennial Convocation on Friday, September 25. Stay tuned for more details.)

One final note. As the Advocate notes, the FinalsClub site is named after Harvard’s old social clubs, which are called Final Clubs, because there once were Waiting Clubs for freshmen and sophomores while they waited to become members of the Final Clubs. “Finals” in Harvard lingo are the 3-hour final examinations in courses, and since more students talk about exams than clubs, over the years “Final Clubs” have come to be known as “Finals Clubs,” as though they were clubs for exam preparation. Now we have a FinalsClub site which legitimately could be thought of as an exam-preparation aid. There–I’ve said it. Perhaps, now fixed in a tangible medium, the etymological history will be remembered.

Addendum: Here is an example that explains how odd it seems for professors to be exercising intellectual property rights over students’ notes of their lectures. The original note-taker was Plato. Without him, the teachings of Socrates might not have survived, and Western philosophy might have been a very different animal.

Objections to the Google Book Settlement

Friday, September 4th, 2009 by Harry Lewis

I have blogged several times about the Google Book Settlement (type “settlement” into the search window to bring up the posts). To be brief: Google started scanning books, including copyrighted works; organizations representing authors and publishers of copyrighted books sued Google for copyright infringement; the three parties have worked out their differences behind closed doors, producing a very long settlement agreement, which is now public; the matter now sits on the desk of a federal judge, Denny Chin, who must either approve or disapprove the settlement document (he cannot edit it). Because it is a class action lawsuit, members of the class are invited to tell Judge Chin what they think. The deadline for that is today. A number of objections have been filed; it appears that a group of distinguished authors and academics, including Jacques Barzun, Harold Bloom, and Harvard colleagues Steven Ozment, Mary Ann Glendon, and Ruth Wisse is also planning to file (pdf of notice here).

(There are other legal strategies for opposing the settlement. Microsoft, Yahoo, and Amazon are all lobbying the Department of Justice to oppose the Settlement, on anti-trust grounds. Amusing as it is to see Microsoft warning about the possibility of another company becoming a monopoly, it is quite correct in that fear.)

I am grateful to Lewis Hyde for permission to reproduce his eloquent objection immediately below. I have filed a brief objection myself, which I include below Lewis’s.

The Honorable Denny Chin
c/o Office of J. Michael McMahon, Clerk
U.S. District Court, Southern District of New York
500 Pearl Street
New York, NY  10007

August 31, 2009

Dear Judge Chin:

I write to object to some of the terms of the settlement that has been proposed by the litigants in Case No. 05 CV 8136, The Authors Guild, Inc., et al. v. Google Inc.

I am a member of the Author Sub-Class in this lawsuit.  The University of Michigan library lists eight books of mine that we may assume were digitized by Google in the course of their creation of Google Book Search.  These include The Gift (Random House, 1983), This Error is the Sign of Love (Milkweed Editions, 1988), and Trickster Makes this World (Farrar, Straus, 1998).

As an author I am also a reader, a user of libraries, and a beneficiary of the public domain.  I say this because I believe that the settlement in question amounts to a major intervention in our national cultural policy, one that will affect the U.S. knowledge ecology for generations to come.  It therefore should not be adjudicated upon the assumption that we authors (and our publishers) are rights holders only.  We are cultural citizens as well; our copyrights matter to us, but so do larger questions of how literature and knowledge circulate among us.

It is my understanding that courts hold fairness hearings in class action law suits in order to determine whether all members of the class find the proposed settlement fair, adequate and reasonable.  While I applaud many of the elements of the settlement in question, I am nonetheless troubled by several others.  To be specific:

*  I object to the settlement’s proposed capture of income from orphan works.  I can think of nothing in the history of copyright law, or in the law as currently written, that would countenance such a taking.  As a 1988 House of Representatives Report stated clearly:

Under the U.S. Constitution, the primary objective of copyright law is not to reward the author, but rather to secure for the public the benefits derived from the author’s labors.  By giving authors an incentive to create, the public benefits in two ways:  when the original expression is created and…when the limited term… expires and the creation is added to the public domain. [H.R. Rep. No. 100-609 at 17]

In no instance are third parties meant to benefit, as the settlement would allow.  The primary beneficiary of a copyright grant is the author and, as copyright expert Melville Nimmer once wrote, “the ultimate beneficiary is the public domain.” [Nimmer on Copyright III at 13]  To allow other parties to intervene between the author and the public would be like allowing an executor to drain an estate before distributing it to the rightful heirs.

The settling parties must therefore find some better way to dispose of the unclaimed funds that accrue from orphan works.  My own suggestion would be for the court to appoint a guardian or trustee charged not just with the task of representing absent owners but with a mandate to do so in the light of copyright’s traditional double focus:  rights holders must be given their due and, where no rights holder can be found, the public domain must be the beneficiary.

*  I object to the monopoly powers that Google and the Books Rights Registry will acquire, should the court approve the orphan works elements of the settlement.  Approving the settlement as it stands will in essence grant the settling parties a compulsory license enabling them to exploit the commercial value of orphan works.  Because of the unique nature of class action litigation it will be virtually impossible for any other digital library or search service to receive such an exemption.  Google will thus be in a position to monopolize this important part of our emerging knowledge economy.

Again, history makes it clear we should be wary of such broad powers in regard to the circulation of knowledge.  Copyright has long been classified as a monopoly privilege and in the context of expression, as Lord Macaulay famously said, “…monopoly is an evil.  For the sake of the good we must submit to the evil; but the evil ought not to last a day longer than is necessary for the purpose of securing the good.” [Misc. Works (1880) at 233]

In our own tradition, as you may know, Thomas Jefferson believed that the Constitution ought to have prohibited monopolies in general and, if an exception were to be made for copyright, believed that it should be strictly limited.  In a letter to James Madison dated 28 August 1789 Jefferson suggested the following addition to the Bill of Rights:  “Monopolies may be allowed to persons for their own productions in literature…, for a term not exceeding –– years, but for no longer term, and no other purpose.”  (As for the term, Jefferson’s usual suggestion was 19 years.)

In short, monopoly privileges in the world of public expression have been viewed with extreme skepticism ever since the first appearance of a public sphere in the eighteenth century.  It would be dangerous indeed to grant them now to the private parties who propose this settlement.

*  Finally, I object to the fact that no representative of libraries or of the public interest will be a voting member of the board of directors of the Books Rights Registry.  The Registry promises to be an important player in the nation’s cultural environment; if it is called into existence, its directors must represent the users as well as the owners of proprietary content.

In sum, while I support many elements of the settlement as currently drafted, in regard to the items I have here listed I do not consider it to be fair and reasonable.  I urge you to take these issues into account as you approach the difficult task of deciding whether or not to approve the settlement.

Yours sincerely,

Lewis Hyde

And here is my own letter to the judge.

The Honorable Denny Chin
c/o Office of J. Michael McMahon, Clerk
U.S. District Court for the Southern District of New York
500 Pearl Street
New York, NY  10007

September 4, 2009

Dear Judge Chin,

I write to object to some of the terms of the settlement that has been proposed by the litigants in Case No. 05 CV 8136, The Authors Guild, Inc., et al. v. Google Inc.

I am a member of the Author Sub-Class in this lawsuit.  At least four books of mine have presumably been digitized by Google in the course of the creation of Google Book Search: Excellence Without a Soul: How a Great University Forgot Education (PublicAffairs, 2006), Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion (co-authored by Hal Abelson and Ken Ledeen, Addison-Wesley, 2008), Data Structures and their Algorithms (co-authored by Larry Denenberg, HarperCollins, 1991), and Elements of the Theory of Computation, 2nd Edition (co-authored by Christos Papadimitriou, Prentice Hall, 1998).

I am Gordon McKay Professor of Computer Science in the School of Engineering and Applied Sciences at Harvard University. As a scholar, I am not only a writer of books but a reader, researcher, and teacher who uses books daily.

While I support many aspects of the proposed settlement, I object to the proposed settlement because it threatens to create an unregulated digital book monopoly. Specifically, it would grant to Google and the Books Rights Registry legal license to profit from orphan works (copyrighted works whose rights holders are unknown, and who therefore cannot set what the proposed settlement refers to as a “Specified Price”). It is unfair for these parties to profit from works they had no role in creating. Moreover, the proposed Settlement would grant Google the authority to set the price of orphan works without any outside review or regulation. In other commercial domains where monopolies have arisen (electric power and telecommunications, for example), some public body has overseen the pricing structure. It would be unfair for the Court to sanction the creation of a private information monopoly, in which Google could, without fear of competition or regulation, fix prices for works to which it does not hold rights.

Thank you for your attention.

Sincerely yours,

Harry R. Lewis

Blown to Bits in Google Books

Tuesday, August 25th, 2009 by Harry Lewis

Google Books has launched a special collection of digitized books which are available for free download under Creative Commons licenses — including Blown to Bits, in particular. It’s a slick interface, nicely searchable, and the table of contents is hyperlinked, section by section, to the text. We’re in fine company here — other authors include Jonathan Zittrain, Carl Malamud, Cory Doctorow, and Lawrence Lessig.

The Audacity of the Google Books Settlement

Tuesday, August 11th, 2009 by Harry Lewis

That is the title of a superb column by Pamela Samuelson explaining some (but only some) of the worries about the proposed settlement of copyright infringement claims against Google for scanning copyrighted works. She explains the perverse incentives to both parties to this litigation. In a word, each realized that they could become literary monopolists if they played their cards right with each other.

That is exactly the reason why the federal judiciary gets involved in settlements that private parties have negotiated with each other in class action cases. There is too much risk that the parties will find a way to divide the pie between themselves in a way that does not serve the public well.

And, of course, the public would gain much from the settlement. Advocates for the disabled are urging the judge to approve it because it would expand access to works that can be mechanically vocalized. And so it would, at a huge cost to competition, openness, and privacy, along with various other pitfalls.

It may not matter, if the Department of Justice decides the settlement has serious anti-trust implications, as it certainly seems to. (You can read the DOJ’s curt letter to Google at that site, thanks to DocStoc.)

Apple Censors the English Dictionary

Wednesday, August 5th, 2009 by Harry Lewis

Hard on the heels of Amazon reaching into the homes of Kindle owners and snatching copies of Orwell’s 1984 off their devices, we have a stunning reminder that Apple’s iPhone is also a tethered device, and nothing goes on it that Mother Apple doesn’t want on it. Application developers have to go through a certification process to get their apps approved for the iPhone, and among the standards applied by the certification team are prohibitions on obscene and pornographic material. On that basis, Apple refused to certify the Ninjawords dictionary until the developer removed words such as “shit” and “fuck” that appear in every standard dictionary of the English language. John Gruber, the author of the linked-to post, points out that some of the banned words appear in the King James bible, and some, such as “ass,” “cock,” and “screw,” have inoffensive meanings which are equally unavailable to the iPhone users of the dictionary. Even after the developers scrubbed every word that had a sexual meaning, Apple insisted that the dictionary carry an “age 17 and over” classification.

No unexpurgated dictionary on the iPhone? No dictionary at all for 16 year olds, lest they find a word with sexual connotations? I’ve been thinking about getting one, but this is too much, no matter how neat they are. We don’t want our consumer electronics suppliers to be the arbiters of public morality, because in 21st century America the least common denominator will be down somewhere near the level of Saudi Arabia.

DOJ Questions the Google Books Settlement

Friday, July 3rd, 2009 by Harry Lewis

The Department of Justice has now confirmed rumors that it was taking an interest in the draft settlement between Google and the Authors and Publishers, now before federal judge Denny Chin (who just sentenced Bernie Madoff to 150 years). Presumably the question for the DOJ is whether the proposed settlement is anti-competitive; Google responds “It’s important to note that this agreement is non-exclusive and if approved by the court, stands to expand access to millions of books in the U.S.” Which is true, but may well not be sufficient to avoid anti-trust issues. See the Digital Daily post here, which includes a link to the actual correspondence between the government and Judge Chin. Judge Chin notes that he is still planning to hold a Fairness hearing on October 7, and if the government wants its views known in writing, it has to submit something by September 18.


Friday, June 26th, 2009 by Harry Lewis

The Register has a fascinating report on a new phenomenon, arising from the conjunction of stiff copyright laws and the zero-cost copying those laws were meant to combat, insofar as the works copied were under copyright. People are making copies of works in the public domain and slapping their own copyright notice on them, and then charging money for them. The article describes the use of this technique for some 19th century Japanese books. But why would anyone pay for them when they are in the public domain? Because it may be safer to do so rather than run the risk that you are wrong about the claimed copyright ownership. This scam hits universities hard, because they have proved to be attractive targets for copyright lawsuits and are likely to err on the side of paying (or, to be specific, having their students pay).

But what could be the business model for the scammers? After all, what if they publish books and no one buys them? No problem — they issue the books as print-on-demand volumes through Booksurge. They have no costs until the first copy gets ordered. There is not much incentive for Amazon (which owns Booksurge) to crack down.

We blogged awhile back about the Obama administration’s misunderstanding of the fact that White House photos are in the public domain (The White House Confused PhotoStream). No scam intended there, to be sure, but it’s an indicator of how the public domain will continue to get restricted if people don’t fight back. Oddly, Creative Commons (under which Blown to Bits is licensed for free download on this site) is now getting into the act, apparently on the wrong side. As the Register reports,

Now Creative Commons seeks expanded authority to administer the Public Domain, by issuing a “Creative Commons Public Domain License,” as if it was a sublicense of its own invention. Creative Commons is trying to expand its licensing authority over not just newly created works, but all public domain works.

Very odd. I hope someone will correct the Register, if they have the story wrong, or correct Creative Commons, if it’s right.

Added June 29: Creative Commons says the Register is wrong. CC says,

Creative Commons does not have any “authority to administer” the public domain, whatever that means. Our public domain tools are not licenses — there is no “Creative Commons Public Domain License”. CC0 is a waiver that allows a copyright holder, to the extent possible, to release all restrictions on a copyrighted work worldwide. The Public Domain Certification facilitates clearly marking works already in the public domain as such. We also don’t have “licensing authority” over newly created works. All of our tools are voluntary and have an over-arching goal of expanding the commons, more specifically the public domain in the case of CC0 (as much as possible) and the Public Domain Certification (the effective public domain, by making existing public domain works more clearly marked, including with metadata, making them more available and discoverable).

Rising Interest in Orphan Works

Saturday, April 18th, 2009 by Harry Lewis

The discussions about how the Google Book settlement proposes to handle orphan works have expanded. A small group of which I am a member have formally sought to intervene. So has the Internet Archive. Today the NYT Bits Blog has a brief explanation, and some good commentary.

There have also been three articles that take up the settlement in a more serious way:

Randy Picker, “The Google Book Search Settlement: A New Orphan-works Monopoly?” Picker is an anti-trust lawyer. It’s a longish paper (though not by law review standards), but the first few pages provide a good summary.

Pamela Samuelson: “Legally Speaking: The Dead Souls of the Google Book Settlement.” An excellent, clear, short critique of the settlement. Easy to read for the layperson, highly recommended. This will be Samuelson’s column in the July issue of the Communications of the ACM.

James Grimmelmann, “The Google Book Settlement: Ends, Means, and the Future of Books” (pdf, 17 pages). An issues brief, thoughtful and analytical and complete.

I urge anyone interested to read the Samuelson piece in particular.