Listing entries tagged with pdf
digital editions 06.20.2007, 1:44 PM
Yesterday Adobe announced the release of their Digital Editions software. The software has been available in beta for a while; I downloaded it back then & didn't think it was interesting enough to write about. I've spent the past two days playing with the new release. I'm still not sure it's worth much attention, but I'll try to explain why it doesn't seem interesting.
What is Digital Editions? It's still a bit hard to tell. When I downloaded the beta version, it seemed to be a lightweight remake of Adobe Reader (née Acrobat Reader), Adobe's PDF viewer. The full release expands the capabilities of Digital Editions: in addition to being a PDF viewer, it's also a viewer for the new EPUB format. It also seems to be a front end for future web-based electronic book sellers, like Apple's iTunes for music. I'll go through each of these three uses in turn, but first a few notes on how Digital Editions works.
Digital Editions looks more like a web application than a desktop application. There are no menu bars to speak of, and its interface borrows nothing from the operating system. This is nice in that it feels like a reading environment: the interface is black-on-black, which should block out the distractions rampant on the desktop. Certainly there's none of the excess frippery that comes with Acrobat. However, the minimalism may be a bit excessive: it can be difficult to find the black buttons and sliders used to turn pages. (I'd be curious to see a review of the application from someone interested in accessibility for people with disabilities.) And some controls don't behave the way a user might expect: given a scrollbar along the right edge of a page, I expect to be able to click in the empty part of the track to move the slider. No such luck. Nor can you drag the page to change which part of it is visible when it's larger than the window, or drag a file into the window to open it.
Many of my problems with it stem from its not behaving like Mac software; I suspect a PC user would have similar complaints about it not behaving like PC software. This wouldn't matter if the interface were an improvement over those of the operating systems (both leave plenty of room for improvement), but it isn't noticeably better. It's simply different, and that slows users down.
1. as a PDF reader
As mentioned above, Digital Editions initially seemed to be a remake of Adobe Reader, which has become hideously bloated over time. The current OS X version of Reader is 108 MB, and it's a slow program. While I look at a fair number of PDFs every day, I've long since stopped using Acrobat in any of its forms when I don't have to; Apple's Preview application is much faster and delivers almost all the functionality I want from a PDF reader. I suspect most other Mac users do the same. Acrobat can be useful if you're doing print pre-press work or working with forms, but I rarely do either.
Digital Editions does work as a PDF viewer. It's based around a library concept, so every time you open a PDF in DE, an image of the front page is saved in the library; you can click on this image to open it. Once you have a PDF, you can look at it as a single page, as a double page (even if the PDF hasn't been set up for this), at the width of the screen, or with a zoom widget that lets you use 18 levels of zoom from 87% to 919%. Here's how a PDF from /ubu editions looks:
Digital Editions is clearly built around a different PDF rendering engine from the rest of Adobe's software. (The FAQ explains that this engine was designed for use on cellphones.) Image quality is noticeably worse than in Acrobat or Preview. Text is poorly anti-aliased, and the spacing between characters seems to be off for some fonts at some zoom levels. Graphics are notably grainy, and odd rendering artifacts sometimes show up. (In the image above, for example, note the light blue rectangle under the text on the left. It doesn't show up in any other PDF viewer.) Some PDFs acquire extras that shouldn't be there, such as blocks of background color. One illustration of the color picker in the Sophie help PDF I made a couple of weeks back turns a lovely shade of purple:
This is frustrating: one of Adobe's chief selling points for PDF as a format has been that a PDF will look the same on every machine in every viewer. Not in this one. Adobe offers sample PDFs for download at their Digital Editions website (see below), and these are similarly perplexing. Although they appear to be ordinary PDFs (with no restrictions), they don't behave like regular PDFs: they can't be opened in any viewer other than Digital Editions. Preview shows only blank pages; opening them in the current Adobe Reader takes you to a web page where you can download Digital Editions; and opening them in an older version of Acrobat brings up a message asking whether I'd like to learn more about documents protected with Adobe DRM. Clicking yes takes me to a pre-Digital Editions Adobe ebooks page. PDFs have become popular because they can be used in a variety of ways across a variety of platforms. This seems like a significant step backwards for Adobe: interoperability is taking a back seat to DRM.
2. as an EPUB reader
But Digital Editions isn't only a PDF viewer; it's also a viewer for the EPUB format. EPUB is the work of the IDPF (the International Digital Publishing Forum); it's essentially an XHTML-based format for ebooks. You can get sample EPUBs from Adobe's website, or, if you have the latest version of Adobe InDesign, you can make them yourself (more about that in a bit). Here's the front page of their edition of Alice's Adventures in Wonderland:
Perhaps not surprisingly for an XHTML format, reading an EPUB in Digital Editions is similar to reading a web page. The text becomes as wide as the Digital Editions window; if the window is wide enough, the text may reflow into more columns. When this happens is unclear to me: in some books, the text column becomes much too wide to read comfortably before the text reflows:
You can choose among four font sizes; you can't change the fonts. (Some EPUB books include their own fonts; some use system fonts.) As in the Digital Editions PDF viewer, there's some bookmarking capability: you can select text and click "Add bookmark" to add a note at a particular point in the text. Books have tables of contents, and there's a search function. You can print books (or, on a Mac, convert them to PDFs); printing seems to default to two columns.
That seems to be all you can do with these books. The books that Adobe provides are noticeably ugly: most of the included graphics are low resolution. Text looks oddly bad: in the default font, the italics appear to be slanted roman characters rather than a true italic, which you'd think Adobe would be embarrassed about. To my eye, the text looks much better in Safari or even Firefox. You can make this comparison yourself by renaming the .epub file to .zip and unzipping it; in the resulting folder, you'll find a set of HTML pages, the images they use, and the fonts, if any are included.
Adobe trumpets the one-click creation of EPUB files in the new version of InDesign, so I fired up InDesign and made some EPUBs to see how they worked in Digital Editions. Try for yourself: here is a version of the Sophie help PDF in EPUB format. The results are a bit disappointing: all the graphics have been dumped at the end of the document, much of the formatting has been lost, and the table of contents I laboriously set up for PDF export has been eliminated. One-click conversion evidently doesn't export the fonts the document uses; even though I have the Avenir and Scala fonts on my machine, the EPUB displays in the default Digital Editions font. The graphics do display in their true colors, which is more than can be said for Digital Editions' handling of the PDF, though many of them seem to have been converted to JPEGs with lossy compression.
On a whim, I fed InDesign's converter some foreign-language poetry to see how it would handle Unicode text. French came through okay. Lithuanian was mangled beyond recognition. Some Chinese poetry didn't work at all:
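Because an EPUB is an ordinary ZIP archive, the rename-and-unzip trick can also be done programmatically. What follows is a minimal sketch using only Python's standard `zipfile` module. The extension-based grouping is an assumption made for illustration: a real EPUB declares its contents in a manifest inside its package file, which this sketch doesn't read. The member names in the sample archive are likewise invented.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions used to sort archive members: a rough simplification.
# A real EPUB lists its contents in a package manifest, ignored here.
HTML_EXTS = {".html", ".htm", ".xhtml"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".svg"}
FONT_EXTS = {".ttf", ".otf"}


def summarize_epub(epub_file):
    """Group the members of an EPUB (ZIP) archive by rough file type.

    `epub_file` may be a path or a binary file-like object.
    """
    groups = {"html": [], "images": [], "fonts": [], "other": []}
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            suffix = PurePosixPath(name).suffix.lower()
            if suffix in HTML_EXTS:
                groups["html"].append(name)
            elif suffix in IMAGE_EXTS:
                groups["images"].append(name)
            elif suffix in FONT_EXTS:
                groups["fonts"].append(name)
            else:
                groups["other"].append(name)
    return groups


# Build a tiny stand-in archive in memory so the example runs without a
# real book; these member names are hypothetical.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as z:
    z.writestr("mimetype", "application/epub+zip")
    z.writestr("OEBPS/chapter1.xhtml", "<html/>")
    z.writestr("OEBPS/images/cover.jpg", b"")
    z.writestr("OEBPS/fonts/body.otf", b"")
sample.seek(0)

groups = summarize_epub(sample)
print(groups)
```

With a real book you would pass a path instead, for example `summarize_epub("alice.epub")`, where the file name is hypothetical.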
It's clear that this needs a lot of work before it can be taken seriously.
3. as a store
From Adobe's press release, it's clear that the main impetus behind Digital Editions is to provide a local front end for web-based ebook sales. The model Adobe is working toward becomes apparent when you open the program: it maintains a library of all the PDF files you look at, in the same way that iTunes maintains a library of the MP3s on your computer:
Categories of books (on the left in the above screenshot) include "Borrowed" and "Purchased". The iTunes model of building a store into software isn't necessarily a bad one: Linotype has embedded a font store in their free font management software, with some degree of success. It's hard to tell how well Adobe's integration will work. They've tried selling ebooks before with little success; I have a couple of PDFs bought from Amazon that I've long since despaired of ever opening again. (Some progress may be reported: clicking on these now opens Digital Editions, where I get a different cryptic error from the one I got before in Acrobat.) The same sort of problems are likely with ebooks designed for Digital Editions; it does worry me that even PDFs without DRM can't be opened outside of the software.
DRM is probably the logical place to end this overlong review. One of the major reasons we haven't spent much time covering the efforts of the IDPF is that it's devoted to standards that satisfy producers rather than consumers, and many producers are concerned with locking down their products as thoroughly as possible. That may be a reasonable position from their perspective, but it has resulted in products that aren't particularly useful to consumers. Digital Editions looks like it might be a big piece of the puzzle for DRM-focused producers. Unfortunately, readers are being neglected.
adobe acrobat 8 is probably not for you 09.20.2006, 12:30 PM
Adobe just announced the release of Acrobat 8, their PDF production software. To promote it, they hosted three "webinars" on Tuesday to demonstrate some of the new features to the interested public. Your correspondent was there (well, here) to see what glimmers might be discerned about the future of electronic reading.
Who cares about Acrobat?
What does Acrobat have to do with electronic books? You're probably familiar with Acrobat Reader: it's the program that opens up PDFs. Acrobat is the "author" program: it lets people make PDFs. This is very important in the world of print design and publishing: probably 90% of the new printed material you see every day goes through Acrobat in some form or another. Acrobat's not quite as ubiquitous as it once was – newer programs like Adobe InDesign, for example, let designers create PDFs that can be sent to the printer's without bothering with Acrobat, and it's easy to make PDFs out of anything in Mac OS X. But Acrobat remains an enormous force in the world of print design.
PDF, of course, has been promoted as a suitable format for electronic books; see here for an example. Acrobat gave publishers the ability to lock down with DRM the PDFs that Amazon (for example) sold, and publishers jumped on board. The system wasn't successful, not least because opening the locked PDFs proved chancy: I have a couple of PDFs I bought during Amazon's experiment selling them which, on opening, download a lot of "verification information" and then give inscrutable errors. In part because of these troubles, Amazon has largely abandoned the format – notice their sad-looking ebook store.
Why keep an eye on Acrobat? One reason is that Acrobat 8 is Adobe's first major release since merging with Macromedia, a union that sent shockwaves across the world of print and web design. Adobe now makes almost every significant program used in print design. (The one exception is QuarkXPress, which has been quietly rolling towards oblivion of its own accord since around the millennium.) With the acquisition of Macromedia's web technologies, including Flash and Dreamweaver, Adobe is inching towards a Microsoft-style monopoly of web design. In short: where Acrobat goes is where Adobe goes; where Adobe goes is where design goes; and where design goes is where books go, maybe.
So what does Acrobat 8 do?
Acrobat 8 provides a number of updated features that will be useful to people who do pre-press and probably uninteresting to anyone else. They've made a number of minor improvements – the U. S. government will be happy to know that they can now use Acrobat to redact information without having to worry about the press looking under their black boxes. You can now use Acrobat to take a bunch of documents (PDF or otherwise) and lump them together into a "bundle". All nice things, but nothing to get excited about. More DRM than you could shake a stick at, but that's to be expected.
The most interesting thing that Acrobat 8 does – and the reason I'm bringing this up here – is called Acrobat Connect. Acrobat Connect allows users to host web conferences around a document – it was what Adobe was using to hold their "webinars". (There's a screenshot to the right.) These conferences can be joined by anyone with an Internet connection and the Flash plugin. Pages can be turned and annotations can be made by those with sufficient privileges. Audio chatting is available, as is text-based chatting. The whole "conversation" can be recorded for future reference as a Flash-based movie file.
There are a lot of possibilities that this technology suggests: take an electronic book as your source text and you could have an electronic book club. Teachers could work their way through a text with students. You could use it to copy-edit a book that's being published. A group of people could get together to argue about a particularly interesting blog post. Reading could become a social experience.
But what's the catch?
There's one catch, and it's a big one: the infrastructure Acrobat Connect depends on. You have to use Adobe's server, and there's a price for that. It was suggested that chat-hosting access would cost $39 a month or $395 a year. This isn't entirely a surprise: more and more software companies are trying to rope consumers into subscription-based models. It might well work for Adobe: I'm sure there are plenty of corporations that won't balk at shelling out $395 a year for what Acrobat Connect offers (plus $449 for the software). Maybe some private schools will see the benefit of doing that. I can't imagine, however, that many private individuals will. Much as I'd like to, I won't.
I'm not faulting Adobe for this stance: they know who butters their bread. But I think it's worth noting what's happening here: a divide between the technology available to the corporate world and the general public, and, more specifically, a divide that doesn't need to exist. Though they don't have the motive to do so, Adobe could presumably make a version of Acrobat Connect that would work on anyone's server. This would open up a new realm of possibility in the world of online reading. Instead, what's going to happen is that the worker bees of the corporate world will find themselves forced to sit through more PowerPoint presentations at their desks.
While a bunch of people reading PowerPoint could be seen as a social reading experience, so much more is possible. We, the public, should be demanding more out of our software.
google offers public domain downloads 08.30.2006, 5:41 PM
Google announced today that it has made free downloadable PDFs available for many of the public domain books in its database. This is a good thing, but there are several problems with how Google has done it. The main one is that these PDFs don't actually contain text; they're simply sequences of images of the scanned library books. As a result, you can't select and copy text, nor can you search the document, unless, of course, you do it online in Google. So while public access to these books is a big win, Google still has us locked into its system if we want to use these books as digital texts.
A small note about the public domain: editions are key. A large number of the books Google has scanned so far have contents in the public domain but are in editions published after the cut-off (I think we're talking 1923 for most books). Take this 2003 Signet Classic edition of Darwin's The Origin of Species. The text is clearly in the public domain, but the book is in "limited preview" mode on Google because the edition contains an introduction written in 1958. Copyright experts out there: is it just this that puts the book off limits? Or is the whole edition somehow copyrighted?
ted nelson & the ideologies of documents 10.31.2005, 4:59 PM
I. Nelson's criticism
Ted Nelson (introduced last week by Ben) is a lonely revolutionary marching a lonely march, and whenever he's in the news, mockery follows. Some of it is deserved: nobody's willing to dismantle the Internet we have for his improved version of the Internet (which doesn't quite work yet). You don't have to poke around his website for long to find things that reek of crackpottery. But the problems Nelson has identified in the electronic world are real, even if the solutions he proposes prove to be untenable. I'd like to expand on one particular aspect of Nelson's thought that is prominent in his latest missive: his ideas about the inherent ideologies of document formats. While this sounds very blue-sky, I think his ideas do have some repercussions for what we're doing at the Institute, and they're worth investigating, even if we don't necessarily buy into Xanadu.
Nelson starts from the position that attempting to simulate paper with computers is a mistaken idea. (He's not talking about e-ink & the idea of electronic paper, though a related criticism could be made of that: e-ink by itself won't solve the problem of reading on screens.) This is correct: we could do many more things with virtual space than we can with a static page. Look at this Flash demonstration of Jef Raskin's proposed zooming interface (previously discussed here), for example. But we don't usually go that far because we tend to think of electronic space in terms of the technology that preceded it – paper space. This has carried over into the way in which we structure documents for online reading.
There are two major types of electronic documents online. In the first, the debt to paper space is explicit: PDFs, one of the major formats currently used for electronic books, are a descendant of PostScript, a language designed to tell a printer exactly what should be on a printed page. While a PDF has more functionality than a printed page – you can search it, for example, and if you're tricky you can embed hyperlinks and tables of contents in it – it's built on the same paradigm. A PDF is also like a printed page in that it's a finalized product: while content in a PDF can be written over with annotations, it's difficult to make substantial changes to it. A PDF is designed to be an electronic reproduction of the printed page. More functionality has been welded on by Adobe, which created the format, but at its heart it attempts to maintain fidelity to the printed page.
The other dominant paradigm is that of the markup language. A quick, not too technical introduction: a markup language is a way of embedding, in the text itself, instructions for how that text is to be structured and formatted. HTML is a markup language; so is XML. This web page is created in a markup language; if you look at it with the "View Source" option in your browser, you'll see that it's a text file divided up by a lot of HTML tags, which are specially designed to format web pages: putting <i> and </i> around a word, for example, makes it italic. XML is a broader concept than HTML: it's a specification that allows people to create their own tags to do other things, and some people are using their own flavors of XML to represent ebooks.
There's a lot of excitement about XML – it's a technology that can be (and is) bent to many different uses. A huge percentage of the system files on your computer, for example, probably use some flavor of XML, even if you've never thought of composing an XML document. Nelson's point, however, is that there's a central premise to all XML: that all information can be divided up into a logical hierarchy – an outline, if you will. A lot of documents do work this way: a book is divided into chapters; a chapter is divided into paragraphs; paragraphs are divided into words. A newspaper is divided into stories; each story has a headline and body copy; the body copy is divided into paragraphs; a paragraph is divided into sentences; a sentence is divided into words; and words are divided into letters, the atoms of the markup universe.
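Nelson's point about hierarchy is easy to see in practice: an XML parser turns a document into a tree in which every element has exactly one parent, so the document can always be printed as an outline. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`; the tiny `<book>`/`<chapter>`/`<p>` vocabulary is invented for illustration and is not any real ebook schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical, deliberately tiny "book" vocabulary.
BOOK = """<book>
  <chapter title="One">
    <p>A book is divided into <i>chapters</i>.</p>
    <p>A chapter is divided into paragraphs.</p>
  </chapter>
  <chapter title="Two">
    <p>Paragraphs are divided into words.</p>
  </chapter>
</book>"""


def outline(element, depth=0):
    """Return the element tree as indented lines, one per element."""
    label = element.tag
    if "title" in element.attrib:
        label += f" ({element.attrib['title']})"
    lines = ["  " * depth + label]
    for child in element:  # each child belongs to exactly one parent
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOK)
lines = outline(root)
for line in lines:
    print(line)
```

Even the italicized word becomes a child node nested inside its paragraph: the structure is an outline all the way down.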
II. A Victorian example
But while this is the dominant way we arrange information, Nelson points out, it isn't necessarily a natural way to arrange things, or the only way; it's one of many possible ones. Consider this spread of pages:
This is a title page from a book printed by William Morris, another self-identified humanist. We mostly think of William Morris (when we're not confusing him with the talent agency) as a source of wallpaper, but his importance as a book designer can't be overstated. The book, printed in 1893, is entitled The Tale of King Florus and the Fair Jehane. Like all of Morris's books, it's sumptuous to the point of being unreadable: Morris was dead set on bringing beauty back into design's balance of aesthetics & utility, and may have over-corrected to offset the Victorian fixation on the latter.
I offer this spread of pages as an example because the elements that make up the page don't break down easily into hierarchical units. Let's imagine that we wanted to come up with an outline for what's on these pages – let's consider how we would structure them if we wanted to represent them in XML. I'm not interested in how we could represent this on the Web or somewhere else – it's easy enough to do that as an image. I'm more interested in how we would make something like this if we were starting from scratch & wanted to emulate Morris's type and woodcuts – a more theoretical proposition.
First, we can look at the elements that make up the pages. Each page is clearly important on its own, and each has a text box with decorative grapevines around it. On the first page, the text box holds the title by itself; on the second page, the title is repeated, followed by two body paragraphs separated by a fleuron. The first paragraph gets an illustrated dropcap. Each word, if you want to go down that far, is composed of letters.
But if you look closer, you'll find that the elements on the page don't decompose into categories quite so neatly. On the left-hand page, you can see that the title's not all there – this is the second title page in the book. The title isn't contained by the page, as would almost certainly be assumed under XML; rather, title and page are overlapping units. And the page backgrounds aren't mirror images of each other: each has been created individually. Look at the title at the top of the right-hand page: it's followed by seven fleurons because it takes seven of them to fill the space nicely. Everything here has been minutely adjusted by hand. Notice the characters in the title on the right and how they interact with the flourishes around them: the two A's are different, as are the two F's, the two N's, the two R's, and the two E's. You couldn't replicate this lettering with a font. You can't really build a schema to represent what's on these two pages. A further argument: to make this spread rigorous enough to represent in XML would be to ruin it aesthetically. The vines are the way they are because the letters are the way they are: they've been created together.
The inability of XML to adequately handle what's shown on these pages isn't a function of the screen environment. It's a function of the way we build electronic documents right now. Morris could build pages this way because he didn't have to answer to the particular constraints we do now.
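The overlap problem can be demonstrated directly. In the sketch below (Python standard library only), a title that starts on one page and finishes on the next is encoded the obvious way, and the XML parser rejects it as malformed. One common workaround, used by schemas such as the TEI's `<pb/>` page-break element, is to demote the page to an empty "milestone" marker; that restores well-formedness, but only by letting one hierarchy win. The element names and placeholder text are hypothetical and don't transcribe Morris's actual spread.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of a spread: the title begins on the first page
# and ends on the second, so <title> and <page> overlap. XML forbids this.
OVERLAPPING = (
    "<spread>"
    "<page><title>TITLE, FIRST PART</page>"
    "<page>TITLE, SECOND PART</title></page>"
    "</spread>"
)

# Milestone workaround: page boundaries become empty <pb/> markers,
# so only the title remains a true container.
MILESTONE = (
    "<spread>"
    "<pb n='1'/><title>TITLE, FIRST PART <pb n='2'/>TITLE, SECOND PART</title>"
    "</spread>"
)


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(OVERLAPPING))  # False
print(is_well_formed(MILESTONE))    # True
```

Note what the workaround costs: in the milestone version the page is no longer a thing that contains anything, which is exactly the kind of compromise the Morris spread resists.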
III. The ideologies of documents
Let's go back to Ted:
Nearly every form of electronic document- Word, Acrobat, HTML, XML- represents some business or ideological agenda. Many believe Word and Acrobat are out to entrap users; HTML and XML enact a very limited kind of hypertext with great internal complexity. All imitate paper and (internally) hierarchy.
For years, hierarchy simulation and paper simulation have been imposed throughout the computer world and the world of electronic documents. Falsely portrayed as necessitated by "technology," these are really just the world-view of those who build software. I believe that for representing human documents and thought, which are parallel and interpenetrating– some like to say "intertwingled"– hierarchy and paper simulation are all wrong.
It's possible to imagine software that would let us follow our fancy and create on the screen pages that look like William Morris's – a tool that would let a designer make an electronic woodcut with ease. Certainly there are approximations. But the sort of tool I imagine doesn't exist right now. This is the sort of tool we should have – there's no reason not to have it already. Ted again:
I propose a different document agenda: I believe we need new electronic documents which are transparent, public, principled, and freed from the traditions of hierarchy and paper. In that case they can be far more powerful, with deep and rich new interconnections and properties- able to quote dynamically from other documents and buckle sideways to other documents, such as comments or successive versions; able to present third-party links; and much more.
Most urgently: if we have different document structures we can build a new copyright realm, where everything can be freely and legally quoted and remixed in any amount without negotiation.
Ben does a fine job of going into the ramifications of Nelson's ideas about transclusion, which he proposes as a solution. I think it's an interesting idea which will probably never be implemented on a grand scale because there's not enough of an impetus to do so. But again: just because Nelson's work is unpragmatic doesn't mean that his critique is baseless.
I feel there's something similar in the grandiosity of Nelson's ideas and Morris's beautiful but unreadable pages. William Morris wasn't just a designer: he saw his program of arts and crafts (of which his books were a part) as a way to emphasize the beauty of individual creation as a course correction to the increasingly mechanized & dehumanized Victorian world. Walter Benjamin declares (in "The Author as Producer") that there is "a difference between merely supplying a production apparatus and trying to change the production apparatus". You don't have to make books exactly like William Morris's or implement Ted Nelson's particular production apparatus to have your thinking changed by them. Morris, like Nelson, was trying to change the production apparatus because he saw that another world was possible.
And a postscript: as mentioned around here occasionally, the Institute is in the process of creating new tools for electronic book-making. I'm writing an introduction to Sophie (to be posted soon) that does its best to justify the need for something new in an overcrowded world. Nelson's statement dovetails neatly with my own thinking on why we need something new: so that we have the opportunity to make things in other ways. Sophie won't be quite as radical as Nelson's vision, but we will have something out next year. It would be nice if Nelson could do the same.