Cost 219ter

IBM Research's Web Accessibility Project

Jonathan Brezin (brezin@us.ibm.com)
IBM T. J. Watson Research Center
19 Skyline Drive
Hawthorne, NY 10532, USA


Some Background

IBM has had a long interest and involvement in making the Web as broadly accessible as possible. As one might expect, the early efforts were concentrated on the most severe impairments, like blindness. Recently, though, there has been a growing awareness of the importance of meeting the needs of those with relatively mild visual and motor deficits of the kind often encountered by older adults.

The sorts of problems we have in mind are

Visual:
loss of visual acuity, limited color perception, and limited contrast discrimination.

Motor:
tremors, limited flexibility, and limited motor control.

Perceptual:
dyslexia and problems focusing attention.

The first observation one should make is that, if one is to tailor the Web browser user experience to accommodate these sorts of deficits, one must devote at least some effort to software that will run on the client. The obvious case is that of dealing with mouse and keyboard input problems. There are others: when visual loss is so severe that simply magnifying the whole page is no longer a good option, one might want to allow the user to indicate a small part of a page (a picture, for instance) to "super-magnify" in a separate window.

When IBM first tried to implement this support several years ago, there was great pressure to minimize the amount of client software. As much processing as possible was to be done on a proxy server. The rationale was:

Servers are an effectively infinite resource for computing power. For jobs like sharpening magnified pictures or translating text to speech, the extra computing power is particularly attractive.

Rich code libraries are available to do much of the technically difficult work. There are good parsers as well as script engines and programs for patching up bad HTML and scripts. They are necessarily large and might better reside on servers than on every desktop.

It is easier to maintain the code on a relatively small number of servers. The problem of broadcasting new versions of the software to every client is daunting, particularly since many of our clients may be expected to have slow connections to the network.

Unfortunately, reality did not meet expectation. The server-based platform turned out to have substantial performance problems and to be error-prone. Servers are indeed very much a finite resource, and only a fraction of the difficulties with real sites are handled by general-purpose libraries. The second time around, another team (see the appendix) decided to concentrate on working directly on the client with the browsers and the underlying operating system. This effort was significantly more successful and resulted in the current platform.

Both efforts were supported by IBM's Research Division and by IBM Corporate Community Relations ("CCR"), which is IBM's philanthropic arm. We also had the help of SeniorNet (www.seniornet.org), a non-profit organization devoted to helping older adults take advantage of the Web. They were our first "customers" and found many interesting pages that helped us refine our transformations and smooth out our user interface. SeniorNet was also an early distributor of the program, which is now available in the major European and Asian languages, and, through IBM CCR, from a number of other non-profit organizations. The name of the offering is Web Adaptation Technology.


Why this effort, and why now?

It all but goes without saying that using the Web is an absolutely critical part of day-to-day work within IBM itself. And, since IBM is no more immune from the realities of demographics than anyone else, it can reasonably expect that a growing number of its own employees will have to cope with the kinds of problems we set out to address. Thus, in addition to public goodwill, there were sound business reasons for IBM to want better browsers, and it was in the fortunate position of being able to act.

Both IBM and its customers are also affected by increasing regulations on Web content. To the extent that we can make the browser capable of adjusting to the content, we lessen that impact dramatically for ourselves and our customers.

The key point, though, is not just that browsers are supremely important to us. Equally important, browsers come with programming tools that make them amenable to being worked on at reasonable cost.

As an aside, let me say that there are opportunities to serve a similar population's word processing and other general desktop needs. Increasingly, these other desktop applications are opening up programming interfaces and publicly documented output formats, so they, too, are becoming possible to work on.

Cost is not an issue that should be lightly passed over. Were it true that the existing content out on the Web could be economically reworked to accommodate the same broad audience we are targeting, we would have felt far less pressure to work on the browsers themselves. There are many reasons, though, why reworking the content is not merely too expensive, but in a very real sense just not possible. There are the obvious reasons, such as the massive volume of existing content, haphazard ownership of much content, and the problems of dealing with scripting designed to alter content at the client at the time it is rendered. And then there are the not so obvious, like allowing transformation of selected parts of a site by the user on demand. Existing pages are already complex enough without adding the extra burden of scripting required to allow dynamic transformation after the initial rendering.

Finally, further regulatory action aimed at the content providers is not likely to solve the sort of problem we are worried about, either. Regulatory action makes sense when there is a relatively well-defined goal: for example, best effort to minimize the loss of information to those who hear but do not see. Automated checkers make sense when there is a relatively small number of quantifiable parameters to check. For example, does every image have alternate text? What, on the other hand, can an automated checker do to answer whether a given page has adequate contrast, or whether it can be assigned background and foreground colors so that it does have adequate contrast? What regulatory guides would make a difference, without unnecessarily inhibiting page design?
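To make the contrast question concrete: for a single pair of solid colors, a contrast measure is easy to compute. The sketch below uses the relative-luminance and contrast-ratio definitions from the W3C's WCAG 2.x guidelines, which are not part of the original project and are brought in here only as an illustration. The function names are hypothetical. What the sketch cannot do is exactly what the paragraph above points to: decide which colors actually end up behind which text once background images, nested elements, and scripts have had their say.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as three 0-255 integers.

    Follows the WCAG 2.x definition (an assumption of this sketch, not
    something the original project is known to have used).
    """
    def linear(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) up to 21."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Light gray text on a white background scores far lower than black on white.
print(round(contrast_ratio((200, 200, 200), (255, 255, 255)), 2))
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
```

A checker built on such a function can flag a declared color pair, but it still needs the rendered page to know which pairs matter.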

Just to make the point in its extreme, regulators could insist that each HTTP request be accompanied by metadata (presumably in the HTTP header) that describe the requestor's needs according to a World Wide Web Consortium (W3C) or IETF protocol, and content providers would be compelled to prepare the returned content accordingly. What would be the effect on page design? On throughput at the content server?


There is nothing peculiar about our population

Once one recognizes that even in the case of severe disabilities, some, if not all, of the facilities needed to improve the situation are, appropriately understood, applicable far more broadly, one is in a position to alter radically the economics of providing the support one wants. The blind, for example, are helped if one can attach sound tracks to films to explain what is not clear from that which can be heard in the original. But there are many other reasons why one might wish to attach optionally audible sound tracks to films. Teachers who want to provide commentary or criticism are an obvious example. Another is home movies, where several family members might wish to "add their own two cents" after the fact. A little imagination will surely yield more examples. At the risk of underlining the obvious, the point is that disabilities define niche markets to which some part of the population always belongs, but to which a vastly larger part sometimes belongs.

The deficits we are concerned with fit this model well. Anyone who falls and suffers a minor hand or wrist injury might find our keyboard and mouse adjustments useful. Anyone might find a particular site difficult to read because the font used is too small and might want to use our facilities to magnify the page "just enough". There are also sites out there with very poor contrast and color choices. Anyone might want to try some background color changes to read the site more easily. Bad ambient lighting alone might cause someone to want either to magnify the site or to use different background/foreground color choices. So, again: many people all the time, everyone some of the time.


The dimensions of our solution

Let us change viewpoint for a moment and look at the world as it is seen by our program. When someone comes to us for the first time, we are not in a position to anticipate that person's needs, and even one particular deficit may require adjustments to what the operating system and the browsers see as several different parameters. People also, with rare exceptions, are unable to describe their needs in the language that their browser understands. Finally, the generic group "older adults" do not see themselves as "disabled" even should they have one or another problem. In fact, they do not appear to themselves to be significantly different from their peers for the good and simple reason that they are not. What they expect (again, rightly) is a menu of preferences for their browser that is rich enough to accommodate their needs. As we were hammering home above, if our software were indeed the normal preference menu (so everyone knew it was there), it would find many more users than the obvious ones, and the obvious ones, no longer being singled out, might find it more comfortable to use.

As it is, we have a "welcome page" that guides first-time users to our preferences menu. To say that what is at stake is just a preference menu, though, is a bit of an over-simplification. There are too many parameters to adjust, and the effects of the adjustments are not always easy to visualize "in the abstract". Therefore we have at once to

Figure 1: The help page for spoken text

What are the adjustments? The answer is easiest to see from our help system, which consists of a family of Web pages, one per related set of adjustments. Figure 1 shows the page for using the spoken text facility (the top half of the display), together with the "band" we provide for choosing one's settings.

On the left side of the page, you can see the various options that are available. This part of the page is common to all of the help pages. You can use the links there to navigate among the options, and as you do, the appropriate choices show up in the band. The arrows on the band also allow one to navigate through the entire list of adjustments, whatever the page is that is visible above it. If a help page does happen to be displayed when an arrow is pressed, the help page for the new option is automatically displayed. Otherwise, the upper part of the display is unaffected. The size of the "band" is also important, because we are catering to a population that may have trouble reading or clicking accurately on normal menus.

If you wish to try out some settings on a particular page, you can go to that site, and then, either by clicking on a toolbar button or using a keyboard shortcut (F12), bring up the band at the bottom of the display. Selecting an option then, like "standard" in the "speak text" menu, immediately turns on that option. Adjustments, once made, become "the default" for all pages until the user actively resets them.

In our example, once standard text-to-speech is chosen, moving the mouse over the displayed page would then cause whatever text the mouse hovers over to be read aloud at "normal" speed. Similarly, options that affect the visual properties of the display are immediately put into effect, so you can see how they work on the page you are interested in. It is important that the band be available with any page visible, because sometimes you just want to play with the settings for that page: for example, if the page has a very bad background image that you want to suppress, or a tiny font you cannot read.

As you can see from the list on the help page, coping with visual deficits is about more than magnification and reading aloud. Even magnification is not a "one-shot deal," if you want to cope well with the broadest spectrum of problems. For some, magnification of the whole display is just what is needed. On the other hand, once one goes beyond a certain point, scaling the whole display leads to the need for horizontal scrolling, which is justly known for making it difficult for people to maintain a sense of context. We therefore provide two alternatives. One is to "linearize" the page, the effect of which is to transform all tables in the page into a single column format, the table cells being ordered left to right, top to bottom. The other is to provide greatly enlarged text phrase by phrase in a separate window, which we call "banner text". While reading aloud might be a more effective answer in this case, there are situations where that is not possible, if only because it would be impolite.
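The linearization just described can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: Python's standard `xml.dom.minidom` stands in for a browser's DOM, the input is assumed to be well-formed markup with lowercase tag names, and the helper names (`element_children`, `table_cells`, `linearize`) are hypothetical. Each table is replaced by one block per cell, in left-to-right, top-to-bottom order.

```python
from xml.dom import minidom


def element_children(node, names):
    """Direct child elements of `node` whose tag name is in `names`."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE and child.tagName in names]


def table_cells(table):
    """Cells belonging to `table` itself, left to right, top to bottom."""
    cells = []
    for child in element_children(table, {"thead", "tbody", "tfoot", "tr"}):
        rows = [child] if child.tagName == "tr" else element_children(child, {"tr"})
        for row in rows:
            cells.extend(element_children(row, {"td", "th"}))
    return cells


def linearize(doc):
    """Replace every table in `doc` with a single column of <div> blocks."""
    # Tables come back in document order, so an outer table is handled first;
    # a nested table moves along with its cell's content and is handled later.
    for table in doc.getElementsByTagName("table"):
        parent = table.parentNode
        for cell in table_cells(table):
            block = doc.createElement("div")
            while cell.firstChild is not None:
                block.appendChild(cell.firstChild)  # appendChild moves the node
            parent.insertBefore(block, table)
        parent.removeChild(table)
    return doc


page = minidom.parseString(
    "<body><table><tr><td>A</td><td>B</td></tr>"
    "<tr><td>C</td><td>D</td></tr></table></body>"
)
print(linearize(page).documentElement.toxml())
# <body><div>A</div><div>B</div><div>C</div><div>D</div></body>
```

The sketch ignores row and column spans, captions, and tables used for genuinely tabular data, all of which a real implementation would have to decide how to treat.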

There are other subtleties as well. For instance, if you shop on the Web, uniformly magnifying the pages is probably what you want. But if you read magazines, blogs, or newspapers, you really only want the text magnified, so that you keep as much context on the screen as possible. Besides, many of the "pictures" in this sort of site are advertisements; to be less kind, they are just "noise". We provide a facility for magnifying pictures individually in a separate window, so if you do want to examine one closely, you can simply by hovering over it with the mouse.

Another possibility is that by avoiding fonts that have serifs and by expanding inter-letter and inter-line spacing, the text becomes easier for some to read. Difficulties due to some forms of dyslexia apparently are mitigated by this sort of transformation.
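A transformation of this kind is naturally expressed as a small style sheet. The sketch below is illustrative only: the spacing values, the selector, and the function names are assumptions rather than the settings the product uses, and `xml.dom.minidom` again stands in for the browser's document tree.

```python
from xml.dom import minidom


def readable_text_css(letter_spacing_em=0.1, line_height=1.8,
                      selector="body, body *"):
    """Build CSS that asks for a sans-serif font and looser spacing.

    The default values are arbitrary illustrations; a real tool would
    let the user tune them while watching the page.
    """
    return (
        f"{selector} {{\n"
        f"  font-family: sans-serif !important;\n"
        f"  letter-spacing: {letter_spacing_em}em !important;\n"
        f"  line-height: {line_height} !important;\n"
        f"}}\n"
    )


def add_style_sheet(doc, css):
    """Append a <style> element to the end of the document's <head>."""
    head = doc.getElementsByTagName("head")[0]
    style = doc.createElement("style")
    style.setAttribute("type", "text/css")
    style.appendChild(doc.createTextNode(css))
    head.appendChild(style)  # placed last, so it follows the page's own sheets
    return style


page = minidom.parseString(
    "<html><head><title>t</title></head><body><p>Hello</p></body></html>"
)
add_style_sheet(page, readable_text_css(letter_spacing_em=0.15))
print(page.getElementsByTagName("style")[0].firstChild.data)
```

Marking the rules `!important` is one way to outrank the page's own declarations, though a page that uses `!important` itself can still win.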

The "large browser" option, which is toward the bottom of the list, points up another problem. It is not just the page that may need magnification, but the browser controls (menus, toolbars, cursor, scrollbar) as well, both for legibility, and for ease of selection by those with problems manipulating a pointing device like the mouse.


The action is all in the client

I want to turn now to the theme of why we work with the browser and do not use a proxy server.

Those familiar with the various browsers and operating systems will have recognized that much of what we offer to adjust are things that are more or less directly available as browser or operating system settings. Our contribution, in this case, is to make these facilities visible and readily usable. This is not a small matter.

Convenience is critical. Changing the default background and foreground colors requires a monumental number of clicks: 80 by one count. Even if it were true that only a dozen were needed and that at each stage in this process, the next "click" to make were clear, who would "just try it?" And how many of those courageous enough to have done so once would do so a second time? Surely no one who just wants to clean up one particular page!

It is not that easy to implement. There are a variety of places where the information is kept, and a variety of formats are used, as well. In addition, these conventions vary from browser to browser and operating system to operating system. Some adjustments are done by inserting style sheets or scripting into the source, some by updating plain text files such as ".ini" files or ".js" files, and yet others by altering values in operating-system-maintained "registries," which are proprietary databases for tracking user state. Worse: sometimes one has to work with a combination of these resources to get the desired effect.

No one would have suggested that facilities that are already implemented on every client be moved to a server. Consider, though, a sort of "halfway case," where what is needed is to introduce a style sheet in front of the source before the browser tries to render the display. One example would be replacing fonts in the document with easier to read ("sans-serif," normally) fonts. That would seem a natural task for a proxy server. Yet it is not. For one thing, styles already present in the source document may interact with the inserted sheet in undesirable ways. It would therefore be necessary to examine the entire source to clean up the local styles. Again, one might naively think that this is a good job for a server, particularly because a significant amount of processing is involved. In practice, as we know from having tried this approach, this is not so. There are at least three problems.

It is increasingly the case that the content rendered is generated dynamically by scripts executed on the client by the browser. The server must mimic that functionality, which, since the various browsers do not even do the same things with the same scripts, is an error-prone task at best.

A horrifying percentage (about half, in our experience) of Web pages have either incorrect HTML or scripts or both. For the server to deal with these errors, it must react to them in a way consistent with how the various browsers will. In most cases, this is easy enough, but in many it is not, and once again, we are left with an error-prone solution.

Finally, the server must be aware of settings on the client that may cause the browser to alter a page at the client.

The bottom line here is simple: render unto the browser that which is the browser's. Let the browser process the source however it will, and then operate on the resulting internal data structures, the Document Object Model (DOM). It is the DOM, after all, that will be directly displayed by the browser. You are therefore guaranteed that, whatever browser you are working with, and whatever errors the source might have contained, you are transforming what is actually going to be displayed. The programming interfaces provided by the major browsers, while not perfect, are essentially adequate for this purpose and are far less error-prone to use than going it on one's own at the server.
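As a rough illustration of working on the parsed tree rather than on the source text, the sketch below removes local font choices that could fight with an inserted style sheet. It is a sketch under assumptions, not the product's code: `xml.dom.minidom` stands in for the browser's DOM, the input is assumed to be well-formed with lowercase names, the function names are hypothetical, and splitting a `style` attribute on semicolons is deliberately naive.

```python
from xml.dom import minidom


def without_font_family(style_text):
    """Drop font-family declarations from an inline style string.

    Naive on purpose: values that themselves contain ';' and the 'font'
    shorthand property are not handled.
    """
    kept = []
    for declaration in style_text.split(";"):
        name, colon, value = declaration.partition(":")
        if colon and name.strip().lower() != "font-family":
            kept.append(f"{name.strip()}: {value.strip()}")
    return "; ".join(kept)


def strip_local_fonts(doc):
    """Remove per-element font choices from an already parsed document."""
    for element in doc.getElementsByTagName("*"):
        if element.hasAttribute("style"):
            cleaned = without_font_family(element.getAttribute("style"))
            if cleaned:
                element.setAttribute("style", cleaned)
            else:
                element.removeAttribute("style")
        if element.tagName == "font" and element.hasAttribute("face"):
            element.removeAttribute("face")
    return doc


page = minidom.parseString(
    '<body><p style="font-family: Times; color: red">x</p>'
    '<font face="Courier">y</font></body>'
)
print(strip_local_fonts(page).documentElement.toxml())
```

Because the function sees the tree the parser produced, it never has to guess how a browser would have repaired malformed source; in a real browser the same walk would run over the live DOM after scripts had finished altering it.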

Another sort of problem is posed by the need for our software to be aware of the user's mouse use and key clicks. (When banner or spoken text is requested, for example, our code has to react to the mouse "hovering" over some section of the document, so that we can magnify that text or read it aloud.) This can be accomplished at the server by adding scripting to the source, but such an approach is dangerous, because one has to be very careful that one's own scripts do not interfere with, or interact in unsuspected ways with, similar scripting inserted by the page's authors. And remember that the source into which one is inserting one's script may itself have to be generated at the server from scripts in the original. The programming interface provided for the DOM is a much safer and easier way to accomplish the same task.

There are two other transformations that are easier to handle by working with the DOM.

Linearization: replacing rectangular frame and table layouts with a single-column layout. The DOM may be thought of as a tree in the sense of graph theory, and the transformation is a simple reconfiguration of that tree.

Breaking up long text blocks: Long blocks of text that contain no HTML markup are a problem for us. (This occurs mostly in blogs, technical reports like this one, and magazines.) The programming interfaces we have do not give us good information about where in such a block of text the mouse might be. That is a problem for banner and spoken text delivery, because we want to process what is near the mouse. We supply the necessary information ourselves by inserting HTML markup (<SPAN> elements) to break up the text without affecting the final display.
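The second transformation, breaking a long text block into <SPAN> elements, can be sketched as follows. This is an illustration under assumptions, not the project's code: `xml.dom.minidom` stands in for the browser's DOM, the phrase boundaries are a crude split on sentence punctuation, and the names `PHRASE`, `SKIP`, and `wrap_phrases` as well as the length threshold are hypothetical. Since a span is an inline element with no styling of its own, the rendered text is unchanged, but each phrase becomes an element that a hover event can identify.

```python
import re
from xml.dom import minidom

# A phrase runs up to and including sentence punctuation and the whitespace
# after it; the second alternative picks up a trailing unpunctuated remainder.
PHRASE = re.compile(r"[^.!?]*[.!?]+\s*|[^.!?]+$")
SKIP = {"script", "style"}  # text here is code, not prose


def text_nodes(node):
    """Yield the text nodes below `node`, skipping script and style content."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        elif child.nodeType == child.ELEMENT_NODE and child.tagName not in SKIP:
            yield from text_nodes(child)


def wrap_phrases(doc, min_length=80):
    """Replace each long text node with one <span> per phrase."""
    # Collect first, then modify, so the walk is not disturbed by the edits.
    for text in list(text_nodes(doc.documentElement)):
        if len(text.data) < min_length:
            continue
        parent = text.parentNode
        for phrase in PHRASE.findall(text.data):
            span = doc.createElement("span")
            span.appendChild(doc.createTextNode(phrase))
            parent.insertBefore(span, text)
        parent.removeChild(text)
    return doc


page = minidom.parseString(
    "<p>First sentence here. Second one follows! And a third</p>"
)
wrap_phrases(page, min_length=10)
print(page.documentElement.toxml())
```

The pattern is chosen so that the phrases, joined back together, reproduce the original text exactly; it will split in odd places, such as inside decimal numbers, which is harmless for display but would matter for reading aloud.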

Even if we were able to eliminate the errors and minimize the processing load on the server, proxy servers would still present problems.

Copyrights and fair use: It is one thing to say that proxies may legally transform and cache pages if the sole purpose is to help "the disabled". It is quite another to stick to that argument if "the disabled" is just about everyone at some time or other.

Special image types: There are ticklish issues with some medical and other scientific images, where the authors may rightly not want any transformations done, because of the possibility of misinterpretation. For example, "improving" an image may make it seem misleadingly precise. By pushing the problem onto the client, we are at least in a situation where the person viewing the image is more likely to be aware that it has been altered at his/her request. Preventing transformation altogether is equally difficult in either venue: we need a better protocol.

Secure transmissions: No service provider is going to want to be responsible for having a customer's "secure" transmission available in the clear on its hardware, no matter for how short a time. Therefore transforming a returned page to be more easily readable has to happen at the client. Thin proxies are available that run on the client, but then you are working on the client anyway, so why bother?

Proxies are not meant to cascade: The design of proxy servers was never meant to make it easy to interpose a whole series of proxies between the end-user and the content server. Since many potential users are already behind proxies set up either by their internet service providers or corporate intranet, it is at best inconvenient to interpose another layer of proxy.


Where from here?

We feel that we have achieved a reasonable mix of features in view of the cost and complexity of the software as it now stands. There remain serious problems with the robustness of the transformations for a variety of reasons, including the complexity of pages, the unpredictable interactions of one part of a page with another, and errors (most minor, but errors nonetheless) in the browser implementations themselves. We, of course, will continue to improve on this score. With the increasing use of new protocols, particularly the increasing use of Macromedia's Flash, whole new sets of problems present themselves. We do not feel we have enough experience yet to have a clear view of what is needed there, if anything. One thing we do know: we ourselves will have to read those new pages and we will definitely be alert to what gives us trouble!


Appendix: The Contributors

Two groups in IBM Research put together the Web Adaptation Technology product. The originator and lead of the project was Vicki Hanson. The members of her group, which specializes in education and accessibility, were Susan Crayne, Beth Tibbitts, and Sharon Trewin. The other half of the team was part of a group led by John Richards, and consisted of John himself, Calvin Swart, and myself. This group's normal responsibility is the design of user interfaces for client-server applications, with a particular interest in "unusual" clients, such as hand-held devices.

We all work at the T. J. Watson Research Center in Hawthorne, New York.

 


Last updated: 20.11.2007    © Copyright reserved