
Coactive Television – Interactive TV, the Web, and "Co-vergence"

A User and Content-Centric Re-visioning

A White Paper by
Richard R. Reisman, President, Teleshuttle Corporation
September, 2002


Part 1
Concept Summary
The need for coactivity

Part 2
Enabling coactivity
Building on coactivity
Background and call to action

Part 1 of this White Paper is provided below

The complete White Paper with Part 2 (originally available only on request), is here

Concept Summary

  1. People want to interact with content – devices matter only as the interfaces to content.
  2. Viewers most often want to interact with content related to what is on TV, not with what is on TV.
  3. Sometimes you feel like leaning back, sometimes you don’t.
  4. Technology should empower, not confine.

Current efforts at interactive television (ITV) have been crippled by simple misunderstandings and failures of the imagination.

One of the key misunderstandings relates to what interactive TV is. The skeptics are partly right, that TV is largely a passive medium, and interactive TV is something of an oxymoron. But they are wrong in missing the point that many viewers often do want to interact with content that is related to what is on their TV. The term "coactive TV" directs our thinking to this experience of media multitasking, to how we can interact with Web-like hypermedia content in full coordination with what we watch on TV. The technology challenge of interacting with our TV is a red herring--a problem we need not solve. The problem we do need to solve is how we can create this coactive media experience. That is actually less daunting, and largely a matter of software.

Victims of this misunderstanding, TV and PC industry players have struggled to redefine and extend their offerings to add TV-related interactivity. These efforts have failed to capture the imagination of viewers because the players have battled over their boxes instead of thinking outside of them. TV and computers are converging into a common core technology, but we still need to live with the fact that they will not converge into a single presentation device, at least not any time soon.

To provide a powerful user-centric interactive TV experience, we must look outside our current boxes to "co-vergence" -- TV and PC device sets must work together in seamless coordination -- to provide a powerful and unified interactive experience (see Figure 1). When we shift to that perspective, it becomes apparent that a rich second-generation ITV service is achievable now, with simple software, using existing TV and PC systems. It will get even better and more flexible with more advanced hardware, but the rich coordination needed to make ITV available and useful can be done with the equipment that tens of millions of people have in their homes right now.

[Figure 1: CoTV.gif]

This paper explores the ideas of coactivity and co-vergence, what they enable now, and how they will evolve. It centers on the perspective of the user, and shows how serving the user supports the profit and service objectives of the key industry players.

The key points are:

The first generation of ITV has been hobbled by misunderstanding, resulting in the inability of either TV or PC industry players to find a value proposition that serves both themselves and the consumer. CoTV™ provides a simple new context for a win-win collaboration that allows all the players to cooperate in bringing value to the user, and to share in profiting from the powerful new services and related commerce opportunities that it will create. It is a collaboration that can bear fruit now, and grow rapidly to embrace a larger market that will demand increasingly advanced products and services.


What is interactive TV?

Ask three media people what interactive TV is and you will get at least four answers. The term has been a catch-all. Few have really been clear about the fact that there are three very different kinds of interactivity.

The proponents of ITV and the nay-sayers are both partially correct, but both seem to miss the opportunity that coactivity offers. The rest of this paper is oriented to re-visioning the coactive TV part of this misunderstood beast: what it can be, and how we can enable it.

Chickens, eggs, and set-top boxes

At the height of the digital media boom of the late ‘90s, it appeared that the long-heralded convergence of TV and computer-based media was within reach. Interactivity and "t-commerce" were seen as gold mines, and the TV industry was preparing to deploy a new generation of advanced digital STBs that were to enable this new age of interactive TV. Much of the cable plant had been upgraded for two-way digital services, and first generation digital STBs were deployed to some 15 million homes, but those boxes lacked the computer power for more than the most basic interactivity. Another round of investment was needed.

As the cold light of skepticism and tight capital replaced these rosy predictions, the cable industry pulled back on plans to deploy advanced STBs. We now face what is often called "the chicken and egg problem" -- which comes first? How can we justify advanced STBs without demonstrable demand for advanced services, and how can there be demonstrable demand if there are no advanced STBs to enable consumers to try and buy services they do not yet understand?

Given this current impasse, the industry is trying to find other services that can generate revenues with current digital STBs now, and begin to offer a taste of interactivity to prepare the market for more. These include interactive services that are structured to draw on intelligence at remote head-end servers to support simple t-commerce, games, and such, and other limited forms of interactivity such as video on demand (VOD). Without thinking in these terms, the industry came to understand that interactivity with the TV set was valuable and readily enabled, and that has become the primary focus for now.

One-box, Two-box, Inter-box, Any-box

But the call of interactivity with content still beckons. Some programmers have sought to avoid this problem of the STB by moving these interactions to the Web. This potentially serves the roughly 50 million households that have a PC in the same room as the TV. For selected shows such as Jeopardy or Monday Night Football, a viewer can go to a Web URL for the program. Upon logging in and indicating the time zone, Web pages can then be provided in synchrony with the TV program. This "two-box," "two-screen" ITV scheme, often called Enhanced or Extended TV (ETV), and also known as "synchronized TV" or "telewebbing," offers considerable flexibility by being able to provide any kind of Web content with the full interactive user interface of a PC browser. However, the coordination with the TV is very limited and must be established by the viewer every time a program is tuned in. The Web enhancements are synchronized globally (by time zone) and thus are not applicable to VOD or time-shifted programming, or to other individual variations such as addressable ads.
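The time-zone-based synchronization described above can be illustrated with a minimal sketch. It is not taken from any actual ETV service: the schedule, the page names, and the function `page_for_wall_clock` are all hypothetical, and the only assumption carried over from the text is that the Web side knows the local air time and nothing else about what the viewer is doing.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical enhancement schedule: (offset into the program, Web page to show).
ENHANCEMENTS = [
    (timedelta(minutes=0), "intro.html"),
    (timedelta(minutes=5), "round1.html"),
    (timedelta(minutes=15), "round2.html"),
]


def page_for_wall_clock(now, local_air_time):
    """Pick the page from the wall clock alone, as a time-zone-based scheme must.

    The Web side never sees the viewer's actual playback position, so it
    assumes the program is being watched live in the stated time zone.
    """
    elapsed = now - local_air_time
    if elapsed < timedelta(0):
        return None  # the program has not started in this time zone yet
    current = None
    for offset, page in ENHANCEMENTS:
        if offset <= elapsed:
            current = page  # keep the latest page whose offset has passed
    return current


# Example: a viewer in a UTC-5 time zone, seven minutes after the local air time.
eastern = timezone(timedelta(hours=-5))
air_time = datetime(2002, 9, 16, 19, 0, tzinfo=eastern)
page = page_for_wall_clock(air_time + timedelta(minutes=7), air_time)
```

Because the lookup depends only on the clock, a viewer who pauses or time-shifts the program would still be served pages for the live position, which is the limitation noted above.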

The "one-box," "one-screen" solution of interactivity on the TV screen and driven by the STB is far more seamlessly coordinated, since the interactive elements are presented by a browser within the STB that is fully aware of the program content and any interactive elements are tightly associated with that. There is no URL to enter. If a program has interactive enhancements, they are at hand and ready to interact with. If the viewer changes channels, the STB knows it, sees the corresponding interactive elements, and interaction is directly coordinated. If a Digital Video Recorder is used to pause, rewind, or defer viewing, the interactive elements can retain their synchronization. Current two-box technology cannot accommodate any such variations. For reasons such as these, much of the industry regards "two-box" solutions as just a stopgap, one that may be more or less useful to get limited interactivity while we wait for an effective "one-box" solution. The industry seems to have settled into the camps of those cautiously exploiting the two-box stopgap and those who ignore it as a temporary distraction.
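By contrast, coordination that follows the program itself can be sketched as a lookup keyed on the program and the playback position rather than on the clock. This is only an illustration under assumed names: the `CUES` table, the program IDs, and `enhancement_at` are hypothetical and do not describe any particular STB software.

```python
from bisect import bisect_right

# Hypothetical cue lists: program ID -> (position in seconds, enhancement name),
# sorted by position.
CUES = {
    "quiz-show": [(0, "intro"), (300, "round1"), (900, "round2")],
    "football": [(0, "lineups"), (600, "stats")],
}


def enhancement_at(program_id, position_seconds):
    """Return the enhancement that applies at a playback position.

    The lookup uses the position within the program, so pausing, rewinding,
    or deferred viewing all resolve to the matching enhancement.
    """
    cues = CUES.get(program_id, [])
    positions = [position for position, _ in cues]
    index = bisect_right(positions, position_seconds)
    if index == 0:
        return None  # unknown program, or position before the first cue
    return cues[index - 1][1]


# Changing channels simply changes the program ID used for the lookup.
watching = enhancement_at("quiz-show", 420)
after_channel_change = enhancement_at("football", 30)
```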

This is where the industry imagination has fallen short, and the user perspective has been missed. In a simplistic view, one-box ITV is convenient to the user, and two-box reliance on a separate PC is awkward. What this simplistic view forgets is that

The user does not care what comes from the cable or satellite or broadcast TV source or what comes from the Internet. The user does not care how the boxes might be connected and coordinated. All he cares about is the functionality of his viewing experience. Once the user becomes familiar with ITV and the various kinds of tasks he can do, he will want to be able to decide which box to interact with and when, and will want to be able to use any intelligent box he has and deems useful ("any-box"). He will want his boxes to be coordinated so that what he does at one box can be reflected at the other box. He will want to decide when to work on one box, such as for a few quick interactions, and when to shift the activity to another box, such as for more intensive interaction ("inter-box"). By failing to recognize this as coactive TV, the nature of this multi-tasking coordination has been missed.

This any-box/inter-box coordination of ITV, this coactive TV, may sound like idealistic futurism, but it is not difficult to do. It can be made available not in years, but in months. Much of this capability can be achieved with no new hardware in the home, just by downloading some software and providing some simple server support, at a relatively modest investment. More advanced coordination will come as home networks make the linkage even easier.
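As a rough illustration of what "simple server support" for inter-box coordination might look like, the sketch below uses an in-memory relay through which one box reports viewing state and hands an activity to another box. It is an assumption made for illustration, not the method described in Part 2 of the paper: the class `SessionRelay`, its method names, and the household and box identifiers are all hypothetical.

```python
class SessionRelay:
    """Hypothetical in-memory stand-in for a coordination server."""

    def __init__(self):
        self._state = {}  # household ID -> latest reported viewing state
        self._inbox = {}  # (household ID, box name) -> pending handed-off items

    def report_state(self, household, program_id, position_seconds):
        """Called by whichever box is presenting the TV program."""
        self._state[household] = {
            "program_id": program_id,
            "position_seconds": position_seconds,
        }

    def current_state(self, household):
        """Called by any other box that wants to follow the program."""
        return self._state.get(household)

    def transfer(self, household, to_box, item):
        """Hand an activity (for example, a link) from one box to another."""
        self._inbox.setdefault((household, to_box), []).append(item)

    def collect(self, household, box):
        """Fetch and clear the items handed to this box."""
        return self._inbox.pop((household, box), [])


relay = SessionRelay()
relay.report_state("home-1", "quiz-show", 420)      # TV-side box reports state
relay.transfer("home-1", "pc", "round1-details")    # viewer shifts a task to the PC
pc_view = relay.current_state("home-1")             # PC learns what is on TV
pc_items = relay.collect("home-1", "pc")            # PC picks up the handed-off task
```

A real service would need networking, authentication, and persistence; the sketch only shows the shape of the shared state that lets what is done at one box be reflected at another.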

The need for coactivity

Sometimes you feel like leaning-back; sometimes you don’t

Before outlining the technical aspects of just how this coordination can be achieved, it may be helpful to look more deeply at why this capability is so important to enabling really powerful and user-friendly ITV and other similarly advanced interactive hypermedia. The short-term chicken and egg problem with one-box ITV set-tops is well recognized in the industry, but the deeper limitations of one-box TV interactivity are not.

Most people became familiar with interactive media when the Web became popular in the mid- to late-‘90s. We learned the wide range of tasks, services, and communications that can be facilitated using a fairly simple and powerful Web browser user interface, combined with the open linking of content and services on the Internet. We became familiar with high-resolution computer monitors, keyboards, and mice or similar pointing devices. Few of us remember the early days of interactive services, starting decades ago with much cruder screens and pointing devices like cursors and arrow keys. Services like videotex, teletex, and the early Prodigy service. Clunky text, displayed fuzzily on TV screens using Commodore 64s and Apple II’s. Tabbing line-by-line through menus. The reason few remember that is that few were willing to use such awkward and restricted systems. Now we are being invited back to those glorious days of yesteryear. It is déjà vu on the interactive TV screen.

Forgetting the constraints technology may or may not put on how we do it, consider the essential human experience of ITV and related media browsing activities. We work with various input/output devices, having various form factors, that let us see and hear video, audio, text, and other kinds of data as outputs, and to point, select, command, and enter text or other data as inputs. Let’s call the group of input/output (I/O) devices we work with on any given task a device set (see Figure 1). What is important from this perspective is not the STB or the PC or any other controller system, but the fact that these device sets, however they are driven, serve as the user’s window into a virtual world of media content, wherever it may be.

In thinking about user interfaces and the device sets we need to embody them, it becomes apparent that these differences in lean-back and lean-forward device sets are fundamental to the kind of user interface device technologies we can expect to rely on for many years to come. Someday we may have heads-up displays, head-mounted goggles, and data gloves or even bionic interfaces that can present images of any format and sense rich gestures from any posture, but that day is probably many years off, at least for most purposes. High definition (HDTV), big-screen TV monitors and wireless keyboards could make working with text across a room a bit more workable, but that will never beat a lean-forward screen for extended intensive use. Similarly, we can get HDTV on an up-close PC monitor (or a second such monitor), but that will rarely be the ideal way to kick back and relax with a movie.

ITV and the spectrum of interactivity

But wait, some would say – lean-back interactivity may be limiting, but isn’t ITV a matter of simple interactions that do work on the TV-type device set? Aren’t current ITV services carefully designed to use a few short menus, minimize text entry, and allow useful tasks to be completed with a few clicks? Doesn’t that mean the one-box solution will be just fine?

Only in part. Current ITV services seem to be repeating the early evolution of computer interactive services. Simple, limited UIs favor simple, limited services. More complex services can be attempted, but do not work well. Some limited market develops for the simple services, however slowly. But as soon as a better UI is available, the floodgates open, and the latent demand for richer services causes a quantum leap in the richness and variety of services offered. (AOL and CompuServe WinCIM made one such step, and Netscape made another.) Simple services become more polished and complex services are no longer hobbled. Online environments that could not keep pace dropped away, as those that did flourished.

Current one-screen ITV functions much like the early ‘90s pre-Web version of Prodigy, with simple walled garden services not much richer than the simple videotex systems of the ‘70s (Prodigy used the same videotex presentation technology, called NAPLPS). We are so inured to the current limitations that few look ahead to what ITV could become with an effective suite of UI devices. Just as the Web quickly took on a depth and richness and sophistication far beyond Prodigy, ITV can evolve to have a similar richness and power.

WebTV unwittingly demonstrated the power of a good interactive user interface by doing the reverse adaptation, degrading the Web to work on a TV screen. The inherent limitations of that adaptation are recognized as a major cause of WebTV’s failure. But cable players still talk of similar Web and PC services driven by their STBs. When all you have is a hammer, you can use it to drive a screw, but that will not be very satisfying if you have ever used a screwdriver.

Similarly, this simplistic form of TV-related interaction limits the vast potential of rich and open enhancement content. The Web moved online interactivity beyond simple and limited sets of screens with a narrowly programmed content theme, to deliver flexible, user-controlled linkage to a vast variety of content that could follow tangent upon tangent, giving us the concept of Web surfing (including both its casual and serious forms). ITV can extend that evolution into coactivity with a similarly open (and far more rich) hypermedia space of content related to an evolving, personally shaped video experience.

As an aside on industry dynamics, TV distributors, programmers, and advertisers may be concerned that greater richness will cause a leakage of their viewers away from the content they control and derive revenue from. However, the upside of this can bring even greater potential to attract, keep, and profit from viewers who find the base TV program to form the center of a truly compelling multimedia experience. ITV will have limited appeal as long as it is confined to the trivial: simple trivia contests, play-alongs, and spoon-fed content. The TV experience becomes far richer and more valuable, and can build enthusiastic community bonds, when it integrates with deep and rich content and services. Similarly, advertising can go beyond simplistic responses to offer a fully coordinated suite of rich interactive transactions, demos, tools, configurators, and infomercials (VOD or streamed). Rich opportunities for cross promotion, product placement, and other forms of integrated marketing will emerge as such media develop. In fully open form, such co-vergent services could offer a richness of content greater than any programmer can provide alone. (In deference to the continuity of the viewing experience, such coactive, multi-tasking tangents could also be deferred and held, much like bookmarks, to be pursued after the main program completes.) Nevertheless, the expanded capabilities of the CoTV™ approach proposed here have compelling value even if restricted to a closed, walled-garden service (much as some current Web enhanced TV services such as Disney’s ETV are closed).

The right tool for the job, as the job changes

The right tool for the job depends on what the job is, at any given time. ITV and Web interactive experiences are not inherently different, but lean toward different ends of a single spectrum of interactivity. ITV (and other advanced hypermedia) marries video to the Web (or more limited Web-like walled gardens) to range over the full spectrum of activities from pure lean-back, passive viewing to pure lean-forward, intensive interaction. That interaction may include video (or text) across the room, or in a window on a PC screen.

The job, and thus the right tool, varies over time. TV viewing, browsing, and the combined ITV experience are best understood as an interactive "session" having a sequence of steps. The nature of a session changes over time. Tangential sub-sessions can fork off from an initial session. Complex patterns of multi-tasking can occur, especially as a user becomes more skilled and empowered by rich content presented with powerful viewing tools. Some viewers will remain content with simple sessions, but an increasingly large population of "heavy media users" can be expected to exploit whatever power tools are provided, if they are flexible and well designed.

What is important from these examples is that the determining factor is not whether the activity is "TV" or "Web" or "computer" activity, but where on the spectrum of interactivity it lies (and how tightly it is coupled to the primary program). ITV can be lightly interactive or intensively so. The same applies to Web-based activities and other computer-driven media activities.

Some limited recognition of the need for complementary device sets has been creeping into the visions of the TV and PC industries. Evolution of the home media gateway brings convergence a step closer to the user, moving toward a technology base of common networks, common controls, and shared content access for an entire suite of home media systems, including TV, home theater, music, digital photos, PCs, and the Web. Microsoft has recognized that a lean-back user interface for a computer acting as a media system controller is desirable, and is adding a new lean-back "Freestyle" UI to Windows XP. Motorola has announced a lean-forward Webpad-style device for use with TV, called the EVr Enhanced TV Viewer, and Philips, Universal Electronics, eRemote, and others have proposed screen-equipped remote controls, including versions based on a standard PDA. These devices do not include support for real coordination as described below. These dedicated devices also present another chicken and egg problem: we can’t see the need for buying a dedicated device if we can’t really see what it could do for us.

User-centric Co-vergence and Any-box flexibility

The emergence of "two-box" ITV using a PC, and of specialized Webpad crossover devices, reflects some basic recognition of the limitations of the "one-box" TV form factor, but none of these efforts has addressed the high level of flexibility required for a truly effective user-centric viewing experience.

Clearly one tool is not right for all ITV tasks. Multi-function devices are well known to have the common problem of doing none of their functions well. But even a set of specialized tools will be awkward to use if it is not easy to shift from one tool to another as the nature of the task changes during the course of a session. Adding more expensive devices to the cost of an ITV viewing system would just bring us back to the chicken and egg problem. What is needed to break the provisioning impasse is a way to use available devices that are at hand, and need not be obtained specifically for use with ITV. (Dedicated ITV devices might find a market later, when users know exactly what they want in such a device and how much they will use it, and can justify paying for a particularly well-suited model to keep by the couch.)

We can view this in terms of a concept that might be called a Multi-Machine User Interface (MMUI). The broad requirement is to be able to use multiple co-vergent device sets in concert to view a single integrated set of content resources. That capability is provided by an MMUI. Such an MMUI could be controlled by a single integrated box that drives a varied suite of lean-forward and lean-back device sets -- that could be a future convergent STB or computer. The problem is that such a convergent box is a long time off, and such a multi-function system may never be economically attractive. So a key technology question is how simpler, less costly, independent controller boxes (STBs and PCs/PDAs) can be made to work in cooperation to provide such an MMUI.
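One small piece of an MMUI, choosing which available device set should host a given activity, can be sketched as follows. This is an illustrative assumption rather than the paper's design: the `DeviceSet` type, the 0.0 to 1.0 intensity scale, the threshold, and `choose_device_set` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceSet:
    name: str
    posture: str  # "lean-back" or "lean-forward"


def choose_device_set(task_intensity, available, threshold=0.5):
    """Suggest a device set for a task on the spectrum of interactivity.

    task_intensity is an assumed scale from 0.0 (passive viewing) to
    1.0 (intensive interaction). If no device set with the preferred posture
    is at hand, fall back to whatever is available ("any-box").
    """
    preferred = "lean-forward" if task_intensity >= threshold else "lean-back"
    for device_set in available:
        if device_set.posture == preferred:
            return device_set
    return available[0] if available else None


tv = DeviceSet("living-room TV and remote", "lean-back")
pc = DeviceSet("PC with keyboard and mouse", "lean-forward")

quick_vote = choose_device_set(0.1, [tv, pc])       # a few quick interactions
deep_research = choose_device_set(0.9, [tv, pc])    # more intensive interaction
tv_only_home = choose_device_set(0.9, [tv])         # falls back to the TV
```

In keeping with the user-centric argument above, such a choice would be a default suggestion only; the user would remain free to decide which box to interact with and when.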


The complete White Paper with Part 2 (originally available only on request), is here


The final concluding portions of Part 2 are included below

Conclusion: Technology should empower us, not confine us

Making coactive TV and other similar advanced hypermedia happen is a major step in the evolution of media, but in terms of technology, it is an incremental one.

  • It does involve changes in user awareness and behavior, new ways of combining and relating content of diverse kinds, and new combinations of distribution, hardware, and software. But it is largely a matter of extension and convergence of current media and systems, not addition of unfamiliar new ones. This becomes clear when we distinguish the key variants that have been confused within the catch-all term "interactive TV."
  • All of the pieces we need to make it happen are at hand, or readily provided, and most of the infrastructure is well known. The costs to begin the introduction of next generation coactive TV services are modest, and can expand incrementally with gradual build-out of infrastructure and content production as demand grows.

Technology convergence is necessary, but not sufficient. ITV is not a matter of boxes, whether TV industry boxes or computer industry boxes. ITV is based on interaction by users, and must deliver what the user wants from a hypermedia experience. Users and content producers have not been fully aware of what they want, because they have seen so little of a real hypermedia experience, and have failed to see the importance of coactivity. But by stepping back and imagining the ideal hypermedia experience, it becomes clear that it mixes lean-back and lean-forward sub-tasks -- often with distinct, but related content elements -- that can be seamlessly interwoven to work with a rich blend of video, text, and other content. Supporting those sub-tasks is best done with different kinds of device sets working in coordination to present a unified media experience.

This coordination of lean-back and lean-forward device sets stops short of complete convergence, and takes a form we are calling co-vergence. Television and the Web are currently at different ends of a media spectrum. The bands of the spectrum should not be expected to converge into one thing, but to play out as a palette of colors, with different combinations at different times. We need to recognize the nature of multi-tasked coactivity and coactive content. We should not be confined within standalone boxes that can only deal with one band of the spectrum. Sometimes you feel like leaning back, sometimes you don’t.

Set-top boxes and computers are not in a winner-take-all battle of the boxes -- neither can be all things to all people. The sooner we start pulling in the same direction, the sooner a new age of interactivity can be realized and profit all of us.

When Ted Nelson coined the terms hypertext and hypermedia many years ago, he observed that, "everything is deeply intertwingled." He saw that the core task for information and media technology was to facilitate deep interconnection. It is time for us to take another step in that direction, and do what it takes to deal with coactive media.

Background and call to action

Richard Reisman, President of Teleshuttle Corporation, has developed these ideas of ITV, coactivity, and hypermedia co-vergence over a number of years, and has applied for fundamental patents on key methods that enable it.

Working through Teleshuttle Corporation, he seeks to cooperate with all participants in the industry to apply and extend these methods, assist in the development of reference designs and relevant standards, and to license this technology broadly for widespread use.

Feedback on these ideas is invited. How can we bring key players together and jump-start the revolution? What are the killer apps? Selected contributions will be posted at the Web site.

Richard Reisman can be reached at



Copyright 2002, Teleshuttle Corp. All rights reserved. / Patent pending