
09.08.2007
Researchers are finders of useful bits of information. That’s a central part of our job. It’s a job made ever more complicated as digital information supplants printed documentation. It isn’t just the growing volume of information that makes finding things more difficult, although that’s a major part of the challenge. The biggest difficulty stems from the many forms that information now takes: (a) countless document formats, many of which can only be interpreted by your computer if you have the right software installed; (b) webs of documents and document fragments linked together with hypertext, often haphazardly; (c) electronic databases and repositories, many of which cannot be accessed without paying a fee or getting special permission; (d) digital videos, audio recordings, and pictures, few of which are adequately labeled; (e) identification tags and chips that require special scanners to read; and (f) information applications and appliances, such as digital maps and geo-location devices, that are only useful once you learn how to interact with them. There is no straightforward way to organize all this material and make it easy to track down.
Even the most adept research ninja regularly encounters difficulties when trying to find that crucial bit of well-documented information within the giant, ever-expanding pile. Sadly, most people who try to find information on a regular basis know very little research ninjutsu. They put their faith in a few decent search tools. Unfortunately, a well-designed Web search engine such as Google only catalogues a small minority of the documents that exist on the Web; and although the Web is very large, it contains only a tiny portion of the stockpile of documented information that is available publicly. As a result, most people use the information that is closest at hand, regardless of whether it’s suited to the task. And that drives researchers like me to despair.
The field of information architecture aspires to overcome these difficulties by making documented information more useful by design. The field encompasses several age-old disciplines, notably library science and records management. It also includes relatively newfangled pursuits dedicated to structuring information and making it easier to use: Web-site design, information-services design, and the study of human-computer interaction. The new label is somewhat apropos because the decisions made by information architects (or their less capable stand-ins) shape the way we interact with documented information.
In recent years, information architects have become less inclined to speak in terms of information storage, search, and retrieval. There is a recognition that information isn’t so much stored in well-catalogued repositories as it is scattered hither and thither. More importantly, the change of terminology is taking place because we don’t just look for information: sometimes the information comes looking for us, such as with advertising and so-called “push” news services. Thus, the operative term is findability. If something is findable, it is easy to discover, locate, or navigate. In an ideal world, the information we need the most is very findable, even in those instances when we don’t know that we need a particular piece of information in the first place.
This is big business because every peddler and soap-box pundit wants to be findable as they attempt to grab attention amid the info-glut. Good findability also creates efficiencies because less time is wasted looking for the right information. If you accept that, in our current era, those with the right information have a commercial or political advantage, then the notion of findability is probably worth paying attention to.

Ambient Findability by Peter Morville (Sebastopol, 2005), pp. xiv, 188.
Enter Peter Morville, who endeavors to take the notion of findability up to the next level with his book Ambient Findability. If something is ambient, it surrounds or envelops us in a seemingly natural, unobtrusive way. It follows that if information is ambient-findable, then it can be accessed wherever and whenever we want, without much effort. This is an ideal that seems unachievable for the vast majority of documented information. That said, many commonplace types of information can be made ambient-findable. For example, if you want to check sports scores and you have a Web-enabled mobile phone, those scores are only a few keypad clicks away regardless of where you are. This example presumes that the features of your phone are not annoyingly difficult to use, your mobile-phone-service coverage extends to everywhere you travel, and the sports-news service is already bookmarked in the phone’s Web browser. In Morville’s work as a consultant, he tries to help others get all of their ducks aligned so that similar levels of ambient-findability are achieved.
The problem is that, as Morville stresses, information is inherently difficult to organise and find. In real space, we find our way around an unfamiliar place with the help of maps, compasses, visual clues (e.g., landmarks), and so forth. In Cyberspace, such navigational aids are missing for the most part. Indeed, although geographical, trajectory, and container metaphors are used extensively on the Internet, they are usually unhelpful—mostly because they are used carelessly.
The ambiguity of language is just as much of a problem. As Morville writes: “Our language bubbles with synonyms, homonyms, acronyms, and even contronyms (words with contradictory meanings in different contexts such as sanction, cleave, and bi-weekly). And this is before we even talk about the epic number of spelling errors committed on a daily basis.” (p. 51) This limits the usefulness of keyword searches. If you are looking for only a half-decent document to suit your purpose (simple search) or a particular document you already know of (existence search), this method usually suffices. The problem is that keyword searches (no matter how elaborate) cannot match the searcher’s meaning with the meaning found within documents. As Morville puts it, there is no way for computers to accurately determine aboutness.
This is the reason why Morville devotes a great deal of his discussion to taxonomies and formalized vocabularies. There is a big push to do a better job of integrating metadata (data about data) into Web documents in order to better identify the content within, plus other document characteristics (e.g., authorship and place of origin). This allows information services, such as search engines, to find and organize information with greater accuracy and efficiency. Some technologies, notably Extensible Markup Language (XML), are improving our ability to do so. The question is whether to allow a single bit of information to be labeled with multiple terms (faceted classification) or whether each bit of information is assigned to one of several mutually exclusive categories (formal taxonomy). The answer depends on usage. Either way, the vocabulary used to describe each class is controlled. Controlled vocabularies require co-operation, often on a large scale. The challenge is that defining taxonomies is inherently political because they determine how we see the world and how we use information. The perspectives and needs of different groups are often in conflict. These problems are not insurmountable, as shown by industrial standards consortia, supply-chain partnerships, and governmental schemes. According to Morville, however, a greater degree of flexibility is required if we are going to make the entire Web easier to navigate.
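To make the distinction between the two approaches concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the category names, facet names, and labels are invented for the example and are not taken from Morville's book. The formal taxonomy gives each item exactly one category, the faceted scheme allows several labels per item, and both draw their terms from a controlled vocabulary.

```python
# Illustrative sketch only: all categories, facets, and labels below are
# invented assumptions, not a scheme proposed in the book under review.

# Formal taxonomy: each item is assigned to exactly one category.
taxonomy = {
    "Romeo & Juliet": "Drama",
    "Ambient Findability": "Information science",
    "The Long Tail": "Business",
}

# Faceted classification: each item may carry several labels per facet.
facets = {
    "Romeo & Juliet": {"genre": {"drama", "tragedy"}, "format": {"text"}},
    "Ambient Findability": {"genre": {"information science", "business"}, "format": {"text"}},
    "The Long Tail": {"genre": {"business"}, "format": {"text"}},
}

# Controlled vocabulary: the only labels that are allowed for each facet.
CONTROLLED_VOCABULARY = {
    "genre": {"drama", "tragedy", "information science", "business"},
    "format": {"text", "audio", "video"},
}


def find_by_category(category):
    """Return the items filed under a single taxonomy category."""
    return sorted(item for item, cat in taxonomy.items() if cat == category)


def find_by_facet(facet, label):
    """Return the items carrying a given label, rejecting uncontrolled terms."""
    if label not in CONTROLLED_VOCABULARY.get(facet, set()):
        raise ValueError(f"{label!r} is not in the controlled vocabulary for {facet!r}")
    return sorted(item for item, labels in facets.items() if label in labels.get(facet, set()))


if __name__ == "__main__":
    print(find_by_category("Business"))
    print(find_by_facet("genre", "business"))
```

In this toy catalogue, a search by the taxonomy category "Business" can only ever return the one book filed there, whereas a search on the "business" genre facet also surfaces a book whose primary home is elsewhere. Which behaviour is preferable depends, as noted above, on usage.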
That is where ontologies and folksonomies come in. Just to be clear, the term “ontology” is not meant in the philosophical sense (i.e., the theory of essence or being). Instead, computer scientists use the term to stand for any method that specifies the relationship between concepts. Of interest is the Resource Description Framework (RDF) that allows information to be labeled in terms of subject, predicate, and object (e.g., Romeo & Juliet—created by—William Shakespeare; Romeo & Juliet—is a type of—text document). Thus, ontologies provide the potential to add greater meaning beyond basic labels. Folksonomies are simply taxonomies that emerge from the ground-up, not from some over-riding co-ordination or unilateral stipulation by an authority. On the Internet, folksonomies emerge as individuals add labels to a bit of information as they see fit (free tagging). This information can be captured by online software (e.g., cataloging bots and spiders used by search engines) or when people voluntarily alert others to the labeled information (e.g., social networking services such as del.icio.us). As all of these activities are aggregated by the software, a category emerges. As the category gets circulated, it is more likely to be used as a label in the future. The category becomes popular. Morville sees great potential for combining ontologies and folksonomies (and conventional taxonomies) to make information more usable.
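A small Python sketch may help to show how little machinery these two ideas need at their simplest. It is a toy under stated assumptions: plain tuples stand in for RDF statements (real RDF uses URIs and dedicated tooling), and the users and tags are hypothetical. The first half stores the Romeo & Juliet statements as subject, predicate, object triples; the second half aggregates free tags so that the most popular label emerges as a category.

```python
from collections import Counter

# RDF-style statements as (subject, predicate, object) tuples.
# Plain strings are a simplification; real RDF identifies resources with URIs.
triples = [
    ("Romeo & Juliet", "created by", "William Shakespeare"),
    ("Romeo & Juliet", "is a type of", "text document"),
]


def objects_of(subject, predicate):
    """Return every object linked to a subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Free tagging: hypothetical users label the same item as they see fit.
user_tags = [
    ("user_a", "Romeo & Juliet", "tragedy"),
    ("user_b", "Romeo & Juliet", "shakespeare"),
    ("user_c", "Romeo & Juliet", "tragedy"),
    ("user_d", "Romeo & Juliet", "love-story"),
]


def popular_tags(item, top=1):
    """Aggregate individual tags for an item and return the most common ones."""
    counts = Counter(tag for _, tagged_item, tag in user_tags if tagged_item == item)
    return counts.most_common(top)


if __name__ == "__main__":
    print(objects_of("Romeo & Juliet", "created by"))
    print(popular_tags("Romeo & Juliet"))
```

The aggregation step is where the self-reinforcing dynamic comes from: once "tragedy" is shown as the leading tag, later taggers are more likely to reuse it.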
Although Morville points out the weaknesses of conventional taxonomies in some detail, there is surprisingly little attention to the problems posed by folksonomies. A folksonomy can emerge very quickly and can be generated in an ad hoc way by people who know very little about a particular subject (hence the alternate term mob indexing). If you happen to know a lot about that subject, then the resulting folksonomy may not be very intuitive and your job of finding things is made more difficult—revenge of the lay-person, I suppose. It is also problematic when multiple communities build competing classification schemes for the same subject. For example, one can imagine a situation where a small, specialist community has a taxonomy for a topic but members are compelled to use a fuzzier, colloquial folksonomy to find a piece of information. This situation makes searching extremely difficult when trying to find all of the relevant information on a subject (exhaustive search). Similar problems happen with taxonomies, too. For example, governments, academic disciplines, and professional associations often create competing taxonomies for the same (or partially overlapping) subject. The problem with the rise of folksonomies is that new labels for already well-labeled items can emerge all the time and propagate very quickly. Searchers are then compelled to constantly learn new (often ephemeral) vocabularies on the fly. The silver lining of this dark cloud might be that it forces searchers to think more carefully about multiple audiences for particular subject areas. That is not unlike the way multiple taxonomies force searchers to think in more interdisciplinary or international ways.
That brings us to another major theme of Morville’s book: personalisation. Targeted advertising, “push” news services, list-serves, and online syndicated content (RSS feeds) provide new potential for useful information to find us. For example, when you select an item at an on-line retailer, such as Amazon.com, the retailer often recommends similar items and items that people with similar interests have also purchased. This allows retailers to better cater to small niches. This type of personalisation also has its limits, as Morville explains. There is the ambiguity of language problem once again. Few users have the patience or inclination to declare a personal profile or buy enough for a profile to be approximated. The reason behind the purchase may be ambiguous. For example, it is difficult to determine whether a purchase is a gift or whether it will be shared among several people in a household. A person’s demand also changes over time and may be situation specific.
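The "people with similar interests also purchased" idea can be sketched in a few lines of Python. This is my own simplified illustration, not a description of how Amazon.com or any other retailer actually computes recommendations; the customers, items, and the `also_bought` function are all hypothetical. It simply counts which other items appear in the baskets of customers who bought a given item.

```python
from collections import Counter

# Hypothetical purchase histories; customer and item names are invented.
purchases = {
    "ann": {"album_a", "album_b", "album_c"},
    "bob": {"album_a", "album_b"},
    "cat": {"album_a", "album_d"},
    "dan": {"album_e"},
}


def also_bought(item, top=3):
    """Rank other items by how many buyers of `item` also purchased them."""
    counts = Counter()
    for basket in purchases.values():
        if item in basket:
            counts.update(basket - {item})
    return [other for other, _ in counts.most_common(top)]


if __name__ == "__main__":
    print(also_bought("album_a", top=1))
    print(also_bought("album_e"))
```

Even this toy version shows one of the limits described above. An item with a single buyer who bought nothing else yields no recommendations at all, because there is not enough purchase history to approximate a profile.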
Some of these problems can be illustrated with my recent experiences with Apple’s iTunes Store. I purchased several Japanese hip-hop albums and several albums of avant-garde electronic music from Europe. (Such are my tastes of late.) So did the retailer recommend more of the same? Nope. Instead, it simply offered a hodgepodge of American pop hits. Why is this so? Part of the answer is that other users may not have similarly eclectic tastes or there haven’t been enough purchasers of some products to make adequate recommendations. iTunes places users in kennels (country-specific stores) so the tastes of my compatriots are the ones that are compared to mine, whether or not that is the most suitable clan. The algorithm that makes the recommendations could also be designed poorly (it’s officially labeled as “beta”, meaning not quite up to snuff). The retailer may have an agenda, such as featuring items with higher profit-margins. Part of the answer is certainly the insufficiently fine-grained taxonomy that can’t be used to detect nuanced differences in taste. All of this music was simply classified as “electronic”. Then there is the opposite problem, what Douglas Coupland calls musical hairsplitting: “The act of classifying music and musicians into pathologically picayune categories: ‘The Vienna Franks are a good example of urban white acid folk revivalism crossed with ska.’” (1991, p. 85) So how does a retailer handle that algorithmically? I guess I’ll have to read Chris Anderson’s book The Long Tail (2006), which is said to delve into that problem in greater detail.
All of these problems are reason to take a closer look at what documents get flagged by social networking. Documents are often flagged because they are popular, which poses a problem because there is something self-fulfilling about popularity online: popular items become ever more popular because of their profile, while nonetheless valuable items remain obscure. Then there are reputation rankings, such as “most authoritative” or “most often cited” items. As Morville explains, these rankings can be subject to undue manipulation (although this is usually only the case with political hot-button items). Morville also notes that wear (the tendency of physical objects to degrade with use) is another method to identify popularity that is largely untapped by information architects, but one which searchers in brick-and-mortar libraries use sometimes. (The stockpiling of multiple copies of a book also suggests a seminal work, another method that remains untapped.) All of this is undoubtedly useful for moving us beyond the classic files-versus-piles problem. That is, it helps us find information in the absence of perfect classification and allocation of documents (files) or the loose chronology that happens when newly used documents are piled on top of older ones (piles). When you account for all the problems, however, the ideal of ambient findability seems well out of reach.
Morville closes his book with an interesting survey of the literature about information and decision-making. He uses the term “information diet” to describe our daily intake of documented information, suggesting that librarians are a form of nutritionist (Morville is a librarian or used to be). I like the analogy because it speaks to the “junk food” equivalent of information and the large quantities of non-nutritious information that people are ingesting these days. It also speaks to an “information fitness” (although Morville uses “information literacy”, thus ruining the analogy by mixing metaphors).
Not long after Morville introduces these worries, he brushes them aside, suggesting that the fears are somewhat overblown. Then he provides a personal anecdote that confirms most researchers’ worst suspicions about the current state of findability.
Morville talks about a debilitating back problem he had. Doctors were not helping. So he went on the Internet and found a practitioner of alternative medicine. The treatment seemed to work. Thus, Morville concludes: “[the alternative medicine practitioner] blames doctors for perpetuating an epidemic of pain that costs our society over a hundred billion dollars a year. And you know what? I believe him. He healed my body.” (p. 163) And you know what? That’s utter nonsense. Morville probably feels better because he believes in this treatment, given the persuasiveness and heightened attention of the alternative medicine practitioner—it’s called the placebo effect and it can reduce pain (see my review of Dylan Evans’ Placebo). Who knows, his back injury might have been entirely psychosomatic. Or an attention to ergonomics might have kicked in. Or the condition might have been temporary. Or his recovery might have been a case of spontaneous remission. If the treatment is demonstrably beneficial, then scientific medical trials would be able to show it. But they don’t … not yet. The larger point is that this type of advice is hardly consistent with a healthy information diet regardless of the favourable result. And that should send shivers down your spine.
Morville suggests that new ways of finding information are changing the general public’s views of authority and trust. I agree. But it is not entirely a change for the better. The irony of Ambient Findability is that it stresses how difficult it will be to achieve the ideal findability that Morville yearns for. The book is mostly devoted to surveying all the problems related to finding information. The success stories that Morville cites tend to involve half-measures and promising prototypes. Thus, we are left in a bit of a pickle: existing societal institutions and professions, which are filters of information, are deemed to be less trustworthy; while new technologies create the opportunity to access all sorts of information but without a very effective way of sorting it all out. This seems to be a reality in which the fittest ones (the ninjas with the research ninjutsu and the healthiest info-diet) are most likely to thrive.
Review by Peter Stoyko
REFERENCES
