Author: Eric C. Kansa
This volume looks at archaeology in the context of the World Wide Web, a communication system that has witnessed over two decades (and counting) of exponential growth. In many ways, the Web represents a revolution in communications and information sharing that rivals in significance the invention of the printing press or the origins of writing. In the past two decades, the Web has come to permeate virtually every aspect of our lives, transforming journalism, the arts, commerce, and the way we socialize.
Who Should Read This Book and Why
As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe. It will be almost as convenient to search for some bit of truth concealed in nature as it will be to find it hidden away in an immense multitude of bound volumes.
As part of this transformation, the Web’s growth means that we live in an information-saturated era. This volume adds minutely to this ever expanding body of data. While the Web increasingly enriches our work and social lives with new information, we face often overwhelming demands on our attention. Not only do we have new sources of news, information, and entertainment to jostle for our attention, we have many new sources of disinformation, spam, propaganda, and plain junk. It is no wonder that many scholars, overloaded with information, lament the passing of bygone days of quiet scholarly contemplation (Harley et al. 2010). No doubt many of the complaints about today’s Web reflect some romanticism about the past. Even with the development of telegraph networks in the nineteenth century, people complained about information overload (Standage 1998:165). While the phenomenon of information overload may not be a new symptom of modernity, even some pioneers of cyberspace worry about the current data deluge and its impact on creativity and deep thinking (Lanier 2010). Given all these competing demands on our limited attention, this book about archaeology on the Web requires some justification. Instead of merely feeding information overload, we hope this volume will help archaeologists take stock and better understand how the Web is transforming the professional practice of archaeology, just as it transforms professional communications in other disciplines. Reflection on these changes can help us better understand the state of this discipline. Moreover, researchers who study scholarly communications more generally will find this book a useful source of case studies. Archaeology is an inherently multidisciplinary enterprise, with one foot in the humanities and interpretive social sciences and another in the natural sciences. 
As such, case studies in digital archaeology can help illuminate changing patterns in scholarly communications across a wide array of disciplinary contexts. Archaeologists will find this book a useful guide in understanding how this revolution in communications technology reverberates across this discipline. Many of the contributions describe technologies, user interface designs, and organizational practices that attempt to mitigate some of the problems associated with the Web, especially information overload and disinformation. Contributions to this book also explore how the Web can be used to transform archaeological communications into forms that are more open, inclusive, and participatory. Some discussions focus on ways that Web-based systems can make archaeological knowledge production more open and transparent, while others focus on the challenges of archiving and preserving digital data. Finally, some chapters present case studies of digital projects. Sharing these experiences can provide useful guidance for other researchers wanting to create and apply technology to archaeology.
Looking Back at Web 2.0
The book is loosely themed on so-called Web 2.0 approaches to these issues. Most contributions presented here derive from presentations given at the 2008 Society for American Archaeology conference in Vancouver, British Columbia. However, as is typical of many scholarly publishing cycles, transitioning conference presentations to a book form has taken about three years. Thus, many of the new technologies and perspectives presented at the 2008 conference are no longer so new. Nevertheless, the slower pace of academic publishing does offer some advantages. The lengthier review and revision cycle gave many of our contributors some added perspective and a chance for more nuanced reflection. Thus, many discussions of Web 2.0 presented in this volume have an element of retrospection. Popularized in 2004 by Tim O’Reilly, founder and CEO of O’Reilly Media, the term “Web 2.0” describes social networking systems, blogs, and other Web-based platforms emphasizing collaboration and sharing, rather than the unidirectional flow of information in a traditional Web 1.0 architecture (O’Reilly 2005). Many archaeologists have embraced Web 2.0 tools and technologies, allowing them to integrate different bodies of content and develop new tools and interfaces for peer-to-peer communication and collaboration. Ultimately, these new tools and platforms create opportunities for research and public participation in archaeology. However, “Web 2.0” is fast becoming a clichéd and obsolete expression. It stems from the revival of investment and interest in the Web following the dot-com collapse at the turn of the century. Now, in the aftermath of another and far graver financial collapse, many of the startups branded as Web 2.0 will likely fail. The term “Web 2.0” soon may be consigned to history.
The Promise of Web 2.0
As archaeologists accustomed to dealing with “deep time,” it makes sense to consider the Web’s impact on the discipline with a longer time horizon than is typical of most discussions of Web 2.0. While much about Web 2.0 will be a passing trend, the term still points to perspectives and developments likely to have lasting value and significance. In general, the term evokes designs and services emphasizing user interaction and the user as a source of extra value. This value can take multiple forms, including (but not limited to):
• User-generated content: Many Web 2.0 systems provide users with platforms for sharing and publishing content. These range from images to videos, and from essays to short, 140-character messages in microblogging services like Twitter.
• Crowd-sourced classification: Many Web 2.0 systems provide mechanisms for users to create and share metadata (“information about information”) that can describe content to facilitate search and retrieval. Tagging and folksonomy (informal classification) systems provide important information retrieval services.
• Remixable data: Providing Web-based data in ways that can be easily manipulated by software represents another common theme of Web 2.0. Web services, often called “application programming interfaces” (APIs), publish data on the Web in formats intended for use by third-party software. The intent behind an API is often to “crowd-source” interesting and useful software applications based on content. Google Maps is currently an excellent example. Google provides mapping data and tools to third-party web developers. These developers embed Google Maps in their sites, and in the process, Google (and its brand) becomes an increasingly ubiquitous feature of the Web.
• Enhancing and evaluating information quality: Web 2.0 often has connotations of an anarchic free-for-all, lacking traditional gate-keeping mechanisms to maintain quality. While information quality concerns rightly make researchers skeptical of some Web 2.0 platforms, in other contexts Web 2.0 systems try to promote quality. Sometimes Web 2.0 evaluations of quality are little more than popularity contests. In other cases, experimental scientific journals, such as PLoS ONE, attempt to use Web 2.0-style rating and commenting systems as a new form of “enhanced peer review.”
Leveraging user communities and user interactions in the ways described above will probably continue to feature in future technology developments, in both popular and scholarly media. As is evident to many of us, the speed of implementation, rate of adoption, and impacts of drawing value from the efforts of users can be highly uneven. Developments in commercial web services, technologies, and tools are transforming the professional and personal lives of archaeologists, often by blurring the boundaries between these lives. The founders of Google, one of the first and most successful Web 2.0 companies, discovered a powerful method for finding value in the distributed work of millions, with the PageRank algorithm (Brin and Page 1998). Google searches, as well as more specialized services such as Google Scholar, have transformed the way researchers find information (Markey 2007; Yu and Young 2004). Google has also changed the way university libraries serve their patrons, influencing them to adopt single-box “full-text” search and stronger ranking algorithms. Web 2.0 increases the diversity of content that people can find readily on the Web. Social networking sites, blogs, shared web bookmarks (Delicious.com and the like), ratings systems (Digg), and shared media (Flickr and YouTube) have expanded the range of media that people publish online. However, these trends affect scholarly communications in slower and more tentative ways. While the peer-reviewed, journal-published paper is still the main currency of professional research communication, an expanding number of research-themed blogs make less formally published material available. More archaeologists now publish images to Flickr, share presentations on SlideShare, and participate in open access publication (usually through self-archiving).
While Web 2.0’s impact is far-reaching, it does seem to have limits. Web 2.0 platforms and services mainly facilitate informal communications among archaeologists. Web 2.0 systems are simple to use, fast, and geared to content that requires relatively minimal investment to create. Archaeologists tend not to use Web 2.0 platforms as the primary dissemination channel for forms of content that take a great deal of effort and expertise to create. In this light, data sets and sophisticated scholarly manuscripts see less circulation in Web 2.0 channels.
Finding Web 2.0 Solutions for Primary Data
If related data and documents can be linked together in a scholarly information infrastructure, creative new forms of data and information intensive, distributed, collaborative, multidisciplinary research and learning become possible. Data are outputs of research, inputs to scholarly publications, and inputs to subsequent research and learning. Thus they are the foundation of scholarship (Borgman 2007: 115).
Borgman describes a desired goal for so-called cyberinfrastructure systems that support research through more efficient communication and data preservation. Unfortunately, most popular Web 2.0 tools and services cannot deal with the complexities required of such a system. Flickr, Google Docs, and other applications make certain aspects of archaeological fieldwork convenient to publish online, but they cannot describe key contextual information in a precise, consistent, and machine-readable way. For instance, contextual relationships are difficult to describe precisely in Flickr’s annotation (tagging) system, limiting Flickr’s usefulness in publishing images from an archaeological excavation. Similarly, Google Docs, ManyEyes, Swivel.com (now defunct), and other online systems for sharing tabular (structured) data restrict dissemination to data that can be represented in single tables. This is sufficient for archaeological data sets like individual zooarchaeological analyses, but it does not work well for publishing data on complex, multidisciplinary archaeological projects involving data sets generated by several different specialists and describing complex contextual relations. Most Web 2.0 systems are simple to use and place minimal requirements on end users to prepare and describe content. Users generally find it easy to retrieve relevant pictures or videos by searching content indexed by user-generated tags. Contributing (publishing) to a Web 2.0 system can be quite easy, because the content published is generally simple and described with limited and informal metadata, such as user-generated tags. So, Web 2.0 systems sufficiently serve many popular needs and applications. However, they have not been widely used in academic communities for content central to research.
To respond to that problem, this volume presents recent advances in archaeological data sharing and explores how Web 2.0 services affect communication and collaboration in archaeology.
Overview of Contributors
The chapters in this volume illustrate the possibilities and limitations of the Web in meeting the specialized needs and requirements of professional researchers. Because Web 2.0 is best understood loosely as a zeitgeist or even as a marketing term, contributors define and discuss Web 2.0 from their own perspectives. The chapters address various semantic, intellectual-property, technical, social, and professional challenges of networking archaeological information. They present different perspectives on conceptual, theoretical, and practical approaches to communicating archaeological knowledge with new technologies and platforms. Some have successfully implemented Web 2.0 tools and approaches, while others have rejected such approaches. Issues about information quality, audience, and authority also inform their discussion. This volume shows how emerging digital forms of archaeological communication differ from traditional paper-based media, and how these differences require examination and rethinking of knowledge production processes. Many example projects in this volume are rich in structured data and multimedia content. Some of this content is generated “in real time” in active field programs and sees little editing or filtering before global dissemination. These projects hope to use the inherent capabilities of Web 2.0 technologies and platforms to make archaeology more collaborative and more transparent. However, they also raise difficult questions about information quality, information overload, intellectual property, and relationships between professional researchers, students, and different public communities. This book is divided into themed sections to help highlight certain particularly salient points of discussion made by the various contributions. The section themes are summarized as follows:
• Section 1 focuses on information retrieval and information-access approaches, especially centered on gray literature and primary field data. These forms of content have traditionally seen little dissemination.
• Section 2 explores larger conceptual concerns regarding information access and management. The contributions in this section discuss practical as well as theoretical concerns inherent in various design choices for archaeology’s computing infrastructure.
• Section 3 presents projects that aim to enhance collaboration in archaeology through various approaches, such as the adaptation and development of certain technologies for mobile field-based collaboration, coordination and data management of field-based researchers and other specialists, and collaboration among the international researcher community.
• Section 4 addresses scholarly communications issues, with a particular emphasis on concerns over information quality and access in light of sustainability and preservation imperatives.
In “The Archaeology Data Service and the Archaeotools Project: Faceted Classification and Natural Language Processing,” Julian Richards and colleagues discuss design innovations in information retrieval and integration of different data services. A key problem for archaeological information sharing is information overload. Standard keyword search systems often retrieve too much irrelevant information or fail to deliver relevant information if keywords are not mapped to synonyms. These problems make keyword searches somewhat unreliable and prone to deliver different results depending on the keyword inputs. However, by applying faceted search mechanisms, researchers gain greater comprehension of an entire corpus of material and can progressively refine searches to obtain specific information relevant to their interests.
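The progressive refinement described above can be illustrated with a minimal sketch in Python. It is not the Archaeotools implementation: the records, facet names (period, material, type), and function names are all hypothetical, chosen only to show how facet counts summarize a corpus and how each selection narrows it.

```python
from collections import Counter

# Hypothetical records; the facet names and values are illustrative only.
RECORDS = [
    {"id": 1, "period": "Roman", "material": "ceramic", "type": "report"},
    {"id": 2, "period": "Roman", "material": "bone", "type": "report"},
    {"id": 3, "period": "Medieval", "material": "ceramic", "type": "dataset"},
    {"id": 4, "period": "Roman", "material": "ceramic", "type": "dataset"},
]

def facet_counts(records, facets=("period", "material", "type")):
    """Count how many records fall under each value of each facet."""
    return {facet: Counter(r[facet] for r in records) for facet in facets}

def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]

# Step 1: an overview of the whole corpus, before any search term is typed.
overview = facet_counts(RECORDS)

# Step 2: the user picks one facet value; the counts are recomputed on the subset.
roman = refine(RECORDS, period="Roman")
roman_counts = facet_counts(roman)

# Step 3: a second selection narrows the result set further.
roman_ceramic = refine(roman, material="ceramic")
```

Because the counts are recomputed after every selection, the user always sees which refinements remain possible and how many records each would return, which is what distinguishes this approach from a bare keyword box.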
Beyond search methodologies, Richards et al. discuss emerging frontiers of archaeological information management. Techniques in natural language processing (NLP) promise to enhance the value of archaeological literature. Automated and semiautomated NLP techniques help address some limitations of Web 2.0 methods in narrow niche professional contexts. How do you crowd-source metadata creation via techniques like social tagging when there is no crowd? The archaeological research community is relatively small, typically has very specialized interests, and may not be especially interested in social collaborative tagging. Thus, NLP offers a viable strategy to generate metadata to improve information retrieval (through faceted search and other techniques) without requiring the action of a (nonexistent) “crowd.”
Richards et al.’s discussion of faceted search and NLP techniques for metadata creation is on the cutting edge of archaeological informatics, indicating important features of the landscape of archaeological information retrieval for years to come. As these systems are developed and deployed, they will shape how professionals and members of the public encounter cultural heritage. In other words, our record of the past will be increasingly shaped and organized by algorithms. This trend will be a fascinating topic for future research and critique. Will these algorithms be part of unobserved background processes that rarely see scrutiny? Or will there be discussion and debate about who creates and deploys these algorithms, and for what agenda and purpose? How will different perspectives and agendas be accommodated? Many of the theoretical debates and concerns that shaped the content of archaeological literature may also emerge in the context of automated processes to organize and retrieve that literature.
In Chapter 2, “Toward a Do-It-Yourself Cyberinfrastructure: Open Data, Incentives, and Reducing Costs and Complexities of Data Sharing,” Eric Kansa and Sarah Whitcher Kansa discuss how techniques of Web 2.0 systems can be applied to research applications. Simple web services delivering machine-readable data can help make archaeological information open and reusable for research, instruction, and creativity. However, fitting new modes of communication and collaboration into traditional research practices poses potentially insurmountable problems with regard to time, recognition, technical challenges, and workflows. These concerns have guided new developments to Open Context, an open source publishing system designed to facilitate sharing, collaboration, and integration of archaeological content.
The Kansas discuss some of the successes and failures of Web 2.0 in Open Context. They discuss how folksonomies provoked some initial curiosity in the system but failed to engage enough users to create useful metadata. In contrast, other aspects of Web 2.0 seem to have greater long-term traction and significance for their project, particularly exposure of machine-readable data through web services. Approaches to design and delivery of machine-readable data through web services represent one legacy of Web 2.0 likely to have long-term impact and application for research data sharing. They explore how web services enable data reuse across different applications and collections. When data are made machine-readable, content can be freed from individual silos and used with content from other sources. These may include other archaeological collections or systems supporting data sharing in other disciplines. Content is also freed from a single mode of presentation and visualization. Web services therefore encourage development of new user-interface paradigms and greater flexibility in user interactions with diverse content. Finally, certain technical design perspectives encourage architectures that better support important scholarly conventions, including citation and linking.
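To make the idea of freeing content from individual silos concrete, here is a minimal sketch in Python. It does not use Open Context's actual web services or data formats: the two JSON payloads, their field names, and the combine function are hypothetical stand-ins for responses that two separate services might return.

```python
import json

# Hypothetical JSON payloads standing in for responses from two separate
# web services; the field names are assumptions made for illustration.
finds_json = (
    '[{"context": "C-101", "object": "cattle femur"},'
    ' {"context": "C-102", "object": "amphora sherd"}]'
)
contexts_json = (
    '{"C-101": {"phase": "Late Roman"},'
    ' "C-102": {"phase": "Early Roman"}}'
)

def combine(finds_payload, contexts_payload):
    """Join finds from one source with context records from another."""
    finds = json.loads(finds_payload)
    contexts = json.loads(contexts_payload)
    # Each find is enriched with whatever the second source knows
    # about its context identifier (nothing is added if it is unknown).
    return [{**find, **contexts.get(find["context"], {})} for find in finds]

combined = combine(finds_json, contexts_json)
```

Because both sources expose structured data rather than rendered web pages, a third party can join them on a shared identifier without either publisher having anticipated that use.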
Discussing the reluctance of some researchers to share data, they emphasize a strategy that casts data sharing as a form of publication, where many of the conventions for citation and editorial oversight used in narrative publication can be applied. This perspective has increasing traction across many scientific fields, as indicated by recent editorial comments in the journal Nature (“Data’s Shameful Neglect,” 2009). Beyond serving as a useful publication model, narratives also provide context and meaning for archaeological data sharing. Kansa and Whitcher Kansa discuss “tacit knowledge” and the implicit understandings and background required to make sense of archaeological data. They discuss transmission of tacit knowledge via formal classification systems and ontologies, or through social scholarship enabled by Web 2.0 systems. They highlight the need to integrate data publication with narrative and interpretive publication to make shared primary data intelligible and usable by a wider community. This last point is made by many other contributors to this volume.
Historically, archaeologists became interested in computing and databases to control huge quantities of excavation data. They looked to the computer as a tool to retrieve and analyze information across multiple data sets and excavations to create broad syntheses. Unfortunately, the promise of digitally based meta-analysis has not panned out. The mass of digital material generated by archaeological activity is geographically distributed, fuzzy, incomplete, inconsistent, and often hard to access. The resulting complexity deluge presents a whole new set of problems for archaeology. Stuart Dunn’s contribution in Chapter 3, “Poor Relatives or Favorite Uncles? Cyberinfrastructure and Web 2.0: A Critical Comparison for Archaeological Research,” critically reviews Web 2.0 methods and technologies that address this emerging problem. He explores cyberinfrastructure/e-science and how it relates to Web 2.0 technologies and techniques in archaeology. Dunn divides the process of using archaeological data into collection and harvesting; analysis, integration, and interpretation; and social research. In these three domains, he explores various hallmark technologies and methodologies commonly associated with both Web 2.0 and cyberinfrastructure. In particular, he sees folksonomy as a way to supplement and enhance traditional taxonomies. He advocates a “spade to screen” documentation process, to ensure that methods used to author and create digital objects are transparent and attributable. Dunn concludes that the top-down approach of cyberinfrastructure and the bottom-up approach of Web 2.0 are not two irreconcilable models, but different layers in the same structure.
In Chapter 4, “Archaeological Knowledge Production and Dissemination in the Digital Age,” Robin Boast and Peter Biehl discuss the ways in which different cultural contexts shape information management, retrieval, and use. They explore the diversity of ontologies and classification systems among expert communities and others, including different indigenous communities. Their discussion begins with an exploration of archaeological approaches to knowledge creation, contrasting “classificatory” and “interpretive” paths. Whatever interpretive approach is taken, tangible cultural heritage becomes embedded in intangible processes that shape understandings of that tangible heritage. They argue that online information systems are contact zones where different understandings collide and inform one another.
Boast and Biehl then discuss how different conceptual systems can inform one another through Web-mediated collaboration. They look at how museums and their educational programs try to bridge understandings between museum experts and various professional communities. However, as they note, museums typically concern themselves only with managing their own, expert-informed classifications and documenting their own collections. They do little to document how different public communities understand these collections. The perspectives of outsiders, though present and expressed in museum educational performances, rarely end up being recorded or informing experts.
Alternatives to the model of “one-way” broadcasting of museum expert knowledge are emerging. Many of these use Web 2.0 ideas of two-way communication and participation in creating content and sharing ideas. Boast and Biehl’s exploration of classification is of interest to those wishing to foster reciprocal information sharing across different community settings. Essentially, they find that categorizing cultural heritage, even in loosely structured and constrained “folksonomy” approaches, is of limited appeal and interest to many people outside of museum professional circles. They find much more interest in digital representations of material culture to support narratives. In other words, opening up museum collections for social tagging resonates less than encouraging storytelling. This observation has important implications. Concerns over classification and standards for classification dominate thinking about cyberinfrastructure, the Semantic Web (or “Linked Data”), and cultural heritage data sharing. Many discussions of Web 2.0 and folksonomies emphasize classification issues in information sharing. However, while classification is important, it is not the only concern. In some cultural contexts, construction of narratives has greater priority. By looking at how cultural heritage information is used in different contexts, Boast and Biehl highlight the need to move beyond classification to other social uses of information. Their insights help guide future attempts to bridge gaps between museums and other communities and also highlight the importance of narrative even within academic and museum professional circles.
Archaeological projects are rarely blessed with full-time, permanent staff. In some cases, part-time specialists are employed full time at other museums or institutions, or work as freelance archaeological specialists involved in a wide range of additional projects. In other cases, specialists focus so intently on their particular research interest that they have no real sense of the totality of the project. The result is that specialists often feel isolated or semidetached. In Chapter 5, “Creating a Virtual Research Environment for Archaeology,” Michael Rains discusses VERA (Virtual Environment for Research in Archaeology), a system attempting to address these issues. Funded by JISC (Joint Information Systems Committee), VERA is a Web-based virtual research environment (VRE) collaboratively developed by the University of Reading, University College London, and York Archaeological Trust. It is centered on the Silchester Town Life Project at the University of Reading. This is a large-scale, ongoing excavation of part of the abandoned Roman town of Calleva Atrebatum at Silchester, approximately 80 km west of London. Silchester has used the Integrated Archaeological Database (IADB) as its data management system since the start of the project in 1996. Key aims of the VERA project include improving the flow of information from excavation, through analysis and research, to publication and dissemination, and developing a collaborative working environment involving all members of the project team. At the heart of the VERA interface is one or more interactive graphical representations or visualizations of part of the project database. For example, a particular phase in site development is displayed as a standard archaeological stratigraphy diagram. What is unique about VERA is that additional content from the project database, such as plans of stratigraphic units, photographs, and field notes, can be attached to the diagram.
This process adds context to excavation materials and allows all Web-connected stakeholders, regardless of their location, to contribute to the project.
Ethan Watrall’s contribution in Chapter 6, “iAKS: A Web 2.0 Archaeological Knowledge Management System,” proposes a system to leverage web services and Web 2.0 technologies. iAKS aims to solve many data collection, storage, and visualization challenges currently faced by archaeologists. Database field tools are difficult to find and are often expensive, complex systems developed on a project-specific basis. iAKS, by contrast, is a flexible design that can be used by many different projects with very different research designs. Most importantly, iAKS can be used in the field and offers different types of service, depending on whether a project has Internet access, a local server, or simply a hard drive. Content created using iAKS is converted into XML, making it easy to share, integrate with other content, and preserve. This holistic approach, from field-based design and data collection to data sharing and integration via the Internet, makes the iAKS system particularly promising. In addition, the architectural approaches Watrall advocates explore new possibilities in mobile computing, where global information systems and infrastructure can come together with handheld, field-ready devices (phones, tablet computers, etc.). These capabilities can make archaeology increasingly “glocal” (simultaneously global and local), as particular finds and contexts observed locally can be related to other digital documentation found on global information networks.
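As a rough illustration of why conversion to XML eases sharing and preservation, the following Python sketch serializes a single field record to XML and reads it back. It does not reproduce the iAKS schema, which is not described here: the element names, the record fields, and the record_to_xml function are hypothetical.

```python
import xml.etree.ElementTree as ET

def record_to_xml(record):
    """Serialize one field record as an XML string (hypothetical schema)."""
    root = ET.Element("find", attrib={"id": record["id"]})
    for field in ("context", "material", "description"):
        child = ET.SubElement(root, field)
        child.text = record[field]
    return ET.tostring(root, encoding="unicode")

# An illustrative record as it might be entered on a field device.
record = {
    "id": "F-001",
    "context": "Trench 2, Locus 14",
    "material": "ceramic",
    "description": "rim sherd",
}

xml_text = record_to_xml(record)

# Any XML-aware tool can read the record back, independent of the
# software that originally created it.
parsed = ET.fromstring(xml_text)
```

The point of the sketch is the round trip: once a record is in a plain, self-describing text format, it can be exchanged between a laptop in the field, a local server, and an online repository without depending on one database product.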
In their contribution, “User-Generated Content in Zooarchaeology: Exploring the ‘Middle Space’ of Scholarly Communication,” Sarah Whitcher Kansa and Francis Deblauwe review the emerging role of user-generated content in archaeological communications. As discussed above, archaeologists mainly participate in Web 2.0 platforms for less formal types of communication and sharing. This chapter, based on the experiences of people who actively manage and participate in Web 2.0 systems, explores why archaeologists sometimes see a valuable role for Web 2.0 channels.
This chapter makes a number of interesting points with regard to incentives for professional researchers to participate in online social media. Blogging, as a social media platform, has been widely used now for several years. While there are several professionally oriented archaeological blogs, this chapter notes that blogging is still a somewhat niche activity in the discipline. Nevertheless, despite the fact that most blog posts will see little comment, the impacts of blogging may be greater than is immediately apparent: they can help spark interest in a paper, a website, or a grant announcement through the passive engagement of readers, and even through improving the search engine exposure of web resources referenced in a blog post.
Similarly, community portals are commonplace on the Web but are less discussed in calls for disciplinary “cyberinfrastructure.” One such portal, BoneCommons, has seen various incarnations since its initial launch in 2006 and now enjoys continual use by the zooarchaeological community. The authors recount their experiences managing scholarly blogs and community portals, and note a rapid change in academic uptake and participation in social media. This last point is particularly important. Expectations and acceptance of technologies are not static, even in academia. In laying the foundation for digital infrastructure, we have to be mindful of trends and trajectories, and not just the current state of the research community in accepting a given technology or dissemination platform.
Finally, Chapter 7 touches on the expanding reach of digital preservation efforts to capture the ephemera of discussions on Twitter and email lists. This raises important questions about the scope and reach of data preservation efforts. When does data preservation go too far, and when does it start to appear invasive? This issue goes beyond social media and can include the primary field-documentation and notes of excavators. Such documentation can be full of irrelevancies that range from bickering to flirtations, and from complaints of confusion to inside jokes (sometimes very off-color). While this content could help contextualize archaeological data, should all of this sometimes embarrassing content go into the official archaeological record? What is the scope of privacy in data preservation?
Willeke Wendrich’s chapter, “UCLA Encyclopedia of Egyptology, Archaeological Data, and Web 2.0,” outlines the tensions between the various traditions, incentives, and quality concerns of professional scholarship, on the one hand, and the possibilities and environment of the Web, on the other. There is a widespread and often justified perception among academics that the Web is an unreliable foundation for scholarship. The Web is highly fluid, content can change at any time without notice, and resources may move or disappear entirely. At the same time, the Web has obvious advantages in reducing the cost and difficulty of disseminating scholarship. Wendrich’s chapter describes the efforts of the UCLA Encyclopedia of Egyptology to take advantage of the best aspects of the Web while avoiding the worst.
A clear implication of Wendrich’s work relates to the concerns over what constitutes publication. Key attributes involve persistence, peer review, and editorial control. In some ways, these attributes run counter to the emphasis common to Web 2.0 systems: easy retrieval, immediacy, popularity, and participation. Wendrich highlights the importance of reliable and credible citation. In contrast to the typical contributor to Web 2.0 systems, archaeologists participate in knowledge creation in very different ways, typically resulting in more complex, larger, and discrete works (the chapters in this volume are a good example). As Wendrich points out, giving credit to archaeological researchers as individuals is vital. Many do not want their authorial voice diluted or lost in a collective, as occurs in contributing to, for instance, Wikipedia. Many scholars also consider knowledge creation to be cumulative, where it is important to build upon works and contributions made across many decades. Quality and comprehensiveness are more important to scholars than they are to Web 2.0 users looking for easy dissemination and discussion.
In exploring these issues, Wendrich emphasizes the importance of considering Web-based publication as publication. However, she goes beyond a simple model that merely replicates traditional printed matter on the Web. The Encyclopedia of Egyptology’s experiments with various forms of digital media beyond text illustrate how Web-based publication can support more depth and diversity in the content of scholarly communication. Nevertheless, while open to experiments with “new media,” Wendrich makes a convincing case that digital dissemination must rest on a solid foundation of established scholarly traditions.
In Chapter 9, “Open Access for Archaeological Literature: A Manager’s Perspective,” Jingfeng Xia reviews open access archiving of content from the perspective of an experienced archival manager offering recommendations for the nascent field of archaeological publication archiving. Xia discusses the institutional archive approach and warns that, while it benefits from vast input by hired institutional managers, the content is often broad but shallow and not well informed. That is, people inputting content aim for breadth, while depth and accuracy in metadata suffer because managers may not understand the subject beyond abstracts or keywords. Learning from other disciplines, Xia encourages the archaeological community to adopt a subject repository approach, where the archive pools resources from many organizations and is managed by archaeological subject matter experts. Xia explains that subject repositories tend to offer deeper and more accurate metadata description of content, but may suffer from a lack of institutional infrastructure. Without an organizational hub, who will manage the content? Who will ensure its longevity? Xia discusses possible next steps for data sharing in the archaeological community.
Xia’s focus on the accessibility of archaeological publications has significance beyond impact and quality issues for human readers. The same sorts of text-mining and NLP approaches explored by Richards et al. in Chapter 1 can be applied to more mainstream archaeological publications. However, copyright restrictions, subscriptions, and login barriers now make it too difficult to obtain large corpora of published archaeological literature. Thus Xia’s call for an open access repository in archaeology can pave the way for new research opportunities using advanced computational methods.
Another consensus among the contributors is that, despite its new possibilities, Web 2.0 by itself will not “crack the archaeological data-sharing nut.” In the penultimate chapter of this volume, “What Are Our Critical Data-Preservation Needs?,” Harrison Eiteljorg offers a “naysayer” position, enumerating the shortcomings of sharing data via a Web 2.0 repository. Eiteljorg distinguishes “data access” via a passive archive, where access involves “frozen” resources such as spreadsheets of data, versus “data organization,” employing Web 2.0 features such as data integration and user contribution. Beyond the specific issues, which range from controlled vocabularies to different file formats, data sharing via contributory systems faces an overarching challenge: how does one ensure that the content user fully understands (1) the project the data come from and (2) the data collection process itself? Furthermore, how can we logically compare “resources that are inherently dissimilar because they are derived from data collected in different ways by different people at different times and with different purposes”? Eiteljorg reviews disincentives to contributing data to a Web 2.0 repository, including the limited professional rewards for doing so and the lack of momentum to archive data once a project has been published (and the “big push” is over). In contrast, many of the other chapters in this volume discuss efforts at eliciting professional rewards for data sharing. However, data sharing is a new concept for most researchers, who are still getting accustomed to the idea of archiving print publications. Eiteljorg recognizes the promise of Web 2.0 and suggests that Web 2.0–style approaches continue to be explored as a means of data sharing, but in conjunction with the static archiving of data sets.
Fred Limp’s concluding chapter, “Web 2.0 and Beyond, or On the Web, Nobody Knows You’re an Archaeologist,” recapitulates concepts discussed in preceding chapters and paints an optimistic picture of the future of archaeological data sharing. In reviewing the contributions to this volume, Limp notes the importance of differentiating between goals and techniques to accomplish those goals. As technologies rapidly evolve, specific implementations will vary, but strategic needs will be more stable and should guide the professional community’s efforts more than fixations on the latest technological fashions. Limp explores strategic concerns affecting the viability of attempts at archaeological data sharing. In comparison with commercial uses of Web 2.0, archaeological data sharing, Limp notes, has some unique requirements. One key element is the need for sustainability. Commercial Web 2.0 initiatives need not bear the burden of maintaining the irreplaceable record of humanity’s cultural heritage on volatile media and technology platforms long into the future. Sustained and credible institutional support—such as the support of the California Digital Library, the new organization Digital Antiquity (a welcome new development since 2008 when these chapters first came together), or the Archaeology Data Service—is a requisite for the discipline. In addition, Limp argues that Web 2.0 services often emerged in situations where there was a large amount of valuable content readily available to “prime the pump” and attract sustained interest and use in their platforms. For archaeology, this is more difficult, because of a limited supply of content ready for digital dissemination. Beyond these difficulties, Limp sees challenges in motivating greater data sharing and in agreeing upon technical and semantic standards.
Sustainability and reaching a critical scale for content sharing remain important strategic questions. Limp’s point about separating implementation specifics from strategic goals helps to highlight options available to the archaeological community. Development of Web 1.0 (and also Web 2.0) was very much a distributed effort, with many failures and some successes. Similarly, prospects for archaeological data dissemination must be considered more broadly than the successes or failures of any given project. What are the overall trends, and is there evidence for increasing and more cumulative archaeological data sharing? Given the growing list of initiatives discussed and referenced in these chapters, we suspect that archaeological data sharing and Web engagement, as a distributed phenomenon, will likely continue to grow. While individual projects may come to an end, technical expertise, data, standards, and experience will continue to grow.
Sustainability requires long-term institutional credibility and resources typically found only in organizations like libraries, universities, and government agencies. Ultimately, sustainability may also require sustained public financing to maintain “public goods.” To put this issue in perspective, archaeology as a discipline is manifestly not sustainable without continued public financing. Without public support, a sustainable business model for archaeology would probably look much like the antiquities trade! Since the entire enterprise of archaeology cannot be sustained without public support, archaeological knowledge preservation and dissemination will also likely require continued public financing.
Sustainability strategies must also accommodate the reality that archaeological data-sharing efforts are scattered among several diverse initiatives and projects. This experimentation fosters innovation and builds technical capacity and expertise throughout the discipline. It also reduces the danger that there will be one and only one preferred approach to managing and making sense of archaeological data. As discussed below, digital approaches to archaeology, like any other methodology, should be considered contestable. Keeping the playing field open for multiple technical, semantic, and even ethical perspectives is therefore in the interest of the discipline as a whole. However, many archaeological data-sharing projects exist only on limited grant-funded support. Nevertheless, these may be innovative and may publish valuable content while they explore important questions in interface design and technology. An important goal should be to ensure continued experimentation and innovation of these distributed initiatives while safeguarding and preserving data. Standards efforts and archaeological cyberinfrastructure should focus on supporting widely distributed digital efforts to help ensure that their contributions will outlast their grant funding. We hope future efforts will find feasible and cost-effective strategies to enable “data preservation as a service” so that content can be preserved by the organizations most capable of doing so, while reducing the costs and risks of innovation and experimentation in different digital methods. The California Digital Library’s model for preservation micro-services represents a very encouraging step in this direction. Establishing a reliable preservation infrastructure open for the widely distributed community to build upon would encourage greater dynamism in this field.
Information Overload and its Discontents
In Chapter 10, Harrison Eiteljorg highlights a great challenge in sharing data with contributory systems: How does one ensure that the content user fully understands the project the data come from and the data collection process itself? In other words, how can a user understand a data set if that user was not involved in its creation?
The same question can be asked of synthetic publications for field projects: How can readers understand the project if they were not involved in it? As discussed above, knowledge has tacit components that often go unrecognized. Thus, even researchers who strive to communicate as comprehensively and transparently as possible will probably not be able to provide enough explicit metadata and explanation to reveal all the assumptions, motivations, and decision-making behind their data. While often an admirable goal, total transparency in archaeological research will probably always be unattainable.
Whether explicit or not, various contributors to this volume offer approaches to the problem raised by Eiteljorg. Some contributors, such as Boast and Biehl (Chapter 4), favor greater attention to linking archaeological data sharing to narratives and interpretations. They argue that digital representations of cultural heritage often find the greatest meaning embedded within narratives. In that sense, they question the universal utility of metadata and the structural formalisms of disciplinary semantic standards. The Kansas (Chapter 2) also argue that data sharing needs to be linked with narratives, both for the sake of intelligibility and to better fit with familiar patterns of scholarly communications.
While narratives can offer more depth to guide interpretation, unfortunately, deep reading of contextual and narrative nuance “does not scale.” Archaeologists, like many other twenty-first-century knowledge workers, face increasing demands on their attention. While we work to produce more and more documentation, analyses, and interpretations about the past, we seemingly have less time and attention to devote to understanding this wealth of data. Web 2.0 systems both help and hinder in that regard. On the positive side, user-generated tags, ratings, and recommendations from social networks may help one rapidly find useful information. Whitcher Kansa and Deblauwe (Chapter 7) argue this point in the context of archaeological blogging and in the context of social media use among the zooarchaeology community. On the negative side, participation in these social networks requires precious attention. One may find too much useful information to adequately process and understand. Unfortunately, information overload problems are not limited to the world of Web 2.0, and many scholars lament the glut of literature published by their colleagues (see Harley et al. 2010: 37–38).
Thus, information overload is one of the most critical problems archaeologists face today. To help mitigate information overload, some emphasize common standards, including formal domain ontologies to explicitly define the meaning of archaeological data according to widely held community understandings. This approach has the advantage of being automation-friendly. Human effort and attention are in short supply, and the more computer systems can automate documentation, retrieval, and aggregation of archaeological content, the more content researchers can hope to use. NLP, text mining, the Semantic Web, and other automation techniques offer useful strategies to help archaeologists understand and utilize their colleagues’ research findings, and overcome information overload (see Crane’s insightful 2006 discussion). Web services and other approaches to integrating different collections also relate to this discussion. Such services help pool data from multiple sources, making search, retrieval, and use of the data easier and more efficient.
While promising for some applications and research perspectives, understanding the past through algorithmic processes will probably not be universally welcome. Efficiency has tradeoffs, especially if your theoretical perspective is more “reflexive.” Relying upon semantic standards or machine-produced metadata will be somewhat “lossy,” in the sense that local nuance and context may be lost to imperfect and partial mappings to a global standard. (“Lossy” is a term referring to how some compression algorithms degrade quality and fidelity of images and other digital media in order to reduce storage and transmission costs.) Moreover, any standard or algorithm privileges a certain set of expectations and goals. Who will set the agenda in determining the semantic standards behind automation? What perspectives will become enshrined and codified as required standards by funding bodies and professional societies, and what perspectives will be left on the margins?
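To make the “lossy” point concrete, the following minimal Python sketch maps a handful of local recording terms onto a shared vocabulary. The local terms, the standard terms, and the function names are all hypothetical, invented here for illustration; they are not drawn from any project or standard discussed in this volume.

```python
# Hypothetical local recording terms mapped to a hypothetical shared vocabulary.
LOCAL_TO_STANDARD = {
    "cooking pot, soot-blackened": "vessel",
    "storage jar, rope-impressed": "vessel",
    "votive bowl": "vessel",
    "sheep/goat astragalus, polished": "animal bone",
}


def to_standard(local_term):
    """Return the shared-vocabulary term, or None if no mapping exists."""
    return LOCAL_TO_STANDARD.get(local_term)


def collapsed_terms(mapping):
    """Group local terms by the standard term they are mapped to."""
    groups = {}
    for local, standard in mapping.items():
        groups.setdefault(standard, []).append(local)
    return groups


# Three field records as an excavator might have written them.
records = ["cooking pot, soot-blackened", "votive bowl", "loom weight?"]
standardized = [to_standard(r) for r in records]

# Local terms that become indistinguishable once mapped to "vessel".
merged_as_vessel = collapsed_terms(LOCAL_TO_STANDARD)["vessel"]
```

In this toy case, three differently described vessels can no longer be told apart after mapping, and the uncertain “loom weight?” entry has no standard equivalent at all. Both outcomes are small instances of the nuance that a partial mapping can discard.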
It does not take much imagination to see emerging theoretical tensions between archaeological knowledge production driven by algorithms and formalized ontologies versus archaeological knowledge constructed from different threads of narrative. In some ways, the tensions between advocates of “deep reading” and advocates for “interoperability” continue long-standing theoretical disputes in archaeology. Some researchers emphasize contextual nuance and particularistic interpretations, while others seek more generalized patterns in more or less interchangeable empirical data. Each theoretical orientation fits better with a different type of technical style and systems implementation.
One would hope that the discipline will benefit from the best ideas of both the “deep reading” and the “interoperability” perspectives. Transparency and openness in analytic methods as well as in data sources should be a key requirement for technologically enabled archaeological research. Data sources, services, and software open to “deep reading” can earn greater trust. For example, it would be much better if the corpora and algorithms used in a text-mining project were open for others to use and adapt to serve other agendas. Without such openness, it is impossible to go beyond the perspectives, assumptions, and limitations of the initial text-mining or semantic data project. Openness to critique and outside improvement can lead to greater trust and legitimacy in archaeological information systems, even if few will have the time and inclination to actually bother with inspecting their inner workings.
The intersection of archaeological theory and digital technologies needs far more exploration. While we should avoid being “techno-determinists,” we would be foolish to ignore the role of technology in shaping scholarly life, including theoretical outlooks. It will be interesting to see how new technology opportunities and challenges co-evolve with theoretical trends in archaeology. How will ready access to structured data text-mined from over a thousand publications change archaeological interpretation? How will the professional community evaluate the significance of a sprawling multithreaded conversation taking place between museums and distributed social media outlets? Who will have the time and attention to devote to deep reading, and where will they focus their attention? What sorts of information will be taken for granted and what will attract great scrutiny? How much will information convenience drive future research agendas? All of these are important topics for continued research.
Future Directions and Tensions: A Brave New World for the Past
Many contributions in this volume note that Web 2.0 technologies and data sharing need to work in the context of scholarly communications. This volume offers many cautionary tales about applying emerging technologies in professional settings where such technologies may clash with incentives and perceptions of risks and rewards. Specific technologies change rapidly, but many of the issues explored in this volume will have long-lasting significance. The evolution of scholarly communication and how researchers recognize and communicate expertise and authority will remain important topics long into the future. Similarly, no matter what specific technology we deploy, we must grapple with how interfaces, data structures, and architectures are guided by, and also guide, interpretive priorities. Thus, many of the concerns explored in this volume will foreshadow areas of future research and debate, even after the term “Web 2.0” loses currency.
In looking at the longer-term impacts of these discussions, it is impossible to ignore more general trends shaping the public Web. Archaeologists, even those working on cyberinfrastructure initiatives, may not be the primary agents shaping the future of archaeology’s digital communications. Already, Google has reshaped how students and researchers search and retrieve scholarly content, an issue touched on by many chapters in the volume. While some of Google’s search and ranking algorithms are known (especially PageRank), other algorithms behind search results are trade secrets. Moreover, Google and other search engines continually change their methods and often offer personalized recommendations to individual users with little transparency. Potentially, every user of Google gets different search results, algorithmically personalized to their interests and search history. What does this mean to researchers searching through archaeological literature? How does this challenge or reinforce personal biases? How do personalized recommendations help shape archaeological discourse? These are important issues for further discussion.
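For readers unfamiliar with PageRank, the sketch below illustrates the basic idea in Python, using a common textbook formulation computed by repeated iteration. The three-page link graph, the page names, and the parameter values are assumptions chosen purely for illustration; this is not a description of how Google's production ranking works.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores for a small link graph.

    links maps each page to the list of pages it links to.
    Scores are normalized so that they sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini-Web: a blog post links to two resources that cite each other.
example_links = {
    "blog_post": ["site_report", "museum_catalog"],
    "site_report": ["museum_catalog"],
    "museum_catalog": ["site_report"],
}
scores = pagerank(example_links)
```

In this toy graph the blog post, which nothing links to, ends up with the lowest score, while the two resources that link to each other accumulate most of the rank. Even this openly documented part of search ranking therefore rewards already well-linked resources.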
As we look ahead, the ways in which archaeological content is aggregated, ranked, and presented will, in large part, be driven by the needs and interests of commercial Web giants. An increasingly large part of archaeological information retrieval is being shaped by “black-box” processes invisible to the research community. Other disciplinary communities face similar issues. But because archaeology has important relationships to tourism and marketing of cultural heritage, it is likely to feel disproportionately greater impact from emerging commercial information services than other disciplines. These extend beyond search and include various Web-based collaboration, visualization, and mapping applications. For better or worse, Google Maps, Google Earth, and social media services are now the windows through which many students and the public will encounter archaeological data. With the tremendous growth of mobile computing and location-based services, Google and other commercial web giants will likely play an increasing role in shaping how cultural heritage is delivered and presented on the Mobile Web (Kansa and Wilde 2008). The ways that commercial aggregation and ranking of cultural heritage will affect public perception and experience of the past will deserve increasing scrutiny (see also Vaidhyanathan 2011).
As technologies for disseminating, organizing, and retrieving information increasingly shape archaeological communications, debates about the theoretical implications and assumptions behind those technologies will receive greater attention. In that sense, this volume is an early sample of conversations to come.
I thank my co-editors Sarah Whitcher Kansa and Ethan Watrall for providing synopses of many of the contributions to this volume and advice on the structure of this chapter.
Borgman, C. L.
2007 – Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, MA: MIT Press.
Brin, S., and L. Page
1998 – The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems 30: 107–117.
Crane, G.
2006 – What Do You Do with a Million Books? D-Lib Magazine 12. Retrieved from http://www.dlib.org/dlib/march06/crane/03crane.html (accessed October 16, 2008).
“Data’s shameful neglect”
2009 – Editorial comment. Nature 461: 145 (September 10, 2009).
Harley, D., S. K. Acord, S. Earl-Novell, S. Lawrence, and C. Judson King
2010 – Assessing the Future Landscape of Scholarly Communication: An Exploration of Faculty Values and Needs in Seven Disciplines. UC Berkeley, Center for Studies in Higher Education. Retrieved from http://escholarship.org/uc/cshe_fsc (accessed June 22, 2010).
Kansa, E. C., and E. Wilde
2008 – Tourism, Peer Production, and Location-Based Service Design. In IEEE International Conference on Services Computing, vol. 2: 629–636. Los Alamitos, CA: IEEE Computer Society.
Lanier, J.
2010 – You Are Not a Gadget: A Manifesto. 1st ed. New York: Knopf.
Markey, K.
2007 – The Online Library Catalog. D-Lib Magazine 13. Retrieved from http://www.dlib.org/dlib/january07/markey/01markey.html (accessed October 15, 2010).
O’Reilly, T.
2005 – What is Web 2.0? Retrieved from http://oreilly.com/web2/archive/what-is-web-20.html (accessed May 28, 2009).
Vaidhyanathan, S.
2011 – The Googlization of Everything. Berkeley: University of California Press. http://www.ucpress.edu/excerpt.php?isbn=9780520258822#readchapter1 (Chapter 1 excerpt accessed September 10, 2010).
Standage, T.
1998 – The Victorian Internet. New York: Berkley Publishing Group.
Yu, H., and M. Young
2004 – The Impact of Web Search Engines on Subject Searching in OPAC. Information Technology & Libraries 23/4 (December): 168–180.
“This volume carries a Creative Commons BY-SA (By Attribution, Share Alike, http://creativecommons.org/licenses/by-sa/3.0/) license. In short, this means that others can freely distribute, remix, and build upon the contents of this volume, provided two very important conditions are met: the original author receives proper attribution (especially citation) and all subsequent works carry the same license. We chose a Creative Commons license primarily because of our deep concerns about the sustainability of sharply escalating costs in scholarly publishing. These costs make it increasingly difficult for educational institutions, our colleagues in commercial archaeology, students, and members of the interested public to (legally) obtain peer-reviewed publications. Please note that the Creative Commons BY-SA license allows for commercial use, as well as free distribution both inside and outside of the Academy. Permission for commercial reuse does not, however, mean commercial appropriation. The “copyleft” philosophy embodied by this license enables this work to move in many contexts, but any adaptation or enhancement of this work must be shared back, openly, with the community. Finally, because this license requires proper attribution in any subsequent duplication or adaptation, we hope this volume helps build exposure and recognition for our contributions, and that our colleagues follow this example. With enough accessible and open data (“data” that includes content like this book), we open up more opportunities for text-mining, tagging, aggregating, linking, visualizing, and hopefully better understanding.”