Susannah Iltis
University of Washington, March 1995
The Need for Interoperability
With the advancement of technology in the 1980s and 1990s, the importance of having cooperative bibliographic and information retrieval networks has become paramount. The development of online bibliographic databases such as PsycINFO and MEDLINE and the explosion of online services accessible on the Internet and World Wide Web (WWW) have made it possible for librarians and users to access information in ways not possible before. However, despite having access to many new information databases locally and remotely, the disparity of software and hardware requires the information user to learn the specifics of each system. As electronic resources grow, so will the problem of how to access the information held in so many disparate systems. The problem exists not only at the level of the user; it also involves the ability of many different database management structures and application designs to exchange information electronically. (McCallum, 44)
The information needs of both the users and the systems demonstrate the current need for new standards to answer interoperability issues among so many different systems, bibliographic and otherwise. Exchanging bibliographic information among LC, OCLC and the like is no longer enough when so much information is available in electronic form only. The need for standards for computer-to-computer communication became obvious in the 1980s with the explosion of technology. Standards able to achieve transparent connections among systems would make it possible to search databases and retrieve information from any system, whatever the differences in the software or hardware on which it runs. Nor would there be a need for private agreements, such as those required for the sharing of MARC tapes, to establish how a session would operate and what protocols it would support. With such standards, a national and international information network could be realized through true interoperability, making differences in software, hardware, and data content irrelevant. ANSI/NISO Z39.50 developed out of the need to share bibliographic information electronically, but it grew beyond bibliographic information into a standard for information retrieval. Although there remain issues of implementation and interoperability, the potential exists for a cooperative national information network, and the goal of Z39.50 is to bring it one step closer.
What is Z39.50?
American National Standard Z39.50, Information Retrieval Service Definition and Protocol Specifications for Library Applications, is a standard composed of specifications for computer-to-computer linkage between different information retrieval systems. Its purpose is to encode the messages required to communicate between two computer systems for the specific purpose of information searching and retrieval. Although it developed from the need to exchange bibliographic information, the protocol is defined to serve as a search and retrieval service completely independent of the structure of the underlying data. It is designed to allow searching on remote systems without prior knowledge of the other system's syntax, strategies, or data content. The user interacts only with the local interface, while the implemented computer system acts as an intermediary between the user and the other system despite possible differences in software or hardware. The origin system interacts only with the application layer of the target system; the other services and specifics of each system are encoded so that they are hidden from the application program. That is the vision of Z39.50; the standard itself, however, does not say how to accomplish interoperability. It is up to the individual implementors to create the actual code. (Ensor, Hinnebusch, Hinnebusch)
OSI and TCP/IP: Opposing standards
Essential to the development of the American standard Z39.50 (which specifies running over OSI) and its ability to achieve interoperability is the difference between the two types of standards, de facto and de jure, and the ways in which TCP/IP and OSI, respectively, developed. De facto standards develop out of the practices of a given vendor that become so widely accepted and implemented in the marketplace that they are treated as a standard. With the growing fragmentation of the marketplace, especially in computers, there has been a shift from individual vendors to consortia of vendors, implementors, and users establishing de facto standards. Standards developed around the Internet, such as TCP/IP and FTP, are examples of this new direction in de facto standards. The other kind of standard, de jure, is the result of formal national and international processes established by policy. In the United States, conformance to de jure standards is voluntary, unlike in most of Europe, where they have the force of law. International de jure standards usually turn on compromise among individual interests and politics, which can hinder the development of functional standards. In the early 1980s, recognition that implementing leading-edge information technology was becoming an increasingly expensive and long-term investment led to the development of preemptive standards. The standards were defined first, prior to any product development, and only then were products released into the market. This development of preemptive de jure standards had a negative effect on the emerging computer networking standards because they are first and foremost paper standards.
There was now no reason to require, or even expect, that networking standards that were defined by international standards bodies would work at all, much less work efficiently or be implementable. It was only necessary that the standards bodies agree on some sort of international standard. (Lynch, 40)
It was in this environment that the OSI networking standards were under development.
Precursors of Z39.50
Since the mid-1970s, work on information retrieval protocols has been under way. The focus, initially, was to enable organizations such as the Library of Congress, OCLC and RLIN "to create what was in effect a logical national information resource of bibliographic holdings." (Lynch, 59) In the mid-1970s, ANSI/NISO Z39.2, the MARC record format, was developed and adopted to help realize interlibrary loan, resource sharing and cooperative collection development. By the early 1980s, the Linked Systems Project (LSP) had developed out of the effort to connect these organizations' systems for the exchange of bibliographic records. The work of the participants in the project defined the initial working draft of the Linked Systems Protocol, which was a draft national standard for bibliographic information known as ANSI/NISO Z39.50. (Buckland, 83) The goal of the LSP was to build a prototype OSI network over which to run a group of library applications. At this time, the OSI application layer protocols were still emerging, so working agreements were reached to allow implementation to proceed alongside the development of the drafts of the relevant protocols. By 1984, the LSP had submitted the draft form of its information retrieval protocol to NISO for standardization as an American National Standard Open Systems Interconnection application layer protocol. Through the work done by the LSP and successive drafts and refinements made by NISO, including changes made to broaden the constituency, ANSI/NISO Z39.50 was balloted and approved in 1988. (Lynch, McCallum)
How the OSI layers work
Z39.50 is defined in terms of the Open Systems Interconnection (OSI)
model, a paper standard that grew out of preemptive standardization and
is used in various ISO standards. The model comprises seven logical
layers of hardware or software (from top to bottom): application, presentation, session,
transport, network, data link, and physical. Each layer communicates
with the layer immediately above it and below it. Each layer provides
a "clearly defined set of services to the layer immediately above it
using the services provided by the layer immediately below it." (Hinnebusch, 3)
Except for the physical layer, which is static, each layer
negotiates service parameters with its counterpart in the other
system. Z39.50 defines a set of messages called protocol data units
(PDUs), which carry the information that must pass between systems to
provide the service: one system builds and transmits a PDU, and the
other system decomposes and uses it.
The layer model is designed to make encoding variants invisible to
application layer software. The application layer, the top layer of
the model, provides services to software within an application system.
As a result, the set of services defined by the application layer
protocol is the only set of services available to the application
programs, and it is sufficient for interoperability with its
correspondent. PDUs actually travel between the origin and target only
over the physical medium. If a layer must send a PDU to the other system,
it passes the PDU to the next lower layer, which adds any
information it needs to perform its service and then passes it to
the next layer. It continues in this manner until it reaches the
physical layer which places it on the physical medium to the other
system. The other system will pass the data through its layers in the
opposite order with each successive layer removing the information
intended for it and passing it on to the next higher layer. (Denenberg, Hinnebusch)
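The way a PDU moves down one stack and up the other can be illustrated with a minimal sketch. It assumes each layer simply prepends a bracketed text header carrying its name; real OSI layers add binary control information, and the function names `send` and `receive` are hypothetical.

```python
# Layer names from top to bottom, as listed in the text. The physical layer
# is left out because it only carries the finished frame across the medium.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data link"]


def send(pdu: str) -> str:
    """Pass a PDU down the stack; each layer prepends its own header."""
    frame = pdu
    for layer in LAYERS:
        frame = f"[{layer}]" + frame
    return frame  # what the physical layer would place on the medium


def receive(frame: str) -> str:
    """Pass a frame up the stack; each layer strips the header meant for it."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


wire = send("searchRequest")
print(wire)           # outermost header belongs to the lowest layer
print(receive(wire))  # the original PDU, as seen by the application layer
```

The application software on either side only ever sees the bare PDU, which is the sense in which the lower layers are invisible to it.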
How Z39.50 works
The Z39.50 services are intended for use by application software outside the OSI model, but they presume that an association (a communication linkage) can be established between two Z39.50-capable systems. The origin is the system initiating the association to search the other system, the target. Within an association, each system must assume exclusively the role of origin or of target, even if its software is capable of both roles. However, two systems may have multiple associations running concurrently in which the roles vary. The association between origin and target is an agreement to conform to certain rules during the connection. The ACSE, the association control service element, provides for the creation and termination of associations and exists solely to serve other application layer software. (Hinnebusch)
The basic services supported by Z39.50 during an association are initialize, search, present, delete, and release. These services are all determined and initiated by the origin. After the OSI association is established by the ACSE, the origin transmits an initialize request PDU to the target system in order to create a Z39.50 association. The request contains service parameters (search, present, delete) defining how the origin will operate during the association, and it may also indicate whether the origin will support access control or resource control requests. The initialize response from the target negotiates these service parameters. The origin and target must agree on a base set of essential services in order to support an association successfully. The systems communicate through request/response pairs of PDUs, except for the abort PDU issued by the target system. After the establishment of the Z39.50 association, the origin can send search requests to the target. There are two types of queries supported by the search service. The type 0 query requires a private agreement between the two systems on the form and substance of the query. This is a non-Z39.50 query, which the standard allows as long as both systems support the query type. The Z39.50 query, type 1, uses reverse Polish notation (RPN) and boolean operators. The type and position of elements in the RPN expression (operand operand operator) completely define the operations to be performed. "The reverse Polish notation is used because of the ease with which it is parsed by the computer system." (Kibbey, 29) At the time of Z39.50-1988, the proximity operators (near, adjacent, with, same) were not supported by the type 1 query. An association can support only a single pending search request at any given time. A search response notifying the origin of the query results is sent by the target when the search is completed.
The result set is created on the target, and the records from the set are transmitted to the origin in response to a present request. A delete request sent by the origin disposes of result sets being held on the target. Finally, the origin terminates an association with a release request. (Corey, Hinnebusch, Kibbey, Lynch, McGill)
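The remark that RPN is easy for a computer to parse can be made concrete with a small stack evaluator. This is a sketch under stated assumptions: the inverted index `INDEX`, the plain-string tokens, and the function `evaluate_rpn` are hypothetical, and a real type 1 query is carried as a structured PDU with attributes rather than as a list of strings.

```python
# Hypothetical inverted index: search term -> set of matching record ids.
INDEX = {
    "cats": {1, 2, 3},
    "dogs": {2, 3, 4},
    "birds": {3, 5},
}

# Boolean operators; each consumes the two most recent operands.
OPERATORS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "AND-NOT": lambda a, b: a - b,
}


def evaluate_rpn(tokens):
    """Evaluate a postfix (operand operand operator) query with a stack."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError("operator without two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            # An operand: look up the records that match the term.
            stack.append(INDEX.get(token, set()))
    if len(stack) != 1:
        raise ValueError("malformed query")
    return stack[0]


# (cats AND dogs) AND-NOT birds
print(sorted(evaluate_rpn(["cats", "dogs", "AND", "birds", "AND-NOT"])))  # [2]
```

Because position alone determines grouping, the evaluator needs no parentheses and no operator precedence rules, only a single left-to-right pass.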
Two optional services are supported by Z39.50 which the target system may initiate to suspend the current operation until the origin responds accordingly. The first service is resource control which allows the target to request confirmation from the origin to proceed, e.g. when the request will produce a large result set or is going to take a long time to search. The second service is access control which allows the target system to challenge the origin for authentication, e.g. a password to access certain records. (Hinnebusch)
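The sequence of services within one association, including the two optional target-initiated services, can be sketched with a toy in-memory target. Everything here is an illustrative assumption rather than the standard's encoding: the class `ToyTarget`, its method names, the substring matching, and the modelling of access control and resource control as callbacks that the target invokes while the search is suspended.

```python
class ToyTarget:
    """Hypothetical in-memory target; real systems exchange encoded PDUs."""

    SUPPORTED = ("search", "present", "delete")

    def __init__(self, records, password=None, large=2):
        self.records = records
        self.password = password   # if set, searches trigger access control
        self.large = large         # result sizes above this need confirmation
        self.associated = False
        self.result_sets = {}

    def initialize(self, requested):
        # Negotiation: the response keeps only services both sides support.
        agreed = [s for s in requested if s in self.SUPPORTED]
        self.associated = "search" in agreed
        return agreed

    def search(self, name, term, answer_challenge=None, confirm=None):
        if not self.associated:
            raise RuntimeError("no Z39.50 association")
        # Access control: the operation waits on the origin's answer.
        if self.password is not None:
            if answer_challenge is None or answer_challenge() != self.password:
                return {"status": "failure", "count": 0}
        hits = [r for r in self.records if term in r]
        # Resource control: ask the origin before keeping a large result set.
        if len(hits) > self.large and confirm is not None:
            if not confirm(len(hits)):
                return {"status": "cancelled", "count": 0}
        self.result_sets[name] = hits          # the set lives on the target
        return {"status": "success", "count": len(hits)}

    def present(self, name, start, count):
        # Positions are 1-based in this sketch.
        return self.result_sets[name][start - 1:start - 1 + count]

    def delete(self, name):
        self.result_sets.pop(name, None)

    def release(self):
        self.associated = False
        self.result_sets.clear()


target = ToyTarget(["z39.50 overview", "osi model", "z39.50 and wais"],
                   password="secret")
print(target.initialize(["search", "present", "delete", "scan"]))
print(target.search("default", "z39.50", answer_challenge=lambda: "secret"))
print(target.present("default", 1, 1))
target.delete("default")
target.release()
```

Note that the search reply carries only a status and a count; the records themselves stay on the target until the origin asks for them with present.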
Z39.50-1988 Related to International Standards
At the same time as Z39.50-1988 was developing in the United States, what was to become ISO 10162 and ISO 10163 (ISO SR) was under development. Over the next six years, 1984-1990, many drafts were prepared, reviewed and revised. As is the case with most international standards, establishing the standard was slowed by the complexity of the standards themselves and by contending political issues. The United States' contributions were based on the work being done on Z39.50 and its specifications. As a result, ISO SR is highly compatible with Z39.50. One major area of difference between the two standards is that during the draft balloting process for the international standard it was decided not to include access control and resource control services. Z39.50 supports both, which gives it greater general functionality than ISO SR. However, ISO SR does specify its protocol using an abstract syntax notation, ASN.1, which Z39.50 lacked at the time.
ISO SR contains an ASN.1 description of the semantics of the protocol and specifies that an implementation must support the transfer syntax defined in ISO 8825, which describes how an ASN.1 description is to be translated into a transfer syntax. (Lynch, 61)
Z39.50 specified a transfer syntax in an appendix that is to be used for implementation, but it is not part of the standard itself. Other differences between the standards at that time included element set names, record format and object identifiers, in all of which ISO SR offered more flexibility. (Denenberg, Lynch)
Z39.50-1992, Version 2
The establishment of the parallel international standards ISO SR in 1991 and the developing Z39.50 implementation projects in the United States were the major influences that eventually led to version 2 of Z39.50 in 1992. In 1990, several organizations representing vendors, information services, and universities, among others, formed the Z39.50 Implementors' Group (ZIG), an unofficial body. Many of the features proposed for version 2 were put off to future versions because of the significant delay their inclusion would have caused. The primary activity of the ZIG initially was to develop revisions and enhancements and recommend them to the standard's maintenance agency, the Library of Congress, to better align the standard with its international counterpart, ISO SR. The ZIG functioned as an advisory committee to the Z39.50 Maintenance Agency, whose primary focus by appointment was to create version 2 for greater compatibility with ISO SR. The work being done by the ZIG was in effect creating de facto standards, as revisions were made and used before official processes were completed. The ZIG recognized the importance of having an abstract syntax as part of the standard, like the ASN.1 of ISO SR, rather than specified in an appendix. Work started on the second version, which was balloted and established in 1992. It was decided that version 2 would incorporate ISO SR's ASN.1 as the second part of the standard. (Denenberg)
Besides recognizing the need for ASN.1, other revisions were made to better align the standard with ISO SR: search and present request parameters, delete parameters, and the flexibility of OSI object identifiers. The goal was to make implementations of Z39.50 compatible and interoperable with implementations of ISO SR despite the lack of access control and resource control services on the part of the latter. Version 2 also specified new query types in addition to private and RPN; these new query types included proximity searching. To increase compatibility, version 2 also registered all information objects that are registered by ISO SR: application context, abstract syntax, attribute set, diagnostic set, and record syntax definitions. In addition, Z39.50 registers through the maintenance agency a resource report format for resource control and a transfer syntax for nonbibliographic databases. Z39.50 tries to remain dynamic by allowing experimental and implementation-specific objects to be registered through the Library of Congress. (Denenberg, Denenberg)
Z39.50-1994, Versions 2 and 3
The ZIG, working in accordance with the maintenance agency, developed a list of issues, separating those they agreed needed more immediate attention from those to be considered and evaluated later. Through a formal survey of implementors in 1992, the maintenance agency narrowed the list of proposed new features, categorizing them as indispensable, not necessary, or needing further consideration. These developed into revisions and new services for version 3 and version 4. The major new services developed for version 3 were scan and explain. Scan allows the origin to browse an index to the database by obtaining a list of the access points surrounding a chosen access point. Explain allows the origin to obtain details of the implementation of a target system, such as information on the databases to be searched, the specified data elements of a database, and the supported attribute sets and record syntaxes. (Denenberg, Denenberg) Many of the features of Z39.50-1992 were refined or expanded for Z39.50-1994. In April 1994, the ZIG recommended to the maintenance agency that the revised draft be finalized. It was balloted in November 1994, and the attached draft represents the Post Ballot Draft, but it is not the final draft of Z39.50-1994. The relationship between Z39.50-1988, -1992, and -1994 is best defined in the Foreword of the 1994 Draft:
Although Z39.50-1992 replaced and superseded Z39.50-1988 (and therefore Z39.50-1988 is obsolete) the relationship between Z39.50-1992 and Z39.50-1994 is quite different: Z39.50-1994 is a compatible superset of the 1992 version. An implementor may obtain complete details of version 2 from the Z39.50-1994 document, and build an implementation compatible with Z39.50-1992. (Draft)
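Returning to the scan service introduced in this section, the idea of obtaining the access points that surround a chosen point in an index can be sketched as follows. The function `scan`, its `before` and `after` parameters, and the sample terms are hypothetical illustrations and do not reflect the parameters of the actual scan request.

```python
import bisect


def scan(index_terms, start_term, before=2, after=2):
    """Return the index terms surrounding the place where start_term falls."""
    terms = sorted(index_terms)
    # Position of start_term, or of the first term that sorts after it.
    pos = bisect.bisect_left(terms, start_term)
    lo = max(0, pos - before)
    hi = min(len(terms), pos + after + 1)
    return terms[lo:hi]


terms = ["archive", "catalog", "database", "index",
         "library", "protocol", "record"]
print(scan(terms, "index", before=1, after=1))
# ['database', 'index', 'library']
```

The chosen term need not exist in the index; the window is then built around the position where it would be filed, which is what makes scan useful for browsing.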
Z39.50 and TCP/IP
As many implementation projects evolved, another issue became apparent to all. Z39.50 is defined as an application layer protocol over the seven-layer OSI model; however, the majority of the implementors were using another communication protocol, TCP/IP. TCP/IP is a five-layer communication protocol whose layers correspond closely to the OSI layers. The main difference in layers is that the application layer of TCP/IP must perform the services of the application, presentation and session layers of OSI. Functionally, the similar layers of OSI and TCP/IP are equivalent, but they are not compatible. Another major difference between the two communication protocols is that TCP/IP developed as a de facto standard, meaning that it was proven before its establishment as a standard, whereas OSI was first defined on paper, leaving the proof for after establishment. When Z39.50 was initially being developed, little was being developed with TCP/IP and the traffic over the Internet was minimal; however, by the time the standard was ready for implementation, that situation had changed. Since most of the organizations and institutions currently working on implementation projects are already interconnected via the Internet, they are first mounting Z39.50 over TCP/IP instead of OSI. However, most plan to develop a pure OSI protocol stack later. (Corey, Denenberg, Hinnebusch, Hinnebusch)
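The correspondence between the two stacks can be written down as a simple lookup table. This is only a sketch of the five-layer view described above: the names used for the lower TCP/IP layers are assumptions, since other descriptions of TCP/IP label and count those layers differently.

```python
from collections import defaultdict

# OSI layer -> corresponding TCP/IP layer (five-layer view; names assumed).
OSI_TO_TCPIP = {
    "application": "application",
    "presentation": "application",
    "session": "application",
    "transport": "transport",
    "network": "network",
    "data link": "data link",
    "physical": "physical",
}


def group_by_tcpip(mapping):
    """Collect the OSI layers whose services each TCP/IP layer must cover."""
    groups = defaultdict(list)
    for osi_layer, tcpip_layer in mapping.items():
        groups[tcpip_layer].append(osi_layer)
    return dict(groups)


grouped = group_by_tcpip(OSI_TO_TCPIP)
print(len(OSI_TO_TCPIP), "OSI layers ->", len(grouped), "TCP/IP layers")
print(grouped["application"])  # ['application', 'presentation', 'session']
```

The table shows why an implementation of Z39.50 over TCP/IP has to supply for itself the presentation and session services that the OSI stack would otherwise provide.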
Out of the need to implement Z39.50 over TCP/IP, the Z39.50
Interoperability Testbed (ZIT) was begun by the Coalition for Networked
Information to facilitate implementation. Unlike the ZIG, the Testbed
is intended to be short term and to disband once implementation and
interoperability are achieved. Also unlike the ZIG, the ZIT is a closed
group: membership requires that an organization be actively pursuing
implementation over TCP/IP on the Internet. Its focus is solving
practical problems of developing code for implementation.
Specifications developed by the Testbed are to be submitted to NISO for
consideration as revisions to Z39.50. The advantage of a Z39.50
implementation over running existing protocol services, like telnet,
across a network is the usability of implemented systems, e.g. the user
does not have to page through multiple screens to get to the desired
database. Running the telnet protocol still requires the user to be
knowledgeable about each individual system's syntax and search
strategies. Even with help screens, accessing the databases and trying
to retrieve information can be challenging. A Z39.50 association would
allow the user to access remote or local databases without knowing the
particulars of any except the local interface they are using. Z39.50
is designed to utilize the power of the computer to translate between
the local and remote systems rather than have the user do the
translation. Although there is a lot of work to be done, the goal is
to be able to run both communication protocols together on the same
network. There is hope that the Internet is migrating toward
the corresponding OSI protocols, which would help advance that goal.
(Ensor, Hardin, Lynch, Lynch)
Z39.50 and WAIS
Development of WAIS, Wide Area Information Servers, began in 1989, with its first Internet release in 1991. It is a full-text information retrieval architecture composed of the client, the server, the database, and the protocol. "The protocol, Z39.50-1988, is used to connect WAIS clients and servers and is based on the 1988 version of the NISO Z39.50 Information Retrieval Service and Protocol Standard." (RFC1625) While WAIS uses Z39.50-1988, it makes additions to the protocol: the ability to serve nonbibliographic information and the ability to serve multiple format types, including GIF and PostScript. Another difference is that WAIS was originally designed to run over TCP/IP rather than over OSI like Z39.50-1988. WAIS development has influenced the work of the ZIG and the subsequent versions of Z39.50. (St. Pierre) For a demonstration of the use of WAIS on WWW see EINet. (For more information on issues related to WAIS and WWW that are beyond the scope of this paper see the Cited URLs page.)
Current Implementation
Currently, the information retrieval standard Z39.50 is being implemented by universities, vendors, and information services, among others, for different purposes. OCLC is developing Z39.50 origin/target support for a wide range of its services, including EPIC, as well as connections with other communications networks, e.g. NYSERNet. The Library of Congress, OCLC and RLIN started exchanging bibliographic records using the predecessor to Z39.50 established by the LSP and plan to continue this association using Z39.50 implementations. Data Research Associates (DRA) developed Z39.50 origin/target support to connect the University of California MELVYL system to a local DRA library system on the UC Davis campus. The association supports transmission of circulation data, which is not specified by Z39.50. The University of California MELVYL system has also developed Z39.50 origin/target support for its large installed base of "dumb" terminals, so as not to render them obsolete, allowing them to communicate with its IBM 3090 mainframe as the information target. The Dartmouth College Information System uses some Z39.50 protocols to connect user interface programs, database servers, and the campus telecommunications network to provide over 30 databases. Dartmouth is also involved in a project to allow access to these databases over the Internet using Z39.50 implementations. (Machovec) In addition to associations achieved at implementation sites among local systems, many groups have reported to the ZIG successful associations among the different groups. Most of the current implementation projects use some of the Z39.50 protocols but are not pure Z39.50 associations. In order to achieve interoperability, the standard must be adhered to and issues of coding must be resolved so that an implemented origin does not interpret the association differently from the target. The process of implementation has started, and it is the true test of the standard, but there is much work left to be done.
The User and the Interface
Almost all literature on Z39.50 and its implementation focuses on the issues related to the implementor and the development of the standard. In the ten years since work on Z39.50 began, little attention has been given to the end user, the one who is supposed to benefit ultimately from the implementation of the standard. "To put it simply: we [library professionals] get so hung up about information standards that we forget why we are using the information." (Anderson, 46) Since Z39.50 is a standard for computer-to-computer communication, the responsibility for human-computer interaction is left solely to the implementors. The result is a disparity in the quality and usability of the interfaces with which the user must interact. In the library environment, there are good and bad examples of interfaces for searching and retrieving information. Services for the user have not been developed much beyond the delivery of bibliographic information and sometimes circulation data. Beyond these basic services, not much has been explored. Cunningham and Sloan, working at the University of New Brunswick, have at least pondered using hypertext links to give access to biographical information on personal authors. (Cunningham, 22) Even the services offered with bibliographic retrieval are limited. At the University of Washington, both Willow (the graphical interface) and Wilco (the text interface) to the library catalog have save/mail features. Many accessible databases do not offer this feature. This ignores a major reason that users search a remote database, which is to retrieve and collect information.
The disparity of interfaces becomes an even greater issue when considering the databases that are accessible through the WWW. In surveying several interfaces for Z39.50 databases or gateways on WWW, I found little to recommend them to the end user. So much effort has been invested in how the computer-to-computer communication will occur, and the result is an extremely powerful search engine. However, the interfaces fail to put that power in the hands of the user in a functional way. One of the greatest lapses in services for the user is the lack of help or instruction for searching the database. What is the point of establishing a gateway to multiple databases through an interface when no help information and little instruction is given on how to search? If that basic service cannot be provided to potential users, how are they going to benefit from the power of a Z39.50 association? Unfortunately, what appears to be occurring on the WWW is an eagerness to provide access to the databases without really considering how the user will search them or what information or services might be helpful.
Some Interfaces on WWW sites reviewed March 1995
The Future for Z39.50
While the ZIG works alongside the maintenance agency to keep Z39.50 evolving to meet the needs of interoperability, the main burden of establishing a cooperative information network through interoperability lies with the implementors. Although there are many current projects, Z39.50 is still little known or recognized outside of the library community, and little attention has been given to the standard by bibliographic utilities. This is one of the major barriers to advancing the goal of a cooperative information network through interoperability. There also remain many issues involving the generality of Z39.50 and what future issues should be addressed by the standard. Work continues on other related standards such as transfer formats for documents, images, and various types of database records. (Lynch, 42) Work needs to continue on the definition of object identifiers for improved access to nonbibliographic information across the network using a wide variety of access mechanisms. In order to achieve the goal of a cooperative information network through interoperability, the shift in emphasis from the standard itself to the usability of its implementations needs to continue. The value of Z39.50 derives from the ability to use its implementations to exchange information beyond bibliographic information. It also needs to be able to achieve interoperability with information systems and networks outside the library community and across the Internet.
Iltis, Susannah, "Z39.50: An Overview of Development and the Future," [paper online]; available from http://www.cbr.washington.edu/~camel/z/z.html; Internet.