Federated Search Engine Article

Online (Weston, Conn.) 28 no2 16-19 Mr/Ap 2004 (Reprint of article by Donna Fryer)

The magazine publisher is the copyright holder of this article and it is reproduced with permission. Further reproduction of this article in violation of the copyright is prohibited.

Federated searching aggregates multiple channels of information into a single searchable point.

    Does information in your organization reside in silos? Do you have to remember multiple database search protocols and passwords? Do you send students to the OPAC terminal to find books in your collection but to another computer to look for periodical articles? Perhaps there's a third for Internet access? Are end users in your company confused about the difference between one information source and another? Do results from a Web search and a fee-based premium information source look totally different? When researching a subject, can you imagine being able to do a single search, including subscription databases, Internet search engines, and electronic publications, instead of doing multiple searches across different sources and deleting duplicates?
    This technology is here, albeit in its infancy. Federated searching--also known as parallel search, metasearch, or broadcast search--can be seen as an extension of the common user-interface research done decades ago. Federated searching aggregates multiple channels of information into a single searchable point. This blends e-journals, subscription databases, electronic print collections, other digital repositories, and the Internet. Federated searching reduces the time it takes to search and usually displays results in a common format.
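    As a rough illustration of what "a single searchable point" means in practice, the sketch below sends one query to several sources at once, converts each source's records to a common format, and merges them into one list. Everything in it is an assumption made for illustration: the three connector functions, their sample records, and the field names are hypothetical and do not represent any vendor's product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical connectors. Each one searches a tiny in-memory sample that
# stands in for a real source, and returns records in a common format.
def search_catalog(query):
    records = [{"title": "Federated Search in Libraries", "year": 2003},
               {"title": "Cataloging Rules", "year": 1998}]
    return [{"source": "catalog", "title": r["title"], "year": r["year"]}
            for r in records if query.lower() in r["title"].lower()]

def search_subscription_db(query):
    records = [{"headline": "Federated Search Vendors Compared", "pub_year": 2004}]
    return [{"source": "subscription_db", "title": r["headline"], "year": r["pub_year"]}
            for r in records if query.lower() in r["headline"].lower()]

def search_web(query):
    records = [{"name": "A Guide to Federated Search", "date": "2002"}]
    return [{"source": "web", "title": r["name"], "year": int(r["date"])}
            for r in records if query.lower() in r["name"].lower()]

CONNECTORS = [search_catalog, search_subscription_db, search_web]

def federated_search(query):
    """Broadcast one query to every connector in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        result_lists = list(pool.map(lambda connector: connector(query), CONNECTORS))
    merged = [record for results in result_lists for record in results]
    # Display newest first; a real system might sort by relevancy or title instead.
    return sorted(merged, key=lambda r: r["year"], reverse=True)

results = federated_search("federated search")
for record in results:
    print(record["year"], record["source"], record["title"])
```

    The point of the sketch is the shape of the process, not the details: the user types one query, and the differences among the sources' native record formats are hidden behind the connectors.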

    The biggest players in the federated searching industry are MuseGlobal (MuseSearch), Fretwell-Downing (Zportal), and Webfeat (Knowledge Prism). These product offerings allow a user, regardless of vendor, to access multiple databases through one search interface. Endeavor (ENCompass) and ExLibris (MetaLib) are also in the federated searching space, as is Sirsi, with its Rooms project, but each offers slightly different features, including a combination of full-text linking and federated search results. A new entrant is TDNet, with its TES product. There are probably just a few thousand companies and libraries using this technology. Since federated searching is a young, emerging technology, the vendors are continually adding features and updating their capabilities.
    Partnerships formed by companies already in the library/information space with federated search engine technology are common. MuseGlobal's MuseSearch sells exclusively through existing channels. Some of its vendor partners are COMPanion, Endeavor, Innovative Interfaces, Kreutzfeldt Information Services, LIB-IT GmbH, Mandarin Library Automation, My Community International, Syndetic Solutions, Sirsi, and Transtech Corporation. Webfeat's product, Knowledge Prism, has vendor partnerships with Dynix, Follett Software Company, Thomson ISI, and The Library Corporation. Then there are libraries that have developed their own federated search engines. A good example is the California Digital Library's Searchlight product. One of the up-and-comers in the federated search space is Surfwax, which has moved from the metasearch engine space to offering federated searching on the enterprise level.
    In the library space, federated search naturally evolved from broadcast searching, which simultaneously searches OPAC targets via the Z39.50 protocol. Libraries moving beyond online catalogs find that federated search engines give them the ability to include subscription databases, the Internet, or literally anything in the electronic arena in which the access point can be authenticated.

    The use of multiple names to describe the same thing plagues the information industry. Federated search is no exception. NISO, the U.S. National Information Standards Organization, and many libraries refer to federated searching as metasearching. However, vendors in this space prefer not to be known as metasearch engines, as this conjures up thoughts of searching only previously crawled databases such as Google, AlltheWeb, and AltaVista. For marketing purposes alone, these vendors have had to differentiate their search functions to bring to mind higher capabilities than Dogpile, Vivisimo, or Metacrawler. Federated search engines are different from the metasearch engines commonly found on the Internet. The public at large uses metasearch engines because they run searches via multiple Web search engines that spider the open Web, including multimedia sources. Federated searches concentrate mostly on textual information and offer searching of subscription-based premium content databases, the invisible Web sources that Web-oriented metasearch engines miss. Another difference: Web metasearch engines offer free search. Federated search engines are enterprise software, with costs ranging from $750 to $200,000, depending on the number of seats, design, and functionality.
    To make the search environment more efficient for the content provider, system provider, and end user, NISO has gathered vendors, content providers, and library systems to work on its Metasearch Initiative. The Initiative's focus is to create standards for several issues important to this emerging industry: proprietary vendor verification, authentication, and certification to use certain databases; search protocol standardization; common descriptors for data and content tags, as well as taxonomies; and how result sets should be sorted, ranked, and ordered. Another issue at hand very important to content aggregators is the copyright issue. How will each record show branding and copyright information? Right now, each vendor has a different way of handling these situations. What's needed is a common set of standards to help ensure each party's interest.

    Verification, authentication, and certification can be difficult for the federated search vendor. Since federated search engines don't hold the data locally, meaning the engines perform the search and send the results back, the federated search engine must be able to access multiple, password-protected databases behind the scenes, all at one time, and show users their results in one easy-to-read interface. The challenge for federated search vendors is to ensure that only licensed users can access databases in an appropriate manner, as specified by their license. This may require a library or a corporation to set up multiple areas where only certain licensed users can access a federated search.
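    The licensing check described above can be pictured as a filter applied before any search is sent out. The following is a minimal sketch under stated assumptions: the license table, the user groups, and the function name are all hypothetical, and a real system would take this information from the institution's license agreements rather than from a hard-coded table.

```python
# Hypothetical license table: which user groups may search which database.
LICENSES = {
    "BIOSIS": {"biology"},
    "PsycINFO": {"psychology"},
    "Library Catalog": {"biology", "psychology", "economics"},
}

def licensed_targets(user_group, requested):
    """Keep only the requested databases that the user's group is licensed to search."""
    # A database missing from the table is treated as unlicensed.
    return [db for db in requested if user_group in LICENSES.get(db, set())]

allowed = licensed_targets("biology", ["BIOSIS", "PsycINFO", "Library Catalog"])
print(allowed)
```

    Treating an unknown database as unlicensed is a deliberate choice in this sketch: when the license terms are unclear, it is safer to withhold access than to grant it.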
    Part of the process entails deciding what content different departments or users can access. This could generate an unwelcome amount of staff resource time to ensure authentication, verification, and interface display decisions. Authentication sets federated search engines apart from other more expensive and highly sophisticated enterprise search software such as Verity and Autonomy. Enterprise search engines usually restrict searches to internally generated, enterprise-wide information, ignoring subscription databases that enterprises have brought in house.
    The number of different cookies a subscription database uses makes the authentication process either a simple or complex procedure. All the user needs to provide is the ID, password, and files to be searched for each subscription database. The federated search engine will handle the rest of the authentication procedure. However, the initial setup process can take a number of hours to a number of days, depending on the complexity and number of subscriptions.

    The second issue is the search query and results interfaces. For several years now, libraries and corporate information centers have faced the "Google phenomenon." Many patrons believe that doing a Google search covers all the bases. Libraries now have an excellent opportunity to provide a simple, yet powerful interface that out-Googles Google. They can set up their interface based on subject and sources, or customize it to specific user needs. Libraries and corporations need to take note of Google's simple interface--users expect an interface as streamlined as Google's. Uncomplicated and intuitive interfaces without a high learning curve will see expanded usage. Most of the federated search vendors allow clients to create their own "look and feel" for the search interface and results pages. However, if you do not have the staff resources, they will often allow a more static look where little decision making on your part needs to be done.
    When considering the federated search engines, you should decide how much time you will devote to designing the "look and feel" of the interface. Who will be the audience? How many staff resources are available for designing the interface? Again, the library or information center's staff will need to be in charge of authentication, licensing issues, and interface design. The end user should see only the end result of multiple database searches without having to do any more than enter a user ID and password one time and type in a search query.
    To start a search in a federated system, individuals can usually choose either a subject or the desired databases in which to begin their search. Most vendors allow customization of search fields to title, keywords, author, publication, subject, SIC code, ticker symbol, etc. With some vendors, results can also be filtered by different fields such as revenue, size of company, number of employees, etc. The results can usually be displayed by date, relevancy, or title.
    The search interface and result filter customization raises the question: "What if some of the publications I'm searching do not include an author or other field?" In most cases, the federated search engine will then use the search words simply as keywords. These rules need to be clearly spelled out both by the vendor and the client. For example, how does the proprietary software handle phrase searching? Does it read the quotation marks or are quotation marks read as spaces? Since Internet search engines do read quotation marks, federated search engines should specify clearly if they don't. Most federated systems allow you to save a search and look through your search history if needed. None of the federated vendors offers proximity searching or multiple-field searching.
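    The fallback rule mentioned above, in which a fielded search becomes a keyword search when the target lacks that field, might look something like the sketch below. The function name, the field names, and the dictionary layout are assumptions made for illustration; each vendor's actual rules may differ and should be confirmed with the vendor.

```python
def build_query(target_fields, field, terms):
    """Search the requested field if the target supports it; otherwise use keywords."""
    if field in target_fields:
        return {"field": field, "terms": terms}
    # The target has no such field, so the search words are used as plain keywords.
    return {"field": "keyword", "terms": terms}

# A hypothetical journal database that indexes authors, and a news feed that does not.
journal_query = build_query({"title", "author", "subject"}, "author", "Fryer")
newsfeed_query = build_query({"title", "keyword"}, "author", "Fryer")
print(journal_query)
print(newsfeed_query)
```

    The consequence for the searcher is worth noting: the second search will match "Fryer" anywhere in the record, not just in a byline, so results from that source may be broader than expected.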

    The next issues are hot ones: de-duping and relevancy. De-duplication of results seems to be controversial in the federated search space. The gist of it is that most federated search engines will de-dup the results you have on your current results screen. Some of the federated search engines will even de-dup all results when requested. However, this opens up a Pandora's box about how the results are returned.
    Anyone familiar with search engine optimization understands that audiences will usually only view the first 10 hits. How do the vendors and interface designers ensure the highest-quality hits are returned first? Would their algorithms include making the proprietary databases higher on the relevancy results? If this is the case, does this put unfair burden on the subscription databases once federated searching becomes more popular? How do you treat Web results as opposed to subscription-based databases? If only your screen results are de-duped then does de-duping have any benefit? Because most of us view de-duping as being done on the complete set of results, partial de-duping is a new concept. In my humble opinion, partial de-duping is better than none. This is an issue that NISO will have to address.
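    The difference between partial and full de-duping is easier to see with a small example. The sketch below assumes, purely for illustration, that two records are duplicates when their titles (ignoring case) and years match; the sample hits, the page size of two, and the function names are all hypothetical, and real systems may match records quite differently.

```python
def dedupe(records):
    """Drop records whose title (case-insensitive) and year have already been seen."""
    seen = set()
    unique = []
    for record in records:
        key = (record["title"].strip().lower(), record["year"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def paginate(records, page_size):
    """Split the result list into screens of page_size records each."""
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]

hits = [
    {"title": "Federated Search Basics", "year": 2004, "source": "db_a"},
    {"title": "Metasearch Standards", "year": 2003, "source": "db_a"},
    {"title": "federated search basics", "year": 2004, "source": "db_b"},
    {"title": "Metasearch Standards", "year": 2003, "source": "web"},
]

# Partial de-duping: duplicates are removed only within each results screen.
partial = [dedupe(page) for page in paginate(hits, 2)]
# Full de-duping: duplicates are removed across the complete result set first.
full = paginate(dedupe(hits), 2)

print(sum(len(page) for page in partial), "records shown with partial de-duping")
print(sum(len(page) for page in full), "records shown with full de-duping")
```

    In this example the duplicates fall on different screens, so partial de-duping removes nothing, while full de-duping cuts the list in half. Full de-duping, however, requires the engine to retrieve and compare the complete result set before showing the first screen.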
    Publishers will have to work through branding and copyright issues. They may also have to factor in the price of the subscription. Once federated searching catches on, the premium content providers will see significantly more usage--this could overload their systems, particularly for smaller publishers. Close monitoring for database usage on the part of libraries and corporations will provide valuable information as to what subscriptions to keep and which to cancel. Monitoring usage will also help to avoid having the subscription price raised because of over-usage. Several of the federated search vendors provide this type of statistical package to enable this monitoring.

    If resistance is low and libraries embrace federated search technology, this could put marketing library services in a whole new light. Because these systems can be accessed remotely, yet are simple and dynamic, this is an opportunity to expand the library's reach and service, making it the "digital one-stop service to users." With database acquisition decisions already being made by the library staff behind the scenes, users have few decisions to make on their end. For the average end user, the less decision making, the better. Google, for the general public, sets the gold standard for returning relevant results. Federated search offers another opportunity for libraries to out-Google Google, this time by returning relevant results that Google misses. When the appropriate databases are chosen in advance for the end user, then there is a higher likelihood of relevant results. The biology department might pre-select BIOSIS; the psychologists would be directed to PsycINFO, economists to EconLit, and those in financial institutions to American Banker. This type of pre-selection makes the process seamless with little decision making required of the end user.
    Not only do I see a significant benefit to libraries to implement federated searching, but I also see great benefits to corporations. It is no secret that in these economic times, many information centers have closed, leaving company employees to fend for themselves. When it comes to locating information, employees are paddling upstream without any oars. They've been told that they can find everything they need for free on the Internet--something all information professionals know is not true.
    Subscription-based vendors of premium content are the winners here--if the majority of the results are coming from their subscription databases and if statistics support that contention. Federated searching can include not only premium content and Internet data, but it can also encompass internal company documents as well. The caution here would be to make sure this is done departmentally instead of company wide. You certainly would not want your sales force to have access to all your strategic planning material. One area of concern would be on which server the information sits. Most vendors offer the choice of the software running on their server or the customer's server. The customer would have to have the technical knowledge and staff resources to maintain the system if it did reside internally.

    Individual end users will benefit from federated search technology. The reduced time it takes to do a basic search is benefit enough. A word of caution though: Federated searching is not for power searching needs. Just as with searching metasearch engines, only basic Boolean commands can be used. In my mind, federated searching is a good starting point, but never the ending point, for sophisticated search needs. However, the federated search vendors are continually improving their systems. Six months from now, the leaders may be able to offer complex search commands. Federated search systems are not something the average independent information professional will purchase, however, because of the costs involved. Federated search is designed for larger institutions, both academic and corporate.
    Even though federated search engines are pretty much the new kids on the block, they are accomplishing some terrific feats in the information retrieval industry. Just to think that they have developed the ability to search across multiple types of databases, both public and subscription, is amazing. Knowing a system's abilities and its limitations is fundamentally important, as it would be with any new software. If you are considering purchasing a federated search service, always ask for a demo system to be built using your different IDs and target areas to make sure authentication can be done. Since we live in a fast-paced world, federated searching will find more and more applications, not just in the library world but in the commercial sector as well, to save time and money and to enhance the user's search experience.
    Donna Fryer is an information consultant, researcher, and trainer with Global Information Research & Retrieval, LLC.