By Bryan Murphy, Shahriyar Neman
May 25, 2000
Search functionality has become the status quo for all major Web sites. The typical search box/button found on home pages across the Net is considered the ultimate in user-friendly design: users type in what they're looking for and the search engine finds it quickly and easily.
By applying the tips and tricks illustrated in this article, developers can augment the Verity search engine that's packaged with ColdFusion to create a more robust - and scalable - search engine. All it costs is a little time and ingenuity.
ColdFusion Server comes packaged with the Verity search engine, a tool that makes short work of indexing, searching and retrieving information stored in virtually any format on Web and file servers. Yet the version of Verity included with ColdFusion Server provides only a limited subset of the functionality and features that are part of Verity's enterprise-level "Information Server."
This article explores some novel ways CFML and Verity can be implemented to build a more scalable search engine - and in several cases overcome some of the limitations imposed by the built-in Verity search engine.
Background Overview
All Verity functions can be performed through CFML templates using built-in ColdFusion tags. These tags are well documented within the CFML Language Reference Guide included with ColdFusion Studio. For information purposes we recap these tags here:
<cfcollection action="action"
collection="collection"
path="implementation directory"
language="language">
<cfindex collection="collection"
action="action"
type="type"
title="title"
key="id"
body="body"
custom1="custom1"
custom2="custom2"
urlpath="url"
extensions="file_extensions"
query="query_name"
recurse="yes/no"
external="yes/no"
language="language">
<cfsearch name="search_name"
collection="collection_name"
type="type"
criteria="search_expression"
maxrows="number"
startrow="row_number"
external="yes/no"
language="language">
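As a minimal sketch of how the three tags fit together, the following creates a collection, indexes a directory of files and then searches it. The collection name, directory paths and URL are hypothetical placeholders, not values taken from this article.

```cfml
<!--- Hypothetical names and paths throughout; adjust them for your own server --->

<!--- 1. Create an empty collection (run once, not on every request) --->
<cfcollection action="CREATE"
    collection="docs_collection"
    path="C:\CFUSION\Verity\Collections\"
    language="English">

<!--- 2. Index every matching file under a directory, including subdirectories --->
<cfindex collection="docs_collection"
    action="UPDATE"
    type="PATH"
    key="C:\Inetpub\wwwroot\docs\"
    urlpath="http://www.example.com/docs/"
    extensions=".htm, .html, .txt"
    recurse="yes"
    language="English">

<!--- 3. Search the collection and list the matches with their scores --->
<cfsearch name="search_docs"
    collection="docs_collection"
    type="SIMPLE"
    criteria="verity"
    maxrows="25"
    language="English">

<cfoutput query="search_docs">
    #score# <a href="#url#">#title#</a><br>
</cfoutput>
```

In practice the three steps usually live in separate templates: creation runs once, indexing runs whenever content changes, and only the search runs per request.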
Types of Verity Collections
Three types of collections can be created using Verity, corresponding to the values accepted by the type attribute of <cfindex>: file, path and custom.
Certain situations arise where it's not clear which of the three types a collection should be. Sometimes a collection needs to be a mixture of different types of data. (The implementation of such a scenario will be discussed later.) Other caveats that occur with the file-type collection are discussed in the Allaire knowledge base, www.allaire.com/Handlers/index.cfm?ID=1600&Method=Full, and they're worth reading.
Types of Verity Searches
Verity searches come in two types - simple and explicit. Depending on the functionality required from your search-engine implementation, one type may be preferred over the other.
The "Developing Web Applications with ColdFusion" section of the online docs included with ColdFusion Studio provides excellent documentation on the types of searches that can be performed on a Verity search engine.
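To illustrate the difference, here is a hedged sketch of the same search written both ways. It assumes, as we understand the documentation, that a simple search implicitly applies the STEM operator and the MANY modifier to each word, whereas an explicit search applies only the operators you spell out. The collection name is borrowed from the examples later in this article.

```cfml
<!--- Simple search: plain words; stemming and MANY are assumed to be applied for you --->
<cfsearch name="simple_results"
    collection="image_collection"
    type="SIMPLE"
    criteria="dog"
    language="English">

<!--- Explicit search: every operator is written out in the expression --->
<cfsearch name="explicit_results"
    collection="image_collection"
    type="EXPLICIT"
    criteria="<MANY><STEM>dog"
    language="English">
```

Simple searches suit free-text search boxes; explicit searches give you tighter control when the template itself builds the expression.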
Implementation Techniques
This section details implementation techniques that can be used to improve your Verity search engine code and even bypass the apparent limitations set by the watered-down version of Verity. All these examples work under a Windows NT environment with a Microsoft SQL Server 7.0 DBMS, but can be modified to work under any other environment.
Overcoming Two Custom Fields per Collection
First we address the limitation of having only two custom fields per collection. Some situations call for indexing more than two. For example, you may want to index the contents of a database table and include more than four fields to be indexed (four is the limit within a Verity collection because the body, title, custom1 and custom2 fields can hold custom information). A simple solution is to combine several fields into one, separating each field by a selected delimiter. To accomplish this you must be certain that the data in any combined field will never contain that delimiter. An example of how to create such a collection is located in Listing 1.
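Listing 1 isn't reproduced here, so the following is only a sketch of the technique rather than the original listing. The datasource name and every column except image_group_id are assumptions about tbl_image, and "#" (written as '##' inside <cfquery> and chr(35) elsewhere) is used as the delimiter.

```cfml
<!--- Hypothetical datasource and columns; only tbl_image and image_group_id come from the article --->
<cfquery name="get_images" datasource="image_dsn">
    SELECT image_id,
           image_title,
           image_description,
           image_keywords,
           <!--- Three fields packed into one column, delimited by a pound sign --->
           CAST(image_group_id AS varchar(10)) + '##' + image_photographer
               + '##' + image_location AS combined_fields
    FROM tbl_image
</cfquery>

<!--- Index the query results; custom2 carries the packed fields --->
<cfindex collection="image_collection"
    action="REFRESH"
    type="CUSTOM"
    query="get_images"
    key="image_id"
    title="image_title"
    body="image_description"
    custom1="image_keywords"
    custom2="combined_fields"
    language="English">

<!--- At search time, unpack the combined field with the list functions --->
<cfsearch name="search_images" collection="image_collection"
    type="SIMPLE" criteria="dog" language="English">

<cfoutput query="search_images">
    <cfif listlen(custom2, chr(35)) gte 3>
        Group: #listgetat(custom2, 1, chr(35))#,
        photographer: #listgetat(custom2, 2, chr(35))#,
        location: #listgetat(custom2, 3, chr(35))#<br>
    </cfif>
</cfoutput>
```

Two caveats with this sketch: ColdFusion list functions skip empty elements, and concatenating a NULL column in SQL Server yields NULL, so it assumes every packed column is non-null and non-empty.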
Combining Database and File Data
Under certain circumstances you may want to create a collection that's a combination of database data and file data. For example, imagine that tbl_image from Figure 1 had an additional attribute, image_text, that represented the filename of a text file that contained information associated with that image. If we wanted to create a collection that included the text in the file specified by the attribute image_text, we'd first have to query the database for the image information, then create a collection of type "file." ColdFusion's <cfindex> tag does the rest by automatically looping through the query to index the files from the paths specified in the query. Listing 2 gives an example of how this would be done.
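Listing 2 isn't reproduced here; the following sketch shows the general shape under stated assumptions. The datasource, the text-file directory, the URL and the image_id and image_title columns are hypothetical, and it assumes that urlpath accepts a query column name when a query is supplied, as the description above implies.

```cfml
<!--- Hypothetical datasource, directory, URL and column names --->
<cfquery name="get_images" datasource="image_dsn">
    SELECT image_title,
           <!--- Full path to the text file named by image_text --->
           'C:\image_text\' + image_text AS file_path,
           <!--- Page URL for the image, with a trailing space appended --->
           'http://www.example.com/image.cfm?image_id='
               + CAST(image_id AS varchar(10)) + ' ' AS image_url
    FROM tbl_image
</cfquery>

<!--- type="FILE" with a query: key names the column that holds each file path --->
<cfindex collection="image_collection"
    action="REFRESH"
    type="FILE"
    query="get_images"
    key="file_path"
    title="image_title"
    urlpath="image_url"
    language="English">
```

The collection must already exist before this template runs, and the ColdFusion service account needs read access to the text-file directory.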
In Listing 2 we've dynamically created a Verity file collection that includes database information as well as data from a text file. This operation isn't limited to text files and can be performed with other types of files that Verity supports. Look closely at the code: you'll notice that when the URL attribute is created by the get_images query, an extra space is appended at the end. At first glance this may seem like a mistake, but it's deliberate and there's a good reason for it.
When Verity performs searches on a collection, such as the one created in Listing 2, the value of the URL attribute returned by the search is a concatenation of the URL specified when the collection was indexed and the filename searched; that is, if we specified www.foobar.com as the URL, a search might return a result with the URL attribute something like www.foobar.com/file1.txt.
ColdFusion sites that access content through URL parameters may not want files that are indexed to show up in the URL field of the returned-search query from Verity. This is where the space at the end of the URL attribute comes into play. It serves as a delimiter so that, when searches are performed, you can get the proper URL sans filename by simply applying the listfirst() function on the URL value returned. For example:
<cfsearch collection="image_collection" name="search_images"
type="SIMPLE" criteria="dog image" language="English">
<!--- output all the URL values minus the concatenated filename added by Verity --->
<cfoutput query="search_images">
#listfirst(url, " ")#
</cfoutput>
Certain modifications can be made to Verity searches to make them more efficient. For instance, if you want to perform searches only on a particular image_group_id, you could use the following code:
<cfsearch collection="image_collection" name="search_images" type="SIMPLE"
criteria="(CF_CUSTOM2<starts>#url.image_group_id##chr(35)#)<AND>(#url.search_criteria#
<OR>CF_TITLE<substring>#url.search_criteria#<OR>CF_CUSTOM1<substring>
#url.search_criteria#<OR>CF_CUSTOM2<substring>#url.search_criteria#)"
maxrows="1000" language="English">
With this type of search in place Verity filters out all images that aren't of the image_group_id specified by the url.image_group_id parameter.
Searches can be sped up by periodically optimizing Verity collections. Optimization can be performed either programmatically or through the ColdFusion Server Administrator. It's a good idea to create a template that programmatically optimizes your collections and to use the ColdFusion Scheduler to run it every night. A sample template would look like this:
<cfcollection action="OPTIMIZE" collection="image_collection">
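The nightly run can also be registered programmatically rather than through the Administrator. The following is a minimal sketch using <cfschedule>; the task name, template URL, start date and start time are hypothetical.

```cfml
<!--- Hypothetical task name, URL and schedule; run this once to register the task --->
<cfschedule action="UPDATE"
    task="optimize_image_collection"
    operation="HTTPRequest"
    url="http://www.example.com/admin/optimize_collections.cfm"
    startdate="6/1/2000"
    starttime="2:00 AM"
    interval="daily"
    requesttimeout="600">
```

Because the Scheduler calls the template over HTTP, it's worth restricting access to the optimization template so that visitors can't trigger it.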
Finally, when performing searches on a Verity collection, certain words and characters in the search phrase will cause the search to throw an error. To avoid this you can "clean" any search strings before you send them to Verity. A simple way to do this is to delete the offending characters and/or words. A utility like this already exists - in the form of a custom tag named <cf_verityclean> - and can be downloaded for free from Allaire's Developer's Exchange Site at www.allaire.com/developer/gallery.cfm.
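If you'd rather roll your own, the idea can be sketched as follows. This is not the <cf_verityclean> tag, and the choice of which characters to strip is an assumption: the sketch conservatively keeps only letters, digits and spaces, and it does not remove reserved words.

```cfml
<cfparam name="url.search_criteria" default="">

<!--- Replace everything except letters, digits and spaces with a space --->
<cfset clean_criteria = REReplace(url.search_criteria, "[^A-Za-z0-9 ]", " ", "ALL")>

<!--- Collapse runs of spaces and trim the ends --->
<cfset clean_criteria = Trim(REReplace(clean_criteria, " +", " ", "ALL"))>

<cfif Len(clean_criteria)>
    <cfsearch name="search_images"
        collection="image_collection"
        type="SIMPLE"
        criteria="#clean_criteria#"
        language="English">
<cfelse>
    Please enter at least one word to search for.
</cfif>
```

A whitelist like this is blunt, since it also discards accented characters and punctuation you may want to keep, so loosen the pattern to suit your content.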
Scaling Verity for Clustered Web Server Environments
Within clustered-server environments traditional implementations of Verity wouldn't be ideal. Under clustered NT environments collections could be stored on a separate file server that all the Web servers can access via UNC paths or SMB mapped drives. The problem with such an implementation is that the Web servers themselves are doing the searching, that is, the local Verity engine on each ColdFusion Web Server is taking up that server's CPU time to perform searches and updates to various collections. Clearly the main function of a Web server should be to serve Web pages, and any CPU time taken for other tasks is highly undesirable. This situation is analogous to placing a DBMS on each server, then having each one serve Web pages and perform database queries. In our experience making network calls to collections via UNC paths is a slow process.
A more scalable and robust solution to this problem is to designate one server as a Verity server. This server will then take Verity search-and-update requests from all the Web servers through HTTP calls. To accomplish this without purchasing the full-scale version of Verity, CFML client and server templates must be implemented. The client template will reside on each of the Web servers and will be called when a Verity search or update is performed. Subsequently the client template will call the server template residing on the dedicated Verity server via an HTTP call. Each client HTTP call posts requests to the server template, and once the server template receives a request, it performs the desired action and returns results to the client template. The client template can then use this data in any fashion desired.
Listing 3 provides a sample client template, and Listing 4 illustrates a complementary server template. Figure 2 gives an overview of the entire client/server model.
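Listings 3 and 4 aren't reproduced here, so the following is only a rough sketch of the pattern, not the original code. The server URL, form field names, lock name and the tab-delimited response format are all assumptions made for illustration.

```cfml
<!--- ===== Client template (on each Web server) ===== --->
<cfset search_criteria = "dog image">

<!--- Serialize calls to <cfhttp>, as recommended in the note that follows --->
<cflock name="verity_http" timeout="30" type="EXCLUSIVE">
    <cfhttp url="http://verity.example.com/verity_server.cfm"
        method="POST" resolveurl="no">
        <cfhttpparam type="FORMFIELD" name="collection" value="image_collection">
        <cfhttpparam type="FORMFIELD" name="criteria" value="#search_criteria#">
    </cfhttp>
    <cfset raw_results = cfhttp.filecontent>
</cflock>

<!--- Each response line is assumed to be key<TAB>title<TAB>score --->
<cfoutput>
<cfloop list="#raw_results#" index="row" delimiters="#chr(13)##chr(10)#">
    <cfif listlen(row, chr(9)) gte 3>
        #listgetat(row, 2, chr(9))# (score #listgetat(row, 3, chr(9))#)<br>
    </cfif>
</cfloop>
</cfoutput>

<!--- ===== Server template (verity_server.cfm on the Verity server) ===== --->
<cfsetting enablecfoutputonly="yes">
<cfparam name="form.collection" default="">
<cfparam name="form.criteria" default="">

<cfsearch name="results"
    collection="#form.collection#"
    type="SIMPLE"
    criteria="#form.criteria#"
    maxrows="100">

<!--- Emit one tab-delimited line per match --->
<cfoutput query="results">#key##chr(9)##title##chr(9)##score##chr(13)##chr(10)#</cfoutput>
```

Because ColdFusion list functions skip empty elements, a row with a blank title would be dropped by the client's listlen() check; a production version would need a more robust wire format and should also validate the collection name it receives.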
Note: Although the purpose of the Verity client/server template is to make Verity scale better, calling the <cfhttp> tag is a potential bottleneck that limits the scalability of this implementation. Due to problems encountered with the single-threaded nature of the <cfhttp> tag, it's good programming practice to place a lock around all calls to it. This locking mechanism, which is responsible for the consequent scalability limit, causes multiple instances of templates that call <cfhttp> to wait for the release of the lock before executing.
Conclusion
The built-in Verity search engine packaged with ColdFusion can be augmented by implementing the tips and tricks illustrated in this article. The result is a more robust and scalable tool, developed in a relatively short amount of time. Best of all, the scalability attained is free. It's a combination of features any developer will appreciate.
Copyright © 2000 SYS-CON Media, Inc. — All Rights Reserved.
More Stories By Bryan Murphy
Bryan Murphy is the owner of GuardianLogic, Inc. (www.guardianlogic.com), an information security firm that provides application and network vulnerability assessments and hardening. He is also one of the authors of Metazoa (www.metazoa.ca), a security-enhanced content management system; Membrane, an application-level firewall; and MetaGuard, a CFC that provides role-based login, authentication, and access control. Bryan has been an ethical hacker since the old-school BBS days. Visit his blog at www.downgrade.org.
More Stories By Shahriyar Neman
Shahriyar Neman is CTO of the Next Network, an ASP that delivers total computing packages to small- and medium-sized businesses through the Internet. He holds a BA in computer science from NYU and is currently pursuing his master's degree.