Optimize, Extend and Enhance the Search Functionality in ColdFusion MX
Adding value quickly and easily

Do your end users complain about the quality of your ColdFusion application's search functionality?

Have you exceeded the 250,000-document limit of the search functionality that comes embedded in ColdFusion MX? Have the requirements of your application changed to include searching for content both inside and outside of the ColdFusion environment?

If you answered yes to any of the above questions, it sounds like your ColdFusion MX application could use some help. In surveys by industry analysts, such as Gartner, IDC, and Forrester, end users, executives, and developers alike consistently rank the ability to search as one of the most important features of all online applications. Yet search continues to be one of the most maligned utilities of Web sites, business applications, and you name it. The bottom line is that if your end users can't find what they're looking for with your application's search tools, you're not realizing the full value of your ColdFusion investment.

Early on, Allaire, and then Macromedia, understood the need to provide ColdFusion developers with the ability to integrate advanced search features into their applications. Since 1997, the search functionality embedded in ColdFusion has been provided by Verity, Inc. Verity was selected for a number of reasons: ease of integration into ColdFusion, advanced functionality, and Verity's recognized position as the market leader in the enterprise search space.

But even the best search tool is only as good as its implementation. This article includes tips on optimizing the Verity search included in ColdFusion MX 6.1. With this search, you can build applications with advanced, enterprise-class full-text search of up to 250,000 documents and/or database records within the ColdFusion environment (if you need a larger search, go directly to the end of this article for more information). In addition, this article describes how to quickly and easily add value to applications by enhancing the search within ColdFusion, and by extending search to content outside ColdFusion.

Search Within ColdFusion MX
To ensure that users of ColdFusion applications can find the specific information they need in databases and hundreds of file formats, Macromedia integrated Verity full-text search. This includes:

  • Full-text search of all ColdFusion content
  • The ability to search a wide range of document types, including HTML, binary documents, and database records
  • The ability to limit search to specific groups, or collections of documents, in order to enable subject-specific searches
  • Support for multiple languages, including most European and Asian languages
  • Fielded search against index metadata
There is more to the Verity search embedded in ColdFusion than just a box into which users type a query. To deliver exceptional performance, advanced functionality, and high relevancy, Verity performs searches against Verity Collections, not against the actual documents and database records within the ColdFusion application. A Verity Collection is a special index created by the Verity "spider." The spider locates all the documents and databases that are to be made searchable and extracts the text and metadata within each document or record. It also extracts other information, such as document zone and field data, word proximity, and the physical file system address or URL. All of this information is then gathered together in the Verity Collection. Bringing all of this information together in one index and running searches against it, rather than having to locate and access the actual documents and databases each time a user searches for information, dramatically increases the speed and relevancy of ColdFusion's search capabilities. It also enables advanced features such as document summaries in results lists and the ability to limit searches to specific groups of documents.

One of the strengths of Verity search solutions is that they can be configured to meet specific business and technical objectives. To optimize search within the ColdFusion environment, Macromedia implemented Verity to support content of the following basic data types:

  • Text files such as HTML pages and CFML pages
  • Binary documents
  • Record sets returned from cfquery, cfldap, and cfpop queries
As a developer, you can build Verity Collections from individual documents or from an entire directory tree. Collections can be stored anywhere, so you have greater flexibility in accessing indexed data when building applications.

Typical Applications of Verity Search
By taking advantage of Verity's flexibility you can add significant value to your ColdFusion applications. Typical uses of Verity search within ColdFusion include:

  • Indexing the content of a Web site and providing a generalized search mechanism, such as the familiar search box
  • Indexing specific directories that contain documents on a specific topic in order to provide subject-based searching, or to limit the focus of searches to specific groups of documents
  • Indexing cfquery record sets into a single Verity Collection and letting users search against the collection with a single query rather than requiring them to perform multiple database queries to return the same data
  • Indexing cfldap and cfpop query results
  • Indexing e-mail generated by ColdFusion application pages and making the resulting Verity Collection available for searching from your ColdFusion application pages
  • Building Verity Collections with inventory data and making those collections available for searching from your ColdFusion application pages
  • Supporting international users in a range of languages, using the cfindex, cfcollection, and cfsearch tags
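
As a rough sketch of the cfquery case in the list above, the following indexes a record set into a Verity Collection. The data source name (inventory_dsn), the table and column names, and the collection name (products_coll) are hypothetical placeholders, and the collection is assumed to exist already.

```cfml
<!--- Hypothetical data source, table, and column names, for illustration only --->
<cfquery name="getProducts" datasource="inventory_dsn">
    SELECT product_id, product_name, description
    FROM products
</cfquery>

<!--- type="custom" indexes a query result set; key, title, and body name query columns --->
<cfindex collection="products_coll"
         action="update"
         type="custom"
         query="getProducts"
         key="product_id"
         title="product_name"
         body="product_name,description">
```

Once indexed this way, users can search the product names and descriptions with a single full-text query against the collection.
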
Advantages of Using Verity Search
One of the most obvious advantages of Verity search embedded in ColdFusion is its performance. For example, using Verity to index the output from database queries and then to perform searches against the indexed record sets is much faster than using SQL to search databases directly. Additional advantages of Verity over other search methods include:
  • Superior relevancy of search results lists
  • The display of document titles and summaries in search results lists
  • Elimination of the need to programmatically create query constructs by allowing novice and expert users alike to use the same type of full-text queries they're used to using on the Web
  • Indexing of database text fields, such as notes and product descriptions, that cannot be effectively indexed by native database tools
  • The indexing and display of document URLs in results lists, a valuable document management feature
Implementing Verity Search in ColdFusion Applications
The good news is that Verity's advanced search features are straightforward to deploy within ColdFusion MX. In general, adding optimized search to your application involves three basic tasks:
  1. Creating a Verity Collection
  2. Indexing the content within your ColdFusion application
  3. Designing a search interface
Each task can be performed programmatically - that is, by writing CFML code. Alternatively, you can use the ColdFusion MX Administrator to create a Verity Collection and index the content within your application. Also, Macromedia HomeSite+ has a Verity wizard, which generates ColdFusion pages that index content and design search interfaces. Table 1 summarizes the methods available for all three tasks.

There are pros and cons to using either the ColdFusion MX Administrator or CFML for deploying Verity search within ColdFusion applications. Refer to Table 2 to determine which is appropriate for your application and information environment.

Just as there is more than one method for deploying Verity search, you can configure your search implementation to meet specific business objectives. Primarily, you do this by running cfsearch or cfquery. Table 3 lists the advantages and uses of each.

Optimizing Search Relevancy
Once you've deployed the Verity search, you can also optimize its relevancy for your specific information environment. The ColdFusion implementation of Verity Query Language (VQL) uses operators and modifiers. These can either be used directly by advanced users, or implemented transparently so that they are applied automatically to all queries. The following are some of the more commonly used VQL operators:

  • Evidence operators: Evidence operators specify either a basic word search or an intelligent word search. A basic word search finds documents that contain only the word or words specified in the query. An intelligent word search expands the query terms to create an expanded word list so that the search returns documents that contain variations of the query terms. Documents retrieved using evidence operators are not relevance-ranked unless you include the MANY modifier.
      • Soundex: Expands the search to include the specified word and one or more words that sound like, or whose letter pattern is similar to, the word. Example: <SOUNDEX> sale returns documents that include words such as "sell," "seal," "shell," and "scale."
      • Stem: Expands the search to include the word entered plus its linguistic variations. Example: <STEM> film returns documents that include words such as "films," "filmed," and "filming."
      • Thesaurus: Expands the search to include the word entered plus similar words. Example: <THESAURUS> altitude returns documents that include words such as "height" and "elevation."
      • Typo/n: Expands the search to include the specified word plus words that are similar. The optional n variable specifies the maximum number of errors between the query term and matched terms. Example: <TYPO> mouse returns documents that include words such as "house," "louse," and "moose."
      • Wildcard: Matches wildcard characters included in the search string. Example: <WILDCARD> corp* returns documents that include words such as "corporation," "corporal," and "corpulent."
      • Word: Performs a basic word search, selecting documents that contain one or more occurrences of the word. Example: <WORD> rhetoric will not match "rhetorical" or "rhetorician."
  • Concept operators: Concept operators combine the meaning of search elements to identify a concept in a document. Documents retrieved using concept operators are relevance-ranked.
      • And: Selects documents that contain all of the search words specified. Example: german shepherd <AND> irish wolfhound returns only documents that contain the phrases "german shepherd" and "irish wolfhound."
      • Or: Selects documents that include at least one of the search elements specified. Example: computers <OR> laptops returns documents that contain either "computers" or "laptops," or both "computers" and "laptops," but does not necessarily give a document that contains both terms a higher rank.
  • Proximity operators: Proximity operators specify the relative location of specific words in the document. In the case of the Near/n operator, retrieved documents are relevance-ranked based on the proximity of the specified words. When proximity operators are nested, use the ones with the broadest scope first.
      • In: Selects documents that contain specified values in one or more document zones. Example: "environmental regulation" <IN> summary returns documents that contain the phrase "environmental regulation" in the document summary.
      • Near/n: Selects documents that contain two or more words within n number of words of each other. The n value is optional. Example: apple <NEAR/1> computer returns documents that contain the phrases "apple computer" or "computer apple."
      • Paragraph: Selects documents that include all of the search elements you specify within a paragraph. Example: drug <PARAGRAPH> "cancer treatment" returns documents that contain "drug" and "cancer treatment" in the same paragraph.
      • Phrase: Selects documents that contain the specified phrase. Example: <PHRASE> (twenty, years, ago, today) returns documents that contain the phrase "twenty years ago today."
      • Sentence: Selects documents that include all of the specified words within a single sentence. Example: american <SENTENCE> innovation returns documents that contain "american" and "innovation" within the same sentence.
  • Score operators: Score operators affect how scores are calculated for retrieved documents. When a score operator is used, the search engine first calculates a separate score for each search element found in a document, then performs a mathematical operation on the individual element scores to arrive at the final score for each document. The YESNO operator has wide application, whereas the PRODUCT, SUM, and COMPLEMENT operators are intended for use mainly by application developers who want to generate queries programmatically.
      • Complement: Calculates scores for documents matching a query by taking the complement (subtracting from 1) of the scores for the query's search elements. Example: if <WORD> computers scores 0.80 for a document, then <COMPLEMENT> <WORD> computers scores 0.20 for that document.
      • Product: Calculates scores for documents matching a query by multiplying the scores for the query's search elements together. Example: <PRODUCT> ("computers", "laptops"). If a search on "computers" generated a score of 0.5 and a search on "laptops" generated a score of 0.75, the preceding search would produce a score of 0.375.
      • Sum: Calculates scores for documents matching a query by adding together the scores for the query's search elements. Example: <SUM> ("computers", "laptops"). If a search on "computers" generated a score of 0.5 and a search on "laptops" generated a score of 0.2, the search would produce a score of 0.7. If a search on "computers" generated a score of 0.5 and a search on "laptops" generated a score of 0.75, the search would produce a score of 1.00 (the maximum).
      • YesNo: Enables you to limit a search to only those documents matching a query, without the score of that query affecting the final scores of the documents. Example: <YESNO> ("Chloe"). If the retrieval result of the search on "Chloe" was 0.75, with the YesNo operator, the result would be 1; if the retrieval result is 0, it remains 0.
  • Modifiers: Modifiers are used in conjunction with operators to change the behavior of the operator.
      • Case: Performs a case-sensitive search. The search engine attempts to match the exact use of uppercase and lowercase letters provided in the query expression when a mixed-case query is used. Example: <CASE> NeXt returns only the precise string "NeXt," and not "next" or "Next."
      • Many: Considers the density of search words when calculating relevance-ranked scores. This means that shorter documents with multiple occurrences of the search terms are ranked higher than larger documents with the same number of occurrences, because the relative density of the occurrences is greater in the shorter document.
      • Not: Excludes documents that contain the words or phrases indicated. Example: apple <AND> Mac <AND> <NOT> Washington returns information about Apple computers, but not Washington apples.
      • Order: Specifies the order in which search elements must occur in the document. Example: <ORDER> <PARAGRAPH> (cat, chases, dog) is more likely to return documents that refer to cats chasing dogs than dogs chasing cats.
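
To illustrate how the operators above reach the search engine, here is a minimal sketch that passes a VQL expression through the criteria attribute of cfsearch. It assumes that a collection named docs_coll (a hypothetical name) has already been created and indexed. It sets type="explicit" to request an explicit query, in which operators are written out in angle brackets.

```cfml
<!--- docs_coll is a hypothetical collection name --->
<cfsearch name="vqlResults"
          collection="docs_coll"
          type="explicit"
          criteria="apple <NEAR/1> computer">

<!--- Show each hit with the score that Verity assigned to it --->
<cfoutput query="vqlResults">
    #vqlResults.Score# #vqlResults.Title# (#vqlResults.URL#)<br>
</cfoutput>
```

Swapping the criteria string for another expression from the list, such as <STEM> film, changes the matching behavior without touching the rest of the page.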

    Additional Resources
    For additional resources, see the ColdFusion MX documentation available at www.macromedia.com. All product questions and support for ColdFusion, including the Verity search integrated into ColdFusion MX, are provided by Macromedia.

    Extending and Enhancing the Search in Macromedia ColdFusion MX
    As advanced as the Verity search is within ColdFusion, you may eventually want to deploy ColdFusion applications with enhanced search capabilities that are not possible using Macromedia's implementation of Verity. This is not due to any inherent limitations on Verity's part as much as it is a result of the robust, advanced capabilities of ColdFusion that enable you to develop ever more powerful applications. Verity has found that the need for additional search features is largely driven by three key requirements, along with a less common fourth requirement:

    1.  Searching content outside of ColdFusion. The Verity search functionality within ColdFusion is limited to searching content and database records within your ColdFusion application. Many ColdFusion developers need to extend the search functionality to areas outside of ColdFusion, such as intranets, external Web sites, file servers, external databases, Microsoft Exchange, and third-party document management systems.
    2.  Additional administrative tools. To meet their specific application requirements, some developers require additional administrative tools.
    3.  Adding more advanced end-user interface options. Another common request among ColdFusion developers is the ability to add advanced search features not included with the ColdFusion implementation of Verity, such as spell checker/recommendations, and search-term highlighting.
    4.  Searching over 250,000 documents and/or records. Occasionally, developers will build applications that exceed the 250,000 documents and/or records that the ColdFusion implementation of Verity is limited to searching. For applications that must search large databases or repositories, developers need to extend this limit. This topic is not covered in this article. For information on overcoming the document number limitations of ColdFusion, contact sales@verity.com.

    Extending and Enhancing Search with Verity Ultraseek
    The three most common ColdFusion search enhancements listed above can be accomplished with the addition of Verity Ultraseek. This downloadable search engine is easily integrated into applications, using its available Java API. It can also be readily deployed into mixed application environments, using its Web services interface, which supports both the .NET and J2EE platforms. A number of factors make Ultraseek the best choice for extending search outside of the ColdFusion environment and providing additional administration tools:

    • Ultraseek's easy-to-implement, set-and-forget design requires extremely low ongoing maintenance and overhead
    • Ultraseek's end-user interfaces (i.e., search boxes and results lists) are similar to those of the Verity search embedded in ColdFusion
    • Ultraseek provides enterprise-class search at a price point in line with ColdFusion
    The best thing about Ultraseek is that you can deploy it at no cost on a 30-day trial basis. To download a free, 30-day trial version of Ultraseek, go to www.verity.com/cfsearch.

    COLDFUSION TAGS

    ColdFusion Search-Specific Tags
    Creating a Collection with the cfcollection tag
    When using the cfcollection tag, you can specify the same attributes as in the ColdFusion MX Administrator:

    • Action: (Optional) The action to perform on the collection: create, delete, repair, optimize, or list. The default value for the action attribute is list. For more information, see cfcollection in CFML Reference.
    • Collection: The name of the new collection, or the name of a collection on which you will perform an action.
    • Path: The location for the Verity collection.
    • Language: (Optional) The language used to create the collection (English, by default).
    You can create a collection by directly assigning a value to the collection attribute of the cfcollection tag, as shown in the following code:

    
    <!--- Create a collection named a_new_collection at the given path --->
    <cfcollection action = "create"
        collection = "a_new_collection"
        path = "c:\CFusionMX\verity\collections\">
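
Running the create action twice for the same collection name is likely to raise an error, so one possible safeguard is to list the registered collections first. The sketch below assumes that the list action returns a query with a name column; check the CFML Reference for your ColdFusion version before relying on that column name.

```cfml
<!--- List the registered collections into a query named existingColls --->
<cfcollection action="list" name="existingColls">

<!--- Create the collection only if no collection of that name is registered --->
<!--- Assumption: the list result exposes the collection names in a "name" column --->
<cfif ListFindNoCase(ValueList(existingColls.name), "a_new_collection") EQ 0>
    <cfcollection action="create"
                  collection="a_new_collection"
                  path="c:\CFusionMX\verity\collections\">
</cfif>
```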
    

    Indexing a Collection Using the cfindex tag
    You can index a collection in CFML using the cfindex tag, which eliminates the need to use the ColdFusion MX Administrator. The following are important attributes for the cfindex tag:

    • Collection: The name of the collection. If you are indexing an external collection (external = "Yes"), you must also specify the fully qualified path for the collection.
    • Action: (Optional) Can be update (the default action), delete, purge, or refresh.
    • Extensions: (Optional) The delimited list of file extensions that ColdFusion uses to index files if type="path".
    • Key: (Optional) The path containing the files you are indexing if type="path".
    • URLpath: (Optional) The URL path for files if type="file" or type="path". When the collection is searched with cfsearch, the pathname is automatically prefixed to filenames and returned as the URL attribute.
    • Recurse: (Optional) Yes or No. Yes specifies, if type="path", that directories below the path specified in the key attribute are included in the indexing operation.
    • Language: (Optional) The language of the collection. English is the default.
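
The attributes above might be combined as in the following sketch, which indexes a directory tree of Web pages. The directory in key and the address in urlpath are hypothetical. The collection is the one created in the cfcollection example.

```cfml
<!--- Index the .htm, .html, and .cfm files under a hypothetical docs directory --->
<cfindex collection="a_new_collection"
         action="update"
         type="path"
         key="c:\inetpub\wwwroot\docs\"
         urlpath="http://localhost/docs/"
         extensions=".htm, .html, .cfm"
         recurse="Yes"
         language="English">
```

With recurse="Yes", subdirectories of the key path are indexed as well, and the urlpath value is prefixed to each filename to build the URL returned by cfsearch.
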
    Using the cfsearch tag
    You use the cfsearch tag to search an indexed collection. Searching a Verity Collection is similar to a standard ColdFusion query: both use a dedicated ColdFusion tag that requires a name attribute for their searches.

    The following are important attributes for the cfsearch tag:

    • Name: The name of the search query.
    • Collection: The name of the collection(s) being searched. Use a fully qualified path for an external collection. Separate multiple collections with a comma; for example, collection = "sprocket_docs,CodeColl".
    • Criteria: The search target (can be dynamic).
    Each cfsearch returns variables that provide the following information about the search:
    • RecordCount: The total number of records returned by the search.
    • CurrentRow: The current row of the record set being processed by cfoutput.
    • RecordsSearched: The total number of records in the index that were searched. If no records were returned in the search, this property returns a null value.
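
Putting the attributes and result variables together, a simple search page could look like the sketch below. The page name search.cfm and the form field name criteria are assumptions made for illustration. The collection is the one created and indexed earlier in this sidebar.

```cfml
<!--- Default to an empty search string when the form has not been submitted --->
<cfparam name="form.criteria" default="">

<form action="search.cfm" method="post">
    <input type="text" name="criteria">
    <input type="submit" value="Search">
</form>

<!--- Run the search only when the user actually typed something --->
<cfif Len(Trim(form.criteria))>
    <cfsearch name="searchResults"
              collection="a_new_collection"
              criteria="#form.criteria#">

    <cfoutput>
        <p>#searchResults.RecordCount# of #searchResults.RecordsSearched# documents matched.</p>
    </cfoutput>

    <!--- One line per hit: row number, linked title, and relevance score --->
    <cfoutput query="searchResults">
        #searchResults.CurrentRow#. <a href="#searchResults.URL#">#searchResults.Title#</a>
        (score: #searchResults.Score#)<br>
    </cfoutput>
</cfif>
```
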
    Note: To use cfsearch to search a Verity K2 Server collection, the collection attribute must be the collection's unique alias name as defined in the k2server.ini and the external attribute must be "No" (the default). For more detail, see Configuring and Administering ColdFusion MX.
    About Joe Cronin
    Joe Cronin is director of Technical Services in Verity, Inc.'s Channel Partners group. He has a BS in computer engineering technology from Wentworth Institute of Technology. Verity is recognized by industry analyst groups such as Gartner, IDC, and the Delphi Group as the market leader in intellectual capital management software, including enterprise search, classification, recommendation, monitoring, and concept extraction solutions.
