Unlocking Verity's Potential

ColdFusion's freely bundled Verity search engine, included since version 2.0, remains one of the most powerful yet seldom exploited components of the ColdFusion server. The Verity Search'97 indexing technology incorporated into ColdFusion Server lets you create collections of indexed data optimized for fast retrieval, adding enormous value to any Web site, big or small.

This article demonstrates the basics of setting up a Verity search collection and shows how to bring all your data, static and dynamic, into one intelligent indexing solution. It also shows how to display summaries without the need for preexisting META tags, highlight keywords (users love this) and build searches within searches, and it explains the advantages of using Verity over CFQUERY.

The Basics
The Verity engine performs searches against collections. A collection is a special database created by Verity that contains pointers to the indexed data that you specify for that collection. ColdFusion's Verity implementation supports collections of three basic data types:

  1. Text files such as HTML pages and CFML templates
  2. Binary document types such as PDF and DOC (see Figure 1 for a list of all supported file types)
  3. Result sets returned from CFQUERY, CFLDAP and CFPOP queries

To use Verity searching and indexing technology:

  • Create a Verity collection using the ColdFusion Administrator Verity page or the CFCOLLECTION tag at runtime (see Figure 2). You must name the collection now regardless of what you're indexing.
  • Populate a collection with data using options on the ColdFusion Administrator Verity page to index specific directories (usually for static or binary pages) or the CFINDEX tag at runtime (usually for dynamic data, but can also be used for static pages or building custom Verity admin templates).
  • Build search forms and indexing capability into your applications using the CFINDEX and CFSEARCH tags.
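
As a minimal sketch of the runtime route, a collection might be created with CFCOLLECTION as shown here. The collection name matches the MsgIndex example used in the next section; the PATH value is an assumption and should point to wherever your server keeps its collections.

```cfml
<!--- Create an empty collection at runtime (the PATH shown is an assumed location) --->
<CFCOLLECTION ACTION="CREATE"
COLLECTION="MsgIndex"
PATH="C:\CFUSION\Verity\Collections\">
```

Like a collection created in the Administrator, this one is empty until you populate it.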

Use the guidelines in Table 1 to determine which indexing method is best for you.

Populating a Collection Using CFINDEX
Using the Administrator to create a collection of static documents is fairly straightforward. Simply specify a directory path, whether to index subdirectories, what file extensions to index (good for filtering), foreign language (if any) and, optionally, a return URL to prepend to all indexed files.

However, since most of you will likely need to index database content, follow these steps:

  1. Create the collection name on the ColdFusion Administrator Verity page; at this point it's an empty container standing by for you to input data.
  2. Create a CFM template that executes any query.
  3. Populate the collection with data from that query using the CFINDEX tag.
  4. (Optional) Schedule a task in the Administrator that runs your indexing template nightly to keep your collections up to date.

The code below is all you need to populate a collection you named MsgIndex (following Step 1 above) from a database of threaded discussion messages:

<!--- Select the entire table --->
<CFQUERY NAME="Messages" DATASOURCE="Threads">
SELECT * FROM Messages
</CFQUERY>

<!--- Index the results --->
<CFINDEX COLLECTION="MsgIndex"
ACTION="UPDATE"
TYPE="CUSTOM"
BODY="MessageText"
KEY="Message_ID"
TITLE="Subject"
QUERY="Messages">

The table column(s) specified in the BODY attribute are what Verity actually compares search criteria against. It may contain multiple columns separated by commas, like this:

BODY="MessageText,Title,Company"

The ACTION="UPDATE" attribute appends data to your collection if the KEY doesn't already exist. The collection's KEY is similar to the primary key in a database. Using ACTION="REFRESH" purges, then overwrites, all data in your collection. REFRESH takes more time, but it's necessary if existing rows are being changed as well as new rows added (e.g., if users can edit their messages).
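
The nightly run in Step 4 can also be registered in code rather than in the Administrator. The sketch below is only an illustration: it assumes the indexing template is saved as indexmessages.cfm on a hypothetical host, and the task name, start date and start time are placeholders.

```cfml
<!--- Register a daily task that requests the indexing template over HTTP --->
<!--- Host, template name, date and time are all hypothetical --->
<CFSCHEDULE ACTION="UPDATE"
TASK="IndexMessages"
OPERATION="HTTPRequest"
URL="http://www.example.com/admin/indexmessages.cfm"
STARTDATE="01/01/2000"
STARTTIME="02:00 AM"
INTERVAL="Daily">
```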

Advantages of Indexing a Data Source
The main advantages of performing searches against a Verity collection instead of using CFQUERY alone are that results are ordered by relevance, the data is indexed in a form that provides faster access, and Verity offers more intelligent search capabilities. For example, Verity can match both the singular and plural forms of a word (this is called stemming). Verity also lets users apply Boolean and proximity logic (AND/OR/NEAR-type operators) in their search entries, which is hard to offer with CFQUERY alone. As a general rule, use Verity instead of CFQUERY when you want:

  • Results returned by order of relevance (Verity offers scoring variables)
  • To index textual data; Verity collections containing textual data can be searched more efficiently with CFSEARCH than a database can with CFQUERY
  • To give users access to data without interacting directly with the data source itself
  • To enable users to search more intelligently by applying Boolean logic, proximity searches and/or stemming

Indexing Static and Dynamic Content Together
Using CFINDEX (or via the Administrator), you may populate a collection with static pages by specifying a directory tree. Then, using CFINDEX, update the collection with query data as in the example above. You may continue to update the collection with new queries or static data at any time. In theory, a single collection could contain as much of your static and dynamic data together as you like. However, you may not process multiple queries on a single collection at the same time.
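
The two passes described above might look like the following sketch. The SiteIndex collection name, the directory, the URL prefix and the extension list are assumptions made for illustration; the query reuses the Messages table from the earlier example. The two CFINDEX operations run one after the other, never at the same time.

```cfml
<!--- First pass: index static pages under a directory tree (assumed path and URL) --->
<CFINDEX COLLECTION="SiteIndex"
ACTION="UPDATE"
TYPE="PATH"
KEY="C:\inetpub\wwwroot\docs"
URLPATH="http://www.example.com/docs/"
EXTENSIONS=".htm, .html, .cfm"
RECURSE="Yes">

<!--- Second pass: add query data to the same collection --->
<CFQUERY NAME="Messages" DATASOURCE="Threads">
SELECT Message_ID, Subject, MessageText FROM Messages
</CFQUERY>

<CFINDEX COLLECTION="SiteIndex"
ACTION="UPDATE"
TYPE="CUSTOM"
BODY="MessageText"
KEY="Message_ID"
TITLE="Subject"
QUERY="Messages">
```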

A tricky situation develops, however, when you try to output combined data from more than one table, since the collection's KEY value will (usually) contain numeric IDs and won't indicate which of your tables an ID belongs to. For example, if you index two tables – Messages and Users – in the same collection and use the primary ID as the key, then ID=50 could reference either one. Therefore, when adding data from multiple database tables to the same collection, use the CUSTOM1 and/or CUSTOM2 attribute of CFINDEX to hold a description that you create. Then write conditional code so that when the custom value is returned, the code matches the ID to the correct table (see Listing 1; listings for this article are on page 16).
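
Listing 1 isn't reproduced here, so the following is only a rough sketch of the idea, not the author's listing. It assumes a SiteIndex collection, a hypothetical Users table with User_ID, UserName and Bio columns, and hypothetical message.cfm and user.cfm detail pages. Each query selects a literal SourceTable column so that CUSTOM1 can carry the table name. It also assumes the key values of the two tables don't collide inside the collection; if they can, you may need to build a distinct key column in each query.

```cfml
<!--- Tag each row with its source table by selecting a literal column --->
<CFQUERY NAME="Messages" DATASOURCE="Threads">
SELECT Message_ID, Subject, MessageText, 'Messages' AS SourceTable
FROM Messages
</CFQUERY>

<CFINDEX COLLECTION="SiteIndex" ACTION="UPDATE" TYPE="CUSTOM"
BODY="MessageText" KEY="Message_ID" TITLE="Subject"
CUSTOM1="SourceTable" QUERY="Messages">

<!--- The Users table and its columns are hypothetical --->
<CFQUERY NAME="Users" DATASOURCE="Threads">
SELECT User_ID, UserName, Bio, 'Users' AS SourceTable
FROM Users
</CFQUERY>

<CFINDEX COLLECTION="SiteIndex" ACTION="UPDATE" TYPE="CUSTOM"
BODY="Bio" KEY="User_ID" TITLE="UserName"
CUSTOM1="SourceTable" QUERY="Users">

<!--- At display time, branch on CUSTOM1 to decide which table the KEY belongs to --->
<CFSEARCH NAME="search" COLLECTION="SiteIndex" TYPE="simple" CRITERIA="#Keyword#">

<CFOUTPUT QUERY="search">
<CFIF search.CUSTOM1 IS "Messages">
<A HREF="message.cfm?id=#search.key#">#search.title#</A><BR>
<CFELSEIF search.CUSTOM1 IS "Users">
<A HREF="user.cfm?id=#search.key#">#search.title#</A><BR>
</CFIF>
</CFOUTPUT>
```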

Displaying Search Results
Once a collection has been populated via the Administrator or via CFINDEX, create a form that passes a query parameter into the CFSEARCH tag. CFSEARCH is similar to CFQUERY in that it returns records or rows of data from a collection just as CFQUERY returns rows from a database (see Listing 1).

You can pass criteria simultaneously to multiple collections by specifying a comma-delimited list of collections. Relevancy is applied to the group as a whole:

<!--- Passing criteria --->
<CFSEARCH NAME="search"
COLLECTION="a,b,c,d"
TYPE="simple"
CRITERIA="#Keyword#">

Alternatively, you can group your output by individual collection.

In the CFSEARCH CRITERIA attribute, if you pass a mixed-case entry (mixed upper- and lowercase), case sensitivity is applied to the search. If you pass all upper- or all lowercase, case insensitivity is assumed.
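
If you would rather not have a stray capital letter switch on case sensitivity, one option is to lowercase the entry before it reaches CFSEARCH. This is a sketch that assumes the keyword arrives as a form field named Keyword:

```cfml
<!--- Lowercase the user's entry so that case insensitivity is assumed --->
<CFSEARCH NAME="search"
COLLECTION="MsgIndex"
TYPE="simple"
CRITERIA="#LCase(Form.Keyword)#">
```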

Every search conducted with the CFSEARCH tag returns, as part of the record set, a number of result attribute variables you can reference in your CFOUTPUT:

  • URL: Returns the value of the URLPATH attribute defined in the CFINDEX tag that's used to populate the collection. This value is always empty when you populate the collection using CFINDEX with TYPE="Custom".
  • KEY: Returns the value of the KEY attribute defined in the CFINDEX tag that's used to populate the collection. It can be any value you choose, usually ID when indexing a database.
  • TITLE: Returns whatever was placed in the TITLE attribute in the CFINDEX operation used to populate the collection, including the titles of PDF and Office documents. If a title wasn't provided in the TITLE attribute, CFSEARCH returns CF_TITLE.
  • SCORE: Returns the relevancy score of the document based on the search criteria.
  • CUSTOM1 and CUSTOM2: Return whatever was placed in the custom fields in the CFINDEX operation used to populate the collection (crucial when indexing multiple tables, or when you need to carry extra fields you wish to display).
  • SUMMARY: Returns the contents of the automatic summary generated by CFINDEX. The default summarization selects the three best matching sentences, up to a maximum of 500 characters.
  • RECORDCOUNT: Returns the number of records returned in the record set.
  • CURRENTROW: Returns the current row being processed by CFOUTPUT.
  • COLUMNLIST: Returns the list of the column names within the record set.
  • RECORDSSEARCHED: Returns the number of records searched.

Use these result variables in standard CFML expressions by preceding the variable with the name given in the CFSEARCH NAME attribute:

#search.URL#
#search.TITLE#
#search.SUMMARY#
#search.SCORE#
etc...
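
Put together, a bare-bones search form and results page might look like this sketch. The results.cfm file name and the Keyword form field are assumptions; the collection is the MsgIndex example from earlier.

```cfml
<!--- Search form (posts to a hypothetical results.cfm) --->
<FORM ACTION="results.cfm" METHOD="POST">
<INPUT TYPE="text" NAME="Keyword">
<INPUT TYPE="submit" VALUE="Search">
</FORM>

<!--- results.cfm: run the search, then list the matches --->
<CFSEARCH NAME="search"
COLLECTION="MsgIndex"
TYPE="simple"
CRITERIA="#Form.Keyword#">

<CFOUTPUT>
<P>#search.RecordCount# of #search.RecordsSearched# records matched.</P>
</CFOUTPUT>

<CFOUTPUT QUERY="search">
<P>#search.CurrentRow#. <B>#search.Title#</B> (score: #search.Score#)<BR>
#search.Summary#</P>
</CFOUTPUT>
```
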
The SUMMARY attribute is probably one of the most powerful and useful attributes of Verity. This solution is perfect if you're wondering how to display useful summaries from static or dynamic pages without META tags or other meaningful abstracts built into your database content. You can always trim the summary to fewer than 500 characters by using the MID function. For instance, if you wanted to display only 100 characters, use:
#Mid(search.summary, 1, 100)#...
In case you're wondering, there's a file under every collection called style.prm located under the Cfusion\Verity\Collections\whatever\custom\style folder. It can be opened with any ASCII editor and contains collection schema parameters. This file is used to enable/disable index schema features through macro definitions similar to those allowed by the C preprocessor. Different levels of document summarization can be uncommented in the style.prm file:
  • (Default) stores the three best sentences of the document, but not more than 500 bytes
  • Stores the first four sentences of the document, but not more than 500 bytes
  • Stores the first 150 bytes of the document with white space compressed

Combining Verity and CFQUERY
A powerful way to use Verity is to take search results (from a query-populated collection) and recycle them back into a CFQUERY statement. You may want to do this to output the other fields of your table that Verity didn't index. When you populate query-driven Verity collections, specify a KEY attribute. Most of the time the KEY is the primary ID of the table. Therefore, the #search.KEY# results can be cycled into CFQUERY like this:
<!--- Passing criteria --->
<CFSEARCH NAME="search"
COLLECTION="MsgIndex"
TYPE="simple"
CRITERIA="#Keyword#">

<!--- query from search --->
<CFOUTPUT QUERY="search">
<CFQUERY NAME="query1"
DATASOURCE="threads">
SELECT * FROM Messages
WHERE Message_ID = #search.key#
</CFQUERY>
#query1.var1# #query1.var2# #query1.var3# ...
</CFOUTPUT>

Instead of being passed one at a time, the KEYs can also be passed in a ValueList like this:
WHERE Message_ID IN (#ValueList(search.key)#)
which would then allow you to GROUP and ORDER BY the results. Note: If you start grouping and ordering output from the same collection, you're logically removing the relevancy – one of the primary reasons for using Verity.
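
A fuller sketch of the ValueList approach follows. It assumes the keys are numeric Message_ID values, as in the indexing example, and it checks for an empty result first because an IN clause with an empty list is not valid SQL. The ORDER BY column is only an illustration.

```cfml
<CFSEARCH NAME="search"
COLLECTION="MsgIndex"
TYPE="simple"
CRITERIA="#Keyword#">

<!--- Only query the database when Verity returned at least one key --->
<CFIF search.RecordCount GT 0>
<CFQUERY NAME="query1" DATASOURCE="Threads">
SELECT Message_ID, Subject, MessageText
FROM Messages
WHERE Message_ID IN (#ValueList(search.key)#)
ORDER BY Subject
</CFQUERY>

<CFOUTPUT QUERY="query1">
#Subject#<BR>
</CFOUTPUT>
<CFELSE>
No matching messages were found.
</CFIF>
```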

Filtering Data
Unfortunately, ColdFusion's Verity administrator doesn't make it easy to filter out directories and files you don't want indexed. For a legacy Web site this can be a major challenge as the site's developer(s) may have kept public and private files (admin, stats, CF docs, etc.) under the same root.

One solution is to move directories and files off the root and into virtual directories. Just make sure you have any redirects set up if necessary, which can be a pain.

The other option is to delete collection records after you've indexed everything. This issue is addressed in the sidebar that contains excerpts from the Allaire Knowledge Base article #1080.
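
The sidebar isn't reproduced here, so the following is only a hedged sketch of what removing a single record could look like, not necessarily the Knowledge Base article's method. It assumes a file-based collection named SiteIndex in which each file's key is its full path; the path shown is hypothetical.

```cfml
<!--- Remove one indexed file from the collection by its key (assumed path) --->
<CFINDEX COLLECTION="SiteIndex"
ACTION="DELETE"
TYPE="FILE"
KEY="C:\inetpub\wwwroot\admin\stats.cfm">
```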

Searches Within Searches
If a search returns enough records – let's say over 50 – users will usually appreciate a way to search within them. Listing 2 demonstrates a simple way of doing this by populating a new Verity collection on the fly. Basically, you would:

  1. Create a new collection in the Administrator, called tempCollection, for holding temporary data.
  2. Output the KEY in a hidden form field after each primary result returned.
  3. When the user hits a button to perform a secondary search, pass those hidden fields to another query and feed that query's results into a CFINDEX tag with ACTION="Refresh" and COLLECTION="tempCollection". This populates the collection from Step 1 on the fly.
  4. Pass the secondary search keyword into a CFSEARCH tag that's connected to the freshly populated collection from Step 3.

Understandably, this process can be somewhat system intensive if repeated over and over and large result sets are being passed. Therefore, it's a good idea to specify a maximum record set that, when reached, asks the user to perform another primary search. This on-the-fly secondary search process can be repeated down to the third level, fourth level, and so on. You can keep recycling the IDs in hidden fields.
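
Listing 2 isn't reproduced here; the steps above could be sketched roughly as follows. The refine.cfm file name, the Keys and Keyword2 field names and the Subset query name are assumptions, the keys are assumed to be numeric Message_ID values, and the primary search is assumed to have returned at least one record.

```cfml
<!--- Results page: below the primary results, carry each KEY forward --->
<FORM ACTION="refine.cfm" METHOD="POST">
<CFOUTPUT QUERY="search">
<INPUT TYPE="hidden" NAME="Keys" VALUE="#search.key#">
</CFOUTPUT>
<INPUT TYPE="text" NAME="Keyword2">
<INPUT TYPE="submit" VALUE="Search within results">
</FORM>

<!--- refine.cfm: same-named fields arrive as one comma-delimited list --->
<!--- Stop if any key is not numeric, since the list goes into SQL --->
<CFLOOP LIST="#Form.Keys#" INDEX="k">
<CFIF NOT IsNumeric(k)><CFABORT></CFIF>
</CFLOOP>

<CFQUERY NAME="Subset" DATASOURCE="Threads">
SELECT Message_ID, Subject, MessageText
FROM Messages
WHERE Message_ID IN (#Form.Keys#)
</CFQUERY>

<!--- REFRESH clears tempCollection before loading the subset --->
<CFINDEX COLLECTION="tempCollection"
ACTION="REFRESH"
TYPE="CUSTOM"
BODY="MessageText"
KEY="Message_ID"
TITLE="Subject"
QUERY="Subset">

<!--- Secondary search against the freshly populated collection --->
<CFSEARCH NAME="search2"
COLLECTION="tempCollection"
TYPE="simple"
CRITERIA="#Form.Keyword2#">
```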

Highlighting Keywords
This works particularly well on the SUMMARY output for either static or dynamic records. Basically, you use a string replacement that swaps every instance of the query keyword for a highlighted version. In the example below:

Step 1: Establish a stylesheet in the document HEAD to display the font with a yellow background:

<STYLE TYPE="text/css">
font.hl {background-color: yellow}
</STYLE>
Step 2: Set your new output field to newSummary.

Step 3: Use Replace to replace the keyword in the current summary field with your highlighted version:

<CFSET newSummary = Replace(search.summary, keyword, "<font class=hl><b>#keyword#</b></font>", "All")>

To take this one step further, pass the keyword variable into the URL so when users click through, the following page will also have its query text highlighted. For query-driven data it's a matter of replacing the text for your output field. For static pages you may need to read in the page via CFHTTP so you can manipulate the text as you read it back via #CFHTTP.FileContent#.
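
As a sketch of both ideas, the loop below highlights each summary and passes the keyword along in the link. It swaps in ReplaceNoCase, which is not what the example above uses, so that "verity" also matches "Verity"; note that the matched text is written back in the case the user typed. The message.cfm page and its URL parameters are hypothetical.

```cfml
<CFOUTPUT QUERY="search">
<!--- Case-insensitive variant of the replacement in Step 3 --->
<CFSET newSummary = ReplaceNoCase(search.summary, Keyword, "<font class=hl><b>#Keyword#</b></font>", "All")>
<!--- Pass the keyword along so the next page can highlight it too --->
<A HREF="message.cfm?id=#search.key#&keyword=#URLEncodedFormat(Keyword)#">#search.title#</A><BR>
#newSummary#<BR>
</CFOUTPUT>
```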

Performance Considerations
Always optimize your collections, either via the Administrator or, preferably, immediately after using CFINDEX:

<!--- Index the collection --->
<CFINDEX COLLECTION="MsgIndex"
ACTION="UPDATE"
TYPE="CUSTOM"
BODY="MessageText"
KEY="Message_ID"
TITLE="Subject"
QUERY="Messages">

<!--- Then optimize --->
<CFCOLLECTION COLLECTION="MsgIndex"
ACTION="OPTIMIZE">

Optimizing collections will significantly increase the performance of keyword searches on your site. On larger collections (e.g., 3,000-plus records) the difference can be measured in seconds. To check, turn on debugging for your site and compare processing times before and after optimizing.

RAM use is another consideration. From Allaire Knowledge Base Article #3690, Verity support states that "the memory requirement for a small installation using IIS (small being about 20 queries per minute and fetching HTML documents) is 64 Megs." If you plan on running lots of Verity-driven searches, plan on the extra RAM consumption.

Conclusion
While other search engine technology exists, such as Infoseek's Ultraseek Server (expensive), Netscape Server's built-in engine or freeware Perl scripts, ColdFusion's freely bundled Verity search technology is as easy to understand and as seamless to implement as the rest of your ColdFusion applications. Remember to optimize your collections and be aware of RAM. There's good system documentation but not a lot of support in the forums. Like most things CF, you can be up and running with Verity in seconds. It's powerful and flexible, and, when properly implemented, your Web site users will praise you for making their life easier.

More Stories By David C. Smith

David C. Smith is the Webmaster and manager of Internet development for the Telecommunications Industry Association (TIA) in Arlington, Virginia, and the lead developer behind TIA's new B2B portal, GetCommStuff.com.
