By Aaron Johnson | July 11, 2003 12:00 AM EDT
One of the many reasons to use ColdFusion MX is that it comes standard with the majority of the tools you'll need to write full-featured, dynamic Web applications. Tags like
There are, however, a couple of situations when you can't use the full-text searching capabilities of Verity. The ability to run ColdFusion MX on the Apple OS X operating system, while a boon to developers who code on the Apple platform, does not include the ability to use Verity. Programmers who work in a hybrid J2EE/ColdFusion MX environment (possibly using ColdFusion MX for J2EE) cannot natively use the Verity search capabilities in the J2EE environment. Finally, programmers who need customized searching and indexing capabilities may find the standard Verity integration limiting.
Enter Lucene, an open source full-text searching framework from the Apache Jakarta project, which, when combined with ColdFusion MX, can be run on Apple OS X, can be programmatically accessed by both J2EE and ColdFusion MX developers, and can be fully customized and extended.
This two-part series illustrates two different methods you can use when approaching a ColdFusion and Java integration project. In this article, I'll walk you through the creation of three CFML scripts, all using native CFML syntax: one that creates and populates a Lucene index; one that searches the same index; and a final script that optimizes the index.
In the next article, I'll show you how to write a Java-based CFX tag using the ColdFusion Extension Application Programming Interface to create, populate, and search a Lucene index.
This article is not intended to be an in-depth introduction to the Lucene API. If you're interested in learning more about the internal workings of Lucene, SYS-CON Media's sister publication, Java Developer's Journal, featured an article entitled "Search-Enable Your Application with Lucene," which you can find a link to in the resources section at the end of this article.
Before beginning, you'll need to make sure that your system (be it Unix, Linux, Windows, or Apple) is appropriately configured:

Creating a Lucene Index Using ColdFusion
Now that you've downloaded Lucene, modified the ColdFusion classpath, restarted ColdFusion, and created the Lucene folder, you should be ready to write some code! Open up your IDE of choice, create a new file, and save it as "luceneindex.cfm" into the cfusionmx/wwwroot/lucene/ directory. The first thing you'll need to create a Lucene index is a StopAnalyzer object, which is a Java object that eventually will "tokenize" or split up the large blobs of text you feed to the engine into individual words. You'll use cfscript syntax to get access to a StopAnalyzer object:
analyzer = CreateObject("java",
"org.apache.lucene.analysis.StopAnalyzer");
The line of code above uses ColdFusion's CreateObject() function to assign the "analyzer" variable a handle of the type "org.apache.lucene.analysis.StopAnalyzer". If you're familiar with Java, the cfscript syntax roughly maps to the following Java syntax:
StopAnalyzer analyzer = new StopAnalyzer();
You should note that at this point in the code, you do not have a reference to an object; no object has yet been created on the heap in Java. The next line uses a method, "init()" to call the default constructor of the StopAnalyzer object, which returns a reference to a Java object:
analyzer.init();
Those last two sentences are important so I'll repeat: using the CreateObject() function does not get you access to an instance of an object. In order to access a Java object, you must either a) first call the CreateObject() method and then the init() method, which in the above example, maps to the default constructor in Java, or b) call any nonstatic method on the object, which causes ColdFusion to then instantiate the object for you.
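If you think in Java, the two-step pattern can be pictured with reflection. The sketch below is only an analogy under my own assumptions: the class name DeferredInstantiation and its helper methods are hypothetical, and it says nothing about how ColdFusion implements CreateObject() internally. It shows that resolving a class yields a handle but no instance, and that an instance exists only after a constructor runs.

```java
// Analogy only: hypothetical helpers, not ColdFusion's actual implementation.
final class DeferredInstantiation {

    // Roughly like CreateObject("java", className): resolve the class, construct nothing.
    static Class<?> handleFor(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on the classpath: " + className, e);
        }
    }

    // Roughly like init(): run the no-argument constructor and return the new instance.
    static Object init(Class<?> handle) {
        try {
            return handle.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + handle.getName(), e);
        }
    }

    public static void main(String[] args) {
        Class<?> handle = handleFor("java.lang.StringBuilder"); // no StringBuilder exists yet
        Object instance = init(handle);                         // now an instance exists
        System.out.println(handle.getName() + " instantiated: " + (instance != null));
    }
}
```

The point of separating the two helpers is the same as in the CFML: the handle alone is enough to reach static members, while anything that needs object state has to wait for the constructor call.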
Now that we have a StopAnalyzer object, we'll need to get an object that handles the creation of the Lucene index:
writer = CreateObject("java",
"org.apache.lucene.index.IndexWriter");
writer.init("c:\cfusionmx\wwwroot\lucene\docsindex",
analyzer, true);
The first line looks very much like the syntax we just used to create the StopAnalyzer object. However, the second line is different. Instead of calling the default Java constructor, we're calling a constructor that takes three arguments: a) the path to the index we want to create (note that Lucene writes out numerous files or "segments" as part of its indexing routine just like Verity does; the "path" mentioned here is the path where you want those files stored); b) the StopAnalyzer object that you created above; and c) a Boolean value that determines whether to create a new index (true) or update an existing one (false).
To make it easier to understand what's going on, I've hard coded the path to the index above. You'll notice that the source code included at the end of this article uses a variable from ColdFusion's attributes scope, which allows you to determine where you want your index files stored at runtime.
Next, you'll need to instantiate a couple more Java objects:
document = CreateObject("java", "org.apache.lucene.document.Document");
field = CreateObject("java", "org.apache.lucene.document.Field");
system = CreateObject("java", "java.lang.System");
Next, you'll need to read in the content of the file you want to index. www.cflib.org is a wonderful resource for ColdFusion functions. I found one called FileRead() that does exactly what you'll need. I included a link to the function in the resources section so that you can download it, but for now, just know that I'm using that function to read the entire contents of a file as a string. (Again, I've hard coded the path to the ColdFusion documentation home page on my local system. The source code for the finished product can be downloaded at sys-con.com/coldfusion/sourcec.cfm.)
content = FileRead("c:\cfusionmx\wwwroot\cfdocs\dochome.htm");
Now I need to extract the title from the HTML document. I use the FindNoCase() function to get the start and end points of the <title> element and then use the Mid() function to get the text between the opening and closing tags:
startTitle = FindNoCase("<title>", content);
endTitle = FindNoCase("</title>", content);
if (endTitle GT 0) {
title = trim(Mid(content, startTitle + 7, endTitle - startTitle - 7));
}
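For readers following along on the Java side, the same title extraction can be sketched with a regular expression. This is a minimal illustration under my own assumptions: the class TitleExtractor and its method are hypothetical names, and the pattern expects a plain <title> tag with no attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown only to illustrate the title-extraction step.
final class TitleExtractor {

    // Case-insensitive, like FindNoCase(); DOTALL lets a title span line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or an empty string when no complete
    // <title>...</title> pair is present.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><head><TITLE> Doc Home </TITLE></head></html>"));
    }
}
```

Returning an empty string when no title is found is a design choice of this sketch; it keeps later code from tripping over an undefined value when a document has no title.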
Now you'll use the add() method of the "document" object in combination with the "field" object to add the URL, the title, and body of the file you just read using FileRead() to the document that Lucene will index:
document.add(field.Keyword("url",
"http://localhost:8500/cfdocs/dochome.htm"));
document.add(field.Text("title", title));
document.add(field.UnIndexed("summary", content));
document.add(field.UnStored("body", content));
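The four Field methods used above differ in what Lucene does with each value. The enum below is a hypothetical summary of my own, not part of Lucene; it records, as I understand the Lucene 1.x API, whether each kind of field is stored, indexed, and tokenized.

```java
// Hypothetical summary type, not part of Lucene. The flags reflect my reading
// of the Lucene 1.x Field factory methods used in this article.
enum FieldKind {
    KEYWORD(true, true, false),    // Field.Keyword: stored, indexed as a single untokenized term
    TEXT(true, true, true),        // Field.Text(String, String): stored, tokenized, indexed
    UNINDEXED(true, false, false), // Field.UnIndexed: stored for display only, not searchable
    UNSTORED(false, true, true);   // Field.UnStored: tokenized and indexed, original text not kept

    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    FieldKind(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    // A field can be matched by a query only if it was indexed.
    boolean searchable() { return indexed; }

    // A field's value can be read back from a hit only if it was stored.
    boolean retrievable() { return stored; }

    public static void main(String[] args) {
        for (FieldKind kind : values()) {
            System.out.println(kind + " searchable=" + kind.searchable()
                    + " retrievable=" + kind.retrievable());
        }
    }
}
```

Seen this way, adding the file content twice makes sense: "summary" is kept so it can be displayed with a search result but is never searched, while "body" is searchable but cannot be read back from a hit.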
Finally, we use the addDocument() method of the writer object to add the document to the Lucene index:
writer.addDocument(document);
After you're done adding documents to the index, you'll use the close() method of the writer object:
writer.close();
You're done! If you've been following along in your editor, save the file and request the page:
http://localhost:8500/lucene/luceneindex.cfm
After running the page, you should see a bunch of oddly named files in the /cfusionmx/wwwroot/lucene/docsindex/ directory. Now you're ready to search the Lucene index.
Searching a Lucene Index Using ColdFusion
To start, create a new file, save it in the /cfusionmx/wwwroot/lucene/ directory as "lucenesearch.cfm". You'll use CFScript syntax again, so add a
indexReader = CreateObject("java",
"org.apache.lucene.index.IndexReader");
searcher = CreateObject("java",
"org.apache.lucene.search.IndexSearcher");
searcher = searcher.init(indexReader.open
("c:\cfusionmx\wwwroot\lucene\docsindex"));
analyzer = CreateObject("java",
"org.apache.lucene.analysis.StopAnalyzer");
analyzer.init();
luceneQuery = CreateObject("java",
"org.apache.lucene.search.Query");
queryParser = CreateObject("java",
"org.apache.lucene.queryParser.QueryParser");
luceneQuery = queryParser.parse("cfx", "body", analyzer);
hits = CreateObject("java",
"org.apache.lucene.search.Hits");
hits = searcher.search(luceneQuery);
Starting at the top, I create an IndexReader object, an IndexSearcher object, and then call the constructor of the IndexSearcher object using the ColdFusion init() method. You'll notice that instead of calling the init method like this
searcher.init();
I use a different constructor, one whose argument is an IndexReader object. The IndexReader object is retrieved using a static method (a static method is a method that doesn't require the instantiation of an object before its use) called "open", whose argument is the path to the index you want to search:
searcher.init(indexReader.open("c:\cfusionmx\wwwroot\lucene\docsindex"));
Again, I hardcoded the path to the index for the sake of simplicity; the source code at the end of this document uses a value from the attributes scope, enabling this script to be used as a CFML custom tag.
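If static methods are new to you, the following Java sketch shows the same shape: a static factory method called on the class itself, whose result is passed straight into another object's constructor. ToyReader and ToySearcher are hypothetical stand-ins that I made up for illustration; they are not Lucene's IndexReader and IndexSearcher.

```java
// Hypothetical stand-in types illustrating a static factory feeding a constructor.
final class StaticFactoryDemo {

    static final class ToyReader {
        private final String path;

        private ToyReader(String path) { this.path = path; }

        // Static factory: callable on the class itself, before any ToyReader exists.
        static ToyReader open(String path) { return new ToyReader(path); }

        String path() { return path; }
    }

    static final class ToySearcher {
        private final ToyReader reader;

        // Constructor that takes a reader, mirroring searcher.init(indexReader.open(...)).
        ToySearcher(ToyReader reader) { this.reader = reader; }

        String indexPath() { return reader.path(); }
    }

    public static void main(String[] args) {
        ToySearcher searcher = new ToySearcher(ToyReader.open("docsindex"));
        System.out.println("Searching index at: " + searcher.indexPath());
    }
}
```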
After instantiating the IndexSearcher object, I created a StopAnalyzer object, the same one that I used when creating and populating the index. (It's important to note that whatever Analyzer you use to create and populate your index, you must use the same type of Analyzer object to search your index. If you don't use the same type of object, you'll get inconsistent, if not invalid, results.)
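To see why mismatched analyzers cause trouble, consider this toy Java sketch. It does not use Lucene; ToyAnalyzers, its two methods, and the abbreviated stop list are all hypothetical. One method imitates a StopAnalyzer-style pipeline (split on non-letters, lowercase, drop stop words) and the other imitates a simpler analyzer that only splits on whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-ins for two different analyzers; not Lucene classes.
final class ToyAnalyzers {

    // Hypothetical, abbreviated stop list for illustration; not Lucene's actual list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "to"));

    // StopAnalyzer-style: split on non-letters, lowercase, drop stop words.
    static List<String> stopStyle(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            String token = raw.toLowerCase();
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // A different analyzer: split on whitespace only, keeping case and stop words.
    static List<String> whitespaceStyle(String text) {
        return new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        List<String> indexed = stopStyle("The CFX tag");
        System.out.println("Indexed terms: " + indexed);
        System.out.println("Same analyzer finds CFX: "
                + indexed.containsAll(stopStyle("CFX")));
        System.out.println("Different analyzer finds CFX: "
                + indexed.containsAll(whitespaceStyle("CFX")));
    }
}
```

The index holds the lowercased term, so a query run through the same pipeline matches it, while a query that keeps its original case does not.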
Next, I create a Query object and a QueryParser object, instantiating the Query object by calling a static method of the QueryParser object, "parse()", which returns an instance of org.apache.lucene.search.Query. The "parse()" method's arguments are the keywords you want to search for, the field of the index you want to search, and the Analyzer object. Note that the CFML source code for this article replaces the string "cfx" with a variable, "keyword", which comes from the attributes scope, again enabling this script to be used as a CFML custom tag. Finally, I create a Hits object, which is populated using the IndexSearcher object's "search()" method.
You're almost done! At this point, the variable "hits" has been populated with the results of the Lucene search, so all we need to do is iterate over the variable hits, extract each Document, and add the resulting values to a ColdFusion query, which you then return to the calling template. First things first: create a ColdFusion query using a comma-delimited list of column names:
localQuery = QueryNew("URL, TITLE, SUMMARY");
Next, you'll get a Document object (the same kind of object you used when creating the index):
doc = CreateObject("java", "org.apache.lucene.document.Document");
and then iterate over the hits collection:
for (i=0; i LT hits.length(); i=i+1) {
doc = hits.doc(javacast("int", i));
QueryAddRow(localQuery);
QuerySetCell(localQuery, "url",
doc.get("url"), i+1);
QuerySetCell(localQuery, "title",
doc.get("title"), i+1);
QuerySetCell(localQuery, "summary",
doc.get("summary"), i+1);
}
There isn't much that's complicated here; the Hits object has a method called "length()" that returns the number of elements in the collection (much like the "recordcount" property of a ColdFusion query). You retrieve the Document object from the hits collection using the "doc()" method. Notice that you must use the "javacast()" function to cast the variable "i" to the primitive Java type "int". Finally, you use the "get()" method of the Document object to retrieve the string values of the URL, title, and summary fields, which you originally populated when creating the index. Also, you'll notice that the hits collection is zero-based, while ColdFusion query rows start at 1, hence the use of "i+1" in the QuerySetCell() function.
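The off-by-one bookkeeping is easy to get wrong, so here is the same idea isolated in a small Java sketch. HitRows and toRows() are hypothetical names; the list stands in for the zero-based hits collection and the map stands in for query rows numbered from 1.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing the zero-based to one-based index shift.
final class HitRows {

    // Copies zero-based hits into rows numbered from 1, the way the loop above
    // passes i+1 as the row number to QuerySetCell().
    static Map<Integer, String> toRows(List<String> hits) {
        Map<Integer, String> rows = new LinkedHashMap<>();
        for (int i = 0; i < hits.size(); i++) {
            rows.put(i + 1, hits.get(i)); // hit 0 becomes row 1, hit 1 becomes row 2, ...
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(toRows(java.util.Arrays.asList("first hit", "second hit")));
    }
}
```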
For the purposes of testing this script, you can use
outside of the
Optimizing a Lucene Index Using ColdFusion
Adding multiple documents to the Lucene index using the scripts you wrote above leaves the index in a fragmented state. For optimum performance, you'll want to optimize the Lucene index on a regular basis. The Lucene API makes this operation simple:
analyzer = CreateObject("java",
"org.apache.lucene.analysis.StopAnalyzer");
analyzer.init();
writer = CreateObject("java",
"org.apache.lucene.index.IndexWriter");
writer.init("c:\cfusionmx\wwwroot\lucene\docsindex",
analyzer, false);
writer.optimize();
This code should look relatively familiar to you. You start out by instantiating a StopAnalyzer object and an IndexWriter object and then you call the IndexWriter constructor, passing it the path to the Lucene index you want to optimize, along with the Analyzer object and a Boolean variable that determines whether or not to "create" an index. Obviously in this situation you don't want to create a new index; you simply want to optimize an existing one. Finally, you call the optimize() method of the IndexWriter object
writer.optimize();
which "... merges all segments together into a single segment, optimizing an index for search" according to the Lucene API documentation.
Conclusion
After you've read this article, I encourage you to download and explore the source code. I've exploded the short scripts into two complete custom tags that you can use to closely mimic the behavior of
Copyright © 2003 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Aaron Johnson
Aaron Johnson is a senior software engineer at Jive Software. He lives with his wonderful wife, young son, and dog in Portland, Oregon. You can find out more by reading his blog at http://cephas.net/blog/.