Extending ColdFusion with Java

One of the many reasons to use ColdFusion MX is that it comes standard with the majority of the tools you'll need to write full-featured, dynamic Web applications. Tags like <cfquery> and <cfmail> make it relatively simple to query a relational database and send e-mail. In the same way, you can use <cfindex> and <cfsearch> to create and search Verity full-text indexes.

There are, however, a couple of situations when you can't use the full-text searching capabilities of Verity. The ability to run ColdFusion MX on the Apple OS X operating system, while a boon to developers who code on the Apple platform, does not include the ability to use Verity. Programmers who work in a hybrid J2EE/ColdFusion MX environment (possibly using ColdFusion MX for J2EE) cannot natively use the Verity search capabilities in the J2EE environment. Finally, programmers who need customized searching and indexing capabilities may find the standard Verity integration limiting.

Enter Lucene, an open source full-text searching framework from the Apache Jakarta project, which, when combined with ColdFusion MX, can be run on Apple OS X, can be programmatically accessed by both J2EE and ColdFusion MX developers, and can be fully customized and extended.

This two-part series will illustrate two different methods you can use when approaching a ColdFusion and Java integration project. In this article, I'll walk you through the creation of three CFML scripts, all using native CFML syntax: one that creates and populates a Lucene index; one that searches the same index; and a final script that optimizes the index.

In the next article, I'll show you how to write a Java-based CFX tag using the ColdFusion Extension Application Programming Interface to create, populate, and search a Lucene index.

This article is not intended to be an in-depth introduction to the Lucene API. If you're interested in learning more about the internal workings of Lucene, SYS-CON Media's sister publication, Java Developer's Journal, featured an article entitled "Search-Enable Your Application with Lucene," which you can find a link to in the resources section at the end of this article.

Before beginning, you'll need to make sure that your system (be it Unix, Linux, Windows, or Apple) is appropriately configured:

  • ColdFusion MX must be installed; you'll see that I'm using the ColdFusion MX integrated Web server running on port 8500 throughout the examples.
  • You'll need to download Lucene; binaries and source code are available on the Jakarta Apache site. You'll see links to those resources at the end of this document.
  • After downloading the Lucene JAR file, you'll need to add the location of the JAR file to the classpath in ColdFusion Administrator (http://localhost:8500/cfide/administrator): click on "Java and JVM" under "Server Settings," and type the full path to the location of the Lucene JAR file you downloaded (see Figure 1). Make sure that you restart the ColdFusion service after saving your changes.
  • Finally, create a folder in your /cfusionmx/wwwroot/ called "Lucene," into which you'll put the source code written during this article.

     

    Creating a Lucene Index Using ColdFusion
    Now that you've downloaded Lucene, modified the ColdFusion classpath, restarted ColdFusion, and created the Lucene folder, you should be ready to write some code! Open up your IDE of choice, create a new file, and save it as "luceneindex.cfm" into the cfusionmx/wwwroot/lucene/ directory. The first thing you'll need to create a Lucene index is a StopAnalyzer object, which is a Java object that eventually will "tokenize" or split up the large blobs of text you feed to the engine into individual words. You'll use cfscript syntax to get access to a StopAnalyzer object:

    analyzer = CreateObject("java",
    "org.apache.lucene.analysis.StopAnalyzer");

    The code above uses ColdFusion's CreateObject() function to point the "analyzer" variable at the Java class "org.apache.lucene.analysis.StopAnalyzer". If you're familiar with Java, the cfscript syntax, together with the init() call shown next, roughly maps to the following Java syntax:

    StopAnalyzer analyzer = new StopAnalyzer();

    You should note that at this point in the code, you do not have a reference to an object; no object has yet been created on the heap in Java. The next line uses a method, "init()" to call the default constructor of the StopAnalyzer object, which returns a reference to a Java object:

    analyzer.init();

    Those last two sentences are important, so I'll repeat: using the CreateObject() function does not get you access to an instance of an object. In order to access a Java object, you must either a) first call the CreateObject() function and then the init() method, which, in the above example, maps to the default constructor in Java, or b) call any nonstatic method on the object, which causes ColdFusion to instantiate the object for you.
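    The two-step pattern has a loose parallel in plain Java reflection, which may help if you think in Java first. The sketch below is only an analogy and does not describe how ColdFusion implements CreateObject(). The names CreateObjectAnalogy, createObject, and init are hypothetical, and java.lang.StringBuilder stands in for StopAnalyzer so that the example runs without the Lucene JAR on the classpath.

```java
import java.lang.reflect.Constructor;

class CreateObjectAnalogy {

    // Step 1 (analogous to CreateObject): look up the class by name.
    // Only a Class handle is returned; no instance exists yet.
    static Class<?> createObject(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class not on classpath: " + className, e);
        }
    }

    // Step 2 (analogous to init() with no arguments): call the
    // no-argument constructor, which is what actually creates the object.
    static Object init(Class<?> type) {
        try {
            Constructor<?> constructor = type.getDeclaredConstructor();
            return constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // StringBuilder is a stand-in for StopAnalyzer (assumption for illustration).
        Class<?> type = createObject("java.lang.StringBuilder");
        Object instance = init(type);
        System.out.println("Class handle: " + type.getName());
        System.out.println("Instance created: " + (instance instanceof StringBuilder));
    }
}
```

    The point of the split is the same in both worlds: naming a class and constructing an object are separate steps, and only the second one puts anything on the heap.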

    Now that we have a StopAnalyzer object, we'll need to get an object that handles the creation of the Lucene index:

    writer = CreateObject("java",
    "org.apache.lucene.index.IndexWriter");
    writer.init("c:\cfusionmx\wwwroot\lucene\docsindex",
    analyzer, true);

    The first line looks very much like the syntax we just used to create the StopAnalyzer object. However, the second line is different. Instead of calling the default Java constructor, we're calling a constructor that takes three arguments: a) the path to the index we want to create (note that Lucene writes out numerous files or "segments" as part of its indexing routine just like Verity does; the "path" mentioned here is the path where you want those files stored); b) a StopAnalyzer object that you created above; and c) a Boolean variable that determines whether to create an index or simply update an index.

    To make it easier to understand what's going on, I've hard-coded the path to the index above. You'll notice that the source code included at the end of this article uses a variable from ColdFusion's attributes scope, which allows you to determine where you want your index files stored at runtime.

    Next, you'll need to instantiate a couple more Java objects:

    document = CreateObject("java", "org.apache.lucene.document.Document");
    field = CreateObject("java", "org.apache.lucene.document.Field");
    system = CreateObject("java", "java.lang.System");

    Next, you'll need to read in the content of the file you want to index. www.cflib.org is a wonderful resource for ColdFusion functions. I found one called FileRead() that does exactly what you'll need. I included a link to the function in the resources section so that you can download it, but for now, just know that I'm using that function to read the entire contents of a file as a string. (Again, I've hard-coded the path, this time to the ColdFusion documentation home page on my local system. The source code for the finished product can be downloaded at sys-con.com/coldfusion/sourcec.cfm.)

    content = FileRead("c:\cfusionmx\wwwroot\cfdocs\dochome.htm");

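    Now I need to extract the title from the HTML document. I use the FindNoCase() function to get the start and end points of the <title> element and then use the Mid() function to get the value of <title>:

    // Default to an empty title in case the document has no <title> element
    title = "";
    startTitle = FindNoCase("<title>", content);
    endTitle = FindNoCase("</title>", content);
    if (startTitle GT 0 AND endTitle GT startTitle) {
        // 7 is the length of the opening "<title>" tag
        title = trim(Mid(content, startTitle + 7, endTitle - startTitle - 7));
    }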
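    For readers coming from the Java side, the same extraction can be sketched in plain Java. This is a minimal sketch under a few assumptions: TitleExtractor and its methods are hypothetical names, the <title> tag is assumed to carry no attributes, and a missing title yields an empty string.

```java
class TitleExtractor {

    // Case-insensitive search, playing the role of FindNoCase().
    // Returns a zero-based index, or -1 when the needle is absent.
    static int indexOfIgnoreCase(String text, String needle) {
        for (int i = 0; i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns the trimmed text between <title> and </title>,
    // or an empty string when no complete title element is found.
    static String extractTitle(String html) {
        String openTag = "<title>";
        int start = indexOfIgnoreCase(html, openTag);
        int end = indexOfIgnoreCase(html, "</title>");
        if (start < 0 || end < start) {
            return "";
        }
        return html.substring(start + openTag.length(), end).trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><TITLE> Doc Home </TITLE></head></html>";
        System.out.println(extractTitle(html));
    }
}
```

    Note the indexing difference: Java's substring() is zero-based, while FindNoCase() and Mid() in CFML count from 1, which is why the CFML version tests for values greater than 0.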

    Now you'll use the add() method of the "document" object in combination with the "field" object to add the URL, the title, and body of the file you just read using FileRead() to the document that Lucene will index:

    document.add(field.Keyword("url",
    "http://localhost:8500/cfdocs/dochome.htm"));
    document.add(field.Text("title", title));
    document.add(field.UnIndexed("summary", content));
    document.add(field.UnStored("body", content));

    Finally, you use the addDocument() method of the writer object to add the document to the Lucene index:

    writer.addDocument(document);

    After you're done adding documents to the index, you'll use the close() method of the writer object:

    writer.close();

    You're done! If you've been following along in your editor, save the file and request the page:

    http://localhost:8500/lucene/luceneindex.cfm

    After running the page, you should see a bunch of oddly named files in the /cfusionmx/wwwroot/lucene/docsindex/ directory. Now you're ready to search the Lucene index.

    Searching a Lucene Index Using ColdFusion
    To start, create a new file and save it in the /cfusionmx/wwwroot/lucene/ directory as "lucenesearch.cfm". You'll use CFScript syntax again, so add a <cfscript> block to your script. After completing that, you'll need to instantiate a couple of Java objects, just like you did in the luceneindex.cfm script.

    indexReader = CreateObject("java",
    "org.apache.lucene.index.IndexReader");
    searcher = CreateObject("java",
    "org.apache.lucene.search.IndexSearcher");
    searcher = searcher.init(indexReader.open
    ("c:\cfusionmx\wwwroot\lucene\docsindex"));
    analyzer = CreateObject("java",
    "org.apache.lucene.analysis.StopAnalyzer");
    analyzer.init();
    luceneQuery = CreateObject("java",
    "org.apache.lucene.search.Query");
    queryParser = CreateObject("java",
    "org.apache.lucene.queryParser.QueryParser");
    luceneQuery = queryParser.parse("cfx", "body", analyzer);
    hits = CreateObject("java",
    "org.apache.lucene.search.Hits");
    hits = searcher.search(luceneQuery);

    Starting at the top, I create an IndexReader object, an IndexSearcher object, and then call the constructor of the IndexSearcher object using the ColdFusion init() method. You'll notice that instead of calling the init method like this

    searcher.init();

    I use a different constructor, one whose argument is an IndexReader object. The IndexReader object is retrieved using a static method (a static method is a method that doesn't require the instantiation of an object before its use) called "open", whose argument is the path to the index you want to search:

    searcher.init(indexReader.open("c:\cfusionmx\wwwroot\lucene\docsindex"));

    Again, I hardcoded the path to the index for the sake of simplicity; the source code at the end of this document uses a value from the attributes scope, enabling this script to be used as a CFML custom tag.

    After instantiating the IndexSearcher object, I created a StopAnalyzer object, the same one that I used when creating and populating the index. (It's important to note that whatever Analyzer you use to create and populate your index, you must use the same type of Analyzer object to search your index. If you don't use the same type of object, you'll get inconsistent, if not invalid, results.)
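    To see why mismatched analyzers produce bad results, it helps to model what analysis does to text. The sketch below is a rough stand-in and not Lucene's code: as I understand it, StopAnalyzer splits text on non-letters, lowercases each token, and drops common stop words. The class name, the tiny stop list, and the whitespaceOnly() method (a hypothetical "different analyzer") are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopAnalysisSketch {

    // Assumption: a small illustrative stop list, not Lucene's actual list.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    // Rough model of stop analysis: split on non-letters, lowercase, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // A deliberately different analyzer: splits on whitespace and keeps case.
    static List<String> whitespaceOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Writing a CFX Tag in Java");
        System.out.println("Indexed tokens: " + indexed);

        // Same analyzer at search time: the query token matches an indexed token.
        System.out.println("Same analyzer matches: "
            + indexed.containsAll(analyze("CFX")));

        // Different analyzer at search time: "CFX" is never lowercased, so it misses.
        System.out.println("Different analyzer matches: "
            + indexed.containsAll(whitespaceOnly("CFX")));
    }
}
```

    The index only ever stores what the analyzer emitted, so a query has to pass through the same transformation before its terms can line up with the stored ones.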

    Next, I create a Query object and a QueryParser object, instantiating the Query object by calling a static method of the QueryParser object, "parse()", which returns an instance of org.apache.lucene.search.Query. The "parse()" method's arguments are the keywords you want to search for, the field of the index you want to search, and the Analyzer object. Note that the CFML source code for this article replaces the string "cfx" with a variable, "keyword", which comes from the attributes scope, again enabling this script to be used as a CFML custom tag. Finally, I create a Hits object, which is populated by the IndexSearcher object's "search()" method.

    You're almost done! At this point, the variable "hits" has been populated with the results of the Lucene search, so all you need to do is iterate over the hits variable, extract each Document, and add the resulting values to a ColdFusion query, which you then return to the calling template. First things first: create a ColdFusion query using a comma-delimited list of column names:

    localQuery = QueryNew("URL, TITLE, SUMMARY");

    Next, you'll get a Document object (the same kind of object you used when creating the index):

    doc = CreateObject("java", "org.apache.lucene.document.Document");

    and then iterate over the hits collection:

    for (i=0; i LT hits.length(); i=i+1) {
    doc = hits.doc(javacast("int", i));
    QueryAddRow(localQuery);
    QuerySetCell(localQuery, "url",
    doc.get("url"), i+1);
    QuerySetCell(localQuery, "title",
    doc.get("title"), i+1);
    QuerySetCell(localQuery, "summary",
    doc.get("summary"), i+1);
    }

    There isn't much that's complicated here; the "hits" variable contains a method called "length()" that returns the number of elements in the collection (much like you can use the property "recordcount" when referring to a ColdFusion query). You retrieve the Document object from the hits collection using the "doc()" method. Notice that you must use the "javacast()" function to cast the variable "i" to the primitive Java type "int". Finally, you use the "get()" method of the Document object to retrieve the string values of the URL, title, and summary fields, which you originally populated when creating the index. Also, you'll notice that the hits collection is a zero-based collection, while the ColdFusion query starts with 1, hence the use of the "i + 1", in the QuerySetCell() function.

    For the purposes of testing this script, you can use

    <cfdump var="#localQuery#">

    outside of the <cfscript> block to check the results of your search. If you've been following along and you've correctly created the Lucene index from the first section of this article, you should see something like Figure 2 after dumping the results of the query to the screen.

     

    Optimizing a Lucene Index Using ColdFusion
    Adding multiple documents to the Lucene index using the scripts you wrote above leaves the index in a fragmented state. For optimum performance, you'll want to optimize the Lucene index on a regular basis. The Lucene API makes this operation simple:

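    analyzer = CreateObject("java",
    "org.apache.lucene.analysis.StopAnalyzer");
    analyzer.init();
    writer = CreateObject("java",
    "org.apache.lucene.index.IndexWriter");
    // false = open the existing index rather than create a new one
    writer.init("c:\cfusionmx\wwwroot\lucene\docsindex", analyzer, false);
    writer.optimize();
    // Close the writer to flush changes and release the index lock
    writer.close();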

    This code should look relatively familiar to you. You start out by instantiating a StopAnalyzer object and an IndexWriter object and then you call the IndexWriter constructor, passing it the path to the Lucene index you want to optimize, along with the Analyzer object and a Boolean variable that determines whether or not to "create" an index. Obviously in this situation you don't want to create a new index; you simply want to optimize an existing one. Finally, you call the optimize() method of the IndexWriter object

    writer.optimize();

    which "... merges all segments together into a single segment, optimizing an index for search" according to the Lucene API documentation.

    Conclusion
    After you've read this article, I encourage you to download and explore the source code. I've expanded the short scripts into two complete custom tags that you can use to closely mimic the behavior of <cfindex> and <cfsearch>, which use Verity internally. The indexing tag uses functions downloaded from www.cflib.org to recursively crawl a directory, indexing each file along the way. Additionally, the indexing tag gives you the ability to optimize a Lucene index with only a single line of code.

    Resources

  • Jakarta Lucene: http://jakarta.apache.org/lucene

  • Jakarta Lucene Downloads: http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2
  • Search-Enable Your Application with Lucene: www.sys-con.com/java/article.cfm?id=1777
  • ColdFusion MX Documentation: http://livedocs.macromedia.com/cfmxdocs
  • ColdFusion CFScript Documentation: http://livedocs.macromedia.com/cfmxdocs/Developing_ColdFusion_MX_Applications_with_CFML/CFScript.jsp
  • CFLib.org: FileRead.cfm: www.cflib.org/udf.cfm?ID=417
  • CFLib.org: DirectoryList.cfm: www.cflib.org/udf.cfm?ID=615
    Aaron Johnson is a senior software engineer at Jive Software. He lives with his wonderful wife, young son, and dog in Portland, Oregon. You can find out more by reading his blog at http://cephas.net/blog/.
