Maintaining Live Verity Collections in a Clustered Environment

The task seemed monumental: "Come up with a way to keep live Verity data indexed and accessible to multiple Web servers in a clustered environment." The scope of the data was huge - hundreds of thousands of pieces of content - with new additions made 24 hours a day, every day of the year. All known methods to accomplish this task just did not seem to do the job adequately. It was time to get creative.

While analyzing the inner workings of the stock Verity 97 engine that ships with ColdFusion 4.5, we identified two keys to success:

  1. Verity searches are CPU-intensive; thus it would be best not to force all the servers in the cluster to search on one server, but rather to make each server responsible for its own searches, spreading out the workload.
  2. Verity indexing is even more CPU-intensive than searching; thus any box stuck with the task of continual indexing would spend most of its time pegged at 100% CPU usage and would not be suitable for any other task.

The theory started to take shape. We would dedicate one server to continually update the indexes. This would be called the Utility Server. As soon as this server finished a round of updates, it would start the next. As mentioned above, this Utility Server would spend most of its time at 100% CPU usage, so it would be used only for the index updates.

The next piece of the puzzle was to determine the best way to share file access to these Verity collections. With ever-changing data and a CPU operating at 100%, the Utility Server would not be an option. So we decided to dedicate another box to simply hold the most current and searchable version of the indexes. This would be called the Index Server.

The final component was the Web servers. This was easy, as all they had to do was map Verity collections to the Index Server.

The basic setup made sense, but some of the details were still missing:

  1. If the Utility Server was copying updated index files to the Index Server, what would happen to any ongoing searches that were currently using the collections on the Index Server?
  2. How would we clean old index data from the Index Server without impacting ongoing searches?

One idea was to work with multiple collection groups on the Index Server, say collection "A" and collection "B." While collection "A" was being indexed and copied over, collection "B" would handle all of the searches from the Web servers. This sounded like a possible solution but added an extra degree of complexity. Was the extra complication necessary?

What if we could simply use CFFILE to copy new index data to, and delete old index data from, the live set of collections on the Index Server? That would go directly against what we had been told is a best practice (locking all Verity collections during update transactions). Before we gave up on the idea, we contacted Verity representatives to see if they could offer any advice. We were pleasantly surprised when they informed us that the Verity 97 engine was designed to handle ongoing "file transactions" while live searches are being performed. It was time to run some tests and see what we could find out.

For our test environment we set up a small cluster of our own (see Figure 1). From another set of eight client machines (two per Web server), we opened a total of 20 browser instances against each of the four Web servers and ran test code (see Listing 1) that looped over a basic CFSEARCH on a Verity collection.

While the searches were running, we were updating the Verity collection on the Utility Server, copying the collection to the Index Server, and deleting old Verity files from the Index Server. Building the indexes with CFINDEX and copying the files with CFDIRECTORY and CFFILE was very straightforward (see Listing 2), but deleting the files required a little more thought. Because Verity collections store many files in pairs, we needed logic that deleted all but the latest pair of files in each Verity subfolder. So we came up with a custom tag called tag_delete (see Listing 3).
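The copy step can be pictured with a short, language-neutral sketch. It is written in Python rather than CFML, it is not the article's Listing 2, and the function name `copy_newer_files` and its two directory arguments are hypothetical. It assumes the Index Server's collection folder is reachable as an ordinary path (for example, through a mapped drive) and copies only files that are missing or older at the destination.

```python
import os
import shutil


def copy_newer_files(source_root, dest_root):
    """Copy files from source_root to dest_root when the destination copy
    is missing or older. Returns the relative paths that were copied."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        rel_dir = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(dest_root, rel_dir))
        # Mirror the collection's subfolder structure on the destination.
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                # copy2 also copies the modification time, so an unchanged
                # file is not copied again on the next pass.
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Comparing modification times keeps each pass small, which matters when the task runs every minute; whether the original listing filtered files this way is an assumption of the sketch.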

Deleting Old Verity Files
It's important to understand the structure of the Verity files, because that structure explains which files the delete tag removes and which it leaves alone. To learn the file structure of Verity collections and why deleting the old Verity files is important, see www.allaire.com/handlers/index.cfm?id=18429&method=full ("Understanding Verity Collections in ColdFusion").

<!--- call to the tag --->
<cf_tag_Delete source="" fileDateTime="">

The tag is called by passing two parameters. The source parameter is the directory path in which to begin deletion, and fileDateTime is a time stamp used to mark how far in the past the tag can delete (see Listing 3).
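The pruning logic can be sketched as follows. This is an illustration in Python, not the tag_delete code from Listing 3, and it rests on two assumptions: that "old" means a file's last-modified time is earlier than the cutoff, and that the newest two files in each folder stand in for "the latest pair." The names `prune_old_files`, `cutoff_timestamp`, and `keep_newest` are hypothetical.

```python
import os


def prune_old_files(source, cutoff_timestamp, keep_newest=2):
    """Delete files under source that were modified before cutoff_timestamp,
    always keeping the keep_newest most recent files in each folder.
    Returns the list of deleted paths."""
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(source):
        paths = [os.path.join(dirpath, name) for name in filenames]
        # Newest first, so the slice below skips the files we must keep.
        paths.sort(key=os.path.getmtime, reverse=True)
        for path in paths[keep_newest:]:
            if os.path.getmtime(path) < cutoff_timestamp:
                os.remove(path)
                deleted.append(path)
    return deleted
```

The keep-newest guard is there so that a folder whose files are all older than the cutoff is never emptied; the cutoff itself plays the role the article gives to fileDateTime.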

As the code in Listing 1 was executing, the output would begin by returning the old record count. As we updated the indexes, the output would change to reflect the new data:

103
103
103
103
127
127

Over the course of many such tests, we did not experience any collection corruption or errors of any kind.
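A test loop of this kind can be sketched in a few lines. The following Python is only an outline of the idea, not Listing 1: `search` is a hypothetical stand-in for whatever performs one search and returns its record count (a CFSEARCH call in the article's setup), and the function simply records each change in the count and any error raised along the way.

```python
def watch_record_counts(search, iterations):
    """Call search() repeatedly and return (transitions, errors).

    transitions holds (iteration, count) each time the count changes;
    errors holds (iteration, description) for every failed call.
    """
    transitions = []
    errors = []
    last_count = None
    for i in range(iterations):
        try:
            count = search()
        except Exception as exc:  # a corrupted collection would surface here
            errors.append((i, repr(exc)))
            continue
        if count != last_count:
            transitions.append((i, count))
            last_count = count
    return transitions, errors
```

Logging only the transitions makes a long run easy to read: a clean update shows a single step from the old count to the new one and an empty error list.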

Interacting with Copies of Index Files
One major point deserves clarification. When we say we do not lock our Verity index transactions, we mean only the interaction between CFSEARCH and copies of the index files themselves, not the actual index creation and modification performed by CFINDEX. That code always runs with single-threaded access on the Utility Server.

Live for Six Months
Our tests were successful. We moved from our test data to an actual clustered environment and started using real data. The final product has been live for six months (at the time of this writing) with no ill effects. We are confident in this theory, but, as always, make sure to run tests in your own environment to guarantee that it will work for you.

The following is a more detailed look at the machines we used and a basic set-up procedure for each machine.

Machines Used

  • One Utility Server
  • One Index Server
  • Multiple Web servers (four in our case)

Utility Server
The Utility Server has a scheduled task set to run every minute. The task uses a lock file (lock_verityCollectionUpdate.txt) to keep runs from overlapping. When the task begins, it checks for the existence of this lock file. If the file exists, the task ends without attempting an update. If the lock file does not exist, the task runs: it writes the lock_verityCollectionUpdate.txt file, retrieves all new data since the last time it ran, and then updates and optimizes all Verity collections. Next the task copies all new index files to the Index Server. It then reads over the Index Server's collection files and removes all but the latest files. Finally, the task deletes the lock_verityCollectionUpdate.txt file.
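The lock-file sequence above can be sketched as a small guard around the update steps. This Python outline is not Listing 4; `run_guarded` and `steps` are hypothetical names, and the sketch makes two choices of its own: it creates the lock file atomically instead of checking and then writing, and it removes the lock even when a step fails.

```python
import os


def run_guarded(lock_path, steps):
    """Run each callable in steps unless lock_path already exists.
    Returns True if the run happened, False if it was skipped."""
    try:
        # O_EXCL makes creation fail when the file already exists, so the
        # existence check and the write happen as a single operation.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another run is still in progress
    os.close(fd)
    try:
        for step in steps:  # e.g. update indexes, copy files, prune old files
            step()
    finally:
        # Sketch-specific choice: always release the lock, even on error.
        os.remove(lock_path)
    return True
```

With a one-minute schedule and updates that may take longer than a minute, the skipped runs are the normal case rather than an error, which is why the function reports a skip with a return value instead of raising.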

Index Server
The Index Server simply holds the "live" Verity collections.

Web Servers
The Web servers all map their collections to the Index Server. There are no local collections on these machines; thus, in keeping with the load-balancing setup, each Web server supplies the CPU time for any searches its own visitors require. As Verity searches can become CPU-intensive, this gives your searches the most bang for your buck.

Setup Procedure

  • Ensure that all ColdFusion tasks are running under accounts that have security privileges to work with all servers. (In Windows NT 4.0 and 2000, go to Services, then check the properties of the CF Services to see what account they're running under.)
  • Make sure to use realistic request timeouts, so that your indexing code will not time out before it's complete.

Utility Server

  • Map a drive to the location in which you'll store the files on the Index Server.
  • Create the collections on this machine. Depending upon how long it takes for your collections to build, it's advisable to build your collections using Netscape 4.X - as IE has been known to time out.
  • Set up a scheduled task that runs task_verityCollectionUpdate.cfm (see Listing 4) every minute. Enable this task as soon as the Index Server is set up.

Index Server
Copy the collections from the Utility Server to the Index Server. This will prime the pump and needs to be done by hand only once, for initial setup. This step is also important to set up the proper directory structure for the files on the Index Server, so that the file transactions can take place.

Web Servers

  • Map a drive to the Index Server.
  • In the CF Administrator, set up a new mapped Verity collection whose path points, through the mapped drive, to the collection on the Index Server you just set up. Note: You will need to enter the collection name exactly as it's shown on the Utility Server.

More Stories By Jeremy Petersen

Jeremy Petersen is certified in ColdFusion and has been using it since version 1.5. He is the manager of Web application engineering for TeachStream, Inc., in Salt Lake City, Utah.

More Stories By Dan Kison

Dan Kison has worked with ColdFusion since version 3.0. He created Web sites for the Air Force for four years. Last year, he took a position in Austin, Texas, where he continues to build Web applications.
