Maintaining Live Verity Collections in a Clustered Environment

The task seemed monumental: "Come up with a way to keep live Verity data indexed and accessible to multiple Web servers in a clustered environment." The scope of the data was huge - hundreds of thousands of pieces of content - with new additions made 24 hours a day, every day of the year. All known methods to accomplish this task just did not seem to do the job adequately. It was time to get creative.

While analyzing the inner workings of the stock Verity 97 engine that ships with ColdFusion 4.5, we identified the following keys to success:

  1. Verity searches are CPU-intensive; rather than forcing all the servers in the cluster to search on one server, it is best to make each server responsible for its own searches and so spread out the workload.
  2. Verity indexing is even more CPU-intensive than searching; any box stuck with the task of continual indexing would spend most of its time pegged at 100% CPU usage and would not be suitable for any other task.

The plan began to take shape. We would dedicate one server, called the Utility Server, to continually updating the indexes. As soon as this server finished a round of updates, it would start the next. Because it would spend most of its time at 100% CPU usage, it would be used only for the index updates.

The next piece of the puzzle was to determine the best way to share file access to these Verity collections. With ever-changing data and a CPU operating at 100%, the Utility Server would not be an option. So we decided to dedicate another box to simply hold the most current and searchable version of the indexes. This would be called the Index Server.

The final component was the Web servers. This was easy, as all they had to do was map Verity collections to the Index Server.

The basic setup made sense, but some of the details were still missing:

  1. If the Utility Server was copying updated index files to the Index Server, what would happen to any ongoing searches that were currently using the collections on the Index Server?
  2. How would we clean old index data from the Index Server without impacting ongoing searches?

One idea was to work with multiple collection groups on the Index Server, say collection "A" and collection "B." While collection "A" was being indexed and copied over, collection "B" would handle all of the searches from the Web servers. This sounded like a possible solution but added an extra degree of complexity. Was the extra complication necessary?

What if we could simply use CFFILE to copy new index data to, and delete old index data from, the live set of collections on the Index Server? That would go directly against what we had been told is a best practice (locking all Verity collections during update transactions). Before we gave up on the idea, we contacted Verity representatives to see if they could offer any advice. We were pleasantly surprised when they informed us that the Verity 97 engine was designed to handle ongoing "file transactions" while live searches are being performed. It was time to run some tests and see what we could find out.

For our tests we set up a small clustered environment of our own (see Figure 1). On a separate set of eight client machines (two per Web server), we opened a total of 20 browser instances against each of the four Web servers and ran test code (see Listing 1) that looped over a basic CFSEARCH on a Verity collection.
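Listing 1 is not reproduced in this text. As a rough sketch of what such a test page might look like, the loop below repeats a simple CFSEARCH and prints the record count on each pass. The collection name `content`, the criteria string, and the iteration count are all placeholder assumptions.

```cfml
<!--- Hypothetical test page: collection name, criteria, and loop count are placeholders --->
<cfloop index="i" from="1" to="500">
    <!--- Run a basic search against the mapped collection --->
    <cfsearch name="results"
              collection="content"
              type="simple"
              criteria="coldfusion">
    <!--- Print the number of matching records for this pass --->
    <cfoutput>#results.RecordCount#<br></cfoutput>
</cfloop>
```

Running a page like this in many browser windows at once approximates the concurrent search load described above.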

While the searches were running, we updated the Verity collection on the Utility Server, copied the collection to the Index Server, and deleted old Verity files from the Index Server. Building the indexes using CFINDEX and copying the files with CFDIRECTORY and CFFILE was very straightforward (see Listing 2), but deleting the files required a little more thought. Because Verity collections store many files in pairs, we needed some logic that deleted all but the latest pair of files in each Verity subfolder. So we came up with a custom tag called tag_delete (see Listing 3).
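Listing 2 is likewise not included here. The following is a minimal sketch of the index-and-copy step under stated assumptions: the datasource, table and column names, the collection name, the drive letters, and the `custom\parts` subfolder are all hypothetical, and only one subfolder is copied.

```cfml
<!--- Hypothetical names throughout: datasource, table, collection, and paths are placeholders --->
<!--- A real task would restrict this query to rows added since the last run --->
<cfquery name="newContent" datasource="contentDSN">
    SELECT content_id, title, body
    FROM content
</cfquery>

<!--- Add the rows to the Verity collection on the Utility Server --->
<cfindex action="update"
         collection="content"
         type="custom"
         query="newContent"
         key="content_id"
         title="title"
         body="body">

<!--- Copy the files of one collection subfolder to the Index Server's mapped drive --->
<cfset sourceDir = "d:\verity\content\custom\parts">
<cfset targetDir = "i:\verity\content\custom\parts">

<cfdirectory action="list" directory="#sourceDir#" name="indexFiles">

<cfloop query="indexFiles">
    <!--- Skip directory entries; copy plain files only --->
    <cfif indexFiles.type is "File">
        <cffile action="copy"
                source="#sourceDir#\#indexFiles.name#"
                destination="#targetDir#\">
    </cfif>
</cfloop>
```

The same list-and-copy loop would be repeated for each subfolder of the collection.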

Deleting Old Verity Files
It's important to understand the structure of the Verity files, because it explains why the Delete tag removes the files it does. For the file structure of Verity collections and the reasons old Verity files should be deleted, see www.allaire.com/handlers/index.cfm?id=18429&method=full ("Understanding Verity Collections in ColdFusion").

<!--- call to the tag --->
<cf_tag_Delete source="" fileDateTime="">
The tag is called by passing two parameters. The source parameter is the directory path in which to begin deletion, and fileDateTime is a time stamp used to mark how far in the past the tag can delete (see Listing 3).
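Since Listing 3 is not shown, here is one way such a tag could be sketched. It is not the original tag_delete. It assumes that the two most recently modified files in a folder form the "latest pair," and that a file is removed only if it is also older than fileDateTime.

```cfml
<!--- tag_delete.cfm: hypothetical sketch, not the original Listing 3 --->
<!--- Both attributes are required; CFPARAM without a default throws an error if one is missing --->
<cfparam name="attributes.source">
<cfparam name="attributes.fileDateTime">

<!--- List the folder with the newest entries first --->
<cfdirectory action="list"
             directory="#attributes.source#"
             name="entries"
             sort="datelastmodified DESC">

<cfset filesSeen = 0>
<cfloop query="entries">
    <cfif entries.type is "Dir">
        <!--- Recurse into real subfolders, skipping the "." and ".." entries --->
        <cfif entries.name is not "." and entries.name is not "..">
            <cf_tag_delete source="#attributes.source#\#entries.name#"
                           fileDateTime="#attributes.fileDateTime#">
        </cfif>
    <cfelse>
        <cfset filesSeen = filesSeen + 1>
        <!--- Keep the two newest files (assumed to be the latest pair);
              delete the rest only if they are older than the cutoff --->
        <cfif filesSeen gt 2 and DateCompare(entries.dateLastModified, attributes.fileDateTime) lt 0>
            <cffile action="delete" file="#attributes.source#\#entries.name#">
        </cfif>
    </cfif>
</cfloop>
```

Each invocation of a custom tag has its own local variables, so the recursive call does not disturb the outer loop's counter or directory listing.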

As the code in Listing 1 was executing, the output would begin by returning the old record count. As we updated the indexes, the output would change to reflect the new data:

103
103
103
103
127
127

Over the course of many such tests, we did not experience any collection corruption or errors of any kind.

Interacting with Copies of Index Files
One major point needs clarifying. When we say we do not lock our Verity index transactions, we mean only the file operations on copies of the index files and the CFSEARCH calls that read them. The actual index creation and modification performed by CFINDEX always runs with single-threaded access on the Utility Server.

Live for Six Months
Our tests were successful. We moved from our test data to an actual clustered environment and started using real data. The final product has been live for six months (at the time of this writing) with no ill effects. We are confident in this theory, but, as always, make sure to run tests in your own environment to guarantee that it will work for you.

The following is a more detailed look at the machines we used and a basic set-up procedure for each machine.

Machines Used

  • One Utility Server
  • One Index Server
  • Multiple Web servers (four in our case)

Utility Server
The Utility Server has a scheduled task set to run every minute. The task uses a lock file (lock_verityCollectionUpdate.txt) to keep runs from overlapping. When the task begins, it checks whether the lock file exists. If it does, the task ends without attempting an update. If it does not, the task writes the lock_verityCollectionUpdate.txt file, retrieves all new data added since its last run, and then updates and optimizes all Verity collections. Next the task copies all new index files to the Index Server. It then reads over the Index Server's collection files and removes all but the latest files. Finally, the task deletes the lock_verityCollectionUpdate.txt file.
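The control flow of that task can be sketched as follows. This is an illustrative outline rather than the original Listing 4; the lock file path and the collection name `content` are placeholders, and the indexing, copying, and cleanup steps are only marked by comments.

```cfml
<!--- Hypothetical sketch of the task's control flow; the path is a placeholder --->
<cfset lockFile = "d:\verity\lock_verityCollectionUpdate.txt">

<!--- A previous run is still working: end without attempting an update --->
<cfif FileExists(lockFile)>
    <cfabort>
</cfif>

<!--- Claim the run by writing the lock file --->
<cffile action="write" file="#lockFile#" output="#Now()#">

<!--- 1. Retrieve new data and update the collection (CFQUERY + CFINDEX) --->
<!--- 2. Optimize the collection --->
<cfcollection action="optimize" collection="content">
<!--- 3. Copy the new index files to the Index Server --->
<!--- 4. Remove all but the latest files on the Index Server --->

<!--- Release the lock so the next scheduled run can proceed --->
<cffile action="delete" file="#lockFile#">
```

In this sketch an error partway through would leave the lock file in place and block later runs until it is removed by hand. Whether the original task guards against that is not stated.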

Index Server
The Index Server simply holds the "live" Verity collections.

Web Servers
The Web servers all map their collections to the Index Server. There are no local collections on these machines; thus, in keeping with the load-balancing setup, each Web server supplies the CPU power for any searches its visitors require. Because Verity searches can be CPU-intensive, this gives your searches the most bang for your buck.

Setup Procedure

  • Ensure that all ColdFusion tasks are running under accounts that have security privileges to work with all servers. (In Windows NT 4.0 and 2000, go to Services, then check the properties of the CF Services to see what account they're running under.)
  • Make sure to use realistic request timeouts, so that your indexing code will not time out before it's complete.

Utility Server

  • Map a drive to the location in which you'll store the files on the Index Server.
  • Create the collections on this machine. Depending upon how long it takes for your collections to build, it's advisable to build your collections using Netscape 4.X - as IE has been known to time out.
  • Set up a scheduled task that runs task_verityCollection-Update.cfm (see Listing 4) every minute. Enable this task as soon as the Index Server is set up.
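The scheduled task can be created in the ColdFusion Administrator. As an alternative sketch, the CFSCHEDULE tag can register it from code. The task name, host name, URL path, and start date below are placeholder assumptions.

```cfml
<!--- Hypothetical task name, host, URL path, and start date; interval is in seconds --->
<cfschedule action="update"
            task="verityCollectionUpdate"
            operation="HTTPRequest"
            url="http://utilityserver/tasks/task_verityCollection-Update.cfm"
            startdate="1/1/2001"
            starttime="12:00 AM"
            interval="60">
```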

Index Server
Copy the collections from the Utility Server to the Index Server. This will prime the pump and needs to be done by hand only once, for initial setup. This step is also important to set up the proper directory structure for the files on the Index Server, so that the file transactions can take place.

Web Servers

  • Map a drive to the Index Server.
  • In the CF Administrator, set up a new mapped Verity collection whose path points, through the mapped drive, to the collection on the Index Server you just set up. Note: The collection name must match exactly the name shown on the Utility Server.
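The Administrator is the route described above. For illustration only, the MAP action of the CFCOLLECTION tag in this generation of ColdFusion can register an existing collection under a name from code. The collection name and mapped-drive path here are placeholders.

```cfml
<!--- Hypothetical collection name and mapped-drive path --->
<cfcollection action="map"
              collection="content"
              path="i:\verity\content">
```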

More Stories By Jeremy Petersen

Jeremy Petersen is certified in ColdFusion and has been using it since version 1.5. He is the manager of Web application engineering for TeachStream, Inc., in Salt Lake City, Utah.

More Stories By Dan Kison

Dan Kison has worked with ColdFusion since version 3.0. He created Web sites for the Air Force for four years. Last year, he took a position in Austin, Texas, where he continues to build Web applications.
