Maintaining Live Verity Collections in a Clustered Environment

The task seemed monumental: "Come up with a way to keep live Verity data indexed and accessible to multiple Web servers in a clustered environment." The scope of the data was huge - hundreds of thousands of pieces of content - with new additions made 24 hours a day, every day of the year. All known methods to accomplish this task just did not seem to do the job adequately. It was time to get creative.

While analyzing the inner workings of the stock Verity 97 engine that ships with ColdFusion 4.5, we identified two keys to success:

  1. Verity searches are CPU-intensive, so rather than forcing every server in the cluster to run its searches on one server, it would be best to make each server responsible for its own searches and so spread out the workload.
  2. Verity indexing is even more CPU-intensive than searching, so any box stuck with continual indexing would spend most of its time pegged at 100% CPU usage and would be unsuitable for any other task.

The design started to take shape. We would dedicate one server, called the Utility Server, to continually updating the indexes: as soon as it finished one round of updates, it would start the next. Because this server would spend most of its time at 100% CPU usage, it would be used only for index updates.

The next piece of the puzzle was deciding how best to share file access to these Verity collections. With ever-changing data and a CPU running at 100%, the Utility Server was not an option. So we decided to dedicate another box simply to holding the most current, searchable version of the indexes. This would be called the Index Server.

The final component was the Web servers. This was easy, as all they had to do was map Verity collections to the Index Server.

The basic setup made sense, but some of the details were still missing:

  1. If the Utility Server was copying updated index files to the Index Server, what would happen to searches that were using the collections on the Index Server at that moment?
  2. How would we clean old index data off the Index Server without affecting ongoing searches?

One idea was to work with multiple collection groups on the Index Server, say collection "A" and collection "B." While collection "A" was being indexed and copied over, collection "B" would handle all of the searches from the Web servers. This sounded like a possible solution but added an extra degree of complexity. Was the extra complication necessary?
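For comparison, the "A"/"B" alternative is essentially double buffering. The minimal Python sketch below (the original work was in CFML; the class and field names are hypothetical) shows the extra state it introduces: something must track which collection is live, and every search must consult it.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionPair:
    """Hypothetical model of the "A"/"B" alternative: one collection answers
    searches while the other is rebuilt and copied, then the roles swap."""
    live: str = "A"
    standby: str = "B"
    # Record count held by each collection, standing in for real index data.
    records: dict = field(default_factory=lambda: {"A": 0, "B": 0})

    def rebuild_standby(self, record_count: int) -> None:
        # Only the standby collection changes; searches keep hitting `live`.
        self.records[self.standby] = record_count

    def swap(self) -> None:
        # Every Web server must learn about this switch: this is the extra
        # state (and extra complexity) that a single live set avoids.
        self.live, self.standby = self.standby, self.live

    def search(self) -> int:
        # A search always reads whichever collection is currently live.
        return self.records[self.live]
```

After `rebuild_standby()`, `search()` keeps returning the old count until `swap()` is called.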

What if we could simply use CFFILE to copy new index data into, and delete old index data from, the live set of collections on the Index Server? That would go directly against what we had been told was a best practice: locking all Verity collections during update transactions. Before giving up on the idea, we contacted Verity representatives to see if they could offer any advice. We were pleasantly surprised when they told us that the Verity 97 engine was designed to handle ongoing "file transactions" while live searches are being performed. It was time to run some tests and see what we could find out.

For testing, we set up a small clustered environment of our own (see Figure 1). Using a separate set of eight client machines (two per Web server), we opened 20 browser instances against each of the four Web servers and ran test code (see Listing 1) that looped over a basic CFSEARCH on a Verity collection.

While the searches were running, we updated the Verity collection on the Utility Server, copied the collection to the Index Server, and deleted old Verity files from the Index Server. Building the indexes with CFINDEX and copying the files with CFDIRECTORY and CFFILE was very straightforward (see Listing 2), but deleting the files took a little more thought. Because Verity collections store many files in pairs, we needed logic that deleted all but the latest pair of files in each Verity subfolder, so we wrote a custom tag called tag_delete (see Listing 3).
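Listing 2 is not reproduced here, but the copy step can be sketched in a language-neutral way. The Python below is an illustration, not the authors' CFDIRECTORY/CFFILE code; the function name and the rule of copying only files that are missing from, or older on, the Index Server side are assumptions.

```python
import os
import shutil


def copy_new_index_files(src_root: str, dst_root: str) -> list:
    """Copy files under src_root (Utility Server collection) that are missing
    from dst_root (Index Server copy) or newer than the copy already there.
    Returns the relative paths that were copied."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel_dir)
        # Mirror the collection's subfolder structure on the destination.
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            # Skip files whose copy on the Index Server is already current.
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue
            # copy2 preserves the modification time, so the comparison above
            # stays meaningful on the next run.
            shutil.copy2(src, dst)
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

Because `shutil.copy2` carries the modification time over, a file copied on one run is skipped on the next unless the Utility Server rewrites it in the meantime.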

Deleting Old Verity Files
It's important to understand the structure of the Verity files, because that structure explains why the Delete tag removes the files it does. For a description of the file structure of Verity collections and why deleting old Verity files matters, see www.allaire.com/handlers/index.cfm?id=18429&method=full ("Understanding Verity Collections in ColdFusion").

<!--- call to the tag --->
<cf_tag_Delete source="" fileDateTime="">

The tag takes two parameters: source is the directory path in which to begin deleting, and fileDateTime is a time stamp that marks how far back in time the tag may delete (see Listing 3).
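Listing 3 is not reproduced here either. As a rough Python sketch of the same idea, assume that the files making up a "pair" share a base name and differ only in extension; this grouping rule and the function name are hypothetical, so check them against the Verity file-structure article cited above. In each folder the newest group is always kept, and older groups are deleted only if they were last modified before the cutoff passed as `file_date_time`, which plays the role of the tag's fileDateTime attribute.

```python
import os
from collections import defaultdict
from datetime import datetime


def delete_old_pairs(source: str, file_date_time: datetime) -> list:
    """Walk every folder under `source`, keep the newest group of files that
    share a base name, and delete older groups last modified before
    `file_date_time`. Returns the paths that were deleted.

    ASSUMPTION: files forming a "pair" share a base name and differ only in
    extension. This grouping rule is hypothetical."""
    cutoff = file_date_time.timestamp()
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(source):
        # Group this folder's files by base name.
        groups = defaultdict(list)
        for name in filenames:
            stem, _ext = os.path.splitext(name)
            groups[stem].append(os.path.join(dirpath, name))
        if len(groups) < 2:
            continue  # one group or none: nothing older to remove
        # Newest modification time within each group.
        newest = {stem: max(os.path.getmtime(p) for p in paths)
                  for stem, paths in groups.items()}
        latest = max(newest, key=newest.get)
        for stem, paths in groups.items():
            # Always keep the latest group, and anything touched after the cutoff.
            if stem == latest or newest[stem] >= cutoff:
                continue
            for path in paths:
                os.remove(path)
                deleted.append(path)
    return deleted
```

One plausible reading of the cutoff is as a safety margin: files written recently are left alone even when they are no longer the latest.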

While the code in Listing 1 was executing, the output began by returning the old record count. As we updated the indexes, the output changed to reflect the new data:

103
103
103
103
127
127

Over the course of many such tests, we did not experience any collection corruption or errors of any kind.
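The check behind these tests can be expressed as a small harness. In this hypothetical Python sketch, `search` stands in for one CFSEARCH request and returns a record count; any error it raises fails the run. The counts are expected never to drop, assuming content is only added between index updates, as in the test data.

```python
from typing import Callable, List


def poll_search(search: Callable[[], int], iterations: int) -> List[int]:
    """Run `search` repeatedly and record each record count, in the spirit of
    the looping CFSEARCH test. Exceptions propagate and fail the run."""
    return [search() for _ in range(iterations)]


def counts_never_regress(counts: List[int]) -> bool:
    """True if the record count never drops, i.e. no search saw the old index
    again after the new one appeared (assumes content is only added)."""
    return all(a <= b for a, b in zip(counts, counts[1:]))
```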

Interacting with Copies of Index Files
To be clear on a major point about not locking our Verity index transactions: we are talking only about file operations on copies of the index files and about CFSEARCH, not about index creation and modification with CFINDEX. That code always runs with single-threaded access on the Utility Server.

Live for Six Months
Our tests were successful. We moved from our test data to an actual clustered environment and started using real data. The final product has been live for six months (at the time of this writing) with no ill effects. We are confident in this approach, but, as always, run tests in your own environment to confirm that it will work for you.

The following is a more detailed look at the machines we used and a basic setup procedure for each machine.

Machines Used

  • One Utility Server
  • One Index Server
  • Four Web servers (multiple)

Utility Server
The Utility Server has a scheduled task set to run every minute. The task uses a lock file (lock_verityCollectionUpdate.txt) to keep runs from overlapping. When the task begins, it checks whether this lock file exists. If it does, the task ends without attempting an update. If it does not, the task writes lock_verityCollectionUpdate.txt, retrieves all data added since its last run, and then updates and optimizes all Verity collections. Next, the task copies all new index files to the Index Server. It then reads over the Index Server's collection files and removes all but the latest files. Finally, the task deletes lock_verityCollectionUpdate.txt.
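A minimal Python sketch of one run of this task follows, with the individual steps passed in as hypothetical callables. It departs from the description in two deliberate ways: it creates the lock file with `O_CREAT | O_EXCL`, so that checking for the lock and writing it happen as one atomic step, and it removes the lock in a `finally` block, so that a failed run does not block every later one.

```python
import os
from typing import Callable, Iterable


def run_scheduled_update(lock_path: str, steps: Iterable[Callable[[], None]]) -> bool:
    """One run of the every-minute task. Returns False if a previous run still
    holds the lock, True once all steps have run. `steps` are hypothetical
    stand-ins for: fetch new data, update/optimize, copy files, delete old files."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so two overlapping
        # runs cannot both acquire the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # previous run still in progress; try again next minute
    os.close(fd)
    try:
        for step in steps:
            step()
    finally:
        # Release the lock even if a step raised an exception.
        os.remove(lock_path)
    return True
```

Whether a failed run should release the lock automatically, or leave it in place for someone to investigate, is a judgment call; leaving it in place arguably matches the behavior described above more closely.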

Index Server
The Index Server simply holds the "live" Verity collections.

Web Servers
The Web servers all map their collections to the Index Server. There are no local collections on these machines, so under the load-balancing setup each Web server supplies the CPU for the searches its own visitors run. Because Verity searches can be CPU-intensive, this gives your searches the most bang for your buck.

Setup Procedure

  • Ensure that all ColdFusion services run under accounts with the security privileges needed to work with all servers. (In Windows NT 4.0 and 2000, open Services and check the properties of the ColdFusion services to see which account they run under.)
  • Use realistic request timeouts, so that your indexing code does not time out before it completes.

Utility Server

  • Map a drive to the location in which you'll store the files on the Index Server.
  • Create the collections on this machine. If your collections take a long time to build, it's advisable to build them using Netscape 4.X, as IE has been known to time out.
  • Set up a scheduled task that runs task_verityCollection-Update.cfm (see Listing 4) every minute. Enable this task as soon as the Index Server is set up.

Index Server
Copy the collections from the Utility Server to the Index Server. This primes the pump and needs to be done by hand only once, during initial setup. This step also creates the proper directory structure for the files on the Index Server, so that the file transactions can take place.

Web Servers

  • Map a drive to the Index Server.
  • In the ColdFusion Administrator, set up a new mapped Verity collection that points, through the mapped drive, to the collection on the Index Server. Note: The collection name must match exactly the name shown on the Utility Server.

More Stories By Jeremy Petersen

Jeremy Petersen is certified in ColdFusion and has been using it since version 1.5. He is the manager of Web application engineering for TeachStream, Inc., in Salt Lake City, Utah.

More Stories By Dan Kison

Dan Kison has worked with ColdFusion since version 3.0. He created Web sites for the Air Force for four years. Last year, he took a position in Austin, Texas, where he continues to build Web applications.
