ColdFusion Feature — Directory Watcher Dangers - A Follow-Up

A roadmap from the trenches

In the July 2006 issue of CFDJ, I wrote about the Directory Watcher event gateway, and how easy it was to set up and how powerful a tool it could be for managing files and external interfaces. While this is true, there are some potential hazards waiting for the unsuspecting developer who jumps into DW waters without a life preserver.

Fortunately, Dave Ferguson has used this particular gateway extensively in his work, and offered to share some of his experiences and solutions with us. Never one to turn down a good follow-up article, I immediately agreed to an education courtesy of Dave, and here it is. - Jeff Peters

When I read Jeff's article, I was interested because I use the DW gateway extensively and I was struck by memories of the double-edged nature of the DW gateway in action. Glory does not come cheap.

Imagine that our hero needs to build a DW gateway to monitor an FTP directory. He writes all the code, does all the tests, then puts the code into the production environment. When it's all done he marvels at his accomplishment, basking in the early glow of success. Then the phone rings - it's Murphy. The new DW gateway system is failing. Files aren't being processed, or only parts of files are being processed. Our hero checks the log files and to his horror they're filled with file-read errors from the gateway.

That beautiful, streamlined, straightforward process, written in accordance with all best practices, has gone to hell. Shaken, our hero begins banging his head against his desk and staring at the screen, wishing it worked the way it's supposed to. Everything is there, the gateway is firing when a file is written, but it's not processing files. Oh, yes, and to make matters worse, the problem only happens about 30% of the time.

I know all this because I was that guy. This exact scenario happened to me. I went from hero to zero in about 2.3 seconds. About three weeks of coding turned into two weeks of pain. It took me roughly five days to figure out what was going wrong. Then another week of trying to solve it, and when I reached the point of scrapping the whole thing, a light went off in my head. I had one of those moments of clarity where the solution seemed so clear. Now all I had to do was create a test and see if my bright idea would work. Or was it just a flash of light?

My pain and anguish is your reward. I'm going to walk through the problems with the DW gateway and provide some tools to overcome them. I tried to make the code as lean and mean as possible for performance reasons, and the principles can be applied to your DW gateway with minimal effort. For purposes of this article I'm going to assume that the reader has a general understanding of the DW gateway. Details on setting up a DW gateway and how it operates are available in the article entitled "DirectoryWatcher Event Gateway: Ditching the Scheduler" in the July '06 issue of CFDJ.

I was using DW to watch a directory for any new files uploaded via FTP. I needed to process each new file after it uploaded. The root cause of the scenario I experienced was, in my opinion, a flaw in the DW gateway that causes the gateway onAdd event to fire before the file is finished writing.

A second problem happens when a file is locked for writing and usually occurs when a DW gateway is watching a network shared directory.

Both of these problems are fatal for the DW gateway, but it's possible to get around them and process the file. (Note: The failures I encountered had almost nothing to do with the size of the file being written; it was all about the timing. However, the bigger the file, the more apt we are to run into problems.)

A third issue is that it's hard to access debugging information from the DW gateway. It's not just an issue with the DW gateway but any gateway or other call that doesn't output to a browser. This issue is fixable too, but it takes a bit of effort on the part of the developer. The important thing to remember is that any piece of code can fail at any given time. With the DW gateway there's no user to get an error message, so we need to send those messages to someplace useful.
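One way to send those messages somewhere useful is to wrap the body of the listener method in CFTRY and write any exception to a log file with CFLOG. The following is only a minimal sketch: the log name dwGateway and the processFile() function are placeholders I made up for illustration, and in a real gateway both functions would live inside the gateway CFC.

```cfml
<CFFUNCTION NAME="processFile" ACCESS="private" RETURNTYPE="void" OUTPUT="no">
    <CFARGUMENT NAME="filePath" TYPE="string" REQUIRED="yes">
    <!--- Hypothetical placeholder: the application's own file handling goes here --->
</CFFUNCTION>

<CFFUNCTION NAME="onAdd" ACCESS="public" RETURNTYPE="void" OUTPUT="no">
    <CFARGUMENT NAME="CFEvent" TYPE="struct" REQUIRED="yes">
    <CFSET var thisFile = arguments.CFEvent.data.filename>
    <CFTRY>
        <CFSET processFile(thisFile)>
        <CFCATCH TYPE="any">
            <!--- "dwGateway" is an assumed log name; CFLOG writes to dwGateway.log
                  in the ColdFusion log directory --->
            <CFLOG FILE="dwGateway" TYPE="error"
                   TEXT="onAdd failed for #thisFile#: #cfcatch.message# #cfcatch.detail#">
        </CFCATCH>
    </CFTRY>
</CFFUNCTION>
```

With something like this in place, a failure inside the gateway leaves a line in a log file instead of disappearing silently.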

Before we can try to fix these issues we have to understand their nature and how CF reacts to each one. As I said, file size has almost nothing to do with the problem; it's all about timing: when the gateway fires and when the file stops writing. Unfortunately, the gateway doesn't have a built-in way to detect that a file write is complete.

Let's take a closer look at the first issue. Since the server starts writing the file to the drive as soon as it's received, the gateway can pick up the file as soon as a single byte is written. While this in itself isn't a problem, it can become a problem for the gateway if the gateway kicks off a process that needs the entire file. The gateway itself doesn't fail, but if we used CFFILE to read the file content we wouldn't get the whole file. Only that part of the file that was uploaded when you tried to read it would be returned. So we need a way to have our process continue only if the whole file has been uploaded.

The second issue is a file that's write-locked, which normally happens when writing a file to a shared directory or a UNC path using a Windows server. If we write a 5MB file to a shared directory Windows instantly creates an empty 5MB file as a placeholder. It then copies the file to the destination. During this operation there's an exclusive write-lock on the file that prevents anything from trying to read the file until the write is complete. Unlike the first issue, if we try to use CFFILE to read the file while it's locked, the gateway crashes. So we also need a way to avoid reading locked files.

While both these issues seem pretty extreme they're fairly easy to overcome. We can even couple the fixes for each scenario together for a more robust solution. When I first wrote the fixes they were independent but over time I've combined them and put them in every DW gateway I create. Let's take a look at them.

We can avoid reading a file that's still being written when the onAdd event fires by putting the CF thread to sleep and checking the file's size repeatedly until it stops changing. Of course this can tax server performance and I have experimented with other solutions that work, but this is the most straightforward approach.

Note: All the code examples in the article have been stripped to a minimum to save space. The complete code is available online.

Here's a normal onAdd event inside a gateway cfc file:

<CFFUNCTION NAME="onAdd" ACCESS="public" RETURNTYPE="void">
<CFARGUMENT NAME="CFEvent" TYPE="struct" REQUIRED="yes">
<CFSET thisFile = cfevent.data.filename>
<CFFILE ACTION="READ" FILE="#thisFile#" VARIABLE="fileContent">
</CFFUNCTION>

On the third line we get the file name and path, then on the fourth we read the file. If the file write isn't complete, the read gets only a partial file.

Take a look at the code in Listing 1. There we attempt to overcome the issue by checking the size of the file. This is a delicate operation because we don't want to disturb the write of the file. To do this we'll leverage the java.io.FileInputStream. Let's walk through the code and examine what's going on.

First we create a Java object to read the file system:

fileRead = createObject("java","java.io.FileInputStream");

This will enable us to check the file size. This can be done with any of several Java file objects, or we could do it with CFDIRECTORY. However, this way lets us do a non-blocking read on the file. We can use it to get a byte count that can be read from the file without causing a block. We want to do our best to make sure that we don't disrupt anything that's happening to the file while it's still uploading.
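For comparison, here is roughly what the size check looks like with java.io.File, one of the other Java file objects mentioned above. This is a sketch, not the approach used in the rest of the article. To keep it runnable anywhere it measures the template it lives in; in a gateway the path would come from cfevent.data.filename.

```cfml
<cfscript>
// Illustration only: measure the current template so the file is sure to exist.
thisFile = getCurrentTemplatePath();

// java.io.File only inspects file-system metadata; it never opens the file,
// so there is no stream to close afterwards.
fileInfo = createObject("java", "java.io.File").init(thisFile);

// length() returns the size in bytes as the file system currently reports it.
sizeNow = fileInfo.length();
</cfscript>
```

Whether the reported length keeps pace with an upload in progress depends on the operating system and file system, so treat it as an alternative to test rather than a drop-in replacement.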

Next we create a Java object that lets us manipulate the current thread. Now we can pause the code:

thisThread = CreateObject("java", "java.lang.Thread");

Initialize the Java file object with the file that triggered the onAdd event:

fileRead.init(thisFile);

Next we set a variable loopCT to count the loops and control a while loop that, if left unchecked, could run forever. Then we get the current size of the file and pause the thread for a second:

sizeA = fileRead.available();
thisThread.sleep(1000);

After the one-second pause we check the file size again and do a comparison. If the file is still being written the sizes won't match. If the file sizes match, we do one more check, just in case there's an FTP pause or some other delay that caused a pause in the file writing:

sizeB = fileRead.available();
if (sizeA EQ sizeB){
    thisThread.sleep(1000);
    sizeC = fileRead.available();
    if (sizeC EQ sizeB){
        break;
    }
}

If the file sizes still match then the file is finished writing. If the sizes don't match then we continue looping and start the process all over again. Keep an eye on the loop count. We don't want the loop to go on forever. If every pause was hit on every loop then this check would stop the loop after about two minutes of runtime (60 passes at up to two seconds each). We would then have to put some code in that would handle this condition. In the example we just return:

loopCT = incrementValue(loopCT);
if (loopCT GT 60){
    return;
}

Lastly, we close the fileRead object. If this isn't done we could end up with a lock on the file that could only be released by stopping and starting ColdFusion.

fileRead.close();

After all this is done we can process the file as the application requires. In this example, I've removed the error checking from the code so we can focus on the file size check and not clutter the code.
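Pulling the fragments above together gives something like the following. This is my reconstruction for illustration, not the published Listing 1: the function name waitForStableSize and the true/false return value are assumptions, and error handling is still left out, so a locked file would throw at init().

```cfml
<cfscript>
// Returns true once the size of thisFile has stopped changing,
// or false if the loop limit is reached first.
function waitForStableSize(thisFile) {
    var thisThread = createObject("java", "java.lang.Thread");
    var fileRead = "";
    var loopCT = 0;
    var sizeA = 0;
    var sizeB = 0;
    var sizeC = 0;
    var stable = false;

    // Open the stream on the file that triggered the event.
    fileRead = createObject("java", "java.io.FileInputStream").init(thisFile);

    while (loopCT LTE 60) {
        sizeA = fileRead.available();
        thisThread.sleep(1000);
        sizeB = fileRead.available();
        if (sizeA EQ sizeB) {
            // Sizes match: wait once more in case the upload only paused.
            thisThread.sleep(1000);
            sizeC = fileRead.available();
            if (sizeC EQ sizeB) {
                stable = true;
                break;
            }
        }
        loopCT = loopCT + 1;
    }

    // Always release the stream so no lock is left on the file.
    fileRead.close();
    return stable;
}
</cfscript>
```

Inside onAdd this could be called as waitForStableSize(cfevent.data.filename), reading the file with CFFILE only when it returns true.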

The example above handles a file that we can read even though it's still in a write state. The other scenario is a little trickier to get around. Here we're trying to get around a file being locked when we try to read it. Even though java.io.FileInputStream does a non-blocking read it still can't get around this hurdle. The only thing to do is to attempt to read the file and trap the read error. It's not very elegant but it gets the job done. The code in Listing 2 works to overcome the lock issue. There are some advanced Java objects you can use to check for locks. However, they do just about the same thing: they try to put a lock on the file to see whether it works, on the basis that only one process can have a write-lock on a file.

The flow of this code is just about the same as the code from Listing 1. The difference is that we put a try/catch around the fileRead.init(). This lets us try to read the file without causing the code to fail. If the file is locked when we try to read it, the init() call fails, control falls to the catch block, and we sleep the thread. The code continues to loop until it can read the file or the loop count hits 60.
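The try/catch idea can be sketched like this. It is an illustration under my own assumptions rather than the published Listing 2: the function name waitForUnlock and the true/false return are made up for the example.

```cfml
<cfscript>
// Returns true once thisFile can be opened for reading,
// or false if it still cannot be opened when the loop limit is reached.
function waitForUnlock(thisFile) {
    var thisThread = createObject("java", "java.lang.Thread");
    var fileRead = "";
    var loopCT = 0;

    while (loopCT LTE 60) {
        try {
            // init() opens the file; on a locked file this throws.
            fileRead = createObject("java", "java.io.FileInputStream").init(thisFile);
            fileRead.close();
            return true;
        } catch (Any e) {
            // Still locked (or otherwise unreadable): pause, then try again.
            thisThread.sleep(1000);
        }
        loopCT = loopCT + 1;
    }
    return false;
}
</cfscript>
```

Note that catching every exception type also hides other failures, such as a file that was deleted before we got to it. Those would simply run out the loop and return false.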

The code in Listing 3 is the best of both worlds. We combine the file lock check with the file size check. My suggestion would be to always do it this way. You can never be too safe when it comes to running a DW gateway.
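A combined check might look something like the sketch below. Again, this is not the published Listing 3: the function name waitForFile is an assumption, and I have simplified the flow so that every pass takes both one-second pauses before comparing the three sizes.

```cfml
<cfscript>
// Returns true when thisFile can be opened and its size has held steady
// across two one-second pauses; false if the loop limit is reached first.
function waitForFile(thisFile) {
    var thisThread = createObject("java", "java.lang.Thread");
    var fileRead = "";
    var opened = false;
    var ready = false;
    var loopCT = 0;
    var sizeA = 0;
    var sizeB = 0;
    var sizeC = 0;

    while (loopCT LTE 60 AND NOT ready) {
        // Lock check: opening the stream fails while the file is locked.
        opened = false;
        try {
            fileRead = createObject("java", "java.io.FileInputStream").init(thisFile);
            opened = true;
        } catch (Any e) {
            thisThread.sleep(1000);
        }

        // Size check: only runs once the file could be opened.
        if (opened) {
            sizeA = fileRead.available();
            thisThread.sleep(1000);
            sizeB = fileRead.available();
            thisThread.sleep(1000);
            sizeC = fileRead.available();
            fileRead.close();
            ready = (sizeA EQ sizeB AND sizeB EQ sizeC);
        }
        loopCT = loopCT + 1;
    }
    return ready;
}
</cfscript>
```

Opening and closing the stream on every pass costs a little more than holding it open, but it means the stream is never left open if the function gives up.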

Now that we know how to check for file-write completion and overcome file blocking, we can start to get creative. We can create a process that can handle multiple files as one batch. This is done by using CFDIRECTORY to check file counts in the directory. We can then cause the current DW gateway event to quit if the directory count increases. The next gateway event will pick up the new file. Each gateway event does the same check. When the file counts no longer increase then the last gateway run processes all the files in the directory. The code and detailed explanation are too lengthy for this article, but the full code, with documentation, will be available for download.
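The count check at the heart of that idea could be sketched as follows. This is only my guess at its general shape, not the downloadable code: the function name directoryIsQuiet and the one-second pause are assumptions.

```cfml
<CFFUNCTION NAME="directoryIsQuiet" ACCESS="private" RETURNTYPE="boolean" OUTPUT="no">
    <CFARGUMENT NAME="watchedDir" TYPE="string" REQUIRED="yes">
    <CFSET var before = "">
    <CFSET var after = "">
    <!--- Count the directory entries, pause, then count again --->
    <CFDIRECTORY ACTION="list" DIRECTORY="#arguments.watchedDir#" NAME="before">
    <CFSET createObject("java", "java.lang.Thread").sleep(1000)>
    <CFDIRECTORY ACTION="list" DIRECTORY="#arguments.watchedDir#" NAME="after">
    <!--- A higher count means another upload has started, so this event
          should quit and leave the batch to a later event --->
    <CFRETURN after.recordCount LTE before.recordCount>
</CFFUNCTION>
```

An onAdd handler would return immediately when this comes back false and process everything in the directory when it comes back true. Keep in mind that CFDIRECTORY lists subdirectories as well as files, so a real version may need to filter on the type column.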


More Stories By Jeff Peters

Jeff Peters works for Open Source Data Integration Software company XAware.

More Stories By Dave Ferguson

Dave Ferguson is a system architect and principal programmer. He has been doing website design and development for over 10 years. He is also a Certified Advanced ColdFusion Developer. You can read his blog at http://dfoncf.blogspot.com
