ColdFusion Feature — Directory Watcher Dangers - A Follow-Up

A roadmap from the trenches

In the July 2006 issue of CFDJ, I wrote about the Directory Watcher event gateway, and how easy it was to set up and how powerful a tool it could be for managing files and external interfaces. While this is true, there are some potential hazards waiting for the unsuspecting developer who jumps into DW waters without a life preserver.

Fortunately, Dave Ferguson has used this particular gateway extensively in his work, and offered to share some of his experiences and solutions with us. Never one to turn down a good follow-up article, I immediately agreed to an education courtesy of Dave, and here it is. - Jeff Peters

When I read Jeff's article, I was interested because I use the DW gateway extensively and I was struck by memories of the double-edged nature of the DW gateway in action. Glory does not come cheap.

Imagine that our hero needs to build a DW gateway to monitor an FTP directory. He writes all the code, does all the tests, then puts the code into the production environment. When it's all done he marvels at his accomplishment, basking in the early glow of success. Then the phone rings - it's Murphy. The new DW gateway system is failing. Files aren't being processed, or only parts of files are being processed. Our hero checks the log files and to his horror they're filled with file-read errors from the gateway.

That beautiful, streamlined, straightforward process, written in accordance with all best practices, has gone to hell. Shaken, our hero begins banging his head against his desk and staring at the screen, wishing it worked the way it's supposed to. Everything is there, the gateway is firing when a file is written, but it's not processing files. Oh, yes, and to make matters worse, the problem only happens about 30% of the time.

I know all this because I was that guy. This exact scenario happened to me. I went from hero to zero in about 2.3 seconds. About three weeks of coding turned into two weeks of pain. It took me roughly five days to figure out what was going wrong. Then came another week of trying to solve it, and just when I reached the point of scrapping the whole thing, a light went on in my head. I had one of those moments of clarity where the solution seemed so clear. Now all I had to do was create a test and see if my bright idea would work. Or was it just a flash of light?

My pain and anguish is your reward. I'm going to walk through the problems with the DW gateway and provide some tools to overcome them. I tried to make the code as lean and mean as possible for performance reasons, and the principles can be applied to your DW gateway with minimal effort. For purposes of this article I'm going to assume that the reader has a general understanding of the DW gateway. Details on setting up a DW gateway and how it operates are available in the article entitled "DirectoryWatcher Event Gateway: Ditching the Scheduler" in the July '06 issue of CFDJ.

I was using DW to watch a directory for any new files uploaded via FTP. I needed to process each new file after it uploaded. The root cause of the scenario I experienced was, in my opinion, a flaw in the DW gateway that causes the gateway onAdd event to fire before the file is finished writing.

A second problem happens when a file is locked for writing; it usually occurs when a DW gateway is watching a network shared directory.

Both of these problems are fatal for the DW gateway, but it's possible to get around them and process the file. (Note: The failures I encountered had almost nothing to do with the size of the file being written; it was all about the timing. However, the bigger the file, the more apt we are to run into problems.)

A third issue is that it's hard to access debugging information from the DW gateway. It's not just an issue with the DW gateway but any gateway or other call that doesn't output to a browser. This issue is fixable too, but it takes a bit of effort on the part of the developer. The important thing to remember is that any piece of code can fail at any given time. With the DW gateway there's no user to get an error message, so we need to send those messages to someplace useful.
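As a minimal sketch of sending those messages someplace useful, the onAdd handler can wrap its work in CFTRY and write any failure to a log file with CFLOG. The log name "dwGateway" is a hypothetical choice, not something from the original code; ColdFusion would write it as dwGateway.log in its logs directory.

```cfml
<cffunction name="onAdd" access="public" returntype="void" output="false">
    <cfargument name="CFEvent" type="struct" required="yes">
    <cfset var thisFile = arguments.CFEvent.data.filename>
    <cfset var fileContent = "">
    <cftry>
        <cffile action="read" file="#thisFile#" variable="fileContent">
        <!--- Application-specific processing of fileContent goes here. --->
        <cfcatch type="any">
            <!--- No browser is attached to a gateway request, so record
                  the error where a developer can find it later.
                  "dwGateway" is an assumed log name. --->
            <cflog file="dwGateway" type="error"
                   text="onAdd failed for #thisFile#: #cfcatch.message# #cfcatch.detail#">
        </cfcatch>
    </cftry>
</cffunction>
```

The same CFCATCH block could just as easily send an e-mail or write to a database table; a log file is simply the least effort to set up.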

Before we can try to fix these issues we have to understand their nature and how CF reacts to each one. As I said, file size has almost nothing to do with the problem; it's all about timing: when the gateway fires and when the file stops writing. Unfortunately, the gateway doesn't have a built-in way to detect that a file write is complete.

Let's take a closer look at the first issue. Since the server starts writing the file to the drive as soon as it's received, the gateway can pick up the file as soon as a single byte is written. While this in itself isn't a problem, it can become a problem for the gateway if the gateway kicks off a process that needs the entire file. The gateway itself doesn't fail, but if we used CFFILE to read the file content we wouldn't get the whole file. Only that part of the file that was uploaded when we tried to read it would be returned. So we need a way to have our process continue only if the whole file has been uploaded.

The second issue is a file that's write-locked, which normally happens when writing a file to a shared directory or a UNC path using a Windows server. If we write a 5MB file to a shared directory Windows instantly creates an empty 5MB file as a placeholder. It then copies the file to the destination. During this operation there's an exclusive write-lock on the file that prevents anything from trying to read the file until the write is complete. Unlike the first issue, if we try to use CFFILE to read the file while it's locked, the gateway crashes. So we also need a way to avoid reading locked files.

While both these issues seem pretty extreme they're fairly easy to overcome. We can even couple the fixes for each scenario together for a more robust solution. When I first wrote the fixes they were independent but over time I've combined them and put them in every DW gateway I create. Let's take a look at them.

We can avoid reading a file that's still being written when the onAdd event fires by putting the CF thread to sleep and checking the file's size repeatedly until it stops changing. Of course this can tax server performance, and I have experimented with other solutions that work, but this is the most straightforward approach.

Note: All the code examples in the article have been stripped to a minimum to save space. The complete code is available online.

Here's a normal onAdd event inside a gateway cfc file:

<CFFUNCTION NAME="onAdd" ACCESS="public" RETURNTYPE="string">
<CFARGUMENT NAME="CFEvent" TYPE="struct" REQUIRED="yes">
<CFSET thisFile = cfevent.data.filename>
<CFFILE ACTION="READ" FILE="#thisFile#" VARIABLE="fileContent">
</CFFUNCTION>

On the third line we get the file name and path, then on the fourth we read the file. If the file write wasn't complete, the file read would get a partial file.

Take a look at the code in Listing 1. There we attempt to overcome the issue by checking the size of the file. This is a delicate operation because we don't want to disturb the write of the file. To do this we'll leverage the java.io.FileInputStream. Let's walk through the code and examine what's going on.

First we create a Java object to read the file system:

fileRead = createObject("java","java.io.FileInputStream");

This will enable us to check the file size. This can be done with any of several Java file objects, or we could do it with CFDIRECTORY. However, this way lets us do a non-blocking read on the file. We can use it to get a byte count that can be read from the file without causing a block. We want to do our best to make sure that we don't disrupt anything that's happening to the file while it's still uploading.

Next we create a Java object that lets us manipulate the current thread. Now we can pause the code:

thisThread = CreateObject("java", "java.lang.Thread");

Initialize the Java file object with the file that triggered the onAdd event:

fileRead.init(thisFile);

Next we set a variable loopCT to count the loops and control a while loop that, if gone unchecked, could run infinitely. Then we get the current size of the file and pause the thread for a second:

sizeA = fileRead.available();
thisThread.sleep(1000);

After the one-second pause we check the file size again and do a comparison. If the file is still being written the sizes won't match. If the file sizes match, we do one more check, just in case there's an FTP pause or some other delay that caused a pause in the file writing:

sizeB = fileRead.available();
if (sizeA EQ sizeB){
    thisThread.sleep(1000);
    sizeC = fileRead.available();
    if (sizeC EQ sizeB){
        break;
    }
}

If the file sizes still match then the file is finished writing. If the sizes don't match then we continue looping and start the process all over again. Keep an eye on the loop count. We don't want the loop to go on forever. If every pause were hit on every loop, this check would stop the loop after about two minutes of runtime. We would then have to put some code in that would handle this condition. In the example we just return:

loopCT = incrementValue(loopCT);
if (loopCT GT 60){
    return;
}

Lastly, we close the fileRead object. If this isn't done we could end up with a lock on the file that could only be released by stopping and starting ColdFusion.

fileRead.close();

After all this is done we can process the file as the application requires. In this example, I've removed the error checking from the code so we can focus on the file size check and not clutter the code.
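For readers who want to see the fragments in one place, here is one way they might fit together. This is a sketch assembled from the snippets above, not the published Listing 1: the variable stable and the log name "dwGateway" are assumptions added for illustration, and the stream is closed before the function gives up so the timeout path doesn't leave the file handle open.

```cfml
<cffunction name="onAdd" access="public" returntype="void" output="false">
    <cfargument name="CFEvent" type="struct" required="yes">
    <cfset var thisFile = arguments.CFEvent.data.filename>
    <cfset var thisThread = createObject("java", "java.lang.Thread")>
    <cfset var fileRead = "">
    <cfset var fileContent = "">
    <cfset var loopCT = 0>
    <cfset var stable = false>
    <cfset var sizeA = 0>
    <cfset var sizeB = 0>
    <cfset var sizeC = 0>
    <cfscript>
        // Open the stream once; available() is asked for the byte count each pass.
        fileRead = createObject("java", "java.io.FileInputStream").init(thisFile);
        while (loopCT LTE 60) {
            sizeA = fileRead.available();
            thisThread.sleep(1000);
            sizeB = fileRead.available();
            if (sizeA EQ sizeB) {
                // Second look, in case the upload merely paused.
                thisThread.sleep(1000);
                sizeC = fileRead.available();
                if (sizeC EQ sizeB) {
                    stable = true;
                    break;
                }
            }
            loopCT = loopCT + 1;
        }
        // Always release the handle, whether or not the size settled.
        fileRead.close();
    </cfscript>
    <cfif NOT stable>
        <!--- "dwGateway" is an assumed log name. --->
        <cflog file="dwGateway" type="warning"
               text="Gave up waiting for #thisFile# to finish writing.">
        <cfreturn>
    </cfif>
    <cffile action="read" file="#thisFile#" variable="fileContent">
    <!--- Application-specific processing of fileContent goes here. --->
</cffunction>
```

Using a flag and a single exit point after the loop keeps the close() call on every path, which matters given the warning above about stray file locks.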

The example above handles a file that we can read even though it's still in a write state. The other scenario is a little trickier to get around. Here we are trying to get around a file being locked when we try to read it. Even though java.io.FileInputStream does a non-blocking read it still can't get around this hurdle. The only thing to do is to attempt to read the file and trap the read error. It's not very elegant but it gets the job done. The code in Listing 2 works to overcome the lock issue. There are some advanced Java objects you can use to check for locks. However, they do just about the same thing: they try to put a lock on a file to see if it worked. This works on the basis that only one process can have a write-lock on a file.

The flow of this code is just about the same as the code from Listing 1. The difference is that we put a try/catch around the fileRead.init(). This lets us try to read the file without causing the code to fail. If the file is locked when we try to read it, we fall into the catch block and sleep the thread. The code continues to loop until it can read the file or the loop count hits 60.
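The retry described here might look something like the following. It is a sketch of the idea rather than the published Listing 2; the flag opened and the log name "dwGateway" are assumptions introduced for the example.

```cfml
<cffunction name="onAdd" access="public" returntype="void" output="false">
    <cfargument name="CFEvent" type="struct" required="yes">
    <cfset var thisFile = arguments.CFEvent.data.filename>
    <cfset var thisThread = createObject("java", "java.lang.Thread")>
    <cfset var fileRead = "">
    <cfset var fileContent = "">
    <cfset var loopCT = 0>
    <cfset var opened = false>
    <cfscript>
        while (NOT opened AND loopCT LT 60) {
            try {
                // The constructor throws while another process holds the lock.
                fileRead = createObject("java", "java.io.FileInputStream").init(thisFile);
                opened = true;
            } catch (any e) {
                // Still locked: count the attempt, wait a second, try again.
                loopCT = loopCT + 1;
                thisThread.sleep(1000);
            }
        }
        if (opened) {
            // Release our own handle before handing the file to CFFILE.
            fileRead.close();
        }
    </cfscript>
    <cfif NOT opened>
        <!--- "dwGateway" is an assumed log name. --->
        <cflog file="dwGateway" type="warning"
               text="#thisFile# was still locked after #loopCT# attempts.">
        <cfreturn>
    </cfif>
    <cffile action="read" file="#thisFile#" variable="fileContent">
    <!--- Application-specific processing of fileContent goes here. --->
</cffunction>
```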

The code in Listing 3 is the best of both worlds. We combine the file lock check with the file size check. My suggestion would be to always do it this way. You can never be too safe when it comes to running a DW gateway.
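One way to package the combined check is a small helper that every onAdd handler can call. The sketch below is an illustration, not the published Listing 3: the function name waitForFile and its boolean return value are assumptions. Each pass first tries to open the file (the lock check) and, if that succeeds, compares sizes across two one-second pauses (the size check), so 60 passes still top out at roughly two minutes.

```cfml
<cffunction name="waitForFile" access="private" returntype="boolean" output="false">
    <cfargument name="filePath" type="string" required="yes">
    <cfset var thisThread = createObject("java", "java.lang.Thread")>
    <cfset var fileRead = "">
    <cfset var loopCT = 0>
    <cfset var opened = false>
    <cfset var ready = false>
    <cfset var sizeA = 0>
    <cfset var sizeB = 0>
    <cfscript>
        while (NOT ready AND loopCT LT 60) {
            loopCT = loopCT + 1;
            opened = false;
            try {
                // Lock check: opening fails while the file is write-locked.
                fileRead = createObject("java", "java.io.FileInputStream").init(arguments.filePath);
                opened = true;
            } catch (any e) {
                thisThread.sleep(1000);
            }
            if (opened) {
                // Size check: the count must hold steady across two pauses.
                sizeA = fileRead.available();
                thisThread.sleep(1000);
                sizeB = fileRead.available();
                if (sizeA EQ sizeB) {
                    thisThread.sleep(1000);
                    ready = (fileRead.available() EQ sizeB);
                }
                // Release the handle on every pass.
                fileRead.close();
            }
        }
    </cfscript>
    <cfreturn ready>
</cffunction>
```

Inside onAdd the call would then be a single test, for example `<cfif waitForFile(thisFile)>` around the CFFILE read, with the CFELSE branch logging the timeout.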

Now that we know how to check for file-write completion and overcome file blocking, we can start to get creative. We can create a process that can handle multiple files as one batch. This is done by using CFDIRECTORY to check file counts in the directory. We can then cause the current DW gateway event to quit if the directory count increases. The next gateway event will pick up the new file. Each gateway event does the same check. When the file counts no longer increase, the last gateway run processes all the files in the directory. The code and detailed explanation are too lengthy for this article, but the full code, with documentation, will be available for download.
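Until you have the download in hand, the outline of the batch check can be sketched as follows. This is not the author's code: the five-second pause, the variable names, and the assumption that processed files are moved out of the watched directory are all choices made for the illustration.

```cfml
<cffunction name="onAdd" access="public" returntype="void" output="false">
    <cfargument name="CFEvent" type="struct" required="yes">
    <cfset var watchDir = getDirectoryFromPath(arguments.CFEvent.data.filename)>
    <cfset var listBefore = "">
    <cfset var listAfter = "">
    <!--- Count the directory entries, pause, then count again. --->
    <cfdirectory action="list" directory="#watchDir#" name="listBefore">
    <cfset createObject("java", "java.lang.Thread").sleep(5000)>
    <cfdirectory action="list" directory="#watchDir#" name="listAfter">
    <cfif listAfter.recordCount GT listBefore.recordCount>
        <!--- More files are still arriving; quit and let the event
              fired for a later file handle the whole batch. --->
        <cfreturn>
    </cfif>
    <!--- The count has settled, so this event owns the batch. --->
    <cfloop query="listAfter">
        <cfif listAfter.type EQ "File">
            <!--- Process #listAfter.directory#/#listAfter.name# here, then
                  move or delete it so later batches don't see it again
                  (an assumption of this sketch). --->
        </cfif>
    </cfloop>
</cffunction>
```

Each file in the batch should still pass the write-completion and lock checks described earlier before it is read.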


More Stories By Jeff Peters

Jeff Peters works for Open Source Data Integration Software company XAware.

More Stories By Dave Ferguson

Dave Ferguson is a system architect and principal programmer. He has been doing website design and development for over 10 years. He is also a Certified Advanced ColdFusion Developer. You can read his blog at http://dfoncf.blogspot.com
