By Jeff Peters, Dave Ferguson
January 11, 2007 12:30 AM EST
In the July 2006 issue of CFDJ, I wrote about the Directory Watcher event gateway, and how easy it was to set up and how powerful a tool it could be for managing files and external interfaces. While this is true, there are some potential hazards waiting for the unsuspecting developer who jumps into DW waters without a life preserver.
Fortunately, Dave Ferguson has used this particular gateway extensively in his work, and offered to share some of his experiences and solutions with us. Never one to turn down a good follow-up article, I immediately agreed to an education courtesy of Dave, and here it is. - Jeff Peters
When I read Jeff's article, I was interested because I use the DW gateway extensively and I was struck by memories of the double-edged nature of the DW gateway in action. Glory does not come cheap.
Imagine that our hero needs to build a DW gateway to monitor an FTP directory. He writes all the code, does all the tests, then puts the code into the production environment. When it's all done he marvels at his accomplishment, basking in the early glow of success. Then the phone rings - it's Murphy. The new DW gateway system is failing. Files aren't being processed, or only parts of files are being processed. Our hero checks the log files and to his horror they're filled with file-read errors from the gateway.
That beautiful, streamlined, straightforward process, written in accordance with all best practices, has gone to hell. Shaken, our hero begins banging his head against his desk and staring at the screen, wishing it worked the way it's supposed to. Everything is there, the gateway is firing when a file is written, but it's not processing files. Oh, yes, and to make matters worse, the problem only happens about 30% of the time.
I know all this because I was that guy. This exact scenario happened to me. I went from hero to zero in about 2.3 seconds. About three weeks of coding turned into two weeks of pain. It took me roughly five days to figure out what was going wrong. Then came another week of trying to solve it, and when I reached the point of scrapping the whole thing, a light went on in my head. I had one of those moments of clarity where the solution seemed so clear. Now all I had to do was create a test and see if my bright idea would work. Or was it just a flash of light?
My pain and anguish is your reward. I'm going to walk through the problems with the DW gateway and provide some tools to overcome them. I tried to make the code as lean and mean as possible for performance reasons, and the principles can be applied to your DW gateway with minimal effort. For purposes of this article I'm going to assume that the reader has a general understanding of the DW gateway. Details on setting up a DW gateway and how it operates are available in the article entitled "DirectoryWatcher Event Gateway: Ditching the Scheduler" in the July '06 issue of CFDJ.
I was using DW to watch a directory for any new files uploaded via FTP. I needed to process each new file after it finished uploading. The root cause of the scenario I experienced was, in my opinion, a flaw in the DW gateway that causes the gateway onAdd event to fire before the file is finished writing.
A second problem happens when a file is locked for writing and usually occurs when a DW gateway is watching a network shared directory.
Both of these problems are fatal for the DW gateway, but it's possible to get around them and process the file. (Note: The failures I encountered had almost nothing to do with the size of the file being written; it was all about the timing. However, the bigger the file, the more apt we are to run into problems.)
A third issue is that it's hard to access debugging information from the DW gateway. It's not just an issue with the DW gateway but any gateway or other call that doesn't output to a browser. This issue is fixable too, but it takes a bit of effort on the part of the developer. The important thing to remember is that any piece of code can fail at any given time. With the DW gateway there's no user to get an error message, so we need to send those messages to someplace useful.
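One way to send those messages somewhere useful is to wrap the work in CFTRY and write failures to a log file with CFLOG. The following is a minimal sketch, not the authors' code: the log name dwGateway is a hypothetical choice, and the processing step is left as a placeholder.

```cfml
<CFFUNCTION NAME="onAdd" ACCESS="public" RETURNTYPE="void" OUTPUT="no">
    <CFARGUMENT NAME="CFEvent" TYPE="struct" REQUIRED="yes">
    <CFSET var thisFile = arguments.CFEvent.data.filename>
    <CFSET var fileContent = "">
    <CFTRY>
        <CFFILE ACTION="READ" FILE="#thisFile#" VARIABLE="fileContent">
        <!--- application-specific processing of fileContent goes here --->
        <CFCATCH TYPE="any">
            <!--- "dwGateway" is a hypothetical log name; the entry lands in dwGateway.log --->
            <CFLOG FILE="dwGateway" TYPE="error"
                   TEXT="onAdd failed for #thisFile#: #cfcatch.message# #cfcatch.detail#">
        </CFCATCH>
    </CFTRY>
</CFFUNCTION>
```

Because nothing is rendered to a browser, the log file becomes the place to look when a file silently fails to process. Sending an e-mail from the catch block would be another option.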
Before we can try to fix these issues we have to understand their nature and how CF reacts to each one. As I said, file size has almost nothing to do with the problem; it's all about timing: when the gateway fires and when the file stops writing. Unfortunately, the gateway doesn't have a built-in way to detect that a file write is complete.
Let's take a closer look at the first issue. Since the server starts writing the file to the drive as soon as it's received, the gateway can pick up the file as soon as a single byte is written. While this in itself isn't a problem, it can become a problem for the gateway if the gateway kicks off a process that needs the entire file. The gateway itself doesn't fail, but if we used CFFILE to read the file content we wouldn't get the whole file. Only that part of the file that was uploaded when you tried to read it would be returned. So we need a way to have our process continue only if the whole file has been uploaded.
The second issue is a file that's write-locked, which normally happens when writing a file to a shared directory or a UNC path using a Windows server. If we write a 5MB file to a shared directory Windows instantly creates an empty 5MB file as a placeholder. It then copies the file to the destination. During this operation there's an exclusive write-lock on the file that prevents anything from trying to read the file until the write is complete. Unlike the first issue, if we try to use CFFILE to read the file while it's locked, the gateway crashes. So we also need a way to avoid reading locked files.
While both these issues seem pretty extreme they're fairly easy to overcome. We can even couple the fixes for each scenario together for a more robust solution. When I first wrote the fixes they were independent but over time I've combined them and put them in every DW gateway I create. Let's take a look at them.
We can avoid reading a file that's still being written when the onAdd event fires by putting the CF thread to sleep and checking the file's size repeatedly until it stops changing. Of course this can tax server performance and I have experimented with other solutions that work, but this is the most straightforward approach.
Note: All the code examples in the article have been stripped to a minimum to save space. The complete code is available online.
Here's a normal onAdd event inside a gateway cfc file:
<CFFUNCTION NAME="onAdd" ACCESS="public" RETURNTYPE="string">
<CFARGUMENT NAME="CFEvent" TYPE="struct" REQUIRED="yes">
<CFSET thisFile = cfevent.data.filename>
<CFFILE ACTION="READ" FILE="#thisFile#" VARIABLE="fileContent">
</CFFUNCTION>
At the third line we get the file name and path then on the fourth we read the file. If the file write wasn't complete, the file read would get a partial file.
Take a look at the code in Listing 1. There we attempt to overcome the issue by checking the size of the file. This is a delicate operation because we don't want to disturb the write of the file. To do this we'll leverage the java.io.FileInputStream. Let's walk through the code and examine what's going on.
First we create a Java object to read the file system:
fileRead = createObject("java","java.io.FileInputStream");
This will enable us to check the file size. This can be done with any of several Java file objects, or we could do it with CFDIRECTORY. However, this way lets us do a non-blocking read on the file. We can use it to get a byte count that can be read from the file without causing a block. We want to do our best to make sure that we don't disrupt anything that's happening to the file while it's still uploading.
Next we create a Java object that lets us manipulate the current thread. Now we can pause the code:
thisThread = CreateObject("java", "java.lang.Thread");
Initialize the Java file object with the file that triggered the onAdd event:
fileRead.init(thisFile);
Next we set a variable loopCT to count the loops and control a while loop that, if gone unchecked, could run infinitely. Then we get the current size of the file and pause the thread for a second:
sizeA = fileRead.available();
thisThread.sleep(1000);
After the one-second pause we check the file size again and do a comparison. If the file is still being written the sizes won't match. If the file sizes match, we do one more check, just in case there's an FTP pause or some other delay that caused a pause in the file writing:
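// second measurement, taken after the one-second pause
sizeB = fileRead.available();
if (sizeA EQ sizeB){
    // sizes match: wait once more in case the transfer only paused
    thisThread.sleep(1000);
    sizeC = fileRead.available();
    if (sizeC EQ sizeB){
        break;
    }
}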
If the file sizes still match then the file is finished writing. If the sizes don't match then we continue looping and start the process all over again. Keep an eye on the loop count. We don't want the loop to go on forever. If every pause is hit on every loop, this check stops the loop after about two minutes of runtime (60 loops of up to two seconds each). We would then have to put some code in that would handle this condition. In the example we just return:
// incrementValue() returns the new value; it does not change loopCT in place
loopCT = incrementValue(loopCT);
if (loopCT GT 60){
    return;
}
Lastly, we close the fileRead object. If this isn't done we could end up with a lock on the file that could only be released by stopping and starting ColdFusion.
fileRead.close();
After all this is done we can process the file as the application requires. In this example, I've removed the error checking from the code so we can focus on the file size check and not clutter the code.
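Pulling the fragments above together, the size check might look like the following. This is a minimal sketch and not the article's Listing 1: the function name waitForStableSize and the maxLoops argument are hypothetical, and error handling is left out, so a locked or missing file would still throw at init().

```cfml
<CFFUNCTION NAME="waitForStableSize" ACCESS="private" RETURNTYPE="boolean" OUTPUT="no">
    <CFARGUMENT NAME="filePath" TYPE="string" REQUIRED="yes">
    <CFARGUMENT NAME="maxLoops" TYPE="numeric" REQUIRED="no" DEFAULT="60">
    <CFSET var fileRead = createObject("java", "java.io.FileInputStream")>
    <CFSET var thisThread = createObject("java", "java.lang.Thread")>
    <CFSET var loopCT = 0>
    <CFSET var sizeA = 0>
    <CFSET var sizeB = 0>
    <CFSET var sizeC = 0>
    <CFSET var isStable = false>
    <CFSCRIPT>
        // open the stream on the file that triggered the event
        fileRead.init(arguments.filePath);
        while (loopCT LT arguments.maxLoops) {
            sizeA = fileRead.available();
            thisThread.sleep(1000);
            sizeB = fileRead.available();
            if (sizeA EQ sizeB) {
                // sizes match: check once more in case the transfer only paused
                thisThread.sleep(1000);
                sizeC = fileRead.available();
                if (sizeC EQ sizeB) {
                    isStable = true;
                    break;
                }
            }
            loopCT = loopCT + 1;
        }
        // always release the stream so the file isn't left locked
        fileRead.close();
    </CFSCRIPT>
    <CFRETURN isStable>
</CFFUNCTION>
```

An onAdd handler could call waitForStableSize(thisFile) and go on to the CFFILE read only when it returns true. A false result means the size was still changing when the loop limit was reached.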
The example above handles a file that we can read even though it's still in a write state. The other scenario is a little trickier to get around. Here we are trying to get around a file being locked when we try to read it. Even though java.io.FileInputStream does a non-blocking read it still can't get around this hurdle. The only thing to do is to attempt to read the file and trap the read error. It's not very elegant but it gets the job done. The code in Listing 2 works to overcome the lock issue. There are some advanced Java objects you can use to check for locks. However, they do just about the same thing: they try to put a lock on a file to see if it worked. This works on the basis that only one process can have a write-lock on a file.
The flow of this code is just about the same as the code from Listing 1. The difference is that we put a try/catch around the fileRead.init(). This lets us try to read the file without causing the code to fail. If the file is locked when we try to read it, we fall into the catch block and sleep the thread. The code continues to loop until it can read the file or the loop count hits 60.
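The try/catch approach described above could be sketched as follows. This is an illustration under stated assumptions, not the article's Listing 2: the function name waitForUnlock and the maxLoops argument are hypothetical, and any exception from init() is treated as "still locked", which would also cover a file that has been deleted.

```cfml
<CFFUNCTION NAME="waitForUnlock" ACCESS="private" RETURNTYPE="boolean" OUTPUT="no">
    <CFARGUMENT NAME="filePath" TYPE="string" REQUIRED="yes">
    <CFARGUMENT NAME="maxLoops" TYPE="numeric" REQUIRED="no" DEFAULT="60">
    <CFSET var fileRead = "">
    <CFSET var thisThread = createObject("java", "java.lang.Thread")>
    <CFSET var loopCT = 0>
    <CFSET var isReadable = false>
    <CFSCRIPT>
        while (NOT isReadable AND loopCT LT arguments.maxLoops) {
            try {
                // init() throws while another process holds an exclusive lock
                fileRead = createObject("java", "java.io.FileInputStream");
                fileRead.init(arguments.filePath);
                // the open succeeded, so release the stream right away
                fileRead.close();
                isReadable = true;
            } catch (Any e) {
                // still locked: wait a second and try again
                thisThread.sleep(1000);
                loopCT = loopCT + 1;
            }
        }
    </CFSCRIPT>
    <CFRETURN isReadable>
</CFFUNCTION>
```

A fresh FileInputStream object is created on every pass so that a failed init() never leaves a half-initialized object behind.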
The code in Listing 3 is the best of both worlds. We combine the file lock check with the file size check. My suggestion would be to always do it this way. You can never be too safe when it comes to running a DW gateway.
Now that we know how to check for file-write completion and overcome file blocking, we can start to get creative. We can create a process that can handle multiple files as one batch. This is done by using CFDIRECTORY to check file counts in the directory. We can then cause the current DW gateway event to quit if the directory count increases. The next gateway event will pick up the new file. Each gateway event does the same check. When the file counts no longer increase then the last gateway run processes all the files in the directory. The code and detailed explanation are too lengthy for this article, but the full code, with documentation, will be available for download.
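The directory-count idea can be illustrated with a small helper. This is only a sketch of the principle, not the downloadable code: the name isLastEventInBatch and the two-second pause are hypothetical, and it assumes that processed files are moved or deleted so that the count only grows while uploads are arriving.

```cfml
<CFFUNCTION NAME="isLastEventInBatch" ACCESS="private" RETURNTYPE="boolean" OUTPUT="no">
    <CFARGUMENT NAME="watchDir" TYPE="string" REQUIRED="yes">
    <CFSET var thisThread = createObject("java", "java.lang.Thread")>
    <CFSET var listBefore = "">
    <CFSET var listAfter = "">
    <!--- count the entries now, pause, then count again --->
    <CFDIRECTORY ACTION="list" DIRECTORY="#arguments.watchDir#" NAME="listBefore">
    <CFSET thisThread.sleep(2000)>
    <CFDIRECTORY ACTION="list" DIRECTORY="#arguments.watchDir#" NAME="listAfter">
    <!--- if the count grew, a later gateway event will take over the batch --->
    <CFRETURN listAfter.recordCount LTE listBefore.recordCount>
</CFFUNCTION>
```

An onAdd handler would simply return when this helper reports false, and process every file in the directory when it reports true.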
Copyright © 2007 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
Jeff Peters works for Open Source Data Integration Software company XAware.
Dave Ferguson is a system architect and principal programmer. He has been doing website design and development for over 10 years. He is also a Certified Advanced ColdFusion Developer. You can read his blog at http://dfoncf.blogspot.com