ColdFusion Feature — Directory Watcher Dangers - A Follow-Up

A roadmap from the trenches

In the July 2006 issue of CFDJ, I wrote about the Directory Watcher event gateway, and how easy it was to set up and how powerful a tool it could be for managing files and external interfaces. While this is true, there are some potential hazards waiting for the unsuspecting developer who jumps into DW waters without a life preserver.

Fortunately, Dave Ferguson has used this particular gateway extensively in his work, and offered to share some of his experiences and solutions with us. Never one to turn down a good follow-up article, I immediately agreed to an education courtesy of Dave, and here it is. - Jeff Peters

When I read Jeff's article, I was interested because I use the DW gateway extensively and I was struck by memories of the double-edged nature of the DW gateway in action. Glory does not come cheap.

Imagine that our hero needs to build a DW gateway to monitor an FTP directory. He writes all the code, does all the tests, then puts the code into the production environment. When it's all done he marvels at his accomplishment, basking in the early glow of success. Then the phone rings - it's Murphy. The new DW gateway system is failing. Files aren't being processed, or only parts of files are being processed. Our hero checks the log files and to his horror they're filled with file-read errors from the gateway.

That beautiful, streamlined, straightforward process, written in accordance with all best practices, has gone to hell. Shaken, our hero begins banging his head against his desk and staring at the screen, wishing it worked the way it's supposed to. Everything is there, the gateway is firing when a file is written, but it's not processing files. Oh, yes, and to make matters worse, the problem only happens about 30% of the time.

I know all this because I was that guy. This exact scenario happened to me. I went from hero to zero in about 2.3 seconds. About three weeks of coding turned into two weeks of pain. It took me roughly five days to figure out what was going wrong. Then another week of trying to solve it, and when I reached the point of scrapping the whole thing, a light went off in my head. I had one of those moments of clarity where the solution seemed so clear. Now all I had to do was create a test and see if my bright idea would work. Or was it just a flash of light?

My pain and anguish is your reward. I'm going to walk through the problems with the DW gateway and provide some tools to overcome them. I tried to make the code as lean and mean as possible for performance reasons, and the principles can be applied to your DW gateway with minimal effort. For purposes of this article I'm going to assume that the reader has a general understanding of the DW gateway. Details on setting up a DW gateway and how it operates are available in the article entitled "DirectoryWatcher Event Gateway: Ditching the Scheduler" in the July '06 issue of CFDJ.

I was using DW to watch a directory for any new files uploaded via FTP. I needed to process each new file after it uploaded. The root cause of the scenario I experienced was, in my opinion, a flaw in the DW gateway that causes the gateway onAdd event to fire before the file is finished writing.

A second problem happens when a file is locked for writing and usually occurs when a DW gateway is watching a network shared directory.

Both of these problems are fatal for the DW gateway, but it's possible to get around them and process the file. (Note: The failures I encountered had almost nothing to do with the size of the file being written; it was all about the timing. However, the bigger the file, the more apt we are to run into problems.)

A third issue is that it's hard to access debugging information from the DW gateway. It's not just an issue with the DW gateway but any gateway or other call that doesn't output to a browser. This issue is fixable too, but it takes a bit of effort on the part of the developer. The important thing to remember is that any piece of code can fail at any given time. With the DW gateway there's no user to get an error message, so we need to send those messages to someplace useful.
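One way to get those messages somewhere useful is to wrap the listener body in CFTRY and write failures to a log file with CFLOG. The following is a minimal sketch, not the author's code: the log name dwGateway and the processFile function are hypothetical placeholders for whatever the application really uses.

```cfml
<cffunction name="onAdd" access="public" returntype="void" output="false">
    <cfargument name="CFEvent" type="struct" required="yes">
    <cfset var thisFile = arguments.CFEvent.data.filename>
    <cftry>
        <!--- processFile is a hypothetical stand-in for the real work --->
        <cfset processFile(thisFile)>
        <cfcatch type="any">
            <!--- Writes an entry to dwGateway.log in the ColdFusion logs directory --->
            <cflog file="dwGateway" type="error"
                   text="onAdd failed for #thisFile#: #cfcatch.message# #cfcatch.detail#">
        </cfcatch>
    </cftry>
</cffunction>
```

Because no browser ever sees the error, the log entry (or an e-mail sent from the same catch block) is the only record that something went wrong.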

Before we can try to fix these issues we have to understand their nature and how CF reacts to each one. As I said, file size has almost nothing to do with the problem. It's all about timing: when the gateway fires and when the file stops writing. Unfortunately, the gateway doesn't have a built-in way to detect that a file write is complete.

Let's take a closer look at the first issue. Since the server starts writing the file to the drive as soon as it's received, the gateway can pick up the file as soon as a single byte is written. While this in itself isn't a problem, it can become a problem for the gateway if the gateway kicks off a process that needs the entire file. The gateway itself doesn't fail, but if we used CFFILE to read the file content we wouldn't get the whole file. Only that part of the file that was uploaded when you tried to read it would be returned. So we need a way to have our process continue only if the whole file has been uploaded.

The second issue is a file that's write-locked, which normally happens when writing a file to a shared directory or a UNC path using a Windows server. If we write a 5MB file to a shared directory Windows instantly creates an empty 5MB file as a placeholder. It then copies the file to the destination. During this operation there's an exclusive write-lock on the file that prevents anything from trying to read the file until the write is complete. Unlike the first issue, if we try to use CFFILE to read the file while it's locked, the gateway crashes. So we also need a way to avoid reading locked files.

While both these issues seem pretty extreme they're fairly easy to overcome. We can even couple the fixes for each scenario together for a more robust solution. When I first wrote the fixes they were independent but over time I've combined them and put them in every DW gateway I create. Let's take a look at them.

We can avoid reading a file that's still being written when the onAdd event fires by putting the CF thread to sleep and checking the file's size repeatedly until it stops changing. Of course this can tax server performance and I have experimented with other solutions that work, but this is the most straightforward approach.

Note: All the code examples in the article have been stripped to a minimum to save space. The complete code is available online.

Here's a normal onAdd event inside a gateway cfc file:

<CFFUNCTION NAME="onAdd" ACCESS="public" RETURNTYPE="void">
<CFARGUMENT NAME="CFEvent" TYPE="struct" REQUIRED="yes">
<CFSET VAR thisFile = arguments.CFEvent.data.filename>
<CFFILE ACTION="READ" FILE="#thisFile#" VARIABLE="fileContent">
</CFFUNCTION>

At the third line we get the file name and path then on the fourth we read the file. If the file write wasn't complete, the file read would get a partial file.

Take a look at the code in Listing 1. There we attempt to overcome the issue by checking the size of the file. This is a delicate operation because we don't want to disturb the write of the file. To do this we'll leverage the java.io.FileInputStream. Let's walk through the code and examine what's going on.

First we create a Java object to read the file system:

fileRead = createObject("java","java.io.FileInputStream");

This will enable us to check the file size. This can be done with any of several Java file objects, or we could do it with CFDIRECTORY. However, this way lets us do a non-blocking read on the file. We can use it to get a byte count that can be read from the file without causing a block. We want to do our best to make sure that we don't disrupt anything that's happening to the file while it's still uploading.

Next we create a Java object that lets us manipulate the current thread. Now we can pause the code:

thisThread = CreateObject("java", "java.lang.Thread");

Initialize the Java file object with the file that triggered the onAdd event:

fileRead.init(thisFile);

Next we set a variable loopCT to count the loops and control a while loop that, if gone unchecked, could run infinitely. Then we get the current size of the file and pause the thread for a second:

sizeA = fileRead.available();
thisThread.sleep(1000);

After the one-second pause we check the file size again and do a comparison. If the file is still being written the sizes won't match. If the file sizes match, we do one more check, just in case there's an FTP pause or some other delay that caused a pause in the file writing:

sizeB = fileRead.available();
if (sizeA EQ sizeB){
    thisThread.sleep(1000);
    sizeC = fileRead.available();
    if (sizeC EQ sizeB){
        break;
    }
}

If the file sizes still match then the file is finished writing. If the sizes don't match then we continue looping and start the process all over again. Keep an eye on the loop count. We don't want the loop to go on forever. If every pause was hit on every loop then this check would stop the loop after about two minutes of runtime. We would then have to put some code in that would handle this condition. In the example we just return:

// incrementValue() returns the new value without changing loopCT, so assign the result
loopCT = loopCT + 1;
if (loopCT GT 60){
    return;
}

Lastly, we close the fileRead object. If this isn't done we could end up with a lock on the file that could only be released by stopping and starting ColdFusion.

fileRead.close();

After all this is done we can process the file as the application requires. In this example, I've removed the error checking from the code so we can focus on the file size check and not clutter the code.
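Put together, the fragments above might look something like the following. This is a sketch assembled from the steps described in the text, not a copy of Listing 1: the function name waitForStableSize and the isStable flag are assumptions added for illustration, and error handling is left out as it is in the article's examples.

```cfml
<cffunction name="waitForStableSize" access="private" returntype="boolean" output="false">
    <cfargument name="thisFile" type="string" required="yes">
    <cfset var thisThread = createObject("java", "java.lang.Thread")>
    <cfset var fileRead = "">
    <cfset var loopCT = 0>
    <cfset var sizeA = 0>
    <cfset var sizeB = 0>
    <cfset var sizeC = 0>
    <cfset var isStable = false>
    <cfscript>
        // Open the file once; available() then reports how many bytes can be read without blocking
        fileRead = createObject("java", "java.io.FileInputStream").init(arguments.thisFile);
        while (loopCT LT 60) {
            sizeA = fileRead.available();
            thisThread.sleep(1000);
            sizeB = fileRead.available();
            if (sizeA EQ sizeB) {
                // Sizes match: wait once more in case the upload only paused
                thisThread.sleep(1000);
                sizeC = fileRead.available();
                if (sizeC EQ sizeB) {
                    isStable = true;
                    break;
                }
            }
            loopCT = loopCT + 1;
        }
        // Always release the handle so ColdFusion doesn't keep the file open
        fileRead.close();
    </cfscript>
    <cfreturn isStable>
</cffunction>
```

The caller would read the file with CFFILE only when the function returns true. Note that available() returns a Java int, so for files larger than about 2GB the reported count stops growing before the upload does.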

The example above handles a file that we can read even though it's still in a write state. The other scenario is a little trickier to get around. Here we're trying to get around a file being locked when we try to read it. Even though java.io.FileInputStream does a non-blocking read it still can't get around this hurdle. The only thing to do is to attempt to read the file and trap the read error. It's not very elegant but it gets the job done. The code in Listing 2 works to overcome the lock issue. There are some advanced Java objects you can use to check for locks. However, they do just about the same thing. They try to put a lock on a file to see if it works. This relies on the fact that only one process can have a write-lock on a file.

The flow of this code is just about the same as the code from Listing 1. The difference is that we put a try/catch around the fileRead.init(). This lets us try to read the file without causing the code to fail. If the file is locked when we try to read it, we fall into the catch block and sleep the thread. The code continues to loop until it can read the file or the loop count hits 60.
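The retry loop described here can be sketched roughly as follows. It is an illustration based on the description, not the article's Listing 2; the function name waitForUnlock and the isUnlocked flag are hypothetical.

```cfml
<cffunction name="waitForUnlock" access="private" returntype="boolean" output="false">
    <cfargument name="thisFile" type="string" required="yes">
    <cfset var thisThread = createObject("java", "java.lang.Thread")>
    <cfset var fileRead = "">
    <cfset var loopCT = 0>
    <cfset var isUnlocked = false>
    <cfscript>
        while (NOT isUnlocked AND loopCT LT 60) {
            try {
                // init() throws if the file can't be opened, for example while it is write-locked
                fileRead = createObject("java", "java.io.FileInputStream").init(arguments.thisFile);
                fileRead.close();
                isUnlocked = true;
            } catch (any e) {
                // Still locked: pause for a second and try again
                thisThread.sleep(1000);
                loopCT = loopCT + 1;
            }
        }
    </cfscript>
    <cfreturn isUnlocked>
</cffunction>
```

One caveat with trapping every exception this way: a file that has been deleted or renamed also fails to open, so it would be retried for the full minute before the function gives up and returns false.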

The code in Listing 3 is the best of both worlds. We combine the file lock check with the file size check. My suggestion would be to always do it this way. You can never be too safe when it comes to running a DW gateway.
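As a rough idea of what the combined check could look like inside a listener, here is a sketch that first retries the open until the lock clears and then waits for the size to settle. It is not the article's Listing 3; the flag names and the dwGateway log name are assumptions made for the example.

```cfml
<cffunction name="onAdd" access="public" returntype="void" output="false">
    <cfargument name="CFEvent" type="struct" required="yes">
    <cfset var thisFile = arguments.CFEvent.data.filename>
    <cfset var thisThread = createObject("java", "java.lang.Thread")>
    <cfset var fileRead = "">
    <cfset var fileContent = "">
    <cfset var isOpen = false>
    <cfset var isStable = false>
    <cfset var loopCT = 0>
    <cfset var sizeA = 0>
    <cfset var sizeB = 0>
    <cfscript>
        // Step 1: keep trying to open the file until the write-lock is released
        while (NOT isOpen AND loopCT LT 60) {
            try {
                fileRead = createObject("java", "java.io.FileInputStream").init(thisFile);
                isOpen = true;
            } catch (any e) {
                thisThread.sleep(1000);
                loopCT = loopCT + 1;
            }
        }
        // Step 2: with the file open, wait for the byte count to stop changing
        if (isOpen) {
            loopCT = 0;
            while (loopCT LT 60) {
                sizeA = fileRead.available();
                thisThread.sleep(1000);
                sizeB = fileRead.available();
                if (sizeA EQ sizeB) {
                    thisThread.sleep(1000);
                    if (fileRead.available() EQ sizeB) {
                        isStable = true;
                        break;
                    }
                }
                loopCT = loopCT + 1;
            }
            fileRead.close();
        }
    </cfscript>
    <cfif isStable>
        <cffile action="read" file="#thisFile#" variable="fileContent">
        <!--- process fileContent as the application requires --->
    <cfelse>
        <cflog file="dwGateway" type="warning" text="Gave up waiting for #thisFile#">
    </cfif>
</cffunction>
```

Keeping both checks in one function means the file is read only after it has been opened successfully and its size has held steady across the sampling pauses.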

Now that we know how to check for file-write completion and overcome file blocking, we can start to get creative. We can create a process that can handle multiple files as one batch. This is done by using CFDIRECTORY to check file counts in the directory. We can then cause the current DW gateway event to quit if the directory count increases. The next gateway event will pick up the new file. Each gateway event does the same check. When the file counts no longer increase then the last gateway run processes all the files in the directory. The code and detailed explanation are too lengthy for this article, but the full code, with documentation, will be available for download.
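The downloadable code isn't reproduced here, but the batching idea can be illustrated with a simplified sketch. Everything specific in it is an assumption: the five-second pause, the variable names, and the processFile function, which is a hypothetical routine expected to move or delete each file once it's handled so that later events don't pick it up again.

```cfml
<cffunction name="onAdd" access="public" returntype="void" output="false">
    <cfargument name="CFEvent" type="struct" required="yes">
    <!--- getDirectoryFromPath returns the directory with a trailing slash --->
    <cfset var watchDir = getDirectoryFromPath(arguments.CFEvent.data.filename)>
    <cfset var thisThread = createObject("java", "java.lang.Thread")>
    <cfset var listBefore = "">
    <cfset var listAfter = "">
    <!--- Count the entries, pause, then count again --->
    <cfdirectory action="list" directory="#watchDir#" name="listBefore">
    <cfset thisThread.sleep(5000)>
    <cfdirectory action="list" directory="#watchDir#" name="listAfter">
    <cfif listAfter.recordCount LTE listBefore.recordCount>
        <!--- No new arrivals during the pause: treat this event as the last one and process the batch --->
        <cfloop query="listAfter">
            <cfif listAfter.type EQ "File">
                <cfset processFile(watchDir & listAfter.name)>
            </cfif>
        </cfloop>
    </cfif>
    <!--- Otherwise more files are still arriving, so this event quits and a later one takes over --->
</cffunction>
```

This simplified version doesn't guard against two events finishing their pause at the same moment, and it would still need the lock and size checks before each file is read.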


More Stories By Jeff Peters

Jeff Peters works for Open Source Data Integration Software company XAware.

More Stories By Dave Ferguson

Dave Ferguson is a system architect and principal programmer. He has been doing website design and development for over 10 years. He is also a Certified Advanced ColdFusion Developer. You can read his blog at http://dfoncf.blogspot.com
