By Christian Thompson
June 16, 2003 12:00 AM EDT
I read somewhere that when faced with a task that takes one hour to do manually, or one hour to automate, a good programmer will choose to automate the process. As ColdFusion developers, we often face this decision when we need to programmatically use data contained in a text file.
There are two ways to access this data - automating the process by parsing the text file or manually inputting the data. This article presents some basic techniques that can help you make the choice to be a "good programmer" by automating the process.
Bulk upload and data import are the two primary types of functionality in which text file parsing is used in Web application development. The main difference between these two is the format of the text file being parsed. Bulk upload functionality parses text files designed specifically to be processed programmatically, while data import functionality tries to parse files created for human consumption. To handle these differences, slightly different techniques are required.
Bulk Upload
Figure 1 shows an example of a file created for bulk upload. It contains a listing of new employees who need to be added to an employee database. Because this file was created specifically for computer processing, it was created using a table structure. The first row contains the column names, and each subsequent row contains one employee record with name, phone, and e-mail address columns.
The table structure makes parsing this file easy with either the cfloop tag or the cfhttp tag. To use the cfloop tag, first read the file into a variable (str_Content in the example below) using the cffile tag, and then loop over the content using the line delimiter (carriage return/line feed - chr(13)chr(10) - on Microsoft Windows and line feed - chr(10) - on Unix). Because ColdFusion treats each character in the delimiters attribute as a separate delimiter, specifying both characters handles either line-ending style. Use the listGetAt() function to access specific fields in each line.
<cfset bln_FirstLine = true>
<cfloop list="#str_Content#"
        delimiters="#chr(13)##chr(10)#"
        index="str_Line">
    <!--- Ignore the column name line --->
    <cfif bln_FirstLine is false>
        <cfset str_Name = listGetAt(str_Line, 1, ",")>
        <cfset str_Phone = listGetAt(str_Line, 2, ",")>
        <cfset str_Email = listGetAt(str_Line, 3, ",")>
    <cfelse>
        <cfset bln_FirstLine = false>
    </cfif>
</cfloop>
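The loop above assumes str_Content already holds the file's text. A minimal sketch of that cffile step follows; the file name, its location next to the calling template, and the variable str_FilePath are assumptions made for illustration.

```cfml
<!--- Assumption: BulkUpload.txt sits in the same directory as this template --->
<cfset str_FilePath = expandPath("./BulkUpload.txt")>

<!--- Read the whole file into str_Content for the cfloop to iterate over --->
<cffile action="read"
        file="#str_FilePath#"
        variable="str_Content">
```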
For rigidly formatted documents accessible via HTTP, the cfhttp tag makes parsing the document even easier. Given the right parameters, this tag will automatically parse the content and return a query object containing the results. The following example produces the output shown in Figure 2:
<cfhttp method="GET"
        url="#dir_URL#/bulkupdate.txt"
        name="qry_Contents"
        delimiter=","
        textQualifier="">
</cfhttp>
<cfdump var="#qry_Contents#">
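Once cfhttp has built the query, each row can be read like any other query result. The sketch below loops over qry_Contents without assuming specific column names, since those come from the first row of the file; str_Column is a name introduced here for illustration.

```cfml
<cfoutput>
<cfloop query="qry_Contents">
    <!--- currentRow is the 1-based row number maintained by the query loop --->
    Row #qry_Contents.currentRow#:
    <cfloop list="#qry_Contents.columnList#" index="str_Column">
        #str_Column# = #qry_Contents[str_Column][qry_Contents.currentRow]#
    </cfloop>
    <br>
</cfloop>
</cfoutput>
```

In a real import, the inner loop would typically be replaced by direct references to the known column names.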
Data Import
Figure 3 shows an example of a file that was not created to be used programmatically. It is an employee directory designed to be viewed by humans. Parsing files like this one is more complicated because each line cannot be processed in the same way. The logic must first determine what information is contained in the current line and then process the line accordingly. One way to do this is to track the current line type with a variable. After each line is processed, you should be able to infer the type of line that will be processed next, based on the current line type, and set the variable accordingly.
For example, when processing the line containing the employee's name, you know the next line will contain the employee's phone number. Therefore, after processing the name line, you set the line type equal to "phone" and loop. On the next loop, the appropriate logic processes the phone line, sets the line type back to "name," and the process repeats.
<!--- The first line contains the employee's name --->
<cfset str_LineType = "name">
<cfloop list="#str_Content#"
        delimiters="#chr(13)##chr(10)#"
        index="str_Line">
    <cfif str_LineType is "name">
        <cfset str_Name = str_Line>
        <cfset str_LineType = "phone">
    <cfelseif str_LineType is "phone">
        <!--- The colon and the space each act as a delimiter, so element 2
              is the text that follows the first run of colons and spaces --->
        <cfset str_Phone = listGetAt(str_Line, 2, ": ")>
        <cfset str_LineType = "name">
    </cfif>
</cfloop>
CFML Parsing Functionality Limitations
As the examples above show, ColdFusion has several powerful built-in functions and tags that you can use for parsing, such as listGetAt(), cfhttp, and cfloop. Unfortunately, these functions and tags share a common limitation: they treat consecutive delimiters as one delimiter. For example, the built-in ColdFusion functions consider the line "Jim Doe,,jim@acompany.com" to have only two tokens ("Jim Doe" and "jim@acompany.com"), even though the strings are separated by two commas.
This was not a problem in the above examples because there were no empty fields in the data being processed. If the data did have empty fields, however, the example code would process it incorrectly. Consider the result of parsing the line "Jim Doe,,jim@acompany.com" with the bulk upload code in the first example. Since the listGetAt() function treats consecutive delimiters as one delimiter, the code would set str_Phone equal to "jim@acompany.com", and the request for a third element would then fail because the list contains only two. Obviously, this is incorrect.
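The behavior is easy to confirm with a short test. The snippet below is a minimal sketch that uses only the sample line from the text; it stops at position 2 because the list functions count just two elements in this line.

```cfml
<cfset str_Line = "Jim Doe,,jim@acompany.com">

<cfoutput>
    <!--- The empty phone field is skipped, so only two elements are counted --->
    Elements: #listLen(str_Line, ",")#<br>

    <!--- Position 2 holds the e-mail address, not the (empty) phone number --->
    Second element: #listGetAt(str_Line, 2, ",")#
</cfoutput>
```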
My Solution
I have developed a ColdFusion tag to address this shortcoming: TextParse.cfm (Listing 1). The TextParse tag treats consecutive delimiters as delimiters surrounding an empty string. Therefore, it considers "Jim Doe,,jim@acompany.com" as having three tokens - "Jim Doe", "", and "jim@acompany.com" - rather than two.
The code implementing the TextParse tag is relatively straightforward. From a high level, it simply loops over the delimiters in the content, and extracts the strings that fall between the delimiters. This logic is actually done twice, once in the tag's body and once in the function TokenizeLine(). The tag body breaks the content into separate lines and TokenizeLine() then breaks each line into tokens.
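Listing 1 is not reproduced here, so the following is only a sketch of the general idea described above, not the TextParse implementation itself. The function name tokenizeKeepEmpty and its local variable names are hypothetical; it walks the delimiters in a line and keeps whatever falls between them, including empty strings.

```cfml
<cfscript>
// Hypothetical helper, not the TokenizeLine() function from Listing 1.
// Splits str_Line on str_Delimiter and preserves empty fields.
function tokenizeKeepEmpty(str_Line, str_Delimiter) {
    var ar_Tokens = arrayNew(1);
    var int_Len = len(str_Line);
    var int_Start = 1;
    var int_Pos = 0;

    while (true) {
        // Nothing left after the last delimiter: the final field is empty
        if (int_Start gt int_Len) {
            arrayAppend(ar_Tokens, "");
            break;
        }
        int_Pos = find(str_Delimiter, str_Line, int_Start);
        // No further delimiter: the rest of the line is the last field
        if (int_Pos eq 0) {
            arrayAppend(ar_Tokens, mid(str_Line, int_Start, int_Len - int_Start + 1));
            break;
        }
        if (int_Pos gt int_Start) {
            arrayAppend(ar_Tokens, mid(str_Line, int_Start, int_Pos - int_Start));
        } else {
            // Two delimiters in a row enclose an empty field
            arrayAppend(ar_Tokens, "");
        }
        int_Start = int_Pos + len(str_Delimiter);
    }
    return ar_Tokens;
}
</cfscript>

<cfset ar_Tokens = tokenizeKeepEmpty("Jim Doe,,jim@acompany.com", ",")>
<!--- Outputs 3: "Jim Doe", an empty string, and the e-mail address --->
<cfoutput>#arrayLen(ar_Tokens)#</cfoutput>
```

On ColdFusion 8 and later, listToArray() accepts a third includeEmptyFields argument, so listToArray(str_Line, ",", true) may be enough when a custom tokenizer is not needed.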
Like cfloop, the TextParse tag is used by placing the code to process each line between its start and end tags. The tag takes three parameters: str_Filename, str_LineDelimiter, and str_TokenDelimiter. The "str_Filename" parameter expects the full path of the file to be parsed. "str_LineDelimiter" allows you to specify the delimiter used to separate the lines in the file (by default "#chr(13)##chr(10)#"), and "str_TokenDelimiter" allows you to specify the delimiter used to separate the tokens in each line (by default ","). The TextParse tag returns two variables to the caller scope: TextParse.str_Line and TextParse.ar_Tokens. TextParse.str_Line is a string containing the complete current line, and TextParse.ar_Tokens is an array containing the tokens of the current line.
The following example demonstrates the use of the TextParse tag to parse BulkUpload.txt. Because text files often have inconsistent formats, I wrapped the token array accesses in a cftry/cfcatch statement. Without the error handling, the example code would generate an error when processing a line with fewer than expected fields. Although this seems to complicate things, it's good to know when a line doesn't have the right format, since your parsing logic might otherwise handle it incorrectly. The error handling allows you to flag the line for later examination, and continue processing.
<cf_TextParse str_Filename="c:\BulkUpload.txt">
    <cftry>
        <cfif TextParse.ar_Tokens[1] is not "Name">
            <cfset str_Name = TextParse.ar_Tokens[1]>
            <cfset str_Phone = TextParse.ar_Tokens[2]>
            <cfset str_Email = TextParse.ar_Tokens[3]>
        </cfif>
        <cfcatch>
            <!--- Handle error --->
        </cfcatch>
    </cftry>
</cf_TextParse>
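One way to fill in the "Handle error" step is to collect the offending lines for later review. The sketch below assumes the TextParse tag behaves as described above (it is not self-contained without TextParse.cfm); the ar_BadLines array is a hypothetical name introduced for illustration.

```cfml
<!--- ar_BadLines is a hypothetical holder for lines that fail to parse --->
<cfset ar_BadLines = arrayNew(1)>

<cf_TextParse str_Filename="c:\BulkUpload.txt">
    <cftry>
        <cfif TextParse.ar_Tokens[1] is not "Name">
            <cfset str_Name = TextParse.ar_Tokens[1]>
            <cfset str_Phone = TextParse.ar_Tokens[2]>
            <cfset str_Email = TextParse.ar_Tokens[3]>
        </cfif>
        <cfcatch type="any">
            <!--- Keep the raw line so it can be reviewed after the import --->
            <cfset arrayAppend(ar_BadLines, TextParse.str_Line)>
        </cfcatch>
    </cftry>
</cf_TextParse>

<cfoutput>#arrayLen(ar_BadLines)# line(s) could not be parsed.</cfoutput>
```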
Conclusion
ColdFusion developers can use text file parsing to import data meant for human consumption and to allow Web site users to make bulk uploads. Using text file parsing effectively requires knowledge of basic techniques as well as the limitations of ColdFusion's parsing functionality. Armed with the techniques presented above and the TextParse tag, the decision to automate data import and bulk upload processes should be an easier one. Having made the choice to automate, you can then be confident in your status as a "good programmer."
Copyright © 2003 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
About Christian Thompson
Christian Thompson is a certified advanced Macromedia ColdFusion MX developer. He is a senior software engineer for Inserso, a technology consulting firm headquartered in Annandale, VA, where he has specialized in ColdFusion application development for over two years.
DeUndre' Rushon 04/16/08 05:01:14 PM EDT
In the code below: <cfset bln_FirstLine = true> <!--- Ignore the column name <cfset bln_FirstLine = false> Is there a possibility that the "index" variable within the cfloop tag might be utilized within other functions used inside the tag?