Writing an RSS Aggregator

Completing the task

Two months ago I put together an article about building an RSS aggregator (CFDJ, Vol. 8, issue 5). Before reading this one, you might want to refresh your memory of the original, which is available at http://coldfusion.sys-con.com/read/235976.htm.

In that article I discussed what an aggregator is and why you might want to write one. RSS is an XML format that makes syndicating data easy. Most blogs have an RSS feed attached to them. An aggregator takes a bunch of blog feeds and combines them. Weblogs.macromedia.com and FullAsaGoog.com are two great examples of aggregators in the ColdFusion community.

The last article stepped you through the thought process of designing the database and object model. We built two of the components: an RSSCategory component that is used to categorize the RSS feeds and an RSSFeed component that is used to enter an RSS feed into the database. We also wrote some admin code to enter a new RSS feed into the database. The article was, unfortunately, lacking the real meat of things, which is the RSSAggregator component. In this article, we'll flesh it out along with the item component and the scheduled task for running it. Before we do that, I did find one bug, so let's fix that first.

One Quick Bug Fix
While testing the code from the last article, I discovered a bug. It happens to the best of us, right? When entering the feed into the database (Feedip.cfm), I had written some code to retrieve the feed using cfhttp and then parse the XML to get the feed's title. This worked fine for standard RSS feeds, because it referenced the title through the rss root element by name.

The problem here is that the root element, rss, is hard-coded. When I tried to run this code against the weblogs.macromedia.com site, it didn't work. The reason is that the feed offered by weblogs.macromedia.com is RDF. The root element isn't rss; it is rdf:RDF. The fix for this was easy: instead of hard-coding the root name, I used the xmlRoot value. RDF and RSS handle items differently too, so this will come into play in some of the code from this article.
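
The listings for this fix are not reproduced here, so the following is only a minimal sketch of the change described, not the code from Feedip.cfm. It assumes the parsed feed is held in a variable named MyXMLVar (the name used later in this article); the feed URL and the FeedTitle variable are placeholders.

```cfml
<!--- Sketch only: the URL is a placeholder, not taken from the original article --->
<cfhttp url="http://example.com/feed.xml" method="get" timeout="30">
<cfset MyXMLVar = XmlParse(cfhttp.FileContent)>

<!--- Before: works only when the root element is named rss --->
<!--- <cfset FeedTitle = MyXMLVar.rss.channel.title.XmlText> --->

<!--- After: XmlRoot is whatever the root element happens to be, rss or rdf:RDF --->
<cfset FeedTitle = MyXMLVar.XmlRoot.channel.title.XmlText>
```

The same line then reads the title from either kind of feed, provided the feed places its title inside a channel element.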

Writing the Item Component
In RSS, an item is the equivalent of a single blog post. For each item, we are storing the link to the original item, the title, and the description. There can be a lot more data associated with an item, but for the sake of these articles, I decided to keep it simple. The component also has some internal values, such as a primary key named ItemID and a foreign key named RSSFeedID. The RSSFeedID tells us which RSS feed the item belongs to. The Item component code is shown in Listing 1.

The component starts with the cfcomponent tag (of course) and the pseudo constructor code. The pseudo constructor sets up the instance variables of the component. Once again, our components borrow Hal Helms's BaseComponent from http://halhelms.com/webresources/BaseComponent.cfc. I use this instead of writing manual getter and setter methods.

Other than inherited methods, this component contains an init method and a commit method. The init method takes an ItemID and the datasource and loads all the relevant information from the database. The commit method inserts or updates the information in the database, as needed.
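
Listing 1 is not reproduced here. As a rough sketch of the insert-or-update decision that commit makes, the component below assumes a table named Items, an ItemID of 0 for an item that has not been saved yet, and instance data kept in a variables.instance structure. All three are assumptions made for illustration, and the sketch leaves out the BaseComponent inheritance and the init method.

```cfml
<cfcomponent displayname="ItemSketch" output="false">

    <!--- Pseudo constructor: default instance data (the structure name is an assumption) --->
    <cfset variables.instance = StructNew()>
    <cfset variables.instance.ItemID = 0>
    <cfset variables.instance.RSSFeedID = 0>
    <cfset variables.instance.Link = "">
    <cfset variables.instance.Title = "">
    <cfset variables.instance.Description = "">
    <cfset variables.instance.Datasource = "">

    <cffunction name="commit" access="public" returntype="void" output="false">
        <cfset var SaveItem = "">

        <cfif variables.instance.ItemID EQ 0>
            <!--- No primary key yet, so this is a new item --->
            <cfquery name="SaveItem" datasource="#variables.instance.Datasource#">
                INSERT INTO Items (RSSFeedID, Link, Title, Description)
                VALUES (
                    <cfqueryparam value="#variables.instance.RSSFeedID#" cfsqltype="cf_sql_integer">,
                    <cfqueryparam value="#variables.instance.Link#" cfsqltype="cf_sql_varchar">,
                    <cfqueryparam value="#variables.instance.Title#" cfsqltype="cf_sql_varchar">,
                    <cfqueryparam value="#variables.instance.Description#" cfsqltype="cf_sql_longvarchar">
                )
            </cfquery>
        <cfelse>
            <!--- Existing item: update the row that matches the primary key --->
            <cfquery name="SaveItem" datasource="#variables.instance.Datasource#">
                UPDATE Items
                SET RSSFeedID = <cfqueryparam value="#variables.instance.RSSFeedID#" cfsqltype="cf_sql_integer">,
                    Link = <cfqueryparam value="#variables.instance.Link#" cfsqltype="cf_sql_varchar">,
                    Title = <cfqueryparam value="#variables.instance.Title#" cfsqltype="cf_sql_varchar">,
                    Description = <cfqueryparam value="#variables.instance.Description#" cfsqltype="cf_sql_longvarchar">
                WHERE ItemID = <cfqueryparam value="#variables.instance.ItemID#" cfsqltype="cf_sql_integer">
            </cfquery>
        </cfif>
    </cffunction>

</cfcomponent>
```

Keeping both statements inside one method means the calling code never has to know whether the item is new; it sets the values and calls commit.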

Creating the Aggregator
The RSSAggregator component is shown in Listing 2. This component is a bit different from most that I write, since its purpose is not simply to get data in and out of a database. Instead, it pulls data from some faraway place and puts it in our database. This component does not contain any instance variables, and as such does not have a pseudo constructor. There are three methods, which I'll explain in more detail.

The first method is GetAllFeeds. It is a private method, so it cannot be called outside of the CFC. It runs a query to retrieve all the feeds that are being watched in the database. The method returns the query. There is nothing special about this.

The second method is called ItemExists. It accepts an item component (which you learned about in the previous section of this article) and checks to see whether this item already exists in the database. If it does, it returns true; otherwise it returns false. I made the assumption that each item has a unique URL pointing to it, so the URL is the value the code checks to decide whether the item is already stored.
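
Listing 2 is not reproduced here, so the following is only a sketch of the duplicate check. It is a simplified variant that takes the link as a string rather than the item component, to avoid guessing at the getter methods the component inherits. The method name LinkExists, the table name Items, and the column name Link are assumptions.

```cfml
<cfcomponent displayname="AggregatorSketch" output="false">

    <!--- Returns true when an item with this link is already stored --->
    <cffunction name="LinkExists" access="private" returntype="boolean" output="false">
        <cfargument name="Link" type="string" required="true">
        <cfargument name="Datasource" type="string" required="true">
        <cfset var CheckItem = "">

        <cfquery name="CheckItem" datasource="#arguments.Datasource#">
            SELECT ItemID
            FROM Items
            WHERE Link = <cfqueryparam value="#arguments.Link#" cfsqltype="cf_sql_varchar">
        </cfquery>

        <cfreturn CheckItem.RecordCount GT 0>
    </cffunction>

</cfcomponent>
```

Because the link is the only thing compared, two posts that share a URL are treated as the same item, which matches the assumption stated above.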

The third method is an init method. This is the one that retrieves the feeds and stores the data in the database, if relevant. This is the only public method in the component; the GetAllFeeds and ItemExists methods are used by init. The init method starts by setting some local variables:

  • GetAllFeeds: This variable holds the result of calling the GetAllFeeds function. It contains a list of all feeds that we want to pull data from.
  • Error: The second local variable is the error variable. The most likely cause of an error when running this code will be that a feed times out. If the feed times out, we won't want to attempt to process the data. It defaults to false, which means no error.
  • TempItem: As we loop over the feed data, this item object will be used to decide whether the data is duplicated or needs to be saved. The item component also contains the SQL for saving the item.
  • ItemArray: The ItemArray value is a temporary value that will contain an array of all items in the current feed.
  • TempItemIndex: When we loop over the ItemArray array, a counter will be needed to keep track of which item is being examined at the moment. This is the counter variable.
  • MyXMLVar: The MyXMLVar variable will contain the returned XML feed.

The GetAllFeeds query was executed while the local variables were being initialized, so the code starts by looping over it. At the top of the loop, the error variable is set to false. You want to make sure to reset it each time through the loop, so that you aren't processing the current feed based on the result of the previous feed.

Next comes a try block. Inside the try block is code to retrieve the RSS feed. If the feed times out, a catch block switches the error value to true. After the try block, if the error value is false, the code processes the feed. If the error value is true, the code skips the processing and goes right to the next feed. Although left out at this time, there should probably be some sort of logging for feeds that cause errors.
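
The loop and try block described above can be sketched as follows. This is not the code from Listing 2: the query is built by hand so the example stands on its own, the FeedURL column name and the URL are assumptions, and the catch block here also traps XML that fails to parse, which goes a little beyond the timeout case the article mentions.

```cfml
<!--- Stand-in for the GetAllFeeds query; the FeedURL column name is an assumption --->
<cfset GetAllFeeds = QueryNew("FeedURL")>
<cfset QueryAddRow(GetAllFeeds)>
<cfset QuerySetCell(GetAllFeeds, "FeedURL", "http://example.com/feed.xml")>

<cfloop query="GetAllFeeds">
    <!--- Reset the flag on every pass so one failed feed does not affect the next --->
    <cfset Error = false>

    <cftry>
        <cfhttp url="#GetAllFeeds.FeedURL#" method="get" timeout="30" throwonerror="yes">
        <cfset MyXMLVar = XmlParse(cfhttp.FileContent)>
        <cfcatch type="any">
            <!--- Timeout, failed request, or unparseable XML: skip this feed --->
            <cfset Error = true>
        </cfcatch>
    </cftry>

    <cfif NOT Error>
        <!--- Feed processing would go here --->
    </cfif>
</cfloop>
```

The cfcatch block is also the natural place to add the logging mentioned above, for example with a cflog tag.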

Earlier I spoke about the differences between RSS and RDF. Most blogs I read publish data in the RSS format, but weblogs.macromedia.com was using RDF. You can read more about RDF at www.w3.org/RDF/. In RSS, items are stored inside the channel. In RDF they are not; they sit alongside the channel, directly under the root. The ItemArray is initialized differently depending on the root. (If this code tries to parse another flavor of XML, it will cause problems.)

The next code block uses cfloop to loop over the ItemArray. It creates an item object using the TempItem variable. It sets the relevant instance data, then uses the ItemExists function to check whether the item already exists. If it doesn't, the commit method is run to save the data. Otherwise, nothing happens. The loop ends, and the method returns true. This is simple stuff, right?
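
One way to build the ItemArray for either root is sketched below. It is an illustration under stated assumptions rather than the code from Listing 2: the sample feed is made up so the example runs on its own, and the ItemParent and ChildIndex variable names are hypothetical.

```cfml
<!--- Tiny stand-in feed so the sketch runs on its own; the contents are made up --->
<cfsavecontent variable="SampleFeed">
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/posts/1</link>
      <description>Hello world</description>
    </item>
  </channel>
</rss>
</cfsavecontent>
<cfset MyXMLVar = XmlParse(Trim(SampleFeed))>

<!--- RSS keeps items inside channel; RDF keeps them directly under the root --->
<cfif MyXMLVar.XmlRoot.XmlName EQ "rss">
    <cfset ItemParent = MyXMLVar.XmlRoot.channel>
<cfelse>
    <cfset ItemParent = MyXMLVar.XmlRoot>
</cfif>

<!--- Collect only the child elements named item --->
<cfset ItemArray = ArrayNew(1)>
<cfloop from="1" to="#ArrayLen(ItemParent.XmlChildren)#" index="ChildIndex">
    <cfif ItemParent.XmlChildren[ChildIndex].XmlName EQ "item">
        <cfset ArrayAppend(ItemArray, ItemParent.XmlChildren[ChildIndex])>
    </cfif>
</cfloop>

<!--- Walk the items; a real run would fill TempItem and call ItemExists and commit here --->
<cfloop from="1" to="#ArrayLen(ItemArray)#" index="TempItemIndex">
    <cfoutput>#ItemArray[TempItemIndex].title.XmlText# (#ItemArray[TempItemIndex].link.XmlText#)<br></cfoutput>
</cfloop>
```

Choosing the parent element first keeps the rest of the loop identical for both formats, so only one branch depends on the root name.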

The Scheduled Task
There is one last bit of code to examine in this article and that is the actual scheduled task code. The bulk of the code is located in the component, so all the scheduled task does is create an instance of the RSSAggregator and run the init method. The code looks like this:

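    variables.RSSAggregator = CreateObject("component","#request.ComponentLoc#.RSSAggregator");
    variables.RSSAggregator.init(); // run the aggregation; pass any arguments your init method expects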

It is probably one of the easiest scheduled tasks I've ever written.
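
The article does not show how the task is registered. It can be set up in the ColdFusion Administrator; as an alternative sketch, the cfschedule tag can register it from code. The task name, the URL, and the hourly interval below are placeholders chosen for illustration.

```cfml
<!--- Register (or update) a task that requests the aggregator page once an hour --->
<cfschedule action="update"
    task="RSSAggregatorTask"
    operation="HTTPRequest"
    url="http://example.com/scheduled/RunAggregator.cfm"
    startDate="#DateFormat(Now(), 'mm/dd/yyyy')#"
    startTime="12:00 AM"
    interval="3600">
```

The interval is given in seconds, so 3600 asks for one run per hour; how often to poll depends on how frequently the watched feeds change.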

This app is far from complete. Most RSS feeds contain a lot more data than just link, description, and title. A full-featured app would handle that extra data. I left it out in the interest of length. Also, we've only built code for collecting RSS data. There is no "view" portion of this app. The only way to view the data you're collecting is to open up the database. That isn't conducive to good usability.

Every good project needs a code name, and I decided to give this project one. After some deep soul-searching, I've decided to name this project MyFriend. There are two reasons for this. The first is that I modeled the whole idea after the LiveJournal friends list. The second is that My Friend is the name of a song that my first band recorded the first time we went into a recording studio. It was my first time in a professional recording studio, and I had been playing bass for less than a month. The results came out better than you might have expected, really. I'm a sentimental freak. Check out www.jeffryhouser.com for the latest version of this code and let me know what you think.

More Stories By Jeffry Houser

Jeffry is a technical entrepreneur with over 10 years of making the web work for you. Lately Jeffry has been cooped up in his cave building the first in a line of easy to use interface components for Flex Developers at www.flextras.com. He has a Computer Science degree from the days before business met the Internet and owns DotComIt, an Adobe Solutions Partner specializing in Rich Internet Applications. Jeffry is an Adobe Community Expert and produces The Flex Show, a podcast that includes expert interviews and screencast tutorials. Jeffry is also co-manager of the Hartford CT Adobe User Group, author of three ColdFusion books and over 30 articles, and has spoken at various events all over the US. In his spare time he is a musician, old school adventure game aficionado, and recording engineer. He also owns a Wii. You can read his blog at www.jeffryhouser.com, check out his podcast at www.theflexshow.com or check out his company at www.dot-com-it.com.
