Collaborative Filtering

Using predictive analysis to make recommendations

Collaborative filtering on the Web has existed for a long time, dating all the way back to the original incarnations of sites like CDNow and Amazon.com. Recommendation systems are a powerful tool for businesses to extract additional value from their e-commerce and customer databases. They benefit customers by enabling them to find products they like, and help businesses by generating more sales.

We're going to look at some of the basic principles of predictive systems and introduce some methods you can utilize to make recommendations in your own applications. Along the way, we'll attempt to point out the benefits and limitations of each type of system.

Basic Predictions
At the most basic level, predictive information can be provided manually for your items. This can be built into the back-end administration of the site. When adding products to an e-commerce site, you could include a multiple select box listing all of the additional items in that category. Selecting items in the list would create a list of product IDs to be stored in an additional "related items" field in your database. With one additional query on a product detail page, you can pull up details on all of the related items that have been associated with the item being viewed.
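As a rough illustration of the "related items" lookup described above, here is a minimal Python sketch using an in-memory SQLite database. The table layout, the RelatedItems column name, and the sample products are all assumptions made for this example. In a real product detail page the related-items list would usually arrive with the product query itself, leaving just the one additional query for the details.

```python
import sqlite3

# Hypothetical schema: RelatedItems holds a comma-separated list of product IDs
# picked by the administrator in the back-end form.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, Name TEXT, RelatedItems TEXT)"
)
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        (1, "Turntable", "2,3"),
        (2, "Slipmat", "1"),
        (3, "Stylus", "1,2"),
        (4, "Headphones", ""),
    ],
)


def related_items(conn, item_id):
    """Return (ItemID, Name) rows for the items manually linked to item_id."""
    row = conn.execute(
        "SELECT RelatedItems FROM Items WHERE ItemID = ?", (item_id,)
    ).fetchone()
    if row is None or not row[0]:
        return []  # unknown item, or no related items chosen
    ids = [int(x) for x in row[0].split(",") if x.strip()]
    placeholders = ",".join("?" * len(ids))
    query = (
        f"SELECT ItemID, Name FROM Items "
        f"WHERE ItemID IN ({placeholders}) ORDER BY ItemID"
    )
    return conn.execute(query, ids).fetchall()


print(related_items(conn, 1))
```

Storing the IDs as a delimited list mirrors the single-field approach described here. A separate link table would be the more conventional relational design if the lists grow large.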

This scenario can provide quick, quality recommendations as the computer is not guessing at the association and also does not have to perform any on-the-fly calculations. The technique suffers, however, by requiring your product administrator to have a deep knowledge of the products in your store, which may be unrealistic for larger sites. It also requires you to continuously update the "related items" lists of older items as new products are added to the catalog.

User-Based Collaborative Filtering
A second approach to providing recommendations is collaborative filtering, a technique that makes predictions without any explicit relationships being defined in the database. Two types of collaborative filtering are common: user-based and item-based. User-based filtering works by building a database of the ratings that consumers give to products (see Listing 1).

We'll assume an Items table and a Users table in the database with respective primary keys of ItemID and UserID, and we'll rate using a scale of 1 (lowest) to 5 (highest). You can go as high as you'd like, though statistically there's not much value in going above 7. The system will determine, on the fly, a community of like users whose ratings of items most closely match those of the current user. We'll set up a sample table of five users providing ratings for each of the colors of the rainbow (Figure 1).

 

To determine our community of users, we'll use the "Mean Squared Differences (MSD)" algorithm, which measures the degree of dissimilarity between two user profiles. Squaring adds more weight to the larger differences, which is appropriate since points further from the mean may be more significant (we care more about things that a user has a strong positive or negative feeling about than about items they are ambivalent about). To perform the calculation in layman's terms: take the difference between the two users' ratings on each item that they have both rated, square that number, add the squares up, and divide by the number of items they have in common. The lower the result, the closer that user's preferences are to those of the current user. Listing 2 provides the query used to determine the community of users with the lowest mean squared difference to the user. Figure 2 provides the results of the query and the MSD values. We'll use a TOP value of 5 at the beginning of our query to display only the five most similar users to userID 1.
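The calculation can also be sketched outside the database. The Python below is only an illustration of the MSD idea, not the query from Listing 2. The user names and ratings are made-up sample data and do not reproduce Figure 1.

```python
def mean_squared_difference(ratings_a, ratings_b):
    """Average squared rating difference over the items both users have rated."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return None  # no overlap, so the two profiles cannot be compared
    return sum((ratings_a[i] - ratings_b[i]) ** 2 for i in shared) / len(shared)


def nearest_neighbors(ratings, target, top=5):
    """Return up to `top` (user, msd) pairs, most similar (lowest MSD) first."""
    scored = []
    for user, theirs in ratings.items():
        if user == target:
            continue
        msd = mean_squared_difference(ratings[target], theirs)
        if msd is not None:
            scored.append((user, msd))
    return sorted(scored, key=lambda pair: pair[1])[:top]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "user1": {"red": 5, "orange": 3, "yellow": 1},
    "user2": {"red": 4, "orange": 3, "yellow": 2, "green": 5},
    "user3": {"red": 1, "orange": 5, "yellow": 4, "blue": 2},
    "user4": {"green": 3, "blue": 4},
}

for user, msd in nearest_neighbors(ratings, "user1"):
    print(user, round(msd, 2))
```

In this sample, user2 differs from user1 by one point on two of three shared items, giving an MSD of 2/3, while user3 scores 29/3. User4 shares no rated items with user1 and is left out of the community.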

 

We're going to use the three most like-minded users to come up with predictions of which colors this user would like. We see from Figure 2 that our three closest neighbors are Mike, Laura, and Sam, since they have the lowest MSD values. The products that this community likes most will then be recommended to the user, who will probably like them as well. We loop over each member of the community and assign a weighted rating (based upon that member's MSD value) to each of the other items that they have rated (see Listing 3). These weighted ratings from the query in Listing 3 are then inserted into a database table (see Listing 4) to aid with our calculations.
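A sketch of this weighting step in Python follows. The formula that turns an MSD into a weight is not given in the text, so the 1 / (1 + MSD) used here is purely an assumption, chosen because a lower MSD yields a larger weight. The neighbor names, MSD values, and ratings are likewise hypothetical.

```python
def weighted_rating_rows(neighbors, ratings, already_rated):
    """Build (user, item, weight, weighted_rating) rows for items the
    current user has not rated yet."""
    rows = []
    for user, msd in neighbors:
        weight = 1.0 / (1.0 + msd)  # assumed weighting, not taken from Listing 3
        for item, rating in ratings[user].items():
            if item in already_rated:
                continue  # only the "other" items are candidates
            rows.append((user, item, weight, weight * rating))
    return rows


# Hypothetical community of three neighbors with their MSD values.
neighbors = [("neighbor_a", 0.0), ("neighbor_b", 1.0), ("neighbor_c", 3.0)]
ratings = {
    "neighbor_a": {"red": 5, "green": 4},
    "neighbor_b": {"green": 2, "blue": 4},
    "neighbor_c": {"blue": 4},
}

for row in weighted_rating_rows(neighbors, ratings, already_rated={"red"}):
    print(row)
```

Each row corresponds to one record that would be written to the working table, so the closest neighbor's opinion of an item counts for more than that of a more distant one.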

Now that we have all of our weighted ratings in the database, we total up the weighted ratings and divide by the total MSDs to give us the items with the highest weighted averages that have not already been rated by the user (see Listing 5).
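The totaling step might look like the following Python sketch. It assumes each stored row carries the neighbor's weight alongside the weighted rating, and it normalizes by the sum of the weights that contributed to each item. That is one common reading of the step above, and the original Listing 5 may normalize differently. The rows are hypothetical sample data.

```python
from collections import defaultdict


def predict(rows):
    """rows: (item, weight, weighted_rating) tuples.
    Returns (item, weighted_average) pairs, highest prediction first."""
    rating_totals = defaultdict(float)
    weight_totals = defaultdict(float)
    for item, weight, weighted_rating in rows:
        rating_totals[item] += weighted_rating
        weight_totals[item] += weight
    averages = [
        (item, rating_totals[item] / weight_totals[item])
        for item in rating_totals
        if weight_totals[item] > 0
    ]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


# Hypothetical rows for items the current user has not rated.
rows = [
    ("green", 1.0, 4.0),
    ("green", 0.5, 1.0),
    ("blue", 0.5, 2.0),
    ("blue", 0.25, 1.0),
]

for item, score in predict(rows):
    print(item, round(score, 2))
```

Because the result is a weighted average, it stays on the same 1-5 scale as the original ratings, which makes the predictions easy to compare with one another.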

Our final results are shown in Listing 6. Although this is a simplified example, it allows us to see where our recommendations come from. Better predictions would be gained by increasing the neighborhood size (up to a point), so you should experiment to find a reasonably large neighborhood size that does not significantly affect processing time. Since we were using a scale of 1-5, the higher the weighted average for the prediction, the more likely this user is to desire this item (or color in our case).

Although we used the Mean Squared Differences algorithm, there are several other mathematical formulas, each with its own drawbacks and limitations. The model presented could easily be modified to provide recommendations of favorite artists, authors, or whatever your site calls for. You could also base recommendations on the demographics of your users, or you could ask all of your users to fill out an explicit survey to learn their preferences on whatever topic your site deals with.

Drawbacks of User-Based Collaborative Filtering
One of the major drawbacks of user-based predictive systems in general is that they do not scale well. The computational complexity of these methods grows linearly with the number of customers and items, each of which can grow to several million in commercial applications. Another problem is the sparsity of the ratings data relative to the size of the catalog. In large e-commerce sites, even active customers may have purchased well under 1% of the products, so two customers often have few or no items in common. As a result, a system based upon nearest neighbors may be unable to make any product recommendations for a particular user. To address these concerns, item-based recommendation techniques have been developed that identify relationships between the items themselves and use these to compute a list of recommendations.

Item-Based Collaborative Filtering
One way to make item-based recommendations is to simply look at items that users have purchased together, that is, items that were part of the same transaction. The items that appear most often in orders containing a specific item are the most likely to be successful predictions for it. A sample query is provided in Listing 7.
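The same counting idea can be sketched in Python. This is not the query from Listing 7. The order numbers and item IDs below are hypothetical, and orders are assumed to be available as a mapping of order ID to the item IDs it contains.

```python
from collections import Counter


def bought_together(orders, item_id, top=3):
    """Count how often other items share an order with item_id and
    return the `top` most frequent as (item, order_count) pairs."""
    counts = Counter()
    for items in orders.values():
        if item_id in items:
            # set() keeps a duplicated line item from being counted twice
            counts.update(i for i in set(items) if i != item_id)
    return counts.most_common(top)


# Hypothetical order history: order ID -> item IDs in that order.
orders = {
    1001: [1, 2, 3],
    1002: [1, 2],
    1003: [2, 4],
    1004: [1, 3, 2],
}

print(bought_together(orders, 1))
```

Here item 2 shares three orders with item 1 and item 3 shares two, so item 2 would be the first recommendation shown on item 1's detail page.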

This is the simplest way to provide quality item-based recommendations. It should perform quickly on the fly, but could always be run offline as a scheduled job for your entire database. A more in-depth discussion is beyond the scope of this article, but you can visit the link below for articles that will lead you in the right direction.

Conclusion
The recommendation technique you choose depends on the nature of your users and your application. You may have a small, controlled site with a limited set of users where user-based collaborative filtering may work just fine, or you may have a very large site with many items, which would necessitate an item-based solution. The key is to choose carefully and test things out to make sure they perform and scale correctly. It should also be noted that in many cases, it may make sense to perform the predictions themselves as a scheduled job and just store them in the database as part of the record for each item. Other cases may allow you to perform the recommendations on the fly in a brief amount of time.

Credit should also be given to Peter Boot who put out the first collaborative filter custom tag back in 2001. For more info on the science of collaborative filtering, you can visit http://jamesthornton.com/cf/ to find links to more than 40 articles and research papers that deal with the subject. Much research continues to be done on the science of determining which collaborative filtering algorithms work best.

More Stories By Joe Danziger

Joe Danziger is a senior web applications developer with Multimax, Inc., a provider of Enterprise IT Services and Solutions supporting the critical missions of the Air Force, Army, Navy, and other Department of Defense components. He is certified as an Advanced Macromedia ColdFusion MX Developer, and also maintains the Building Blocks site (www.ajaxcf.com) dedicated to AJAX and ColdFusion, as well as DJ Central (www.djcentral.com), a Website serving DJs and the electronic dance music industry.
