Welcome!

ColdFusion Authors: Greg Ness, Liz McMillan, Pat Romanski, Andreas Grabner, David Strom

Related Topics: ColdFusion

ColdFusion: Article

Using OLAP with ColdFusion

Take data analysis to the next level using MS Analysis Services

As a ColdFusion developer, you are no doubt frequently asked for Web-based reports on data such as sales, site hits, project metrics, and other important business information. Users often want features such as the ability to filter or sort by any column, to drill down into the data, or to re-arrange the report to their liking. These can be time-consuming tasks for developers to implement.

This is where Online Analytical Processing (OLAP) comes in. OLAP is a technology for accomplishing all of the above tasks with ease. It is a data engine designed to analyze multidimensional data sets such as sales totals over product family, geographical region, and time period. If you've ever created a pivot table in Excel, then you already understand the basic model of OLAP, as pivot tables are a basic client-side OLAP tool.

To implement OLAP you might think you have to buy expensive client and server tools such as Cognos, Hyperion, or Microstrategy. But if you own SQL Server 2000 you already have a great OLAP setup, because it is bundled and licensed with the companion product, Analysis Services, which I'll shortcut to "MSOLAP."

In this article I'll first give an overview of data warehousing and then briefly show you how to set up the MSOLAP server and how to work with a data cube. Finally, I will show you how to create Excel spreadsheets and Web pages with interactive OLAP capabilities and how to query a MSOLAP cube via <CFQUERY>.

Data Warehousing and OLAP
Before you dive into OLAP technology, you must first understand the basic concepts of data warehousing. The terms "data warehouse" and "data mart" refer to read-only databases whose purpose is to pull in data from other databases, scrub and normalize the data, and then optimize it for query performance. These functions are commonly referred to as "ETL," for extraction, transformation, and loading.

If you have an environment where you have to pull in information from multiple databases and/or run some data cleansing algorithms, you would be well served to set up an intermediary database. Using DTS packages or third-party ETL tools you can pull and clean the data from multiple sources. Then you point the MSOLAP server to this intermediary database for its data source. OLAP itself is not meant for data cleansing tasks; you should treat data cleansing and data analysis as two different steps in the OLAP development process.

Data warehouses reduce the query load on your primary online transaction processing (OLTP) databases, which should be optimized for the quickest application performance. Data warehouses and OLAP are meant to offload the intensive reporting or data mining-based queries from the OLTP system. OLAP queries usually run very quickly, as many aggregations (a.k.a. "group by" results) are precomputed and stored, so querying 8 billion rows of source data can be as fast as querying 8 rows, once the data is pulled into the cube (more on cubes shortly).

OLAP systems should not be expected to store real-time data, since refreshing the OLAP system can take time and place a load on the source system(s). You should schedule data refreshes on an interval during nonworking hours.

Getting Started
MSOLAP is installed by running a separate installer from that of the SQL Server; it is located in the /MSOLAP folder of the installation CD. Run the setup program and make sure all the components are installed. Once you've installed it, make sure to apply the latest Analysis Services Service Pack (as of this writing, it's SP3) from the Microsoft site on your server and any administrator clients. Note that this is not the same SP3 as SQL Server; it is a separate package.

Once you have installed and patched the MSOLAP server you can open the Analysis Manager MMC console to work with the server, as shown in Figure 1. With MSOLAP 2000 this is a separate administrative tool from SQL Server Enterprise Manager, but in the forthcoming product version 2005, they will be combined into one tool.

In the Analysis Manager MMC you will see a tree of objects, starting with the server name. Under the server (in this case QBIC1) you will see the preinstalled FoodMart 2000 database. This database is used in the online documentation and in many companion books. I will use the FoodMart 2000 database to explain query options with ColdFusion, and you can then apply the concepts to your own cube.

In MSOLAP, databases are containers for data sources, cubes, security roles, and other shared elements. You can perform administrative tasks such as backups or data refreshes at the database level, so it's best to group each major subject area into a database.

At the next level down in Analysis Manager tree you will see the list of cubes. Cubes are the heart of OLAP, as they are the containers for the data, relationships, and calculations of your data model. To see the cube elements, right-click on a cube and select "edit" to view the cube editor window, as shown in Figure 2. From this view you can edit the cube or browse the cube's data by clicking the data tab at the bottom.

The tables on the right represent the physical tables from which the data are drawn. These tables are arranged in a "star schema" format, in which a central "fact" table is accompanied by one or more "dimension" tables. Note that this is only a visual representation of the source tables and their relationships; it is not a database view, and it cannot be queried directly. Fields from these tables are used to create cube dimensions and measures, listed in the cube object tree on the left, which are what we query from an MSOLAP data source.

Facts tables contain the numeric fields we want to perform mathematical operations on (such as sum, count, or average). These fields are known as measures in OLAP vocabulary. In our sales example each line item in an order comprises one row in the fact table; each row in the fact table contains the measures store sales, store cost, and unit sales. We can also easily create calculated members such as profit (which is simply [store sales - store cost]) to complement fact tables.

Dimension tables contain the data elements you will use to separate the measures into categories such as product family or geographical region. Dimensions are used for anything you want to filter by or drill down by. You can choose a simple dimension employing just one level, such as gender, or you can have a more complicated dimension such as product, which employs six levels of drill-down detail. Here, note that the gender dimension came from the single customer dimension table only, but the product dimension came from the combination of the product and product class dimension tables. Combinations of dimension tables are known as "snowflake" schemas.

To review, fact tables contain the numerical data (measures) we want returned by MSOLAP, whereas dimension tables contain the categories by which the measures can be separated. At this point you understand the basics of the OLAP data model and are ready to see it in action. Next I'll discuss how to query the server with MS Excel.

Querying MSOLAP with Excel
One of the benefits of OLAP is the ability to allow your users to define their own reports and arrange the data the way they like. Most proficient computer users are familiar with Excel, which is a perfect client tool for querying MSOLAP. As a developer, you need only set up a workbook with interactive pivot table functionality, and the client can use the workbook as a fully customizable query tool. Follow these steps to setup an MSOLAP client workbook:

  1. In a new Excel workbook, click Data>Import External Data>New Database Query.
  2. On the OLAP Cubes tab click <New Data Source>. Its name can be FoodMart Sales, and its provider is "Microsoft OLE DB for Olap Services 8.0."
  3. Click connect, click Analysis Server, and enter the server name. The database is FoodMart 2000, and the cube is Sales.
  4. Now you can connect to the OLAP datasource and create a new pivot table in a new worksheet.
You now have a full-fledged OLAP client that your users will like, because it supports drill-down, drag-and-drop pivoting, and filtering on both grids and charts. Users can effectively create their own reports, customize their appearance, and use OLAP data as a basis for other worksheets or reports, with data being automatically refreshed from the server.

Make sure to test whether your users can access the spreadsheet and pivot the data from their machine. They need MDAC 2.7 and the Pivot Table Services drivers. If they are using Windows 2000 or Office 2000, they won't necessarily have the "Microsoft OLE DB for OLAP Services 8.0" driver mentioned in step 1 above. You can use the older driver set included in Office 2000, but it doesn't support all the newest cube features. The easiest solution to get your clients' machines up to date is to distribute the newer driver package (ptsfull.exe) found on the SQL Server installation CD or on the Microsoft Web site.

Excel Over the Web
You can publish those same interactive pivot tables and charts to clients as ActiveX objects embedded in any Web page provided the clients have met the Excel requirements mentioned above and have IE 5.01 or higher installed (see Figure 3). The simplest way is:

  1. With our pivot table spreadsheet in Excel, choose File>Save As Web Page...>Selection: Sheet, with Add interactivity turned on. Click Publish. Select which type of page to create, and select HTML. Click the "add interactivity" button, and select PivotTable functionality.
  2. Save this page to your Web server, and give it any Web extension, e.g., .htm or .cfm. The first time it is viewed by users' IE browsers it will show them a security warning. Users can avoid this in the future if they add the site to their list of trusted sites in IE's security options.
The data source connection that you configured in the spreadsheet is hard-coded into the generated OBJECT code, so you don't have to set up the same data source on each client that was set up in the original spreadsheet.

If you want more control over the data and formatting options, you can interact with the ActiveX object using VBScript. The documentation on this is available on the MSDN site; just search for "PivotTable List Control."

Connecting with ColdFusion
Security or client configuration issues may rule out Excel as an option. If so, you can simply use MSOLAP as a normal database to query with ColdFusion and CFQUERY.

The quickest way to connect ColdFusion to MSOLAP is not to set up a new data source, but to reuse your existing data source for SQL Server. You simply set up a linked server on the SQL Server that points to the Analysis Server and passes through the query requests. Then you connect ColdFusion to your normal SQL Server using CFQUERY.

To set up a linked server on SQL Server, just click on Security>Linked Servers>New Linked Server. The driver type is again "Microsoft OLE DB Provider for OLAP Services 8.0," the data source is the OLAP Server name, and the catalog is the OLAP database name surrounded by square brackets. Once you have this (or any other) linked server configured, you can query that data source via SQL Server using the OPENQUERY function, which passes the query through to the host server. I will demonstrate this capability shortly.

MSOLAP supports a subset of the SQL query language, but you will run into its functional limits very quickly. You are better off using the MSOLAP engine's built-in query language, Multi-Dimensional Expressions (MDX). It is similar to SQL, in the sense that it has SELECT, FROM, and WHERE clauses, but the query format is very different.

To get started using MDX, you can use the provided MDX sample application shown in Figure 4. In this query we asked for the store states as rows, and the product family as columns. We are asking for a multidimensional recordset, in which rows are not numbered 1-n, but actually have data attached to them. Notice that the WHERE clause doesn't filter the recordset the way SQL does; it is used to specify which measure we want to look at, in this case, store sales.

Listing 1 shows the code to run the same query run through CFMX, and Figure 5 shows the result. We don't have the option of getting a multidimensional result returned through a standard query, so it gets flattened by SQL Server into a compatible recordset.

In Figure 5, notice that the "rows" data are flattened into columns. In fact, the entire hierarchy of the dimension requested as rows is returned as separate columns, and the text "member_caption" is appended to all the column labels. Additionally, the columns are returned with long, detailed names that are hard to work with. One solution to this problem is to alias the column names in your outer SQL SJ33 erver query SELECT statement so they're more useable. But if you don't know the column names in advance, you won't be able to do this, so you would have to add logic to your ColdFusion template to parse the dynamic column names into usable text.

As you can see, additional work must go into setting up the OLAP engine and query capabilities through ColdFusion. For a simple query of store sales by region and product family, the OLAP solution is overkill. But OLAP offers powerful statistical or analytic functions for business intelligence - such as time-series analysis - that are not readily available in SQL.

The next example highlights some more advanced features of MSOLAP (see Figure 6). This query returns the store sales, three-quarter moving average, and difference from the previous quarter, grouped by product category for 1997 sales data. This would be a much more difficult query to develop in standard SQL, don't you think?

Wrap-Up
At the time of this writing, Microsoft has released its next major SQL Server version (2005) to beta. Reporting and business intelligence are more of a priority for this package, so we will see better ETL tools, tighter integration with SQL Server, and more powerful cube and MDX features. Take a look at www.microsoft.com/sql for further information.

I hope this has been a useful overview of MS Analysis Services and query options over the Web. For more information about getting started, I recommend the book, SQL Server 2000 Analysis Services, Step by Step, by Reed Jacobson, published by Microsoft press. You can also find good information in the books online documentation provided with the SQL Server package. You'll see that creating an OLAP cube can be quick and easy, and can provide you with more data analysis options for your applications.

More Stories By Mark Murphy

Mark Murphy is the president of Electric Labs, Inc. (www.electriclabs.com), a software consulting company located in New York City. He has been developing ColdFusion applications and data warehouses for over six years and is happy to answer e-mailed questions.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.