A relational–XML data warehouse for data aggregation with SQL and XQuery

  • Published on
    06-Jul-2016

  • View
    215

  • Download
    1

Transcript

  • SOFTWAREPRACTICE AND EXPERIENCESoftw. Pract. Exper. 2008; 38:11831213Published online 3 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/spe.868

    A relationalXML datawarehouse for data aggregationwith SQL and XQueryJoseph Fong,, Herbert Shiu and Davy Cheung

    Computer Science Department, City University of Hong Kong, Hong Kong

    SUMMARYIntegrating information from multiple data sources is becoming increasingly important for enterprisesthat partner with other companies for e-commerce. However, companies have their internal businessapplications deployed on diverse platforms and no standard solution for integrating information fromthese sources exists. To support business intelligence query activities, it is useful to build a data warehouseon top of middleware that aggregates the data obtained from various heterogeneous database systems.Online analytical processing (OLAP) can then be used to provide fast access to materialized views from thedata warehouse. Since extensible markup language (XML) documents are a common data representationstandard on the Internet and relational tables are commonly used for production data, OLAP must handleboth relational and XML data. SQL and XQuery can be used to process the materialized relational andXML data cubes created from the aggregated data. This paper shows how to handle the two kinds ofdata cubes from a relationalXML data warehouse using extract, transformation and loading. Copyright 2008 John Wiley & Sons, Ltd.

    Received 5 February 2007; Revised 6 November 2007; Accepted 7 November 2007

    KEY WORDS: data warehouse; XML data cube; relational data cube; aggregation data; XQuery; SQL; schemaintegration; data integration; metadata

    1. INTRODUCTION

    To extract business intelligence and perform analyses from a variety of heterogeneous databasesystems and sources, a corporation would normally begin by building a data warehouse that aggre-gates the data. The standard approach has been to build a relational data warehouse by using a full setof data warehousing tools. The set includes middleware, messaging and extract, transformation and

    Correspondence to: Joseph Fong, Computer Science Department, City University of Hong Kong, Hong Kong.E-mail: csjfong@cityu.edu.hkContract/grant sponsor: Strategic Research Grant; contract/grant number: 7002140

    Copyright q 2008 John Wiley & Sons, Ltd.

  • 1184 J. FONG, H. SHIU AND D. CHEUNG

    loading (ETL) tools. This paper offers a methodology for developing a heterogeneous relationalextensible markup language (XML) data warehouse (RXDW) by using OLAP in conjunction withSQL and XQuery.

    An ETL methodology is provided to summarize information from heterogeneous databasesystems using relational and XML data. The approach is to integrate a companys businessprocesses with its subsidiaries, partners, customers and suppliers. Thus, the system exposes dataextraction, data integration and data materialization on the middleware to facilitate the following:

    To evaluate the use of schema translation between relational and XML databases. To integrate relational tables and XML documents for information aggregation. To access the use of relationalXML as a data warehouse. To represent materialized views of XML data using XQuery [1], and of relational tables using

    SQL.To tackle these issues, a data warehouse on top of a middleware is a solution for system developersfor such situations. Middleware integrates all the business applications to perform on-demand dataintegration, and stores the integrated data into a relational and an XML database. Two approachesexist to handle data integration on the middleware. The first approach is direct database access,which acquires direct access and knowledge of the schema of the targeted database systems. Thesecond is service call, through which each physically separated business application communicateswith the middleware by web service. These individual applications maintain their informationsharing based on their own criteria, which is therefore a costly solution. Yet, it solves the case offlat or hierarchical structures of some legacy database systems as well as security issues enforcedfor political reasons.

    Both approaches have different strengths and weaknesses. Hence, this paper proposes an opti-mized solution through a hybrid implementation. It can balance the strengths and weaknesses ofdifferent approaches so that data integration is feasible, in which the system resolves the conflictsbetween entities and relationships from the metadata using predefined rules. The several XMLdocuments and relational tables are merged together conceptually into a relational or an XML datacube based on the relations in the metadata.

    Information aggregation is a technology that gathers relevant information from a variety ofdata sources to help corporations effectively analyze the aggregated information using Internettechnologies. The desired structure of information aggregation is a RXDW that can provide bothrelational and XML data cubes to the users as shown in Figure 1.

    2. RELATED WORK

    Much research has been done in the information aggregation area. Zhu [2] categorized informa-tion aggregation into three categories based on a survey of over 100 existing and emerging webaggregators that provided information aggregation services:

    1. comparison aggregation,2. relationship aggregation, and3. intra-organization and inter-organization aggregation.

    Madnick and Siegel [3] discussed each aggregation category with examples and the after-aggregationanalysis; that is, how to make use of the aggregated information to improve the business. However,

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1185

    Relationaltables

    XMLdocuments

    Step 2 Data materalization with common warehouse data cube using ETL

    Users Users

    SQL queryXQuery

    Relationaldata cube

    Materialized warehouse data storage

    Source Databases

    metadata

    Step 1: Schema integration with common star schema

    Step 3: OLAP on Relational-XML data warehouse

    stored in

    star schema

    Catalogue system

    File systems .............

    XMLdata cube

    Datawarehouse

    Middleware

    Figure 1. Architecture of a relationalXML data warehouse. Note: A data cube is adenormalized database for data warehouse.

    they did not explain how to extract the information being integrated together from differentresourcesespecially from the mixed data sources of relational data.

    The web site [4] claimed that data integration and aggregation are important in financial services.Financial service providers (FSP) rely on the aggregated information to make decisions and drivetheir financial products and services. A major cost for FSP lies in data integration. With XMLtechnology, data integration is simplified [5]. Also, the web site [4] suggested that having an XML-based mid-tier operational data server is a flexible data aggregation solution, and proposed anapproach towards such a data server and explained why an RDBMS was not the choice. However,no explanation was provided on how to integrate the relational data into an XML-based mid-tierdata server.

    The web site [6] demonstrated how Stylus Studio queries the real-world data from a stock-quote Web service and combines the historical data stored in a relational database for informationaggregation processing. They showed how to make an application to transform the relational schemainto an XML schema so that relational data are mapped into an XML document. However, thisapplication is not an open source and, hence, from an academic point of view, it cannot be studied(although it did provide operational experience).

    The web site [7] forecast that mid-tier aggregation would continue to growfrom $44.7 millionin 2004 to a projected $1.8 billion by 2009whereas this paper discusses how such mid-tieraggregation can be developed to handle the real-time data aggregation of heterogeneous databasesystems. The data sources can be relational and XML database systems.

    Chaudhri et al. [8] gave a detailed description on how to design and manage the XML datawarehouse and developed a system called DAWAX. One problem in DAWAX is that it stores all

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1186 J. FONG, H. SHIU AND D. CHEUNG

    XML documents as relational data. Chaudhri et al. did not explain how to ensure that the originalXML document is recovered.

    Xyleme [9] is an XML data warehouse system that stores all data on the web as XML documents.A special-purpose DBMS called NATIX is used to store all XML data. In order to have a unifiedview of the data in the data warehouse, Xyleme provides the domain ontology model as abstractdocument-type definition (DTD) and defines mapping rules between the DTD of the stored docu-ments and the abstract DTD. Since Xyleme stores all XML documents into a domain, the storagespace is a challenge.

    Pedersen et al. [10] introduced a federated query language SQLXM that allows XML data to beused directly in an OLAP query to furnish multi-dimensional cubes with external XML data, andto group and select cube data based on XML data values. Park et al. [11] proposed a new multi-dimensional expression language for XML cubes, XML-MDX, which used XQuery expressions tospecify the measured data, axis dimensions and slicer. Madnick and Zhu [5] used COntext INter-change to capture context knowledge and improve data quality by reconciling semantic differencesbetween the sources and the receivers. Kimball and Caserta [12] conformed heterogeneous datafrom multiple sources into standardized dimension tables and fact tables.

    Besides Fong et al. [13], who proposed an object-relational data warehousing, most researchwork has focused on developing an XML data warehouse. This paper contributes by adding arelational view in addition to an XML view of a data warehouse.

    3. METHODOLOGY

    In general, middleware [14] applies many technologies to handle information aggregation [15] inheterogeneous database system environments, and hosts a global database schema that representsthe schemas of multiple heterogeneous database systems in distributed environments. The globalschema stores the detailed information of the database systems of host locations and database types.Middleware makes use of this information to perform user queries by distributing the translatedsub-queries to the corresponding database system.

    In the case of a legacy system with proprietary database or third party systems, queries can behandled by the web service. Once the sub-queries are performed, in cases of an XML data cube, allof the results are translated into XML documents that can be aggregated into an XML document.In cases of a relational data cube, all of the results are translated into relational tables using anaggregation. Finally, the aggregated set of XML documents or relational tables is represented inthe web database.

    The following is a stepwise semi-automated middleware process shown as a methodology:Step 1. Map and integrate source schemas into metadata: Map XML schemas into a metadata. Map a relational schema into a metadata. Integrate schemas between source database schemas. Map source metadata schemas into an integrated metadata schema.Step 2. Implement star schema [12] and data cubes for aggregation: Create dimension table by mapping table and attributes to a global metadata schema. Create fact table by mapping table and attributes to a global metadata schema.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1187

    Create star schema by relating the dimensions and fact tables. Convert and load relational data into a relational data cube. Convert and load XML documents into an XML data cube.Step 3. Aggregate data:

    Aggregate data in a materialized relational or XML data cube using OLAP [16].Metadata is data about data. The structure of middleware metadata includes three sections: sourcedatabase metadata, global schema metadata and star schema metadata. The source database schemais recognized by the middleware, including database type, tables, attributes, relations and keys.Since the middleware integrates multiple heterogeneous database systems, a separate set of metadatais needed to describe all the source database schemas in order to integrate them. A wizard wasprepared to guide users to complete source metadata creationwhich requires good technicalknowledgehence, a database administrator or a superuser with knowledge of database structureswould probably accomplish the metadata setup better. Relational metadata is applied to develop anRXDW for data catalog and internal directory usage. XML schemas and documents are applied tobuild the data cube for the data warehouse.

    For example, a metadata can be a data dictionary, a data catalog or a data directory for a database.The metadata schema stores the source database schemas as shown in Figure 2.

    3.1. Step 1. Map source schemas into a metadata

    The methodology is a bottom-up approach that integrates existing relational data and XML docu-ments into a metadata for building the relationalXML data cube. The XML schema is mapped

    Source Database Metadata

    Star Schema Metadata

    Global Schema Metadata

    SourcesMetadata 1

    SourcesMetadata 2

    SourcesMetadata n

    Global Schema

    Star Schema 1

    Star Schema 2

    Star Schema n

    Figure 2. The three types of metadata.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1188 J. FONG, H. SHIU AND D. CHEUNG

    Meta_Table

    Table Name

    Table Name

    Attribute1Attribute2

    Map attribute name

    Map table nam

    e

    Meta_Attribute

    Attribute Name

    Figure 3. Mapping a relational table name and attribute to metadata.

    into a metadata [17] to provide a global view of the heterogeneous database systems. To map anXML schema into a metadata, an Extended Entity Relationship model is extracted from the XMLschema.

    Mapping XML schema into source metadata: The following are the rules on mapping an XMLschema into a relational schema [18]:

    Mapping elements to tables: All of the XML elements found in XML documents are mappedto metadata tables.

    Mapping attributes to columns: The attributes of the XML elements are mapped to the columnsof the metadata tables.

    Mapping XML elements with relationships to relational tables: The parent element and itschild elements are mapped into a parent relational table and child table in the metadata tables.

    Mapping relational schema into source metadata: Similarly, a relational schema can be mappedinto the metadata table.

    As a result, both the XML schema and relational schema can be mapped into a metadata schemaas shown in Figure 3. Primary keys are underlined in a metadata schema; foreign keys have * astheir prefix, and types of operation include the insert, update and delete procedures of an operation.

    Relation Meta database (database id, database name, database type, drive string,jdbc url, jdbc username, jdbc password).

    Relation Meta table (table id,*database id, parent id, table name, primary key,operation type).

    Relation Meta attribute (*table id, attribute name, attribute type, *method name, cardinality).Relation Meta method (method name, parameters, web service, description).

    3.2. Case study

    A case study shows three different applications in which each one connects to a different backenddatabase system. They are a customer relationship management (CRM) system, a warehousemanagement system (WMS) and an order management system (OMS) with databases of relational,XPath and XML DB [19] types, respectively. Figure 4 shows the overview of the system. In thiscase study, a star schema is created to find the total sales of orders along with items and customers.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1189

    Enquiry / Reporting Application

    Middleware for ETL

    Customer Relationship Management

    System

    WarehouseManagement

    System

    Order ManagementSystem

    RelationalDatabase XPath

    XML DB

    Metadata

    Figure 4. System overview of the case study.

    Table I. Metadata.Application Database type Columns Particulars

    CRM Relational ACCOUNT ID Customer account IDCRM Relational AMOUNT OWED Amount owedCRM Relational AVAILABLE CREDIT Credit limitWMS XML ITEM ID Product IDWMS XML ITEM DESC Product descriptionWMS XML QUANTITY Available quantityWMS XML UNIT PRICE Product unit priceOMS XML OrderID Sales order IDOMS XML CustomerID Customer IDOMS XML Ordered Date Order dateOMS XML ShipmentDate Shipment dateOMS XML ShipmentLocation Shipment destinationOMS XML TotalAmount Total order amountOMS XML DeliveryMethod Delivery methodOMS XML Priority PriorityOMS XML OrderStatus Order statusOMS XML ItemID Product IDOMS XML Quantity Ordered product quantityOMS XML ItemStatus Ordered product status

    3.3. Step 1. Map and integrate source schemas into metadata

    Metadata of the systems: Table I shows the data dictionary of different database schemas in thecase study.

    To set up the metadata of the source databases, the user must understand the remote databasesschema structures and tables relations. The simplest way to set up this configuration is to followthe wizard walkthrough to enter the required content step by step.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1190 J. FONG, H. SHIU AND D. CHEUNG

    Metadata Database.Database id Database name Database type Jdbc url

    ID CRM CRM RELATIONAL jdbc:mysql://localhost/crmID OMS OMS XML XMLdb:xindice://localhost/db/omsID WMS WMS XML XMLdb:xindice://localhost/db/wms

    Metadata TABLE.Table id Primary key Table name Database id Parent id

    ID CUSTOMER ACCOUNT ID CUSTOMER ID CRMID SALES ORDER ORDER ID SALES ORDER ID OMSID ORDER DETAIL ORDER ID ORDER DETAIL ID OMS ID SALES ORDERID INVENTORY ITEM ID INVENTORY ITEM ID WMS

    Metadata ATTRIBUTE.Attribute name Table id Attribute type Cardinality

    ACCOUNT ID ID CUSTOMER TEXT 1AMOUNT OWNED ID CUSTOMER TEXT 1AVAILABLE CREDIT ID CUSTOMER NUMERIC 1ITEM ID ID INVENTORY NUMERIC 1ITEM DESC ID INVENTORY TEXT 1QUANTITY ID INVENTORY NUMERIC 1UNIT PRICE ID INVENTORY NUMERIC 1OrderID ID SALES ORDER NUMERIC 1CustomerID ID SALES ORDER NUMERIC 1Ordered Date ID SALES ORDER TEXT 1ShipmentDate ID SALES ORDER TEXT 1ShipmentLocation ID SALES ORDER TEXT 1TotalAmount ID SALES ORDER NUMERIC 1DeliveryMethod ID SALES ORDER TEXT 1Priority ID SALES ORDER TEXT 1OrderStatus ID SALES ORDER TEXT 1OrderID ID ORDER DETAIL NUMERIC 1ItemID ID ORDER DETAIL NUMERIC 1Quantity ID ORDER DETAIL NUMERIC 1ItemStatus ID ORDER DETAIL TEXT 1

    3.4. Map metadata into a global metadata schema

    The source metadata is used as input to create the global metadata schema of the middleware. Theintegration of the global schema resolves conflicts in the database tables. The conflict resolutionrules are stored in the global metadata for data integration. The setup of global metadata is the

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1191

    same as the source metadata setup. To smooth the process, a wizard with question and answerwalkthrough is employed.

    Different source database schemas might contain the same tables for the representation of theirown system. The following are guidelines on how to perform schema integration between twodifferent database schemas.

    3.4.1. Schema integration between source database schemas [20]

    To map the relational database schema into metadata, several procedures are available in the mappingprocess for recovering data semantics [17]. The integrated schema is a global view of heterogeneousdatabases, consisting of multiple enterprise applications of a company. Each local database systemis treated as a unit object for integration. The objective of schema integration is to ensure that noduplicated data exist. The identified conflicts are resolved by storing them in a global metadataschema.

    Resolve semantic conflicts among source schemas: This resolves the definition-related conflicts ofinconsistency in key or synonyms and homonyms. This is a kind of enterprise application integration(EAI) process among different applications. The data type conflicts among schemas are capturedin metadata as relationship mapping.

    Merge entities and relationships: The merging of entities can be automated using union operatorsif their domains are the same. By identifying the same keys with the same entity name in differentdatabase schemas, the entities are merged by union as follows:

    Merge relationship by cardinality. Merge entities by subtype relationship. Merge entities by generalization. Merge entities by aggregation.

    The integrated metadata schema is mapped with heterogeneous source schemas and presented asa simple database schema in the middleware. It is useful for analytical tasks with an integrateddatabase schema. The integrated table defines the integrated view information, and the integratedfield defines mapping of integrated fields and source fields, and the mapping rules describe themethod(s) for handling data type conflicts in source fields mapping.

    Integrated Metadata schema setup wizard:

    BeginFor each metadata table doBegin

    Create global metadata table with table name and key attributes;Select available tables list from source Metadata;Map available attributes from source Metadata to integrated schema;For each attribute do

    Add conflict resolve rules based on available selections;End

    End

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1192 J. FONG, H. SHIU AND D. CHEUNG

    The integrated schema is as follows:Relation meta join key (join key id, table, attribute, key type, *mapping rule).Relation meta global table (global table, *join key id, table, attribute).Relation meta global field (*global table, global field, local table, local field, field type,

    *mapping rule).Relation meta mapping rule (mapping rule, rule name, type, rule).

    3.5. Global schema extracted from an integrated schema

    The global schema is mapped from the distributed data sources of the source metadata. To accom-plish this metadata setup, user supervisionwith knowledge of the source schema of the metadatais needed.

    Global table.Global table Global field name Join field id Key type

    CUSTOMER ACCOUNT ID ACCOUNT ID PRIMARYSALES ORDER CUSTOMER ID CUSTOMER ID FOREIGNSALES ORDER ORDER ID ORDER ID PRIMARYORDER DETAIL ORDER ID ORDER ID PRIMARYORDER DETAIL ITEM ID ITEM ID FOREIGNINVENTORY ITEM ID ITEM ID PRIMARY

    Global field.Global field Local table Local field Field type Rule

    CUSTOMER ID CUSTOMER ACCOUNT ID NUMERIC SYNONYMAMOUNT OWED CUSTOMER AMOUNT OWED TEXTAVAILABLE CREDIT CUSTOMER AVAILABLE CREDIT NUMERICITEM ID INVENTORY ITEM ID NUMERICITEM DESC INVENTORY ITEM DESC TEXTQUANTITY INVENTORY QUANTITY NUMERICUNIT PRICE INVENTORY UNIT PRICE NUMERICORDER ID SALES ORDER ORDERID NUMERIC SYNONYMCUSTOMER ID SALES ORDER CUSTOMERID NUMERIC SYNONYMORDER DATE SALES ORDER ORDERDATE TEXT SYNONYMSHIPMENT DATE SALES ORDER SHIPMENTDATE TEXT SYNONYMSHIPMENT LOCATION SALES ORDER SHIPMENTLOCATION TEXT SYNONYMTOTAL AMOUNT SALES ORDER TOTALAMOUNT NUMERIC SYNONYMDELIVERY METHOD SALES ORDER DELIVERYMETHOD TEXT SYNONYMPRIORITY SALES ORDER PRIORITY TEXTORDER STATUS SALES ORDER ORDERSTATUS TEXT SYNONYMORDER ID ORDER DETAIL ORDERID NUMERIC SYNONYMITEM ID ORDER DETAIL ITEMID NUMERIC SYNONYMQUANTITY ORDER DETAIL QUANTITY NUMERICITEM STATUS ORDER DETAIL ITEMSTATUS TEXT SYNONYM

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1193

    XML schema definition (XSD) of OMS system:

    Convert attribute into column name

    Convert element totable name

    DTD of WMS system:

    ]>

    Convert document type to table name

    Convert elements into columns

    3.6. Step 2. Implement the star schema and data cubes for aggregation

    Once a global metadata is built, data cubes are developed from the derived integrated databasemetadata. Schema integration [17] provides a global view of multiple schemas of different databasesystems. The global schema design of this middleware employs a bottom-up approach to integrateexisting database schemas into a global database schema. A metadata structure of a star schemais shown in Figure 5. A wizard with walkthrough approach guides the user during the creation

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1194 J. FONG, H. SHIU AND D. CHEUNG

    Star Schema 1

    Fact Table A

    Dimension Table A

    Dimension Table C

    Dimension Table B

    Star Schema Metadata

    Star Schema 1

    Star Schema 2

    Star Schema n

    Figure 5. Metadata structure of a star schema.

    process. The metadata of a star schema contains a star schema of each OLAP cube. Each starschema contains a fact table element and related dimension tables. These fact table and dimensiontables are used in OLAP cube formation.

    Walkthrough process:

    BeginFor each cube doBegin

    Select available tables from global metadata;Create fact table by mapping tables and attributes in global metadata schema by the userdefining the measurement fields;Select available tables from global metadata related to fact table;

    For each dimension doBeginCreate dimension table by mapping tables and attributes in global schema by userdefining the dimension;

    Map the dimensions into fact table and store the relations in star schema table;End

    endend

    The heterogeneous database systems are integrated in the form of a source database schema andstored in global metadata. Once the schemas are integrated, there will be no duplicate data in theglobal schema. The information concerning databases, tables, attributes and methods is stored inthe source metadata. The information is mapped into the star schema.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1195

    The star schema metadata is as follows:Relation meta fact table (star schema id, star name, fact name).Relation meta fact table dtl (*star schema id, global table, global field, data field,

    measure name).Relation meta dimension table (dimension id, *star schema id, dimension name).Relation meta dimension table dtl (*dimension id, global table, global field, data field,

    level desc, level no).Creation of a star schema: To enable multi-dimensional queries, a star schema is applied in this

    system. The relational schema shows the star schema metadata of the system. The star schemametadata includes fact table and dimension tables. The star schema is designed to aggregate theinformation from multiple data sources. Since the star schema selecting data is based on userrequirements, user supervision is involved in the star schema design.

    Creation of a data cube [21]: To enable multi-dimensional queries, OLAP query languagewhich focuses on data aggregation instead of on data analysisis required.

    3.7. Star schema for aggregation

    In this case study, three different database schemas are mapped into a metadatathe relationalschema of CRM, XML schema of OMS, and DTD of WMS.

    Translate XSD into relational schema: XSD is translated into relational schema and then storedinto metadata.

    Map XML schemas OMS and WMS into Relation Sales order and Relations Inventory Item.Map a relational database schema into metadata: Once all schemas are translated into relational

    schemas, the schemas are stored in metadata.Map metadata into a star schema: After the schema integration, the global view of all database

    schemas is represented in Figure 6.

    Figure 6. The global view of integrated schemas.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1196 J. FONG, H. SHIU AND D. CHEUNG

    CUSTOMER DIMENSION

    PK ACCOUNT_ID

    AMOUNT_OWED AVA ILABLE_CREDIT

    SALES FACT TABLE

    PK DATEPK,FK1 ACCOUNT_IDPK ITEM_ID

    TOTAL_AMOUNT

    DATE DIMENSION

    PK DATE

    MONTH QUARTER YEAR

    INVENTORY DIMENSION

    PK ITEM_ID

    ITEM_DESC QUANTITY UNIT_PRICE

    Figure 7. A star schema of the case study.

    Based on user requirements, the star schema is created in Figure 7.The schema is designed to query sales facts of customers. It contains customer dimension,

    inventory dimension, date dimension and an aggregated sales fact table that combines sales orderand order detail tables. This information is stored in the metadata tables of the star schema as shownbelow.

    Dimension table metadata in Meta Dimension Table.Schema Dimension Dimension field Global table Global field

    Sales Inventory ITEM DESC INVENTORY ITEM ITEM DESCSales Inventory QUANTITY INVENTORY ITEM QUANTITYSales Inventory UNIT PRICE INVENTORY ITEM UNIT PRICESales Customer AMOUNT OWED CUSTOMER ACCOUNT AMOUNT OWEDSales Customer AVAILABLE CREDIT CUSTOMER ACCOUNT AVAILABLE CREDIT

    Fact table metadata of the star schema.Schema Dimension Global table Global field Measure

    Sales Inventory SALES ORDER ITEM ID NSales Customer INVENTORY ACCOUNT ID NSales Sales SALES ORDER TOTAL AMOUNT Y

    3.8. Loading data into the RXDW

    After building a star schema, data are loaded into a data cube according to the derived star schema.The system aims to aggregate data information from heterogeneous databases or applications. Once

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1197

    the metadata of global schema and star schema are completed, the information is used to load datafrom multiple data sources. Since the data loading consists of heterogeneous database systems,several processes are performed to build the XML data warehouse automatically.

    (i) XML data cube of the RXDW

    Data centric schema in an XML data warehouse: The schema of this XML data warehousehas a data centric design. The RXDW is used to store integrated data for OLAP processing.XQuery is used to retrieve data from XML documents.

    Retrieve the fact table and dimension tables.

    Algorithm for loading XML document into XML data cube:

    Fact table creation AlgorithmBegin

    Retrieve fact table from star schema metadata;Retrieve global tables, attributes and mapping rules from global metadata;Retrieve source tables, attributes and mapping rules from sources metadata;

    For each source database doBegin Select the targeted tables in the database

    For each targeted table doBegin Retrieve required attributes data by generating local queries to the database

    Transform the queried results into XML documentsEnd

    Integrate the XML documents using XQueryEnd

    Integrate the XML documents using XQuery of different databases;If fact table existsThen append full integrated XML into XML databases into fact tableElse create full integrated XML into XML databases into fact table

    EndDimension tables creationBegin

    Retrieve dimension from star schema metadata;Retrieve global tables, attributes and mapping rules from global metadata;Retrieve source tables, attributes and mapping rules from sources metadata;For each source database do

    Begin Select the targeted tables in the databaseFor each targeted table doBegin Retrieve required attributes data by generating local queries to the database;

    Transform the queried results into XML documents;End

    Integrate the XML documents of different tables using XQuery;End

    Integrate the XML documents of different databases using XQuery;Store the full integrated into XML databases into dimension directory;

    End

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1198 J. FONG, H. SHIU AND D. CHEUNG

    Data retrieval: The middleware retrieves the source database information using the metadatadatabase. Based on the metadata, the system traces back the original database, table andattributes from the metadata of the star schema, global schema and source database schema.The loading of relational data is a projection based on primary key matching. If there areduplicate attributes, one of them will be ignored. If there are new attributes, they will beinserted into the element(s) that they match as shown in Figure 8.

    Data loading into XML DB: After the data integration process has been accomplished, theretrieved data from heterogeneous data sources are loaded into an XML data warehouse.The XML DB schema is a tree structure.

    XML schema for data warehouse:

    (ii) Relational data cube of the RXDW

    Retrieve fact table and dimension table: The algorithm for loading relational data into arelational data cube is similar to the algorithm for loading XML documents into an XMLdata cube by replacing XML document and XQuery with relational tables and SQL.

    Retrieve target data fields from source relational tables: An SQL SELECT statement canbe issued to merge two relational databases tables A and B into a metadata as shown inFigure 9.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1199

    xquery version "1.0"; {

    for $a in doc(docA.XML)/elementA/elementAA, $b in doc(docB.XML)/elementB/elementBB

    where $a/key=$b/keyorder by $a/key

    return {$a/key} {$a/attribute1} {$b/attribute2}

    }

    The where clause for join primary key and the order by clause to sort the display

    The source XML documents with defined elements for merging

    Attributes to be displayed

    Figure 8. XQuery structure of XML documents merger.

    Relation Ra

    ==>

    A1 A2

    a11 a21

    a12 a22

    Relation Rx

    A1 A2

    a11 a21

    a12 a22

    A3

    null

    null

    a13

    a14

    a31

    a32

    null

    null

    Relation Rb

    A1 A3

    a13 a31

    a14 a32

    Figure 9. SQL structure of relational tables merger.

    Select Ra A1, Ra A2, Rb A3 from Ra, Rb where Ra A1=Rb A1.Insert Rx (A1, A2, A3) Value (a11, a21, null).Insert Rx (A1, A2, A3) Value (a12, a22, null).Insert Rx (A1, A2, A3) Value (a13, null, a31).Insert Rx (A1, A2, A3) Value (a14, null, a32).

    3.9. Case study: loading data according to XML or relational view

    As the global database schema and star schema are created, the mapped information in the metadatais used to perform the data aggregation. The database data for the case study appear below.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1200 J. FONG, H. SHIU AND D. CHEUNG

    Data from CRM system:

    Customer.

    Account id Amount owed Available credit

    100001 1000.00 4000.00100002 2000.00 3000.00100003 5000.00 0

    Data from WMS system:

    Inventory item.

    Item id Item desc Quantity Unit price1 A01 30 10.002 B01 20 20.003 C01 10 20.004 C02 15 25.005 C03 15 20.00

    Data from OMS system:

    Sales order.

    Order id Customer id Total amount . . . (Omitted)1001 100001 2000.00 . . .1002 100001 2000.00 . . .1003 100002 1000.00 . . .

    Order detail.

    Order id Item id Quantity . . . (Omitted)1001 2 10 . . .1002 2 10 . . .1003 3 10 . . .

    Construct local database queries: The middleware constructs local database queries accordingto the target database system. In this case study, three different types of queries were constructed:relational query, XML query and web database.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1201

    3.10. Retrieve target tables fields from metadata

    Retrieve dimension field information from metadata: Tables and fields from dimension tableSELECT

    D.DIMENSION ID, D.DIMENSION NAME, D.STAR SCHEMA ID,DT.GLOBAL TABLE, DT.GLOBAL FIELD, DT.LEVEL DESC, DT.PRIMARY FIELD, DT.LEVEL NO,GF.LOCAL TABLE, GF.LOCAL FIELD, GF.FIELD TYPE, GF.MAPPING RULE,AT.TABLE ID, AT.ATTRIBUTE TYPE, AT.METHOD NAME, AT.CARDINALITY, TB.PRIMARY KEY,TB.OPERATION TYPE, TB.PARENT ID, DB.DATABASE ID, DB.DATABASE NAME, DB.DRIVER STRING,DB.JDBC URL, DB.JDBC USERNAME, DB.JDBC PASSWORD, DB.DATABASE TYPE

    FROMMETA DIMENSION TABLE DINNER JOIN META DIMENSION TABLE DTL DT ON D.DIMENSION ID = DT.DIMENSION IDINNER JOIN META GLOBAL FIELD GF ON DT.GLOBAL TABLE = GF.GLOBAL TABLE

    AND DT.GLOBAL FIELD = GF.GLOBAL FIELDINNER JOIN META ATTRIBUTE AT ON GF.LOCAL FIELD =AT.ATTRIBUTE NAMEINNER JOIN META TABLE TB ON AT.TABLE ID = TB.TABLE ID AND GF.LOCAL TABLE = TB.TABLE NAMEINNER JOIN META DATABASE DB ON TB.DATABASE ID = DB.DATABASE ID

    WHERE D.START SCHEMA ID = SALESAND D.DIMENSION ID = INVENTORY

    Retrieve fact information from a star schema: Tables and fields from fact tableSELECT

    F.MEASURE NAME, F.FACT NAME, F.STAR SCHEMA ID, F.GLOBAL TABLE, F.GLOBAL FIELD,F.LEVEL DESC, F.PRIMARY FIELD,GF.LOCAL TABLE, GF.LOCAL FIELD, GF.FIELD TYPE, GF.MAPPING RULE, AT.TABLE ID,AT.ATTRIBUTE TYPE, AT.METHOD NAME, AT.CARDINALITY,TB.PRIMARY KEY, TB.OPERATION TYPE, TB.PARENT ID,DB.DATABASE ID, DB.DATABASE NAME, DB.DRIVER STRING, DB.JDBC URL, DB.JDBC USERNAME,DB.JDBC PASSWORD, DB.DATABASE TYPE

    FROMMETA FACT TABLE FINNER JOIN META FACT TABLE DTL DT ON F.STAR SCHEMA ID = DT.STAR SCHEMA IDINNER JOIN META GLOBAL FIELD GF ON DT.GLOBAL TABLE = GF.GLOBAL TABLE

    AND DT.GLOBAL FIELD = GF.GLOBAL FIELDINNER JOIN META ATTRIBUTE AT ON GF.LOCAL FIELD = AT.ATTRIBUTE NAMEINNER JOIN META TABLE TB ON AT.TABLE ID = TB.TABLE IDAND GF.LOCAL TABLE = TB.TABLE NAMEINNER JOIN META DATABASE DB ON TB.DATABASE ID = DB.DATABASE ID

    WHERE D.STAR.SCHEMA ID = SALES

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1202 J. FONG, H. SHIU AND D. CHEUNG

    3.11. Construct queries for target database system

    After the target tables and fields information have been retrieved from metadata, the middlewareconstructs queries according to database target and database type.

    SQL for CRM:SELECT ACCOUNT ID, AMOUNT OWED, AVAILABLE CREDITFROM CUSTOMER ACCOUNT

    XQuery for OMS:xquery version 1.0;

    {for $i in doc(salesorder.XML)//sales orderlet $b :=doc(orderdetail.XML)//order detail[order id=$i/order id]order by $i/order idreturn

    {current-dateTime()}{$i/order id}{$i/customer id}{$b/item id}{$i/total amount}

    }

    XQuery for WMS:xquery version 1.0;

    {

    let $d :=/inventory/Rowreturn

    {$d/ITEM ID}{$d/ITEM DESC}{$d/QUANTITY}{$d/UNIT PRICE}

    }

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1203

    3.12. Data integration of queried results

    The queried results are stored in the data warehouse. The data from the relational database areinserted into the data warehouse, while the XML documents from the XML database are transformedinto data object module (DOM) objects and then loaded into the data warehouse.

    (i) Query on XML data cube: Data are transformed into XML documents by mapping eachrelational table into each XML document with each tuple into each row instance.

    Translated XML document from RELATIONAL:

    10000110004000

    10000220003000

    1000035000

    0

    Data from WMS system (retrieve data from fact table for later OLAP operation).

    Inventory dimension:

    1A013010

    2B012020

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1204 J. FONG, H. SHIU AND D. CHEUNG

    3C011020

    4C021525

    5C031520

    Data from OMS system.

    Sales order fact:

    2006-04-23T17:49:52.359+08:00100110000122000.00000000

    2006-04-23T17:49:52.359+08:00100210000122000.00000000

    2006-04-24T17:49:52.359+08:00100310000231000.00000000

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1205

    (ii) Query on relational view:

    Data from CRM relational table.Customer dimension.

    Account id Amount owned Available credit

    100001 1000.00 4000.00100002 2000.00 3000.00100003 5000.00 0

    Transform RELATIONAL from WMS XML document by mapping each XML documentinto each relational table with each row instance into each tuple.

    Inventory dimension.

    Item id Item desc Quantity Unit price1 A01 30 102 B01 20 203 C01 10 204 C02 15 255 C03 15 20

    Transform RELATIONAL from OMS XML document by mapping each XML documentinto each relational table with each row instance into each tuple.

    Sales order fact.

    Time Order id Customer id Item id Total amount

    2006-04-23T17:49:52.359+08:00 1001 100001 2 2000.000000002006-04-23T17:49:52.359+08:00 1002 100001 2 2000.000000002006-04-24T17:49:52.359+08:00 1003 100002 3 1000.00000000

    3.13. Step 3. Aggregate data

    After the data cube has been built, aggregated data can be extracted from the data cube. Themiddleware stores aggregated data from heterogeneous sources into a relational or an XML database.The schema design is data centric since it minimizes the programming complexity during dataintegration, and is beneficial to data materialization when it performs SQL or XQuery for theaggregation of relational or XML data. Figure 10 illustrates how the OLAP cube retrieves an XMLdocument or relational data for data materialization.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1206 J. FONG, H. SHIU AND D. CHEUNG

    Star Schema Metadata

    Star Schema 1

    Star Schema 2

    Star Schema n

    XML DB //Relational DB

    Cube Processor

    Retrieve target cube structure

    Structure Details1. Star Schema Name2. Fact table name3. Measures4. Related dimension table names5.Attributes and level

    Retrieve fact table and dimension tables

    Figure 10. Cube data retrieval and materialization.

    (i) Data materialization of XML data cube aggregation data using XQueryCube formation algorithm.

    BeginRetrieve OLAP schema from star schema of metadata;For each dimension do

    Load XML contents from XML DB to DOM object;Load fact table content from XML DB into DOM object;Retrieve cube default setting;Load slice dimension parameters for filter;For each aggregate dimension row do

    Begin Build XQuery with slice and aggregate dimensions parameters;Aggregate and filter to summarize measures using XQuery to XML document;

    EndIntegrate all aggregate dimension rows XML into one XML document;Transform the serialized XML content into HTML using XSLT;

    End

    Use OLAP to retrieve aggregate data [22]. The middleware retrieves the fact table, dimen-sion tables and XML documents based on the metadata configuration. The retrieved XMLdocuments are aggregated and transformed into HTML using an XSLT engine.

    TransforminganXMLcubetovisual representation.Thecreatedcubeis transformedtoHTMLusing XSLT transformation. The transformed HTML is displayed in the Internet browser.

    Interacting on the materialized OLAP cube. The formed OLAP cube is transformed intovarious materialized views with the changed dimension values. The selection of OLAPreport by the user triggers the OLAP cube to retrieve data from the metadata of star schema.The OLAP cube uses the metadata of the star schema to build the materialized view of thereport. It can be implemented by ETL.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1207

    Dimension table retrieval:

    1: Retrieve star schema dimensions from star schema metadata2: Retrieve dimensions XML documents from XML data warehouse.3: Store the dimensions XML documents to the OLAP cube object.

    Fact table retrieval:

    1: Retrieve star schema facts from star schema metadata2: Retrieve fact XML documents from XML data warehouse.3: Store the fact XML documents to the OLAP cube object.

    Data materialization with aggregation and slice:Algorithm

    BeginReceive user input of dimensions;Process user input with dimensions XML documents in OLAP cube object;For each slice dimension doRetrieve XML contents using XQuery;For each aggregate dimensions doAggregate XML contents using XQuery;

    Transform materialized XML contents into HTML using XSLT engine;End

    (ii) Data materialization of relational data cube aggregation data using SQLCube formation algorithm:

    BeginRetrieve OLAP schema from star schema of metadata;For each dimension do

    Load relational data from Relational database;Load fact table content from Relational database;Retrieve cube default settingLoad slice dimension parameters for filter;For each aggregate dimension row do

    Begin Build SQL with slice and aggregate dimensions parameters;Aggregate and filter to summarize measures using SQL to relational data;

    EndIntegrate all aggregate dimension rows relational data into one relational table;

    End

    Dimension table retrieval:

    1: Retrieve star schema dimensions and levels from star schema metadata2: Retrieve dimensions relational tables from relational database.3: Store the dimensions relational data to the OLAP cube object.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1208 J. FONG, H. SHIU AND D. CHEUNG

    Fact table retrieval:

    1: Retrieve star schema facts and levels from star schema metadata2: Retrieve fact relational tables from relational database3: Store the fact relational tables to the OLAP cube object.

    Data materialization with aggregation and slice:

    Algorithm

    BeginReceive user input of dimensions;Process user input with dimensions relational tables in OLAP cube object;For each slice dimensions doRetrieve relational contents using SQL;For each aggregate dimensions doAggregate relational contents using SQL;End

    Select Sum(A02) from Fact Table I, where Dimension table 1 A02=Fact table 1 A02.Insert Fact table II (A02) value (Sum(A02)).

    3.14. Data materialization of aggregated XML data

    Finally, the integrated XML data in the data warehouse are retrieved and materialized by the OLAPmiddleware engine. When a user browses an OLAP report page in the browser, the middlewareretrieves the required fact table and dimension tables based on the information in the star schemametadata. The middleware performs slice and aggregation of the XML contents with expand modeand transforms the queried contents to HTML (Figures 1113).

    3.14.1. Rollup and drill down on an XML data cube

    The middleware captures the user input and aggregates the XML contents using XQuery. Figure 14shows the result of rollup where item B01 is in Order 1001 and 1002, and which aggregate to atotal amount of 4000 by referring back the Sales Order Fact Table, and Order 1001, 1002 and 1003are summed up to a total amount of 5000.

    3.15. Performance analysis

    The objective performance analysis of RXDW system was to test the actual performance of thesystem using sets of testing data. RXDW is a system capable of distributed computing. In orderto mitigate the deviation in performance affected by a network environment, the whole testingenvironment was set up and configured in a standalone server. Since RXDW required excess memoryusage, the program logic was changed by partitioning the XML during merging, and the Java heap

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1209

    FACT

    1 A02 100 2 A02 200 3 A03 200 ...

    N A0N 200

    AggregateOperation

    DIMENSION

    A02 A03

    FACT

    2 A02 300 3 A03 200

    Join XML by XQuery

    Aggregate ATTR2 =

    Figure 11. Aggregation done by XQuery.

    XQuery Structure of aggregation per rowxquery version "1.0";

    { let $d := /element_fact/Row where $d/attr2=A02 or $d/attr2=A03 return

    {sum($d/ measure1)}

    < measure2>

    {sum($d/ measure2)}

    }

    The where clause as filtering criteria

    The source XMLdocuments with defined elements for aggregation

    Aggregated measures in a row

    Figure 12. XQuery data aggregation for measures.

    size should be increased on the application server. The performance result was acceptable exceptduring the partial aggregation when it merged the data with tuples. The operation was the actualmerge of two or more XML documents. It is central processing unit (CPU) bounded. Though theactual merge time of 100 000 would take 8 min time in total, the web application will display a few

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1210 J. FONG, H. SHIU AND D. CHEUNG

    AO2100200

    AO2

    Join Fact tables bySQL

    AO2

    300

    Fact table I

    Fact table II

    Dimension tableAggregateOperation

    Figure 13. Aggregation done by SQL.

    Figure 14. Screen capture of an OLAP report in the case study.

    hundred rows on one page, and the merge process is partitioned to many pages with a processingtime of 23 s each on average.

    The hardware and software configuration are given as follows:

    Pentium M Intel 1.86 GHz CPU 1.5 GB RAM 100 GB 7200 rpm Hard disk Windows XP Professional service pack 2 MySQL 5.0 Java Development Kit 1.5.0 Xindice 1.1-XML Database Tomcat 5.5 (Xms 256 Xmx 768)

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1211

    3.15.1. Statistical result

    Table II shows the test results of RXDW.

    XML Data Import Statisitics

    0

    2000

    4000

    6000

    8000

    10000

    12000

    14000

    500 1000 3000 5000 10000 20000 30000 40000 50000 60000 80000 100000No. of database records

    Mill

    iseco

    nds

    Cube Operations Statistics

    0

    200000

    400000

    600000

    No. of database records

    Mill

    iseco

    nds

    Aggregation 1593 1862 1954 2360 4031 6578 9985 12422 14578 18328 24531 27109Merge & Aggregation 2593 13610 13250 32875 44047 88094 132141 176188 232331 264282 352376 547750

    500 1000 3000 5000 10000 20000 30000 40000 50000 60000 80000 100000

    4. CONCLUSION

    XML plays an important role in todays EAI [23,24]. The critical performance issue is how themiddleware aggregates data from multiple data sources. In fact, the main bottleneck of the middle-ware is in the data translation and data loading processes. This issue can be solved by distributingthe data integration process on the remote server with SOA [25], which is the preprocessed resultbefore it sends it back to the middleware host. After using XQuery and XSLT during the integrationprocess, the data integration process becomes easier and increases performance amazingly.

    The main contribution of this paper shows that it is feasible to provide a relational data cubeas well as an XML data cube of a-RXDW, where the data are extracted from relational tables and

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • 1212 J. FONG, H. SHIU AND D. CHEUNG

    Table II. Test results of RXDW.Data volume Import to XML Cube retrieve and merge

    (no. of records) database (ms) XML (ms)Transaction Aggregate Aggregate With aggregate With merge operation

    Master data data dimension fact only and aggregation

    3000 500 1282 1437 1593 25933000 1000 1282 1515 1862 76103000 3000 1282 1984 1954 132503000 5000 1282 1991 2360 328753000 10000 1282 2531 4031 440473000 20000 1282 3766 6578 880943000 30000 1282 5406 9985 1321413000 40000 1282 6172 12422 1761883000 50000 1282 7531 14578 3523763000 60000 1282 7906 18328 2642823000 80000 1282 10125 24531 3523763000 100000 1282 12828 27109 547750

    XML documents. The benefit is that the user is able to select a familiar OLAP languageeitherSQL or XQueryto retrieve aggregation data from an RXDW.

    ACKNOWLEDGEMENTS

    This paper is funded by Strategic Research Grant 7002140 of City University of Hong Kong.

    REFERENCES

    1. Katz H, Chamberlin D, Draper D, Femandez M, Kay M, Robie J, Rys M, Simeon J, Tivy J, Wadler P. XQueryfrom the Experts: A Guide to the W3C XML Query Languages, Katz H (ed.). Addison-Wesley: Reading, MA, 2004;187234, 355.

    2. Zhu H, Siegel MD, Madnick SE. Information aggregationA value-added E-service. http://ebusiness.mit.edu/research/papers/106%20SMadnick,%20Siegel%20Information%20Aggregation.pdf, Working Paper #106, 2001 [1 June2001].

    3. Madnick SE, Siegel MD. Seizing the opportunity: Exploiting web aggregation. http://ebusiness.mit.edu/research/papers/144%20Madnick,Aggregator.pdf, Working Paper #144, 2001 [7 December 2001].

    4. High-Performance Information Aggregation Using XML-Based Operational Data Servers. http://www.zapthink.com/report.html?id=WP-0114 [1 May 2003].

    5. Madnick S, Zhu H. Improving data quality through effective use of data semantics. Data & Knowledge Engineering2006; 59:460475.

    6. Building XQuery Based Web Service Aggregation and Reporting Applications. http://www.stylusstudio.com/xquery tutorial.html [1 January 2004].

    7. Mid Tier XML Market Opportunities, Strategies, and Forecasts, 2004 to 2009. http://www.giichinese.com.tw/chinese/wg17637 XML market toc.html [1 January 2004].

    8. Chaudhri AB, Rashid A, Zicari R. XML Data Management: Native XML and XML-Enabled Database Systems (1st edn).Addison-Wesley: Reading, MA, 2003.

    9. Xyleme L. Xyleme, a dynamic warehouse for XML data of the web. International Symposium on Database Engineeringand Applications, Grenoble, France, 2001; 38.

    10. Pedersen D, Riis K, Pedersen TB. XML-extended OLAP querying. Proceedings of the 14th International Conferenceon Scientific and Statistical Database Management (SSDBM02), Edinburgh, Scotland, 2002.

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe

  • RELATIONALXML DATA WAREHOUSE FOR DATA AGGREGATION 1213

    11. Park B, Han H, Song I. XML-OLAP: A multidimensional analysis framework for XML warehouses. Proceedings ofDAWAK05 (Lecture Notes in Computer Science, vol. 3589). Springer: Berlin, 2005; 3242.

    12. Kimball R, Caserta J. The Data Warehouse ETL Toolkit. Wiley: New York, 2004, ISBN: 0-764-56757-8.13. Fong J, Li Q, Huang SM. Universal data warehousing based on a meta-data modeling approach. International Journal

    of Cooperative Information System 2003; 12(3):325363.14. IEEE Society Distinguish the Type of Middleware. http://dsonline.computer.org/portal/site/dsonline/menuitem.20d6846

    e1c7ed783f1a516106bbe36ec/index.jsp?&pName=dso level1 home&path=dsonline/topics/middleware&file=index.XML&xsl=generic.xsl& [1 January 2004].

    15. Aggregate Data with Xquery from Oracle Technical Network. http://www.oracle.com/technology/oramag/oracle/05-mar/o25XML.html [1 March 2005].

    16. Han J, Kamber M. Data Mining: Concepts and Techniques. Morgan Kaufmann: Los Altos, CA, 2001; 5870.17. Fong J. Information Systems Reengineering and Integration. Springer: Berlin, 2006; 2122, 160198.18. Lee D, Chu WW. CPI: Constraints-preserving inlining algorithm for mapping XML DTD to relational schema. Data &

    Knowledge Engineering 2001; 39(1):325.19. Deitel HM, Deitel PJ, Nieto TR, Lin T, Sadhy P. XML How to Program. Prentice-Hall: Englewood Cliffs, NJ, 2001;

    297318.20. Kwan I, Fong J. Schema integration methodology and its verification by use of information capacity. Information Systems

    1999; 24(5):355376.21. Poe V, Klanuer P, Brost S. Building a Data Warehouse for Decision Support (2nd edn), ch. 9. Designing the database

    for a data warehouse. Prentice-Hall: Englewood Cliffs, NJ, 1998.22. Han J, Kamber M. Data Mining: Concepts and Techniques, ch. 2. Morgan Kaufmann: Los Altos, CA, 2001; 39103.23. Chase N. XML Primer Plus. SAMS: Indianapolis, Indiana, 2003; 47113, 291373.24. Morgenthal JP, La Forge B. Enterprise Application Integration with XML and Java. Prentice-Hall PTR, 2001; 131146,

    291373, 719853.25. http://www.w3org/2003/Talks/1211-XML2003-wssoa [11 December 2003].

    Copyright q 2008 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2008; 38:11831213DOI: 10.1002/spe