ShowTable of Contents
Introduction
This article provides a brief introduction to GAIAN Database (hereafter referred to as GaianDB) and shows how it can be quite useful. Our example involved collecting data about vehicles sold from 10 different garages, each with their own Microsoft® Excel spreadsheets, importing them into IBM® Mashup Center (Mashup Center) to create a feed, and building the mashup from that feed.
This article compares a method of doing this via native mashups with how it's done with the help of GaianDB, and should be useful to anyone creating a mashup using data collected from spreadsheets, database tables or any other data sources.
Native mashups
In the pre-GaianDB days, this was our process:
- Collect all the spreadsheets together on a single server.
- Use mashup hub to load each page of the spreadsheets in turn, creating 20 different feeds, and combine the 20 feeds to create a single mashed-up view of the spreadsheets.
- This involved merging each spreadsheet’s two pages, combining the result, and then transforming the combined result to alter the repeating element, so that the final combined feed could be searched, as shown in figure 1.
Figure 1. Using mashup hub to combine 20 feeds
4. Then, to get results “by brand”, we added the output from Step 3 into another data mashup, and performed other functions---filtering, grouping, and sorting---to get the required output (see figure 2).
Figure 2. Getting results by brand
If new spreadsheets became available, this view needed updating with the new spreadsheets. This was not plausible, however, for more than approximately 20 spreadsheets, due to lack of processing speed in the browser.
Using GaianDB
The GaianDB allows data to be queried from many different sources at once, including relational database tables, flat files, and spreadsheets, by federating data from multiple sources (so, you could say a field in one place is the same data as a field in another place, even though it has a different name).
In this case, it’s behaving as an abstraction layer above all the spreadsheets, meaning that, instead of creating 22 feeds to obtain results by brand, we need only create one that connects to the GaianDB’s
Derby database and runs a query (see figure 3).
Figure 3. SQL statement for query
The “(?)” is translated by Mashup Center into a URL parameter, and so instantly you can call a URL with, for example, ?brand=Audi, and this performs the query across the spreadsheets and returns the results as a feed.
You can do more complicated aggregations as part of the SQL statements that would be cumbersome to do with Mashup Center. However, before this can be done, some configuring of GaianDB must be done. To do this:
- Unzip the GaianDB .zip folder you downloaded from developerWorks, and run ./launchGaianServer.sh (or bat for Microsoft Windows®). This looks for the other GaianDB nodes in the network which it automatically forms a connection. This command must be run on every machine with a spreadsheet to share.
- Configure the config.properties file:
a) First, create a “Logical Table” containing these columns in the spreadsheet:
LGARAGES_DEF=Garage_Ref VARCHAR(50),Brand VARCHAR(10),Make_Sold VARCHAR(100),Model VARCHAR(80), Number_Sold VARCHAR(80) etc…
b) Then define where the files are, and what the data range is (refer to the Gaian docs part of the downloaded ZIP ):
LGARAGES_DS0_VTI=com.ibm.db2j.GExcel
LGARAGES_DS0_OPTIONS=MAP_COLUMNS_BY_POSITION
LGARAGES_DS0_ARGS=/home/Documents/OxonAudi.xls,Sales,A14,AA118,False
Alternatively, the following SQL API can be used for creating the same configuration:
call setltforexcel ('LGARAGES', '/home/Documents/OxonAudi.xls,Sales,A14,AA118,false')
c) If you later need to add more spreadsheets, there are two ways to do it. You can either:
- Copy and paste into the GaianDB config_properties files the three lines in the preceding step (b) for each new spreadsheet you are adding and, without your having to restart anything, the new data will be included in the query the next time the mashup’s feed is used; or
- Dynamically add the new data sources programmatically, using the SQL API, without your having to modify the GaianDB config_properties files at all. This can even be done from the Mashup Center environment itself. The SQL API call would look like this:
call setdsExcel('LGARAGES', '1',
'/home/Documents/OxonAudi2.xls,Inventory,A3,AA57,false' ,
'MAP_COLUMNS_BY_POSITION', '')
The XL files or other data sources don’t need to be on one single machine; they can be spread across multiple machines. The GaianDB will take care of maintaining the connections between all remote XL files and combining them for you with its built-in discovery mechanism.
For instance, each Garage could host its own XL file and be accessed by the GaianDB seamlessly. There's no longer any need to copy them onto one single machine or copy/paste into one single spreadsheet, as shown in figure 4.
Figure 4. Federated garage data sources
The GaianDB lets us mix and match different data sources under the federation layer while still providing a single view of the data. Therefore one of the garages could actually have the data held in a DB2 database instead of an XL file, or held in any other RDBMs, like ORACLE, MySQL and SQLServer.
For example, the Garage 4 data could be a mixture of a local XL file and some records in DB2 and be merged automatically by the GaianDB layer, which would not impact the Mashup Hub application at all. The Mashup Hub still sees only one logical table aggregating all records from all these heterogeneous data sources.
The GaianDB maintains connection to other GaianDBs automatically in a way that is quite similar to a human social network (this is the default behavior, but it can be changed). Therefore, if a Garage IP address changes, or the machine is moved from one network domain to another, the GaianDBs will still keep the connections, and the Mashup Hub data feeds will not be impacted at all. Only one feed will be needed as input to the mashup.
The GaianDB layers therefore provide a very nice abstraction layer, keeping data away from machine implementation issues (heterogeneity, location, partition, etc.).
Conclusion
In summary, this approach not only has saved us hours of development time but also, because the GaianDB can discover and access datasources wherever they are located, and maintain connection between all remote Excel files, there is no need for the business to either collate the spreadsheets or place them in one central repository.
Resources
GAIAN Database overview:
https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=f6ce657b-f385-43b2-8350-458e6e4a344f
IBM Mashup Center:
http://www-01.ibm.com/software/info/mashup-center/
developerWorks article, “IBM Mashup Center: OpenSocial interoperability”:
http://www.ibm.com/developerworks/lotus/library/mashups-opensocial/index.html
IBM Mashup Center product page:
http://www-01.ibm.com/software/info/mashup-center/
IBM Mashup Center Forum:
http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1470
About the authors
Patrick Dantressangle is a Senior Technical Staff Member working in information management for IBM Emerging Technology Services. He has worked on RDBMs since 1990 and at IBM since 1996. Mr. Dantressangle is now working on new emerging information management issues, and within the International Technology Alliance (ITA) project was one of the architect who created the GaianDB. He can be reached at
DANTRESS@uk.ibm.com.
Kevin Brown spent his first 10 years in IBM's Services organization as a developer, then architect, before enjoying a further six years in Emerging Technology Services. Here he works with innovative customers on Proof of Concept / Demonstrations, using new and emerging technologies ranging from the integration of many different types of sensors, mobile devices, Web and mashup technology, augmented reality, and brain---computer interfaces.