Understanding general performance characteristics of the catalog, feed generators, and data mashup builder
Added by IBM | Edited by Stanley Bradbury on May 9, 2012 | Version 7
This topic describes general performance characteristics of the catalog, feed generators, and the data mashup builder. This information was included in the Mashup Center 2.0 release notes, which are located on the Mashup Center support site.
Contents: IBM Mashup Center Performance Tuning Guide
The main factors that affect the performance of the catalog, feed generators, and the data mashup builder are as follows:
- The aggregated size of data that is processed by MashupHub.
- The complexity and the number of operations in typical data flow mashups.
- The number of concurrent users.
- The size and usage patterns of the mashup catalog, for example, the number of objects in the catalog and the number of searches that are supported.
- The caching and feed response characteristics when using network data sources.
- The memory allocated to the server.
The guidelines in this section are based on WebSphere Application Server with the following configuration:
- JVM initial heap size: 384 MB
- Maximum heap size: 1024 MB
- JVM parameters: -Xgcpolicy:gencon -XgcthreadsN (where N is twice the number of CPU cores)
- Generational garbage collector enablement: Set by adding the setting -Xgcpolicy:gencon to the JVM generic arguments in WebSphere Application Server. Enable verbosegc logging and monitor heap use with the previous settings. All the JVMs running on a physical server must fit into the physical memory of the server. See the instructions for changing the JVM settings in WebSphere at the following Web site: http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/trun_jvm.html .
- Web container Thread pool: minimum 50, maximum 100.
- Database Connection Pool for the MashupHub data source: minimum 10, maximum 50, Statement Cache Size of 100 statements.
- Servlet caching: enabled.
- Performance monitoring on the server: disabled.
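As a rough illustration of the JVM settings listed above, the following Python sketch builds the generic JVM arguments string for a given core count and checks that the maximum heaps of all JVMs fit into physical memory. The function names are hypothetical helpers, not part of Mashup Center or WebSphere Application Server. Passing -verbose:gc as a generic argument is an assumption; verbose garbage collection can also be enabled elsewhere in the server configuration. The memory check is deliberately simple: a JVM's real footprint is larger than its maximum heap, so treat the result as an upper bound on what fits.

```python
def generic_jvm_arguments(cpu_cores, verbose_gc=True):
    """Build the generic JVM arguments string described in this section.

    cpu_cores is the total number of CPU cores on the server; the number of
    GC threads is set to twice that value, as the guideline states.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    args = ["-Xgcpolicy:gencon", "-Xgcthreads%d" % (2 * cpu_cores)]
    if verbose_gc:
        # Assumption: verbose GC logging is requested through a JVM option.
        args.append("-verbose:gc")
    return " ".join(args)


def heaps_fit_in_memory(max_heaps_mb, physical_memory_mb):
    """Return True if the summed maximum heap sizes do not exceed physical memory."""
    return sum(max_heaps_mb) <= physical_memory_mb


if __name__ == "__main__":
    # Example: a server with four cores in total (an assumed value).
    print(generic_jvm_arguments(4))
    # Two JVMs with a 1024 MB maximum heap each on a 4 GB server.
    print(heaps_fit_in_memory([1024, 1024], 4096))
```

The resulting string would be pasted into the generic JVM arguments field; the initial and maximum heap sizes are configured separately.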
Following are the general performance characteristics for a regular low-end server with a typical hardware configuration, which includes a dual-core chip on 2-way servers and 4 GB memory:
- Data feed size: Feeds up to 30 MB output size can be created. The optimal size for feeds is up to the 2 MB range for the feed output. The input size needed to produce a 2 MB output feed depends on the nature of the input feed. For example, a 500 KB Excel input file can generate a 2 MB output XML feed.
- Complexity and the number of operations in typical data mashups: There is no limit on the number of operators. Performance is best with fewer than eight operators and 100 to 1,000 rows of data within the flow.
- For large feeds that generate an output feed of 10 MB or more, the server is constrained by the amount of memory that it can use. If multiple users execute large feeds at the same time, they can exhaust the Java heap memory. In that case, increase the maximum heap setting to 1400 MB.
- For feeds that use back end data sources, for example, relational databases, enable feed caching to improve performance. Feed caching shows a significant improvement for relational, Web services, and other feeds that require access to a back end server. Feeds that are generated from CSV and Excel files do not require feed caching.
- Data mashup Source operator: If caching is turned on, performance will improve the second and subsequent times that the URL for the Source operator is accessed.
- Preview and Load in the Data mashup builder might perform slowly for input XML feeds greater than 2 MB. When you specify large feeds that have several megabytes or more of data in the Source operator, it might take several minutes for the Source operator to finish loading. The time depends on the volume of data and the current server load.
- Feed Policy Processing: Enabling feed policy processing decreases throughput. Depending on the client application workload when you enable feed policy processing, MashupHub might need additional resources:
- Increase the web container thread pool size on the server to at least twice the number of concurrent clients expected in typical client application workloads. Choose the web container thread pool, and then change the maximum size. See the following Web site for information about changing the thread pool settings:
- Increase the database connection pool size for the MashupHub data source to twice the number of concurrent clients expected in typical client application workloads. See the following Web site for information about increasing the database connection pool size:
- Data mashup guidelines: As the complexity of a feed transformation increases, performance can decrease. Following are the key elements of transformation complexity:
Number of operators
Types of sources
Size of feeds and the amount of data within feeds
Types of operators
Size of the resulting feeds
- Caching data mashup feeds can improve capacity and server response time. Working from cached feeds eliminates the computation and I/O costs associated with creating a mashup, much as a database cache does. Enable caching whenever it is appropriate.
For example, consider a dashboard mashup application for the total sales by month. It is probably sufficient to work from orders received an hour or a day earlier, so caching can be enabled. In contrast, if the mashup application supports a service representative scenario, then waiting until the cache expires before seeing the most recent order is probably not acceptable.
As the number of feeds in a data mashup increases without caching, feed performance decreases. Enable data mashup source caching as appropriate. Some sources, such as an XML file or a spreadsheet, can be fast, whereas other feed sources can be expensive to retrieve.
- Sources from external sites can affect performance because of network and site availability. To offset the network cost, enable caching whenever possible at the feed, data mashup, or mashup application level.
- Feed size: MashupHub does not perform as well for large feeds as it does for small feeds. To reduce feed size, filter the feed source, for example, an external feed or data from a relational data source. For a relational source, filter the data using an SQL query. When creating a mashup, reduce the amount of data by adding a Filter operator after the Source operator.
- Some operators affect performance:
- The For Each operator affects performance because of the network cost associated with this operator. The For Each operator takes values from each entry in a feed and fetches data from another feed based on a variable in the second feed. The variable in the second feed is substituted with the values from the first feed. For each entry in the first input, there is a corresponding URL for the second input from which data is fetched. The network cost of fetching data can increase and slow the mashup. For example, if the first feed has 500 entries, the second source is fetched once for each entry, or 500 times in total. If each retrieval from an external site costs 1 second, the total retrieval time is 500 seconds, which is over 8 minutes.
- The Merge operator affects performance because of the complexity of the merge operation.
- The Sort operator is expensive in terms of CPU usage.
- Eliminate any unnecessary data in the output feed and tailor the mashup to your specific needs. The size of the output feed determines both the serialization cost and the network cost of transferring data from the server to the client.
- Data mashups are more CPU-intensive than base feeds.
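Two of the guidelines above reduce to simple arithmetic: the network cost of the For Each operator, and the pool sizes suggested when feed policy processing is enabled. The following Python sketch works through both. The function names and dictionary keys are hypothetical and are not part of any Mashup Center API. The sketch also assumes that a pool maximum is never lowered below the baseline configuration used in this topic (100 web container threads, 50 database connections).

```python
def for_each_fetch_seconds(first_feed_entries, seconds_per_fetch):
    """Estimate the total network time spent by a For Each operator.

    The second source is fetched once for every entry in the first feed,
    so the cost grows linearly with the number of entries.
    """
    if first_feed_entries < 0 or seconds_per_fetch < 0:
        raise ValueError("arguments must be non-negative")
    return first_feed_entries * seconds_per_fetch


def feed_policy_pool_maximums(concurrent_clients,
                              thread_pool_max=100,
                              connection_pool_max=50):
    """Suggest pool maximums when feed policy processing is enabled.

    Both pools are sized to twice the expected number of concurrent clients.
    Assumption: an existing maximum is kept if it is already large enough.
    """
    if concurrent_clients < 0:
        raise ValueError("concurrent_clients must be non-negative")
    target = 2 * concurrent_clients
    return {
        "web_container_thread_pool_max": max(thread_pool_max, target),
        "connection_pool_max": max(connection_pool_max, target),
    }


if __name__ == "__main__":
    total = for_each_fetch_seconds(500, 1.0)
    print("For Each: %.0f seconds (%.1f minutes)" % (total, total / 60))
    print(feed_policy_pool_maximums(80))
```

With 500 entries at 1 second per fetch, the estimate matches the 500 seconds quoted for the For Each operator. Caching the second source, where the data allows it, is the main way to lower that figure.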