The Mashup Center cache improves performance by avoiding the processing of data that doesn't change very often. This article explains how to take advantage of caching in Mashup Center 2.0 to improve the performance of feeds and data mashups.
Before discussing feeds and data mashups, note that web browser cache settings should not affect Mashup Center caching, because the generated feed headers specify a no-caching option. The one exception is when you access more than one version of Mashup Center from the same browser; in this case, clear the browser cache when switching from one Mashup Center instance to another. If you find any other situation that requires you to flush your browser cache for a feed to work properly, contact technical support or post a question to the IBM Mashup Center forum (https://www.ibm.com/developerworks/forums/forum.jspa?forumID=1470).
Feed caching is off by default. You can enable caching for a feed from the Edit Details page for that feed. In the Advanced section, expand the Caching data option. Check the Cache data from the feed option and enter the number of seconds the data should be cached. For example, if you want to cache the data for one day, enter 86400 seconds, as shown below:
For the purposes of this article, when we say that caching is enabled, we mean that the Cache data from the feed box is checked and the number of seconds entered is greater than zero.
This section provides more details on how caching works for several feed types.
Access, CSV, Excel, and XML Feeds
Mashup Center automatically caches Access, CSV, Excel, and XML feeds in the database. A copy of the uploaded resource is stored in the database, whether it was uploaded via URL or from the filesystem. The generated feed is also stored in the database and will only be regenerated if the feed is edited or the remote resource is updated. If caching is not enabled, the feed is delivered from the copy stored in the database. If caching is enabled, the feed is delivered from the Mashup Center cache. There is a difference in how the feed gets refreshed depending on whether the source is a file or URL and whether caching is enabled.
If you upload the resource from a file, the feed is regenerated if you edit it to upload a new file.
If you upload the resource via URL, the remote resource is checked to see if it has changed, but when exactly that check occurs depends on the cache setting:
- If caching is not enabled, the last modified timestamp for the remote resource is checked to see if the resource has been updated since it was cached locally. If the cached copy is outdated, a fresh copy is downloaded. If the cached copy is current or the remote resource is unavailable, then the cached copy is used for generating the feed.
- If caching is enabled, the same check is performed as when caching is not enabled, but only after the cache time has expired. Until it expires, the feed is delivered from the local cache.
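The two cases above can be summarized as a small decision function. This is only an illustrative sketch of the behavior described in this article, not Mashup Center code; the function name, its parameters, and the returned strings are all hypothetical.

```python
def refresh_action(cache_enabled, cache_expired, remote_available, remote_is_newer):
    """Decide how a URL-sourced feed is served (hypothetical helper).

    cache_enabled:    the Cache data from the feed box is checked, seconds > 0
    cache_expired:    the configured cache time has elapsed
    remote_available: the remote resource could be reached
    remote_is_newer:  its last modified timestamp is later than the stored copy's
    """
    # With caching enabled, the remote resource is not checked at all
    # until the cache time has expired.
    if cache_enabled and not cache_expired:
        return "serve from cache"
    # Otherwise the last modified timestamp decides whether to download.
    if remote_available and remote_is_newer:
        return "download fresh copy"
    # Stored copy is current, or the remote resource is unavailable.
    return "use stored copy"


print(refresh_action(True, False, True, True))   # serve from cache
print(refresh_action(False, False, True, True))  # download fresh copy
print(refresh_action(True, True, False, False))  # use stored copy
```

Note that with caching enabled, a change to the remote resource is not picked up until the cache time runs out, which is the trade-off to weigh when you choose the number of seconds.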
For example, the National Park XML Feed, which is one of the installed examples, is an excellent candidate for caching because it stores information about national parks in an XML file that does not get updated very often. Follow the steps shown below to enable caching:
- Log in as the administrative user, because that user owns the installed examples.
- On the Home:Catalog tab, find the National Park XML Feed, hover over the title, then select Show Details.
- In the Actions menu on the right, select Edit Details.
- In the Advanced section, expand the Caching data option.
- Check the Cache data from the feed box.
- Fill in the number of seconds you want to cache the data.
- At the bottom of the page, click the Finish button.
The first time you run the feed after enabling caching, the feed result will be cached. Now any user on your Mashup Center server who runs that feed will get the cached copy and avoid the overhead of generating the feed.
Remember that the XML file for the National Park XML Feed was stored in the database when the feed was created. If you update the XML file, for example to add more parks, edit the feed source to upload the modified file. Changing the data deletes the cache entry. The first person who executes the feed will cause the feed to be regenerated and cached.
When you enable feed caching for a Relational feed, the feed result is fetched from cache and the SQL statement is not executed.
Relational feed examples are installed with caching enabled for 86400 seconds. So, the Parts Order Feed and the Policy Holders Feed examples both have caching enabled for one day. A feed that takes parameters will have a separate cache entry for each unique set of parameters. If you run the Parts Order Feed with the Orderid set to 10000, then run the feed again with the Orderid set to 10001, there will be two cache entries. If the Parts Order Feed is modified and saved, all cache entries for that feed are deleted.
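To make the per-parameter behavior concrete, here is a minimal sketch of a cache that keeps one entry per unique set of parameters and drops every entry for a feed when that feed is saved. The class FeedCache, its methods, and the feed identifier "parts-order" are hypothetical names chosen for illustration; they do not reflect how Mashup Center implements its cache.

```python
import time


class FeedCache:
    """Toy cache keyed by feed id plus parameter values (illustrative only)."""

    def __init__(self, clock=time.time):
        self._entries = {}   # key -> (result, expiry timestamp)
        self._clock = clock

    @staticmethod
    def _key(feed_id, params):
        # One entry per unique set of parameters, independent of dict order.
        return (feed_id, tuple(sorted(params.items())))

    def put(self, feed_id, params, result, ttl_seconds):
        self._entries[self._key(feed_id, params)] = (result, self._clock() + ttl_seconds)

    def get(self, feed_id, params):
        key = self._key(feed_id, params)
        entry = self._entries.get(key)
        if entry is None:
            return None
        result, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired entries are discarded
            return None
        return result

    def invalidate_feed(self, feed_id):
        # Models "feed modified and saved": drop every entry for that feed.
        for key in [k for k in self._entries if k[0] == feed_id]:
            del self._entries[key]

    def entry_count(self, feed_id):
        return sum(1 for k in self._entries if k[0] == feed_id)


cache = FeedCache()
cache.put("parts-order", {"Orderid": "10000"}, "<feed>order 10000</feed>", 86400)
cache.put("parts-order", {"Orderid": "10001"}, "<feed>order 10001</feed>", 86400)
print(cache.entry_count("parts-order"))  # 2

cache.invalidate_feed("parts-order")
print(cache.entry_count("parts-order"))  # 0
```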
Mashup Center 2.0 adds insert, update, and delete support to relational feeds. The feed result is a rowcount that indicates the number of rows in the database that were affected. A relational feed that updates the database should not enable caching because Mashup Center will fetch the feed result from cache instead of executing the feed.
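The following sketch shows why caching and updating feeds do not mix. It is a simplified model, not Mashup Center code: execute_update stands in for running the SQL statement, and run_feed, the executions list, and the dictionary cache are hypothetical.

```python
executions = []   # records every time the "SQL statement" really runs
_cache = {}       # order id -> cached rowcount


def execute_update(order_id):
    """Stand-in for executing an UPDATE statement; returns a rowcount."""
    executions.append(order_id)
    return 1


def run_feed(order_id, caching_enabled):
    # With caching enabled, a cached rowcount is returned and the
    # statement is never executed again for the same parameter.
    if caching_enabled and order_id in _cache:
        return _cache[order_id]
    rowcount = execute_update(order_id)
    if caching_enabled:
        _cache[order_id] = rowcount
    return rowcount


run_feed(10000, caching_enabled=True)
run_feed(10000, caching_enabled=True)
print(len(executions))  # 1: the second call was answered from cache
```

The caller receives a rowcount both times, so nothing signals that the second update was skipped.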
Data Mashup Caching
A data mashup has two kinds of caching that the developer can set: caching of the resource that the Source operator imports into the mashup and caching of the generated feed. The default cache settings are different for the Source operator and for feeds, which sometimes causes confusion, so this section clarifies how it works.
Because feed caching is simpler, we discuss it first.
When you enable feed caching for a data mashup, the feed result is fetched from cache and the data mashup is not executed.
Data mashup examples are installed with caching off. Enable caching on the Edit Details page.
A data mashup might take query parameters. For example, the Expiring Policies Data Mashup example takes CURRENT_YEAR and NEXT_MONTH parameters, which default to 2009 and 12, respectively. If you enable caching for the Expiring Policies Data Mashup, each unique set of query parameter values gets its own cache entry. If you run the mashup again with CURRENT_YEAR set to 2010 and NEXT_MONTH set to 1, that feed result will be cached separately. If the data mashup is modified and saved, all cache entries associated with it are deleted.
Source Operator Caching
The Source operator in a data mashup caches the imported resource for one hour by default. This one hour default is the direct opposite of feed caching, which sets caching off by default, and hence the confusion. But it makes sense to cache imported resources during data mashup development because each time you preview an operator, it sends a mashup execution request, which in turn fetches the source(s) for that mashup. If you submit too many requests to an external web site during mashup development, instead of fetching the data from local cache, you might find your IP banned. After you finish developing or updating a data mashup, review the cache setting for each Source operator and set it appropriately.
To disable caching or set a different time, click on the Advanced tab and change the Refresh Interval. To turn caching off, select the "Always from source" option:
The input source is refreshed when:
- The refresh interval expires.
- You're working in the Data Mashup Builder editor and:
  - You set the refresh interval to "Always from source".
  - You reduce the refresh interval to a value that causes cached data to expire. For example, if the cache was set to one hour, then 20 minutes later you reduce the refresh interval to 10 minutes, the cached data will expire immediately because the data is already 20 minutes old.
- The feed-level caching for the input Catalog feed has expired. So, the data mashup sees that the entry expired, and even though the Source operator's refresh interval has not expired, the input source is regenerated. (However, if a Source operator imports a Mashup Center feed that has caching disabled, then the net effect is the data mashup caches that feed.)
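The interval arithmetic in the list above can be checked with a short sketch. The function source_is_expired is a hypothetical helper, and using None to represent the "Always from source" option is an assumption made only for this example.

```python
def source_is_expired(age_seconds, refresh_interval_seconds):
    """Return True if cached source data must be fetched again.

    refresh_interval_seconds=None models "Always from source" (an
    assumption of this sketch, not a documented Mashup Center value).
    """
    if refresh_interval_seconds is None:
        return True
    return age_seconds >= refresh_interval_seconds


age = 20 * 60  # the cached data is 20 minutes old

print(source_is_expired(age, 60 * 60))  # False: still within the one hour default
print(source_is_expired(age, 10 * 60))  # True: interval reduced to 10 minutes
print(source_is_expired(age, None))     # True: Always from source
```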
The Mashup Center 2.0 National Park Forecast Data Mashup, an installed example, imports the National Park XML Feed, which we discussed above. For each zipcode in the National Park XML Feed, the National Park Forecast Data Mashup gets the current weather from Yahoo!. Below is a snapshot of the Source operator with the default zipcode of 94618 loaded:
Notice the query parameters in the URL: ?p=94618&u=f
When a feed executes, the cache is unique for each set of query parameters. At runtime, this mashup iterates through the XML feed and fetches the weather for each zipcode. There will be one cache entry for zipcode 94618, another cache entry for zipcode 94123, and so on.
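As a rough illustration of one cache entry per set of query parameters, the sketch below derives a key from a URL. The host weather.example.com and the path are placeholders, not the real service URL, and cache_key is a hypothetical function. Sorting the parameters is a choice made for this sketch; this article does not say whether Mashup Center normalizes parameter order.

```python
from urllib.parse import parse_qsl, urlsplit


def cache_key(url):
    """Build a key from the URL and its query parameters (illustrative)."""
    parts = urlsplit(url)
    # Sort the parameters so the same set always yields the same key.
    params = tuple(sorted(parse_qsl(parts.query)))
    return (parts.scheme, parts.netloc, parts.path, params)


base = "http://weather.example.com/forecast"  # placeholder URL
key_94618 = cache_key(base + "?p=94618&u=f")
key_94123 = cache_key(base + "?p=94123&u=f")

print(key_94618 != key_94123)                      # True: separate cache entries
print(key_94618 == cache_key(base + "?u=f&p=94618"))  # True: same parameter set
```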
The Source operator caches the Yahoo! result for the default one hour. When you save the mashup, evaluate that refresh interval and decide what the best setting might be for your situation.
Incidentally, if the source is imported from the Catalog and takes query parameters, then, like the Yahoo! example, separate cache entries will be created, one for each unique set of query parameters. Unlike the Yahoo! feed, if the Catalog feed is updated, all of its related cache entries will be deleted regardless of the refresh interval.
If you set any of the sources for a data mashup to Always from source, don't enable feed-level caching for the mashup.
Also beware of importing feeds that do an update as part of running the feed, such as a Relational feed that updates the database.
Feeds and Source Operators Use Different Caches
Internally, each cached result is stored under a name (the "key") and is later fetched by that name. When you have DEBUG enabled for the Mashup Center log, you'll see the cache keys output to the log file. Feed caching and Source operator caching create different cache names, so they don't share each other's cache.
For example, let's say you create a new feed using the Atom or RSS Feed plug-in and you register the IBM Software News feed:
If you enable caching for that registered feed, internally it creates a cache name that looks something like this (for readability, the name is split across two lines):
That initial number, 352, is the entry id for the Mashup Center feed.
Now create a data mashup with a Source operator that imports that Mashup Center feed. The refresh interval is set to the default (one hour), so it gets cached. Internally, the name for the cache is the URL of the Mashup Center feed that it imports:
Up through V184.108.40.206, the Source operator always creates its own cache and won't use feed-level caching. Starting in V220.127.116.11, the Source operator takes advantage of feed-level caching.
When Feed Execution Fails
When a feed fails to execute -- for example, if the source is unavailable -- that feed result is not cached (exceptions are listed in the section titled When a Feed Error Gets Cached). However, it's possible for a data mashup Source operator to cache a partial or empty result. This section explains how this can happen.
Consider a chain of feeds where Mashup 3 loads Mashup 2, and Mashup 2 loads Atom/RSS Feed 1:
- Atom/RSS Feed 1
  - The source is an external URL.
- Mashup 2
  - The Source operator loads Atom/RSS Feed 1.
  - The Source operator sets the refresh interval to cache the result (i.e., it is not set to always from source).
  - If the feed fails to load, the Source operator WON'T CACHE the result.
Most Source operator load failures don't cause the mashup as a whole to fail. Instead, the mashup continues processing and returns results without that Source operator. So, the mashup may return partial results, or even an empty result.
- Mashup 3
  - The Source operator loads Mashup 2.
  - The Source operator sets the refresh interval to cache the result.
  - If Mashup 2 returns a partial or empty result because a Source operator failed to load, Mashup 3's Source operator WILL CACHE that result because Mashup 3 doesn't know that Mashup 2 has a Source operator that failed to load.
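The chain can be simulated in a few lines. This is a deliberately simplified model of the behavior described in the list, not Mashup Center code: the SourceOperator class, the LoadError exception, and the three functions are hypothetical, and the toy cache ignores refresh intervals and parameters.

```python
class LoadError(Exception):
    """Raised when a source cannot be loaded."""


class SourceOperator:
    """Toy Source operator that caches a successful load (no expiry)."""

    def __init__(self, loader):
        self._loader = loader
        self.cached = None   # None means nothing is cached

    def load(self, *args):
        if self.cached is not None:
            return self.cached
        result = self._loader(*args)   # a LoadError here means nothing is cached
        self.cached = result
        return result


def feed_1(available):
    """Atom/RSS Feed 1: fails when the external URL is unavailable."""
    if not available:
        raise LoadError("external URL unavailable")
    return ["item-a", "item-b"]


mashup_2_source = SourceOperator(feed_1)


def mashup_2(feed_1_available):
    try:
        items = mashup_2_source.load(feed_1_available)
    except LoadError:
        items = []   # the mashup continues without that source
    return [item.upper() for item in items]


mashup_3_source = SourceOperator(mashup_2)


def mashup_3(feed_1_available):
    return mashup_3_source.load(feed_1_available)


print(mashup_3(False))  # []: Feed 1 is down, Mashup 2 returns an empty result
print(mashup_3(True))   # []: Feed 1 is back, but the empty result was cached
```

Mashup 2's Source operator caches nothing because it sees the failure directly, while Mashup 3's Source operator only sees a normal, empty result and caches it.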
Starting in V18.104.22.168, the Source operator will not cache the result if there was an earlier load failure, so this will fix the problem with Mashup 3. However, some plugins sometimes return a feed containing the error information instead of returning the error directly. In this case, the error will be cached. More details are described in the section below titled When a Feed Error Gets Cached.
One way to avoid caching partial or empty results in mashup chains is to set the refresh interval in the Source operators to Always from source; however, that won't perform as well as caching. Enabling caching, and deciding for how long, is a trade-off between performance and tolerance for old (and possibly partial or empty) data.
Another nuance to consider is plug-in specific. If Atom/RSS Feed 1 has caching enabled and it fails, the feed will return an error and won't be cached at the feed level. But what if Feed 1 was created by the Access, CSV, Excel, or XML plug-ins? Remember that these plug-ins store a copy of the feed result in the database. If the feed has executed successfully at least once, a copy of the result is stored in the database; if the feed later fails, the last good copy stored in the database will be returned and will also be cached for the feed.
When a Feed Error Gets Cached
As previously noted, if caching is enabled for a feed and feed execution fails, MashupHub does not cache the feed. However, up through V22.214.171.124, some MashupHub plugins sometimes return a feed containing the error information and do not indicate that an error occurred. In these cases, MashupHub will cache that feed; and, even when the resource becomes available, an error will continue to be returned from cache until the cache expires. The following plugins are affected:
- IBM Information Server
- Web service
If a feed accesses a service that is often unavailable or returns errors, caching can cause confusion, especially if other feeds accessing the same resource succeed. In this case it might be prudent to turn caching off or set it to a low value. As noted earlier in this article, caching is configured in two places:
- Feed caching is off by default and can be enabled on the Edit Details page for that feed.
- Data mashup Source operator caching is on by default and set to one hour on the Advanced tab.
About the author
Jean Anderson is a member of the IBM Information Management Advanced Technology team and works as a development lead on MashupHub. You can contact her at firstname.lastname@example.org.