IBM®
Skip to main content
    Country/region select      Terms of use
 
 
   
     Home      Products      Services & solutions      Support & downloads      My account     

developerWorks  >  Lotus  >  Forums & community  >  Best Practice Makes Perfect

Best Practice Makes Perfect

A collaboration with Domino developers about how to do it and how to get it right in Domino

Got the following question and it seemed like a good blog topic:

I have created a categorized notes view with the column formula for the categorized column as:

@Explode(Item_Desc+"-"+Month+"~"+Store_Loc+"-"+Store_Name+"~"+Month+"-"+Store_Loc+"-"+Store_Name;"~")

Using @Explode for categorized column actually makes it look like 3 different categorized views. This will be used as an embedded view that will help with 3 different types of queries, so that I can save on the number of views and its indexes etc.

However, using categorized view supposedly results in performance degradation. Will it become worse if I use the method like above (as its more like 3 categories)  and increase view indexing? Can this possibly lead to a corrupted views?

Before I address the stated question, a couple of side notes.

  • The formula is more efficient, readable and reliable if written this way: (Item_Desc+"-"+Month) : (Store_Loc+"-"+Store_Name) : (Month+"-"+Store_Loc+"-"+Store_Name)
    (Why more reliable? Because you don't have a problem if ~ appears in your data. Unlikely, I know, but who knows these days what someone might name their store?)
  • There's a tradeoff (there's always a tradeoff) between performance and space usage here. By using this formula for a computed field on the form, you could design a view that just refers to the computed field instead of having the formula evaluated in a view column. In this case, the same formula is probably not used in any other views, so it's probably not worth it, but it's something I always consider when I have a complicated formula in a view.
As for the questions actually asked, instead of added by busybody Andre:

Will [performance be] worse if I use the method like above (as its more like 3 categories) and increase view indexing?

Worse compared to what? The alternative is to use three separate views, one for each type of key. The short answer is, performance will probably be better with separate views, and there are other reasons that a combined view is less than optimal also.

Multiply-categorized views are not so huge a problem that we will shoot you for using them. This is not an avoid-at-all-costs thing, just a use-caution thing. They have their uses. I'm not so sure this is a good one, though.

On the up side, there's some per-view overhead which makes one view preferable to three. If a document appears multiple times in a view, the column values are only calculated once, though the view entry is duplicated multiple times in the view index b-tree. So the index of a view where each document appears three times, is roughly three times the size of a view where each document appears only once -- no significant space saving.

On the down side, view performance is greater than O(n) to the number of view entries -- a view containing three times the number of entries is more than three times slower. So when you do your lookup, the user may have to wait noticeably longer for the combined view to index any recently added/modified documents, than they would for one of the separate views. This would be true even if the performance were linear.

On the up side, the more frequently a view is used, the more likely it is that its index is up to date already when you go to access it. The combined view would be accessed three times as frequently as the single views, so it's more likely to be up to date. If documents are created and modified infrequently, the combined view might be faster on average.

On the down side, the combined view forces you to put all the columns you might need to read as your lookup result, in the same view index. Let's take the extreme situation and suppose that you have a separate "result" column for each type of key. A single document appears three times in the index with a copy of each column, for a total of nine values, plus the key makes twelve. If you had three separate views, each with only one result column, each document stores one key and one result in the index, for a total of six values -- half the size.

Note: I'm not quite clear on how the categorized column is stored with the view entries. Obviously the category values are stored as a separate sorted tree each with its own set of entries, but I think the view entries also contain a copy of the complete value from the category column. If you use the ColumnValues property you can retrieve the complete category value -- multiple values -- so it seems likely that the multiple values are stored multiple times. The amount of storage needed is the square of the number of categories. This obviously makes the combined view even less desirable.

If I'm right about this, any use of categories involves some duplication of data, which is worse if there are fewer documents in a category. If you do your lookups using LotusScript, you can search multiple uncategorized sorted columns by passing an array as your key value -- so you could have a lookup view with separate sorted Item_Desc and Month columns, and this would be a faster view for lookups than a view that's categorized by Item_Desc+"-"+Month. I notice you have two keys that are the same except that one includes a month. You might be able to get away with just two lookup views instead of three, by sorting one by Store_Loc, then Store_Name, then Month, and using an array of either two or three values as your key, depending whether you wanted to search by month.

Again on the down side, with the combined view, you have risk of a conflict between keys of different types. For instance, if you look up a (Item_Desc+"-"+Month) and the resulting value happens to be the same as a (Store_Loc+"-"+Store_Name). You know your data, so you might have some idea of how likely this is. You could make problems less likely by distinguishing the keys in some other way, e.g. with an extra character at the beginning or end ( ("I"+Item_Desc+"-"+Month) : ("L"+Store_Loc+"-"+Store_Name) : ("M"+Month+"-"+Store_Loc+"-"+Store_Name) ). Of course, this will also slow things somewhat since it makes your formula more complex and probably makes the search take a tiny bit longer.

A final disadvantage of the combined view: There's an upper limit to the size of a view index. It's a rather high limit, so you don't have to worry about this unless you have a whole lot of documents, but if that is the case, you can have three times as many documents with separate views, before you run up against this limit.

So overall, it looks like in almost all cases, the combined view is going to be slower than the separate views and cause more potential problems.

Can this possibly lead to a corrupted view?

No, except insofar as you might run up against that index size limit. I'm not sure exactly what happens in that case -- anyone care to comment?

Andre Guirard | 28 November 2007 05:00:00 PM ET | Plymouth, MN, USA | Comments (6)

Search this blog 

Disclaimer 

    About IBM Privacy Contact