IBM®
Skip to main content
    Country/region select      Terms of use
 
 
   
     Home      Products      Services & solutions      Support & downloads      My account     

developerWorks  >  Lotus  >  Forums & community  >  Best Practice Makes Perfect

Best Practice Makes Perfect

A collaboration with Domino developers about how to do it and how to get it right in Domino

Got the following question and it seemed like a good blog topic:

I have created a categorized notes view with the column formula for the categorized column as:

@Explode(Item_Desc+"-"+Month+"~"+Store_Loc+"-"+Store_Name+"~"+Month+"-"+Store_Loc+"-"+Store_Name;"~")

Using @Explode for categorized column actually makes it look like 3 different categorized views. This will be used as an embedded view that will help with 3 different types of queries, so that I can save on the number of views and its indexes etc.

However, using categorized view supposedly results in performance degradation. Will it become worse if I use the method like above (as its more like 3 categories)  and increase view indexing? Can this possibly lead to a corrupted views?

Before I address the stated question, a couple of side notes.

  • The formula is more efficient, readable and reliable if written this way: (Item_Desc+"-"+Month) : (Store_Loc+"-"+Store_Name) : (Month+"-"+Store_Loc+"-"+Store_Name)
    (Why more reliable? Because you don't have a problem if ~ appears in your data. Unlikely, I know, but who knows these days what someone might name their store?)
  • There's a tradeoff (there's always a tradeoff) between performance and space usage here. By using this formula for a computed field on the form, you could design a view that just refers to the computed field instead of having the formula evaluated in a view column. In this case, the same formula is probably not used in any other views, so it's probably not worth it, but it's something I always consider when I have a complicated formula in a view.
As for the questions actually asked, instead of added by busybody Andre:

Will [performance be] worse if I use the method like above (as its more like 3 categories) and increase view indexing?

Worse compared to what? The alternative is to use three separate views, one for each type of key. The short answer is, performance will probably be better with separate views, and there are other reasons that a combined view is less than optimal also.

Multiply-categorized views are not so huge a problem that we will shoot you for using them. This is not an avoid-at-all-costs thing, just a use-caution thing. They have their uses. I'm not so sure this is a good one, though.

On the up side, there's some per-view overhead which makes one view preferable to three. If a document appears multiple times in a view, the column values are only calculated once, though the view entry is duplicated multiple times in the view index b-tree. So the index of a view where each document appears three times, is roughly three times the size of a view where each document appears only once -- no significant space saving.

On the down side, view performance is greater than O(n) to the number of view entries -- a view containing three times the number of entries is more than three times slower. So when you do your lookup, the user may have to wait noticeably longer for the combined view to index any recently added/modified documents, than they would for one of the separate views. This would be true even if the performance were linear.

On the up side, the more frequently a view is used, the more likely it is that its index is up to date already when you go to access it. The combined view would be accessed three times as frequently as the single views, so it's more likely to be up to date. If documents are created and modified infrequently, the combined view might be faster on average.

On the down side, the combined view forces you to put all the columns you might need to read as your lookup result, in the same view index. Let's take the extreme situation and suppose that you have a separate "result" column for each type of key. A single document appears three times in the index with a copy of each column, for a total of nine values, plus the key makes twelve. If you had three separate views, each with only one result column, each document stores one key and one result in the index, for a total of six values -- half the size.

Note: I'm not quite clear on how the categorized column is stored with the view entries. Obviously the category values are stored as a separate sorted tree each with its own set of entries, but I think the view entries also contain a copy of the complete value from the category column. If you use the ColumnValues property you can retrieve the complete category value -- multiple values -- so it seems likely that the multiple values are stored multiple times. The amount of storage needed is the square of the number of categories. This obviously makes the combined view even less desirable.

If I'm right about this, any use of categories involves some duplication of data, which is worse if there are fewer documents in a category. If you do your lookups using LotusScript, you can search multiple uncategorized sorted columns by passing an array as your key value -- so you could have a lookup view with separate sorted Item_Desc and Month columns, and this would be a faster view for lookups than a view that's categorized by Item_Desc+"-"+Month. I notice you have two keys that are the same except that one includes a month. You might be able to get away with just two lookup views instead of three, by sorting one by Store_Loc, then Store_Name, then Month, and using an array of either two or three values as your key, depending whether you wanted to search by month.

Again on the down side, with the combined view, you have risk of a conflict between keys of different types. For instance, if you look up a (Item_Desc+"-"+Month) and the resulting value happens to be the same as a (Store_Loc+"-"+Store_Name). You know your data, so you might have some idea of how likely this is. You could make problems less likely by distinguishing the keys in some other way, e.g. with an extra character at the beginning or end ( ("I"+Item_Desc+"-"+Month) : ("L"+Store_Loc+"-"+Store_Name) : ("M"+Month+"-"+Store_Loc+"-"+Store_Name) ). Of course, this will also slow things somewhat since it makes your formula more complex and probably makes the search take a tiny bit longer.

A final disadvantage of the combined view: There's an upper limit to the size of a view index. It's a rather high limit, so you don't have to worry about this unless you have a whole lot of documents, but if that is the case, you can have three times as many documents with separate views, before you run up against this limit.

So overall, it looks like in almost all cases, the combined view is going to be slower than the separate views and cause more potential problems.

Can this possibly lead to a corrupted view?

No, except insofar as you might run up against that index size limit. I'm not sure exactly what happens in that case -- anyone care to comment?

Andre Guirard | 28 November 2007 05:00:00 PM ET | Plymouth, MN, USA | Comments (8)


 Comments

1) View index size
Karl-Henry Martinsson | 11/28/2007 7:39:25 PM

What is the limit for a view index? And is there a total index limit to a database?

2) re: View index size
Andre Guirard | 11/28/2007 10:50:03 PM

I assume the limit is unchanged from 6.5, 130 MB. Anyone have other information?

3) re: View index size
Kerr | 11/29/2007 7:00:14 AM

I played around stress testing views a little while back.

Partly trying to see what happened in java with a call view.getTopLevelEntryCount() where more than 32767 docs exist. (It works fine, there is a documentation error). See: { Link }

Anyway, some of the view indices I was building were way way over 130MB; over 1GB. Or at least that is what the Admin client was saying the space taken up by the view was.

The view was build by indexing documents with 5 multi value fields, each field containing 10 - 100 values. The view had 5 columns, matching each field, each column selected to show multiple documents per value, though not categorized.

Obviously each doc could generate hundreds of thousands of view entries. I had over 10 million entries with out any problem. It took hours to index, but once it was I could look up discrete values in java extremely quickly. Adding new docs with single values in the fields would also update the index pretty quickly.

I've just ramped up some stress testing on a little experiment I've been doing and the view I'm tesing on just indexed at over 200MB. This one is a categorized view, built from docs with a multi value field. I calculate all combinations of the multiple values on a particular doc and those combinations become the categories. So if there are 10 values you end up with 2^10 categories. My current test has 1280 docs in the view and the only limit I'm finding is in the column formula calculating the combinations.

Just to clarify, these are stress tests and investigations, not anything I'd particularly endorse as part of an everyday domino design. ;)

--

Kerr

4) re: View index size
Kerr | 11/29/2007 7:01:00 AM

Oh, these tests were don one 6.5 servers.

--

Kerr

5) re: View index size
Harkpabst Meliantrop | 11/29/2007 4:26:03 PM

I've seen view indexes of many hundreds of megs before (not that those views performed too well). The limit noted in Designer help is definitely not correct (any more?). Ben's list of known Domino limits contains some info why the limit should be 128 GiB.

{ Link }

I would also expect that exploding a string should actually perform better that using list concatenation. At least if Damien Katz' classic blog post "Formula language's dirty secret" still holds true.

{ Link }

6) @explode vs list concatenation:
Ganapathiram Natarajan | 11/30/2007 9:07:49 PM

@5, thanks for the link.

There was a follow-up on Rocky Oliver's blog as well (also check comment #10 in there):

{ Link }

I think the point about @explode was from code maintenance stand point and list concatenation scores over @explode. The best performance scenario would be to compute and save it in the document and in that case performance advantage of @explode may not be noticeable and list concatenation would make sense from code maintainability perspective.

7) Large view indices
Erik Brooks | 12/4/2007 11:43:12 AM

I've had plenty of views (multi-categoried and otherwise) for several years that are > 130 mb, even back in the R5 days. IIRC the documented limit hasn't changed since 4.6.

The biggest problems I see with "one big view" are:

1. Lack of good sorting options

2. Lack of good "restrict single category" functionality

3. NotesViewEntries bugs. E.g. if you iterate through NotesViewEntries on a view with 2 docs in 3 categories, you'll only get TWO view entries. What's worse is they might be the same doc, so you'll never know the other doc even exists.

4. NotesViewNavigator frustrations. To work around #3, you might be tempted to use the NotesViewNavigator class instead. There are various bugs and pitfalls there as well.

8) Larger view indices
Harkpabst Meliantrop | 12/6/2007 1:47:51 AM

I feel tempted to give some more details on the big views, I've been talking about (although it's probably of no particular interest and rather of anecdotical value. In this case a customer (a phone company) managed many of their customer related conversation in mainly one database on Notes 6. This had grown to over 600000 documents and they had implemented multiple views, that showed all documents ...

Need I say more? Well, if there was any good in it, it's the fact that I became 100% convinced that the documented view index limit is not correct.

 Add a Comment
Subject:
   
Name:
Comment:  (No HTML - Links will be converted if prefixed http://)
 
Remember Me?     Cancel

Search this blog 

Disclaimer 

    About IBM Privacy Contact