Best Practice Makes Perfect

A collaboration with Domino developers about how to do it and how to get it right in Domino

The saga continues; Johannes writes:

I'm not an expert on performance. It requires a lot of spare time for testing and I think it's more fun to code, but I attended a presentation on the subject ... and one of the conclusions I remember was that if you compare the performance of a @Formula agent with a similar agent in LotusScript, the @Formula agent will perform much better. I think it would run twice as fast on large amounts of data, and this was with R4 or R5, and we all know that the @Formula interpreter was totally recoded (and made even more efficient) for R6.

I love this kind of stuff, so I decided to do a test. I took a local database containing 2400 documents that use a particular form ("Pirate"), and a few documents that use a different form. In about 20% of the documents, the field "Job" contains the value "Scurvy Dog". I wrote five agents and ran them each a few times for average timing. Each of these agents does the same task: it locates all the documents that use the "Pirate" form where the Job field contains "Scurvy Dog", and assigns a numeric value to the Ranking field. Each agent assigns the same Ranking value to all the documents it processes, but each one assigns a different value, so that as I run each one in turn, they are actually making a change to the documents. Here are the different approaches I tried:
Each entry lists the agent description, execution time, and code.
A. The Johannes Method (run on All documents)
Document selection: none
Execution time: 9.47s
Code:
SELECT Form = "Pirate" & Job = "Scurvy Dog";
FIELD Ranking := 5;
B. Formula agent with search (run on All documents)
Document selection: ([Form] = "Pirate") and ([Job] = "Scurvy Dog")
Execution time: 5.23s
Note: most of this was the time to update the full-text index, since the agent that ran just before this one modified hundreds of documents. Since the FT index is normally kept pretty up to date, this task would take less time in a more typical situation, where fewer documents need indexing. Also note that if the database is on a server, the server does the indexing task without having to send document data to the client, whereas if agent A runs on the workstation, it must read all the document data from the server.
Code:
SELECT @All;
FIELD Ranking := 4;
C. LotusScript agent with search, StampAll (run on All documents)
Document selection: ([Form] = "Pirate") and ([Job] = "Scurvy Dog")
Execution time: 3.91s
Note: of course, normally StampAll is not going to be useful, because it's seldom that assigning a constant value to one field is all you need to do. But when it works, it's speedy!
Code:
Sub Initialize
        Dim session As New notessession
        Dim db As NotesDatabase
        Dim coll As NotesDocumentCollection
        Set db = session.CurrentDatabase
        Set coll = db.UnprocessedDocuments
        Call coll.StampAll("Ranking", 3)
End Sub
D. LotusScript agent with search (run on All documents)
Document selection: ([Form] = "Pirate") and ([Job] = "Scurvy Dog")
Execution time: 4.06s
Note: the exact same task as B but in a different language, showing that formula language is really not faster than LotusScript for this task. I would agree it's faster for form actions and the like, but in that case the performance difference is generally going to be too small to notice.
Code:
Sub Initialize
        Dim session As New notessession
        Dim db As NotesDatabase
        Dim coll As NotesDocumentCollection
        Set db = session.CurrentDatabase
        Set coll = db.UnprocessedDocuments
        Dim doc As NotesDocument
        Set doc = coll.GetFirstDocument()
        Do Until doc Is Nothing
                Call doc.ReplaceItemValue("Ranking", 8)
                Call doc.Save(True, False, True)
                Set doc = coll.GetNextDocument(doc)
        Loop
End Sub
E. LotusScript agent from view (run on None)
Document selection: N/A
Execution time: 1.63s
Note: I was surprised this was so quick. I'd been running the agents from that same view, so I tried switching to a different view, and that didn't make a difference in the timing. The view indexing task is a lot faster than full-text indexing.
Code:
Sub Initialize
        Dim session As New notessession
        Dim db As NotesDatabase
        Dim coll As NotesDocumentCollection
        Set db = session.CurrentDatabase
        Dim vu As NotesView
        Set vu = db.GetView("PiratesByJob")
        Set coll = vu.GetAllDocumentsByKey("Scurvy Dog")
        Dim doc As NotesDocument
        Set doc = coll.GetFirstDocument()
        Do Until doc Is Nothing
                Call doc.ReplaceItemValue("Ranking", 12)
                Call doc.Save(True, False, True)
                Set doc = coll.GetNextDocument(doc)
        Loop
End Sub
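
The post doesn't say how the execution times were measured. One way to collect comparable numbers is to wrap the agent body in the built-in Timer function, which returns the seconds elapsed since midnight. The following is a minimal sketch built on agent E; the timing variables, the midnight correction, and the Print line are additions for illustration, not part of the original test.

```lotusscript
Sub Initialize
        Dim session As New NotesSession
        Dim db As NotesDatabase
        Dim vu As NotesView
        Dim coll As NotesDocumentCollection
        Dim doc As NotesDocument
        Dim startTime As Single   ' illustrative: not in the original agent
        Dim elapsed As Single     ' illustrative: not in the original agent

        Set db = session.CurrentDatabase
        startTime = Timer         ' seconds since midnight

        ' Same work as agent E
        Set vu = db.GetView("PiratesByJob")
        Set coll = vu.GetAllDocumentsByKey("Scurvy Dog")
        Set doc = coll.GetFirstDocument()
        Do Until doc Is Nothing
                Call doc.ReplaceItemValue("Ranking", 12)
                Call doc.Save(True, False, True)
                Set doc = coll.GetNextDocument(doc)
        Loop

        elapsed = Timer - startTime
        ' Timer wraps at midnight, so correct a negative difference
        If elapsed < 0 Then elapsed = elapsed + 86400
        Print "Agent E: " & coll.Count & " documents in " & Format$(elapsed, "0.00") & "s"
End Sub
```

Running each agent several times and averaging the printed values, as described above, smooths out one-off effects such as an index that happens to need rebuilding.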

I emphasize, I think it's best to go with the simplest approach if performance is not an issue, but in cases where it does matter, knowing about these different approaches and their relative merits can be the difference between meeting your design goal and not.

Andre Guirard | 12 April 2007 04:48:29 PM ET | Plymouth, MN, USA | Comments (15)


 Comments

1) I’m surprised...
Nathan T. Freeman | 4/12/2007 10:26:40 PM

...Andre, that you were surprised by that last one. I would expect a .getAllDocumentsByKey to be considerably quicker than a from-scratch FT search, where the view's index was already built. The categorization process in a view is going to force optimization of the index in the first place. After that, it's just getting the noteTable of that part of the tree.

If I have time in the near future, I'll try some of this out on some >100000 doc NSFs that I have around. (I have some 1000000 doc ones, but at that point, prepping the test ends up taking too much time.)

Some other interesting comparisons against very large sets might include...

1) using your long standing current/next document swap process to conserve memory.

2) running all processes from a client AGAINST a server vs. having the server perform them locally via .runOnServer.

3) comparing similar processes using Java

4) running one in Notes 8 standard vs. Notes 8 basic

5) using StampAll against a NotesViewEntryCollection instead of a doc collection

6) using a DXL pipeline to change the data (just 'cause it'd be interesting)

7) build an arbitrary NotesNoteCollection and try from there -- again, just 'cause.

2) A man after my own heart!
Kendall | 4/12/2007 10:32:33 PM

I salute you and your table of timing comparisons! I had to optimize some web-based LotusScript searches that used custom sorting (i.e., the few built-in options in FTSearch were useless to me). So I did something similar, testing various parameters to tune each part of the process, keeping stats in a table as I narrowed it down to the solution I went with.

To avoid network hiccups skewing my data, I set up several local databases, used Island mode, deleted my local cache.ndk after each search, etc. (That might have been overkill, but I was trying to focus on the things I could control; e.g., I didn't bother printing results because the result info I'd be printing wasn't one of the variables.)

I tried various ways of doing FTSearch (FTSearch on database, view, viewentrycollection, et al.), various ways of getting the result info to print to the browser (mostly involving columnvalues, but how you get the columnvalues affects performance), and of course various sorting algorithms (and more than one implementation for some of them). ;-) Speed was the most important issue, but differences in overhead was a variable I hadn't planned on, so I wound up optimizing for a smaller result set, versus a larger one. (I think in this case, smaller was like 0-2000 hits.)

Occasionally the results surprised me. BTW if you could add a sorting option to sort like the view.... ;-)

3) p.s. I mean...
Kendall | 4/12/2007 10:35:45 PM

Sorry, that last sentence--I was talking about FTSearch. It's a drag that one can't do FTSearch to sort like a view (esp. since the client has this feature). ;-) Okay, I'll be quiet now.

4) Agent error message: No documents found
Markus Koller | 4/13/2007 4:18:46 AM

The selection formula approach is something that each developer should consider more. It's just too easy to run on all or all unprocessed docs in script, rather than think twice and enter a good selection formula.

What I don't like though is the error message I get on the console from the agent manager, when the selection formula matches no documents. If I write a handy garbage collection agent that deletes some temporary docs every night, or every hour, I don't want to see those errors in the log...

5) To be fair, I think you need to compare #1 with db.Search
Jörg Asmussen | 4/13/2007 6:15:15 AM

All other tests use a prebuilt index to help. Sometimes FT indexes are out of the question due to policies. Prebuilt views seem to be a good alternative, though.

Nathan: Other variables for testing could be the dependency of large attachments.

Markus: you mean FT-Search formula? I often wish it were possible to designate a specific view for agents of the type "Run on all documents in view". The current limit makes this type of agent useful only for the UI, not for background processing.

6) questions/observations on db.Search, UnprocessedDocuments, and document selection
Charles Robinson | 4/13/2007 10:29:47 AM

I remember reading on Notes.Net that using db.Search({criteria}, Nothing, 0) was much less efficient than using db.Search({criteria},SomeDate,0). Specifying something for the date -- even one far into the future -- reduced search times by some incredible amount.
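
For reference, the two call forms being compared look like this. This is only a sketch: the Sub name, the variable names, and the 1980 cutoff date are made up for illustration, and it shows how the calls are written, not which one is faster. The second argument of db.Search is a cutoff date that limits the search to documents created or modified since that date.

```lotusscript
Sub CompareSearchCutoff
        Dim session As New NotesSession
        Dim db As NotesDatabase
        Dim collAll As NotesDocumentCollection
        Dim collSince As NotesDocumentCollection
        Dim formula As String

        Set db = session.CurrentDatabase
        formula = |Form = "Pirate" & Job = "Scurvy Dog"|

        ' Form 1: no cutoff date
        Set collAll = db.Search(formula, Nothing, 0)

        ' Form 2: explicit cutoff date (arbitrary old date, chosen for illustration)
        Dim cutoff As New NotesDateTime("01/01/1980")
        Set collSince = db.Search(formula, cutoff, 0)

        Print "No cutoff: " & collAll.Count & ", with cutoff: " & collSince.Count
End Sub
```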

I asked this on the original thread, and I'm bringing it up again because no one responded. In my experience the UnprocessedDocuments flag for an agent gets reset when the agent is saved. For example, run an agent that acts on UnprocessedDocuments, edit it and re-save. It will process the same documents again. Was I doing something wrong, has it changed in more recent versions, or is that working as designed?

Finally, in most cases that I would use a document selection formula in an agent I need something like "[Job] Does Not Contain "" ". How do you specify you only want to include documents that have any value in a field? I usually end up with an agent that runs on all documents and something like this:

SELECT Form = "Pirate" & Job != ""

Inefficient, I know, but it's the only way I know of to get the documents I need.

7) Questions about D - LotusScript
Mike Amberg | 4/13/2007 11:53:51 AM

Hi

I had a couple of questions ...

Unless I am wrong, in option "D" you stamped all the documents in the database, even those that do not use the "Pirate" form.

Is there something I'm missing?

Also, would the following be faster or the same?

Sub newsearch
        Dim session As New notessession
        Dim db As NotesDatabase
        Dim coll As NotesDocumentCollection
        Dim doc As NotesDocument
        Dim searchFormula As String
        Set db = session.CurrentDatabase
        searchFormula = |Form="Pirate" & Job = "Scurvy Dog"|
        Set coll = db.Search(searchFormula, Nothing, 0)
        Set doc = coll.GetFirstDocument
        While Not (doc Is Nothing)
                Call doc.ReplaceItemValue("Ranking", 8)
                Call doc.Save(True, False, True)
                Set doc = coll.GetNextDocument(doc)
        Wend
End Sub

Lastly, I really have to find @6's comment about ... using "Nothing" versus an old date.

8) about D
Charles Robinson | 4/13/2007 2:20:37 PM

Mike, it's not stamping everything in the database because he's using a GetAllDocumentsByKey call to get his document collection.

Set vu = db.GetView("PiratesByJob")

Set coll = vu.GetAllDocumentsByKey("Scurvy Dog")

I'm assuming the view's selection formula restricts it to the Pirate form.

I think your example would be slower because Andre is working against a view index until he starts walking the document collection. Using db.Search means touching all the documents to see if they meet your one-off criteria. Although, with beefy enough hardware, you won't even notice. :-P

9) About StampAll (Option "C") - answer to Mike’s first question
Thomas Bahn | 4/15/2007 4:54:44 AM

@7: Andre used StampAll in option "C", not "D". And: No, not all documents are stamped, because of the document selection formula: ([Form] = "Pirate") and ([Job] = "Scurvy Dog")

And I really hope, your approach with db.Search is about the same performance as option "D", since I use it, too, if no view is usable. I prefer to use db.Search instead of the document selection formula, since everything is visible in one place (the code), instead of two.

@8: Andre used GetAllDocumentsByKey in option "E", not "D".

@Andre: Did you shuffle the table in the mean time? ;-)

Thomas Bahn

tbahn@assono.de

{ Link }

10) re: About StampAll (Option "C") - answer to Mike’s first question
Andre Guirard | 4/15/2007 3:38:33 PM

> I really hope, your approach with db.Search is about the same performance as option "D"...

I was expecting it to be more like A, since it has to look at each document, but actually it was pretty close to "D".

I think a lot of the time for the execution of A must be due to display updates -- macro agents display a progress bar whereas LotusScript agents don't. To get a better idea of the real performance, I think we need to have a much larger database and run the agents on a server.

> @Andre: Did you shuffle the table in the mean time? ;-)

Nope.

11) Re #6: Nothing vs. VeryOldDate
Jörg Asmussen | 4/21/2007 5:47:35 AM

This is strange. I heard the opposite. The reasoning was that using a date vs. Nothing would actually perform something like 2 searches, one for the date and one for the {criteria}.

As a result of that, I've been removing hundreds of hardcoded dates with Nothing. I hope you are wrong ... :-)

12) Pandora Box opened
Bruno Grange - brunog[at]br[dot]ibm[dot]com | 6/18/2007 9:29:07 AM

Hi Andre

This is what I call Pandora's box!

A friend of mine, Alessandro Mariscal, wrote a Redbook about this issue. The link is here: { Link }

Take a look at the appendix C-4! They give some details about it...

Update All Documents

We started out by updating all 500 documents in the “500” database. Our first test was running a LotusScript agent. It was surprisingly slow (about 200 seconds), so we ran an @formula agent and found that it did the same work in about 30 seconds!

Regards!

13) An IBM TechNote on the Issue [#1110222]
Gerry | 9/9/2007 2:59:01 PM

Performance and usability issues when using LotusScript vs. @formulas

{ Link }

Just look for the section entitled "Working with many documents (agents", it goes over the following:

- Update all documents

- Update all documents (30 fields)

- Update a subset of all documents

They did a lot of variations on those themes, some good reading. It would be nice to see all these tests run again on a modern box with an updated version of Notes (they used N4).

Anyway, just thought I would add the link. Hope people find it useful.

14) StampAll Method
Giri | 6/11/2009 2:39:45 AM

I want to know about the StampAll method. Here it is not working. My requirement is to delete the old documents and enter the new documents from an Excel sheet. If there are no old documents yet, it shows the error "no documents". After creating the documents from the Excel sheet, I then delete the old documents. Can we do this with StampAll? If there is another method, please let me know.

I hope you can reply as soon as possible.

15) Re: StampAll
Andre Guirard | 6/13/2009 7:13:39 PM

@Giri, your comment is off topic for the thread, but the StampAll method is not really useful for the application you describe. Also, I urge you not to delete all the documents and create new ones. Just update the documents you have with a replication algorithm such as you can find in http://www-10.lotus.com/ldd/sandbox.nsf/ecc552f1ab6e46e4852568a90055c4cd/96d6da0e9224105485256fa800506d46?OpenDocument (5. Replicate Pirates agent)
