
Best Practice Makes Perfect

A collaboration with Domino developers about how to do it and how to get it right in Domino

(Note: edited to correct some errors)

One problem we hear about now and then is documents that were deleted long ago suddenly making a reappearance. It’s no secret where these documents come from: somebody had a replica of the database which they hadn’t replicated in quite a while. When they did replicate, rather than the deletions on your server being applied to their replica, the old documents were re-created on your server.

How does this happen? When you delete a document, Notes creates a “deletion stub” to record the fact of the deletion. The deletion stub has the same UNID as the deleted document. When Notes replicates, if a deletion stub’s UNID matches the UNID of a document in the other replica, that document is replaced by a deletion stub also.

However, deletion stubs expire. The expiration time is based on the number-of-days setting in the replication settings – the one that says “Remove documents that have not been modified in xx days,” where xx is 90 by default. Even if you don’t check the box to actually remove documents, this number has an effect: the replication engine won’t look at documents older than that (unless this is an initial replication), and it will age out deletion stubs in 1 1/3 times that number of days – so, 120 days by default (not exact; it could be longer).

So, if you delete some documents and you haven’t managed to replicate with every replica in 90 days, the documents could come back. They wouldn’t ordinarily, because the 90-day cutoff should also exclude documents from consideration if they haven’t been modified in that time. But first, perhaps they have been modified, and second, this might be an initial replication – the user might have cleared their replication history, or be using a server they never replicated with before, or have gotten an old file-copy of the nsf that has never been replicated with any server. (Incidentally, this is a danger of distributing databases on CD – it could be years later that someone installs the CD and tries to replicate).

Now, it might occur to you to wonder why Notes does it that way, if it causes a problem. The difficulty is that in some cases it is the desired behavior. If you really are creating a new replica, you usually want to get all the documents, not just those that were edited recently. If you have deliberately removed deletion stubs by setting the cutoff to zero, then back to some reasonable number, you did it because you want to restore old documents that were deleted by accident.

So, the replication behavior is unlikely to change. You just have to be aware of it and deal with any problem that may arise.
What can you do to prevent problems? To begin with, in cases where you think there might be old copies hanging around, you could change the “Remove documents” setting to a number much higher than the default 90, so that deletion stubs hang around longer. Or you could tick the box to actually remove old documents, if that’s practical for the application.
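As a sketch of the first option, the cutoff can also be adjusted programmatically through the NotesReplication class – handy if you manage many replicas. The 180-day threshold here is just an example, not a recommendation:

```lotusscript
	' Sketch: raise the deletion-stub cutoff on the current database.
	' CutoffInterval corresponds to the "Remove documents that have not
	' been modified in xx days" setting; stubs age out at roughly 1 1/3
	' times this value.
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim rep As NotesReplication

	Set db = session.CurrentDatabase
	Set rep = db.ReplicationInfo
	If rep.CutoffInterval < 180 Then
		rep.CutoffInterval = 180   ' stubs now last about 240 days
		Call rep.Save
	End If
```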

Setting the number of days to a high number can get cumbersome if there are large numbers of deletions, because the deletion stubs do take up some space and have some performance impact. As a rule, though, you should not have large numbers of deletions. I think I’ve written elsewhere about the Very Bad Implementation of synchronization with an outside data source, where all the documents in the database are deleted at regular intervals and then replaced with brand-new documents from the source data. Don’t do that.

In considering other measures, let's think about how the situation can arise.

a) An old backup copy of the application has been restored onto a server, and it replicates to other servers.

b) A user has a computer they use frequently, with a local replica that hasn’t been replicated in a while – because of the replication settings for that replica, or because they have unticked it in the replication list.

c) A user has a spare computer they use very infrequently, containing a local replica of the application which replicates automatically when they start Notes on that computer.

d) Have I missed any likely scenario?

Case [a] is the easiest to address, because it all happens under the control of a database administrator. If it wasn’t the intention to restore deleted documents, you can write a little script to compare two databases and see whether one contains documents that don’t occur in the other and whose LastModified dates are fairly old. In fact, I think I’ll add this to my “to do” list of things to add to the Developer’s Friend application. While you’re doing this synchronization, of course, you need to temporarily disable replication of the backup copy so that no leakage occurs.
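A minimal sketch of such a comparison script might look like the following. The server name, file paths, and the 90-day “fairly old” threshold are all placeholders for the example:

```lotusscript
	' Sketch: list documents that exist in a restored backup copy but not
	' in the production replica, and whose LastModified date is old enough
	' to suggest they are long-deleted documents.
	Dim session As New NotesSession
	Dim prod As NotesDatabase, backup As NotesDatabase
	Dim coll As NotesDocumentCollection
	Dim doc As NotesDocument, match As NotesDocument
	Dim cutoff As New NotesDateTime("Today")

	Call cutoff.AdjustDay(-90)   ' "fairly old" threshold; adjust to taste
	Set prod = New NotesDatabase("Server1/Acme", "apps\orders.nsf")
	Set backup = New NotesDatabase("", "c:\restore\orders.nsf")

	Set coll = backup.AllDocuments
	Set doc = coll.GetFirstDocument
	While Not doc Is Nothing
		' GetDocumentByUNID raises an error when the UNID isn't found,
		' so trap it and treat "not found" as a candidate.
		On Error Resume Next
		Set match = Nothing
		Set match = prod.GetDocumentByUNID(doc.UniversalID)
		On Error Goto 0
		If match Is Nothing Then
			If doc.LastModified < cutoff.LSLocalTime Then
				Print "Candidate resurrected document: " & doc.UniversalID
			End If
		End If
		Set doc = coll.GetNextDocument(doc)
	Wend
```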

Case [b] can also be addressed with a little scripting. This would involve designing the application to notice when it was last replicated and adjust its own replication settings and/or nag the user to replicate, in the extreme case booting them out if the replication wasn’t recent enough, well before deletion stub expiration occurs. There need to be two levels of warning – one saying “Please replicate” and another saying “The database is too old; do not replicate.” In the latter case we might even helpfully delete the database for them – I think there’s a way to do that. They are fairly unlikely to start replicating the database again without opening it, but if this does occur, it is the same as case [c].

Incidentally, the easiest way I can think of to determine how long it has been since replication happened is to have a special document that the server modifies at regular intervals – with a scheduled agent, say – so that the database Postopen code can find this document in the local replica and confirm that its last-modified date is fairly recent. The NotesReplication and NotesReplicationEntry classes (or DXL) can be used to check the replication settings of the database, but bear in mind that just because a database is set to replicate the right notes doesn’t mean that it does in fact replicate – this is controlled by replication lists, based on location, stored outside of the database, and I don’t know offhand how to check those.
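The heartbeat check described above could be sketched in the Database Script’s Postopen event along these lines. The view name “(Heartbeat)” and the 30- and 80-day warning thresholds are made up for the example; the second threshold should of course sit well inside the deletion-stub lifetime:

```lotusscript
Sub Postopen(Source As NotesUIDatabase)
	' Sketch: warn the user when a local replica is stale, based on the
	' last-modified date of a "heartbeat" document a server agent touches
	' daily. Assumes a hidden view "(Heartbeat)" containing that document.
	Dim db As NotesDatabase
	Dim view As NotesView
	Dim beat As NotesDocument
	Dim ageDays As Long

	Set db = Source.Database
	If db.Server <> "" Then Exit Sub   ' only check local replicas

	Set view = db.GetView("(Heartbeat)")
	If view Is Nothing Then Exit Sub
	Set beat = view.GetFirstDocument
	If beat Is Nothing Then Exit Sub

	ageDays = CLng(Today - CDat(beat.LastModified))
	If ageDays > 80 Then
		MessageBox "This replica is too far out of date. Please delete it " & _
			"and create a fresh replica.", 48, "Stale replica"
	ElseIf ageDays > 30 Then
		MessageBox "This replica hasn't received server updates in " & _
			ageDays & " days. Please replicate soon.", 48, "Please replicate"
	End If
End Sub
```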

As regards scenario [c], this is the most difficult to deal with, since there’s no administrator controlling it and the user can’t be expected to know. If we assume the deleted documents were not modified in the local replica, then the replication history should still be valid, and the old documents shouldn’t be selected as replication candidates – they won’t be deleted from the local replica, but at least they won’t show up on the server. However, it’s also the case that documents created on the server longer ago than the replication cutoff will not be received in the local replica. The user can get these documents by clearing the replication history to force a new “initial” replication, but this also will make their old copies of deleted documents reappear on the server.

Users are unlikely to discover that they need to clear replication history unless they actually open the database and see what documents are (not) in it, so database Postopen code might also be helpful here. But this is a harder situation to detect, because a replication has recently occurred. We have no direct way to tell that the replication failed to touch all documents – we could only see this by actually comparing the UNIDs of documents in the server and local replicas. This not only takes time, but may not be possible to do automatically, since the server may not be available when the database is next opened – the user might be offline.

In this case, it becomes a matter of user education. End-users need to be made aware that if they clear replication history they might cause old documents to reappear on the server, and that it’s better in such cases to delete the local replica and start over with a new one.

I think I will suggest rewording the confirmation when deleting replication history entries, to mention this possibility. That might save us a few service calls.

Finally, if all preventive measures fail and new copies of deleted documents do reappear, we would like some way to identify them automatically and delete them. The distinguishing characteristic of a resurrected document is that its “Last modified” date is considerably older than its “Last modified in this replica” date – so it should be possible to write an agent to find all these documents and corral them in a folder for administrator action (or just delete them, if you’re really sure of yourself). That is, it should be possible, except that “Last modified in this replica” isn’t available as a document property in LotusScript or formula language. This is a case where a C or C++ API program might come in handy – or you can use DXL to export all documents meeting certain criteria and parse the DXL – or perhaps one of my readers knows a clever way to find the relevant documents? I know that when you do a NotesDatabase.Search, you can specify a cutoff date/time, but I’m not sure which header value this applies to.
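If the Search cutoff does turn out to apply to the “modified in this replica” time – and, as noted, that’s an open question – a corral agent could be sketched like this. The folder name and the 7- and 180-day thresholds are invented for the example:

```lotusscript
	' Sketch: find documents touched in this replica recently (assuming
	' that's what the Search cutoff selects on – unverified) whose overall
	' LastModified date is nonetheless very old, and corral them in a
	' folder for the administrator to review.
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim coll As NotesDocumentCollection
	Dim doc As NotesDocument
	Dim recently As New NotesDateTime("Today")
	Dim longAgo As New NotesDateTime("Today")

	Call recently.AdjustDay(-7)    ' touched in this replica this week...
	Call longAgo.AdjustDay(-180)   ' ...but last edited anywhere months ago

	Set db = session.CurrentDatabase
	Set coll = db.Search("@All", recently, 0)
	Set doc = coll.GetFirstDocument
	While Not doc Is Nothing
		If doc.LastModified < longAgo.LSLocalTime Then
			Call doc.PutInFolder("(Resurrection suspects)")
		End If
		Set doc = coll.GetNextDocument(doc)
	Wend
```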

Andre Guirard | 5 February 2008 05:00:00 AM ET | Man-cave, Plymouth, MN, USA | Comments (21)
