Best Practice Makes Perfect

A collaboration with Domino developers about how to do it and how to get it right in Domino

(Note: edited to correct some errors)

One problem we hear about now and then is documents that were deleted long ago suddenly making a reappearance. It's no secret where these documents come from: somebody had a replica of the database which they hadn't replicated in quite a while. When they finally did replicate, rather than the deletions on your server propagating to their replica, the old documents were re-created on your server.

How does this happen? When you delete a document, Notes creates a “deletion stub” to record the fact of the deletion. The deletion stub has the same UNID as the deleted document. When Notes replicates, if a deletion stub’s UNID matches the UNID of a document in the other replica, that document is replaced by a deletion stub also.

However, deletion stubs expire. The expiration time is based on the number-of-days setting in the replication settings – the one that says "Remove documents that have not been modified in xx days," where xx is 90 by default. Even if you don't check the box to remove documents, this number has an effect: the replication engine won't look at documents older than that unless this is an initial replication, and it will age out deletion stubs in about 1 1/3 times that number of days – so, roughly 120 days by default (not exact; it could be longer).

So, if you delete some documents and haven't managed to replicate with every replica within 90 days, the documents could come back. Ordinarily they wouldn't, because the 90-day cutoff should also exclude documents from consideration if they haven't been modified in that time. But first, perhaps they have been modified; and second, this might be an initial replication – the user might have cleared their replication history, or be using a server they never replicated with before, or have gotten an old file copy of the NSF that has never been replicated with any server. (Incidentally, this is a danger of distributing databases on CD – it could be years later that someone installs the CD and tries to replicate.)

Now, it might occur to you to wonder why Notes does it that way, if it causes a problem. The difficulty is that in some cases it is the desired behavior. If you really are creating a new replica, you usually want to get all the documents, not just those that were edited recently. If you have deliberately removed deletion stubs by setting the cutoff to zero, then back to some reasonable number, you did it because you want to restore old documents that were deleted by accident.

So, the replication behavior is unlikely to change. You just have to be aware of it and deal with any problems that may arise.

What can you do to prevent problems? To begin with, in cases where you think there might be old copies hanging around, you could change the "Remove documents" setting to a much higher number than the default 90, to make deletion stubs hang around longer. Or you could tick the box to remove old documents, if the application is such as to make this practical.
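
For instance, here's a minimal LotusScript sketch of adjusting these settings through the NotesReplication class (the server and file names are placeholders):

    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim rep As NotesReplication
    Set db = session.GetDatabase("App01/Acme", "apps\orders.nsf")
    Set rep = db.ReplicationInfo
    Print "Current cutoff interval: " & rep.CutoffInterval & " days"
    rep.CutoffInterval = 365    ' keep deletion stubs around much longer
    rep.CutoffDelete = False    ' ...but don't auto-remove old documents
    Call rep.Save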

Setting the cutoff to a high number can get cumbersome if there are large numbers of deletions, because the deletion stubs do take up some space and have some performance impact. As a rule, though, you should not have large numbers of deletions. I think I've written elsewhere about the Very Bad Implementation of synchronization with an outside data source, where all the documents in the database are deleted at regular intervals and then replaced with brand-new documents from the source data. Don't do that.
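
If you're stuck with such a feed, a safer pattern is to match incoming records to existing documents by a stable key and update them in place, so nothing is deleted and re-created needlessly. A minimal sketch (the view, form, and field names are hypothetical, and I'm assuming the source hands you parallel arrays of keys and values):

    Sub SyncFromSource(db As NotesDatabase, keys As Variant, names As Variant)
        ' keys/names: parallel arrays pulled from the external source (hypothetical).
        ' "(ByCustKey)" is a hidden view sorted on the CustKey field (also hypothetical).
        Dim keyView As NotesView
        Dim doc As NotesDocument
        Dim i As Long
        Set keyView = db.GetView("(ByCustKey)")
        For i = 0 To Ubound(keys)
            Set doc = keyView.GetDocumentByKey(keys(i), True)
            If doc Is Nothing Then
                ' Record is new in the source: create its document once.
                Set doc = db.CreateDocument()
                doc.Form = "Customer"
                doc.CustKey = keys(i)
            End If
            ' Update in place, and save only when something actually changed,
            ' so unchanged documents don't become replication candidates.
            If doc.CustName(0) <> names(i) Then
                doc.CustName = names(i)
                Call doc.Save(True, False)
            End If
        Next
        ' (Deleting documents whose keys vanished from the source is left out for brevity.)
    End Sub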

In considering other measures, let's think about how the situation can arise.

a) An old backup copy of the application has been restored onto a server, and it replicates to other servers.

b) A user has a computer they use frequently, with a local replica that hasn't been replicated in a while because of the replication settings for that replica, or because they have unticked it in the replication list.

c) A user has a spare computer they use very infrequently, containing a local replica of the application which replicates automatically when they start Notes on that computer.

d) Have I missed any likely scenario?

Case [a] is the easiest to address, because it all happens under the control of a database administrator. If it wasn’t the intention to restore deleted documents, you can write a little script to compare two databases and see whether one contains documents that don’t occur in the other and whose LastModified dates are fairly old. In fact, I think I’ll add this to my “to do” list of things to add to the Developer’s Friend application. While you’re doing this synchronization, of course, you need to temporarily disable replication of the backup copy so that no leakage occurs.
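
Something along these lines should do for the comparison (a sketch; server and file paths are placeholders, and note that GetDocumentByUNID throws an error when the UNID is absent, hence the error trap):

    Sub FindResurrectionCandidates
        Dim session As New NotesSession
        Dim dbProd As NotesDatabase, dbRestored As NotesDatabase
        Dim col As NotesDocumentCollection
        Dim doc As NotesDocument, match As NotesDocument
        Dim cutoff As Variant
        Set dbProd = session.GetDatabase("App01/Acme", "apps\orders.nsf")
        Set dbRestored = session.GetDatabase("", "restore\orders.nsf")
        cutoff = Today - 90    ' align with the replication cutoff
        Set col = dbRestored.AllDocuments
        Set doc = col.GetFirstDocument
        While Not (doc Is Nothing)
            Set match = Nothing
            On Error Resume Next
            Set match = dbProd.GetDocumentByUNID(doc.UniversalID)
            On Error Goto 0
            ' Old document that exists only in the restored copy: a candidate.
            If (match Is Nothing) And (doc.LastModified < cutoff) Then
                Print "Deleted on server, old in restored copy: " & doc.UniversalID
            End If
            Set doc = col.GetNextDocument(doc)
        Wend
    End Sub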

Case [b] can also be addressed with a little scripting. This would involve designing the application to notice when it was last replicated and adjust its own replication settings and/or nag the user to replicate, in the extreme case booting them out if the replication wasn’t recent enough, well before deletion stub expiration occurs. There need to be two levels of warning – one saying “Please replicate” and another saying “The database is too old; do not replicate.” In the latter case we might even helpfully delete the database for them – I think there’s a way to do that. They are fairly unlikely to start replicating the database again without opening it, but if this does occur, it is the same as case [c].

Incidentally, the easiest way I can think of to determine how long it has been since replication happened is to have a special document that the server modifies at regular intervals – with a scheduled agent, say – so the database Postopen code can find this document in the local replica and confirm that its last-modified date is fairly recent. The NotesReplication and NotesReplicationEntry classes (or DXL) can be used to check the replication settings of the database, but bear in mind that just because a database is set to replicate the right notes doesn't mean that it does in fact replicate – this is controlled by replication lists, based on location, stored outside the database, and I don't know offhand how to check those.
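
A sketch of what that Postopen check might look like, assuming a hidden "(Heartbeat)" view containing the server-updated document, with warning thresholds you'd tune to your cutoff settings:

    Sub Postopen(Source As NotesUIDatabase)
        Dim db As NotesDatabase
        Dim view As NotesView
        Dim doc As NotesDocument
        Dim ageDays As Long
        Set db = Source.Database
        If db.Server <> "" Then Exit Sub    ' only check local replicas
        Set view = db.GetView("(Heartbeat)")
        If view Is Nothing Then Exit Sub
        Set doc = view.GetFirstDocument
        If doc Is Nothing Then Exit Sub     ' heartbeat never replicated in
        ' LastModified reflects when the heartbeat last arrived in this file.
        ageDays = Clng(Int(Now - doc.LastModified))
        If ageDays > 60 Then
            Messagebox "This replica is dangerously stale. Please delete it " & _
            "and create a fresh replica; do NOT replicate it.", 16, "Replica too old"
        Elseif ageDays > 14 Then
            Messagebox "This replica hasn't replicated in " & ageDays & _
            " days. Please replicate soon.", 48, "Please replicate"
        End If
    End Sub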

As regards scenario [c], this is the most difficult to deal with, since there's no administrator controlling it and the user can't be expected to know. If we assume the deleted documents were not modified in the local replica, then the replication history should still be valid and the old documents shouldn't be selected as replication candidates – they won't be deleted from the local replica, but at least they won't show up on the server. However, it's also the case that documents created on the server longer ago than the replication cutoff will not be received in the local replica. The user can get these documents by clearing the replication history to force a new "initial" replication, but this will also make their old copies of deleted documents reappear on the server.

Users are unlikely to discover that they need to clear replication history unless they actually open the database and see what documents are (not) in it, so database Postopen code might also be helpful here. But this is a harder situation to detect, because a replication has recently occurred. We have no direct way to tell that the replication failed to touch all documents – we could only see this by actually comparing the UNIDs of documents in the server and local replicas. This not only takes time, but may not be possible to do automatically, since the server may not be available when the database is next opened – the user might be offline.

In this case, it becomes a matter of user education. End-users need to be made aware that if they clear replication history they might cause old documents to reappear on the server, and that it’s better in such cases to delete the local replica and start over with a new one.

I think I will suggest rewording the confirmation when deleting replication history entries, to mention this possibility. That might save us a few service calls.

Finally, if all preventive measures fail and copies of deleted documents do reappear, we would like some way to identify them automatically and delete them. The distinguishing characteristic of a resurrected document is that its "Last modified" date is considerably older than its "Last modified in this replica" date – so it should be possible to write an agent to find all these documents and corral them in a folder for administrator action (or just delete them, if you're really sure of yourself). That is, it should be possible, except that "Last modified in this replica" isn't available as a document property in LotusScript or formula language. This is a case where a C or C++ API program might come in handy – or you can use DXL to export all documents meeting certain criteria and parse the DXL – or perhaps one of my readers knows a clever way to find the relevant documents? I know that when you do a NotesDatabase.Search you can specify a cutoff date/time, but I'm not sure which header value this applies to.

Andre Guirard | 5 February 2008 05:00:00 AM ET | Man-cave, Plymouth, MN, USA | Comments (20)


 Comments

1) Difficult workarounds
Peter von Stöckel | 2/5/2008 6:48:14 AM

The numbers you talk about (30, 45 and 1.5) are not the ones I learned in the Domino classes I took, but never mind.

The problem with creating workarounds is mainly that the user might not even open the database. Take a user who is on parental leave for 6 months. He/she comes back to work and starts the PC. Notes replicates, the damage is done, and the user hasn't even opened the mail file yet...

The database QueryOpen/PostOpen only fires if you open the database. If you use doclinks or open views directly, there will never be any database events happening.

If you schedule a local agent, it's unlikely that it will start before the damage is done either.

So, it's still a problem, but one I can live with.

2) I think it was 8 years ago...
Nathan T. Freeman | 2/5/2008 7:04:12 AM

...that I floated the idea of QueryReplicate and PostReplicate events in a database. Or QueryReplicate and PostReplicate agent settings. Either one is still a great idea.

3) IdeaJam
Nathan T. Freeman | 2/5/2008 7:20:49 AM

{ Link }

4) re: Difficult workarounds
Andre Guirard | 2/5/2008 9:01:00 AM

I agree that some programmability around this would be a good idea. One can do this with the DSAPI, and deal with all databases at once, but it's a bit cumbersome to deploy since it's not in an NSF.

In the case of someone who goes away for a while and then comes back to turn on their computer and replicate, there isn't automatically going to be a problem, however. The replication cutoff (whether it's 30 or 90 days or whatever :-) ) would keep their old documents from reappearing on the server, unless they clear the replication history or replicate with a new server that they never replicated with before.

5) Tool to help cleanup
Jens Olesen | 2/5/2008 9:24:26 AM

To solve the problem after the old documents have reappeared, I use a tool called scanEZ from Ytria. This tool has an Auditor that runs through all documents and lists potential resurrection documents, etc.

I know this does not address the initial goal of avoiding it ever happening, but it has saved us countless hours of cleanup.

6) Another cleanup suggestion
Jan den Otter | 2/5/2008 2:29:54 PM

We've had our share of documents from the dead in our Domino Directory. So we keep backup copies ready for the last 7 days, and an agent that can check old documents in the directory against one of those backups (depending on when the docs came back into the database); it will flag the docs that are old but were not in the backup database, so that we can delete them (again). You can of course also restore from the normal backup, but keeping backups nearby was helpful for our Domino Directory anyway.

If interested I can post the code somewhere.

7) "LastModified in this file" is available in Lotusscript
Werner Goetz | 2/5/2008 2:33:32 PM

see { Link }

So you can compare Evaluate("@Modified") against doc.LastModified - but be careful, doc.LastModified is always at least the date when the replica was created ...
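
Building on that, an agent run against the server replica could corral the suspects along these lines (a sketch; the folder name and the 30-day gap are arbitrary):

    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim col As NotesDocumentCollection
    Dim doc As NotesDocument
    Dim trueModified As Variant
    Set db = session.CurrentDatabase
    Set col = db.AllDocuments
    Set doc = col.GetFirstDocument
    While Not (doc Is Nothing)
        trueModified = Evaluate("@Modified", doc)
        ' A resurrected document was really modified long before it
        ' (re)arrived in this file.
        If trueModified(0) < doc.LastModified - 30 Then
            Call doc.PutInFolder("(Resurrection suspects)")
        End If
        Set doc = col.GetNextDocument(doc)
    Wend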

8) Domino Directory
James | 2/5/2008 10:12:12 PM

Please clarify something. If a user has only Reader access to the Domino Directory, they can't "undelete" documents to the server's replica. Right?

9) I use a view with @accessed
Peter Herrmann | 2/5/2008 10:51:28 PM

In the suspect db instance, you can list all docs sorted by @Accessed (or Simple Function "Last Read or Edited"). This gives the date/time that each document was placed into the database. You can review these and it's easier to discern the rogue docs - they will have come in at the same time within a few seconds of each other.

To determine who did it, use Admin Client > Files (then right-click the db) > Analyze and select "Changes in Data Documents" & "User Writes". In the resulting report, go to the "+ Activity" event type category and look around the same date/time you found in the @Accessed view.

10) Wasn’t that a bug?
Nathan T. Freeman | 2/5/2008 10:58:46 PM

@7 - I always thought that was a bug in the R6.x codestream. Is it still the case?

That's kinda cool if it is.

12) The default behaviour should be to prevent deleted documents from reappearing
Thomas Jona | 2/6/2008 1:57:30 AM

Even if it is sometimes desired behaviour that deleted documents reappear, there should also be a way to prevent it. In my experience this is the exception; in most cases reappearing documents are a problem and not intended, so the default behaviour of the replication process should be to prevent this from happening.

13) Find resurrected documents
Andre | 2/6/2008 6:27:18 AM

Jens, thanks for pointing to the Post Replication Auditor in scanEZ. I just thought I would link to the Solution you mentioned. { Link }

Andre Guirard adds: I didn't post this; if your name is Andre and you post, please use your last name also. Thanks.

14) Untitled
Werner Goetz | 2/6/2008 6:48:28 AM

@10 - It's still in R7.

15) LastModified in R7
Werner Goetz | 2/6/2008 6:49:55 AM

@10 - In R7 NotesDocument.LastModified still delivers the "last modified in this file" property.

16) re: Domino Directory
Andre Guirard | 2/6/2008 9:00:07 AM

There's no way to create documents in a server database in which you have no access to create documents, including by replication.

17) On OpenNtf, Free tool to find Old documents pushed back by replication
JYR | 2/8/2008 12:18:36 AM

Hi Andre,

You mentioned: "perhaps one of my readers knows a clever way to find the relevant documents?"

I think you have your answer:

http://www.openntf.org/Projects/codebin/codebin.nsf/0/300F25985BCB5CA38625737900608E54

Description:

Tool to find Old documents or deleted documents pushed back to server by replication

This db allows you to find the Added to file date of all person documents in your NAB.

The search is done against the Mail Users views.

In your search, you have to specify which mail servers (from your mail users view) to look at.

Find Old documents or deleted documents pushed back to server by replication

Deleted documents are reappearing after replication

http://www-1.ibm.com/support/docview.wss?rs=0&uid=swg21098733

It's possible to find them by script with the AddedToThisFile API.

Q&As about replication purge intervals and cutoff dates

http://www-1.ibm.com/support/docview.wss?rs=475&context=SSKTWP&context=SSKTMJ&dc=DB520&q1=documents+and+cut-off&uid=swg21110117&loc=en_US&cs=utf-8&lang=en

How to track down where replication changes originate

http://www-1.ibm.com/support/docview.wss?rs=475&context=SSKTWP&context=SSKTMJ&dc=DB520&q1=documents+and+replication+and+old&uid=swg21225071&loc=en_US&cs=utf-8&lang=en

http://www-10.lotus.com/ldd/nd6forum.nsf/55c38d716d632d9b8525689b005ba1c0/1acb01c8dc57378785257377002dfd5f?OpenDocument

You can reuse this code to search other dbs, or other types of documents in the NAB (server documents, holiday documents, etc.).

18) number 13
Andre Hausberger | 2/12/2008 6:12:26 AM

@ Andre Guirard - comment 13 was from me. I work for Ytria, and I thought it would add value to this discussion... I only realized it could add to confusion because of our identical first names after I had posted - sorry.

19) The documents that WOULD NOT DIE!
Juraj Melis | 9/19/2008 12:05:05 PM

Hi,

In order to solve this problem, I focused on tracking users.

I retrieved all users in all access groups related to the current database and then created a user document for each user (Notes ID as key).

Then I created two agents:

1. LastAccess agent - tracks when each user last accessed the server (entries retrieved from the server's log.nsf - User Activity | Session form) and updates each user document with the date and time of the last access; scheduled to run every day.

2. Replication agent - tracks when the user last replicated (entries retrieved from their local log.nsf - Replication form) and updates the corresponding user document. But unfortunately, this agent is triggered on database open.

Sub GetLog()
    ' =========================================================================
    ' Retrieve all User Activity (Session) documents (created by the Session form)
    ' from the server's log.nsf related to the current database, for the number of
    ' days specified in the profile document. Then create a new user doc (if one
    ' doesn't exist) or update the existing one with the date and time of last access.
    ' Assumes module-level: session, profDoc, and the GetUser/CreateDocLog/
    ' UpdateDocLog helpers.
    ' GetLog: no return value
    ' =========================================================================
    Dim lastUpdateDate As New NotesDateTime(Today)

    srv = profDoc.PeopleSrvName(0)
    path = profDoc.OppDbName(0)
    logDays = 0 - profDoc.LogSearchDays(0)
    Set dbLog = session.GetDatabase(srv, "log")
    Print "Log database has opened successfully!"
    searchFormula = {Form = "Session" & @IsMember("} & path & {"; Pathname)}
    Call lastUpdateDate.AdjustDay(logDays)
    Set collLog = dbLog.Search(searchFormula, lastUpdateDate, 0)
    collLogNbr = collLog.Count
    Print "Number of log access entries: " + Cstr(collLogNbr)
    Set docLog = collLog.GetFirstDocument
    While Not (docLog Is Nothing)
        Set docUser = GetUser(docLog.UserName(0))
        If (docUser Is Nothing) Then
            Set docUser = CreateDocLog(docLog.UserName(0), docLog)
        Else
            updRepUsr = updRepUsr + 1
            Call UpdateDocLog(docUser, docLog)
        End If
        docLogNbr = docLogNbr + 1
        Print "Progress: " & Cstr(Round((docLogNbr / collLogNbr), 2)*100) & "% completed!"
        Set docLog = collLog.GetNextDocument(docLog)
    Wend
End Sub

Sub GetRep()
    ' =========================================================================
    ' Retrieve all replication documents (created by the Replication form) from the
    ' local log.nsf related to the current database, for the last 3 days (including
    ' today). Then create a new user doc locally (if one doesn't exist) or update
    ' the existing one with the date and time of the last replication.
    ' Assumes module-level: session, profDoc, dbCurrent, and the GetUser/
    ' CreateDocRep/UpdateDocRep helpers.
    ' GetRep: no return value
    ' =========================================================================
    Dim lastUpdateDate As New NotesDateTime(Today)
    Dim location As String

    location = dbCurrent.Server
    If (location = "") Then
        usrName = session.UserName              ' user name of the current session
        pathLocal = dbCurrent.FileName          ' file name of the current database
        srv = profDoc.PeopleSrvName(0)          ' server where the current database is running
        Set dbLog = session.GetDatabase("", "log")   ' get log.nsf on local
        Print "Log database has opened successfully!"
        searchFormula = {Form = "Replication" & @IsMember("} & srv & {"; SourceServer) & @Contains(Body; "} & pathLocal & {") & @IsMember("} & usrName & {"; Server)}
        Call lastUpdateDate.AdjustDay(-3)       ' log documents created today, yesterday and the day before
        Set collLog = dbLog.Search(searchFormula, lastUpdateDate, 0)
        Set docLog = collLog.GetLastDocument
        While Not (docLog Is Nothing)
            Set docUser = GetUser(docLog.Server(0))
            If (docUser Is Nothing) Then
                Set docUser = CreateDocRep(docLog.Server(0), docLog)
            Else
                Call UpdateDocRep(docUser, docLog)
            End If
            Set docLog = collLog.GetPrevDocument(docLog)   ' walk backward from the newest entry
        Wend
    End If
End Sub

20) A view sorted by NoteID can help
Norbert Goeth | 7/28/2009 3:41:36 PM

Reappearing documents can be found in a view sorted by NoteID. Because they are re-created by a replication within a small timeframe, you can identify them as documents with directly ascending NoteIDs, but with creation dates that do not match the ascending order of the NoteIDs of the surrounding documents.

In each database I have such a view.
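
If you'd rather not add the view everywhere, the same data can be dumped by script and sorted on the first column afterwards (a sketch):

    Sub ListByNoteID(db As NotesDatabase)
        ' Prints NoteID alongside creation date; resurrected documents show up
        ' as runs of consecutive NoteIDs whose creation dates are out of order.
        Dim col As NotesDocumentCollection
        Dim doc As NotesDocument
        Set col = db.AllDocuments
        Set doc = col.GetFirstDocument
        While Not (doc Is Nothing)
            Print doc.NoteID & Chr(9) & Format(doc.Created, "yyyy-mm-dd hh:nn:ss")
            Set doc = col.GetNextDocument(doc)
        Wend
    End Sub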

21) deleted documents are getting recreated by Compact
Srinivasa Tenkasala | 10/11/2012 9:35:13 AM

Well, I have read all the scenarios posted in this link and tried most of them. Ours is a huge Lotus Notes database (>15 GB) with >2000 users replicating, and there are also agents which frequently (every 10 mins) update the documents in the db. We had a similar issue where deleted documents started reappearing. To prevent this we increased the purge interval to 1095 days, to prevent re-creation of deleted documents by users who do not replicate regularly. This did not help, and documents kept reappearing at regular intervals. The re-creation of deleted documents usually happens on a Monday, and the compact job is scheduled to run every Sat/Sunday.

As it was not possible to identify the users nor to prevent it, we decided to create a new database copy (yes, very hard). We asked all the users to delete their existing local replicas and create new ones. After this we had a sigh of relief that for at least another 3 years (as the purge interval is set to 1095) we would not encounter this issue.

STRANGELY, documents kept reappearing, this time under a user name that didn't even have a local replica on their machine. This now happens on a MONDAY!! Hence we came to the conclusion that it is the compact job that's doing the mischief (however, there is no evidence right now). Any help/suggestions are welcome to resolve this issue.
