robots.txt 


Table of contents
  • robots.txt
  • Using robots.txt on Domino
  • robots meta tag

robots.txt


The robots.txt file is a de facto Web standard, used across the World Wide Web to declare what search engines should not index on a Web site. The technique is old but still helpful. By using this file, you can select which files not to index, which keeps private files from being displayed by search engines. The file is flexible: it can hold several rules, so that different bots get different behavior.

The robots.txt file was created around 1994 by the members of the Robots mailing list. When this article was written, there was no formal standard or RFC for it; the protocol has since been documented in RFC 9309. It is important to remember that robots.txt should not be used to flag what should be indexed, but to indicate what should not be indexed. You need robots.txt, for example, on an intranet with WWW access that holds information that is sensitive for the company. Restricted areas, and personal documents that are hosted in a specific directory on your server for backup reasons, are examples of assets that you may want to keep from being indexed.

If you want a search engine to index your whole site, do not use robots.txt.

Using robots.txt on Domino


When creating a robots.txt file, keep the following considerations in mind:

  • The robots.txt file is a text file that must be created by using plain ASCII text and should be saved by using the "txt" file extension.
  • This file should be in the root directory of your Web site. It is the first file that a spider requests on a Web site.
  • The file name should be written in lowercase, and the file should be publicly readable. If your Web site root directory is your NSF file, you can upload your robots.txt to your NSF database as a file resource. You can also create a page named robots.txt, insert the rules in the body of the page, and change its content type to text. For further information about this topic, refer to Page design elements.
  • Web crawlers treat each subdomain as a completely different Web site and request robots.txt only from the root of each host, so keep a robots.txt file at the root of every subdomain that hosts a separate site or sensitive data. A robots.txt file placed in a subdirectory is not read by crawlers. If your only option is to create a page inside a database, as described previously, consider using the robots meta tag approach, described later in this section.
    • Important

      If your Web site root directory is not your NSF database, you must upload the robots.txt file to your Lotus Domino Web server root directory. For further information about this topic, refer to topology.
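The file name and encoding considerations above can be checked mechanically before the file is uploaded. The following Python sketch is only an illustration under stated assumptions: the helper name check_robots_file and its messages are hypothetical and are not part of Domino.

```python
def check_robots_file(name, data):
    """Return a list of problems; an empty list means the basic checks pass.

    Hypothetical helper for illustration only; it is not part of Domino.
    """
    problems = []
    # The file name must be exactly "robots.txt", written in lowercase.
    if name != "robots.txt":
        problems.append("file name should be 'robots.txt', got %r" % name)
    # The content must be plain ASCII text.
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        problems.append("content is not plain ASCII text")
    return problems


good = check_robots_file("robots.txt", b"User-agent: *\nDisallow: /private/\n")
bad = check_robots_file("Robots.TXT", "Disallow: /caf\u00e9/\n".encode("utf-8"))
print(good)
print(bad)
```

The sketch checks only the name and the encoding; whether the file is actually served from the site root with public read access still has to be verified on the server.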



There are basically two fields to declare in this file: User-agent and Disallow. The User-agent field names the agent to which the rules that follow apply. An agent in this context is a search engine spider, such as Googlebot from Google.

    User-Agent: Googlebot



To address all agents (and not only the Google robot), use an asterisk as the value, so that the rules that follow apply to every search engine.

    User-agent: *



To block the whole site, use a single slash, which stands for the root directory, as in the following example:

    Disallow: /



To block a specific directory, enter the directory path, as in the following example:

    Disallow: /private_directory/



To block a specific file, enter the file path, as in the following example:

    Disallow: /private_file.html



You can use as many Disallow rules as you want. Start each rule on a new line of the file.
    Important
    Remember that the paths in robots.txt are case sensitive. Therefore, a page called Coffee.htm cannot be declared as coffee.htm.
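One way to try out Disallow rules before publishing them is to feed them to a robots.txt parser. The following sketch uses the urllib.robotparser module from the Python standard library, which is an assumption of this example and is not related to Domino. It parses the directory and file rules shown earlier and checks a few paths, including one that differs only in case.

```python
from urllib.robotparser import RobotFileParser

# Rules taken from the examples in this section.
rules = """\
User-agent: *
Disallow: /private_directory/
Disallow: /private_file.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A whole directory and a single file are blocked.
dir_blocked = not parser.can_fetch("Googlebot", "/private_directory/report.html")
file_blocked = not parser.can_fetch("Googlebot", "/private_file.html")
# Paths that no rule mentions stay allowed.
other_allowed = parser.can_fetch("Googlebot", "/index.html")
# Matching is case sensitive, so a differently cased path is NOT blocked.
case_variant_allowed = parser.can_fetch("Googlebot", "/Private_File.html")

print(dir_blocked, file_blocked, other_allowed, case_variant_allowed)
```

The paths report.html and index.html are made up for the illustration. Real crawlers may interpret edge cases differently from this parser, so treat the result as a sanity check only.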



robots.txt examples

The following example prohibits any robot from indexing the whole site:


    User-agent: *
    Disallow: /

An asterisk means that all robots should follow the rule. A practical use is to prevent folders on your site that contain private information from being indexed. The code in the following example prevents four directories from being indexed.


    User-agent: *
    Disallow: /cgi-bin/ # scripts and programs
    Disallow: /login/
    Disallow: /tmp/ # testing area
    Disallow: /private/ # corporate files

The number sign (#) is used for comments. You can use it to explain the reason for excluding a file or directory, without affecting how the rule is processed.
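As a rough check that trailing comments do not change a rule, the four-directory example can be parsed with urllib.robotparser from the Python standard library. This is a sketch only: the robot name AnyBot and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/ # scripts and programs
Disallow: /login/
Disallow: /tmp/ # testing area
Disallow: /private/ # corporate files
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical sample paths: one in each excluded directory, plus a public one.
samples = ["/cgi-bin/search", "/login/", "/tmp/a.txt", "/private/plan.doc", "/products/"]
blocked = [path for path in samples if not parser.can_fetch("AnyBot", path)]
print(blocked)
```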

If you do not have a robots.txt file, search engines index your site normally. It is the same as having the following robots.txt file:


    User-agent: *
    Disallow:
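The difference between an empty Disallow value and a Disallow value of / is easy to overlook. The following sketch, which assumes the urllib.robotparser module from the Python standard library, compares the two; the helper parse_rules and the sample path are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt content given as a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# An empty Disallow value excludes nothing.
allow_all = parse_rules("User-agent: *\nDisallow:\n")
# A Disallow value of "/" excludes the whole site.
block_all = parse_rules("User-agent: *\nDisallow: /\n")

allowed_with_empty = allow_all.can_fetch("AnyBot", "/any/page.html")
allowed_with_slash = block_all.can_fetch("AnyBot", "/any/page.html")
print(allowed_with_empty, allowed_with_slash)
```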

The following example is more complex and contains four records, separated by blank lines. The first record declares that /private/ should not be indexed by any robot. The second record defines that the Google robot, Googlebot, should not index the /cgi-bin/ and /corporate/hr/ directories. The third record defines that the Yahoo robot, Slurp, should not index /corporate/accountancy/. The last record defines that the MSN® robot, msnbot, should not index the /msoffice_docs/ directory. Note that a robot that finds a record addressed to it by name obeys that record instead of the * record, so repeat any common rule in each named record where it must also apply.

    User-agent: *
    Disallow: /private/

    User-agent: Googlebot # Google
    Disallow: /cgi-bin/
    Disallow: /corporate/hr/

    User-agent: Slurp # Yahoo
    Disallow: /corporate/accountancy/

    User-agent: msnbot # MSN
    Disallow: /msoffice_docs/
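To see how a multi-record file is evaluated for different robots, the example can be run through urllib.robotparser from the Python standard library. This is a sketch, not a statement about how any particular search engine behaves: OtherBot and the sample paths are hypothetical. In this parser, a robot that matches a named record is checked against that record only, and the * record serves as the fallback for every other robot.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /corporate/hr/

User-agent: Slurp
Disallow: /corporate/accountancy/

User-agent: msnbot
Disallow: /msoffice_docs/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is blocked from the HR directory; Slurp is not.
googlebot_hr = parser.can_fetch("Googlebot", "/corporate/hr/salaries.html")
slurp_hr = parser.can_fetch("Slurp", "/corporate/hr/salaries.html")
# A robot without a record of its own falls back to the * record.
otherbot_private = parser.can_fetch("OtherBot", "/private/notes.html")
# Googlebot has its own record, so the * record is not consulted for it.
googlebot_private = parser.can_fetch("Googlebot", "/private/notes.html")

print(googlebot_hr, slurp_hr, otherbot_private, googlebot_private)
```

If /private/ must stay hidden from the named robots as well, one option is to repeat the Disallow: /private/ line inside each named record.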

    Tip
    The robots.txt file does not affect the search results returned by Domino on the Web. If you want to control which pages a Domino search returns, review your search query and view selections, and implement security by using access control lists (ACLs) and Readers fields. For further information about how to implement security on Domino applications, refer to security considerations.

robots meta tag


If you do not have access to the robots.txt file, you can use another approach to prevent a page from getting indexed. There is an HTML meta tag, called robots, that tells spiders how to treat a Web page. Its content attribute takes a pair of values, formed by combining these options: index, follow, noindex, and nofollow. Index and follow are the implicit defaults for this tag.

The index option allows a page to be indexed, and follow allows its links to be followed and indexed. The noindex option prevents the page from being indexed, which means that it is not put in the search results. The nofollow option tells the robot not to follow the links on the page. If no other page links to the same targets, nofollow can have the same effect on the linked pages as a noindex on each of them. However, because anyone can link directly to those pages from any other Web page, do not rely on it.

In the following example, a robot indexes the page and follows all the links on the page:


    <meta name="robots" content="index,follow" />

In the following example, a robot indexes the page, but treats it as a "dead end" and does not follow any of the links on it:


    <meta name="robots" content="index,nofollow" />

In the following example, a robot skips over the page, without indexing its content, but continues indexing all the other pages to which this page links:


    <meta name="robots" content="noindex,follow" />

In the following example, a well-behaved robot neither indexes the page nor follows any of its links. It treats the page as nonexistent in its index.


    <meta name="robots" content="noindex,nofollow" />

This approach works only in hypertext (HTML) files, so it cannot be used for files such as PDFs or DOCs. Thus, robots.txt has a broader scope than the meta tag, but both have their uses.
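To illustrate how a robot might read this tag, the following Python sketch extracts the robots directives from a page and reports whether indexing and link following are permitted. It is an illustration only: the class RobotsMetaParser and the function may_index_and_follow are hypothetical names, and real robots may apply additional rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def may_index_and_follow(html):
    """Return (index, follow); both are True unless a directive forbids them."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return ("noindex" not in parser.directives,
            "nofollow" not in parser.directives)


page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
print(may_index_and_follow(page))
```

Because index and follow are the implicit defaults, the sketch only looks for the two negative options and treats a page without the tag as indexable and followable.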

For further information about how to insert meta tags on your pages, refer to Common design properties on Web applications.

    Important
    Although reputable search engine robots respect the indexing rules defined in the robots.txt file, do not expose sensitive data to the World Wide Web. "Thief spiders" can crawl your Web site to search for sensitive data. To avoid this, you must implement effective security by using such techniques as firewalls, file access control, and ACLs. For further information about how to implement security on Domino applications, refer to security considerations.

Category:
Redbooks Wiki: Best Practices for Domino 8.0 Web Application Development
Tags:

This Version: Version 10 March 24, 2011 3:29:54 PM by Amanda J Bauman  IBMer
Version Date Changed by               Summary of changes
This version (10) Mar 24, 2011 3:29:54 PM Amanda J Bauman  
9 Aug 18, 2008 11:04:58 AM Krista McKenzie  
8 Aug 18, 2008 11:02:26 AM Krista McKenzie  
7 Aug 18, 2008 11:00:12 AM Krista McKenzie  
6 Aug 18, 2008 10:58:22 AM Krista McKenzie  
5 Aug 12, 2008 9:37:26 AM Krista McKenzie  
4 Aug 11, 2008 9:15:28 AM Krista McKenzie  
3 Aug 7, 2008 12:46:41 PM Krista McKenzie  
1 Aug 6, 2008 1:55:38 PM Krista McKenzie  