
OpenLDAP – Fixing or Recovering a Corrupt Directory

Posted on January 26, 2013 by TerryCox

Let’s discuss OpenLDAP. LDAP is a wonderful thing, but just when you get to relying on it too much, BANG! You are dead in the water. It won’t restart after a system reboot, it won’t allow authentication with an account that you know worked just yesterday, or worse, your local server logins seem to hang or time out altogether.
So what do we do? First, you have to get logged in. If you are unlucky enough to have your local server logins tied to LDAP and they are no longer working (you did have a couple of local accounts with admin privileges, right?), you will have to use a backup local account or the root account, or reboot your system into single user mode and log in as root. If you don’t know how to do any of those things, we can visit that topic at another time. For the purposes of our exercise, let us assume you are logged into the LDAP server itself with elevated privileges. Here are some things to try:

Typical Corruption

  • Stop the LDAP server
    /etc/init.d/ldap stop
  • Run the daemon manually with the debug flag so we can see what’s going on (NOTE: ldap:// listens on the default port 389; if you specified encryption during initial setup, use ldaps:// instead, which defaults to port 636)
    /usr/sbin/slapd -u ldap -h ldap:// -d 256

You may get any number of things here. If you see a number of errors or the directory appears to be corrupt, you will be able to tell, since it will likely stop at ‘database initiation’ and hang. You will have to press CTRL-C if it doesn’t die on its own.
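
If you would rather not sit there waiting to press CTRL-C, the debug run can be wrapped in a time limit that also saves the output for later reading. This is only a sketch: the function name probe_with_timeout is made up for illustration, and it assumes the coreutils timeout command is installed. Keep in mind that slapd started with -d stays in the foreground whether it is healthy or not, so hitting the time limit (exit status 124) does not prove corruption by itself; you still have to read the log.

```shell
# Illustrative helper, not part of OpenLDAP.
# Run a command for at most $1 seconds and save its stdout/stderr to file $2.
# timeout(1) returns 124 when it had to stop a command that was still running.
probe_with_timeout() {
    secs=$1
    log=$2
    shift 2
    timeout "$secs" "$@" > "$log" 2>&1
}
```

For example (the log path is arbitrary):

    probe_with_timeout 30 /tmp/slapd-debug.log /usr/sbin/slapd -u ldap -h ldap:// -d 256
    less /tmp/slapd-debug.log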

  • Perform the Recovery (NOTE: Depending on the size and number of records in the directory, this process could take some time)
    /usr/sbin/slapd_db_recover -h /var/lib/ldap
  • and finally restart the LDAP server
    /etc/init.d/ldap start
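
The stop, recover, start sequence above can be wrapped in a small script. Treat this as a minimal sketch rather than a tested tool: the function names (find_recover_tool, run, quick_recover) and the DRY_RUN and LDAP_DIR variables are invented for illustration, and the paths are the ones used in this article. One commenter below found the recovery tool installed as /usr/bin/db_recover, so the sketch looks for either name. It defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: names and defaults here are illustrative assumptions.
LDAP_DIR="${LDAP_DIR:-/var/lib/ldap}"   # default data directory from the article
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print commands, 0 = run them

# Print the first recovery tool found; the name differs between distros.
find_recover_tool() {
    for tool in /usr/sbin/slapd_db_recover /usr/bin/db_recover; do
        if [ -x "$tool" ]; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}

# Echo a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop slapd, run the recovery, then start slapd again.
# Falls back to the article's tool path if neither candidate is found.
quick_recover() {
    tool=$(find_recover_tool) || tool=/usr/sbin/slapd_db_recover
    run /etc/init.d/ldap stop &&
    run "$tool" -h "$LDAP_DIR" &&
    run /etc/init.d/ldap start
}
```

Calling quick_recover with the defaults prints the three commands it would run; set DRY_RUN=0 once the output looks right for your system.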

Houston, We Have a Problem

Sometimes the LDAP directory is so corrupt, for any number of reasons (file system corruption, power loss during an LDAP update or optimize, etc.), that the above steps will not correct the problem. We will have to resort to more drastic steps. Although you can perform these steps logged in locally as the ‘root’ user, I would recommend booting to single user mode if at all possible.

  • Stop the LDAP server
    /etc/init.d/ldap stop
  • Time to back up the existing structure (NOTE: The path shown below is the default directory location on an OpenLDAP install; if you customized the LDAP location during installation, substitute your own path. Run this from a directory outside /var/lib/ldap so the archive is not deleted in a later step)
    tar cvf ldap.bkup.tar /var/lib/ldap/*
  • Now we are going to attempt another recovery; it may get no further than the initial try, but it will mark the files appropriately for the following steps
    /usr/sbin/slapd_db_recover -h /var/lib/ldap
  • Construct an LDIF file containing the structure to restore (NOTE: Run this from a directory outside /var/lib/ldap so the file survives the deletion below. If you receive an error here, you may need to delete all BDB files except ‘dn2id’ and ‘id2entry’ to complete the command; we have a backup in case it all goes south)
    slapcat > ldap.restore.ldif
  • Verify that our newly created LDIF file contains our directory entries (view it in a text editor). If it does not (the usual symptom is a very small or zero-byte file), we can try to run the recovery routine at a deeper level to recover from severe corruption
    /usr/sbin/slapd_db_recover -h /var/lib/ldap -v -c
  • If there are no errors (and errors are unlikely: either the recovery works, or repeating the routine still gives us an empty LDIF), repeat the slapcat step above to create our LDIF file
  • Now, delete the corrupted LDAP directory as follows (NOTE: Do not panic, we backed it all up before we started)
    rm -rf /var/lib/ldap/*
  • The DB_CONFIG file needs to be recreated with some basic information
    echo -en 'set_cachesize 0 15000000 1\nset_lg_bsize 2097152\n' > /var/lib/ldap/DB_CONFIG
  • Attempt to load the backed up LDIF file back to the directory
    slapadd -l ldap.restore.ldif
  • Make sure ownership is correct
    chown -R ldap:ldap /var/lib/ldap
  • Crossing appropriate digits, restart the server
    /etc/init.d/ldap start
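
Two of the steps above are easy to get wrong under stress: checking that the LDIF really contains entries before deleting anything, and writing DB_CONFIG with real line breaks. Here is a hedged sketch of both. The function names ldif_entry_count and write_db_config are hypothetical, and the cache and log buffer values are simply the ones used in this article, not tuned recommendations. It uses printf rather than echo -en because printf handles \n the same way in every POSIX shell.

```shell
# Illustrative helpers, not part of OpenLDAP.

# Count entries in an LDIF file: every entry starts with a "dn:" line.
# Prints 0 for a missing or empty file.
ldif_entry_count() {
    if [ -s "$1" ]; then
        grep -c '^dn:' "$1" || true   # grep exits 1 on zero matches but still prints 0
    else
        echo 0
    fi
}

# Write a basic two-line DB_CONFIG into the directory given as $1.
write_db_config() {
    printf 'set_cachesize 0 15000000 1\nset_lg_bsize 2097152\n' > "$1/DB_CONFIG"
}
```

A guard such as the following, run before the rm -rf step, stops you from wiping the directory when the dump is empty:

    [ "$(ldif_entry_count ldap.restore.ldif)" -gt 0 ] || echo "LDIF is empty, do not delete anything yet"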

Final Comments

In most cases (four out of five, in my experience), the first set of steps will be enough to recover the directory from normal corruption causes. In the event of serious corruption, the longer series of steps should save most if not all of your directory before you have to think about recovering from a backup. Typical backup scenarios (tape, disk, etc.) will often fail to yield a recoverable LDAP structure, since the database files can be in inconsistent states as they are copied. It is always a good idea to do a nightly dump of your records and setup to a location that gets backed up, so it can be easily restored.
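
As a starting point for that nightly dump, here is a minimal sketch built on the same slapcat command used earlier. The function name nightly_ldap_dump, the SLAPCAT override variable and the file naming scheme are all assumptions of mine, not anything OpenLDAP provides. The dump is written to a temporary name first, so a failed or empty dump never replaces a good one.

```shell
# Illustrative helper, not part of OpenLDAP.
# SLAPCAT can be overridden, e.g. to test the function without a real directory.
SLAPCAT="${SLAPCAT:-slapcat}"

# Dump the directory to a dated, gzip-compressed LDIF file inside directory $1.
nightly_ldap_dump() {
    dest_dir=$1
    out="$dest_dir/ldap-$(date +%Y%m%d).ldif"
    mkdir -p "$dest_dir" || return 1
    # Only keep the dump if slapcat succeeded and produced a non-empty file.
    if "$SLAPCAT" > "$out.tmp" && [ -s "$out.tmp" ]; then
        mv "$out.tmp" "$out" && gzip -f "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi
}
```

Saved in a script and called from root's crontab with a destination your regular backups already cover (for example nightly_ldap_dump /var/backups/ldap, a path chosen purely for illustration), this gives you a plain-text copy that slapadd can load back.
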
If anyone is interested, we can explore some of those steps as well as how to set up OpenLDAP to begin with. Comments and questions below and good luck.


Kareem
4 years ago

Very informative and well laid-out article
Wondering if you have any advice on having several slave servers tied to one master?
We have a couple of slaves that are regularly getting out of sync with the master and/or failing to respond to requests from the LDAP clients they serve. (We’ve been fixing it by renaming the LDAP working directory and restarting the service, but this does not fix the issue permanently; it recurs regularly.)
Would you work on one slave at a time (as you describe above) and go round the network that way?
It doesn’t seem the master is misbehaving at all.

Dimitar
2 years ago

Thanks for this! Really helped me on a production incident.

Julian Opificius
1 year ago

Thanks, Terry, this guidance worked for me after a couple of iterations. I had to go all the way down the rabbit hole to “/usr/sbin/slapd_db_recover -h /var/lib/ldap -v -c”, but it worked.
A couple of notes: 1) the command line shown above to create the new DB_CONFIG file is corrupt: the “\n” doesn’t show up properly, so it isn’t clear that the command includes line feeds. One “skilled in the art” would be able to figure it out, but in moments of stress it takes a bit more effort than it might with proper formatting; 2) one has to delete “/var/lib/ldap/alock”, otherwise one can get confusing errors about “unclean shutdown detected”; 3) in my SME Server 9.2 installation, “/usr/sbin/slapd_db_recover” is replaced with “/usr/bin/db_recover”, so it is reasonable to suppose that may be the case on other distros too; 4) you may wish to advise users that “slapcat > ldap.restore.ldif” should be executed in a directory other than /var/lib/ldap, otherwise the file will be deleted when the user performs “rm -rf /var/lib/ldap/*”. The same care should be taken when performing the tar backup.
Anyway, my backup appears to be running now, for which I am supremely grateful. Now if only I knew what caused the corruption in the first place…
