Eliminating duplicate contacts in Evolution (HowTo)

This had been bothering me for a while. Due mostly to artifacts in synchronizing with my PalmOS device, a large number of duplicates had accumulated in my Evolution addressbook. Today I put together this short script to eliminate them. It’s crude, but it sort of works.

Standard disclaimer applies: use at your own risk, back up your data, floss your teeth, etc. etc.

Because WordPress tends to mess with quotes and such, instead of cutting-and-pasting the code below, you may want to download it from here:
evo_eliminate_duplicate_contacts.pl

#!/usr/bin/perl -w

use DB_File;

$addrdb=$ARGV[0]
    || "$ENV{HOME}/.evolution/addressbook/local/system/addressbook.db";
print "* Examining $addrdb\n";

tie %h, 'DB_File', $addrdb, O_RDWR, 0777, $DB_HASH
  or die "Error opening file: $!\n";

# Keep track of names we've seen
%names=();

for $k (keys %h) {
    $card=$h{$k};
    if ($card =~ /^FN:(.*)$/m) {
        $name=$1;
        $name=~s/\r//g;
        chomp $name;
        if (exists($names{$name})) {
            print "* Previously found $name, removing\n";
            delete $h{$k};
        }
        $names{$name}++;
    }
}

print "* Done. Duplicate statistics:\n";
print join("\n", map { "$_: $names{$_} times" }
           sort { $names{$b} <=> $names{$a} }
           grep { $names{$_} > 1 } keys %names
          )."\n";

27 responses to “Eliminating duplicate contacts in Evolution (HowTo)”

  1. Sorry, I don’t understand how to use this. I saved the file as “evolution_clean_script” in my home directory and ran the following:

    adam@adams-desktop:~$ sudo perl evolution_clean_script

    I get the following output
    * Examining /home/adam/.evolution/addressbook/local/system/addressbook.db
    Error opening file: No such file or directory

    but I can see the file?

    adam@adams-desktop:~$ ls /home/adam/.evolution/addressbook/local/system/
    addressbook.db
    addressbook.db.summary
    beagle-cufGXc9lGkWLA5Ap1WyKpw.changes.db
    pilot-map-1000.xml
    pilot-sync-evolution-addressbook-1000.changes.db

  2. adam – I’m not sure what could be causing the problem. But you do not need to run it using sudo, since the file that it modifies belongs to you, so you have permissions on it already. Could you please try without sudo, and see if the problem persists?

  3. I have the same problem, whether I run sudo perl evolution_clean_script or perl evolution_clean_script
    * Examining /home/drfox/.evolution/addressbook/local/system/addressbook.db
    Error opening file: No such file or directory

    Can you offer any other suggestions?

    Larry

  4. Larry – your addressbook.db must be in a different location, you should find it and pass it as an argument to the script. Which version of evolution are you using? The use of $HOME/.evolution is only in recent versions, previously it used to be $HOME/evolution (without the dot).

  5. adam, Larry – I just noticed I had a stupid mistake in the script, it was not using the $addrdb variable, no matter how it was defined. This was the cause of your problems – please try the new version.

  6. This worked great! Thanks, Hitchhiker!

  7. Thanks a lot for this – found it through a search.
    Evolution had gone crazy and created 100 copies of several of my contacts!
    Anyway, this program cleared up all the mess – brilliant!

  8. Thanks a lot!
    Never even knew Perl could read/write the evolution db, this is great!

  9. I tried this script and got: “Unrecognized character \xE2 at ./evo line 33.” I don’t know what it is. I don’t know any programming language but I have a lot of duplicates in my Evolution contacts.

  10. Alex: the script has only 32 lines. Maybe you inserted some weird character when copying/pasting into the script?

  11. Sorry maybe I am stupid. I just copied and pasted.

    Line 33: ).”\n”;

    Unrecognized character \xE2 at ./evo line 33.

  12. OK, it’s working now. The problem is your HTML code: ).”\n”; for line 33. I copied it from the source and changed &#8221; into regular quotes and it works now. Thanks for the script.
    Alex.

  13. Alex – I think WordPress was messing with the quotes. I have added a link to the script for download – I’m glad you got it working.

  14. Did you try to contact the Evolution people? Maybe they will want to include your script in their next release? Your script is really great. It removed 578 duplicates from my contacts. I don’t know why duplicates appeared. Maybe Evolution is buggy?

  15. Worked great for me. Removed a whole load of duplicates after synchronizing Palm.

    Thanks.

    Bruno

  16. Any such script out there for calendar entries? TIA

  17. Matthew: I just looked and the calendars are not stored in database files (which would have made adapting the script really easy) but as ical files (.ics), so a completely different technique would have to be used to remove duplicates. Unfortunately I am no longer using evolution myself, and don’t have the time to look into it.

  18. My addressbook had gone wild after using syncevolution with scheduleworld.

    Your script worked really fine for me. thank you.

  19. Your script is very nice, but it removes not only duplicates but also complementary contacts for the same person in different roles.

    Anyhow, a cleaning tool for evo would be really useful.

  20. Thanks a lot – your script worked well for me.

    I just had a question –
    your script just eliminates duplicates – but does it merge duplicate contacts? (e.g. if I’d added a work telephone No. to one entry and a Mob. No. to the other does this info get saved) – or does the script just delete the second version. Many thanks.

  21. Everyone: thanks for all the comments! I am happy that the script has been useful for some people.

    @Addressed Out & Ismael: it’s a very dumb script, it simply keeps the first record for each person, and it does the checking based on the person’s name. So it will not merge duplicate records, and it will blindly remove everything beyond the first one it finds for each person. If the first record it finds is empty, that’s the one it’ll keep. Furthermore, there’s no way to know which record it will find first, given the nature of the database file in which the records are stored.

    The script could be improved to do merging, or at least to check which record has more information before deleting them. Unfortunately I am not using Evolution any more, so that will have to be a task for someone else 🙂

  22. Wow you’ve saved me hours with that script!! Thanks a bunch!

  23. There is an easier way! Create a new address book. Name it whatever you want. Take the two known* duplicate entries and drag them there. It will give you the duplicate merge dialog. Merge them. Drag them back. Done!

    *this being the tricky part…

  24. Works great for me! I had a bunch of duplicates after syncing with Palm. This eliminated them. Thanks!!! 2000+ contacts down to 1000+ contacts!

  25. Thanks a lot for this! You saved me hours of work when synce messed up and I ended up with 8 of the same contact x 400 contacts!

  26. When I run the script from Terminal I get the error message “command not found”

    evo_eliminate_duplicate_contacts.pl

  27. OIC. You have to add the command perl in front of the script. Did not know that.

    brion@brion-laptop:~$ perl evo_eliminate_duplicate_contacts.pl
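
Comment 21 above notes that the script keeps whichever record it happens to see first for each name, and suggests checking which record has more information before deleting. Here is a minimal sketch of that idea, not the author's implementation. The helper names (card_name, card_score, keys_to_delete) are hypothetical, and "more information" is assumed to mean "more non-empty vCard property lines", which is a crude heuristic. Records are still matched on the FN line only and nothing is merged, so the concerns raised in comments 19 and 20 remain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the FN (formatted name) of a vCard, or undef if it has none.
sub card_name {
    my ($card) = @_;
    return unless $card =~ /^FN:(.*)$/m;
    (my $name = $1) =~ s/\r//g;
    return $name;
}

# Crude "richness" score (an assumption): count the property lines
# that have a non-empty value after the colon.
sub card_score {
    my ($card) = @_;
    return scalar grep { /^[^:]+:\s*\S/ } split /\r?\n/, $card;
}

# Given a reference to a hash of key => vCard text, return the keys of
# the records to delete, keeping the highest-scoring record per name.
# On a tie the key that sorts first is kept.
sub keys_to_delete {
    my ($cards) = @_;
    my (%best, @delete);
    for my $k (sort keys %$cards) {
        my $name = card_name($cards->{$k});
        next unless defined $name;
        my $score = card_score($cards->{$k});
        if (!exists $best{$name}) {
            $best{$name} = [$k, $score];
        }
        elsif ($score > $best{$name}[1]) {
            push @delete, $best{$name}[0];   # the earlier record loses
            $best{$name} = [$k, $score];
        }
        else {
            push @delete, $k;                # the current record loses
        }
    }
    return @delete;
}
```

With the tied hash %h from the script above, the deletion loop could then presumably be replaced by: delete $h{$_} for keys_to_delete(\%h);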
