One PowerBase group with over 150,000 records wanted to purge their database. They knew they had a lot of contacts that were just dead weight: no contact information or bad contact information or incomplete names or duplicate records.
Here is the process we went through to clean things up.
First - we identified that we would take two approaches:
- Delete contacts
- De-dupe contacts
We decided that deleting contacts first made more sense - since there would be less contacts to dedupe - which is the far more time-consuming processes.
Deleting Contacts
There are many ways to find and delete contacts, but the hard part is coming up with the criteria.
We started with the idea that if we can't contact the person, they should be removed.
So, we began searching for everyone who didn't have an email address AND didn't have a phone number AND didn't have a city. Since you can't do this type of search in Advanced search, we did it in Search Builder. We used "Primary" as the location type and "Is Empty" as the criteria.
We later changed city to street address - because some people had a city but no street address.
At this point we got to a list of about 4,000. However, we noticed two problems:
- Some people had contribution or event records. We don't want to lose them.
- Some contacts were Organizations that had relationships to contacts with contact information. Probably want to keep this data
To avoid deleting these contacts, we created two more groups using advanced search:
- Everyone with a contribution OR an attendance record (We used the and/or search option for this). We expanded both the contribution and event accordions. For contribution - we simply but both contributions and soft credits to get every conceivable donation. For event, we put an end date of today to match all participation records.
- All organizations with an employer relationship. It does not seem possible to search for all organizations with any relationship - and all the organization relationships seem to be employer relationships - so this search got all that we needed.
Now that we had these three groups we were ready.
We did an include/exclude search that included everyone without an email/phone/address and excluded everyone with a contribution or attendance record and excluded every organization with an employer relationship.
About 3500 records, all of which were deleted.
More criteria needed
3500 records removed from a database with 150,000 records is a great start, but doesn't add up to much.
Next we proposed:
- Delete everyone without a first name OR without a last name (provided they are not in any of our two exclusion groups)
- Delete everyone with just an asterisk as a first name or last name (quite a few of these)
Still waiting to hear back.
Now let's dedupe.... not so fast.
First and last name are critical to de-duping. However, our contacts had some atrocious data in the first and last name.
It turns out that there are many contacts with a last name or first name plus an asterisk or a number. It seems these characters were some kind of code in the imported data.
There is no point in de-duping with these characters because we won't get a clean match on first and last name.