Protect Your Contact Data Against Crawlers


I recently ran into an interesting problem: How can I add contact info to my website in a way that’s easy to use for humans, but hard to abuse for bots? If you are into SEO, the solution might be obvious to you. If you’re not, this post will help you out.

The Theory

Let’s start with the basics. In an ideal world, it would suffice to add a mailto link to your website:

<a href="mailto:mail@example.com">mail@example.com</a>

But sadly, evil cybercriminals with dark hoodies want to steal your mail addresses to do evil things with them. We can’t let them succeed!

Since web scraping has become a huge problem, some tools protect your contact data for you. For example, Cloudflare offers an Email Address Obfuscation feature that hides the mail addresses on your pages from bots. Of course, you can also obfuscate your mail addresses manually. The idea is to replace the characters a regex needs to match an address (most importantly, the @) with phrases that humans can still understand. Have a look at the following example:

mail@example.com -> mail AT example DOT com

Most crawlers will not recognize mail AT example DOT com as a mail address, but most people will.
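
To make this concrete, here is a minimal sketch of the kind of pattern a naive scraper might run over a page. The regex and the `findAddresses` helper are assumptions for illustration only; real harvesters vary, and some also look for common obfuscations like this one.

```javascript
// Hypothetical, simplified address pattern; real scrapers use many variants.
const MAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Returns every substring of `text` that looks like a mail address.
function findAddresses(text) {
  return text.match(MAIL_PATTERN) ?? [];
}

const plain = findAddresses('Write to mail@example.com today');
const obfuscated = findAddresses('Write to mail AT example DOT com today');

console.log(plain);      // the plain address is found
console.log(obfuscated); // nothing matches once "@" and "." are replaced
```

A scraper that specifically searches for " AT " and " DOT " would still find the address, which is why this kind of obfuscation only raises the bar.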

This solution is far from ideal. People have to manually »de-obfuscate« your mail address before they can write to you - an unnecessary extra step that might put them off. How can we solve this?

The answer is incredibly simple: Many crawlers do not execute JavaScript. You can, for example, add a mailto link to your website via an onload script. To human visitors (who haven’t disabled JavaScript in their browsers), it will look like a normal, static link. To most crawlers, however, it will be invisible. That’s exactly the result we want!

On a side note, this piece of information is key when you optimize your website for SEO: content that only becomes available via JavaScript may be indexed later than static content, and crawlers that skip JavaScript will not see it at all!

TL;DR The Solution

Here’s a small example to get you started. First, add a manually obfuscated mailto link to your website:

<a id="mail" href="mailto:mail AT example.com">mail AT example.com</a>

Next, add an onload script to »disentangle« the address programmatically:

document.body.onload = () => {
  const mailLink = document.getElementById('mail');

  // Fix actual `mailto` target
  mailLink.href = mailLink.href.replace(' AT ', '@');

  // Fix displayed address
  mailLink.innerHTML = mailLink.innerHTML.replace(' AT ', '@');
};

You can omit the first step and just append the whole link element to the DOM tree on load if you prefer. However, I find the given variant more elegant; if the disentanglement fails (for example, because JavaScript is disabled), people will still be able to read your mail address.
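
For comparison, the append-on-load variant could look roughly like the sketch below. The `buildAddress` and `appendMailLink` helpers and the `contact` container id are placeholders I made up for this example, so adapt them to your own markup.

```javascript
// Assemble the address from parts so it never appears verbatim in the HTML.
function buildAddress(user, domain) {
  return `${user}@${domain}`;
}

// Creates <a href="mailto:...">...</a> and appends it to the given container.
function appendMailLink(container, user, domain) {
  const address = buildAddress(user, domain);
  const link = container.ownerDocument.createElement('a');
  link.href = `mailto:${address}`;
  link.textContent = address;
  container.appendChild(link);
  return link;
}

// Only wire this up in a browser; the guard keeps the snippet loadable elsewhere.
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => {
    // Assumes an empty element such as <span id="contact"></span> on the page.
    const container = document.getElementById('contact');
    if (container) appendMailLink(container, 'mail', 'example.com');
  });
}
```

With this variant, visitors without JavaScript see no address at all, which is the trade-off mentioned above.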

If your Content Security Policy restricts inline scripts, you need to add a matching nonce to both the policy and the script tag to allow the script to run.
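
A nonce only helps if it is unpredictable and changes with every response, so it is usually generated on the server. Here is a rough sketch, assuming a Node.js backend; `createNonce` and `renderScript` are hypothetical helper names, not part of any framework.

```javascript
const { randomBytes } = require('crypto');

// Generate a fresh, unpredictable nonce; it must differ for every response.
function createNonce() {
  return randomBytes(16).toString('base64');
}

// Build the CSP value and the matching script tag for one response.
// `renderScript` is a made-up helper for illustration.
function renderScript(nonce, scriptBody) {
  return {
    cspHeader: `script-src 'nonce-${nonce}'`,
    scriptTag: `<script nonce="${nonce}">${scriptBody}</script>`,
  };
}

const nonce = createNonce();
const { cspHeader, scriptTag } = renderScript(nonce, '/* onload script here */');

console.log(cspHeader); // value for the Content-Security-Policy header or meta tag
console.log(scriptTag); // inline script carrying the same nonce
```

On a purely static site without a server-side step, a fixed nonce offers little protection; a hash-based policy may be the better fit there.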

The following snippet brings it all together:

<html>
  <head>
    <meta http-equiv="Content-Security-Policy" content="script-src 'nonce-<your-nonce-here>'" />
  </head>
  <body>
    <a id="mail" href="mailto:mail AT example.com">mail AT example.com</a>
    <script nonce="<your-nonce-here>">
      document.body.onload = () => {
        const mailLink = document.getElementById('mail');
        mailLink.href = mailLink.href.replace(' AT ', '@');
        mailLink.innerHTML = mailLink.innerHTML.replace(' AT ', '@');
      };
    </script>
  </body>
</html>

You can find a working example of this on my digital business card. Come have a look!

I hope you find this little trick as useful as I do. Keep in mind that it does not guarantee 100 percent protection, though!