DeGoogle: Leaving Gmail
Approximately a fifth of the world has a Gmail account, and 99% of those users pay nothing for the privilege. This is taken as a fact of life, despite being transparently unsustainable. I have been using Gmail for over a decade, and my dependency on it is enormous. If Gmail were paywalled tomorrow, or shuttered entirely, I would not be able to live my life. There are probably hundreds of people I would permanently lose contact with.
Luckily, email is one of those technologies that has been standardised to death. Using those standards, I'm pretty sure there's a way to move off Gmail completely, with no downtime and no dropped messages.
## Getting your own domain
The first step is to get an email address you control. This means getting a *domain* you control. Why? Because if you change `dan@gmail.com` into `dan@protonmail.com`, then you'll have to do this migration all over again when it turns out ProtonMail is just as unsustainable as your previous provider. In fact, Gmail itself will let you use your own domain -- it's how companies use "Google Workspace" with a custom email domain.
The steps go something like this:
* Buy (rent) a domain name that you think you can stick to permanently.
* Pick a mail service you feel like using at the moment. This can be Gmail for now, or you can scope out alternatives.
* Set up an MX record to point your domain to the mail servers you chose.
Getting a domain name costs about 10-20 USD per year, at least (some are cheaper than others). This has to be something you expect to use for a long time, since *something* has to be a permanent identifier for you. Ideally, governments would reserve and provide certain domains for their citizens, free of charge. But we live in hell, so you have to rent them from a private company.
I selected <https://forwardemail.net> as my email service. Their front-end and back-end are entirely free software, so I could replace them with another company (or manage a service myself) if it came to it. Note however that I *do not recommend* running your own mail server. Because spam is such a pervasive issue, "reputation" is essential for making sure your emails are actually delivered. It is not uncommon for people to set up a perfectly well-functioning mail server, but to have all their emails black-holed by the larger providers.
An MX record is a kind of DNS record specifically for email. If you don't know about the details of DNS, that doesn't matter (although I will write an article about it some day, because it's nice to have an idea). [ForwardEmail do have a guide on this](https://forwardemail.net/en/faq#how-do-i-get-started-and-set-up-email-forwarding) ([and so do Google](https://support.google.com/a/answer/140034?sjid=14272063294006331623-EU)).
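Once the record has had time to propagate, it can be sanity-checked from the command line. The sketch below is one way to do it, not part of either guide: `mx_points_to` is a hypothetical helper that reads lines in the format printed by `dig +short MX` (a priority, then a host name ending in a dot) and succeeds only if every host sits under the domain you expect. The host names in the demonstration are made up.

```shell
# mx_points_to DOMAIN: read `dig +short MX` style lines ("<priority> <host>.")
# on stdin. Succeed only if at least one record is present and every host is
# DOMAIN itself or a subdomain of it. (Hypothetical helper, for illustration.)
mx_points_to() {
    awk -v suffix="$1" '
        NF == 2 {
            host = $2
            sub(/\.$/, "", host)            # drop the trailing root dot
            h = "." host; s = "." suffix    # compare on label boundaries
            n++
            if (length(h) < length(s) ||
                substr(h, length(h) - length(s) + 1) != s) bad++
        }
        END { exit (n > 0 && bad == 0) ? 0 : 1 }
    '
}

# Offline demonstration with made-up records.
printf '10 mx1.example-mail.test.\n10 mx2.example-mail.test.\n' |
    mx_points_to example-mail.test && echo "MX records look right"
```

Against a live domain this would be used as `dig +short MX your-domain.abc | mx_points_to <provider domain>`, where the provider domain is whatever your mail service's guide tells you to point the record at.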
If you managed to set this up correctly, then emails to `you@your-domain.abc` should now get forwarded to your email inbox of choice. If you want, you can set this up as your existing gmail account, and you can at least *receive* mail from your custom domain. This means you can give all your friends your new domain, and you'll keep getting email -- but when you switch the backend, they don't need to change anything. You can even try out one service and decide you don't like it, and change back, or change to a third option. This is already a significant step forwards in terms of freedom.
## Getting a free client, and sending mail
From a software freedom perspective, the most significant obstacle is being able to send and receive email using software entirely under your control. In the previous section, hopefully you managed to sign up for an email service that allows you to send email through them -- i.e. you send them requests, and they send email on your behalf. As I mentioned before, this really is a service -- you can send email yourself from any computer, but it will likely be blocked or ignored by most email services. Your client hands outgoing mail to the service using SMTP -- the Simple Mail Transfer Protocol.
Another protocol that needs mentioning is IMAP. IMAP is a protocol for *receiving* your inbox, any folders you put your emails into, any metadata or attachments, and keeping that copy in sync with the one your email service maintains. In other words, it lets you make changes to your email inbox on your own computer, and forward those changes to your email service without going out of sync. IMAP is an improvement on POP3; if you see options to set up POP for your inbox, you can ignore them, and choose the IMAP option instead.
There are many free email clients available. But they split into basically two camps.
* "Friendly" clients, which essentially function as personal organisers.
* "Hacker" clients, which require technical knowledge, and work via the command line.
As far as friendly clients go, my recommendation is Thunderbird. You should set up IMAP and SMTP for both your Gmail account and your new custom domain. The Gmail setup is common enough that it should work out of the box. ForwardEmail have [a guide for SMTP](https://forwardemail.net/en/faq#do-you-support-sending-email-with-smtp) and [a guide for IMAP](https://forwardemail.net/en/faq#what-are-your-imap-server-configuration-settings).
At this point, you effectively have two different email addresses. Both of them will work, but you will have to remember which address you use for each person, or service.
## The long path to neutering your Gmail account
If you followed all the steps in the previous section, you should now be able to send and receive email using entirely free software. Unfortunately, you still have a substantial dependence on Google itself. Even if you set up email forwarding, so that all incoming email was forwarded to your own mail server, you still have to rely on Google keeping that forwarding in place. If Gmail were closed down or paywalled, they could stop forwarding your email immediately, and your mail would just get dropped.
Unless you keep a meticulous record of who you email, and which services you sign up for, what remains is a process of very tedious archeology to determine *who must be updated in order to contact me*. For people, you can try to maintain an address book of your friends and family. You could also maintain a "contact" page on your website, and include it in your email signature. That way, anyone who tries to contact you using a method that doesn't work can go to `your-domain.abc/contact` and get your new details.
For services, however, there really is nothing you can do except visit them one by one and update their records. And if you're like me, you really have no idea what you've signed up for over the years. My strategy for figuring it out is to basically scrape all my old emails to find the email senders, and split them up into people and companies.
* If it's a person, I can delete/archive the email and add them to my address book.
* If it's a company, I can delete/archive the email and update their records.
I should be able to go through this as a literal to-do list, and gradually reduce the number of places I have to track. It may also be useful to have a list of services I'm signed up for - as a kind of record to myself.
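One way to turn a raw list of addresses into that to-do list is to group it by domain, since a company usually mails from several addresses on one domain and can be ticked off in a single visit. This is only a sketch: `domains_by_count` is a hypothetical helper, it expects one bare address per line, and the addresses in the demonstration are invented.

```shell
# domains_by_count: read one email address per line on stdin and print
# "<count> <domain>" lines, most frequent domain first.
domains_by_count() {
    sed 's/.*@//' |                     # keep only the part after the last @
        tr '[:upper:]' '[:lower:]' |    # domains are case-insensitive
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'          # normalise uniq's padding
}

# Demonstration with invented addresses.
printf '%s\n' \
    alice@example.org billing@shop.example news@shop.example \
    Bob@Example.org support@shop.example |
    domains_by_count
# prints:
# 3 shop.example
# 2 example.org
```

Domains with many distinct senders are probably companies or lists; big webmail domains will mostly be people.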
Once you think you've done most of the cases, you could set up an auto-reply on the Gmail account, telling everyone who emails you that their email has been forwarded to your new address, that they should expect a reply directly from there, and that this inbox will eventually go out of maintenance.
## Scraping my emails
Luckily, there is a lot of software on the shelf that I can use for this job. Thunderbird doesn't have good support for scripting directly, but there is a standardised directory scheme called [Maildir](https://en.wikipedia.org/wiki/Maildir) for exposing emails on a local filesystem. Once there, it's easy to write UNIX-like utilities that operate on the emails as files.
In fact, [`mblaze`](https://github.com/leahneukirchen/mblaze) is a library of shell utilities to do exactly that. Combined with [`offlineimap`](https://www.offlineimap.org/), I managed to get a list of everyone who's ever emailed me:
```
$ offlineimap
... lots of log messages ...
$ tree ~/Maildir | tail -n1
57 directories, 12975 files
$ maddr ~/Maildir/**/* | sort | uniq | wc -l
2553
```
### Detecting mailing lists
That list of addresses is much too big to review. I just don't know that many people. I immediately suspect that mailing lists I'm subscribed to are the issue. I subscribe to a few -- I don't actually know the full list.
I have two heuristics for emails that are probably mailing list addresses:
* They should appear frequently in the `To:` header, because every email to the list must include that one.
* They probably contain words like `dev`, `help`, `gnu`.
Finding the most common addresses in my inbox should actually only be a tiny modification of the snippet I used to count unique addresses. Rather than doing `| sort | uniq` you want to do something like `| sort | group_with_count | sort_by_count`.
In fact, `uniq` has a `-c` argument which will count the duplicated occurrences, and `sort -n` will do a correct numerical sort (no need to split the output into fields). I can also normalise the output to print only the addresses using `maddr -a` -- this way, I don't see multiple rows for the same person just because they changed their display name. The final snippet is
```
$ maddr -a ~/Maildir/**/* | sort | uniq -c | sort -n
...
2160 guix-devel@gnu.org
2275 pandoc-discuss@googlegroups.com
4017 danielittlewood@gmail.com
7543 ~mil/sxmo-devel@lists.sr.ht
```
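The keyword heuristic can be layered on top of the same counted output with a plain `grep`. The sketch below uses a hypothetical helper, `likely_lists`, whose default pattern is just the three example words from the heuristic; it runs on the four rows shown above rather than on a real Maildir.

```shell
# likely_lists [PATTERN]: keep "<count> <address>" lines whose text matches an
# extended-regex PATTERN of keyword hints (default: dev, help or gnu).
likely_lists() {
    local pattern="${1:-dev|help|gnu}"
    grep -E -i -- "$pattern"
}

# Demonstration on the tail of the counted output.
printf '%s\n' \
    '2160 guix-devel@gnu.org' \
    '2275 pandoc-discuss@googlegroups.com' \
    '4017 danielittlewood@gmail.com' \
    '7543 ~mil/sxmo-devel@lists.sr.ht' |
    likely_lists
# prints:
# 2160 guix-devel@gnu.org
# 7543 ~mil/sxmo-devel@lists.sr.ht
```

Note that `pandoc-discuss` slips through the default keywords, so a heuristic like this is only a first pass; passing a wider pattern such as `'dev|help|gnu|discuss'` would catch it.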
Ok - so some of these lists are clearly much more noisy than others. *Hopefully*, by clearing them out, a lot of the unfamiliar addresses will drop out the other end. One change I can make without thinking is to delete all the unread emails that come from mailing lists - I can't delete *every* unread email, but if it's available on the archive, and I never read it the first time round, then there's no chance of me reading it now.
So I would like to find all the emails that refer to `~mil/sxmo-devel@lists.sr.ht`. Naively looking in the `To:` header only shows 227 messages, so that can't be right. Sure enough, `maddr` is using multiple headers (snippet from the man page):
> `-h headers`
>
> Only search the colon-separated list of headers for mail addresses.
> Default: ‘from:sender:reply-to:to:cc:bcc:’ and their respective ‘resent-’
> variants, if any.
This explains why my own email address is turning up, too. I'm not *completely* sure what "their respective `resent-` variants" means. To be on the safe side, I can search every single header (but not the message body) with `magrep -a`, like so:
```
$ mlist -s ~/Maildir/**/* | magrep -a -p "*:sxmo-devel" | less
...: received: ~mil/sxmo-devel@lists.sr.ht
...: to: ~mil/sxmo-devel@lists.sr.ht
...: list-unsubscribe: ~mil/sxmo-devel+unsubscribe@lists.sr.ht?subject=unsubscribe
...: list-subscribe: ~mil/sxmo-devel+subscribe@lists.sr.ht?subject=subscribe
...: list-archive: //lists.sr.ht/~mil/sxmo-devel
...: list-id: ~mil/sxmo-devel.lists.sr.ht
```
This shows a lot of matching headers, including `list-id`, which is interesting. I *should* be able to search using just that header.
```
$ mlist ~/Maildir/**/* | magrep -p "list-id:sxmo-devel" | wc -l
3772
$ mlist -s ~/Maildir/**/* | magrep -p "list-id:sxmo-devel" | wc -l
264
```
Interesting! This is way less than I saw earlier. I wonder if `maddr` is printing addresses for the same email multiple times? Anyway, these emails are all junk -- let's delete them:
```
mlist -s ~/Maildir/inbox/**/* | magrep "list-id:sxmo-devel" | mrefile bak.trash
```
While I'm at it, let's make a tag for mailing lists.
### Listing lists
[RFC 2919](https://www.ietf.org/rfc/rfc2919.txt) roughly describes the structure of the list-id header,
> The contents of the List-Id header mostly consist of angle-bracket
> ('<', '>') enclosed identifier, with internal whitespace being
> ignored. MTAs MUST NOT insert whitespace within the brackets, but
> client applications should treat any such whitespace, that might be
> inserted by poorly behaved MTAs, as characters to ignore.
Well, here is a snippet that will list all distinct values of the list-id header,
```
mlist ~/Maildir/**/* | magrep "list-id:.*" | mhdr -h 'list-id' | sort | uniq
```
and here's one with the fluff removed (using `sed`), leaving just the unique identifier.
```
mlist ~/Maildir/**/* | magrep "list-id:.*" | mhdr -h 'list-id' | sed 's/^.*<\(.*\)>.*/\1/' | sort | uniq
```
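To see what the `sed` step does in isolation, here is a small sketch run on invented header values. `list_id_token` is a hypothetical helper; unlike the pipeline above it drops lines that have no angle brackets at all, and it also strips whitespace inside the brackets, following the RFC's advice to ignore it.

```shell
# list_id_token: read List-Id header values on stdin and print only the
# identifier between the angle brackets, with any whitespace removed.
# Lines without angle brackets are dropped.
list_id_token() {
    sed -n 's/^.*<\(.*\)>.*$/\1/p' | tr -d ' \t'
}

# Demonstration with invented header values.
printf '%s\n' \
    'sxmo-devel <~mil/sxmo-devel.lists.sr.ht>' \
    '"Guix development" <guix-devel.gnu.org>' \
    '< pandoc-discuss.googlegroups.com >' |
    list_id_token
# prints:
# ~mil/sxmo-devel.lists.sr.ht
# guix-devel.gnu.org
# pandoc-discuss.googlegroups.com
```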
Once you have the ids, you can find them using `magrep`. Unfortunately, `magrep` does not have an equivalent of `grep`'s `--fixed-strings` option, which treats the pattern as a literal string, so regex-special characters like `.` or `*` have to be escaped by hand. I had to simulate it with a tiny Ruby script `quote-regex.rb`:
```
#!/usr/bin/env ruby

puts Regexp.quote(ARGV[0])
```
then you can search properly:
```
id="0f04c6291fcda4f25e8704894.301471.list-id.mcsv.net"
mlist ~/Maildir/**/* | magrep "list-id:$(./quote-regex.rb $id)" | mscan
```
In fact, you can turn this into a nice little interactive script,
```
#!/usr/bin/env bash

# Offer every distinct list-id found in the given folders as a menu entry.
select id in $(mlist "$@" | magrep "list-id:.*" | mhdr -h 'list-id' | sed 's/^.*<\(.*\)>.*/\1/' | sort | uniq); do
  # Quote $id: an invalid menu choice leaves it empty, and an unquoted
  # `[ -n ]` would be true.
  if [ -n "$id" ]; then
    # Store the matching messages as the current sequence.
    mlist "$@" | magrep "list-id:$(./quote-regex.rb "$id")" | mseq -S
    while true; do
      echo "Current sequence set to $id. What do? break to return; exit to quit; mscan/mshow/whatever"
      read -r -p "$: " action
      if [[ $action ]]; then
        case $action in
          exit)
            exit
            ;;
          break)
            break
            ;;
          *)
            bash -c "$action"
            ;;
        esac
      fi
    done
  fi
done
```
Called with e.g. `./enumerate-lists.sh inbox/`, it lets you deal with lists one by one. This script populates the "current sequence" (see `man mseq` for an explanation). This means that in the REPL you can do things like
* `mscan` to browse the list.
* `mseq | maddr -a` to check who sent the mail (to add them as a contact, or block, or whatever).
* `mseq | mrefile bak.trash` to delete, or `mrefile marketing` if you don't want to delete them.
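If you would rather not depend on Ruby for `quote-regex.rb`, the quoting can be approximated with `sed`. This is a sketch under an assumption worth checking in `man magrep`: that patterns are POSIX extended regular expressions, so the characters to escape are `. [ ] \ * ^ $ + ? ( ) { } |`. It escapes fewer characters than Ruby's `Regexp.quote` does, and `quote_regex` is a hypothetical name.

```shell
# quote_regex STRING: print STRING with extended-regex metacharacters
# backslash-escaped, so it can be embedded in a pattern as a literal.
quote_regex() {
    printf '%s\n' "$1" | sed 's/[][\.*^$+?(){}|]/\\&/g'
}

quote_regex '0f04c6291fcda4f25e8704894.301471.list-id.mcsv.net'
# prints: 0f04c6291fcda4f25e8704894\.301471\.list-id\.mcsv\.net
```

Inside the interactive script, `$(./quote-regex.rb "$id")` would then become `$(quote_regex "$id")`, with the function defined at the top of the file.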
### Systematically dealing with bulk email
All well and good -- I've managed to clean up one mailing list once. After a week of receiving more mail, my inbox will be full again. And I'll have to repeat the same (non-trivial) steps for every mailing list. It seems clear that rather than a one-off scrape, I need some kind of systematic *strategy* for dealing with this sort of mail.
First of all, what even is "this sort of mail"? Well, it seems to include things like newsletters, marketing emails, and other "bulk" interactions where a single entity is broadcasting to me (and lots of other people).
* Some email is just junk. I receive it, but I don't want to. For these, I want to unsubscribe and delete all old mail.
* In most other cases, I want them to hit my inbox initially - since I might want to read them. But if I decide not to read them, they should fall out of my inbox pretty quickly so that I don't lose track of other important messages.
* In some cases, I will want to archive these emails forever. In others, they're either archived publicly or of sufficiently low permanent value that I just want them deleted if I don't read them quickly enough.
* Emails I've read should be treated specially. If I never opened a newsletter, it's ok to delete. If I did open it, maybe I saw something interesting that I wanted to return to? Those emails should not be deleted automatically.
Ok, so how about the following taxonomy?
* **Inbox**. Everything goes in here, so I can see it. Messages are moved out as time progresses depending on certain actions.
* **Archive**. Anything I want to keep permanently ends up in here. Personal emails that are no longer useful but that I should keep. Emails from the government or my employer. Ideally I should actually back this up permanently!
* **Transactional**. Password resets, purchases, planning holidays. Moved here as soon as I see them, read or not. Never deleted automatically (since it's hard to tell automatically when a transaction is "stale" -- I still want total control of that). But I do expect to delete all of them eventually (move to archive if not).
* **Bulk**. Mailing lists, newsletters, marketing emails.
* **Personal**. Emails from people. Should be in my address book.
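The "unread bulk mail should fall out quickly" rule can be sketched with nothing more than `find`, because Maildir records the read state in the file name: messages in `cur/` carry a `:2,` suffix followed by flag letters, and `S` means seen. The helper below, `stale_unread`, is hypothetical, and it assumes a file's modification time is a fair proxy for when the message arrived, which depends on how your sync tool writes files.

```shell
# stale_unread DIR DAYS: list messages in the Maildir folder DIR that are
# unread (no S flag) and were last modified more than DAYS days ago.
stale_unread() {
    find "$1/new" "$1/cur" -type f -mtime +"$2" ! -name '*:2,*S*' 2>/dev/null
}

# Demonstration in a throwaway Maildir folder with fake messages.
demo=$(mktemp -d)
mkdir -p "$demo/bulk/new" "$demo/bulk/cur"
touch -t 202001010000 "$demo/bulk/cur/old-unread:2," "$demo/bulk/cur/old-read:2,S"
touch "$demo/bulk/cur/fresh-unread:2,"
stale_unread "$demo/bulk" 14    # lists only .../bulk/cur/old-unread:2,
rm -rf "$demo"
```

The output is one path per line, so it could be piped into `mrefile bak.trash` in the same way as the earlier snippets, for whichever folder ends up holding bulk mail.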
In fact, noticing the "Personal" tag makes me think I should be trying to categorise not just emails, but also addresses. For each address, is it a person I know, a person I don't know, or a company? Is that a useful categorisation? It seems like it would enable me to keep track of accounts I've signed up for.
### My `.offlineimaprc`

For reference, here is my `.offlineimaprc`, with passwords redacted (copied from [the FAQ](https://www.offlineimap.org/doc/conf_examples.html) and [the docs](https://www.offlineimap.org/doc/quick_start.html)).
```
[general]
accounts = danielittlewood,gmail

[Account danielittlewood]
localrepository = New
remoterepository = forward-email

[Account gmail]
localrepository = Old
remoterepository = gmail

[Repository gmail]
type = Gmail
remoteuser =
remotepass =
nametrans = lambda foldername: re.sub ('^\[gmail\]', 'bak', re.sub ('sent_mail', 'sent', re.sub ('starred', 'flagged', re.sub (' ', '_', foldername.lower()))))
folderfilter = lambda foldername: foldername not in ['[Gmail]/All Mail']
# Necessary as of OfflineIMAP 6.5.4
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
# Necessary to work around https://github.com/OfflineIMAP/offlineimap/issues/573 (versions 7.0.12, 7.2.1)
ssl_version = tls1_2

[Repository forward-email]
type = IMAP
remotehost = imap.forwardemail.net
remoteuser =
remotepass =
# Necessary as of OfflineIMAP 6.5.4
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
# Necessary to work around https://github.com/OfflineIMAP/offlineimap/issues/573 (versions 7.0.12, 7.2.1)
ssl_version = tls1_2

[Repository Old]
type = Maildir
localfolders = ~/Maildir

[Repository New]
type = Maildir
localfolders = ~/Maildir.new
```