Yet another question about self-hosting email, but I haven’t found the answer at least phrased in a way that makes sense with my question.

I’ve got ~15 GBs of old gmail data that I’ve already downloaded, and google is on my ass about “91% full” and we know I’m not about to pay them for storage (I’ll sooner spend 100 hours trying to solve it myself before I pay them $3/month).

What I want is to have the same (or relatively close to the same) access and experience for finding stuff in those old emails when they are stored on my hardware as I do when they are in my Gmail. That is, I want a website and/or app where I can search for emails from so-and-so, in some date range, by keyword. I don’t actually want to send any emails from this server or receive anything to it (maybe I would want Gmail to forward to it or something, but probably I’d just do another archive batch every year).

What I’ve tried so far, which is sort of working, is setting up docker-mailserver on my box; it is running and accessible, and I can connect to it via Thunderbird or K-9 Mail. I also converted the big email download from Google, which was a .mbox, into maildir using mb2md (apt install mb2md on Debian was nice). This gave me a directory with ~120k individual email files.
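For anyone without mb2md handy, the same mbox-to-maildir conversion can be roughly sketched with Python's standard `mailbox` module. This is not what mb2md does internally, just a minimal alternative; the function name `mbox_to_maildir` is made up for illustration, and it does not try to map Gmail labels to folders or flags.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count.

    Hypothetical helper: messages all land in the Maildir's new/ folder,
    and Gmail-specific headers such as X-Gmail-Labels are left untouched.
    """
    src = mailbox.mbox(str(mbox_path), create=False)
    dest = mailbox.Maildir(str(maildir_path), create=True)
    count = 0
    try:
        for message in src:
            dest.add(message)  # writes one file per message
            count += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return count


# Example call (paths are placeholders, not from the original post):
# mbox_to_maildir("takeout.mbox", "Maildir")
```

With ~120k messages this will take a while, and it is probably worth running it on a copy of the export first.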

When I check this out in Thunderbird, I see all those emails, and they look like they have the right info. (As an aside: I actually only moved 1k emails into the directory that docker-mailserver has access to, just for testing, so Thunderbird only sees that 1k.) I can do some searching on those.

When I open in K-9, it by default looks like it just pulls in 100 of them. I can pull in more or refresh it sort of thing. I don’t normally use K-9, so I may just be missing how the functionality there is supposed to work.

I also just tried connecting to the mail server with Nextcloud Mail, which works in the sense that it connects but it (1) seems like it is struggling, and (2) is putting ‘today’ as the date for all the emails rather than when they actually came through. I don’t really want to use Nextcloud Mail here…
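One possible cause of the "today" dates, stated as an assumption rather than a diagnosis: Dovecot (which docker-mailserver uses) takes a maildir message's IMAP internal date from the file's modification time, and a fresh conversion or copy gives every file a current mtime. Clients that sort by internal date rather than the `Date:` header would then show everything as new. A minimal sketch that resets each file's mtime from its own `Date:` header follows; both function names are hypothetical.

```python
import os
from email import policy
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime
from pathlib import Path


def set_mtime_from_date_header(message_path):
    """Set one message file's mtime to its Date header.

    Returns the timestamp used, or None if the header is missing or
    unparseable (the file is then left alone).
    """
    with open(message_path, "rb") as fh:
        headers = BytesHeaderParser(policy=policy.compat32).parse(fh)
    raw_date = headers.get("Date")
    if raw_date is None:
        return None
    try:
        sent = parsedate_to_datetime(str(raw_date))
    except (TypeError, ValueError):
        return None
    # A Date header without a usable zone is treated as local time here.
    timestamp = sent.timestamp()
    atime = os.stat(message_path).st_atime
    os.utime(message_path, (atime, timestamp))
    return timestamp


def fix_maildir_dates(maildir_path):
    """Apply the fix to every file in cur/ and new/; return how many changed."""
    fixed = 0
    for subdir in ("cur", "new"):
        folder = Path(maildir_path) / subdir
        if not folder.is_dir():
            continue
        for entry in folder.iterdir():
            if entry.is_file() and set_mtime_from_date_header(entry) is not None:
                fixed += 1
    return fixed
```

If the server has already indexed the folder, it may have cached the old dates, so the index might need rebuilding before clients see a difference.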

So, I think my question here is now really around search and storage. In Thunderbird, I think that the way it works (I don’t normally use Thunderbird much either) is that it downloads all the files locally, and then it will search them locally. In K-9 that appears to be the same, but with the caveat that it doesn’t look like it really wants to download 120k emails locally (even if I can).

What I think I want to do, though, is have the search running on the server. Like I don’t want to download 15GBs (and another 9 from gmail soon enough) to each client. I want it all on the server and just put in my search and the server do the query and give me a response.
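For what it's worth, IMAP itself has a SEARCH command that runs on the server, so "send a query, get matching message IDs back" is possible without downloading the mailbox; whether a given client uses it is a separate question. A minimal sketch with Python's `imaplib` is below. The helper names, host, and credentials are all placeholders, and it assumes ASCII-only search terms. It uses SENTSINCE/SENTBEFORE, which match the `Date:` header, rather than SINCE/BEFORE, which match the server's internal date.

```python
import imaplib
from datetime import date

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d):
    """Format a date as IMAP expects (dd-Mon-yyyy), independent of locale."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def quote(value):
    """Wrap a string in IMAP double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search_criteria(sender=None, since=None, before=None, keyword=None):
    """Return a list of IMAP SEARCH tokens for the given filters."""
    parts = []
    if sender:
        parts += ["FROM", quote(sender)]
    if since:
        parts += ["SENTSINCE", imap_date(since)]
    if before:
        parts += ["SENTBEFORE", imap_date(before)]
    if keyword:
        parts += ["TEXT", quote(keyword)]  # headers and body
    return parts or ["ALL"]


def search_on_server(host, user, password, criteria, folder="INBOX"):
    """Run the search on the server and return matching message numbers."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(folder, readonly=True)
        status, data = conn.search(None, *criteria)
        if status != "OK":
            raise RuntimeError(f"IMAP search failed: {status}")
        return data[0].split()


# Example (placeholder host and credentials):
# criteria = build_search_criteria(sender="so-and-so@example.com",
#                                  since=date(2012, 1, 1), keyword="invoice")
# ids = search_on_server("mail.example.com", "me", "secret", criteria)
```

As far as I understand, a full-text search plugin on the server mainly speeds up the TEXT and BODY parts of such a query; the other filters work without it.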

docker-mailserver has a page for setting up Full-Text Search with Xapian, where it’ll make all the indices and all that. I tinkered with this and think I got it set up. This is another sort of thing where I would want the search to be utilizing the server rather than client since the server is (hopefully) optimizing for some of this stuff.

Should I be using a different server for what I want here? I’ve poked around at different ones and am more than open to changing to something else that is more for what I need here.

For clients, should I be using Roundcube or something else? Will that actually help with this ‘use the server to search’ question? For mobile, is there any way to avoid downloading all the emails to the client?

Thanks for the help.

12 points

I don’t think you want a mail server, you want a mail archive. A quick google search for “selfhosted email archive” shows a number of good leads.

2 points

I think that’s right: I fundamentally want an archive, not what a normal mail server provides. Part of my thinking in looking at mail servers is that those would integrate directly with whatever other front-end/client I’d normally use, whereas an archive maybe would not.

And regarding archive-specific stuff, I am seeing some things on a search, but I guess I’m wondering if folks here have any recommendations. When I look at , for example, nothing comes up for email archive, just for email servers. That, plus what I see when searching, makes me think that the archive-specific stuff is either oriented to business or oriented to a CLI (like Notmuch, which was mentioned in the discussion here and does look cool).

2 points

Yup. There’s some confusion about what a mail server’s purpose is here.

5 points

As another poster pointed out, it sounds like you want more of a mail search and archival tool than a mail server. I would suggest you take the emails from Google Takeout in maildir format (Takeout exports mbox, so that means the mb2md conversion you already did), and then index/search them with the amazing Notmuch. Notmuch is way more capable than Gmail search ever has been. Look at the Arch Wiki page as well for info; the official docs are a bit obtuse, but it’s not actually hard to use.
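To give a feel for the "from so-and-so, in some date range, keywords" kind of search in Notmuch, here is a rough sketch that builds a query string and runs `notmuch search` through `subprocess`. It assumes `notmuch` is installed and already configured for the maildir, with `notmuch new` having been run to index it; the two function names are hypothetical wrappers, not part of Notmuch.

```python
import json
import subprocess


def build_notmuch_query(sender=None, since=None, until=None, keywords=()):
    """Build a Notmuch query string; dates are 'YYYY-MM-DD' strings."""
    terms = []
    if sender:
        terms.append(f"from:{sender}")
    if since or until:
        # date:<since>..<until>; either end may be left open
        terms.append(f"date:{since or ''}..{until or ''}")
    terms.extend(keywords)
    # "*" is Notmuch's query for "all messages"
    return " ".join(terms) if terms else "*"


def notmuch_search(query, limit=20):
    """Run `notmuch search` and return the parsed JSON thread summaries."""
    cmd = ["notmuch", "search", "--format=json", f"--limit={limit}", query]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


# Example (address is a placeholder):
# q = build_notmuch_query(sender="alice@example.com",
#                         since="2012-01-01", until="2012-12-31",
#                         keywords=["tickets"])
# for thread in notmuch_search(q):
#     print(thread["date_relative"], thread["subject"])
```

The same query string can be typed straight after `notmuch search` in a terminal, so the wrapper is only useful if you want to put something like a small web page in front of it.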

2 points

This looks like a good backend for sure, but the web frontends look a little lacking and I’m not seeing anything about a mobile frontend (other than if a web one was up, which would be fine). Have you tried any of the web frontends?

0 points

No, I used it with Alot mostly in the terminal. Can’t really speak to the front ends, I was kind of assuming you don’t need to search your old emails that often.

1 point

It is indeed infrequent, but the modern world has trained me to expect convenience and instant-ness. Last time I wanted a 12-year-old email I was in the car with friends and wanted to pull it up. It wasn’t anything important at all, to be clear, but I’m hoping to search my 12-year-old emails with the same convenience as last month’s.

4 points

So this is 100% a really common situation. I don’t know any of my friends who aren’t hovering at about 96% of their Gmail capacity and unwilling to pay. In fact, that’s me today. So I’ve been looking around at self-hosted alternatives, and had previously looked at extracting my emails from Google and loading them from local storage into Thunderbird. However, while playing around with YunoHost today I randomly came across https://yunohost.org/es/email_migration and, although I’m not sure how relevant it is, it points to some potential approaches. I can’t vouch for them, but I’d love to hear from anyone who has used imapsync or larch.

3 points

As others suggested you don’t need all your historic mail on your mailserver. My approach to email archival is the same as all my historic data — a disorganized dumping ground that’s like my personal data lake, and separate service(s) to crawl, index, and search it (e.g. https://www.recoll.org/)

3 points

If you self-host paperless-ngx, there is an option to add email accounts and regularly import emails and their attachments like any other document. You can then have it delete imported mail from the mail server, or just move/mark it so you can deal with that manually.

It doesn’t currently support OAuth2 for providers like Microsoft, so you’ve gotta use App Passwords with Gmail for now, but there is a fix in the pipeline to add OAuth2 support soon. (There are also other methods you can use to get that part working right now.)
