Linux Mail Storage Formats: Mbox vs MailDir
Written by Steve Lake
Posted on: Jul 17, 2009 at 12:03pm
Section: Tutorials

The subject of standard Unix and Linux mail file formats came up as part of a recent article I was writing and I decided that it was something that needed to be addressed to avoid confusion when talking about it in the future.
The two primary formats for mail storage in Linux and Unix are Mbox and MailDir. There are several other formats out there as well, including MMDF (a close cousin to Mbox), MH (similar to MailDir, but used more in IMAP mail scenarios), and BABYL. Other than Mbox and MailDir, most of these are deprecated, obsolete, or in very limited use, so most end users won't encounter them and don't need to worry about them. We'll therefore concentrate today on the two primary formats.
Mbox
To get started, we'll look at the Mbox format. Mbox is one of those old Unix standards that refuses to die, and for good reasons, as I for one feel it's the better format. And it's not some rusty old relic of days gone by; it's still in everyday use. There's no single versioned Mbox specification, though. Instead there's a family of closely related variants (commonly called mboxo, mboxrd, mboxcl, and mboxcl2) that differ mainly in how they escape body lines beginning with "From " and in whether they record each message's length.
Mbox stores all the mail for a folder in a single file. That doesn't mean one mega-mondo huge file with all your eggs in one basket. Instead, each folder within your mail program is one file, typically stored in the root of the "Mail" folder in your user directory. Since the messages in a file need to be separated by something to mark where one ends and the next begins, Mbox goes for simplicity once again by using a single line that looks something like this:
From user@domain.com Wed Jul 15 16:16:37 2009 -0400
That's it. Just a single "From " line (the word "From" followed by a space, with no colon) to separate the messages. It always appears at the top of a message and is almost always followed by a date entry, X- headers (X-Spam, X-Envelope, etc.), Return-Path, or other message-specific header information that provides tracking and logging information for users and admins alike. These headers aren't specific to Mbox itself; they come from the Internet mail message format and from the servers that handled the message along the way. Mbox simply stores them either in the order they were received or in whatever logical order the mail client or server writing the file chooses.
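To make the separator idea concrete, here is a minimal Python sketch that splits raw Mbox text into messages by looking for lines that start with "From ". The function name `split_mbox` and the sample addresses are made up for illustration. It also assumes that whatever wrote the file already escaped any body lines beginning with "From " (usually as ">From "), which is what makes such naive splitting workable in the first place.

```python
def split_mbox(text):
    """Split raw mbox text into (from_line, message_text) pairs.

    Assumption: every line starting with "From " begins a new message,
    i.e. body lines starting with "From " were escaped by the writer.
    """
    messages = []
    from_line = None
    lines = []
    for line in text.splitlines():
        if line.startswith("From "):
            # A new separator closes the previous message, if any.
            if from_line is not None:
                messages.append((from_line, "\n".join(lines)))
            from_line = line
            lines = []
        elif from_line is not None:
            lines.append(line)
    if from_line is not None:
        messages.append((from_line, "\n".join(lines)))
    return messages


sample = (
    "From user@domain.com Wed Jul 15 16:16:37 2009 -0400\n"
    "Return-Path: <user@domain.com>\n"
    "Subject: first\n"
    "\n"
    "Hello.\n"
    "\n"
    "From other@domain.com Wed Jul 15 16:20:01 2009 -0400\n"
    "Subject: second\n"
    "\n"
    "Goodbye.\n"
)

for from_line, message in split_mbox(sample):
    print(from_line)
```

For real mailboxes, Python's standard `mailbox.mbox` class is a better choice than hand-rolled parsing, since it also deals with locking and writing.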
Mbox is also a text-based storage system. Since mail transport was designed around 7-bit ASCII text (attachments and other non-ASCII content are encoded as text, typically with MIME, before sending), Mbox stores everything essentially as it's received, leaving it up to the specific mail program to decode attachments, links and other such things.
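As a rough illustration of "stored as text, decoded by the mail program", the sketch below builds a small message whose body is Base64-encoded and then lets Python's standard `email` package decode it. The addresses, subject, and body text are invented for the example, and a real mail client does far more than this.

```python
import base64
from email import message_from_string

original_text = "Caf\u00e9 menu attached"

# What goes over the wire (and into the mailbox file) is plain ASCII.
encoded_body = base64.b64encode(original_text.encode("utf-8")).decode("ascii")
raw_message = (
    "From: user@domain.com\n"
    "Subject: encoding demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + encoded_body + "\n"
)

msg = message_from_string(raw_message)
stored = msg.get_payload()                               # ASCII text as stored
decoded = msg.get_payload(decode=True).decode("utf-8")   # what the user sees

print(stored.strip())
print(decoded)
```

The mailbox file only ever holds the encoded form; turning it back into readable text or an attachment is the reader's job.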
Mbox is also a format that's used in a lot of mainstream mail programs on a wide variety of platforms, and not just Linux or Unix. For example, Eudora (Windows, MacOS) uses Mbox format for storing its mail, as does the ever popular Mozilla Thunderbird (Windows, Linux, MacOS) and numerous others.
Mbox is a very portable format as well. Most mail programs can easily read and handle Mbox standard mail files generated by other mail programs without having to import or convert them. So if you want to move between machines, between OSes, or even between mail clients, just pick up the files in one place, plop them down in another, and you're done.
This is also a good mail format for POP3 setups, which use the "download and store locally" model, as opposed to IMAP, which keeps mail on the server. Since downloading mail via POP3 is typically an all-or-nothing proposition, a format that allows the server to quickly and easily store and dump mail to the system is important. This is a format that's used by many POP3 and SMTP server setups, including Sendmail and Qpopper.
MailDir
MailDir is one of those formats that makes you go "What were they thinking!?" I understand the logic behind doing it, and while I'm highly partial to Mbox, MailDir does have its advantages. But before I get into those, how about we look at what MailDir is.
MailDir format uses something equivalent to a "bucket of marbles" approach, where mail is stored in much the same way marbles would be stored in a bucket. That is, each message is stored individually in its own file. So instead of having one large mail file for each folder in your mail program, each folder is literally a physical folder on your system and each message is stored in its own individual file within that folder on the drive.
This design has its advantages, even if it's not the "preferred" format, as there are things that MailDir can do that Mbox can't, or if it can, it can't do them easily. The first advantage is that messages are kept isolated from each other since they're each in their own little file. Think of it as a "timeout" or separate "playpen" for each of your messages.
The key advantage here is that if one file is corrupt or causing issues, it won't mess up any others and can easily be removed from the system without a lot of file hacking. Individual messages can also be isolated if need be, or even encrypted for added security, whereas with Mbox you have an all-or-nothing proposition, as individual encryption can get messy. It can be done, but again, it gets messy. So for MailDir, having each message as a separate file is an advantage in this respect.
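The one-file-per-message layout is easy to see with Python's standard `mailbox.Maildir` class. The sketch below creates a throwaway mail folder in a temporary directory, adds three made-up messages, and lists what ends up on disk. The folder name "Inbox" and the message contents are assumptions for the example. Note that a folder in this format contains three subdirectories (`tmp`, `new`, and `cur`), and the message files live inside those.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

# Throwaway location so the example doesn't touch real mail.
root = tempfile.mkdtemp()
inbox_path = os.path.join(root, "Inbox")
inbox = mailbox.Maildir(inbox_path, create=True)

for n in (1, 2, 3):
    msg = EmailMessage()
    msg["From"] = "user@domain.com"
    msg["Subject"] = f"message {n}"
    msg.set_content(f"Body of message {n}\n")
    inbox.add(msg)  # newly added mail lands in the "new" subdirectory

subdirs = sorted(os.listdir(inbox_path))
files = sorted(os.listdir(os.path.join(inbox_path, "new")))
subjects = sorted(m["Subject"] for m in inbox)

print(subdirs)      # the three standard subdirectories
print(len(files))   # one file per message
print(subjects)
```

Removing one bad message is then just a matter of deleting its file, which is the isolation property this format is built around.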
The biggest disadvantage of MailDir is that it eats hard drive space like candy. This matters more as cluster (allocation block) sizes grow. At one point in time, cluster sizes were small, anywhere from 512 bytes to 2 KB. Having each file consume at least one cluster wasn't that big a deal then, since the chance of losing much space wasn't great: most messages rarely went over 2 KB, so disk overhead was small.
Today, however, cluster sizes can be much larger, up to 64 KB or more on some filesystems (although common Linux filesystems typically default to 4 KB blocks). Each file consumes at least one whole cluster, and will only spill over into a second if it exceeds the cluster size. Most mail messages don't. So if you have 20 files, each 1 KB in size, and you're using even something as small as a 16 KB cluster, you're still losing a lot of space: about 300 KB. That may not seem like much, but the bigger your directory grows, the more you lose. With, oh say, 2,000 such messages, you'd be losing roughly 30 MB at that cluster size, and well over 100 MB with 64 KB clusters.
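The wasted-space arithmetic is easy to check. The short sketch below assumes the simple model used in this article, where every file occupies a whole number of clusters and non-empty files take at least one. The helper names are made up, and real filesystems can behave differently (for example, by packing very small files elsewhere).

```python
import math

KB = 1024


def allocated_bytes(file_size, cluster_size):
    """Space one file occupies when rounded up to whole clusters."""
    clusters = max(1, math.ceil(file_size / cluster_size))
    return clusters * cluster_size


def slack_bytes(file_sizes, cluster_size):
    """Total allocated-but-unused space across a set of files."""
    return sum(allocated_bytes(size, cluster_size) - size for size in file_sizes)


twenty_small = [1 * KB] * 20
two_thousand_small = [1 * KB] * 2000

print(slack_bytes(twenty_small, 16 * KB) // KB)        # 300 (KB)
print(slack_bytes(two_thousand_small, 16 * KB) // KB)  # 30000 (KB), about 29 MB
print(slack_bytes(two_thousand_small, 64 * KB) // KB)  # 126000 (KB), about 123 MB
```

The same messages stored in a single Mbox file would waste at most one partial cluster at the end of the file.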
MailDir also tends to work better than Mbox on IMAP servers. I'm no fan of IMAP, but given that IMAP is a directory-based online mail system, whereas POP3 is a download-and-store-locally technology, it's understandable that a format that automatically partitions mail into individual folders and messages would make searching for and retrieving individual messages much, much faster. Qmail is famous for using this format; it's not the only format Qmail supports, but it does seem to be the preferred one.
Conclusion
Overall, both are good formats and both have their individual advantages and disadvantages, many, many more than I listed. Both are easy to store too, as they both compact down to about the same size when compressed, and both store data in text/ascii format. No one format will solve every need of every person, but these two definitely fit the needs of a lot of people and a lot of servers as well.
Most people in their entire user experience will likely never have to bother with either of these formats, why they're there, or much of anything else. But just because most people won't ever interact with these file formats doesn't mean that you won't. As the old saying in the tech world goes, it's better to know too much than too little.
==========================================================
EDIT: Wow, here's a surprise. From the feedback I've been getting on this article, it appears that the subject of mailbox formats is as hot a topic in the Linux world as text editors are, i.e., something akin to the battles over vi vs Emacs vs Pico, etc. Heh. Never would have expected that.
Also, to those who have pointed out the faults in Mbox (apparently several rather severe ones, none of which I've ever encountered, mind you): it appears that of the two formats, MailDir is actually the significantly better one, despite the drive space issues. Even it may be surpassed in quality and reliability by other, lesser-used formats, meaning that the only advantage Mbox has anymore is that it's universally supported.