Wiki

Performance figures for the Postfix MTA

These are some (very) raw notes from some very ad-hoc performance testing done on some systems not representative of what we usually deploy. Why? Customers want to pump out email very quickly. They're quick to blame us when it's not fast, so we need to get an idea of what's reasonable/possible. This will help establish what the problems are (if any) and what can/should be done.

Method overview

In a nutshell, inject email as fast as possible into a queue, let postfix get them out as quick as possible. The idea would be to demonstrate that postfix isn't slow about dequeuing mail.

Hardware

Sender - YUMI

Receiver - YOSHINO

Method

  1. A single MX record for nekomimi-mo.de has been created, pointing to yoshino.meidokon.net as the MX with priority of 1.

  2. YOSHINO has been configured to accept mail for nekomimi-mo.de by adding an entry to virtusertable as is standard.

  3. A wildcard entry is used to pass everything to 'blackhole'.
  4. blackhole is defined in /etc/aliases to pass everything to /dev/null

  5. Inject messages into queue on YUMI as fast as possible with a short python script acting as an SMTP client. Time how long this takes. This is a full SMTP transaction, so the timing includes everything needed to ensure the message is queued. Repeat several times for consistent results.
    • Specifically, the script imports the necessary libraries, opens a connection to the local smtpd, injects a single mail, then quits.
    • This is run in a short loop in the shell. The number of iterations is easily variable this way.
    • The message data/metadata is hard-coded into the python script
    • This will incur the penalty of creating a new python interpreter instance for every single message.
    • for i in `seq 1 1000` do ./mailer.py ; echo -n "$i " ; done
  6. Tail the maillog and qshape output on YOSHINO, confirm receipt and final delivery of incoming messages. Time if this extends substantially past the sender-side injection.
  7. Create a 'blackhole' user on YOSHINO. The account is locked down, as it will only be used to receive mail.
  8. Remove the entry in /etc/aliases for blackhole. All mail will now be locally delivered.

  9. Rerun the message injection in the same manner as before. This time, it is expected to take a long time to flush mail on the receiving end.
  10. In addition to timing the injection, also watch how long it takes for YOSHINO to empty the queues once injection completes on YUMI.

mailer.py

# this creates a 3.1KiB plain text message at the recipient's end

import smtplib
import datetime

TIMESTAMP = datetime.datetime.now().ctime()

FROM = 'test.mail.sender@yumi.meidokon.net'
TO = 'blackhole@nekomimi-mo.de' 
SUBJECT = 'Test message from test mailer - run started at %s' % TIMESTAMP
HEADERS = ["Subject: "+SUBJECT, "From: "+FROM, "To: "+TO]
MESSAGE_BODY = '''This is a test message.
We're putting in some content to bulk up the body a bit, kinda like spam.
This is legit though, we want to see how fast postfix can send to a willing
recipient. Now to bulk out the body a bit...

The smtplib module defines an SMTP client session object that can be used to
send mail to any Internet machine with an SMTP or ESMTP listener daemon. For 
details of SMTP and ESMTP operation, consult RFC 821 (Simple Mail Transfer 
Protocol) and RFC 1869 (SMTP Service Extensions).

class SMTP(     [host[, port[, local_hostname]]])
    A SMTP instance encapsulates an SMTP connection. It has methods that 
support a full repertoire of SMTP and ESMTP operations. If the optional host 
and port parameters are given, the SMTP connect() method is called with those 
parameters during initialization. An SMTPConnectError is raised if the 
specified host doesn't respond correctly.

    For normal use, you should only require the initialization/connect, 
sendmail(), and quit() methods. An example is included below. 

A nice selection of exceptions is defined as well:

exception SMTPException
    Base exception class for all exceptions raised by this module. 

exception SMTPServerDisconnected
    This exception is raised when the server unexpectedly disconnects, or 
when an attempt is made to use the SMTP instance before connecting it to a server. 

exception SMTPResponseException
    Base class for all exceptions that include an SMTP error code. These exceptions 
are generated in some instances when the SMTP server returns an error code. The error 
code is stored in the smtp_code attribute of the error, and the smtp_error attribute 
is set to the error message. 

exception SMTPSenderRefused
    Sender address refused. In addition to the attributes set by on all 
SMTPResponseException exceptions, this sets `sender' to the string that the SMTP 
server refused. 

exception SMTPRecipientsRefused
    All recipient addresses refused. The errors for each recipient are accessible 
through the attribute recipients, which is a dictionary of exactly the same sort 
as SMTP.sendmail() returns. 

exception SMTPDataError
    The SMTP server refused to accept the message data. 

exception SMTPConnectError
    Error occurred during establishment of a connection with the server. 



Thanks for supporting us! Standby for our next e-blast campaign!'''

MESSAGE = '\n'.join(HEADERS) + '\n\n' + MESSAGE_BODY


mailer = smtplib.SMTP()
mailer.connect() # localhost:25 is the default
mailer.sendmail(FROM, TO, MESSAGE)

attachment.py

# this creates a 131KiB multipart message at the recipient's end

import smtplib
import datetime

from email.MIMEImage import MIMEImage
from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText

filename_to_attach = 'Kyoukai01.jpg'
TIMESTAMP = datetime.datetime.now().ctime()
FROM = 'test.mail.sender@yumi.meidokon.net'
TO = 'blackhole@nekomimi-mo.de'
SUBJECT = 'Test message from test mailer - run started at %s' % TIMESTAMP


# setup the message
mail = MIMEMultipart()
mail['Subject'] = 'A test message with attached jpeg image file'
mail['From'] = FROM
mail['To'] = TO
txt = MIMEText("Please find attached the jpeg file")
mail.attach(txt)

# attach the file
img = MIMEImage(open(filename_to_attach, 'rb').read())
mail.attach(img)

# send it
mailer = smtplib.SMTP()
mailer.connect() # localhost:25 is the default
mailer.sendmail(FROM, TO, mail.as_string())
mailer.close()

Results

1000 messages, 3.1KiB plain text, being blackholed to /dev/null

1000 messages, 3.1KiB plain text, being locally delivered

-

injection time

max observed queue at recv

+time to flush

Run 1

32.5

722

30sec

Run 2

32.3

709

27sec

1000 messages, 132KiB multipart, text+97KiB jpeg, being locally delivered

1000 messages, 132KiB multipart, text+97KiB jpeg, being locally delivered

-

injection time

max observed queue at recv

+time to flush

Run 1

60.5

677

54sec

Run 2

60.2

722

51sec

Run 3

60.8

747

47sec

1000 message, two parallel invocations of 500, 132KiB multipart, text+97KiB jpeg, being locally delivered

-

injection time

max observed queue at recv

+time to flush

Run 1

31.3

852

55sec

Large-scale tests

We're concerned with:

  1. How fast we can inject mail into the queue
  2. How fast mail can be flushed from the queue

Other concerns:

Injection

(same method as before) 500,000 messages injected into queue serially with shell-script loop. 556m 30.291s on YUMI == about 10hrs, or 15/sec, or 898/min. Mailgraph on YOSHINO shows it was keeping up with blackholing them the entire time.

Need to test next:

Flushing

There's now ~40,000 messages in holding. default_destination_concurrency_limit has been set to 100, to match the maximum number of recipient smtpd processes YOSHINO will start by default. This seems to be a fair number. Delivery should be instant, but just in case, in_flow_delay has been changed from 1 to 0 on YOSHINO, so no artificial delay will be in effect. We'll unHOLD the mail, initiate a queue-flush, then see how long until the queue is empty.

started a flush at 03:24 2008-07-19 Approximately half out of the queue 4min later, equating to 300,000/hr. YOSHINO is struggling, noting "connection concurrency limit exceeded: 51" warnings in mail log. Must be hitting the limit specified by smtpd_client_connection_count_limit. At this point, something might have kicked in on YOSHINO, as maillog shows YUMI has getting turned away with deferrals. Decided to let YOSHINO clear the queue to /dev/null, then continue. YUMI has 21,948 messages in deferred queue.

queue empty 20min later.

Start flush again at 03:45... YOSHINO is running a massive number of cleanup processes to deal with all the mail, looks like it's being overwhelmed by the sheer concurrency. Not a good test for queue flushing... emptied by 03:58

Setup a new IP and DNS for blackhole.nekomimi-mo.de, can pump mail to smtp-sink. Can't get as much data, but performance monitoring will show when network activity starts/stops. With smtp-sink running and no HOLDing, I can inject 5000 in 77sec, or 65/sec, an interesting increase on the previous figures. Suspect this is due to not needing to be stored to disk, but simply passed straight to the SMTP client process.

Will try 4 concurrent processes pushing 250,000 mails each, all will be headed for smtp-sink. Started at 04:31. Load was creeping quite high on YUMI, suspect too much activity to work efficiently. Killed one injector at 21,722 messages, will let the rest continue. YOSHINO keeping up without problem running smtp-sink, as expected. Solidly 50 smtp connections open.

Logging suggests DNS lookups are a significant performance factor. Even for these emails with a single destination, postfix is doing some 66 DNS lookups every second. Based on Russell Coker's testing, a local DNS cache/proxy may also present a non-trivial load. Will just have to find out!

[root@yumi smtp-mass-mailer]# tail -f /var/log/maillog  | grep statis
Jul 19 04:42:13 yumi postfix/scache[3507]: statistics: start interval Jul 19 04:32:13
Jul 19 04:42:13 yumi postfix/scache[3507]: statistics: domain lookup hits=41287 miss=600 success=98%
Jul 19 04:42:13 yumi postfix/scache[3507]: statistics: address lookup hits=7 miss=593 success=1%
Jul 19 04:42:13 yumi postfix/scache[3507]: statistics: max simultaneous domains=1 addresses=1 connection=50
Jul 19 04:52:13 yumi postfix/scache[3507]: statistics: start interval Jul 19 04:42:13
Jul 19 04:52:13 yumi postfix/scache[3507]: statistics: domain lookup hits=38132 miss=501 success=98%
Jul 19 04:52:13 yumi postfix/scache[3507]: statistics: address lookup hits=0 miss=501 success=0%
Jul 19 04:52:13 yumi postfix/scache[3507]: statistics: max simultaneous domains=1 addresses=1 connection=50
Jul 19 05:02:13 yumi postfix/scache[3507]: statistics: start interval Jul 19 04:52:13
Jul 19 05:02:13 yumi postfix/scache[3507]: statistics: domain lookup hits=40538 miss=432 success=98%
Jul 19 05:02:13 yumi postfix/scache[3507]: statistics: address lookup hits=1 miss=431 success=0%
Jul 19 05:02:13 yumi postfix/scache[3507]: statistics: max simultaneous domains=1 addresses=1 connection=50

Total of 21722+(250000*3) messages injected (771,722). Injectors were in lock-step the entire way through.

Cacti data shows a lot of things working in concert. All these factors were elevated throughout the run, and in most cases started high and gradually rose as things progressed.

Unaffected

771,722 messages in 172min == 4486/min == ~75/sec == 269,200/hr. Enough to run the alleged entirety of the Shinohara Heavy Industries list in under an hour.

It was indeed noted that the speed apparently rose as the queue emptied, but if things run reasonably, this is totally a non-factor. Would be nice to run mailgraph on YUMI to get some direct data on how throughput changes over time.

Improvements:

PowerMTA comparison:

yoshino:~# smtp-sink -u nobody -c 202.4.232.113:25 50
sess=6396 quit=2156 mesg=771725


Wiki: dedicated/Postfix_MTA_performance_testing (last edited 2009-12-07 11:31:26 by BarneyDesmond)