Postfix queue health analysis (Part 2)


The "hold" queue

The administrator can define "smtpd" access(5) policies, or cleanup(8) header/body checks, that cause messages to be automatically diverted from normal processing and placed indefinitely in the "hold" queue. Messages placed in the "hold" queue stay there until the administrator intervenes. No periodic delivery attempts are made for messages in the "hold" queue. The postsuper(1) command can be used to manually release messages into the "deferred" queue.

Messages can potentially stay in the "hold" queue longer than $maximal_queue_lifetime. If such "old" messages need to be released from the "hold" queue, they should typically be moved into the "maildrop" queue using "postsuper -r", so that the message gets a new timestamp and is given more than one opportunity to be delivered. Messages that are "young" can be moved directly into the "deferred" queue using "postsuper -H".
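
For illustration, held mail can be inspected and released with the standard queue tools (the queue ID below is hypothetical):

    # Held messages are flagged with "!" in the queue listing:
    postqueue -p | grep '!'

    # Release a "young" message directly into the deferred queue:
    postsuper -H 9C1B521C4E5

    # Release an "old" message via the maildrop queue, so it gets a fresh
    # timestamp and a full set of delivery attempts:
    postsuper -r 9C1B521C4E5

    # Or release everything currently on hold:
    postsuper -H ALL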

The "hold" queue plays little role in Postfix performance, andmonitoring of the "hold" queue is typically more closely motivatedby tracking spam and malware, than by performance issues.

The "incoming" queue

All new mail entering the Postfix queue is written by the cleanup(8) service into the "incoming" queue. New queue files are created owned by the "postfix" user with an access bitmask (or mode) of 0600. Once a queue file is ready for further processing, the cleanup(8) service changes the queue file mode to 0700 and notifies the queue manager of new mail arrival. The queue manager ignores incomplete queue files whose mode is 0600, as these are still being written by cleanup.
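
As a quick illustration (assuming the default queue directory; confirm it with "postconf -h queue_directory"), the two file modes can be counted directly:

    # Queue files still being written by cleanup(8):
    find /var/spool/postfix/incoming -type f -perm 0600 | wc -l

    # Queue files ready for the queue manager:
    find /var/spool/postfix/incoming -type f -perm 0700 | wc -l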

The queue manager scans the incoming queue, bringing any new mail into the "active" queue if the active queue resource limits have not been exceeded. By default, the active queue accommodates at most 20000 messages. Once the active queue message limit is reached, the queue manager stops scanning the incoming (and deferred, see below) queue.
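
The active queue limits can be checked with postconf; the relevant parameters (both default to 20000) are:

    postconf qmgr_message_active_limit qmgr_message_recipient_limit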

Under normal conditions the incoming queue is nearly empty (has only mode 0600 files), with the queue manager able to import new messages into the active queue as soon as they become available.

The incoming queue grows when the message input rate spikes above the rate at which the queue manager can import messages into the active queue. The main factors slowing down the queue manager are disk I/O and lookup queries to the trivial-rewrite service. If the queue manager is routinely not keeping up, consider not using "slow" lookup services (MySQL, LDAP, ...) for transport lookups, or speeding up the hosts that provide the lookup service. If the problem is I/O starvation, consider striping the queue over more disks, faster controllers with a battery write cache, or other hardware improvements. At the very least, make sure that the queue directory is mounted with the "noatime" option if applicable to the underlying filesystem.
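
As an example (device, filesystem and mount point are illustrative only), the queue filesystem and its mount options can be checked, and noatime enabled in /etc/fstab:

    # Locate the queue and inspect the mount options of its filesystem:
    postconf -h queue_directory
    mount | grep ' /var '

    # Illustrative /etc/fstab entry with noatime enabled:
    # /dev/sdb1  /var  ext4  defaults,noatime  0  2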

The in_flow_delay parameter is used to clamp the input rate when the queue manager starts to fall behind. The cleanup(8) service will pause for $in_flow_delay seconds before creating a new queue file if it cannot obtain a "token" from the queue manager.

Since the number of cleanup(8) processes is limited in most cases by the SMTP server concurrency, the input rate can exceed the output rate by at most "SMTP connection count" / $in_flow_delay messages per second.

With a default process limit of 100, and an in_flow_delay of 1s, the coupling is strong enough to limit a single run-away injector to 1 message per second, but is not strong enough to deflect an excessive input rate from many sources at the same time.

If a server is being hammered from multiple directions, consider raising the in_flow_delay to 10 seconds, but only if the incoming queue is growing even while the active queue is not full and the trivial-rewrite service is using a fast transport lookup mechanism.
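
For example, the current setting can be checked and raised as follows (10s is the value suggested above, not a general recommendation):

    postconf in_flow_delay
    postconf -e "in_flow_delay = 10s"
    postfix reload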

The "active" queue

The queue manager is a delivery agent scheduler; it works to ensure fast and fair delivery of mail to all destinations within designated resource limits.

The active queue is somewhat analogous to an operating system's process run queue. Messages in the active queue are ready to be sent (runnable), but are not necessarily in the process of being sent (running).

While most Postfix administrators think of the "active" queue as a directory on disk, the real "active" queue is a set of data structures in the memory of the queue manager process.

Messages in the "maildrop", "hold", "incoming" and "deferred" queues (see below) do not occupy memory; they are safely stored on disk waiting for their turn to be processed. The envelope information for messages in the "active" queue is managed in memory, allowing the queue manager to do global scheduling, allocating available delivery agent processes to an appropriate message in the active queue.

Within the active queue, (multi-recipient) messages are broken up into groups of recipients that share the same transport/nexthop combination; the group size is capped by the transport's recipient concurrency limit. Multiple recipient groups (from one or more messages) are queued for delivery, grouped by transport/nexthop combination. The destination concurrency limit for the transports caps the number of simultaneous delivery attempts for each nexthop. Transports with a recipient concurrency limit of 1 are special: these are grouped by the actual recipient address rather than the nexthop, yielding per-recipient concurrency limits rather than per-domain concurrency limits. Per-recipient limits are appropriate when performing final delivery to mailboxes rather than when relaying to a remote server.
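
For reference, the per-transport limits discussed here can be inspected with postconf; transport-specific parameters fall back to the default_* values when unset:

    postconf smtp_destination_concurrency_limit smtp_destination_recipient_limit
    postconf default_destination_concurrency_limit default_destination_recipient_limit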

Congestion occurs in the active queue when one or more destinations drain slower than the corresponding message input rate.

Input into the active queue comes both from new mail in the "incoming" queue, and retries of mail in the "deferred" queue. Should the "deferred" queue get really large, retries of old mail can dominate the arrival rate of new mail. Systems with more CPU, faster disks and more network bandwidth can deal with larger deferred queues, but as a rule of thumb the deferred queue scales to somewhere between 100,000 and 1,000,000 messages, with good performance unlikely above that "limit". Systems with queues this large should typically stop accepting new mail, or put the backlog "on hold" until the underlying issue is fixed (provided that there is enough capacity to handle just the new mail).
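
For example, an oversized backlog can be parked in the "hold" queue while the underlying problem is fixed, and released again later:

    # Park the existing deferred backlog; new mail keeps flowing normally:
    postsuper -h ALL deferred

    # Later, release it via the maildrop queue for fresh timestamps:
    postsuper -r ALL hold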

When a destination is down for some time, the queue manager will mark it dead, and immediately defer all mail for the destination without trying to assign it to a delivery agent. In this case the messages will quickly leave the active queue and end up in the deferred queue (with Postfix < 2.4, this is done directly by the queue manager; with Postfix ≥ 2.4 this is done via the "retry" delivery agent).

When the destination is instead simply slow, or there is a problem causing an excessive arrival rate, the active queue will grow and will become dominated by mail to the congested destination.

The only way to reduce congestion is to either reduce the input rate or increase the throughput. Increasing the throughput requires either increasing the concurrency or reducing the latency of deliveries.

For high volume sites a key tuning parameter is the number of "smtp" delivery agents allocated to the "smtp" and "relay" transports. High volume sites tend to send to many different destinations, many of which may be down or slow, so a good fraction of the available delivery agents will be blocked waiting for slow sites. Also, mail destined across the globe will incur large SMTP command-response latencies, so high message throughput can only be achieved with more concurrent delivery agents.

The default "smtp" process limit of 100 is good enough for mostsites, and may even need to be lowered for sites with low bandwidthconnections (no use increasing concurrency once the network pipeis full). When one finds that the queue is growing on an "idle"system (CPU, disk I/O and network not exhausted) the remainingreason for congestion is insufficient concurrency in the face ofa high average latency. If the number of outbound SMTP connections(either ESTABLISHED or SYN_SENT) reaches the process limit, mailis draining slowly and the system and network are not loaded, raisethe "smtp" and/or "relay" process limits!

When a high volume destination is served by multiple MX hosts with typically low delivery latency, performance can suffer dramatically when one of the MX hosts is unresponsive and SMTP connections to that host time out. For example, if there are 2 equal-weight MX hosts, the SMTP connection timeout is 30 seconds and one of the MX hosts is down, the average SMTP connection will take approximately 15 seconds to complete. With a default per-destination concurrency limit of 20 connections, throughput falls to just over 1 message per second.
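
As a back-of-the-envelope check of those numbers (assuming one message per connection and ignoring transmission time):

    average connect time  ~ 0.5 x 0s + 0.5 x 30s timeout     =  ~15s
    throughput            ~ 20 concurrent connections / 15s  =  ~1.3 messages/second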

The best way to avoid bottlenecks when one or more MX hosts is non-responsive is to use connection caching. Connection caching was introduced with Postfix 2.2 and is by default enabled on demand for destinations with a backlog of mail in the active queue. When connection caching is in effect for a particular destination, established connections are re-used to send additional messages; this reduces the number of connections made per message delivery and maintains good throughput even in the face of partial unavailability of the destination's MX hosts.
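
For example, the on-demand caching default can be verified, or caching forced on for specific destinations (the domain is a placeholder):

    postconf smtp_connection_cache_on_demand
    postconf -e "smtp_connection_cache_destinations = example.com"
    postfix reload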

If connection caching is not available (Postfix < 2.2) or does not provide a sufficient latency reduction, especially for the "relay" transport used to forward mail to "your own" domains, consider setting lower than default SMTP connection timeouts (1-5 seconds) and higher than default destination concurrency limits. This will further reduce latency and provide more concurrency to maintain throughput should latency rise.
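
A minimal sketch of such tuning for the "relay" transport (the values are illustrative, not recommendations):

    # main.cf: allow more parallel deliveries to your own domains:
    postconf -e "relay_destination_concurrency_limit = 50"

    # master.cf: give the relay transport shorter SMTP timeouts:
    # relay  unix  -  -  n  -  -  smtp
    #     -o smtp_connect_timeout=5s
    #     -o smtp_helo_timeout=5s

    postfix reload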

Setting high concurrency limits to domains that are not your own may be viewed as hostile by the receiving system, and steps may be taken to prevent you from monopolizing the destination system's resources. The defensive measures may substantially reduce your throughput or block access entirely. Do not set aggressive concurrency limits to remote domains without coordinating with the administrators of the target domain.

If necessary, dedicate and tune custom transports for selected high volume destinations. The "relay" transport is provided for forwarding mail to domains for which your server is a primary or backup MX host. These can make up a substantial fraction of your email traffic. Use the "relay" and not the "smtp" transport to send email to these domains. Using the "relay" transport allocates a separate delivery agent pool to these destinations and allows separate tuning of timeouts and concurrency limits.
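
A sketch of a dedicated transport (the "bulkout" service name and the destination domain are hypothetical):

    # master.cf: a separate delivery agent pool with its own process limit:
    # bulkout  unix  -  -  n  -  50  smtp
    #     -o smtp_connect_timeout=5s

    # main.cf: route the destination over the new transport:
    postconf -e "transport_maps = hash:/etc/postfix/transport"
    echo "bigdomain.example  bulkout:" >> /etc/postfix/transport
    postmap /etc/postfix/transport
    postfix reload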

Another common cause of congestion is unwarranted flushing of the entire deferred queue. The deferred queue holds messages that are likely to fail to be delivered and are also likely to be slow to fail delivery (time out). As a result, the most common reaction to a large deferred queue (flush it!) is more than likely counter-productive, and typically makes the congestion worse. Do not flush the deferred queue unless you expect that most of its content has recently become deliverable (e.g. relayhost back up after an outage)!
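
If only one destination has recovered, a targeted flush is usually the better choice (the domain is a placeholder, and "postqueue -s" requires the site to be eligible for the fast flush service):

    # Flush only mail queued for the recovered destination:
    postqueue -s example.net

    # Flushing the entire deferred queue (rarely a good idea):
    postqueue -f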

Note that whenever the queue manager is restarted, there may already be messages in the active queue directory, but the "real" active queue in memory is empty. In order to recover the in-memory state, the queue manager moves all the active queue messages back into the incoming queue, and then uses its normal incoming queue scan to refill the active queue. The process of moving all the messages back and forth, redoing transport table (trivial-rewrite(8) resolve service) lookups, and re-importing the messages back into memory is expensive. At all costs, avoid frequent restarts of the queue manager (e.g. via frequent execution of "postfix reload").

The "deferred" queue

When all the deliverable recipients for a message are delivered, and for some recipients delivery failed for a transient reason (it might succeed later), the message is placed in the deferred queue.

The queue manager scans the deferred queue periodically. The scan interval is controlled by the queue_run_delay parameter. While a deferred queue scan is in progress, if an incoming queue scan is also in progress (ideally these are brief since the incoming queue should be short), the queue manager alternates between looking for messages in the "incoming" queue and in the "deferred" queue. This "round-robin" strategy prevents starvation of either the incoming or the deferred queues.

Each deferred queue scan only brings a fraction of the deferred queue back into the active queue for a retry. This is because each message in the deferred queue is assigned a "cool-off" time when it is deferred. This is done by time-warping the modification time of the queue file into the future. The queue file is not eligible for a retry if its modification time is not yet reached.

The "cool-off" time is at least $minimal_backoff_time and atmost $maximal_backoff_time. The next retry time is set by doublingthe message's age in the queue, and adjusting up or down to liewithin the limits. This means that young messages are initiallyretried more often than old messages.

If a high volume site routinely has large deferred queues, it may be useful to adjust the queue_run_delay, minimal_backoff_time and maximal_backoff_time to provide short enough delays on first failure (Postfix ≥ 2.4 has a sensibly low minimal backoff time by default), with perhaps longer delays after multiple failures, to reduce the retransmission rate of old messages and thereby reduce the quantity of previously deferred mail in the active queue. If you want a really low minimal_backoff_time, you may also want to lower queue_run_delay, but understand that more frequent scans will increase the demand for disk I/O.
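
For example (the values below are illustrative, not recommendations):

    # Inspect the current retry timing:
    postconf queue_run_delay minimal_backoff_time maximal_backoff_time

    # Retry new failures quickly, back off harder on old mail:
    postconf -e "queue_run_delay = 300s"
    postconf -e "minimal_backoff_time = 300s"
    postconf -e "maximal_backoff_time = 8000s"
    postfix reload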

One common cause of large deferred queues is failure to validate recipients at the SMTP input stage. Since spammers routinely launch dictionary attacks from unrepliable sender addresses, the bounces for invalid recipient addresses clog the deferred queue (and at high volumes proportionally clog the active queue). Recipient validation is strongly recommended through use of the local_recipient_maps and relay_recipient_maps parameters. Even when bounces drain quickly, they inundate innocent victims of forgery with unwanted email. To avoid this, do not accept mail for invalid recipients.
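
A minimal sketch of recipient validation (the map path and addresses are illustrative):

    # Local recipients are validated by default via local_recipient_maps:
    postconf local_recipient_maps

    # For relay domains, list the valid recipients explicitly:
    postconf -e "relay_recipient_maps = hash:/etc/postfix/relay_recipients"
    # /etc/postfix/relay_recipients:
    #   user1@relay.example   OK
    #   user2@relay.example   OK
    postmap /etc/postfix/relay_recipients
    postfix reload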

When a host with lots of deferred mail is down for some time, it is possible for the entire deferred queue to reach its retry time simultaneously. This can lead to a very full active queue once the host comes back up. The phenomenon can repeat approximately every maximal_backoff_time seconds if the messages are again deferred after a brief burst of congestion. Perhaps a future Postfix release will add a random offset to the retry time (or use a combination of strategies) to reduce the odds of repeated complete deferred queue flushes.

Credits

The qshape(1) program was developed by Victor Duchovni of Morgan Stanley, who also wrote the initial version of this document.
