Hello,
My guess is that the kernel is running out of available entropy
(possibly during SSL sessions?). You may want to keep an eye
on /proc/sys/kernel/random/entropy_avail. If it's at or near 0, your
server needs to generate random numbers faster somehow.
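
As a rough sketch of how one might keep an eye on that value, the Python
below polls the file and flags low readings. The LOW_WATERMARK threshold
and the function names are made up for illustration; pick a threshold
that suits your system.

```python
import time
from pathlib import Path

# Standard Linux location of the kernel's entropy estimate (in bits).
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"

# Hypothetical alert level, chosen only for illustration.
LOW_WATERMARK = 200


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's current entropy estimate as an integer."""
    return int(Path(path).read_text().strip())


def is_low(value, watermark=LOW_WATERMARK):
    """True when the entropy estimate is at or below the watermark."""
    return value <= watermark


def watch(samples=60, interval=1.0, path=ENTROPY_PATH):
    """Print one reading per interval, marking the low ones."""
    for _ in range(samples):
        bits = read_entropy(path)
        flag = "  <-- low" if is_low(bits) else ""
        print(f"{time.strftime('%H:%M:%S')} entropy_avail={bits}{flag}")
        time.sleep(interval)
```

Calling watch() while the SSL listeners are busy should show whether the
readings drop towards 0 around the time a process starts looping.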
Possible solutions:
1) Make sure Perdition uses /dev/urandom for random number generation
instead of /dev/random.
or
2) Use rngd (from the rng-tools package); pointed at /dev/urandom
(rngd -r /dev/urandom), it reads numbers from there and feeds them to
the kernel to satisfy its entropy needs.
Best regards,
Janne Pikkarainen
On Wed, 2009-01-07 at 23:37 +0200, Ariel Biener wrote:
We have configured two perdition servers (as front ends to 4 dovecot
servers), using eDirectory as an LDAP backend, with anonymous queries.
The perdition servers are load balanced using round robin (I also used
least conns for a while) via a 6513 3BXL switch, using the embedded
load balancer in the IOS, and not the dedicated blade. The hosts are
two HP DL140 machines. The clients connect with either
imap/imaps/pop3/pop3s, and we connect to the backend servers with
either imap or pop3. The conf is below. On the LDAP side, everything
is properly indexed.

connection_logging
connect_relog 0
F mail
g nobody
imap_capability IMAP4rev1 SASL-IR SORT THREAD=REFERENCES MULTIAPPEND UNSELECT LITERAL+ IDLE CHILDREN NAMESPACE LOGIN-REFERRALS STARTTLS
map_library /usr/lib64/libperditiondb_ldap.so
map_library_opt "ldap://ldapserver:389/o=someorg?cn,nSCPAmailHost?sub?(& (uid=%25s)(objectClass=nSCPMailRecipient)(!(nSCPAmailMessageStore=inactive*)))"
server_resp_line
outgoing_server imapold.somedomain
S all
timeout 0
u nobody
ssl_ca_accept_self_signed
ssl_cert_file /etc/perdition/perdition.crt.pem
ssl_cert_accept_self_signed
ssl_cert_accept_expired
ssl_cert_accept_not_yet_valid
ssl_key_file /etc/perdition/perdition.key.pem
ssl_no_cert_verify
ssl_no_cn_verify

Every 10-15 minutes on the average, one of the
perdition client processes (a fork from one of the 4 listeners -
imap/imaps/pop3/pop3s) enters a loop (easily seen with both strace and
ltrace), and while in that loop, the CPU the process is running on is
at 100% usage. For now, I've written a small health check monitor that
checks for these runaway processes and kills them. While I cannot
run perdition in full debug mode to check what is happening (due to
the load of connections we get here), I can share the details I have,
from both the logs and ltrace/strace.
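
To illustrate the kind of health check described above, here is a rough
Python sketch. It is not the script actually in use here: the function
names, the 5 second interval, the 95% threshold, and the assumption that
the process name starts with "perdition" are all hypothetical. It
samples the CPU ticks in /proc/<pid>/stat twice and reports processes
that used close to a full CPU in between.

```python
import os
import signal
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second


def parse_stat(line):
    """Return (comm, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')'.
    left, right = line.index("("), line.rindex(")")
    comm = line[left + 1:right]
    rest = line[right + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return comm, int(rest[11]) + int(rest[12])


def cpu_fraction(ticks_before, ticks_after, seconds, clk_tck=CLK_TCK):
    """Fraction of one CPU used between two samples."""
    return (ticks_after - ticks_before) / (clk_tck * seconds)


def sample(prefix):
    """Map pid -> cpu_ticks for processes whose comm starts with prefix."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as fh:
                comm, ticks = parse_stat(fh.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if comm.startswith(prefix):
            result[int(entry)] = ticks
    return result


def find_runaways(prefix, interval=5.0, threshold=0.95):
    """Return pids that stayed at or above threshold for one interval."""
    before = sample(prefix)
    time.sleep(interval)
    after = sample(prefix)
    return [pid for pid, ticks in after.items()
            if pid in before
            and cpu_fraction(before[pid], ticks, interval) >= threshold]


def kill_runaways(prefix="perdition", dry_run=True):
    """Report runaway processes; send SIGTERM only when dry_run is False."""
    for pid in find_runaways(prefix):
        print(f"runaway process: pid {pid}")
        if not dry_run:
            os.kill(pid, signal.SIGTERM)
```

A single busy interval can also catch a legitimately busy session, so
requiring several consecutive hits before killing would be safer.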
The logs show Re-Authentication failure for each of these
sessions.... The ltrace of the looping process looks like this:

select(1024, 0x7fffed4aa940, 0, 0x7fffed4aa9c0, 0x7fffed4aa8a0) = 0
time(NULL) = 1231363480
vanessa_list_get_element(0x83a9ef0, 0x7fffed4aa644, 0x7fffed4aa690, 5, 0x7fffed4aa8a0) = 0x8397aa0
SSL_pending(0x8397250, 0x7fffed4aa644, 0x8397aa0, 5, 0x7fffed4aa8a0) = 0
select(1024, 0x7fffed4aa940, 0, 0x7fffed4aa9c0, 0x7fffed4aa8a0) = 0
time(NULL) = 1231363480
vanessa_list_get_element(0x83a9ef0, 0x7fffed4aa644, 0x7fffed4aa690, 5, 0x7fffed4aa8a0) = 0x8397aa0
SSL_pending(0x8397250, 0x7fffed4aa644, 0x8397aa0, 5, 0x7fffed4aa8a0) = 0
select(1024, 0x7fffed4aa940, 0, 0x7fffed4aa9c0, 0x7fffed4aa8a0) = 0
time(NULL) = 1231363480
vanessa_list_get_element(0x83a9ef0, 0x7fffed4aa644, 0x7fffed4aa690, 5, 0x7fffed4aa8a0) = 0x8397aa0
SSL_pending(0x8397250, 0x7fffed4aa644, 0x8397aa0, 5, 0x7fffed4aa8a0) = 0
select(1024, 0x7fffed4aa940, 0, 0x7fffed4aa9c0, 0x7fffed4aa8a0) = 0
time(NULL) = 1231363480

The strace looks like this:

select(1024, [5], NULL, [5], {0, 0}) = 0 (Timeout)
select(1024, [5], NULL, [5], {0, 0}) = 0 (Timeout)
......

Does any of you have an idea about what may be wrong?

best.

--Ariel
--
Ariel Biener, CISO
Tel-Aviv University CIT div.
e-mail: ariel(a)aristo.tau.ac.il
phone: 03-6406086
PGP key: http://www.tau.ac.il/~ariel/pgp.html
______________________________________________
Perdition-users mailing list
Perdition-users(a)vergenet.net
http://lists.vergenet.net/listinfo/perdition-users