perdition processes (forks) get stuck in endless loop

Ariel Biener
7 Jan 2009, 9:37 p.m.


Janne Pikkarainen
8 Jan, 8:22 a.m.

New subject: perdition processes (forks) get stuck in endless loop

Hello,

My guess is that the kernel is running out of available entropy (possibly during SSL sessions?). You may want to keep an eye on /proc/sys/kernel/random/entropy_avail. If it's at or near 0, your server needs to generate random numbers faster somehow.

Possible solutions:

1) Make sure Perdition uses /dev/urandom for random number generation instead of /dev/random.

or

2) Use rngd (from the rng-tools package); it can read random data from /dev/urandom and feed it to the kernel to satisfy its entropy needs.

Best regards,
Janne Pikkarainen

On Wed, 2009-01-07 at 23:37 +0200, Ariel Biener wrote:


We have configured two perdition servers (as front ends to 4 dovecot servers), using eDirectory as an LDAP backend, with anonymous queries. The perdition servers are load balanced using round robin (I also used least conns for a while) via a 6513 3BXL switch, using the embedded load balancer in the IOS, and not the dedicated blade. The hosts are two HP DL140 machines.

The clients connect with either imap/imaps/pop3/pop3s, and we connect to the backend servers with either imap or pop3. The conf is below. On the LDAP side, everything is properly indexed.

    connection_logging
    connect_relog 0
    F mail
    g nobody
    imap_capability IMAP4rev1 SASL-IR SORT THREAD=REFERENCES MULTIAPPEND UNSELECT LITERAL+ IDLE CHILDREN NAMESPACE LOGIN-REFERRALS STARTTLS
    map_library /usr/lib64/libperditiondb_ldap.so
    map_library_opt "ldap://ldapserver:389/o=someorg?cn,nSCPAmailHost?sub?(& (uid=%25s)(objectClass=nSCPMailRecipient)(! (nSCPAmailMessageStore=inactive*)))"
    server_resp_line
    outgoing_server imapold.somedomain
    S all
    timeout 0
    u nobody
    ssl_ca_accept_self_signed
    ssl_cert_file /etc/perdition/perdition.crt.pem
    ssl_cert_accept_self_signed
    ssl_cert_accept_expired
    ssl_cert_accept_not_yet_valid
    ssl_key_file /etc/perdition/perdition.key.pem
    ssl_no_cert_verify
    ssl_no_cn_verify

Every 10-15 minutes on average, one of the perdition client processes (a fork from one of the 4 listeners: imap/imaps/pop3/pop3s) enters a loop (easily seen with both strace and ltrace). While in that loop, the CPU the process is running on is at 100% usage. For now, I've written a small health check monitor that checks for these runaway processes and kills them.

While I cannot run perdition in full debug mode to check what is happening (due to the load of connections we get here), I can share the details I have, from both the logs and ltrace/strace.

The logs show Re-Authentication failure for each of these sessions....
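[Editor's sketch] The health check monitor mentioned above is not shown in the thread. As a rough illustration only, a check of this kind could compare two samples of a process's /proc/<pid>/stat line and flag the process when it has consumed nearly a full CPU in between. The function names (`cpu_ticks`, `is_runaway`) and the 0.9 threshold are assumptions made for this example, not part of perdition or of the original monitor.

```python
def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so cut at the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def is_runaway(before: str, after: str, interval_s: float,
               ticks_per_s: int, threshold: float = 0.9) -> bool:
    """True if the process used more than `threshold` of one CPU
    between the two samples (hypothetical cut-off, tune as needed)."""
    used_s = (cpu_ticks(after) - cpu_ticks(before)) / ticks_per_s
    return used_s / interval_s > threshold
```

On Linux, the two samples would come from reading /proc/<pid>/stat some seconds apart, with `ticks_per_s` taken from `os.sysconf("SC_CLK_TCK")`. What to do with a flagged process (log it, kill it) is left to the caller.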
The ltrace of the looping process looks like this:

    select(1024, 0x7fffed4aa940, 0, 0x7fffed4aa9c0, 0x7fffed4aa8a0) = 0
    time(NULL) = 1231363480
    vanessa_list_get_element(0x83a9ef0, 0x7fffed4aa644, 0x7fffed4aa690, 5, 0x7fffed4aa8a0) = 0x8397aa0
    SSL_pending(0x8397250, 0x7fffed4aa644, 0x8397aa0, 5, 0x7fffed4aa8a0) = 0
    select(1024, 0x7fffed4aa940, 0, 0x7fffed4aa9c0, 0x7fffed4aa8a0) = 0
    time(NULL) = 1231363480
    vanessa_list_get_element(0x83a9ef0, 0x7fffed4aa644, 0x7fffed4aa690, 5, 0x7fffed4aa8a0) = 0x8397aa0
    SSL_pending(0x8397250, 0x7fffed4aa644, 0x8397aa0, 5, 0x7fffed4aa8a0) = 0
    select(1024, 0x7fffed4aa940, 0, 0x7fffed4aa9c0, 0x7fffed4aa8a0) = 0
    time(NULL) = 1231363480
    vanessa_list_get_element(0x83a9ef0, 0x7fffed4aa644, 0x7fffed4aa690, 5, 0x7fffed4aa8a0) = 0x8397aa0
    SSL_pending(0x8397250, 0x7fffed4aa644, 0x8397aa0, 5, 0x7fffed4aa8a0) = 0
    select(1024, 0x7fffed4aa940, 0, 0x7fffed4aa9c0, 0x7fffed4aa8a0) = 0
    time(NULL) = 1231363480

The strace looks like this:

    select(1024, [5], NULL, [5], {0, 0}) = 0 (Timeout)
    select(1024, [5], NULL, [5], {0, 0}) = 0 (Timeout)
    ......

Do any of you have an idea about what may be wrong?

Best.

--Ariel

--
Ariel Biener, CISO
Tel-Aviv University CIT div.
e-mail: ariel(a)aristo.tau.ac.il
phone: 03-6406086
PGP key: http://www.tau.ac.il/~ariel/pgp.html

______________________________________________
Perdition-users mailing list
Perdition-users(a)vergenet.net
http://lists.vergenet.net/listinfo/perdition-users
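[Editor's sketch] The strace output in the message shows select() being called with a {0, 0} timeout and returning 0 each time. The following small Python illustration, which is not perdition's code, shows why a loop around such a call spins: with a zero timeout select() returns immediately on an idle socket, while a non-zero timeout makes each call sleep in the kernel. The helper names `poll_once` and `count_polls` are made up for this example, and the sketch says nothing about why perdition ends up in this state.

```python
import select
import socket
import time


def poll_once(sock, timeout):
    """Wait up to `timeout` seconds for `sock` to become readable.

    Returns (ready, elapsed_seconds)."""
    start = time.monotonic()
    readable, _, _ = select.select([sock], [], [sock], timeout)
    return bool(readable), time.monotonic() - start


def count_polls(sock, timeout, duration=0.2):
    """Count how many select() calls fit into `duration` seconds."""
    deadline = time.monotonic() + duration
    calls = 0
    while time.monotonic() < deadline:
        poll_once(sock, timeout)
        calls += 1
    return calls
```

Calling `count_polls(sock, 0)` on one end of an idle `socket.socketpair()` typically yields thousands of calls in a fraction of a second, whereas `count_polls(sock, 0.05)` yields only a handful, because each call blocks until its timeout expires.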
