Hi Simon,
On 08.12.2011 at 01:42, Simon Horman wrote:
the problem
still exists in 1.19-rc4. However, we found out:
If you add the capability flag "STLS" to option pop3_capability the
described behavior occurs
Thanks David,
that is very valuable information.
I will try and reproduce the problem and track down the cause.
I investigated the problem together with David and just want to share what we found
out. Here's the complete story:
In the summer a customer called: he could no longer connect to our POP3 service since
upgrading to Mac OS X Lion. I checked on a Mac running Lion, and he was right, the
connection was dropped because of "wrong capabilities". Our production system is Debian
Lenny, running perdition with Debian's version label 1.17.1-2+lenny.
In Perdition 1.17.1, pop_capability and imap_capability cannot be set to different values
(at least as far as I could find out), so the POP3 proxy sent all the IMAP capabilities to
the client. Because they are separated by a single space, the client "sees" them as one
capability on one line. That's what Mail in Mac OS X 10.7 Lion doesn't like.
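To illustrate the formatting issue: per RFC 2449, a CAPA response lists one capability per line and ends with a line containing a single dot. The following is a minimal Python sketch of that wire format, not perdition's actual code. The function name and the capability values in the second call are made up for illustration.

```python
def format_capa_response(capabilities):
    """Build a multi-line POP3 CAPA response (RFC 2449 layout).

    Each entry of `capabilities` becomes exactly one response line.
    """
    lines = ["+OK Capability list follows"]
    lines.extend(capabilities)
    lines.append(".")  # a lone dot terminates a multi-line POP3 response
    return "".join(line + "\r\n" for line in lines)


# Expected shape: every capability on its own line.
separate = format_capa_response(["TOP", "UIDL", "EXPIRE NEVER"])

# The failure mode described above: a space-separated list handed over
# as a single entry ends up on one line, so the client reads it as one
# (unknown) capability. The names here are illustrative only.
merged = format_capa_response(["IMAP4 IMAP4REV1"])

print(separate)
print(merged)
```

Note that a single space is legitimate inside a capability ("EXPIRE NEVER"), which is why a single space cannot also serve as the separator between capabilities.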
So we made separate configs for IMAP and POP3, which perdition makes easy: I created a
copy of the original config file named perdition.pop3.conf, plus a symbolic link
perdition.pop3s.conf pointing to it.
In the pop3-version of the config file I set the correct capabilities for our POP3
server:
pop_capability STLS EXPIRE NEVER LOGIN-DELAY 0 TOP UIDL PIPELINING RESP-CODES
AUTH-RESP-CODE USER IMPLEMENTATION Cyrus POP3 server
This is what our POP3 backend server announces in response to CAPA.
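For anyone who wants to read the capability list from their own backend, here is a small Python sketch using the standard library's poplib. The helper names are hypothetical, and the host passed to backend_capabilities is whatever your backend is called. poplib's capa() returns a dictionary mapping each capability name to a list of its parameters, so the sketch joins them back into one string per capability.

```python
import poplib


def capability_lines(capa):
    """Turn poplib's CAPA dictionary into one string per capability.

    Example: {"EXPIRE": ["NEVER"], "TOP": []} -> ["EXPIRE NEVER", "TOP"]
    """
    return [" ".join([name, *params]) for name, params in capa.items()]


def backend_capabilities(host, port=110, timeout=10):
    """Connect to a POP3 server, send CAPA and return the capability strings.

    Hypothetical helper: `host` must be your own backend server.
    """
    conn = poplib.POP3(host, port, timeout=timeout)
    try:
        return capability_lines(conn.capa())
    finally:
        conn.quit()


# Offline demonstration with a dictionary shaped like poplib's result.
sample = {"EXPIRE": ["NEVER"], "TOP": [], "IMPLEMENTATION": ["Cyrus", "POP3", "server"]}
print(capability_lines(sample))
```

Calling backend_capabilities with the backend's host name would then give the list to compare against the pop_capability setting.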
So we restarted perdition, Lion clients could use POP3 again, and everything seemed to
be fine...
Months later our perdition proxy server ran out of resources, and we had no idea why... A
ps ax showed thousands of hanging "Re-Authentication failed" perdition pop3s
processes. A pkill -f "perdition.pop3s: auth failed:" helped to end them, but it
seemed that every time someone entered a wrong password, the perdition process hung
indefinitely in the process list.
As a workaround we set up a cron job running the pkill once a day. A dirty hack, I
know... At that time we did not know that the behavior had to do with the pop_capability
setting we had made months earlier.
Next we noticed a backtrace in the console log every time the problem occurred. It looks
like this:
*** glibc detected *** perdition: auth failed: Re-Authentication Failure: double free or corruption (out): 0x000000000068f270 ***
======= Backtrace: =========
/lib/libc.so.6[0x7fb954b339a8]
/lib/libc.so.6(cfree+0x76)[0x7fb954b35ab6]
perdition: auth failed: Re-Authentication Failure(protocol_capability+0x66)[0x4137c6]
perdition: auth failed: Re-Authentication Failure(pop3_capability+0x2d)[0x41242d]
perdition: auth failed: Re-Authentication Failure(main+0xa27)[0x40fbb7]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7fb954ade1a6]
perdition: auth failed: Re-Authentication Failure[0x406269]
Every time this occurs, the session stays open and one of these perdition "auth
failed" processes remains forever in some kind of deadlock. The next time my
pkill runs, the process dies cleanly. And yes, it still responds to a simple TERM signal,
no KILL required.
Still not on the right track, we ran a test on a new machine running Debian squeeze and
perdition 1.19-rc4. And I have to say: it still doesn't work, but the behavior is somewhat
different. There are no more hanging processes, but the POP3 proxy doesn't work at all.
When you connect to a pop3 proxy (it doesn't matter whether with SSL or not), you see the
greeting string and then the connection is closed immediately. In the console log I see
the following backtrace:
*** glibc detected *** perdition: connect ( aaa.bbb.ccc.ddd:59911->eee.ffff.ggg.hhh:995): double free or corruption (out): 0x0000000000da76b0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x71ad6)[0x7febf82caad6]
/lib/libc.so.6(cfree+0x6c)[0x7febf82cf84c]
perdition: connect ( aaa.bbb.ccc.ddd:59911->eee.ffff.ggg.hhh:995)(pop3_in_get_auth+0x178)[0x417188]
perdition: connect ( aaa.bbb.ccc.ddd:59911->eee.ffff.ggg.hhh:995)(main+0x860)[0x414910]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7febf8277c4d]
perdition: connect ( aaa.bbb.ccc.ddd:59911->eee.ffff.ggg.hhh:995)[0x406e79]
Now I checked out the source and took a look at pop3_in_get_auth, and what is done
first in this function? Yes, the capabilities. After removing my pop3_capability config,
the proxy worked again. Then I checked every single capability and found that "STLS"
was the cause (this sounds very easy once you know the solution, but it
wasn't...).
So I removed STLS from the pop3_capability config on the production machine (running
1.17.1 as mentioned above), and no "auth failed" processes have been left hanging so
far... so what a journey... :)
I know it makes no sense to set the STLS capability on a pop3s proxy, but on a pop3
proxy with --ssl_mode tls_all it's not such a bad idea. As I have now found out, perdition
adds STLS automatically when --ssl_mode tls_all is set. So I think everything
is fine now. But maybe you could mention somewhere in the documentation or the default
config file that it's not necessary for the user to set "STLS". Or, in a future version,
remove STLS from the capability list automatically if the user is as dumb as I was and
adds it manually.... :)
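The last suggestion could look roughly like the following Python sketch. This only illustrates the idea: perdition itself is written in C, and the function and parameter names here are hypothetical. It drops any STLS entry the user configured and adds it back exactly once when the proxy itself offers STARTTLS.

```python
def normalize_pop3_capabilities(configured, starttls_offered):
    """Return the capability list with at most one STLS entry.

    configured: capability strings from the user's config, already split
                into one entry per capability (the splitting is assumed
                to have happened elsewhere).
    starttls_offered: True if the proxy itself offers STARTTLS, for
                example when --ssl_mode tls_all is set.
    """
    # Drop every user-supplied STLS, whatever its case or padding.
    cleaned = [cap for cap in configured if cap.strip().upper() != "STLS"]
    if starttls_offered:
        # Advertise STLS exactly once, added by the proxy itself.
        cleaned.insert(0, "STLS")
    return cleaned


print(normalize_pop3_capabilities(["STLS", "TOP", "UIDL"], True))
print(normalize_pop3_capabilities(["STLS", "TOP", "UIDL"], False))
```

With a step like this, a manually configured STLS would be harmless: on a pop3s proxy it simply disappears, and on a TLS-enabled pop3 proxy it is never listed twice.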