Horms wrote:
On Wed, Sep 12, 2007 at 04:44:59PM +0100, Daniel Watts wrote:
Hi guys,
We suddenly seem to be having a problem with perdition.imap4: it stops
listening on the IMAP port. netstat -l shows all the other Perdition
services (imaps, pop3, pop3s) listening fine, but not imap.
We also still have imap4 processes running:
nobody 17589 0.0 0.4 5388 2220 ? S 16:18 0:00 perdition.imap4: auth ok
nobody 17595 0.0 0.4 5520 2384 ? S 16:18 0:00 perdition.imap4: auth ok
etc
Nothing seems to be logged.
Connecting to the server on port 143 fails, as expected:
[root@tg1 perdition]# telnet localhost 143
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host: Connection refused
After a restart it works fine for anywhere from 10 minutes to a few
hours, then it drops again.
Perdition version is 1.17
Any ideas what else I can do to find out why it is dropping the port?
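For watching the port without telnet, a small probe along these lines could be run by hand or from cron. This is only a sketch: the function name port_is_listening and the timeout value are illustrative and not part of perdition; only port 143 comes from the report above.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In the state shown above, where telnet reports "Connection refused", port_is_listening("127.0.0.1", 143) would return False.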
Hi Daniel,
That does not sound good.
What it sounds like is that the main perdition.imap4 process is dying,
and what you are seeing is a few child processes. It is the main
process that handles new connections, spawning a child to handle
them on connect().
Is there anything in the logs indicating a crash?
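To illustrate the model described above, here is a minimal sketch of a fork-per-connection server: one parent owns the listening socket and forks a child for each accepted connection. It is written in Python for brevity and is not perdition's actual code (perdition is written in C); the names serve, handle and reap_children are made up for this example.

```python
import os
import signal
import socket


def reap_children(signum, frame):
    """SIGCHLD handler: collect every exited child without blocking."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet


def serve(listener, handle, max_connections=None):
    """Accept connections on `listener`, forking one child per connection.

    `max_connections` exists only so the loop can be stopped in a demo;
    a real daemon would loop forever.
    """
    signal.signal(signal.SIGCHLD, reap_children)
    served = 0
    while max_connections is None or served < max_connections:
        conn, _addr = listener.accept()  # only the parent ever calls accept()
        pid = os.fork()
        if pid == 0:
            # Child: it does not need the listening socket.
            listener.close()
            try:
                handle(conn)
            finally:
                conn.close()
                os._exit(0)
        # Parent: the child owns the connection now.
        conn.close()
        served += 1
```

In this sketch, if the parent dies, children that were already forked keep running until their sessions end, but nothing calls accept() any more and the listening socket is gone, so new connections are refused.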
This is the output of my strace. Looks like it is being killed...but by
what? Itself?
[root@host]# strace -p 23606
...
accept(4, 0xbf8067f0, [16]) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGCHLD, {0x804c1f8, [CHLD], SA_RESTART}, {0x804c1f8, [CHLD], SA_RESTART}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 30630
wait4(-1, 0xbf8064bc, WNOHANG, NULL) = 0
sigreturn() = ? (mask now [])
accept(4, {sa_family=AF_INET, sin_port=htons(31266), sin_addr=inet_addr("65.254.35.122")}, [16]) = 5
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7eee708) = 30631
close(5) = 0
accept(4, 0xbf8067f0, [16]) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGCHLD, {0x804c1f8, [CHLD], SA_RESTART}, {0x804c1f8, [CHLD], SA_RESTART}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 30631
wait4(-1, 0xbf8064bc, WNOHANG, NULL) = 0
sigreturn() = ? (mask now [])
accept(4, {sa_family=AF_INET, sin_port=htons(31360), sin_addr=inet_addr("65.254.35.122")}, [16]) = 5
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7eee708) = 30633
close(5) = 0
accept(4, {sa_family=AF_INET, sin_port=htons(31372), sin_addr=inet_addr("65.254.35.122")}, [16]) = 5
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7eee708) = 30637
close(5) = 0
accept(4, 0xbf8067f0, [16]) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGCHLD, {0x804c1f8, [CHLD], SA_RESTART}, {0x804c1f8, [CHLD], SA_RESTART}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 30633
wait4(-1, 0xbf8064bc, WNOHANG, NULL) = 0
sigreturn() = ? (mask now [])
accept(4, {sa_family=AF_INET, sin_port=htons(34561), sin_addr=inet_addr("65.254.41.154")}, [16]) = 5
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7eee708) = 30666
close(5) = 0
accept(4, 0xbf8067f0, [16]) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGCHLD, {0x804c1f8, [CHLD], SA_RESTART}, {0x804c1f8, [CHLD], SA_RESTART}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 30666
wait4(-1, 0xbf8064bc, WNOHANG, NULL) = 0
sigreturn() = ? (mask now [])
accept(4, {sa_family=AF_INET, sin_port=htons(31460), sin_addr=inet_addr("65.254.35.122")}, [16]) = 5
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7eee708) = 30672
close(5) = 0
accept(4, 0xbf8067f0, [16]) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGCHLD, {0x804c1f8, [CHLD], SA_RESTART}, {0x804c1f8, [CHLD], SA_RESTART}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 30672
wait4(-1, 0xbf8064bc, WNOHANG, NULL) = 0
sigreturn() = ? (mask now [])
accept(4, 0xbf8067f0, [16]) = ? ERESTARTSYS (To be restarted)
+++ killed by SIGKILL +++
Process 23606 detached
This happens about every 10 minutes. We have a cron script that restarts
perdition when it does, but this isn't great, of course.
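The final line of the trace, "+++ killed by SIGKILL +++", means the process received SIGKILL. That signal cannot be caught, blocked or ignored, so the process gets no chance to log anything, and this strace output does not show who sent it. One thing a restart wrapper can do is at least record how the process ended. The sketch below assumes the daemon is started as a direct child of the wrapper and stays in the foreground, which may not match how perdition is launched here; describe_exit and run_and_report are hypothetical helper names.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable cause."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as returncode -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Run argv as a child, wait for it to end, and describe how it ended."""
    proc = subprocess.Popen(argv)
    proc.wait()
    return describe_exit(proc.returncode)
```

A wrapper could write the returned string to syslog before restarting, which would distinguish a clean exit from a kill, though it still would not identify the sender of the signal.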