Job is getting stuck before it can even start building

Shobha Dashottar
Hello,

   I am facing an issue where job execution gets stuck.

   Once a build is triggered, the console output never gets past the first two lines of job execution:

   Started by user xyz
    Building remotely on  slave-machine-name
   


   I cannot even cancel the job; I have to restart Tomcat, which is painful because other jobs are building at the same time. This happens randomly, on any machine and with any job, which makes it harder to debug. Jobs that do not use the Perforce SCM plugin are affected as well.

  This is what I have done so far to debug:
    1.  Added -XX:MaxPermSize=256m for Tomcat in the Options registry key, and added the keys JVMms=218 and JVMmx=512.
    2.  Upgraded Hudson to version 1.381 after finding out that some defects for the same or similar issues were fixed there.

   But yesterday the bug crept in again. The Hudson server is on a Windows 7 machine. The Hudson and Catalina logs don't show much.

Appreciate the help.

Thanks
Shobha

Re: Job is getting stuck before it can even start building

Geoffrey Crandall
Hi,

I have the same problem. I've always been able to associate it with a
dead SSH connection. I've never seen the problem on slaves that are
connected by JNLP. Our network admins insist that it's not a keep
alive problem nor is it the firewall killing the SSH. Nothing appears
on any logs, so it's hard to figure out what is happening.

I don't know if the dead SSH connection is causing the problem or is just a
consequence of the job getting stuck. If Hudson is busy running a job,
should it notice that the slave SSH connection has died? Or maybe the SSH
connection dies because the job gets stuck for some other reason and
there's no slave communication. But on the slave machine there are no
slave processes even running that would seem stuck. The annoying thing
is that the executor is stuck and can't be flushed except by
restarting Hudson. Even if I disconnect and reconnect the slave, the
job is still blocking the executor. It would be nice to just kill all
executor jobs when a slave is disconnected.

For me the problem has existed for quite a long time, but after I
moved to Hudson build 379, the problem has gotten much worse and I
need to restart Hudson almost daily.

-Jeff

On Mon, Nov 22, 2010 at 9:03 AM, shobhad <[hidden email]> wrote:

>
> Hello,
>
>   I am facing the following issue of a stuck job execution.
>
> --
> View this message in context: http://hudson.361315.n4.nabble.com/Job-is-getting-stuck-before-it-can-even-start-building-tp3053144p3053144.html
> Sent from the Hudson users mailing list archive at Nabble.com.

Re: Job is getting stuck before it can even start building

Kohsuke Kawaguchi
Administrator
In reply to this post by Shobha Dashottar
Please see http://wiki.hudson-ci.org/display/HUDSON/Build+is+hanging
and get us the stack trace, so that we can see where the hang is
happening.

--
Kohsuke Kawaguchi

Re: Job is getting stuck before it can even start building

ShobhaD
In reply to this post by Kohsuke Kawaguchi
I thought the problem had gone away after upgrading Hudson and the
Perforce plugin, but it has reappeared.

It happened again today; here is the thread dump for the particular
job/slave that got stuck.

Channel reader thread: slave2
"Channel reader thread: slave2" Id=123 Group=main RUNNABLE (in native)
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        -  locked java.io.BufferedInputStream@483fd4
        at java.io.FilterInputStream.read(Unknown Source)
        at hudson.remoting.BinarySafeStream$1._read(BinarySafeStream.java:149)
        at hudson.remoting.BinarySafeStream$1.read(BinarySafeStream.java:80)
        at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:948)

...
...

Executor #0 for slave2 : executing job2-Continuous #1403
"Executor #0 for slave2 : executing job2-Continuous #1403" Id=70 Group=main BLOCKED on hudson.remoting.Channel@d68b39 owned by "Workspace clean-up thread" Id=1419
        at hudson.remoting.Request.call(Request.java:100)
        -  blocked on hudson.remoting.Channel@d68b39
        at hudson.remoting.Channel.call(Channel.java:630)
        at hudson.FilePath.act(FilePath.java:742)
        at hudson.FilePath.act(FilePath.java:735)
        at hudson.FilePath.mkdirs(FilePath.java:801)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1090)
        at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
        at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
        at hudson.model.Run.run(Run.java:1280)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:139)

From the thread dump, it appears that the executor is waiting, stuck
on the "Workspace clean-up thread". I don't know how to kill that
either, as it does not appear in the Task Manager.

Let me know if you need anything more from the thread dump; it is far
too large to post here in its entirety.
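
Since the full dump is too large to read by eye, one way to narrow it down is to list only the BLOCKED threads and who owns the lock they are waiting for. The following is a minimal Python sketch, not part of Hudson; the names `BLOCKED_RE` and `find_blocked_threads` are made up for illustration, and it assumes each thread header sits on a single line, as in the unwrapped dump format shown above.

```python
import re

# Matches thread header lines of the form seen in the dump, e.g.
#   "<name>" Id=70 Group=main BLOCKED on <lock> owned by "<owner>" Id=1419
# Assumption: the header has not been wrapped across several lines.
BLOCKED_RE = re.compile(
    r'^"(?P<thread>.+?)" Id=(?P<id>\d+) .*?BLOCKED on (?P<lock>\S+)'
    r' owned by "(?P<owner>.+?)" Id=(?P<owner_id>\d+)'
)


def find_blocked_threads(dump_text):
    """Return one dict per BLOCKED thread header found in the dump text."""
    blocked = []
    for line in dump_text.splitlines():
        match = BLOCKED_RE.match(line.strip())
        if match:
            blocked.append(match.groupdict())
    return blocked


# Two headers taken from the dump in this thread (stack frames shortened).
sample = '''"Channel reader thread: slave2" Id=123 Group=main RUNNABLE (in native)
        at java.io.FileInputStream.readBytes(Native Method)

"Executor #0 for slave2 : executing job2-Continuous #1403" Id=70 Group=main BLOCKED on hudson.remoting.Channel@d68b39 owned by "Workspace clean-up thread" Id=1419
        at hudson.remoting.Request.call(Request.java:100)
'''

for entry in find_blocked_threads(sample):
    print(f'{entry["thread"]} (Id={entry["id"]}) waits for {entry["lock"]} '
          f'held by {entry["owner"]} (Id={entry["owner_id"]})')
```

The stack of the owning thread (here Id=1419) is usually the next thing worth looking up in the full dump, since that is the thread holding the channel lock.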

Thanks
Shobha

On Nov 23, 10:40 am, Kohsuke Kawaguchi <[hidden email]> wrote:

> Please see http://wiki.hudson-ci.org/display/HUDSON/Build+is+hanging
> and get us the stack trace, so that we can see where the hang is
> happening.

Re: Job is getting stuck before it can even start building

ShobhaD
This issue is still not resolved for me.
I have to restart Tomcat each time this happens. I have now started
Tomcat in a command window.
This is the first time I have deployed Hudson/Tomcat on a Windows
(Windows 7 32-bit) machine. I never had this issue with my
previous Linux-based Hudson setups.

I noticed in the command window that Hudson is stuck at node
monitoring whenever a job is stuck while building (only the first two
lines, up to "Building remotely on <slave x>", are displayed).
Here is the console output of Tomcat (run as Tomcat6.exe //TS//Tomcat6)
from when this happens:

Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Free Swap Space monitoring activity still in progress. Interrupting
Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Free Disk Space monitoring activity still in progress. Interrupting
Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Free Temp Space monitoring activity still in progress. Interrupting
Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Architecture monitoring activity still in progress. Interrupting
Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Clock Difference monitoring activity still in progress. Interrupting
Feb 8, 2011 9:31:57 AM org.apache.coyote.http11.Http11AprProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-8080    <<---------- cancelled execution at this point
Feb 8, 2011 9:31:57 AM org.apache.coyote.ajp.AjpAprProtocol pause
INFO: Pausing Coyote AJP/1.3 on ajp-8009
Feb 8, 2011 9:31:58 AM org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina
Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Response Time monitoring activity still in progress. Interrupting
Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Free Swap Space monitoring activity still in progress. Interrupting
Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Free Disk Space monitoring activity still in progress. Interrupting
Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Free Temp Space monitoring activity still in progress. Interrupting
Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Architecture monitoring activity still in progress. Interrupting
Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Clock Difference monitoring activity still in progress. Interrupting
Feb 8, 2011 9:33:27 AM hudson.slaves.SlaveComputer tryReconnect
INFO: Attempting to reconnect <slave x..>
Feb 8, 2011 9:33:31 AM com.youdevise.hudson.slavestatus.SlaveListenerInitiator onOnline
INFO: Starting slave-status listener on <slave x>
Feb 8, 2011 9:33:31 AM hudson.slaves.CommandLauncher launch
INFO: slave agent launched for  <slave x>
Feb 8, 2011 9:36:59 AM org.apache.catalina.startup.Catalina stopServer
SEVERE: Catalina.stop:
java.net.ConnectException: Connection refused: connect
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(Unknown Source)
        at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at java.net.Socket.<init>(Unknown Source)
        at java.net.Socket.<init>(Unknown Source)
        at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:408)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:338)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:416)

Also, I get a message on the dashboard under Manage Hudson saying
that "there are too many SCM polling threads running than can be
handled".
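
To see at a glance which node monitors keep getting interrupted, the "still in progress" warnings in a log like the one above can be tallied per monitor. This is a rough Python sketch under stated assumptions: `WARNING_RE` and `count_stalled_monitors` are hypothetical names, and the pattern is based only on the warning wording that appears in this log, so other Hudson versions may phrase it differently.

```python
import re
from collections import Counter

# Pattern derived from the warning text in the log above (an assumption
# about the message wording, not a documented Hudson log format).
WARNING_RE = re.compile(
    r"WARNING: Previous (?P<monitor>.+?) monitoring activity still in progress"
)


def count_stalled_monitors(log_text):
    """Count 'still in progress' warnings per node monitor name."""
    # Collapse all whitespace so warnings wrapped across lines still match.
    flattened = " ".join(log_text.split())
    return Counter(m.group("monitor") for m in WARNING_RE.finditer(flattened))


# A few entries copied from the log above; the first warning is left
# wrapped on purpose to show that wrapped lines are handled.
sample = """Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Free Swap Space monitoring activity still in
progress. Interrupting
Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Free Swap Space monitoring activity still in progress. Interrupting
Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
WARNING: Previous Response Time monitoring activity still in progress. Interrupting
"""

counts = count_stalled_monitors(sample)
for monitor, count in counts.most_common():
    print(f"{monitor}: {count}")
```

If every monitor shows up in every round, that suggests the monitoring calls to the slave are not returning at all, which would be consistent with the blocked channel seen in the thread dump.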

Appreciate any help.

Thanks
Shobha


On Dec 16 2010, 9:28 am, ShobhaD <[hidden email]> wrote:

> I thought after upgrading Hudson and upgrading the perforce plugin,
> the problem had gone away but it is reappearing.

Re: Job is getting stuck before it can even start building

Mark Waite
I wonder if I'm seeing the same condition you're seeing, even though I'm
running without Tomcat.  My builds become "stuck", and there may be a
correlation with the Windows slave that is hosting one of the builds.

The Hudson master on Linux consumes 100% of the CPU and the build on Windows
makes no further progress.  A request to view console output for the "stuck"
Windows job from the web interface never returns.

Can you tell me how you generated the thread dump on Windows, so that I can
take a similar thread dump?

Mark Waite


----- Original Message ----

> From: ShobhaD <[hidden email]>
> To: Jenkins Users <[hidden email]>
> Cc: [hidden email]
> Sent: Mon, February 7, 2011 9:28:00 PM
> Subject: Re: Job is getting stuck before it can even start building
>
> This issue is not resolved yet for me.
> I have to restart Tomcat each time  this happens. I have now started
> tomcat in a command window
> This is the  first time I have deployed hudson /Tomcat on a Windows
> ( windows 7 32-bit)  machine. I have never had this issue with my
> previous Linux based hudson  setups.
>
>
> I noticed in the command window that hudson is stuck at  node
> monitoring  when a job is stuck while building (only the first  two
> lines ( Building remotely on <slave x>....) ) are displayed.
> I  have gathered the console output of execution of Tomcat6.exe  //TS//
> Tomcat6)  of when this happens
>
> Feb 8, 2011 9:20:37  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Free Swap Space monitoring activity still  in
> progress. Interrupting
> Feb 8, 2011 9:20:37  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Free Disk Space monitoring activity still  in
> progress. Interrupting
> Feb 8, 2011 9:20:37  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Free Temp Space monitoring activity still  in
> progress. Interrupting
> Feb 8, 2011 9:20:37  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Architecture monitoring activity still in  progress.
> Interrupting
> Feb 8, 2011 9:20:37  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Clock Difference monitoring activity still  in
> progress. Interrupting
> Feb 8, 2011 9:31:57 AM  org.apache.coyote.http11.Http11AprProtocol
> pause
> INFO: Pausing Coyote  HTTP/1.1 on  http-8080
><<--------------------------------------------------------------------------------------------------------------
>-
> cancelled  execution at this point
> Feb 8, 2011 9:31:57 AM  org.apache.coyote.ajp.AjpAprProtocol pause
> INFO: Pausing Coyote AJP/1.3 on  ajp-8009
> Feb 8, 2011 9:31:58 AM org.apache.catalina.core.StandardService  stop
> INFO: Stopping service Catalina
> Feb 8, 2011 9:32:24  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Response Time monitoring activity still in  progress.
> Interrupting
> Feb 8, 2011 9:32:24  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Free Swap Space monitoring activity still  in
> progress. Interrupting
> Feb 8, 2011 9:32:24  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Free Disk Space monitoring activity still  in
> progress. Interrupting
> Feb 8, 2011 9:32:24  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Free Temp Space monitoring activity still  in
> progress. Interrupting
> Feb 8, 2011 9:32:24  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Architecture monitoring activity still in  progress.
> Interrupting
> Feb 8, 2011 9:32:24  AM
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record  <init>
> WARNING: Previous Clock Difference monitoring activity still  in
> progress. Interrupting
> Feb 8, 2011 9:33:27 AM  hudson.slaves.SlaveComputer tryReconnect
> INFO: Attempting to reconnect  <slave x..>
> Feb 8, 2011 9:33:31  AM
> com.youdevise.hudson.slavestatus.SlaveListenerInitiator onOnline
> INFO:  Starting slave-status listener on <slave x>
> Feb 8, 2011 9:33:31 AM  hudson.slaves.CommandLauncher launch
> INFO: slave agent launched for   <slave x>
> Feb 8, 2011 9:36:59 AM org.apache.catalina.startup.Catalina  stopServer
> SEVERE: Catalina.stop:
> java.net.ConnectException: Connection  refused: connect
>         at  java.net.PlainSocketImpl.socketConnect(Native Method)
>          at java.net.PlainSocketImpl.doConnect(Unknown Source)
>          at java.net.PlainSocketImpl.connectToAddress(Unknown  Source)
>         at  java.net.PlainSocketImpl.connect(Unknown Source)
>          at java.net.SocksSocketImpl.connect(Unknown Source)
>          at java.net.Socket.connect(Unknown Source)
>          at java.net.Socket.connect(Unknown Source)
>         at  java.net.Socket.<init>(Unknown Source)
>         at  java.net.Socket.<init>(Unknown Source)
>          at
> org.apache.catalina.startup.Catalina.stopServer(Catalina.java:408)
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native  Method)
>         at  sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>          at  sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
> Source)
>          at java.lang.reflect.Method.invoke(Unknown  Source)
>          at
> org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:338)
>          at  org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:
> 416)
>
> Also,  I get a message on the Dashboard under Manage Hudson that says
> that that  "there are too many SCM polling threads running than can  be
> handled".
>
> Appreciate any  help.
>
> Thanks
> Shobha
>
>
> On Dec 16 2010, 9:28 am, ShobhaD <[hidden email]> wrote:
> > I  thought after upgrading Hudson and upgrading the perforce plugin,
> > the  problem had gone away but it is reappearing.
> >
> > It happened today  and here is the threadDump on the particular job/
> > slave that got  stuck.
> >
> > Channel reader thread: slave2
> > "Channel reader  thread: slave2" Id=123 Group=main RUNNABLE (in native)
> >         at  java.io.FileInputStream.readBytes(Native Method)
> >         at  java.io.FileInputStream.read(Unknown Source)
> >         at  java.io.BufferedInputStream.fill(Unknown Source)
> >         at  java.io.BufferedInputStream.read1(Unknown Source)
> >         at  java.io.BufferedInputStream.read(Unknown Source)
> >         -  locked  java.io.BufferedInputStream@483fd4
> >         at  java.io.FilterInputStream.read(Unknown Source)
> >         at  hudson.remoting.BinarySafeStream
> >  $1._read(BinarySafeStream.java:149)
> >         at  hudson.remoting.BinarySafeStream
> >  $1.read(BinarySafeStream.java:80)
> >         at  java.io.ObjectInputStream$PeekInputStream.peek(Unknown
> > Source)
> >          at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown
> >  Source)
> >         at java.io.ObjectInputStream
> >  $BlockDataInputStream.peekByte(Unknown Source)
> >         at  java.io.ObjectInputStream.readObject0(Unknown Source)
> >         at  java.io.ObjectInputStream.readObject(Unknown Source)
> >         at  hudson.remoting.Channel$ReaderThread.run(Channel.java:948)
> >
> >  ...
> > ...
> >
> > Executor #0 for slave2 : executing  job2-Continuous #1403
> > "Executor #0 for slave2 : executing  job2-Continuous #1403" Id=70
> > Group=main BLOCKED on  hudson.remoting.Channel@d68b39 owned by
> > "Workspace clean-up thread"  Id=1419
> >         at  hudson.remoting.Request.call(Request.java:100)
> >         -  blocked on  hudson.remoting.Channel@d68b39
> >         at  hudson.remoting.Channel.call(Channel.java:630)
> >         at  hudson.FilePath.act(FilePath.java:742)
> >         at  hudson.FilePath.act(FilePath.java:735)
> >         at  hudson.FilePath.mkdirs(FilePath.java:801)
> >         at  hudson.model.AbstractProject.checkout(AbstractProject.java:
> >  1090)
> >         at hudson.model.AbstractBuild
> >  $AbstractRunner.checkout(AbstractBuild.java:479)
> >         at  hudson.model.AbstractBuild
> >  $AbstractRunner.run(AbstractBuild.java:411)
> >         at  hudson.model.Run.run(Run.java:1280)
> >         at  hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
> >          at
> >  hudson.model.ResourceController.execute(ResourceController.java:88)
> >          at hudson.model.Executor.run(Executor.java:139)
> >
> > From the  ThreadDump, it appears that the executor is waiting and is
> > stuck on a  "workspace cleanup" process. I don't know how to kill it
> > either as it  does not appear in the task manager.
> >
> > Let me know if you need  anything more from the ThreadDump as it is
> > really very huge to post it  entirely over here.
> >
> > Thanks
> > Shobha
> >
> > On Nov  23, 10:40 am, Kohsuke Kawaguchi <[hidden email]> wrote:
> >
> >  > Please seehttp://wiki.hudson-ci.org/display/HUDSON/Build+is+hanging
> > > and  get us the stack trace, so that we can see where the hang is
> > >  happening.
> >
> > > 2010/11/21shobhad<[hidden email]>:
> >
> > > > Hello,
> >
> > > >   I am facing the following issue of a stuck job execution.
> >
> > > >   Once a build is fired, the console output keeps showing the first 2
> > > > lines of job execution.
> >
> > > >   Started by user xyz
> > > >     Building remotely on slave-machine-name
> >
> > > >   I cannot even cancel the job, I have to restart tomcat. It is painful
> > > > since there are other jobs building.  This happens randomly on any machine
> > > > any job thus making it harder to debug. Jobs that do not use the perforce
> > > > SCM plugin are also getting affected.
> >
> > > >  This is what I have done so far to debug.
> > > >    1.  Added -XX:MaxPermSize=256m for Tomcat in the Options Registry key.
> > > > Added Keys JVMms=218, JVMmx=512.
> > > >    2.  Upgraded hudson version to 1.381 after finding out that there are
> > > > some defects fixed for the same or similar issues.
> >
> > > >   But yesterday the bug crept in again. The Hudson server is on a Windows 7
> > > > machine. The hudson and catalina logs don't show much.
> >
> > > > Appreciate the help.
> >
> > > > Thanks
> > > > Shobha
> >
> >  > --
> > > Kohsuke Kawaguchi
>

Re: Job is getting stuck before it can even start building

ShobhaD
Open http://<url to your hudson server>/hudson/threadDump in a browser.
It gives a summary of what is happening on Hudson. In my case it says the
SCM polling thread is stuck and the build execution is hung, but I am not
sure why.
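
As a rough illustration of reading such a dump, the Java sketch below lists
the threads reported as BLOCKED, together with the lock each one waits for
and the thread that owns it. It assumes the header format seen in the dumps
quoted in this thread ("name" Id=N Group=G BLOCKED on LOCK owned by "owner"
Id=M) and that the threadDump page can be read as plain text without
logging in; neither is guaranteed for every Hudson version. The class name
BlockedThreadFinder and its methods are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hudson.
class BlockedThreadFinder {
    // Assumed header shape, copied from the dumps posted in this thread:
    // "name" Id=70 Group=main BLOCKED on <lock> owned by "owner" Id=1419
    // \s+ is used between tokens so that mail-wrapped headers still match.
    private static final Pattern BLOCKED = Pattern.compile(
        "\"([^\"]+)\"\\s+Id=(\\d+)\\s+Group=\\S+\\s+BLOCKED\\s+on\\s+(\\S+)"
        + "\\s+owned\\s+by\\s+\"([^\"]+)\"\\s+Id=(\\d+)");

    // Returns one summary line per BLOCKED thread found in the dump text.
    static List<String> findBlocked(String dumpText) {
        List<String> result = new ArrayList<>();
        Matcher m = BLOCKED.matcher(dumpText);
        while (m.find()) {
            result.add(m.group(1) + " (Id=" + m.group(2) + ") waits for "
                + m.group(3) + " held by " + m.group(4)
                + " (Id=" + m.group(5) + ")");
        }
        return result;
    }

    // Downloads the page at the given URL as text (no authentication).
    static String fetch(String url) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BlockedThreadFinder <threadDump URL>");
            return;
        }
        for (String line : findBlocked(fetch(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Run it with the threadDump URL as the only argument. If the server returns
HTML with escaped quotes, the pattern would need adjusting.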

Please update this posting with your findings as well.


Thanks
Shobha



On Feb 8, 10:30 am, Mark Waite <[hidden email]> wrote:

> I wonder if I'm seeing the same condition you're seeing, even though I'm seeing
> it without tomcat.  My builds become "stuck" and there may be a correlation
> between the Windows slave that is hosting one of the builds and the "stuck".
>
> The Hudson master on Linux consumes 100% of the CPU and the build on Windows
> makes no further progress.  A request to view console output for the "stuck"
> Windows job from the web interface never returns.
>
> Can you tell me how you generated the thread dump on Windows, and I can perform
> a similar thread dump?
>
> Mark Waite
>
>
>
> ----- Original Message ----
> > From: ShobhaD <[hidden email]>
> > To: Jenkins Users <[hidden email]>
> > Cc: [hidden email]
> > Sent: Mon, February 7, 2011 9:28:00 PM
> > Subject: Re: Job is getting stuck before it can even start building
>
> > This issue is not resolved yet for me.
> > I have to restart Tomcat each time this happens. I have now started
> > tomcat in a command window.
> > This is the first time I have deployed hudson/Tomcat on a Windows
> > (Windows 7 32-bit) machine. I have never had this issue with my
> > previous Linux based hudson setups.
>
> > I noticed in the command window that hudson is stuck at node
> > monitoring when a job is stuck while building (only the first two
> > lines (Building remotely on <slave x>....) are displayed).
> > I have gathered the console output of Tomcat (started with
> > Tomcat6.exe //TS//Tomcat6) from when this happens:
>
> > Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Free Swap Space monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Free Disk Space monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Free Temp Space monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Architecture monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:20:37 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Clock Difference monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:31:57 AM org.apache.coyote.http11.Http11AprProtocol pause
> > INFO: Pausing Coyote HTTP/1.1 on http-8080
> > <<--- cancelled execution at this point
> > Feb 8, 2011 9:31:57 AM org.apache.coyote.ajp.AjpAprProtocol pause
> > INFO: Pausing Coyote AJP/1.3 on ajp-8009
> > Feb 8, 2011 9:31:58 AM org.apache.catalina.core.StandardService stop
> > INFO: Stopping service Catalina
> > Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Response Time monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Free Swap Space monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Free Disk Space monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Free Temp Space monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Architecture monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:32:24 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> > WARNING: Previous Clock Difference monitoring activity still in progress. Interrupting
> > Feb 8, 2011 9:33:27 AM hudson.slaves.SlaveComputer tryReconnect
> > INFO: Attempting to reconnect <slave x..>
> > Feb 8, 2011 9:33:31 AM com.youdevise.hudson.slavestatus.SlaveListenerInitiator onOnline
> > INFO: Starting slave-status listener on <slave x>
> > Feb 8, 2011 9:33:31 AM hudson.slaves.CommandLauncher launch
> > INFO: slave agent launched for <slave x>
> > Feb 8, 2011 9:36:59 AM org.apache.catalina.startup.Catalina stopServer
> > SEVERE: Catalina.stop:
> > java.net.ConnectException: Connection refused: connect
> >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >         at java.net.PlainSocketImpl.doConnect(Unknown Source)
> >         at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
> >         at java.net.PlainSocketImpl.connect(Unknown Source)
> >         at java.net.SocksSocketImpl.connect(Unknown Source)
> >         at java.net.Socket.connect(Unknown Source)
> >         at java.net.Socket.connect(Unknown Source)
> >         at java.net.Socket.<init>(Unknown Source)
> >         at java.net.Socket.<init>(Unknown Source)
> >         at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:408)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> >         at java.lang.reflect.Method.invoke(Unknown Source)
> >         at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:338)
> >         at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:416)
>
> > Also, I get a message on the Dashboard under Manage Hudson that says
> > that "there are too many SCM polling threads running than can be
> > handled".
>
> > Appreciate any  help.
>
> > Thanks
> > Shobha
>
> > On Dec 16 2010, 9:28 am, ShobhaD <[hidden email]> wrote:
> > > I thought after upgrading Hudson and upgrading the perforce plugin,
> > > the problem had gone away but it is reappearing.
>
> > > It happened today and here is the threadDump on the particular
> > > job/slave that got stuck.
>
> > > Channel reader thread: slave2
> > > "Channel reader thread: slave2" Id=123 Group=main RUNNABLE (in native)
> > >         at java.io.FileInputStream.readBytes(Native Method)
> > >         at java.io.FileInputStream.read(Unknown Source)
> > >         at java.io.BufferedInputStream.fill(Unknown Source)
> > >         at java.io.BufferedInputStream.read1(Unknown Source)
> > >         at java.io.BufferedInputStream.read(Unknown Source)
> > >         - locked java.io.BufferedInputStream@483fd4
> > >         at java.io.FilterInputStream.read(Unknown Source)
> > >         at hudson.remoting.BinarySafeStream$1._read(BinarySafeStream.java:149)
> > >         at hudson.remoting.BinarySafeStream$1.read(BinarySafeStream.java:80)
> > >         at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
> > >         at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
> > >         at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
> > >         at java.io.ObjectInputStream.readObject0(Unknown Source)
> > >         at java.io.ObjectInputStream.readObject(Unknown Source)
> > >         at hudson.remoting.Channel$ReaderThread.run(Channel.java:948)
>
> > >  ...
> > > ...
>
> > > Executor #0 for slave2 : executing job2-Continuous #1403
> > > "Executor #0 for slave2 : executing job2-Continuous #1403" Id=70 Group=main BLOCKED on hudson.remoting.Channel@d68b39 owned by "Workspace clean-up thread" Id=1419
> > >         at hudson.remoting.Request.call(Request.java:100)
> > >         - blocked on hudson.remoting.Channel@d68b39
> > >         at hudson.remoting.Channel.call(Channel.java:630)
> > >         at hudson.FilePath.act(FilePath.java:742)
> > >         at hudson.FilePath.act(FilePath.java:735)
> > >         at hudson.FilePath.mkdirs(FilePath.java:801)
> > >         at hudson.model.AbstractProject.checkout(AbstractProject.java:1090)
> > >         at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
> > >         at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
> > >         at hudson.model.Run.run(Run.java:1280)
> > >         at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
> > >         at hudson.model.ResourceController.execute(ResourceController.java:88)
> > >         at hudson.model.Executor.run(Executor.java:139)
>
> > > From the ThreadDump, it appears that the executor is waiting and is
> > > stuck on a "workspace cleanup" process. I don't know how to kill it
> > > either as it does not appear in the task manager.
>
> > > Let me know if you need anything more from the ThreadDump as it is
> > > really very huge to post it entirely over here.
>
> > > Thanks
> > > Shobha
>
> > > On Nov 23, 10:40 am, Kohsuke Kawaguchi <[hidden email]> wrote:
>
> > > > Please see http://wiki.hudson-ci.org/display/HUDSON/Build+is+hanging
> > > > and get us the stack trace, so that we can see where the hang is
> > > > happening.
>
> > > > 2010/11/21 shobhad <[hidden email]>:
>
> > > > > Hello,
>
> > > > >   I am facing the following issue of a stuck job execution.
>
> > > > >   Once a build is fired, the console output keeps showing the first 2
> > > > > lines of job execution.
>
> > > > >   Started by user xyz
> > > > >     Building remotely on slave-machine-name
>
> > > > >   I cannot even cancel the job, I have to restart tomcat. It is painful
> > > > > since there are other jobs building.  This happens randomly on any machine
> > > > > any job thus making it harder to debug. Jobs that do not use the perforce
> > > > > SCM plugin are also getting affected.
>
> > > > >  This is what I have done so far to debug.
> > > > >    1.  Added -XX:MaxPermSize=256m for Tomcat in the Options Registry key.
> > > > > Added Keys JVMms=218, JVMmx=512.
> > > > >    2.  Upgraded hudson version to 1.381 after finding out that there
> > > > > are ...
>

Re: Job is getting stuck before it can even start building

Wayne Fay
In reply to this post by Mark Waite
> Can you tell me how you generated the thread dump on Windows, and I can perform
> a similar thread dump?

Get the PID from Task Manager (or ps -ef on *nix) and run "jstack
$pid" on the command line to get the stack trace out of any Java
program running on Windows and other OSes too...
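
When running jstack is not convenient, roughly the same information can be
collected from inside a JVM with the standard java.lang.management API. The
sketch below is only an illustration of that API: it dumps the threads of
the JVM it runs in, not of a separate Hudson process, and the class name
InProcessThreadDump is made up for this example. For a blocked thread it
prints the lock and its owner, similar to the "BLOCKED on ... owned by ..."
headers quoted earlier in this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Hudson or the JDK tools.
class InProcessThreadDump {
    // Builds a jstack-like text dump of every live thread in this JVM.
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Only ask for lock details the JVM says it can provide.
        ThreadInfo[] infos = bean.dumpAllThreads(
            bean.isObjectMonitorUsageSupported(),
            bean.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" Id=")
               .append(info.getThreadId()).append(' ')
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" owned by \"").append(info.getLockOwnerName())
                   .append("\" Id=").append(info.getLockOwnerId());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Note that jstack ships with the JDK rather than the plain JRE, so it may be
missing on a machine that only has a JRE installed.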

Wayne