https://github.com/Netflix/atlas
Revision b4a5a0d222507546701302d0514ace3e83014fb6 authored by brharrington on 05 June 2017, 22:18:13 UTC, committed by GitHub on 05 June 2017, 22:18:13 UTC
The healthcheck API was accessing the service manager
from a provider when the routes were fetched. There
doesn't appear to be any reason for this, and it was
already removed in the 1.6 branch as part of #490.

This was usually harmless, but on some systems threads
would get scheduled in such a way that all of the actor
threads were blocked with traces like:

```
"atlas-akka.actor.default-dispatcher-9" #18 prio=5 os_prio=0 tid=0x00007fb914010000 nid=0x6410 waiting on condition [0x00007fb9235f7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000838ae498> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at com.google.inject.internal.CycleDetectingLock$CycleDetectingLockFactory$ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle(CycleDetectingLock.java:164)
        at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:185)
        at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
        at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
        at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
        at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
        at com.netflix.atlas.akka.HealthcheckApi.routes(HealthcheckApi.scala:36)
        at com.netflix.atlas.akka.RequestHandlerActor.receive(RequestHandlerActor.scala:41)
        at akka.actor.ActorCell.newActor(ActorCell.scala:568)
        at akka.actor.ActorCell.create(ActorCell.scala:588)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:461)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
        at akka.dispatch.Mailbox.run(Mailbox.scala:223)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
```

The user would see this as the application never fully
starting and not being accessible. This should fix
issue #612.
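
For illustration only, here is a minimal sketch of how this kind of
stall can arise. It is not Atlas or Guice code. It assumes that a
startup step holds a singleton lock while it still needs a free
thread from the same pool, which the trace above does not confirm.
The `ReentrantLock` stands in for the Guice singleton lock, the
fixed thread pool for the actor dispatcher, and the name
`startupCompletes` is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class StartupStallSketch {

    /**
     * Returns true if the simulated startup step ran within the timeout.
     * poolSize: number of worker threads (stand-in for the dispatcher).
     * lookups:  number of tasks that resolve the singleton on creation.
     */
    static boolean startupCompletes(int poolSize, int lookups)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        ReentrantLock singletonLock = new ReentrantLock();
        CountDownLatch initStep = new CountDownLatch(1);

        // Assumption: the initializing thread holds the singleton lock
        // until startup has finished.
        singletonLock.lock();
        try {
            for (int i = 0; i < lookups; i++) {
                // Each task parks on the lock, like the provider lookup
                // in the trace, and occupies one pool thread meanwhile.
                pool.execute(() -> {
                    singletonLock.lock();
                    singletonLock.unlock();
                });
            }
            // The startup step itself needs a free pool thread.
            pool.execute(initStep::countDown);
            // Bounded wait so the sketch reports a stall instead of hanging.
            return initStep.await(500, TimeUnit.MILLISECONDS);
        } finally {
            singletonLock.unlock(); // let parked tasks drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("2 lookups, 4 threads: " + startupCompletes(4, 2));
        System.out.println("4 lookups, 4 threads: " + startupCompletes(4, 4));
    }
}
```

With fewer lookups than threads the startup step finds a free
thread. Once every thread is parked on the lock, the step stays
queued and the call times out, which matches the symptom of an
application that never finishes starting.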
1 parent 7f660b3
fix possible deadlock during startup (#613)