https://github.com/Netflix/atlas
Revision b4a5a0d222507546701302d0514ace3e83014fb6 authored by brharrington on 05 June 2017, 22:18:13 UTC, committed by GitHub on 05 June 2017, 22:18:13 UTC
The healthcheck API was accessing the service manager from a provider when the routes were fetched. There doesn't appear to be any reason for this, and it was already removed in the 1.6 branch as part of #490. This was usually harmless, but on some systems the threads would get scheduled in such a way that all of the actor threads were blocked, with traces like:

```
"atlas-akka.actor.default-dispatcher-9" #18 prio=5 os_prio=0 tid=0x00007fb914010000 nid=0x6410 waiting on condition [0x00007fb9235f7000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000838ae498> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at com.google.inject.internal.CycleDetectingLock$CycleDetectingLockFactory$ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle(CycleDetectingLock.java:164)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:185)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
    at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
    at com.netflix.atlas.akka.HealthcheckApi.routes(HealthcheckApi.scala:36)
    at com.netflix.atlas.akka.RequestHandlerActor.receive(RequestHandlerActor.scala:41)
    at akka.actor.ActorCell.newActor(ActorCell.scala:568)
    at akka.actor.ActorCell.create(ActorCell.scala:588)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:461)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
    at akka.dispatch.Mailbox.run(Mailbox.scala:223)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
```

To the user, the application would appear to never finish starting and would not be accessible. This should fix issue #612.
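The commit message does not say what the thread holding the Guice singleton lock was itself waiting on. A minimal Java sketch, assuming that thread needed one more task to run on the same dispatcher, shows how the trace above would turn into a startup stall. A fixed two-thread pool stands in for the actor dispatcher and a `ReentrantLock` stands in for the Guice singleton lock; `StarvationSketch` and `startupStalls` are hypothetical names, not Atlas or Guice APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical model of the startup stall: a small fixed pool plays the actor
 * dispatcher and a ReentrantLock plays the Guice singleton lock.
 */
final class StarvationSketch {

    /**
     * Returns true if a task submitted while the lock is held cannot get a pool
     * thread within the timeout. When touchLockInRoutes is true, every worker
     * first tries to take the lock, loosely like routes() calling provider.get().
     */
    static boolean startupStalls(boolean touchLockInRoutes) {
        final int poolSize = 2;
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        ReentrantLock singletonLock = new ReentrantLock();
        CountDownLatch workersStarted = new CountDownLatch(poolSize);

        // Assumption: the "constructing" thread holds the singleton lock.
        singletonLock.lock();
        try {
            // Occupy every pool thread with a "route" task.
            for (int i = 0; i < poolSize; i++) {
                pool.submit(() -> {
                    workersStarted.countDown();
                    if (touchLockInRoutes) {
                        singletonLock.lock(); // parks, as in the thread dump
                        singletonLock.unlock();
                    }
                });
            }
            workersStarted.await();

            // Assumption: construction needs one more task on the same pool.
            Future<?> init = pool.submit(() -> { });
            init.get(500, TimeUnit.MILLISECONDS);
            return false; // the task got a thread: no stall
        } catch (TimeoutException e) {
            return true;  // every pool thread is parked on the lock
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            singletonLock.unlock(); // let any parked workers finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

In this model, `startupStalls(true)` returns `true` because both workers park on the lock and the extra task never gets a thread, while `startupStalls(false)`, where the route tasks never touch the lock, returns `false`. That loosely mirrors the fix: with no provider lookup in `routes()`, dispatcher threads no longer depend on Guice's singleton lock.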
1 parent 7f660b3
Tip revision: b4a5a0d222507546701302d0514ace3e83014fb6 authored by brharrington on 05 June 2017, 22:18:13 UTC
fix possible deadlock during startup (#613)
File | Mode | Size |
---|---|---|
atlas-akka | ||
atlas-chart | ||
atlas-config | ||
atlas-core | ||
atlas-jmh | ||
atlas-json | ||
atlas-module-akka | ||
atlas-module-cloudwatch | ||
atlas-module-webapi | ||
atlas-poller | ||
atlas-poller-cloudwatch | ||
atlas-standalone | ||
atlas-test | ||
atlas-webapi | ||
atlas-wiki | ||
conf | ||
project | ||
scripts | ||
.gitignore | -rw-r--r-- | 242 bytes |
.travis.yml | -rw-r--r-- | 924 bytes |
LICENSE | -rw-r--r-- | 11.1 KB |
Makefile | -rw-r--r-- | 2.2 KB |
OSSMETADATA | -rw-r--r-- | 20 bytes |
README.md | -rw-r--r-- | 984 bytes |
build.sbt | -rw-r--r-- | 4.0 KB |