https://github.com/Netflix/atlas

bf9a99d simplify log/power scale logic (#1009) Simplifies the logic for computing the log and power scales. Now it just applies the mapping function to the input bounds and the value prior to using a normal linear scale. 28 March 2019, 01:27:28 UTC
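The idea described in bf9a99d can be sketched roughly as follows. This is a minimal illustration in Python, not the Atlas implementation: the names `linear_scale` and `mapped_scale`, the pixel range, and the choice of `log10` and `sqrt` as mapping functions are all assumptions made for the example.

```python
import math

def linear_scale(d1, d2, r1, r2):
    """Return a function mapping the domain [d1, d2] linearly onto [r1, r2]."""
    span = d2 - d1
    def scale(v):
        return r1 + (v - d1) / span * (r2 - r1)
    return scale

def mapped_scale(fn, d1, d2, r1, r2):
    """Apply fn to the input bounds and to each value, then use a linear scale.

    Hypothetical helper: fn stands in for the log or power mapping.
    """
    linear = linear_scale(fn(d1), fn(d2), r1, r2)
    return lambda v: linear(fn(v))

# Assumed example ranges: values 1..1000 (log) and 0..100 (power) onto 0..300 pixels.
log_scale = mapped_scale(math.log10, 1.0, 1000.0, 0.0, 300.0)
sqrt_scale = mapped_scale(math.sqrt, 0.0, 100.0, 0.0, 300.0)
```

With this shape, a log or power scale differs from a linear one only in the function passed in, which appears to be the simplification the commit describes.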
b5b9eae update dependencies (#1006) 20 March 2019, 03:03:18 UTC
824d9ef aws-java-sdk 1.11.521 20 March 2019, 02:52:29 UTC
d546872 equalsverifier 3.1.7 20 March 2019, 02:36:35 UTC
a0bc227 scalatest 3.0.7 20 March 2019, 02:34:44 UTC
f89c5d1 spectator 0.87.0 20 March 2019, 02:31:35 UTC
8c4d12f iep 2.0.0 20 March 2019, 02:29:02 UTC
5f95cba Collect additional aurora replica metrics (#1003) Add collection of `AuroraReplicaLag` and `AuroraReplicaLagMaximum` (fixes #1002). 07 March 2019, 20:52:38 UTC
bab1872 Improve CloudWatch metric lag handling (#1001) This is the first of potentially multiple commits to improve handling of CloudWatch metric lag. This first commit makes the time range of the query configurable and adds a metric to track the age (in CloudWatch periods) of the latest datapoint returned. This will enable assessing the distribution of ages across the namespaces and metrics collected. Those data will influence the approach for datapoints that are older than Atlas will accept. 01 March 2019, 23:09:05 UTC
ed1f1f2 update dependencies (#1000) 28 February 2019, 17:18:32 UTC
782583a log4j 2.11.2 28 February 2019, 17:01:47 UTC
77ce64f roaring bitmap 0.7.42 28 February 2019, 16:54:43 UTC
23d59a2 equalsverifier 3.1.5 28 February 2019, 16:53:49 UTC
7010f52 slf4j 1.7.26 28 February 2019, 16:48:17 UTC
ed81688 scalatest 3.0.6 28 February 2019, 16:36:21 UTC
47ea06c akka 2.5.21 28 February 2019, 16:34:16 UTC
28f7b3a spectator 0.86.0 28 February 2019, 15:10:39 UTC
29c2ad9 aws-java-sdk 1.11.501 28 February 2019, 15:08:43 UTC
6e1be7f Add NetworkELB TLS metrics (#997) See

* https://aws.amazon.com/about-aws/whats-new/2019/01/network-load-balancer-now-supports-tls-termination/
* https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-cloudwatch-metrics.html

28 February 2019, 15:04:55 UTC
fd64f34 Add NATGateway Metrics (#998) See https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway-cloudwatch.html 28 February 2019, 02:15:28 UTC
9fb970d fix link on eval lib readme (#996) Reference to reactive streams publisher didn't match. 27 February 2019, 14:51:19 UTC
9ad7190 add helpers for accessing materializer for stage (#995) In some cases, such as calling `discardEntityBytes` for an HTTP response, it is useful to access the materializer for the stage so the stream blueprint can be created without needing to pass in a materializer. 22 February 2019, 06:17:23 UTC
b151b50 only use offset notation if there is duplication (#994) Adjusts the logic so that offset notation for the y-axis will only get used if there is an actual duplication for the major tick labels. See #991 for more information. 15 February 2019, 17:55:30 UTC
6c3e280 avoid BoxesRunTime.equals for SmallHashMap (#990) For the QueryIndex on the streaming clusters a hot spot is `SmallHashMap.get`. Flame graphs show a significant and unnecessary overhead being `BoxesRunTime.equals`. This change updates 7 places in the code to avoid that call:

**Before**

```
$ javap -verbose ./atlas-core/target/scala-2.12/classes/com/netflix/atlas/core/util/SmallHashMap.class | grep Boxes
46: invokestatic #1101 // Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
79: invokestatic #1101 // Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
105: invokestatic #1101 // Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
40: invokestatic #1155 // Method scala/runtime/BoxesRunTime.unboxToBoolean:(Ljava/lang/Object;)Z
29: invokestatic #1101 // Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
59: invokestatic #1101 // Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
17: invokestatic #1101 // Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
2: invokestatic #1101 // Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
```

**After**

```
$ javap -verbose ./atlas-core/target/scala-2.12/classes/com/netflix/atlas/core/util/SmallHashMap.class | grep Boxes
40: invokestatic #1154 // Method scala/runtime/BoxesRunTime.unboxToBoolean:(Ljava/lang/Object;)Z
```

01 February 2019, 00:30:38 UTC
d7e92dc QueryIndex: reduce overhead for checking entries (#989) Flame graphs on the prod clusters show a bit of overhead for filter and exists calls on the list of entries. This change converts it to a simple array and avoids using the collections framework methods. For the existing JMH test this resulted in about a 12% improvement. 31 January 2019, 22:31:05 UTC
5f5166a add rolling-mean operator (#987) This is meant as an alternative to `:trend` that fixes a number of issues with that operator. Specifically:

1. The denominator for the average is the number of actual values, that is non-NaN entries, within the rolling buffer. The `:trend` operator always uses the window size which can create confusing drops because `NaN` values are effectively 0.
2. The minimum number of values permitted before emitting a mean can be specified by the user.
3. It is more consistent with other stateful operators in that it works on a window size relative to the step interval rather than a fixed time duration.
4. It is more consistent with similar operators in other tools such as the `rolling_mean` function provided by pandas.

Fixes #958. 29 January 2019, 22:30:29 UTC
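The first two points of 5f5166a can be illustrated with a small sketch. It is written in Python for brevity and is not the Atlas operator: the function name `rolling_mean` and its parameters `window` and `min_values` are assumptions chosen to mirror the description, with the window counted in datapoints rather than time.

```python
import math
from collections import deque

def rolling_mean(values, window, min_values):
    """Mean over the last `window` datapoints, ignoring NaN entries.

    Emits NaN until at least `min_values` non-NaN entries are in the buffer.
    Hypothetical sketch; parameter names are not taken from Atlas.
    """
    buf = deque(maxlen=window)  # oldest entry is evicted automatically
    out = []
    for v in values:
        buf.append(v)
        present = [x for x in buf if not math.isnan(x)]
        if present and len(present) >= min_values:
            # Denominator is the count of actual values, not the window size.
            out.append(sum(present) / len(present))
        else:
            out.append(math.nan)
    return out

result = rolling_mean([1.0, math.nan, 3.0, math.nan, math.nan], window=2, min_values=1)
```

Here the second output stays at 1.0 because the NaN is skipped. Dividing by the window size instead, as the commit says `:trend` does, would give 0.5 for that position.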
80322a9 fix procedure syntax warnings (#988) Procedure syntax is deprecated in 2.13 and results in a lot of warnings when trying to build on that version of Scala. 29 January 2019, 22:26:02 UTC
fb09dd5 update dependencies (#986) 29 January 2019, 20:36:22 UTC
d11f6aa equalsverifier 3.1.4 29 January 2019, 18:31:37 UTC
c262b24 roaring bitmap 0.7.36 29 January 2019, 18:27:49 UTC
52da9ed frigga 0.19.0 29 January 2019, 18:27:00 UTC
4cf5e49 iep 1.2.10 29 January 2019, 18:22:08 UTC
b965033 aws-java-sdk 1.11.482 29 January 2019, 18:21:06 UTC
27c5f33 spectator 0.83.0 29 January 2019, 18:10:14 UTC
8e2007e sbt-scalafmt 1.16 29 January 2019, 18:09:26 UTC
2acb1a3 akka-http 10.1.7 29 January 2019, 18:01:00 UTC
25016b1 akka 2.5.20 29 January 2019, 18:00:02 UTC
ca7ffed consistent state model for all stateful operators (#985) Updates all of the stateful operators to use the same online algorithm base classes. This also gives them a consistent representation of state that can easily be serialized and deserialized. This is a first step to possible future work of persisting the state of streaming evaluations so it can be replayed or the execution can be transitioned to another instance. 29 January 2019, 17:54:49 UTC
ccea49d refactor to avoid AssignOrNamedArg (#984) This class was renamed in scala 2.13 (scala/scala@870131b). 29 January 2019, 14:49:35 UTC
d295803 Throttle calls to CloudWatch (#983) We're hitting CloudWatch rate limits on a regular basis. However, the AWS limits should be sufficient for our overall per second call rate in the majority of cases. The current pattern of calls has bursts when the `Tick` message kicks off a collection, which causes the call rate to spike above the per second limit. This commit introduces call rate limiting to smooth out the request pattern. The Akka documentation for throttling request/response actor communication is incomplete. Through iteration playing around in a local toy app, I arrived at the implementation herein and confirmed that it works as expected. For this use case, it's important to ensure that either all or none of the `MetricMetadata` elements are added for processing. To satisfy that requirement, the full list is sent to the actor `Source` which then uses `flatMapConcat` to send each element individually through the throttle phase. This provides a stronger guarantee than the default, which could drop elements if the queue fills up. In practice, memory is more likely to be the limiting factor, given actors have unbounded mailboxes by default. Case in point, it was difficult to trigger the drop scenario in the local toy app. However, this approach more deterministically provides the stronger guarantee. 28 January 2019, 19:49:54 UTC
c56bc88 use jdk8 for building the scala 2.11 artifacts (#982) Since 2.11 doesn't support the `--release` option, building on a newer version can lead to errors when running on jdk8. Specifically the return type of some methods changed in jdk9+. This should fix errors like:

```
Cause: java.lang.NoSuchMethodError: java.nio.CharBuffer.clear()Ljava/nio/CharBuffer;
at com.netflix.atlas.core.model.TaggedItem$.writePair(TaggedItem.scala:59)
at com.netflix.atlas.core.model.TaggedItem$.computeId(TaggedItem.scala:105)
```

This means that image tests will not run for the 2.11 build. 23 January 2019, 22:23:54 UTC
8b8881f remove stat vars from output tags (#981) The stat vars are desired for substitutions (#878), but should not be included in the tag maps for the output. The output tags should be stable over time if evaluated incrementally. Including the stats breaks this because the values are dependent on the data for that time slice. 18 January 2019, 17:40:54 UTC
55b900a add helper function to validate a datasource (#980) This can be used as an upfront check to filter out bad data sources rather than getting the failure via the stream. 18 January 2019, 00:50:33 UTC
b014def update default grid colors (#979) This makes the grid colors lighter so they do not distract the viewer as much. These settings have been used internally for many years so this also reduces differences between the internal use at Netflix and OSS settings. 08 January 2019, 21:13:50 UTC
e10c574 use dedicated object for algo state (#978) Before it was using a Config object for convenience. This switches it to a dedicated object that can be easily used with `Json.encode/decode` or other similar tools. This should also be more efficient for the more common use-cases because we can avoid creation of the needless config objects. 04 January 2019, 19:20:31 UTC
14efd74 avoid conversion if already a ConfigValue (#977) In the docs site it is getting config 1.2 in the sbt classpath. There isn't an obvious way to force it to a newer version. The older version will fail if trying to convert a ConfigValue to a ConfigValue. For now we can workaround the problem by special casing that to avoid the unnecessary conversion. 04 January 2019, 00:43:09 UTC
87689ad set --release for javac (#976) After switching to use jdk11 for the build, the java classes were getting compiled to class version 55 instead of 52. 03 January 2019, 21:52:59 UTC
bb7b319 remove redis from travis config (#975) This is no longer needed for the Atlas build. 03 January 2019, 20:43:24 UTC
c9cad18 disable scaladoc publishing (#974) On JDK11 with `-release 8` it crashes with:

```
[error] java.lang.AssertionError: assertion failed:
[error] type AnyRef in java.lang
[error] while compiling: ...
[error] during phase: globalPhase=terminal, enteringPhase=typer
[error] library version: version 2.12.8
[error] compiler version: version 2.12.8
```

This is a quick workaround as we do not rely on the published scaladoc jars for anything. 03 January 2019, 19:24:13 UTC
ceced7a build using openjdk11 (#973) This updates the travis builds to use OpenJDK 11. The `-release 8` option is used to ensure the generated bytecode will still work on JDK8. Due to font rendering differences, image tests will fail when running on older versions of the JDK or on operating systems other than Mac OS X. Those checks will now automatically be disabled on systems that are known to fail, but are checked as part of CI validation. 03 January 2019, 17:32:11 UTC
f6c0fb9 enable antialiasing by default (#972) Update the config settings to use antialiasing for the text by default. It is explicitly disabled for tests as it will frequently cause rendering differences across systems. 02 January 2019, 22:25:02 UTC
822beec use RobotoMono font for error images (#971) Follow up to #967. Use RobotoMono font for the png error image utility as well as the graphs. 02 January 2019, 21:14:58 UTC
4c10f3c sbt 1.2.8 (#970) Fixes occasional NPE for Bintray. https://developer.lightbend.com/blog/2018-12-30-sbt-1-2-8/ 02 January 2019, 18:35:26 UTC
34d291b update license headers for 2019 (#969) 02 January 2019, 18:05:06 UTC
de117e9 use standard IIOMetadata classes for PNG (#968) When using the `-release 8` option the internal `PNGMetadata` class is not found in the classpath. Update the usage to rely on the public APIs. 22 December 2018, 04:39:47 UTC
363f2fe switch to RobotoMono font (#967) The Lucida fonts are not included with OpenJDK and have been removed from OracleJDK in version 11. The RobotoMono family is Apache licensed and will now be used as the default to get a more consistent experience across JDK versions. 21 December 2018, 23:49:17 UTC
4662ced fix #852, inconsistent group by behavior (#966) Before, attempting a group by on non-grouped expressions without a math aggregate function would behave differently than a non-grouped expression with a math aggregate. Now they have the same behavior and the group by will be ignored for expression trees that do not support it. 21 December 2018, 21:08:31 UTC
22ae17f fix #763, custom consolidation with rewrites (#965) The rewrites that look like aggregation functions will now work with custom consolidations. If used the rewrite will not be preserved in the model. So the output of converting the parsed expression model to a string will be the expanded expression and not indicate the rewrite was used. 21 December 2018, 20:32:32 UTC
19bcfc0 fix #948, all zeros shown on y-axis (#964) If the upper bound exactly matched `10 * factor` for the selected unit prefix, then it would use a different selection to avoid large numbers for the tick labels. However, for an exact match it is better to use the default prefix to avoid getting zeros due to rounding with the larger prefix. 21 December 2018, 19:01:14 UTC
c4e8820 move PngImage from atlas-core to atlas-chart (#963) The image utilities are only used for charting and this makes it easier to get consistency with upcoming font changes. 21 December 2018, 17:31:14 UTC
b2a28d0 akka-http 10.1.6 (#961) Adds an explicit `akka-stream-testkit` dependency because it is no longer included with `akka-http-testkit`. 21 December 2018, 03:36:00 UTC
a8d0eed update to jackson 2.9.8 (#960) Has a number of security fixes: https://groups.google.com/forum/#!topic/jackson-user/8jdpNS1dQPQ https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.8 20 December 2018, 20:55:28 UTC
294e0ba update dependencies (#959) 14 December 2018, 22:29:27 UTC
f94046a sbt 1.2.7 14 December 2018, 20:47:40 UTC
f6a8180 sbt-release 1.0.10 14 December 2018, 20:45:53 UTC
e159a9c equalsverifier 3.0.3 14 December 2018, 19:25:32 UTC
21396ca aws-java-sdk 1.11.469 14 December 2018, 19:22:31 UTC
8572b63 spectator 0.82.0 14 December 2018, 19:16:59 UTC
ae0251a iep 1.2.9 14 December 2018, 19:13:44 UTC
1baffbe akka 2.5.19 14 December 2018, 18:57:39 UTC
1fd3d65 scala 2.12.8 14 December 2018, 18:55:55 UTC
79a8c29 use same hash for IntRefHashMap and IntIntMap (#957) Updates IntRefHashMap to just use the int value as the hash code just like IntIntMap and Integer.hashCode. This makes them more consistent. In tests on real data the overhead of computing the murmur hash outweighs the benefits it provides. 04 December 2018, 15:17:48 UTC
7c5b31b switch to new pattern matcher (#956) For more details see Netflix/spectator#651. 02 December 2018, 22:23:49 UTC
27e423d only use error images for browsers (#955) There are some programmatic use-cases for accessing images such as alerting emails and bots for including in chat apps. The current behavior makes it hard to detect errors with the images because it is embedded within the image text and has a 200 response code. This changes the graph api behavior to only use error images when the access appears to be coming from a web browser. 27 November 2018, 23:20:03 UTC
29991de avoid Option allocation for queries (#954) For the LWC bridge flame graphs show a lot of time being spent on accessing the values. Further allocation profiles show a lot of allocations for the temporary Option that isn't needed. This change adds a special case that uses `SmallHashMap.getOrNull` when possible to avoid the extra overhead. 27 November 2018, 21:52:33 UTC
e56379c add check for invalid messages in the stream (#952) This will prevent the stream from failing and just log the invalid message that was received to help with debugging. If this does occur it isn't clear that the stream will recover, but there is a metric to detect that bad messages were received. 13 November 2018, 16:23:11 UTC
8afc8a5 improve performance of findValues for common tags (#949) For tags that are repeated across many metrics, this change prunes the item set by doing a lookup for that key value. The `andNot` is quite a bit faster than iterating the matches. It mostly impacts queries that are projecting all values for a common tag without other restrictions. For the sample benchmark, findValuesAllOne is the only one to show a noticeable change as expected. Before it was about 4864.472 ops/s and after 357671.851 ops/s. Trying it on real data, it is mostly faster, but there are a handful of cases where it is a bit slower. The table below shows mean time per operation in milliseconds for a handful of common queries:

| *URI*                | *Before*  | *After*   |
|----------------------|-----------|-----------|
| `/nf.app`            | 115.5     | 26.9      |
| `/nf.cluster`        | 102.0     | 66.3      |
| `/nf.node`           | 156.4     | 196.9     |
| `/nf.account`        | 128.6     | 2.0       |
| `/nf.region`         | 97.1      | 2.3       |
| `/nf.zone`           | 90.5      | 5.5       |
| `/name`              | 122.5     | 111.0     |

Full URI is with prefix of `/api/v1/tags`. 04 November 2018, 01:31:32 UTC
49f9114 update dependencies (#947) 26 October 2018, 16:44:00 UTC
1b5f1ce equalsverifier 3.0 26 October 2018, 16:23:09 UTC
99fc788 iep 1.2.8 26 October 2018, 16:01:49 UTC
9b1307a joda-convert 2.1.2 26 October 2018, 16:00:53 UTC
1df440b spectator 0.79.0 26 October 2018, 16:00:09 UTC
cfcaa84 akka 2.5.17 26 October 2018, 15:47:00 UTC
4dc64e6 aws-java-sdk 1.11.435 26 October 2018, 15:25:24 UTC
c6f7551 add config setting for max tags permitted (#946) For general use it is not recommended to change this, but making it configurable to facilitate experimentation and testing. 24 October 2018, 21:45:40 UTC
67dbb04 disable use of cache for ImageIO (#945) Should avoid tmp files that get created by default when rendering images. Also helps avoid problems on shared clusters when /tmp fills up. 24 October 2018, 21:45:18 UTC
d37e4a0 fix flakey test case (#944) If the stream did not start fast enough and pull on the queue source, then the first future will be inserted in the queue rather than passed through and the second future could be dropped. 19 October 2018, 22:52:23 UTC
03b418b optimize json encoding for lwc datapoint (#943) Avoid using databind to automatically encode as this is a hot path that is called many times on lwcapi for sending the datapoints to the user. 19 October 2018, 20:51:32 UTC
d4bfec5 reduce overhead for creating hash strings (#942) Avoids creating a BigInteger object from the hash array just for the purposes of creating the hex string. 19 October 2018, 19:41:37 UTC
9f2453b extract the step size from the uri (#941) If a step size is not explicitly supplied as part of the data source, then try to extract it based on the step param of the uri before falling back to the default of 1m. This allows `Evaluator.createPublisher` to work with a custom step size. 17 October 2018, 22:55:07 UTC
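The fallback order described in 9f2453b (explicit step, then the `step` parameter of the URI, then the 1m default) might look roughly like the sketch below. It is a Python illustration only: `step_for`, `parse_step`, and the simple `<number><unit>` duration format are assumptions, and the formats Atlas actually accepts for `step` are not covered here.

```python
from urllib.parse import urlparse, parse_qs

DEFAULT_STEP_SECONDS = 60  # the 1m default mentioned in the commit
_UNITS = {"s": 1, "m": 60, "h": 3600}

def parse_step(text):
    """Parse simple durations such as '10s' or '5m' (assumed format)."""
    number, unit = text[:-1], text[-1:]
    if unit not in _UNITS or not number.isdigit():
        raise ValueError(f"unsupported step: {text!r}")
    return int(number) * _UNITS[unit]

def step_for(uri, explicit_step=None):
    """Return the step in seconds for a data source (hypothetical helper)."""
    if explicit_step is not None:
        return explicit_step  # explicitly supplied step wins
    params = parse_qs(urlparse(uri).query)
    if "step" in params:
        return parse_step(params["step"][0])  # taken from the uri
    return DEFAULT_STEP_SECONDS  # fall back to the default
```

For example, `step_for("/api/v1/graph?q=name,sps,:eq&step=5m")` yields 300 seconds, while a URI without a `step` parameter yields 60.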
a7a422b switch to source based on BlockingQueue (#939) The default `Source.queue` has quite a bit of overhead and the input must be throttled based on the future to bound the memory use. `Source.actorRef` has a bit less overhead, but has the same unbounded memory use if it cannot keep up and does not provide a reasonable way to throttle the input. This change uses a custom source stage based on the java ArrayBlockingQueue. There is significantly less overhead and the memory usage is bounded even with an offer and forget usage style. 17 October 2018, 00:18:10 UTC
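The "offer and forget" style with bounded memory that a7a422b describes can be sketched with a fixed-capacity queue. This Python example uses the standard `queue.Queue` in place of Java's `ArrayBlockingQueue` and leaves out the Akka stream stage entirely; the class name `BoundedBuffer` and the `dropped` counter are assumptions for illustration.

```python
import queue

class BoundedBuffer:
    """Fixed-capacity buffer: producers never block, excess items are dropped."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # count of items rejected because the buffer was full

    def offer(self, item):
        """Try to enqueue without blocking; return False if the item was dropped."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def poll(self):
        """Return the next item, or None if the buffer is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None

buf = BoundedBuffer(capacity=2)
accepted = [buf.offer(i) for i in (1, 2, 3)]
```

Because the capacity is fixed, memory use stays bounded even if the consumer falls behind. The trade-off is that items are dropped rather than queued indefinitely.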
d5bd34b remove SSERenderable classes (#938) Simplify the messages returned back from the stream api and use the same model objects for both the server and the client. 13 October 2018, 22:27:53 UTC
9ae3eaf make queue size configurable (#937) Add `atlas.lwcapi.queue-size` setting for controlling the max number of items to queue up per stream before dropping. 13 October 2018, 21:12:06 UTC
94ccdb7 only send LwcSubscription for new subscriptions (#936) When running for a lot of expressions this can be a significant number of messages. In most cases the set of subscriptions do not change that often. 12 October 2018, 14:03:24 UTC
4843a9e do not request gzip for /stream call (#935) Current testing indicates the largest chunk of CPU time server side is in the GZIP compression. The akka directive doesn't easily allow for configuring it with `BEST_SPEED` instead of `BEST_COMPRESSION`. Further on the client side, the largest chunk of allocations is for the decompression. Overall it seems preferable right now to use more bandwidth and avoid the compression. Will revisit after web-socket change and switching to binary encoding for stream. 11 October 2018, 22:30:27 UTC
74bab04 improve early expr validation for eval lib (#934) Adds sanity checks that the query can be efficiently indexed with at least one exact match clause. Overly broad tag keys that would apply to almost all datapoints can be excluded via the configuration. 11 October 2018, 17:15:21 UTC
5b2c708 common helper for expanding :in clauses (#933) Makes it easier to reuse some of this logic for static analysis of queries. 11 October 2018, 15:46:42 UTC
df7ac1a pre-compute data expr map (#932) Computes the data expr map when the data sources are updated instead of when doing a lookup for logging. Reduces the overhead later in the stream if there are a lot of diagnostic messages. 10 October 2018, 00:30:00 UTC
13eb6d4 expand style offset for lwcapi (#931) Before the style variant of offset would not get expanded and the expression would appear to work. 09 October 2018, 14:09:56 UTC
a6f7bd1 store step size for TimeGroup (#929) If it is flushed via a heartbeat instead of data, then it could be empty and the step size cannot be extracted from the data. 05 October 2018, 00:31:22 UTC