https://github.com/Netflix/atlas

Revision Message Commit Date
d11f6aa equalsverifier 3.1.4 29 January 2019, 18:31:37 UTC
c262b24 roaring bitmap 0.7.36 29 January 2019, 18:27:49 UTC
52da9ed frigga 0.19.0 29 January 2019, 18:27:00 UTC
4cf5e49 iep 1.2.10 29 January 2019, 18:22:08 UTC
b965033 aws-java-sdk 1.11.482 29 January 2019, 18:21:06 UTC
27c5f33 spectator 0.83.0 29 January 2019, 18:10:14 UTC
8e2007e sbt-scalafmt 1.16 29 January 2019, 18:09:26 UTC
2acb1a3 akka-http 10.1.7 29 January 2019, 18:01:00 UTC
25016b1 akka 2.5.20 29 January 2019, 18:00:02 UTC
ca7ffed consistent state model for all stateful operators (#985) Updates all of the stateful operators to use the same online algorithm base classes. This also gives them a consistent representation of state that can easily be serialized and deserialized. This is a first step to possible future work of persisting the state of streaming evaluations so it can be replayed or the execution can be transitioned to another instance. 29 January 2019, 17:54:49 UTC
ccea49d refactor to avoid AssignOrNamedArg (#984) This class was renamed in scala 2.13 (scala/scala@870131b). 29 January 2019, 14:49:35 UTC
d295803 Throttle calls to CloudWatch (#983) We're hitting CloudWatch rate limits on a regular basis. However, the AWS limits should be sufficient for our overall per second call rate in the majority of cases. The current pattern of calls has bursts when the `Tick` message kicks off a collection, which causes the call rate to spike above the per second limit. This commit introduces call rate limiting to smooth out the request pattern. The Akka documentation for throttling request/response actor communication is incomplete. Through iteration playing around in a local toy app, I arrived at the implementation herein and confirmed that it works as expected. For this use case, it's important to ensure that either all or none of the `MetricMetadata` elements are added for processing. To satisfy that requirement, the full list is sent to the actor `Source` which then uses `flatMapConcat` to send each element individually through the throttle phase. This provides a stronger guarantee than the default, which could drop elements if the queue fills up. In practice, memory is more likely to be the limiting factor, given actors have unbounded mailboxes by default. Case in point, it was difficult to trigger the drop scenario in the local toy app. However, this approach more deterministically provides the stronger guarantee. 28 January 2019, 19:49:54 UTC
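The all-or-none batching plus throttling described in d295803 can be sketched outside Akka. This is a minimal illustration in Python, not the commit's Akka Streams code: the `BatchThrottle` class and its methods are hypothetical names, and it computes an emission schedule rather than sleeping, so the spacing is easy to inspect.

```python
from collections import deque


class BatchThrottle:
    """Accepts whole batches, then releases elements at a fixed rate."""

    def __init__(self, max_batches, per_second):
        self.max_batches = max_batches    # capacity is counted in batches, not elements
        self.interval = 1.0 / per_second  # minimum spacing between two elements
        self.batches = deque()

    def offer(self, batch):
        """Queue the full batch, or reject all of it when the queue is full."""
        if len(self.batches) >= self.max_batches:
            return False
        self.batches.append(list(batch))
        return True

    def schedule(self, start=0.0):
        """Flatten queued batches into (emit_time, element) pairs."""
        out = []
        t = start
        while self.batches:
            for element in self.batches.popleft():
                out.append((t, element))
                t += self.interval
        return out
```

Because capacity is counted in batches, a batch is never split: either every element is scheduled or none is, which is the guarantee the commit message asks for.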
c56bc88 use jdk8 for building the scala 2.11 artifacts (#982) Since 2.11 doesn't support the `--release` option, building on a newer version can lead to errors when running on jdk8. Specifically the return type of some methods changed in jdk9+. This should fix errors like:
```
Cause: java.lang.NoSuchMethodError: java.nio.CharBuffer.clear()Ljava/nio/CharBuffer;
  at com.netflix.atlas.core.model.TaggedItem$.writePair(TaggedItem.scala:59)
  at com.netflix.atlas.core.model.TaggedItem$.computeId(TaggedItem.scala:105)
```
This means that image tests will not run for the 2.11 build. 23 January 2019, 22:23:54 UTC
8b8881f remove stat vars from output tags (#981) The stat vars are desired for substitutions (#878), but should not be included in the tag maps for the output. The output tags should be stable over time if evaluated incrementally. Including the stats breaks this because the values are dependent on the data for that time slice. 18 January 2019, 17:40:54 UTC
55b900a add helper function to validate a datasource (#980) This can be used as an upfront check to filter out bad data sources rather than getting the failure via the stream. 18 January 2019, 00:50:33 UTC
b014def update default grid colors (#979) This makes the grid colors lighter so they do not distract the viewer as much. These settings have been used internally for many years so this also reduces differences between the internal use at Netflix and OSS settings. 08 January 2019, 21:13:50 UTC
e10c574 use dedicated object for algo state (#978) Before it was using a Config object for convenience. This switches it to a dedicated object that can be easily used with `Json.encode/decode` or other similar tools. This should also be more efficient for the more common use-cases because we can avoid creation of the needless config objects. 04 January 2019, 19:20:31 UTC
14efd74 avoid conversion if already a ConfigValue (#977) In the docs site it is getting config 1.2 in the sbt classpath. There isn't an obvious way to force it to a newer version. The older version will fail if trying to convert a ConfigValue to a ConfigValue. For now we can work around the problem by special-casing that to avoid the unnecessary conversion. 04 January 2019, 00:43:09 UTC
87689ad set --release for javac (#976) After switching to use jdk11 for the build, the java classes were getting compiled to class version 55 instead of 52. 03 January 2019, 21:52:59 UTC
bb7b319 remove redis from travis config (#975) This is no longer needed for the Atlas build. 03 January 2019, 20:43:24 UTC
c9cad18 disable scaladoc publishing (#974) On JDK11 with `-release 8` it crashes with:
```
[error] java.lang.AssertionError: assertion failed:
[error] type AnyRef in java.lang
[error] while compiling: ...
[error] during phase: globalPhase=terminal, enteringPhase=typer
[error] library version: version 2.12.8
[error] compiler version: version 2.12.8
```
This is a quick workaround as we do not rely on the published scaladoc jars for anything. 03 January 2019, 19:24:13 UTC
ceced7a build using openjdk11 (#973) This updates the travis builds to use OpenJDK 11. The `-release 8` option is used to ensure the generated bytecode will still work on JDK8. Due to font rendering differences, image tests will fail when running on older versions of the JDK or on operating systems other than Mac OS X. Those checks will now automatically be disabled on systems that are known to fail, but are checked as part of CI validation. 03 January 2019, 17:32:11 UTC
f6c0fb9 enable antialiasing by default (#972) Update the config settings to use antialiasing for the text by default. It is explicitly disabled for tests as it will frequently cause rendering differences across systems. 02 January 2019, 22:25:02 UTC
822beec use RobotoMono font for error images (#971) Follow up to #967. Use RobotoMono font for the png error image utility as well as the graphs. 02 January 2019, 21:14:58 UTC
4c10f3c sbt 1.2.8 (#970) Fixes occasional NPE for Bintray. https://developer.lightbend.com/blog/2018-12-30-sbt-1-2-8/ 02 January 2019, 18:35:26 UTC
34d291b update license headers for 2019 (#969) 02 January 2019, 18:05:06 UTC
de117e9 use standard IIOMetadata classes for PNG (#968) When using the `-release 8` option the internal `PNGMetadata` class is not found in the classpath. Update the usage to rely on the public APIs. 22 December 2018, 04:39:47 UTC
363f2fe switch to RobotoMono font (#967) The Lucida fonts are not included with OpenJDK and have been removed from OracleJDK in version 11. The RobotoMono family is Apache licensed and will now be used as the default to get a more consistent experience across JDK versions. 21 December 2018, 23:49:17 UTC
4662ced fix #852, inconsistent group by behavior (#966) Before, attempting a group by on non-grouped expressions without a math aggregate function would behave differently than a non-grouped expression with a math aggregate. Now they have the same behavior and the group by will be ignored for expression trees that do not support it. 21 December 2018, 21:08:31 UTC
22ae17f fix #763, custom consolidation with rewrites (#965) The rewrites that look like aggregation functions will now work with custom consolidations. If used the rewrite will not be preserved in the model. So the output of converting the parsed expression model to a string will be the expanded expression and not indicate the rewrite was used. 21 December 2018, 20:32:32 UTC
19bcfc0 fix #948, all zeros shown on y-axis (#964) If the upper bound exactly matched `10 * factor` for the selected unit prefix, then it would use a different selection to avoid large numbers for the tick labels. However, for an exact match it is better to use the default prefix to avoid getting zeros due to rounding with the larger prefix. 21 December 2018, 19:01:14 UTC
c4e8820 move PngImage from atlas-core to atlas-chart (#963) The image utilities are only used for charting and this makes it easier to get consistency with upcoming font changes. 21 December 2018, 17:31:14 UTC
b2a28d0 akka-http 10.1.6 (#961) Adds an explicit `akka-stream-testkit` dependency because it is no longer included with `akka-http-testkit`. 21 December 2018, 03:36:00 UTC
a8d0eed update to jackson 2.9.8 (#960) Has a number of security fixes: https://groups.google.com/forum/#!topic/jackson-user/8jdpNS1dQPQ https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.8 20 December 2018, 20:55:28 UTC
294e0ba update dependencies (#959) 14 December 2018, 22:29:27 UTC
f94046a sbt 1.2.7 14 December 2018, 20:47:40 UTC
f6a8180 sbt-release 1.0.10 14 December 2018, 20:45:53 UTC
e159a9c equalsverifier 3.0.3 14 December 2018, 19:25:32 UTC
21396ca aws-java-sdk 1.11.469 14 December 2018, 19:22:31 UTC
8572b63 spectator 0.82.0 14 December 2018, 19:16:59 UTC
ae0251a iep 1.2.9 14 December 2018, 19:13:44 UTC
1baffbe akka 2.5.19 14 December 2018, 18:57:39 UTC
1fd3d65 scala 2.12.8 14 December 2018, 18:55:55 UTC
79a8c29 use same hash for IntRefHashMap and IntIntMap (#957) Updates IntRefHashMap to just use the int value as the hash code just like IntIntMap and Integer.hashCode. This makes them more consistent. In tests on real data the overhead of computing the murmur hash outweighs the benefits it provides. 04 December 2018, 15:17:48 UTC
7c5b31b switch to new pattern matcher (#956) For more details see Netflix/spectator#651. 02 December 2018, 22:23:49 UTC
27e423d only use error images for browsers (#955) There are some programmatic use-cases for accessing images such as alerting emails and bots for including in chat apps. The current behavior makes it hard to detect errors with the images because it is embedded within the image text and has a 200 response code. This changes the graph api behavior to only use error images when the access appears to be coming from a web browser. 27 November 2018, 23:20:03 UTC
29991de avoid Option allocation for queries (#954) For the LWC bridge, flame graphs show a lot of time being spent on accessing the values. Further, allocation profiles show a lot of allocations for the temporary Option that isn't needed. This change adds a special case that uses `SmallHashMap.getOrNull` when possible to avoid the extra overhead. 27 November 2018, 21:52:33 UTC
e56379c add check for invalid messages in the stream (#952) This will prevent the stream from failing and just log the invalid message that was received to help with debugging. If this does occur it isn't clear that the stream will recover, but there is a metric to detect that bad messages were received. 13 November 2018, 16:23:11 UTC
8afc8a5 improve performance of findValues for common tags (#949) For tags that are repeated across many metrics, this change prunes the item set by doing a lookup for that key value. The `andNot` is quite a bit faster than iterating the matches. It mostly impacts queries that are projecting all values for a common tag without other restrictions. For the sample benchmark, findValuesAllOne is the only one to show a noticeable change as expected. Before it was about 4864.472 ops/s and after 357671.851 ops/s. Trying it on real data, it is mostly faster, but there are a handful of cases where it is a bit slower. The table below shows mean time per operation in milliseconds for a handful of common queries:

| *URI*         | *Before* | *After* |
|---------------|----------|---------|
| `/nf.app`     | 115.5    | 26.9    |
| `/nf.cluster` | 102.0    | 66.3    |
| `/nf.node`    | 156.4    | 196.9   |
| `/nf.account` | 128.6    | 2.0     |
| `/nf.region`  | 97.1     | 2.3     |
| `/nf.zone`    | 90.5     | 5.5     |
| `/name`       | 122.5    | 111.0   |

Full URI is with prefix of `/api/v1/tags`. 04 November 2018, 01:31:32 UTC
49f9114 update dependencies (#947) 26 October 2018, 16:44:00 UTC
1b5f1ce equalsverifier 3.0 26 October 2018, 16:23:09 UTC
99fc788 iep 1.2.8 26 October 2018, 16:01:49 UTC
9b1307a joda-convert 2.1.2 26 October 2018, 16:00:53 UTC
1df440b spectator 0.79.0 26 October 2018, 16:00:09 UTC
cfcaa84 akka 2.5.17 26 October 2018, 15:47:00 UTC
4dc64e6 aws-java-sdk 1.11.435 26 October 2018, 15:25:24 UTC
c6f7551 add config setting for max tags permitted (#946) For general use it is not recommended to change this, but making it configurable to facilitate experimentation and testing. 24 October 2018, 21:45:40 UTC
67dbb04 disable use of cache for ImageIO (#945) Should avoid tmp files that get created by default when rendering images. Also helps avoid problems on shared clusters when /tmp fills up. 24 October 2018, 21:45:18 UTC
d37e4a0 fix flakey test case (#944) If the stream did not start fast enough and pull on the queue source, then the first future will be inserted in the queue rather than passed through and the second future could be dropped. 19 October 2018, 22:52:23 UTC
03b418b optimize json encoding for lwc datapoint (#943) Avoid using databind to automatically encode as this is a hot path that is called many times on lwcapi for sending the datapoints to the user. 19 October 2018, 20:51:32 UTC
d4bfec5 reduce overhead for creating hash strings (#942) Avoids creating a BigInteger object from the hash array just for the purposes of creating the hex string. 19 October 2018, 19:41:37 UTC
9f2453b extract the step size from the uri (#941) If a step size is not explicitly supplied as part of the data source, then try to extract it based on the step param of the uri before falling back to the default of 1m. This allows `Evaluator.createPublisher` to work with a custom step size. 17 October 2018, 22:55:07 UTC
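The fallback order described in 9f2453b (explicit step, then the uri's `step` param, then the 1m default) can be sketched as below. This is an illustration in Python, not the Atlas code: the function names are hypothetical, and the parser assumes only simple `<number><unit>` durations such as `5s` or `1m`, which may be narrower than what the real `step` param accepts.

```python
from urllib.parse import urlparse, parse_qs

DEFAULT_STEP_SECONDS = 60  # the 1m default named in the commit message

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Parse a simple '<number><unit>' duration such as '5s' or '1m'."""
    value, unit = text[:-1], text[-1:]
    if unit not in _UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(value) * _UNIT_SECONDS[unit]


def step_seconds(uri, explicit_step=None):
    """Explicit step wins, then the uri's step param, then the default."""
    if explicit_step is not None:
        return explicit_step
    params = parse_qs(urlparse(uri).query)
    if "step" in params:
        return parse_duration(params["step"][0])
    return DEFAULT_STEP_SECONDS
```

For example, `step_seconds("http://atlas/api/v1/graph?q=name,sps,:eq,:sum&step=5s")` yields 5, while the same uri without `step` falls back to 60; the host name here is a placeholder.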
a7a422b switch to source based on BlockingQueue (#939) The default `Source.queue` has quite a bit of overhead and the input must be throttled based on the future to bound the memory use. `Source.actorRef` has a bit less overhead, but has the same unbounded memory use if it cannot keep up and does not provide a reasonable way to throttle the input. This change uses a custom source stage based on the java ArrayBlockingQueue. There is significantly less overhead and the memory usage is bounded even with an offer and forget usage style. 17 October 2018, 00:18:10 UTC
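The offer-and-forget style with bounded memory from a7a422b can be illustrated with a fixed-capacity queue. The sketch below uses Python's `queue.Queue` in place of the Java `ArrayBlockingQueue`; `BoundedSource` is a hypothetical name, and dropping the newest item on overflow is an assumption, since the commit message does not say which item is discarded.

```python
import queue


class BoundedSource:
    """Offer-and-forget buffer with a hard cap on queued items."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # count of rejected items, useful as a metric

    def offer(self, item):
        """Enqueue without blocking; drop and count the item if full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def poll(self):
        """Return the next item, or None when nothing is queued."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None
```

The producer never waits on a future and never blocks, yet memory stays bounded by `capacity` because overflow is rejected at the point of the offer.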
d5bd34b remove SSERenderable classes (#938) Simplify the messages returned back from the stream api and use the same model objects for both the server and the client. 13 October 2018, 22:27:53 UTC
9ae3eaf make queue size configurable (#937) Add `atlas.lwcapi.queue-size` setting for controlling the max number of items to queue up per stream before dropping. 13 October 2018, 21:12:06 UTC
94ccdb7 only send LwcSubscription for new subscriptions (#936) When running for a lot of expressions this can be a significant number of messages. In most cases the set of subscriptions does not change that often. 12 October 2018, 14:03:24 UTC
4843a9e do not request gzip for /stream call (#935) Current testing indicates the largest chunk of CPU time server side is in the GZIP compression. The akka directive doesn't easily allow for configuring it with `BEST_SPEED` instead of `BEST_COMPRESSION`. Further on the client side, the largest chunk of allocations is for the decompression. Overall it seems preferable right now to use more bandwidth and avoid the compression. Will revisit after web-socket change and switching to binary encoding for stream. 11 October 2018, 22:30:27 UTC
74bab04 improve early expr validation for eval lib (#934) Adds sanity checks that the query can be efficiently indexed with at least one exact match clause. Overly broad tag keys that would apply to almost all datapoints can be excluded via the configuration. 11 October 2018, 17:15:21 UTC
5b2c708 common helper for expanding :in clauses (#933) Makes it easier to reuse some of this logic for static analysis of queries. 11 October 2018, 15:46:42 UTC
df7ac1a pre-compute data expr map (#932) Computes the data expr map when the data sources are updated instead of when doing a lookup for logging. Reduces the overhead later in the stream if there are a lot of diagnostic messages. 10 October 2018, 00:30:00 UTC
13eb6d4 expand style offset for lwcapi (#931) Before the style variant of offset would not get expanded and the expression would appear to work. 09 October 2018, 14:09:56 UTC
a6f7bd1 store step size for TimeGroup (#929) If it is flushed via a heartbeat instead of data, then it could be empty and the step size cannot be extracted from the data. 05 October 2018, 00:31:22 UTC
28948f2 reduce duplication for SSE to chunk part (#928) In stream api move the mapping from SSERenderable to a ChunkStreamPart to after the heartbeat source has been merged with the queue of data coming from the evaluate api. 04 October 2018, 21:18:00 UTC
c2393bf lwc: server to client time notifications (#927) The eval client for consuming LWC data uses the timestamps in the messages so it can be used to run on live or captured data that may have old timestamps. If there is a subscription to an expression that doesn't match any data, then no messages will go through and thus the time grouping will not be flushed for a given interval. This change allows the server to send heartbeats with the timestamp and step so it can ensure there will always be at least one message coming through for a given stream. When running on previously captured data it will also include the heartbeat messages and exhibit the same behavior. 04 October 2018, 18:13:44 UTC
acb2f96 fix pull for diagnostic messages (#926) Since the diagnostic message is not being pushed to the stream it should always pull. Test case has been updated to catch the problem where before processing would stop after the first diagnostic message. 03 October 2018, 21:59:41 UTC
5255c95 diagnostic messages from publisher to subscriber (#925) Adds support for diagnostic messages to be sent to the evaluate api along with the data values. This can be used for the publisher to send diagnostic messages that will be received by a particular subscriber. An example use-case is to reject an expression due to the amount of load already on the publisher. 03 October 2018, 18:15:56 UTC
e794f17 initial websocket support for lwcapi (#924) Adds a websocket endpoint `/api/v1/subscribe` that can be used for both updating the set of subscriptions and receiving the stream of data. This also begins some changes to help improve consistency with other Atlas APIs, namely the messages rename `frequency` to `step` and the `/lwc` prefix is removed from the endpoint path. For now there is an alias so the eval client will still work with the messages coming from older lwcapi services running. This change also uses the model objects from the eval client on the server side for the websocket endpoint to help ensure better consistency of the messages in the future. The old endpoints have not been changed to preserve compatibility during the transition. Current plan:
1) Deploy updated lwcapi service with the new endpoint. The client will continue to work as before.
2) Update the eval client and get internal usages migrated.
3) Cleanup the lwcapi to remove the old subscribe/stream endpoints.

This should be the final big change before releasing 1.6. 28 September 2018, 23:41:31 UTC
280b115 add support for @JsonAlias on case class params (#923) Allows the user to specify a set of aliases to use when deserializing the case class. 28 September 2018, 17:23:38 UTC
a0bfbaa update to scala 2.12.7 (#922) It is supposed to improve compiler performance. 27 September 2018, 13:49:41 UTC
4a1e9dc avoid unnecessary string allocations (#921) Refactor stream to avoid decoding into a String as an intermediate step before parsing the JSON payloads. This yields about a 16% reduction in bytes allocated per message for the sample data tested. 26 September 2018, 22:15:50 UTC
2b2e11e Report `NaN` until first datapoint (#920) Report `NaN` until first datapoint for metric categories with a timeout configured. I reordered the tests to more closely follow the actual flow of data from startup and the progression to no data for a deleted resource. I've also noted the rationale for interpolating `0` values for gaps in CloudWatch data. 26 September 2018, 21:23:37 UTC
e2bc0e9 add :delay, :rolling-min, and :rolling-max (#919) Add operators requested for alerting use-cases:
- `delay`: can be used as an alternative to `:offset` for use-cases with short offsets that need to run in the streaming path. Also it can be applied to any time series instead of just data expressions.
- `rolling-min`: track the minimum value seen within a window. For data that is mostly smooth, but has some noise this can be a useful way to get a reasonable lower bound with little tuning required.
- `rolling-max`: similar to `rolling-min` only using the max to get an upper bound.

Also refactors DES operators to use a new common base trait that works for any implementation of OnlineAlgorithm. 25 September 2018, 22:51:14 UTC
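The rolling minimum and maximum from e2bc0e9 can be sketched as small online algorithms that take one value per interval. This is an illustration in Python, not the Atlas implementation: the class names and the `next` method are hypothetical, and the handling of `NaN` inputs and of windows that are not yet full is an assumption.

```python
from collections import deque


class RollingMin:
    """Minimum of the last `window` values seen."""

    def __init__(self, window):
        self.values = deque(maxlen=window)  # oldest value falls out automatically

    def next(self, value):
        self.values.append(value)
        return min(self.values)


class RollingMax:
    """Maximum of the last `window` values seen."""

    def __init__(self, window):
        self.values = deque(maxlen=window)

    def next(self, value):
        self.values.append(value)
        return max(self.values)
```

Feeding 5, 3, 4, 6, 7 through `RollingMin(3)` gives 5, 3, 3, 3, 4: a single noisy dip sets the lower bound until it leaves the window.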
34b0fe6 add ignore and pipeline helpers for step alignment (#918) Adds an online algorithm to ignore the first N values. This can be used with sliding DES to align to a step boundary before starting the DES computation. Also adds a pipeline that can be used to chain together a sequence of online algorithms while conforming to the same interface. 25 September 2018, 17:41:33 UTC
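The ignore and pipeline helpers from 34b0fe6 can be sketched as follows. This is an illustration in Python with hypothetical names: emitting `NaN` for the ignored inputs is an assumption, and `Scale` is only a stand-in for a real stage such as the sliding DES computation.

```python
import math


class IgnoreN:
    """Emit NaN for the first n inputs, then pass values through."""

    def __init__(self, n):
        self.remaining = n

    def next(self, value):
        if self.remaining > 0:
            self.remaining -= 1
            return math.nan
        return value


class Scale:
    """Hypothetical stand-in stage: multiplies each input by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def next(self, value):
        return value * self.factor


class Pipeline:
    """Chain stages; exposes the same next(value) interface as one stage."""

    def __init__(self, *stages):
        self.stages = stages

    def next(self, value):
        for stage in self.stages:
            value = stage.next(value)
        return value
```

Because `Pipeline` has the same `next` method as its stages, a caller can use `Pipeline(IgnoreN(2), Scale(10))` anywhere a single algorithm is expected.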
a967d09 fix behavior for aggregating count (#917) The aggregate datapoint will already have been converted from a raw value to a base count at the source. So when it is received here it should behave just like a simple sum. Before it was converting the datapoint to a 1 or 0 when aggregating as well which can lead to drastically smaller count than expected if the source has many time series for the query. 25 September 2018, 14:08:18 UTC
de4d678 Minor improvement to `MetricCategory` doc (#916) Improves the wording of the `timeout` parameter doc vs previous commit: CloudWatch will return 0 metrics for at least two cases:
- No metrics were recorded.
- The resource has been removed; metrics still show up when listing metrics due to the retention window, but the specified time interval for the metric statistics request is after the removal.

25 September 2018, 01:59:52 UTC
5715951 Add optional CloudWatch metric timeout (#915) This commit adds a mechanism to specify how long the system should interpolate a base value for unreported CloudWatch metrics before ceasing to send them. CloudWatch will return 0 metrics in at least two cases:
1. No metrics were recorded.
2. The resource has been removed, but metrics for it still fall within the retention window.

I've made this optional and configurable at the metric category level to tactically address the issue for RDS reported by a user and to minimize visual clutter in the config. Depending on usage in practice, we may want to require it to be specified for every category or lift the scope to make it consistent at the global level. 25 September 2018, 00:08:52 UTC
2c56abe Add note about `$percentile` to percentiles description (#914) 24 September 2018, 20:03:05 UTC
fd742e5 jackson 2.9.7 (#913) Addresses some additional CVEs. 24 September 2018, 15:23:16 UTC
5f472f8 add helper to compute a delayed signal (#912) Delays the values by the window size. This is similar to the `:offset` operator except that it can be applied to any input line instead of just changing the time window fetched with a DataExpr. Short delays can be useful for alerting to detect changes in slightly shifted trend lines. This change adds the online algorithm implementation, the actual StatefulExpr will be done as a follow up. 22 September 2018, 22:09:25 UTC
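The delayed signal from 5f472f8 amounts to a fixed-length FIFO buffer. The sketch below is an illustration in Python, not the Atlas helper: the `Delay` class is a hypothetical name, and returning `NaN` until the buffer has filled is an assumption.

```python
import math
from collections import deque


class Delay:
    """Shift a signal by `window` intervals."""

    def __init__(self, window):
        # Pre-fill with NaN so the first `window` outputs have no value.
        self.buffer = deque([math.nan] * window)

    def next(self, value):
        self.buffer.append(value)
        return self.buffer.popleft()
```

With `Delay(2)`, the inputs 1, 2, 3, 4 come out as NaN, NaN, 1, 2. Unlike an offset on the fetched time window, this works on any input line because it only needs the values as they arrive.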
c230585 update des and sliding des to use base trait (#911) Make the DES operations use the base trait added in #910. 22 September 2018, 21:53:22 UTC
492cf4e add helpers for rolling min/max (#910) Also sets up a base OnlineAlgorithm trait that can be used to get better reuse consuming the operations. The state can be captured as a Config object to make it easy to save the previous state and restore if needed. Will refactor DES helpers to use the same pattern in a follow up PR. 22 September 2018, 21:03:02 UTC
1cfc0cd maintain grouping by DataExpr (#909) The TimeGrouped stage needs to group by the DataExpr to aggregate the values as they arrive. It was then flattening these out to keep the stage output the same. Since the only consumer is FinalExprEval and it needs them grouped by DataExpr, this change maintains the grouping and passes it along. 21 September 2018, 17:29:08 UTC
e0da7b2 aggregate datapoints while grouping (#908) This helps reduce the amount of memory required when evaluating an expression. If a 1k node cluster is sending data for a single sum expression, then it will now result in a single value being maintained in the time grouping stage rather than keeping one per node until the final evaluation. 20 September 2018, 17:03:41 UTC
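The memory saving described in e0da7b2 comes from folding each datapoint into a running aggregate on arrival. A minimal sketch in Python, assuming a sum aggregate only; `TimeGroup` and its methods are hypothetical names, not the Atlas stage.

```python
from collections import defaultdict


class TimeGroup:
    """Keeps one running sum per (timestamp, expression) pair."""

    def __init__(self):
        self.sums = defaultdict(float)

    def add(self, timestamp, expr, value):
        # Fold the value in right away instead of storing it per node.
        self.sums[(timestamp, expr)] += value

    def flush(self, timestamp):
        """Remove and return {expr: sum} for one interval."""
        keys = [k for k in self.sums if k[0] == timestamp]
        return {expr: self.sums.pop((ts, expr)) for ts, expr in keys}
```

If 1000 nodes each report one value for the same sum expression and interval, `sums` holds a single entry rather than 1000, matching the 1k node example in the commit message.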
22e39ba Fix reporting of AWS/NetworkELB ConsumedLCUs (#907) `ConsumedLCUs` is available only on the `LoadBalancer` dimension, not `LoadBalancer` and `AvailabilityZone`. 19 September 2018, 02:56:28 UTC
dbd3e05 add hook to control startup delay for lwcapi (#906) This delay is used to keep the service in an unhealthy state for a specified window. Clients that publish data use the load balancer and will not start publishing until it is healthy. The eval library that does the subscriptions will attempt to connect once it detects the new instance. This delay means that subscriptions should be in place before data starts flowing. 19 September 2018, 01:51:43 UTC
061e158 dependency updates (#905) 18 September 2018, 22:31:15 UTC
4a3c241 sbt 1.2.3 18 September 2018, 22:13:58 UTC
c9194bc add support for configurable diagnostic headers (#904) This can be used to add headers such as `Netflix-ASG` or `Netflix-Zone` that are used by common IPC to provide additional context. 18 September 2018, 22:13:19 UTC
5b3b387 roaringbitmap 0.7.17 18 September 2018, 21:56:14 UTC
85bbd4d equalsverifier 2.5.2 18 September 2018, 21:55:17 UTC