Revision history - None - origin: https://github.com/shader-slang/slang

Revision	Author	Date	Message	Commit Date
fbac017	jsmall-nvidia	14 April 2020, 21:00:11 UTC	CUDA global scope initialization of arrays without function calls. (#1320) * Fix CUDA output of a static const array if values are all literals. * Fix bug in Convert definition. * Output makeArray such that it is deconstructed on CUDA to fill in based on what the target type is. Tries to expand such that there are no function calls so that static const global scope definitions work. * Fix unbounded-array-of-array-syntax.slang to work correctly on CUDA. * Remove tabs. * Check works with static const vector/matrix. * Fix typo in type comparison. * Shorten _areEquivalent test. * Rename _emitInitializerList. Some small comment fixes. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	14 April 2020, 21:00:11 UTC
cbdee1b	Tim Foley	14 April 2020, 17:55:10 UTC	Fix front-end handling of generic static methods (#1319) * Fix front-end handling of generic static methods The front-end logic that was testing if a member was usable as a static member neglected to unwrap any generic-ness and look at the declaration inside (the parser currently puts all modifiers on the inner declaration instead of the outer generic). The test case included here is not a full compute test so that it only runs the front-end checking logic (where we had the bug). * fixup: tabs->spaces	14 April 2020, 17:55:10 UTC
79f6a01	Tim Foley	14 April 2020, 15:48:54 UTC	Change rules for layout of buffers/blocks containing only interface types (#1318) TL;DR: This is a tweak to the rules for layout that only affects a corner case for people who actually use `interface`-type shader parameters (which for now is just our own test cases). The tweaked rules seem like they make it easier to write the application code for interfacing with Slang, but even if we change our minds later the risk here should be low (again: nobody is using this stuff right now). Slang already has a rule that a constant buffer that contains no ordinary/uniform data doesn't actually allocate a constant buffer `binding`/`register`: struct A { float4 x; Texture2D y; } // has uniform/ordinary data struct B { Texture2D u; SamplerState v; } // has none ConstantBuffer<A> gA; // gets a constant buffer register/binding ConstantBuffer<B> gB; // does not There is similar logic for `ParameterBlock`, where the feature makes more sense. A user would be somewhat surprised if they declared a parameter block with a texture and a sampler in it, but then the generated code reserved Vulkan `binding=0` for a constant buffer they never asked for. The behavior in the case of a plain `ConstantBuffer` is chosen to be consistent with the parameter block case. (Aside: all of this is a non-issue for targets with direct support for pointers, like CUDA and CPU. On those platforms a constant buffer or parameter block always translates to a pointer to the contained data.) Now, suppose the user declares a constant buffer with an interface type in it: interface IFoo { ... } ConstantBuffer<IFoo> gBuffer; When the layout logic sees the declaration of `gBuffer` it doesn't yet know what type will be plugged in as `IFoo` there. Will it contain uniform/ordinary data, such that a constant buffer is needed? 
The existing logic in the type layout step implemented a complicated rule that amounted to: * A `ConstantBuffer` or `cbuffer` that only contains `interface`/existential-type data will not be allocated a constant buffer `register`/`binding` during the initial layout process (on unspecialized code). That means that any resources declared after it will take the next consecutive `register`/`binding` without leaving any "gap" for the `ConstantBuffer` variable. * After specialization (e.g., when we know that `Thing` should be plugged in for `IFoo`), if we discover that there is uniform/ordinary data in `Thing` then we will allocate a constant buffer `register`/`binding` for the `ConstantBuffer`, but that register/binding will necessarily come after any `register`s/`binding`s that were allocated to parameters during the first pass. * Parameter blocks were intended to work the same way when it comes to whether or not they allocate a default `space`/`set`, but that logic appears to not have worked as intended. These rules make some logical sense: a `ConstantBuffer` declaration only pays for what the element type actually needs, and if that changes due to specialization then the new resource allocation comes after the unspecialized resources (so that the locations of unspecialized parameters are stable across specializations). The problem is that in practice it is almost impossible to write client application code that uses the Slang reflection API and makes reasonable choices in the presence of these rules. A general-purpose `ShaderObject` abstraction in application code ends up having to deal with multiple possible states that an object could be in: 1. An object where the element type `E` contains no uniform/ordinary data, and no interface/existential fields, so a constant buffer doesn't need to be allocated or bound. 2. An object where the element type `E` contains no uniform/ordinary data, but has interface/existential fields, with two sub-cases: a. 
When no values bound to interface/existential fields use uniform/ordinary data, then the parent object must not bind a buffer. b. When the type of value bound to an interface/existential field uses uniform/ordinary data, then the parent object needs to have a buffer allocated, and bind it. 3. When the element type `E` contains uniform/ordinary data, then a buffer should be allocated and bound (although its size/contents may change as interface/existential fields get re-bound) Needing to deal with a possible shift between cases (2a) and (2b) based on what gets bound at runtime is a mess, and it is important to note that even though both (2b) and (3) require a buffer to be bound, the rules about where the buffer gets bound aren't consistent (so that the application needs to understand the distinction between "primary" and "pending" data in a type layout). This change introduces a different rule, which seems to be more complicated to explain, but actually seems to simplify things for the application: * A `ConstantBuffer` or `cbuffer` that only contains `interface`/existential-type data always has a constant buffer `register`/`binding` allocated for it "just in case." * If after specialization there is any uniform/ordinary data, then that will use the buffer `register`/`binding` that was already allocated (that's easy enough). * If after specialization there isn't any uniform/ordinary data, then the generated HLSL/GLSL shader code won't declare a buffer, but the `register`/`binding` is still claimed. * A `ParameterBlock` behaves equivalently, so that if it contains any `interface`/existential fields, then it will always allocate a `space`/`set` "just in case" The effect of these rules is to streamline the cases that an application needs to deal with down to two: 1. If the element type `E` of a shader object contains no uniform/ordinary or interface/existential fields, then no buffer needs to be allocated or bound 2. 
If the element type `E` contains any uniform/ordinary or interface/existential fields, then it is always safe to allocate and bind a buffer (even in the cases where it might be ignored). Furthermore, the reflection data for the constant buffer `register`/`binding` becomes consistent in case (2), so that the application can always expect to find it in the same way.	14 April 2020, 15:48:54 UTC
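The two-case rule that revision 79f6a01 describes can be sketched from the application's point of view. This is a minimal illustration only: `ElementTypeInfo` and `needsBuffer` are hypothetical names invented here and are not part of the Slang reflection API.

```cpp
#include <cassert>

// Hypothetical summary of what an application knows about the element
// type `E` of a shader object (not a real Slang reflection type).
struct ElementTypeInfo
{
    bool hasOrdinaryData;    // any uniform/ordinary fields
    bool hasInterfaceFields; // any interface/existential fields
};

// Under the revised layout rule there are only two cases: allocate and
// bind a buffer if either kind of field is present, otherwise do nothing.
// The decision no longer depends on what is bound at runtime.
bool needsBuffer(const ElementTypeInfo& e)
{
    return e.hasOrdinaryData || e.hasInterfaceFields;
}
```

A type holding only textures and samplers maps to case 1 (no buffer), while a type with only interface fields maps to case 2, even if the buffer ends up ignored after specialization.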
b2c9fcc	jsmall-nvidia	13 April 2020, 16:34:20 UTC	Remove Not constant folding - because it doesn't take into account the type change. (#1317) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	13 April 2020, 16:34:20 UTC
4a5c606	Tim Foley	10 April 2020, 21:15:08 UTC	Fix CUDA build of render-test (#1316) The CUDA build of the render-test tool had been broken in a fixup change to #1307 (which was ostensibly adding features for the CUDA path). The fix is a simple one-liner.	10 April 2020, 21:15:08 UTC
2d16fcd	Tim Foley	10 April 2020, 16:20:36 UTC	Fix crashing bug when using overloaded name as generic arg (#1315) If somebody defines two `struct` types with the same name: ```hlsl struct A {} // ... struct A {} ``` and then tries to use that name when specializing a generic function: ```hlsl void doThing<T>() { ... } // ... doThing<A>(); ``` then the Slang front-end currently crashes, which leads to it not diagnosing the original problem (the conflicting declarations of `A`). This change fixes up the checking of generic arguments so that it properly fills in dummy "error" arguments in place of missing or incorrect arguments, and thus guarantees that the generic substitution it creates will at least be usable for the next steps of checking (rather than leaving null pointers in the substitution). This change also fixes up the error message for the case where a generic application like `F<A>` is formed where `F` is not a generic. We already had a more refined diagnostic defined for that case, but for some reason the site in the code where we ought to use it was still issuing an internal compiler error around an unimplemented feature. This change includes a diagnostic test case to cover both of the above fixes.	10 April 2020, 16:20:36 UTC
a01c09c	jsmall-nvidia	09 April 2020, 19:43:09 UTC	Literal folding on other operators (#1314) * Fold prefix operators if they prefix an int literal. * Make test case a bit more convoluted. * Remove ++ and -- as not appropriate for folding of literals. * Set output buffer name.	09 April 2020, 19:43:09 UTC
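The literal-folding behavior described in revision a01c09c can be sketched as follows. This is an illustration under stated assumptions, not the compiler's code: `tryFoldPrefix` is a hypothetical helper, operators are passed as strings for brevity, and it is assumed that the "Not" whose folding the later revision b2c9fcc removes is the logical `!`, which changes the result type.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: tries to fold a prefix operator applied to an
// integer literal. Returns true and writes `result` when folding applies.
bool tryFoldPrefix(const std::string& op, long long literal, long long& result)
{
    if (op == "+") { result = literal;  return true; }
    if (op == "-") { result = -literal; return true; }
    if (op == "~") { result = ~literal; return true; }
    // "++" and "--" need an assignable operand, so they are not folded.
    // "!" is assumed to change the type of the expression, so it is left alone.
    return false;
}
```

With this sketch, `-5` and `~5` collapse to single literals, while `!5` and `++5` are passed through unfolded for later stages to handle or diagnose.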
78acd32	Tim Foley	08 April 2020, 22:30:12 UTC	Replace /* unhandled */ in source emit with a real error (#1313) For a long time the various source-to-source back-ends have emitted the text `/* unhandled */` when they encounter an IR instruction opcode that didn't have any emit logic implemented. This choice had two apparent benefits: In most common cases, emitting `/* unhandled */` in place of an expression would lead to downstream compilation failure, so most catastrophic cases seemed to work as desired (e.g., if we emit `int i = /* unhandled */;` we get a downstream parse error and we know something is wrong). In a few cases, if a dead-but-harmless instruction slips through (e.g., a type formed in the body of a specialized generic function), we would emit `/* unhandled */;`, which is a valid empty statement. It is already clear from the above write-up that the benefits of the policy aren't really that compelling, and where it has recently turned out to be a big liability is that there are actually plenty of cases where emitting `/* unhandled */` instead of a sub-expression won't cause downstream compilation failure, and will instead silently compute incorrect results: Emitting `/* unhandled */ + b` instead of `a + b` Emitting `/* unhandled */(a)` instead of `f(a)`, or even `/* unhandled */(a, b)` instead of `f(a,b)` Emitting `f(/* unhandled */)` instead of `f(a)` in cases where `f` is a built-in with both zero-argument and one-argument overloads The right fix here is simple: where we would have emitted `/* unhandled */` to the output we should instead diagnose an internal compiler error, thus leading to compilation failure. This change appears to pass all our current tests, but it is possible that there are going to be complicated cases in user code that were relying on the previous lax behavior. 
I know from experience that we sometimes see `/* unhandled */` in output for generics, and while we have eliminated many of those cases I don't have confidence we've dealt with them all. When this change lands we should make sure that the first release that incorporates it is marked as potentially breaking for clients, and we should make sure to test the changes in the context of the client codebases before those codebases integrate the new release.	08 April 2020, 22:30:12 UTC
6274e17	Tim Foley	08 April 2020, 20:57:24 UTC	Initial work to support OptiX output for ray tracing shaders (#1307) * Initial work to support OptiX output for ray tracing shaders This change represents in-progress work toward allowing Slang/HLSL ray-tracing shaders to be cross-compiled for execution on top of OptiX. The work as it exists here is incomplete, but the changes are incremental and should not disturb existing supported use cases. One major unresolved issue in this work is that the OptiX SDK does not appear to set an environment variable. Changes include: * Modified the premake script to support new options for adding OptiX to the build. Right now the default path to the OptiX SDK is hard-coded because the installer doesn't seem to set an environment variable. We will want to update that to have a reasonable default path for both Windows and Unix-y platforms in a later change. * I ran the premake generator on the project since I added new options, which resulted in a bunch of diffs to the Visual Studio project files that are unrelated to this change. Many of the diffs come from previous edits that added files using only the Visual Studio IDE rather than by re-running premake, so it is arguably better to have the checked-in project files more accurately reflect the generated files used for CI builds. * The "downstream compiler" abstraction was extended to have an explicit notion of the kind of pipeline that shaders are being compiled for (e.g., compute vs. rasterization vs. ray tracing). This option is used to tell the NVRTC case when it needs to include the OptiX SDK headers in the search path for shader compilation (and also when it should add a `#define` to make the prelude pull in OptiX). This code again uses a hard-coded default path for the OptiX SDK; we will need to modify that to have a better discovery approach and also to support an API or command-line override. 
* One note for the future is that instead of passing down a "pipeline type" we could instead pass down the list/set of stages for the kernels being compiled, and the OptiX support could be enabled whenever there is any ray tracing entry point present in a module. That approach would allow mixing RT and compute kernels during downstream compilation. We will need to revisit these choices when we start supporting code generation for multiple entry points at a time. * The CUDA emit logic is currently mostly unchanged. The biggest difference is that when emitting a ray-tracing entry point we prefix the name of the generated `__global__` function with a marker for its stage type, as required by the OptiX runtime (e.g., a `__raygen__` prefix is required on all ray-generation entry points). * The `Renderer` abstraction had a bare minimum of changes made to be able to understand that ray-tracing pipelines exist, and also that some APIs will require the name of each entry point along with its binary data in order to create a program. * The `ShaderCompileRequest` type was updated so that only a single "source" is supported (rather than distinct source for each entry point), and also the entry points have been turned into a single list where each entry identifies its stage instead of a fixed list of fields for the supported entry-point types. * The CUDA compute path had a lot of code added to support execution for the new ray-tracing pipeline type. The logic is mostly derived from the `optixHello` example in the OptiX SDK, and at present only supports running a single ray-generation shader with no parameters. The code here is not intended to be ready for use, but represents a significant amount of learning-by-doing. * The `slang-support.cpp` file in `render-test` was updated so that instead of having separate compilation logic for compute vs. 
rasterization shaders (which would mean adding a third path for ray tracing), there is now a single flow to the code that works for all pipeline types and any kind of entry points. * Implicit in the new code is dropping support for the way GLSL was being compiled for pass-through render tests, which means pass-through GLSL render tests will no longer work. It seems like we didn't have any of those to begin with, though, so it is no great loss. * Also implicit are some new invariants about how shaders without known/default entry points need to be handled. For example, the ray tracing case intentionally does not fill in entry points on the `ShaderCompileRequest` and instead fully relies on the Slang compiler's support for discovering and enumerating entry points via reflection. As a consequence of those edits the `-no-default-entry-point` flag on `render-test` is probably not working, but it seems like we don't have any test cases that use that flag anyway. Given the seemingly breaking changes in those last two bullets, I was surprised to find that all our current tests seem to pass with this change. If there are things that I'm missing, I hope they will come up in review. * fixup: issues from review and CI * Some issues noted during the review process (e.g., a missing `break`) * Fix logic for render tests with `-no-default-entry-point`. I had somehow missed that we had tests reliant on that flag. This required a bit of refactoring to pass down the relevant flag (luckily the function in question was already being passed most of what was in `Options`, so that just passing that in directly actually simplifies the call sites a bit. * There was a missing line of code to actually add the default compute entry points to the compile request. I think this was a problem that slipped in as part of some pre-PR refactoring/cleanup changes that I failed to re-test.	08 April 2020, 20:57:24 UTC
f38c082	Tim Foley	08 April 2020, 19:37:16 UTC	Fix expected output for dxc-error test. (#1312) I'm not sure how this slipped in, but I know that I missed this when testing all my recent PRs because I end up having a bunch of random not-ready-to-commit repro tests in my source tree which means I always get at least some test failures and have to scan them for the ones that are real. Somehow I have had a blind spot for this one.	08 April 2020, 19:37:16 UTC
a53f817	Tim Foley	08 April 2020, 16:41:59 UTC	Fixes for IR generics (#1311) * Fixes for IR generics There are a few different fixes going on here (and a single test that covers all of them). 1. Fix optionality of trailing semicolon for `struct`s ====================================================== We have logic in the parser that tries to make a trailing `;` on a `struct` declaration optional. That logic is a bit subtle and could potentially break non-idiomatic HLSL input, so we try to only trigger it for files written in Slang (and not HLSL). For command-line `slangc` this is based on the file extension (`.slang` vs. `.hlsl`), and for the API it is based on the user-specified language. The missing piece here was that the path for handling `import`ed code was not setting the source language of imported files at all, and so those files were not getting opted into the Slang-specific behavior. As a result, `import`ed code couldn't leave off the semicolon. 2. Fix generic code involving empty `interface`s ================================================ We have logic that tries to only specialize "definitions," but the definition-vs-declaration distinction at the IR level has historically been slippery. One corner case was that a witness table for an interface with no methods would always be considered a declaration, because it was empty. The notion of what is/isn't a definition has been made more nuanced so that it amounts to two main points: * If something is decorated as `[import(...)]`, it is not a definition * If something is a generic/func (a declaration that should have a body), and it has no body, it is a declaration Otherwise we consider anything a definition, which means that non-`[import(...)]` witness tables are now definitions whether or not they have anything in them. 3. 
Fix IR lowering for members of generic types =============================================== The IR lowering logic was trying to be a little careful in how it recursively emitted "all" `Decl`s to IR code. In particular, we don't want to recurse into things like function parameters, local variables, etc. since those can never be directly referenced by external code (they don't have linkage). The existing logic was basically emitting everything at global scope, and then only recursing into (non-generic) type declarations. This created a problem where a method declared inside a generic `struct` would not be emitted to the IR for its own module at all unless it happened to be called by other code in the same module. The fix here was to also recurse into the inner declaration of `GenericDecl`s. I also made the code recurse into any `AggTypeDeclBase` instead of just `AggTypeDecl`s, which means that members in `extension` declarations should now properly be emitted to the IR. Conclusion ========== These fixes should clear up some (but not all) cases where we might emit an `/* unhandled */` into output HLSL/GLSL. A future change will need to make that path a hard error and then clean up the remaining cases. fixup: tabs->spaces	08 April 2020, 16:41:59 UTC
a9214f3	jsmall-nvidia	08 April 2020, 14:56:00 UTC	Remove static struct members from layout and reflection (#1310) * Added MemberFilterStyle - controls action of FilteredMemberList and FilteredMemberRefList * Split out template implementations * Use more standard method names for FilteredMemberRefList * Added reflect-static.slang test * Added isNotEmpty/isEmpty to filtered lists * Added ability to index into filtered list (so not require building of array) * Default MemberFilterStyle to All. * Remove explicit MemberFilterStyle::All	08 April 2020, 14:56:00 UTC
ba232e4	Tim Foley	07 April 2020, 20:56:01 UTC	Fix a bug around generic functions using DXR RayQuery (#1309) The DXR `RayQuery` type is our first generic type defined in the stdlib that is marked as a target intrinsic but does not map to a custom `IRType` case. Instead, a reference to `RayQuery<T>` is encoded in the IR as an ordinary `specialize` instruction. Unfortunately, this doesn't play nice with the current specialization logic, which considered a `specialize` instruction to not represent a "fully specialized" instruction, which then inhibits specialization of generics/functions that use such an instruction. The fix here was to add more nuanced logic to the check for "fully specialized"-ness, so that it considers a `specialize` to already be fully specialized when the generic it applies to represents a target intrinsic. I also added a note that the whole notion of "fully specialized"-ness that we use isn't really the right thing for the specialization pass, and how we really ought to use a notion of what is or is not a "value." This change doesn't include a test because the only way to trigger the issue is using the DXR 1.1 `RayQuery` type, and that type is not supported in current release versions of DXC.	07 April 2020, 20:56:01 UTC
c5db04b	jsmall-nvidia	02 April 2020, 23:52:12 UTC	Fix WaveGetLaneIndex for glsl (#1306) * Fix typo in stdlib around WaveGetLaneIndex and WaveGetLaneCount * Reorder emit so #extensions come before layout * Added wave-get-lane-index.slang test.	02 April 2020, 23:52:12 UTC
00e1dba	jsmall-nvidia	02 April 2020, 21:06:16 UTC	Optimize creation of memberDictionary (#1305) * Improve performance of building members dictionary by adding when needed. * Fix unbounded-array-of-array-syntax.slang, that DISABLE_TEST now uses up an index. Use IGNORE_TEST. * Improve variable name. Small improvements. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	02 April 2020, 21:06:16 UTC
487d4a4	Tim Foley	02 April 2020, 15:52:42 UTC	Add basic support for namespaces (#1304) This change adds logic for parsing `namespace` declarations, referencing them, and looking up their members. * The parser changes are a bit subtle, because that is where we deal with the issue of "re-opening" a namespace. We kludge things a bit by re-using an existing `NamespaceDecl` in the same parent if one is available, and thereby ensure that all the members in the same namespace can see one another. * In order to allow namespaces to be referenced by name they need to have a type so that a `DeclRefExpr` to them can be formed. For this purpose we introduce `NamespaceType` which is the (singleton) type of a reference to a given namespace. * The new `NamespaceType` case is detected in the `MemberExpr` checking logic and routed to the same logic that `StaticMemberExpr` uses, and the static lookup logic was extended with support for looking up in a namespace (a thin wrapper around one of the existing worker routines in `slang-lookup.cpp`). * I made `NamespaceDecl` have a shared base class with `ModuleDecl` in the hopes that this would allow us to allow references to modules by name in the future. That hasn't been tested as part of this change. * I cleaned up a bunch of logic around `ModuleDecl` holding a `Scope` pointer that was being used for some of the more ad hoc lookup routines in the public API. Those have been switched over to something that is a bit more sensible given the language rules and that doesn't rely on keeping state sitting around on the `ModuleDecl`. * I added a test case to make sure the new functionality works, which includes re-opening a namespace, and it also tests both `.` and `::` operations for lookup in a namespace. * The main missing feature here is the ability to do something like C++ `using`. 
It would probably be cleanest if we used `import` for this, since we already have that syntax (and having both `import` and `using` seems like a recipe for confusion). Most of the infrastructure is present to support `import`ing one namespace into another (in a way that wouldn't automatically pollute the namespace for clients), but some careful thought needs to be put into how import of namespaces vs. modules should work.	02 April 2020, 15:52:42 UTC
5e73e98	jsmall-nvidia	31 March 2020, 18:06:34 UTC	Improve diagnostic parsing from GCC. (#1303) Enable x86_64 CPU tests on TC.	31 March 2020, 18:06:34 UTC
ea76905	jsmall-nvidia	30 March 2020, 23:23:09 UTC	CUDA version handling (#1301) * render feature for CUDA compute model. * Use SemanticVersion type. * Enable CUDA wave tests that require CUDA SM 7.0. Provide mechanism for DownstreamCompiler to specify version numbers. * Enabled wave-equality.slang * Make CUDA SM version major version not just a single digit. * Fix assert. * DownstreamCompiler::Version -> CapabilityVersion	30 March 2020, 23:23:09 UTC
ad5b60c	Tim Foley	30 March 2020, 19:47:43 UTC	Add a test for static const declarations in structure types (#1300) The functionality already appears to work, and this test is just to make sure we don't regress on it. The most interesting thing here is that I'm using this change to pitch a new organization for tests around what part of the language they cover (rather than the kind of test they are), since the `tests/compute/` directory is getting overly full and is hard to navigate. We can consider moving individual tests into more of a hierarchy at some later point.	30 March 2020, 19:47:43 UTC
6f43b26	jsmall-nvidia	27 March 2020, 22:35:06 UTC	WaveBroadcastAt/WaveShuffle (#1299) * Support for WaveReadLaneAt with dynamic (but uniform across Wave) on Vk by enabling VK1.4. Fixed wave-lane-at.slang test to test with laneId that is uniform across the Wave. * Added WaveShuffle intrinsic. Test for WaveShuffle intrinsic. * Added some documentation on WaveShuffle * Fix that version required for subgroupBroadcast to be non constexpr is actually 1.5 * Added WaveBroadcastLaneAt Documented WaveShuffle/BroadcastLaneAt/ReadLaneAt * Update docs around WaveBroadcast/Read/Shuffle. Use `_waveShuffle` as name in CUDA prelude to better describe its more flexible behavior.	27 March 2020, 22:35:06 UTC
e267ce2	jsmall-nvidia	27 March 2020, 20:16:27 UTC	Adds WaveShuffle intrinsic (#1298) * Support for WaveReadLaneAt with dynamic (but uniform across Wave) on Vk by enabling VK1.4. Fixed wave-lane-at.slang test to test with laneId that is uniform across the Wave. * Added WaveShuffle intrinsic. Test for WaveShuffle intrinsic. * Added some documentation on WaveShuffle * Fix that version required for subgroupBroadcast to be non constexpr is actually 1.5	27 March 2020, 20:16:27 UTC
5b0b843	jsmall-nvidia	27 March 2020, 18:49:41 UTC	Support for WaveReadLaneAt with dynamic (but uniform across Wave) on Vk by enabling VK1.4. (#1297) Fixed wave-lane-at.slang test to test with laneId that is uniform across the Wave.	27 March 2020, 18:49:41 UTC
cc753f3	jsmall-nvidia	26 March 2020, 13:35:35 UTC	Disable CPU tests on TC. (#1295) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	26 March 2020, 13:35:35 UTC
423b558	Tim Foley	25 March 2020, 23:13:26 UTC	Fix a bug in exiting SSA form for loops (#1293) The Slang compiler was bitten by a known issue when translating from SSA form back to straight-line code. Given code like the following: int x = 0; int y = 1; while(...) { ... int t = x; x = y; y = t; } ... The SSA construction pass will eliminate the temporary `t` and yield code something like: br(b, 0, 1); block b(param x : Int, param y : Int): ... br(b, y, x); The loop-dependent variables have become parameters of the loop block, and the branches to the top of the loop pass the appropriate values for the next iteration (e.g., the jump that starts the loop sends in `0` and `1`). The problem comes up when translating the back-edge that continues the loop out of SSA form. Our generated code will re-introduce temporaries for `x` and `y`: int x; int y; // jump into loop becomes: x = 0; y = 1; for(;;) { ... // back-edge becomes x = y; y = x; continue; } The problem there is that we've naively translated a branch like `br(b, <a>, <b>)` into `x = <a>; y = <b>;` but that doesn't work correctly in the case where `<b>` is `x`, because we will have already clobbered the value of `x` with `<a>`. The simplest fix is to introduce a temporary (just like the input code had), and generate: // back-edge becomes int t = x; x = y; y = t; This change modifies the `emitPhiVarAssignments()` function so that it detects bad cases like the above and emits temporaries to work around the problem. A new test case is included that produced incorrect output before the change, and now produces the expected results. A secondary change is folded in here that tries to guard against a more subtle version of the problem: for(...) { ... int x1 = x + 1; int y1 = y + 1; x = y1; y = x1; } In this more complicated case, each of `x` and `y` is being assigned to a value derived from the other, but neither is being set using a block parameter directly, so the changes to `emitPhiVarAssignments()` do not apply. 
The problem in this case would be if the `shouldFoldInstIntoUseSites()` logic decided to fold the computation of `x1` or `y1` into the branch instruction, resulting in: x = y + 1; y = x + 1; which would again violate the semantics of the original code, because now there is an assignment to `x` before the computation of `x + 1`. Right now it seems impossible to force this case to arise in practice, due to implementation details in how we generate IR code for loops. In particular, the block that computes the `x+1` and `y+1` values is currently always distinct from the block that branches back to the top of the loop, and we do not allow "folding" of sub-expressions from different blocks. It is possible, however, that future changes to the compiler could change the form of the IR we generate and make it possible for this problem to arise. The right fix for this issue would be to say that we should introduce a temporary for any branch argument that "involves" a block parameter (whether directly using it or using it as a sub-expression). Unfortunately, the ad hoc approach we use for folding sub-expressions today means that testing if an operand "involves" something would be both expensive and unwieldy. A more expedient fix is to disallow all folding of sub-expressions into unconditional branch instructions (the ones that can pass arguments to the target block), which is what I ended up implementing in this change. Making that defensive change alters the GLSL we output for some of our cross-compilation tests, in a way that required me to update the baseline/gold GLSL. A better long-term fix for this whole space of issues would be to have the "de-SSA" operation be something we do explicitly on the IR. Such an IR pass would still need to be careful about the first issue addressed in this change, but the second one should (in principle) be a non-issue given that our emit/folding logic already handles code with explicit mutable local variables correctly.	
25 March 2020, 23:13:26 UTC
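The clobbering problem described in 423b558 can be modelled outside the compiler. The following is a minimal Python sketch, not the actual `emitPhiVarAssignments()` logic: it treats the branch arguments as one parallel assignment and orders the moves so that no value is overwritten before it is read, introducing a temporary only when the copies form a cycle. The names `sequentialize_parallel_copies`, `run`, and the `t0`-style temporaries are hypothetical.

```python
def sequentialize_parallel_copies(copies):
    """Turn a parallel assignment {dst: src} into an ordered list of (dst, src) moves.

    Hypothetical model of de-SSA for block parameters; not Slang's implementation.
    """
    pending = {d: s for d, s in copies.items() if d != s}  # drop no-op copies
    moves = []
    temp_count = 0
    while pending:
        # A destination is safe to overwrite once no pending copy still reads it.
        ready = [d for d in pending if d not in pending.values()]
        if ready:
            d = ready[0]
            moves.append((d, pending.pop(d)))
            continue
        # Every remaining destination is still read by another copy: a cycle.
        # Save one value in a fresh temporary and redirect its readers to it.
        d = next(iter(pending))
        t = f"t{temp_count}"
        temp_count += 1
        moves.append((t, d))
        for k in list(pending):
            if pending[k] == d:
                pending[k] = t
    return moves


def run(moves, env):
    """Execute a list of (dst, src) moves over a variable environment."""
    env = dict(env)
    for dst, src in moves:
        env[dst] = env[src]
    return env


# Back-edge br(b, y, x): x receives y and y receives x, in parallel.
swap = sequentialize_parallel_copies({"x": "y", "y": "x"})
naive = [("x", "y"), ("y", "x")]
```

Running `swap` over `{"x": 0, "y": 1}` exchanges the two values, while the `naive` ordering leaves both variables equal to the old `y`, which is the miscompile the commit message describes.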
b3e6f1b	jsmall-nvidia	25 March 2020, 20:45:56 UTC	Unroll target improvements (#1291) * Add unroll support for CUDA, and preliminary for C++. Document [unroll] support. * Fix loop-unroll to run on CPU, and test on CPU and elsewhere. Fix bug in emitting loop unroll condition. * Improved comment. * Added support for vk/glsl loop unrolling.	25 March 2020, 20:45:56 UTC
28a0ca9	jsmall-nvidia	25 March 2020, 18:08:21 UTC	Better diagnostics on failure on CUDA. (#1288) * Better diagnostics on failure on CUDA. * Catch exceptions in render-test * Added ability to disable reporting on CUDA failures * Stopped using exception for reporting (just write to StdWriter::out()) * Removed CUDAResult type * Don't set arch type on nvrtc to see if fixes CI issues. * Try compute_30 on CUDA. * Added ability to IGNORE_ a test Disabled rw-texture-simple and texture-get-dimensions * Disable tests that require CUDA SM7.0 Use DISABLE_ prefix to disable tests. * Disable signalUnexpectedError doing printf.	25 March 2020, 18:08:21 UTC
889132e	Tim Foley	24 March 2020, 22:24:44 UTC	Parser changes to improve handling of static method calls (#1290) Static Method Calls ------------------- The main fix here is for parsing of calls to static methods. Given a type like: struct S { void doThing(); } the parser currently gets tripped up on a statement like: S::doThing(); The problem here is that the `Parser::ParseStatement` routine was using the first token of lookahead to decide what to do, and in the case where it saw a type name it assumed that must mean the statement would be a declaration. It turns out that `Parser::ParseStatement` already had a more intelligent bit of disambiguation later on when handling the general case of an identifier (for which it couldn't determine the type-vs-value status at parse time), and simply commenting out the special-case handling of a type name and relying on the more general identifier case fixes the issue. That catch-all case still has some issues of its own, and this change expands on the comments to make some of those issues clear so we can try to address them later. Empty Declarators ----------------- One reason why the static method call problem was hard to discover was that it was masked by the parser allowing for empty declarators. That is, given input like: S::doThing(); This can be parsed as a variable declaration with a parenthesized empty declarator `()`. Practically, there is no reason to support empty declarators anywhere except on parameters, and allowing them in other contexts could make parser errors harder to understand. This change makes the choice of whether or not empty declarators are allowed something that can be decided at each point where we parse a declarator, and makes it so that only parsing of parameters opts in to allowing them. By disabling support for empty declarators in contexts where they don't make sense, we make code like the above a parse error when it appears at global scope, rather than a weird semantic error. 
A more complete future version of this change might also make support for parenthesized declarators an optional feature, or remove that support entirely. Slang doesn't actually support pointers (yet) so there is no real reason to allow parenthesized declarators right now. One note for future generations is that using an empty declarator on a parameter of array type can actually create an ambiguity. If the user writes: void f(int[2][3]); did they mean for it to be interpreted as: void f(int a[2][3]); or as: void f(int[2][3] a); or even as: void f(int[2] a[3]); The first case there yields a different type for `a` than the other two, but is also what we pretty much have to support for backwards compatibility with HLSL. Requiring all function declarations to include parameter names would eliminate this potentially confusing case. Layout Modifiers ---------------- One of the above two syntax changes led to a regression in the output for a diagnostic test for `layout` modifiers (which are a deprecated but still functional feature from back when `slangc` supported GLSL input). The original output of the test case seemed odd, and when I looked at the parsing logic I saw that an early-exit error case was leading to spurious error messages because it failed to consume all the tokens inside the `layout(...)`. Fixing the logic to not use an early-exit (and instead rely on the built-in recovery behavior of `Parser`) produced more desirable diagnostic output. I changed the input file to put the `binding` and `set` specifiers on different lines so that the error output could show that the compiler properly tags both of the syntax errors.	24 March 2020, 22:24:44 UTC
e71e75a	Tim Foley	24 March 2020, 21:04:30 UTC	Fix some bad behavior around static methods (#1289) These are steps towards a fix for the problem of not being able to call a static method as follows: SomeType::someMethod(); One problem in the above is that the parser gets confused and parses an (anonymous!) function declaration. This change doesn't address that problem, but does fix the problem that when checking fails to coerce `SomeType::someMethod` into a type it was triggering an unimplemented-feature exception rather than a real error message. Another problem was that if the above is re-written to try to avoid the parser bug: (SomeType::someMethod)(); we end up with a call where the base expression (the callee) is a `ParenExpr` and the code for handling calls wasn't expecting that. Instead, it sent the overload resolution logic into an unimplemented case that was bailing by throwing an arbitrary C++ exception instead of emitting a diagnostic. This latter issue was fixed in two ways. First, the code path that failed to emit a diagnostic now emits a reasonable one for the unimplemented feature (this still ends up being a fatal compiler error). Second, we properly handle the case of trying to call a `ParenExpr` by unwrapping it and using the base expression instead, so that `(<func>)(<args>)` is always treated the same as `<func>(<args>)`.	24 March 2020, 21:04:30 UTC
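The second fix in e71e75a, treating `(<func>)(<args>)` the same as `<func>(<args>)`, amounts to stripping parentheses from the callee before overload resolution. A minimal Python sketch of that idea follows; the classes `NameExpr`, `ParenExpr`, `CallExpr` and the helper `unwrap_callee` are stand-ins invented for illustration (only `ParenExpr` is named in the commit message), and the real checker is C++.

```python
from dataclasses import dataclass, field


@dataclass
class NameExpr:
    name: str


@dataclass
class ParenExpr:
    base: object  # the parenthesized sub-expression


@dataclass
class CallExpr:
    callee: object
    args: list = field(default_factory=list)


def unwrap_callee(expr):
    """Strip any number of enclosing parentheses from a callee expression."""
    while isinstance(expr, ParenExpr):
        expr = expr.base
    return expr


# (SomeType::someMethod)() and SomeType::someMethod() resolve the same callee.
plain = CallExpr(NameExpr("SomeType::someMethod"))
wrapped = CallExpr(ParenExpr(ParenExpr(NameExpr("SomeType::someMethod"))))
```

After unwrapping, both calls present the same `NameExpr` to whatever resolution logic comes next, so no separate code path is needed for parenthesized callees.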
7b4e0e1	jsmall-nvidia	23 March 2020, 21:55:49 UTC	First pass at a Target Compatibility document (#1287) * WIP compatibility docs. * Test transpose in matrix-float. * Small improvement to CUDA docs. * Added some discussion around tessellation. * Small improvements to target-compatibility.md * Improve compatibility documentation. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	23 March 2020, 21:55:49 UTC
05c9a5c	jsmall-nvidia	21 March 2020, 10:38:47 UTC	CPU Texture GetDimensions support (#1283) * Added CPU support for GetDimensions on C++/CPU target. Added texture-get-dimension.slang test * Fix some typos. * Update CUDA docs. * Fix output of GetDimensions on glsl when has an array. Disabled VK - because VK renderer doesn't support createTextureView * Fix typo. * Fix typo. * Fix bad-operator-call diagnostics output.	21 March 2020, 10:38:47 UTC
884a9bc	jsmall-nvidia	20 March 2020, 20:59:15 UTC	Handling of switch with empty body (#1284) * Added handling for empty switch body. Added test for empty switch. * Fix testing for case in switch.	20 March 2020, 20:59:15 UTC
cc4c5b8	jsmall-nvidia	20 March 2020, 18:58:22 UTC	Remove RWTextureCube/Array from stdlib (#1285) * Remove RWTextureCube and RWTextureCubeArray - as not supported. Put multisample code in a block to make nesting more readable. * Replace a tab. * Update bad-operator-call.slang.expected	20 March 2020, 18:58:22 UTC
3435a50	jsmall-nvidia	20 March 2020, 17:55:08 UTC	Diagnostic compare special case for stdlib line number changes (#1286) * Added diagnostic compare which ignores line number changes in std library. * Change stdlib just to make sure it's all working.	20 March 2020, 17:55:08 UTC
a8a23a6	Tim Foley	18 March 2020, 18:20:20 UTC	First pass at a language reference (#1279) * First pass at a language reference We already had the `language-guide.md` document under `docs/`, but this is an attempt to introduce a more full-featured reference to the Slang language and its features. Right now it is mostly focused on the syntax and what the language allows to be declared, and it is a little light on semantic details throughout (mostly relying on familiarity with C to explain the things that are left unsaid). Even so, this hopefully provides a starting point to continue adding more detail. * typos and other small fixes	18 March 2020, 18:20:20 UTC
21bbd3c	jsmall-nvidia	17 March 2020, 18:00:40 UTC	Added --cuda-sdk-path option to premake5.lua. (#1278)	17 March 2020, 18:00:40 UTC
315888e	jsmall-nvidia	17 March 2020, 13:59:25 UTC	Improve CUDA Wave intrinsics documentation. (#1276) * Improve CUDA Wave intrinsics documentation. Remove inappropriate comment. * Small CUDA doc improvement.	17 March 2020, 13:59:25 UTC
76b9ff6	jsmall-nvidia	16 March 2020, 19:01:21 UTC	CUDA support of MultiPrefix Wave intrinsics. (#1275) Support for cs_6_5 and cs_6_4 in profile Added wave-multi-prefix.slang test	16 March 2020, 19:01:21 UTC
256a20a	Tim Foley	16 March 2020, 16:03:19 UTC	Define compound intrinsic ops in the standard library (#1273) * Define compound intrinsic ops in the standard library The current stdlib code has a notion of "compound" intrinsic ops, which use the `__intrinsic_op` modifier but don't actually map to a single IR instruction. Instead, most* of these map to multiple IR instructions using hard-coded logic in `slang-ir-lower.cpp`. (* One special case is `kCompoundOp_Pos` that is used for unary `operator+` and that maps to zero IR instructions) All of the opcodes that used to use the `kCompoundOp_` enumeration values now have definitions directly in the stdlib and use the new `[__unsafeForceInlineEarly]` attribute to ensure that they get inlined into their use sites so that the output code is as close as possible to the original. For the most part, generating the stdlib definitions for the compound ops is straightforward, but here's some notes: * The unary `operator+` I just defined directly in Slang code, since it doesn't share much structure with other cases * The unary increment/decrement ops are generated as a cross product of increment/decrement and prefix/postfix. The logic is a bit messy but given that we have scalar, vector, and matrix versions to deal with it still saves code overall * Because all the compound/assignment cases got moved out, the existing code for generating unary/binary ops can be simplified a bit * All the no-op bit-cast operations like `asfloat(float)` are now inline identity functions * A few other small cleanups are made by not having to worry about the compound ops (which used to be called "pseudo ops") sometimes being encoded in to the same type of value as a real IR opcode. The one big detail here is a fix for how IR lowering works for `let` declarations: they were previously being `materialize()`d which only guarantees that they've been placed in a contiguous and addressable location, but doesn't actually convert them to an r-value. 
As a result a `let` declaration could accidentally capture a mutable location by reference, which is definitely not what we wanted it to do. Fixing this was needed to make the new postfix `++` definition work (several existing tests end up covering this). One important forward-looking note: One subtle (but significant) choice in this change is that we actually reduce the number of declarations in the stdlib, because instead of having the compound operators include both per-type and generic overloads (just listing scalar cases for now): float operator+=(in out float left, float right) { ... } int operator+=(in out int left, int right) { ... } ... T operator+= <T:__BuiltinBlahBlah>(in out T left, T right) { ... } We now have only the single generic version: T operator+= <T:__BuiltinBlahBlah>(in out T left, T right) { ... } In running our current tests, this change didn't lead to any regressions (perhaps surprisingly). Given that we were able to reduce the number of overloads for `operator+=` by a factor of N (where N is the number of built-in types), it seems worth considering whether we could also reduce the number of overloads of `operator+` by the same factor by only having generic rather than per-type versions. One concern that this forward-looking question raises is whether the quality of diagnostic messages around bad calls to the operators might suffer when there are only generic overloads instead of per-type overloads. In order to feel out this problem I added a test case that includes some bad operator calls both to `+=` (which is now only generic with this change) and `+` (which still has per-type overloads). Overall, I found the quality of the error messages (in terms of the candidates that get listed) isn't perfect for either, but personally I prefer the output in the generic case. 
As part of adding that test, I also added some fixups to how overload resolution messages get printed, to make sure the function name is printed in more cases, and also that the candidates print more consistently. These changes affected the expected output for one other diagnostic test. * fixup: disable bad operator test on non-Windows targets	16 March 2020, 16:03:19 UTC
c1743a5	jsmall-nvidia	12 March 2020, 19:47:44 UTC	Vector & Matrix Prefix Sum & Product (#1272) * Implement matrix and vector versions of prefixSum and prefix product. * Comment around how code is organized - where it seems it could be more performant.	12 March 2020, 19:47:44 UTC
69f7d28	Tim Foley	11 March 2020, 19:53:09 UTC	Add a basic inlining facility for use in the stdlib (#1271) The main feature visible to the stdlib here is the `[__unsafeForceInlineEarly]` attribute, which can be attached to a function definition and forces calls to that function to be inlined immediately after initial IR lowering. The pass is implemented in `slang-ir-inline.{h,cpp}` and currently only handles the completely trivial case of a function with no control flow that ends with a single `return`. The lack of support for any other cases motivates the `__unsafe` prefix on the attribute. In order to test that the pass works, I modified the "comma operator" in the standard library to be defined directly (rather than relying on special-case handling in IR lowering), and then added a test that uses that operator to make sure it generates code as expected. The compute version of the test confirms that we generate semantically correct code for the operator, while the SPIR-V cross-compilation test confirms that our output matches GLSL where the comma operator has been inlined, rather than turned into a subroutine. Notes for the future: * With this change it should be possible (in principle) to redefine all the compound operators in the stdlib to instead be ordinary functions with the new attribute, removing the need for `slang-compound-intrinsics.h`. * Once the compound intrinsics are defined in the stdlib, it should be easier/possible to start making built-in operators like `+` be ordinary functions from the standpoint of the IR * The attribute and pass here could be extended to include an alternative inlining attribute that happens later in compilation (after linking) but otherwise works the same. This could in theory be used for functions where we don't want to inline the definition into generated IR, but still want to inline things before generating final HLSL/GLSL/whatever. 
* The inlining pass itself could be generalized to work for less trivial functions pretty easily; for the most part it would just mean "splitting" the block with the call site and then inserting clones of the blocks in the callee. Any `return` instructions in the clone would become unconditional branches (with arguments) to the block after the call (which would get a parameter to represent the returned value). * The "hard" part for such an inlining pass would be handling cases where the control flow that results from inlining can't be handled by our later restructuring passes. The long-term fix there is to implement something like the "relooper" algorithm to restructure control flow as required for specific targets.	11 March 2020, 19:53:09 UTC
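The trivial case handled by the pass in 69f7d28 (a callee with no control flow and a single `return`) can be sketched as a substitution over straight-line instructions. This is a Python model under assumed data shapes, not the code in `slang-ir-inline.cpp`: a function is a dict with hypothetical keys `params`, `body` (a list of `(dst, op, operands)` tuples), and `ret`.

```python
import itertools

_fresh = itertools.count()  # source of unique suffixes for cloned values


def inline_call(callee, args):
    """Inline a straight-line, single-return callee at a call site.

    Returns (cloned_instructions, value) where `value` replaces the call's result.
    Hypothetical IR model; handles no control flow, like the pass described above.
    """
    env = dict(zip(callee["params"], args))  # parameter -> argument
    cloned = []
    for dst, op, operands in callee["body"]:
        new_dst = f"{dst}_inl{next(_fresh)}"
        # Rewrite operands through the environment; constants pass unchanged.
        cloned.append((new_dst, op, [env.get(o, o) for o in operands]))
        env[dst] = new_dst
    return cloned, env.get(callee["ret"], callee["ret"])


# The comma operator evaluates both operands and yields the right-hand one.
comma = {"params": ["left", "right"], "body": [], "ret": "right"}
# A one-instruction callee, for contrast.
add_one = {"params": ["x"], "body": [("r", "add", ["x", "1"])], "ret": "r"}

comma_insts, comma_value = inline_call(comma, ["%a", "%b"])
add_insts, add_value = inline_call(add_one, ["%v"])
```

Inlining `comma` produces no instructions at all and simply forwards the second argument, which is consistent with the commit's observation that the operator appears inlined in the generated GLSL rather than as a subroutine. Supporting callees with branches would require the block splitting described in the notes above.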
935768c	Tim Foley	11 March 2020, 15:50:38 UTC	Clean-ups related to expanded standard library coverage (#1269) This change continues the work already started in moving the definitions of many built-in functions to the standard library. The main focus in this change was reducing the number of operations that had to be special-cased on the CPU and CUDA targets by making sure that the scalar cases of built-in functions map to the proper names in the prelude (e.g., `F32_sin()`) via the ordinary `__target_intrinsic` mechanism. In some cases this cleanup meant that special-case logic that was constructing definitions for those functions using C++ code could be scrapped. Additional changes made along the way: * A few scalar functions that were missing in the CPU/CUDA preludes got added: `round`, hyperbolic trigonometric functions, `frexp`, `modf`, and `fma` * The floating-point `min()` and `max()` definitions in the preludes were changed to use intrinsic operations on the target (which are likely to follow IEEE semantics, while our definitions did not) * For the CUDA target, many of the functions had their names translated during code emit from, e.g., `sin` to `sinf`. This change makes the CUDA target more closely match the C++/CPU target in using names like `F32_sin` consistently. * For the CUDA target, a few additional functions have intrinsics that don't exist (portably) on CPU: `sincos()` and `rsqrt()`. * For the Slang stdlib definitions to work, a new `$P` replacement was defined for `__target_intrinsic` that expands to a type based on the first operand of the function (e.g., `F32` for `float`). * I removed the dedicated opcodes for matrix-matrix, matrix-vector, and vector-matrix multiplication, and instead turned them into ordinary functions with definitions and `__target_intrinsic` modifiers to map them appropriately for HLSL and GLSL. This is realistically how we would have implemented these if we'd had `__target_intrinsic` from the start. 
Notes about possible follow-on work: * The `ldexp` function is still left in the Slang stdlib because it has to account for a floating-point exponent and the `math.h` version only handles integers for the exponent. It is possible that we can/should define another overload for `ldexp` (and `frexp`) that uses an integer for exponent, and then have that one be a built-in on CPU/CUDA, with the HLSL `frexp` being defined in the stdlib to delegate to the correct `frexp` for those targets. * The `firstbithigh` and related functions are missing for our CPU and CUDA targets, and will need to be added. It is worth noting that `firstbithigh` apparently has some very odd functionality around signed integer arguments (which are supported, despite MSDN being unclear on that point). General cleanup will be required for those functions. * Making the various matrix and vector products no longer be intrinsic ops might affect how we emit code for them as sub-expressions (both whether we fold them into use sites and how we parenthesize them). This doesn't seem to affect any of our existing tests, but we could consider marking these functions with `[__readNone]` to ensure they can be folded, and then also adding whatever modifier(s) we might invent to control precedence and parentheses insertion during emit.	11 March 2020, 15:50:38 UTC
b380b1a	jsmall-nvidia	10 March 2020, 20:43:41 UTC	Wave Prefix Product (#1270) * Fix some typos. * Add wave-prefix-sum.slang test * First pass at implementing prefixSum. * Small improvements to prefixSum CUDA. * Small improvement to prefix sum. * Enable prefix sum in stdlib. * Wave prefix product without using a divide. * Split out SM6.5 Wave intrinsics. Template mechanism to do prefix calculations.	10 March 2020, 20:43:41 UTC
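b380b1a mentions computing a wave prefix product "without using a divide". The commit does not say how, so the following Python sketch is an assumption about one way to do it, not the CUDA prelude code: shift every lane's value up by one lane (feeding the identity `1` into lane 0) and then run a doubling-offset scan, the pattern that shuffle-up instructions allow on a GPU. The function name `exclusive_prefix_product` is hypothetical, and a list index stands in for the lane id.

```python
def exclusive_prefix_product(values):
    """Exclusive prefix product over 'lanes', with no division.

    Lane i receives the product of values[0..i-1]; lane 0 receives 1.
    """
    n = len(values)
    if n == 0:
        return []
    # Shift up by one lane so the scan below yields an exclusive result.
    acc = [1] + list(values[:-1])
    offset = 1
    while offset < n:
        # Each lane multiplies in the partial product from `offset` lanes below.
        acc = [acc[i] * acc[i - offset] if i >= offset else acc[i]
               for i in range(n)]
        offset *= 2
    return acc


lanes = [2, 3, 4, 5]
with_zero = [2, 0, 4, 5]
```

The alternative of dividing an inclusive product by each lane's own value breaks down when a lane holds zero, which is one plausible motivation for avoiding the divide; the `with_zero` input exercises that case.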
a10d9cd	jsmall-nvidia	10 March 2020, 16:31:25 UTC	WIP Prefix Sum for CUDA (#1268) * Fix some typos. * Add wave-prefix-sum.slang test * First pass at implementing prefixSum. * Small improvements to prefixSum CUDA. * Small improvement to prefix sum. * Enable prefix sum in stdlib.	10 March 2020, 16:31:25 UTC
721d2e8	jsmall-nvidia	10 March 2020, 00:03:42 UTC	CUDA Wave intrinsic vector/matrix support (#1267) * Distinguish between __activeMask and _getConvergedMask(). Remove need to pass in mask to CUDA wave impls. * Add support for vector/matrix Wave intrinsics for CUDA. Fix issue with CUDA parsing of errors. * Fix typo. Make WaveReadLineAt and WaveReadFirst work for vector/matrix types. * Fix typo. * Added equality wave intrinsic test. * Fix some typos * Added wave-lane-at.slang	10 March 2020, 00:03:42 UTC
7e0aa93	jsmall-nvidia	09 March 2020, 16:40:04 UTC	CUDA support for vector/matrix Wave intrinsics (#1266) * Distinguish between __activeMask and _getConvergedMask(). Remove need to pass in mask to CUDA wave impls. * Add support for vector/matrix Wave intrinsics for CUDA. Fix issue with CUDA parsing of errors. * Fix typo.	09 March 2020, 16:40:04 UTC
b1317cd	Tim Foley	09 March 2020, 16:02:36 UTC	Yet more definitions moved into the stdlib (#1263) The only big catch that I ran into with this batch was that I found the `float.getPi()` function was being emitted to the output GLSL even when that function wasn't being used. This seems to have been a latent problem in the earlier PR, but was only surfaced in the tests once a Slang->GLSL test started using another intrinsic that led to the `float : __BuiltinFloatingPointType` witness table being live in the IR. The fix for the gotcha here was to add a late IR pass that basically empties out all witness tables in the IR, so that functions that are only referenced by witness tables can then be removed as dead code. This pass is something we should not apply if/when we start supporting real dynamic dispatch through witness tables, but that is a problem to be solved on another day. The remaining tricky pieces of this change were: * Needed to remember to mark functions as target intrinsics on HLSL and/or GLSL as appropriate (hopefully I caught all the cases) so they don't get emitted as source there. * The `msad4` function in HLSL is very poorly documented, so filling in its definition was tricky. I made my best effort based on how it is described on MSDN, but it is likely that if anybody wants to rely on this function they will need us to vet our results with some tests.	09 March 2020, 16:02:36 UTC
4760829	jsmall-nvidia	06 March 2020, 23:26:17 UTC	Use templates to implement Wave Intrinsic reduce on CUDA (#1265) * Update slang-binaries to version with SPIR-V version support. * Support vec and matrix Wave intrinsics on vk. Added wave-vector.slang test Add wave-diverge.slang test Add support for more wave intrinsics to vk. * Test out Wave intrinsic support for matrices. * Remove matrix glsl intrinsics -> not available. Fix some typo. * Remove generated Slang headers. * Use template to generate Wave reduce functions.	06 March 2020, 23:26:17 UTC
cdee134	jsmall-nvidia	06 March 2020, 22:15:58 UTC	Remove generated header files (#1264) * Update slang-binaries to version with SPIR-V version support. * Support vec and matrix Wave intrinsics on vk. Added wave-vector.slang test Add wave-diverge.slang test Add support for more wave intrinsics to vk. * Test out Wave intrinsic support for matrices. * Remove matrix glsl intrinsics -> not available. Fix some typo. * Remove generated Slang headers.	06 March 2020, 22:15:58 UTC
b94a12b	jsmall-nvidia	06 March 2020, 20:46:35 UTC	Wave intrinsics for Vector and Matrix types (#1262) * Update slang-binaries to version with SPIR-V version support. * Support vec and matrix Wave intrinsics on vk. Added wave-vector.slang test Add wave-diverge.slang test Add support for more wave intrinsics to vk. * Test out Wave intrinsic support for matrices. * Remove matrix glsl intrinsics -> not available. Fix some typo.	06 March 2020, 20:46:35 UTC
18be2d8	Tim Foley	06 March 2020, 19:37:36 UTC	Expand range of definitions that can be moved into stdlib (#1259) The actual definitions that got moved into the stdlib here are pretty few: * `clip()` * `cross()` * `ddx()`, `ddy()` etc. * `degrees()` * `distance()` * `dot()` * `faceforward()` The meat of the change is infrastructure changes required to support these new declarations * Generic versions of the standard operators (e.g., `operator+`) were added that are generic for a type `T` that implements the matching `__Builtin`-prefixed interface. An open question is whether we can now drop the non-generic versions in favor of just having these generic operators. * A `__BuiltinLogicalType` interface was added to capture the commonality between integers and `bool` * `__BuiltinArithmeticType` was extended so that implementations must support initialization from an `int` * `__BuiltinFloatingPointType` was extended to require an accessor that returns the value of pi for the given type, and the concrete floating-point types were extended to provide definitions of this value. * It turns out that our logic for checking if two functions have the same signature (and should thus count as redeclarations/redefinitions) wasn't taking generic constraints into account at all. That was fixed with a stopgap solution that checks if the generic constraints are pairwise identical, but I didn't implement the more "correct" fix that would require canonicalizing the constraints. * When doing overload resolution and considering potential callees, logic was added so that a non-generic candidate should always be selected over a generic one (generally the Right Thing to do), and also so that a generic candidate with fewer parameters will be selected over one with more (an approximation of the much more complicated rule we'd ideally have). 
* The formatting of declarations/overloads for "ambiguous overload" errors was fleshed out a bit to include more context (the "kind" of declaration where appropriate, the return type for function declarations) and to properly space thing when outputting specialization of operator overloads that end with `<` (so that we print `func < <int>(int, int)` instead of just `func <<int,int>(int,int)`). * The core lookup routines were heavily refactored and reorganized to try to make them bottleneck more effectively so that all paths handle all the nuances of inheritance, extensions, etc. * Because of the refactoring to lookup logic, the semantic checking logic related to checking if a type conforms to an interface was updated to be driven based on the `Type` that is supposed to be conforming, rather than a `DeclRef` to the type's declaration. This allows it to use the type-based lookup entry point and eliminates one special-case entry point for lookup. In addition to the various core changes, this change also refactors some of the existing stdlib code to favor writing more things in actual Slang syntax, and less in C++ code that uses `StringBuilder` to construct the Slang syntax. There is a lot more that could be done along those lines, but even pushing this far is showing that the current approach that `slang-generate` takes for how to separate meta-level C++ and Slang code isn't really ideal, so a revamp of the generator code is probably needed before I continue pushing. One surprising casualty of the refactoring of lookup is that we no longer have the `lookedUpDecls` field in `LookupResult`. That field probably didn't belong there anyway, but the role it served was important. The idea of `lookedUpDecls` was to avoid looking up in the same interface more than once in cases where a type might have a "diamond" inheritance pattern. 
Removing that field doesn't appear to affect correctness of any of our existing tests, but by adding a specific test for "diamond" inheritance I could see that the refactoring introduced a regression and made looking up a member inherited along multiple paths ambiguous. Rather than add back `lookedUpDecls` I went for a simpler (but arguably even hackier) solution where when ranking candidates from a `LookupResult` we check for identical `DeclRef`s and arbitrarily favor one over the other. One complication that arises here is that when comparing `DeclRef`s inherited along different paths they might have a `ThisTypeSubstitution` for the same type, but with different subtype witnesses (because different inheritance paths could lead to different transitive subtype witnesses: e.g., `A : B : D` and `A : C : D`).	06 March 2020, 19:37:36 UTC
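Two of the rules described in 18be2d8 lend themselves to a small model: prefer non-generic candidates over generic ones (and fewer parameters over more among generic ones), and collapse identical declarations reached along different inheritance paths instead of reporting them as ambiguous. The Python sketch below is illustrative only. `Candidate`, `pick_best`, and the `via` field are hypothetical, and it assumes "fewer parameters" refers to generic parameters, which the commit message does not spell out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    decl: str                 # identity of the referenced declaration
    generic_params: int = 0   # 0 means non-generic
    # Inheritance path used to reach the declaration; ignored for equality,
    # mirroring "identical DeclRefs found along different paths".
    via: str = field(default="", compare=False)


def pick_best(candidates):
    """Return the preferred candidate, or None if the call stays ambiguous."""
    # Collapse duplicates of the same declaration, arbitrarily keeping the first.
    unique = list(dict.fromkeys(candidates))
    if not unique:
        return None
    fewest = min(c.generic_params for c in unique)
    best = [c for c in unique if c.generic_params == fewest]
    return best[0] if len(best) == 1 else None


# Diamond inheritance: A : B : D and A : C : D both find D's member.
diamond = [Candidate("D.member", via="A:B:D"), Candidate("D.member", via="A:C:D")]
# A per-type overload competes with a generic one.
mixed = [Candidate("operator+<T>", generic_params=1), Candidate("operator+(int,int)")]
```

With these inputs the diamond case resolves to a single `D.member` rather than an ambiguity, and the per-type `operator+` wins over the generic one. Two distinct non-generic declarations still come back as `None`, so the sketch does not hide real ambiguities.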
0b91ea7	jsmall-nvidia	06 March 2020, 16:41:35 UTC	Update slang-binaries to version with SPIR-V version support. (#1261)	06 March 2020, 16:41:35 UTC
3c7c38d	jsmall-nvidia	05 March 2020, 17:14:59 UTC	Safer binary compatibility between 1.0 and 1.1 versions, without using struct embedding. (#1257)	05 March 2020, 17:14:59 UTC
6684d32	jsmall-nvidia	05 March 2020, 15:59:54 UTC	Feature/glslang spirv version (#1256) * WIP add support for __spirv_version . * Added IRRequireSPIRVVersionDecoration * SPIR-V version passed to glslang. Enable VK wave tests. Split ExtensionTracker out, so can be cast and used externally to emit. Added SourceResult. * Fix warning on Clang. * Missing hlsl.meta.h * Refactor communication/parsing of __spirv_version with glslang. * Fix some debug typos. Be more precise in handling of substring handling. * Make glslang forwards and backwards binary compatible. * Small comment improvements. * Added slang-spirv-target-info.h/cpp * Fix for major/minor on gcc. * Another fix for gcc/clang. * VS projects include slang-spirv-target-info.h/cpp * Removed SPIRVTargetInfo Added SemanticVersion. Don't bother with passing a target to glslang. Should be separate from 'version'. * Renamed slang-emit-glsl-extension-tracker.cpp/.h -> slang-glsl-extension-tracker.cpp/.h Fixed some VS project issues. * Fix a comment. * Added slang-semantic-version.cpp/.h * Added slang-glsl-extension-tracker.cpp/.h * Added split that can check for input has all been parsed. * Fix problem on x86 win build.	05 March 2020, 15:59:54 UTC
5951d2a	jsmall-nvidia	03 March 2020, 23:41:07 UTC	__spirv_version Decoration (#1255) * WIP add support for __spirv_version . * Added IRRequireSPIRVVersionDecoration	03 March 2020, 23:41:07 UTC
0f1f4a4	Tim Foley	03 March 2020, 19:49:40 UTC	Move definitions of simple vector/matrix builtins to stdlib. (#1247) Some of the functions declared in the Slang standard library are built in on some targets (almost always the case for HLSL) but aren't available on other targets (often the case for GLSL, CUDA, and CPU). To date, the CUDA and CPU targets have worked around this issue by synthesizing definitions of the missing functions on the fly as part of output code generation, at the cost of some amount of code complexity in the emit pass. This change adds definitions inside the stdlib itself for a large number of built-in HLSL functions that act element-wise over both vectors and matrices (e.g., `sin()`, `sqrt()`, etc.), and changes the CPU/CUDA codegen path to not synthesize C++ code for those functions (instead relying on code generated from the Slang definitions). The element-wise vector/matrix function bodies are being defined using macros in the stdlib, so that we can more easily swap out the definitions en masse if we find an implementation strategy we like better. This could involve defining special-case syntax just for vector/matrix "map" operations that can lower directly to the IR and theoretically generate cleaner code after specialization is complete. As a byproduct of this change, the matrix versions of these functions should in principle now be available to GLSL (GLSL only defines vector versions of functions like `sin()`, and leaves out matrix ones). No testing has been done to confirm this fix. In some cases builtins were being declared with multiple declarations to split out the HLSL and GLSL cases, and this change tries to unify these as much as possible into single declarations to keep the stdlib as small as possible. 
Two functions -- `sincos()` and `saturate()` -- were simple enough that their full definitions could be given in the stdlib so that even the scalar cases wouldn't need to be synthesized, so the corresponding enumerants were removed in `slang-hlsl-intrinsic-set.h`. In the case of `saturate()` the pre-existing definition used for GLSL codegen could have been used for CPU/CUDA all along. In some cases functions that can and should be defined in the future have had commented-out bodies added as an outline for what should be inserted in the future. Most of these functions cannot be implemented directly in the stdlib today because basic operations like `operator+` are currently not defined for `T : __BuiltinArithmeticType`, etc. Adding such declarations should be straightforward, but brings risks of creating unexpected breakage, so it seemed best to leave for a future change. This change does not try to address making vector or matrix versions of builtin functions that map to single `IROp`s, because the existing mechanisms for target-based specialization, etc., do not apply for such cases. In the future we will either have to make those operations into ordinary functions (eliminating many `IROp`s) so that stdlib definitions can apply, or add an explicit IR pass to deal with legalizing vector/matrix ops for targets that don't support them natively. The right path for this is not yet clear, so this change doesn't wade into it. This change does not touch the `Wave*` functions added in Shader Model 6, despite many of these having vector/matrix versions that could benefit from the same default mapping. It is expected that these functions will have GLSL/Vulkan translation added soon, and it probably makes sense to know what cases are directly supported on Vulkan before adding the hand-written definitions. 
Because of the limitations on what could be ported into the stdlib, it is not yet possible to remove any of the infrastructure for synthesizing builtin function definitions in the CPU and CUDA back-ends.	03 March 2020, 19:49:40 UTC
cbba1f2	jsmall-nvidia	03 March 2020, 00:14:18 UTC	Renamed UnownedStringSlice::size to getLength to make match String. (#1254)	03 March 2020, 00:14:18 UTC
dbd8e8d	jsmall-nvidia	02 March 2020, 22:22:03 UTC	Feature/glsl wave intrinsic (#1253) * Test for some wave intrinsics. More wave intrinsic support on CUDA. * Use shfl_xor_sync. * Improvements around wave intrinsics. Fix built-in integer types so they belong to __BuiltinIntegerType. * Improvements and fixes around Wave intrinsics. * Added WaveIsFirstLane test. No longer use __wavemask_lt, as it appears not to be available as an intrinsic. * Small fixes to CUDA prelude. * Add wave-active-product test. Handle the special case for arbitrary sums. * Used macro to implement CUDA wave intrinsics. * First pass at glsl wave intrinsics. Doesn't work in practice because it requires a mechanism to set the SPIR-V version. Replace use of _lanemask_lt() for CUDA.	02 March 2020, 22:22:03 UTC
8899c14	jsmall-nvidia	02 March 2020, 21:18:20 UTC	Additional Wave Intrinsic Support (#1252) * Test for some wave intrinsics. More wave intrinsic support on CUDA. * Use shfl_xor_sync. * Improvements around wave intrinsics. Fix built-in integer types so they belong to __BuiltinIntegerType. * Improvements and fixes around Wave intrinsics. * Added WaveIsFirstLane test. No longer use __wavemask_lt, as it appears not to be available as an intrinsic. * Small fixes to CUDA prelude. * Add wave-active-product test. Handle the special case for arbitrary sums. * Used macro to implement CUDA wave intrinsics.	02 March 2020, 21:18:20 UTC
b85ca6f	jsmall-nvidia	02 March 2020, 17:30:25 UTC	Feature/profile tool (#1251) * WIP slang-profile * Turn on symbols needed for profile. * Remove calls to slang API from core as doing so broke profiling information. Fix premake so slang-profile works on VS.	02 March 2020, 17:30:25 UTC
6e9f407	jsmall-nvidia	28 February 2020, 14:21:19 UTC	Constant time dynamic cast (#1250) * Constant time dynamic cast. * Use getClassInfo virtual function. Fix problem because instantiation of specializations was in the wrong order for clang. * Improve comments. * Improve comment. * Ensure s_first is defined before kClassInfo, to ensure construction ordering.	28 February 2020, 14:21:19 UTC
5e31e9f	Tim Foley	27 February 2020, 22:29:00 UTC	Fix some IR logic around load from a rate-qualified pointer (#1248) We currently represent the `groupshared` qualifier as a kind of "rate" at the IR level (where a rate can qualify a type to indicate the frequency/rate at which a value is stored and/or computed). This means that when computing the type that a pointer points to, we need to handle both, e.g., `Ptr<Int>` and `@GroupShared Ptr<Int>`. The logic that was trying to handle the rate-qualified case when deducing the "pointee" type of a pointer was somehow written incorrectly, and was querying `getDataType()` on an `IRRateQualifiedType` which is asking for the type of the type itself (null in this case), rather than `getValueType()` which gets the `T` part from a rate-qualified type `@R T`. Somehow none of our tests were hitting this case, and I'm not immediately clear on how to write a targeted reproducer for this, since the problem arose as a debug-only assertion failure in a user shader with thousands of lines.	27 February 2020, 22:29:00 UTC
b0f65ba	jsmall-nvidia	27 February 2020, 15:09:17 UTC	Fix unix ProcessUtil::getClockTick (#1246)	27 February 2020, 15:09:17 UTC
46fd5fe	jsmall-nvidia	26 February 2020, 22:19:53 UTC	Fix for GCC C++ target - remove warning for int literal value. (#1244) * Fix for GCC C++ target - remove warning for int literal value. * Comment around hack to negate -0x8000 0000 * Special case negation of literals in parser - to fix problems with errors on gcc. * Apply the literal integer 'fix' when doing negation of a literal.	26 February 2020, 22:19:53 UTC
7bce066	jsmall-nvidia	26 February 2020, 21:13:41 UTC	Support for RWTexture types on CPU and CUDA (#1243) * Added FloatTextureData as a mechanism to enable CPU based Texture writes. * Add [] RWTexture access for CPU. * Fixed rw-texture-simple.slang.expected.txt * WIP: CUDA stdlib has support for [] surface access. * Made IRWTexture class able to take different locations. Doing a Texture2D access on CUDA works. * Fix bug in outputting UniformState - was missing out padding. Support RWTexture with array. Support RWTexture3D. * Use * for locations for read only textures, so only need a ITexture interface. * Fix problem around application of set/get for CUDA on subscript Texture types.	26 February 2020, 21:13:41 UTC
6308a12	Tim Foley	25 February 2020, 16:12:16 UTC	Fix a crash when a generic value argument isn't constant (#1241) This arose when a user tried to specialize the DXR 1.1 `RayQuery` type to a local variable: ```hlsl RAY_FLAG rayFlags = RAY_FLAG_CULL_FRONT_FACING_TRIANGLES | RAY_FLAG_CULL_NON_OPAQUE; RayQuery<rayFlags> query; ``` In this case, we issued an error around `rayFlags` not being a constant as expected, but then we also crashed later on in checking because the `DeclRef` that was being used for the type had a null pointer for the generic argument corresponding to `rayFlags`. The main fix here was thus to add an `ErrorIntVal` case that can be used to represent something that should be an `IntVal` but where there was some kind of error in the input code so that the actual value isn't known to the compiler. A secondary fix here is that we were issuing error messages about expecting a constant for a parameter like `rayFlags` there twice, and one of those times was during the `JustChecking` part of overload resolution (when we are not supposed to emit any diagnostics). I fixed that up by allowing the `DiagnosticSink` to use to be passed down explicitly (and allowing it to be null), while also leaving behind overloaded functions with the old signatures so that all the existing logic can continue to work unmodified.	25 February 2020, 16:12:16 UTC
c4a32e3	Tim Foley	24 February 2020, 21:22:18 UTC	Add two missing RAY_FLAG cases (#1240) These new cases were added in DXR 1.1 (Shader Model 6.5), while the `enum` type came from DXR 1.0, so that I failed to revisit it when adding the new features.	24 February 2020, 21:22:17 UTC
777ac2b	Tim Foley	24 February 2020, 19:46:59 UTC	Fix support for SV_Coverage on GLSL path (#1239) There were two overlapping issues here: 1. We always mapped `SV_Coverage` to `gl_SampleMask`, even though `gl_SampleMaskIn` is the correct built-in variable to use for an input varying. 2. We treated `gl_SampleMask` like it was a scalar shader input, when it and `gl_SampleMaskIn` are actually arrays of indeterminate size (as a byproduct of trying to future-proof for implementations that might support hundreds or thousands of samples per pixel...) The fix here is simple: map to either `gl_SampleMask[0]` or `gl_SampleMaskIn[0]` as appropriate. I suppose that this approach doesn't handle the possibility of eventually supporting >32 samples per pixel by having something like `uint2 coverage : SV_Coverage`, but I think we can cross that bridge when we come to it.	24 February 2020, 19:46:59 UTC
f2c221e	jsmall-nvidia	21 February 2020, 21:59:48 UTC	dxil-asm target output was dxil code not asm. (#1238)	21 February 2020, 21:59:48 UTC
25eeb10	Tim Foley	21 February 2020, 19:42:27 UTC	Add some missing support for Shader Model 6.4/6.5 (#1237) Just adding the enumerants for the new stages/profiles didn't automatically update the code that computes a profile string for passing to fxc/dxc.	21 February 2020, 19:42:26 UTC
830671c	Tim Foley	21 February 2020, 18:39:05 UTC	Add surface syntax for "this type" (#1236) Within the context of an aggregate type (or an `extension` of one), the programmer can use `this` to refer to the "current" instance of the surrounding type, but there is no easy way to utter the name of the type itself. This is especially relevant inside of an `interface`, where the type of `this` isn't actually the `interface` type, but rather a placeholder for the as-yet-unknown concrete type that will implement the interface. This change adds a keyword `This` that works similarly to `this`, but names the current type instead of the current instance. It can be used to declare things like binary methods or factory functions in an interface: ``` interface IBasicMathType { This absoluteValue(); This sumWith(This left); } T doSomeMath<T:IBasicMathType>(T value) { return value.sumWith(value.absoluteValue()); } ``` The `This` type is consistent with the type named `Self` in Rust and Swift (where Rust/Swift use `self` instead of `this`). Other names could be considered (e.g., `ThisType`) if we find that users don't like the name in this change.	21 February 2020, 18:39:05 UTC
433ce86	Tim Foley	21 February 2020, 16:18:31 UTC	Initial support for explicit default initializers (#1235) This change makes it so that for a suitable type `MyType`, a variable declaration like: MyType v; is treated as if it were written: MyType v = MyType(); The definition of "suitable" here is that `MyType` needs to have an available `__init` declaration that can be invoked with zero arguments. I've added a test to confirm that the new behavior works in this specific case. There are a bunch of caveats to the feature as it stands today: * Just because `MyType` has a zero-parameter `__init`, that doesn't mean an array type like `MyType[10]` does, so arrays currently remain uninitialized by default. Fixing this gap requires careful consideration because some, but not all, array types should be default-initializable. * The change here should mean that a `struct` type with a field like `MyType f;` should count as having a default initial-value expression for that field, but I haven't confirmed that. * Even if a `struct` provides initial values for all its fields (e.g., `struct S { float f = 0; }`), that doesn't mean it has a default `__init` right now, so those `struct` types will still be left uninitialized by default. Converging all this behavior is still TBD. Just to be clear: there is no provision or plan in Slang to support destructors, RAII, copy constructors, move constructors, overloaded assignment operations, or any other features that buy heavily into the C++ model of how construction and destruction of values gets done. In fact, I'm not even 100% sure I like having this change in place at all, and I think we should reserve the right to revert it and say that only specific stdlib types get to opt in to default initialization along these lines.	21 February 2020, 16:18:31 UTC
1f401d0	jsmall-nvidia	20 February 2020, 23:24:00 UTC	WIP on RWTexture types on CUDA/CPU (#1234) * CUDA support for array of resources. * * Add support for Texture2DArray on CPU * Expand texture-simple.slang to test Texture2DArray * Reorganise CUDAComputeUtil to split out createTextureResource. * Add TextureCubeArray support for CPU/CUDA targets. * Pulled out CUDAResource Renamed derived classes to reflect that change. * Creation of SurfObject type. * Functions to return read/write access for simplifying future additions. * WIP for RWTexture access on CPU/CUDA. * CUsurfObject cannot have mips. * Ability to set number of mips on test data. Preliminary support for CUsurfObj and RWTexture1D on CUDA. CUDA docs improvements. * Fix typo.	20 February 2020, 23:24:00 UTC
f9d99fd	Tim Foley	20 February 2020, 21:44:03 UTC	Initial support for user-defined initializer/constructor declarations (#1233) The basic idea is that the user can write: ```hlsl struct MyThing { int a; float b; __init(int x, float y) { a = x; b = y; } } ``` and after that point, they can create an instance of their `MyThing` type as simply as `MyThing(123, 4.56f)`. There was already a large amount of infrastructure lying around that is shared between initializers and ordinary functions, so enabling this feature mostly amounted to tying up some loose ends: * In the parser, make sure to properly push/pop the scope for an `__init` (or `__subscript`) declaration, so parameters would be visible to the body * In semantic checking, make sure that declaration "header" checking properly bottlenecks all the function-like cases into a base routine * In semantic checking, make sure that the logic for checking function bodies applies to every `FunctionDeclBase` with a body, and not just `FuncDecl`s * Update semantic checking for statements to allow for any `FunctionDeclBase` as the parent declaration, not just a `FuncDecl` * In lookup, treat the `this` parameter of an `__init` (well, not actually a parameter in this case) as being mutable, just like for a `[mutating]` method * In IR codegen, don't just assume that all `__init`s are intrinsics, and narrow the scope of that hack to just `__init`s without bodies * In IR codegen, detect when we are emitting an IR function for an `__init`, and in that case create a local variable to represent the `this` value, and implicitly return that value at the end of the body. From that point on the rest of the compiler Just Works and IR codegen doesn't have to think of an `__init` as being any different than if the user had declared a `static MyThing make(...)` function. Caveats: * C++ users might like to use that naming convention (so `MyThing` as the name instead of `__init`). We can consider that later. 
* Everybody else might prefer a keyword other than `__init` (e.g., just `init` as in Swift), but I'm keeping this as a "preview" feature for now, rather than something officially supported * Early `return`s from the body of an `__init` aren't going to work right now. * There is currently no provision for automatically synthesizing initializers for `struct` types based on their fields. This seems like a reasonable direction to take in the future. * There is no provision for routing `{}`-based initializer lists over to initializer calls. The two syntaxes probably need to be unified at some point so that doing `MyType x = { a, b, c }` and `let x = MyType(a, b, c)` are semantically equivalent. It is possible that as a byproduct of this change user-defined `__subscript`s might Just Work, but I am guessing there will still be loose ends on that front as well, so I will refrain from looking into that feature until we have a use case that calls for it.	20 February 2020, 21:44:03 UTC
dcc3af7	jsmall-nvidia	20 February 2020, 15:34:58 UTC	CUDA/CPU support for 1D, 2D, CubeArray (#1232) * CUDA support for array of resources. * * Add support for Texture2DArray on CPU * Expand texture-simple.slang to test Texture2DArray * Reorganise CUDAComputeUtil to split out createTextureResource. * Add TextureCubeArray support for CPU/CUDA targets.	20 February 2020, 15:34:58 UTC
788556a	Tim Foley	19 February 2020, 22:36:44 UTC	Don't allocate a default space for a VK push-constant buffer (#1231) When a shader only uses `ParameterBlock`s plus a single buffer for root constants: ```hlsl ParameterBlock<A> a; ParameterBlock<B> b; [[vk::push_constant]] cbuffer Stuff { ... } ``` we expect the push-constant buffer should not affect the `space` allocated to the parameter blocks (so `a` should get `space=0`). This behavior wasn't being implemented correctly in `slang-parameter-binding.cpp`. There was logic to ignore certain resource kinds in entry-point parameter lists when computing whether a default space is needed, but the equivalent logic for the global scope only considered parameters that consume whole register spaces/sets. This change shuffles the code around and makes sure it considers a global push-constant buffer as not needing a default space/set. Note that this change will have no impact on D3D targets, where `Stuff` above would always get put in `space0` because for D3D targets a push-constant buffer is no different from any other constant buffer in terms of register/space allocation. One unrelated point that this change brings to mind is that `[[vk::push_constant]]` should ideally also be allowed to apply to an entry point (where it would modify the default/implicit constant buffer). In fact, it could be argued that push-constant allocation should be the default for (non-RT) entry point `uniform` parameters (while `[[vk::shader_record]]` should be the default for RT entry point `uniform` parameters).	19 February 2020, 22:36:44 UTC
9603fde	Tim Foley	19 February 2020, 21:25:13 UTC	Fix a reference-counting bug in one of the session creation routines. (#1230) This is pretty straightforward, because we were calling `Session::init` (which can retain/release the session) on a `Session*` (no reference held). The catch is that our current tests use the older form of the Slang API, while Falcor relies on the newer API, and so the recent change to our reference-counting logic introduced a regression that we didn't detect in testing. This change just fixes the direct issue but doesn't address the gap in testing. A better long-term fix would be to fully define our "1.0" API, shift our testing to it, and layer the old API on top of it (to try and avoid regressions for client code).	19 February 2020, 21:25:13 UTC
46a1b5f	jsmall-nvidia	19 February 2020, 19:16:38 UTC	Initial partial support for WaveXXX intrinsics on CUDA (#1228) * Start work on wave intrinsics for CUDA. * Add preliminary CUDA support for some Wave intrinsics. Document the issue around WaveGetLaneIndex	19 February 2020, 19:16:38 UTC
1d9152b	Tim Foley	19 February 2020, 16:56:46 UTC	Fixes for DXR 1.1 RayQuery type (#1227) The previous change that added `RayQuery` to the standard library didn't mark it as any kind of intrinsic, so the first fix here was to add the appropriate attribute to the stdlib declaration of `RayQuery`. Next I found that the legalization pass was obliterating the `RayQuery` type because it had no members, and thus looked like an empty `struct` (which we eliminate for a variety of reasons). I fixed that by adding a check for a target-intrinsic decoration in type legalization. Next I found that the type wasn't emitted correctly because our generic specialization was turning `RayQuery<0>` into a new type `RayQuery_0` (which is what our specialization is designed to do, after all). I then disabled generic specialization for types that are marked as target intrinsics (which probably renders the preceding fix moot). Finally, I found that the emit logic for types in HLSL wasn't handling the case of a generic intrinsic type that didn't also use its own dedicated opcode. I fixed that up by adding a specific case for `IRSpecialize` as a late catch-all. After all these changes, a declaration of a `RayQuery` variable seems to Just Work (even without any new/improved behavior for handling default constructors). One potential gotcha looking forward is that my checks for `IRTargetIntrinsicDecoration` aren't checking what target the decoration is for. This is fine for now because there are only two types using the decoration right now (`RayDesc` and `RayQuery`), and the special cases above are reasonable for both of them. 
If/when we have more target-intrinsic types with this decoration, and some of them are only intrinsic for specific targets, then we will need to revisit this choice and either: * make these checks perform filtering based on the "current" target (similar to what the emit logic has today), or * (more likely) make the linking and target-specialization step strip out any target-intrinsic decorations that aren't the right one(s) for the current target Note that this change doesn't include a test case yet because I don't have a DXR 1.1 ready version of dxc to test against. I have manually confirmed that appropriate Slang input seems to be producing reasonable HLSL output when using these functions, but I cannot yet try to check that in (using an HLSL file for the expected output would be quite fragile).	19 February 2020, 16:56:46 UTC
45d1a68	jsmall-nvidia	19 February 2020, 00:13:02 UTC	Added support for Targets to TypeTextUtil. (#1226) * Added support for Targets to TypeTextUtil. * Made Function names 'get' and 'find' instead of 'as' in TypeTextUtil.	19 February 2020, 00:13:02 UTC
8ee39e0	jsmall-nvidia	18 February 2020, 19:14:16 UTC	First pass Texture Array support on CUDA/CPU (#1225) * Add cubemap support. * Add CUDA fence intrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. * * Add CPU support for 3d textures * Add support for mip maps to CUDA * Disable warnings in nvrtc * Update CUDA docs * WIP on 3d texture support. * Add support for 3d textures for CPU and CUDA. * CPU and CUDA support for cube maps. * Add CPU support for Texture1DArray. * Support CUDA Layered/Array type in meta library.	18 February 2020, 19:14:16 UTC
e109985	jsmall-nvidia	18 February 2020, 17:40:14 UTC	CUDA/CPU resource coverage (#1224) * Add cubemap support. * Add CUDA fence intrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. * * Add CPU support for 3d textures * Add support for mip maps to CUDA * Disable warnings in nvrtc * Update CUDA docs * WIP on 3d texture support. * Add support for 3d textures for CPU and CUDA.	18 February 2020, 17:40:14 UTC
2c09754	jsmall-nvidia	14 February 2020, 20:06:35 UTC	Feature/cuda coverage (#1223) * Add cubemap support. * Add CUDA fence intrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	14 February 2020, 20:06:35 UTC
dfd3d26	Tim Foley	14 February 2020, 17:43:02 UTC	Add a bunch of stdlib declarations for SM 6.4 and 6.5 (#1221) The main thing this adds is the `RayQuery` type with its `TraceRayInline` method for DXR 1.1. None of these new functions/types/constants have been tested, and many of them are not expected to work at all (e.g., we don't actually have any mesh shader support, so adding them as stage types is just for completeness at the API level). I would like to write some test cases after this is checked in by looking for existing DXR 1.1 examples. We currently have an issue around default initialization that means we probably can't run any DXR 1.1 shaders right now with just the stdlib changes.	14 February 2020, 17:43:02 UTC
fd61c77	jsmall-nvidia	13 February 2020, 17:35:40 UTC	* Fix for unary - on glsl (#1222) * Test to check fix	13 February 2020, 17:35:40 UTC
f07834e	jsmall-nvidia	12 February 2020, 21:47:14 UTC	Nvrtc disable warnings/Float literal improvements (#1220) * Added 'truncate' for fixing floats, for floats near the max value (as opposed to making infinite). Put AreNearlyEqual into Math * Test for ::make static method.	12 February 2020, 21:47:14 UTC
e1e7a6e	jsmall-nvidia	12 February 2020, 15:44:07 UTC	Support for isinf/isfinite/isnan/ldexp (#1219) * Added support ldexp. * Added classify-float.slang test Fixed glsl output. * Added classify-double.slang * Added ldexp test to scalar-double.slang * isnan, isinf, isfinite are macros (on some targets) so remove :: prefix.	12 February 2020, 15:44:07 UTC
fe9d27a	jsmall-nvidia	12 February 2020, 14:15:47 UTC	CUDA barrier/atomic support (#1218) * * Improved fastRemoveAt * Fixed off by one bug * Fixed const safeness with List<> * Made List begin and end const safe. * Revert to previous RefPtr usage. * Fix bug with casting. * Tabs -> spaces. Small fixes/improvements to List. * Improve comment on List. * Group shared/atomic test works on CUDA. * * Enabled CUDA tests for atomics tests * Enabled DX12 test for atomics-buffer.slang Not clear just yet how to implement that for CUDA - it will work with StructuredBuffer. * hasContent -> isNonEmpty * Remove unneeded comment.	12 February 2020, 14:15:47 UTC
9b3e768	jsmall-nvidia	11 February 2020, 21:16:43 UTC	Small improvements around List (#1216) * * Improved fastRemoveAt * Fixed off by one bug * Fixed const safeness with List<> * Made List begin and end const safe. * Revert to previous RefPtr usage. * Fix bug with casting. * Tabs -> spaces. Small fixes/improvements to List. * Improve comment on List. * hasContent -> isNonEmpty	11 February 2020, 21:16:43 UTC
30d0932	jsmall-nvidia	11 February 2020, 19:14:10 UTC	Fix ref counting for session - simple test had a path to not releasing compile request. (#1217)	11 February 2020, 19:14:10 UTC
a0174d8	Tim Foley	11 February 2020, 14:44:15 UTC	Make fixes for the attribute-error test case. (#1215) There are two main fixes here: The first is to remove a memory leak in the reflection test tool, in the case where Slang compilation fails. There is no real reason to be using the reflection test tool for tests that produce diagnostics (we have the slangc tool for that), but it makes sense to go ahead and fix the leak rather than work around it. This was one of those leaks that could have been avoided with smart pointers and a COM-lite API. The second issue was that the logic for constructing an `AttributeDecl` based on a user-defined `struct` was not correctly setting the parent of the attribute decl. The code in question was a little hard to follow and had a few steps that didn't seem strictly necessary given its goals, so I went ahead and scrubbed+commented it to just do what made sense to me (and the tests still pass...). I'm not entirely happy with the design and implementation approach for user-defined attributes, so we might want to take another stab at it sooner or later. This change is not meant to address any design issues, and is just about fixing bugs in what is already there.	11 February 2020, 14:44:15 UTC
56771d6	Tim Foley	10 February 2020, 21:09:37 UTC	Fix output GLSL for primitive ID in a geometry shader (#1214) We had been translating an `SV_PrimitiveID` input in a shader over to `gl_PrimitiveID` in GLSL. That translation seemed to work just fine for users, so we thought it was correct. It turns out that `gl_PrimitiveID` is the correct GLSL for a primitive ID input in every stage except a geometry shader. In a geometry shader, `gl_PrimitiveID` is a primitive ID output, and if you want the input case you have to write `gl_PrimitiveIDIn` (note the `In` suffix). This change sets aside my bewilderment at the above long enough to implement a workaround in the GLSL legalization step. I also modified our current geometry shader cross compilation test to make use of an input primitive ID.	10 February 2020, 21:09:37 UTC
60dfb62	Tim Foley	10 February 2020, 18:25:29 UTC	Add attributes to enable dual-source blending on Vulkan (#1210) This change adds support for the `[[vk::location(...)]]` and `[[vk::index(...)]]` attributes, which can be used together to mark up shader outputs for dual-source blending on Vulkan. HLSL/Slang code like the following: ```hlsl struct Output { [[vk::location(0)]] float4 a : SV_Target0; [[vk::location(0), vk::index(1)]] float4 b : SV_Target1; } [shader("fragment")] Output main(...) { ...} ``` can be used to set up dual-source blending on both D3D and Vulkan APIs. The output GLSL for the above will look something like: ```glsl layout(location = 0) out vec4 a; layout(location = 0, index = 1) out vec4 b; void main() { ... } ``` The more or less straightforward parts of this change were: * Added new `attribute_syntax` declarations to the stdlib, for `[[vk::location(...)]]` and `[[vk::index(...)]]` * Added new AST node types for the new attribute cases, sharing a base class so that argument checking can be shared * Added checks for the arguments to the new attributes in `slang-check-modifier.cpp` (eventually this kind of logic shouldn't be needed for new attributes) * Updated GLSL emit logic so that it treats the `index`/`space` parts of a variable layout as the `location`/`index` for varying parameters. * Updated GLSL legalization so that when it translates entry-point parameters into globals (and scalarizes structures) it handles both a binding index and space for the parameters. * Added a cross-compilation test case to verify that the basics of the feature work The remaining work is all in `slang-parameter-binding.cpp`. There is some work that isn't technically related to this change (and which could be reverted if it causes problems), around the detection and handling of fragment shader outputs with `SV_Target` semantics. 
The basic changes (which could be backed out and then merged separately) are: * Made the special-case `SV_Target` logic only trigger for fragment shaders (that is the only place where `SV_Target` should appear, but we weren't guarding against it) * Made the logic to reserve a `u<N>` register for `SV_Target<N>` only trigger for D3D Shader Model 5.0 and below (since it is not required for SM 5.1 and up). This could be a breaking change for some users, but that seems unlikely. * Fixed one test case that relied on the behavior of reserving `u0` for `SV_Target0` even though it was a SM6.0 test. * Also added more comments to the system-value handling logic. The more interesting changes come up starting in `processEntryPointVaryingParameterDecl()`. The basic issue is that we have so far only supported implicit layout for varying parameters on GLSL/Vulkan, but the `[[vk::location(...)]]` attribute is a form of explicit layout annotation. Rather than try to kludge something that only works in narrow cases, I instead opted to try to fix things more generally. In `processEntryPointVaryingParameterDecl()` we now check for the `location` and `index` attributes when we are on "Khronos" targets (Vulkan/OpenGL/GLSL) and immediately add them to the variable layout being constructed if they are found. There is nothing in this logic specific to fragment-shader outputs, so this feature now applies to any varying input/output on Khronos targets. Allowing explicit layouts creates the potential for mixing implicit and explicit layout. For example, consider: ```hlsl struct Output { float4 color : COLOR; [[vk::location(0)]] float3 normal : NORMAL; } ``` What `location` should `color` get? Should this code be an error? There are two cases where this conundrum can come up: when working with `struct` types used for varying parameters, and the entry-point parameter list itself. For the varying `struct` case we currently make an expedient choice. 
We handle fields with either implicit or explicit layout with appropriate logic, but logic that doesn't account for the case of mixing the two. Then at the end of layout for the `struct` we issue an error if there was a mix of implicit and explicit layout (such that our results aren't likely to be valid). For the entry point varying parameter case, things were already using a `ScopeLayoutBuilder` type (that encapsulates some logic shared between entry-point and global parameters). The entry-point-specific bits were moved out into a `SimpleScopeLayoutBuilder` and it was updated so that rather than assuming all parameters use implicit layout it does a two-phase layout approach similar to what we use for the global scope: * First all parameters are enumerated to collect explicit bindings and mark certain ranges as "used" * Next the parameters are enumerated again and those without explicit bindings get allocated space using a "first fit" algorithm In principle we could extend the two-phase approach to apply to `struct` types as well, but that would be best saved for a future refactoring of some of this parameter binding logic, since I would like to exploit more of the opportunities for sharing code across the uniform/varying and struct/entry-point/global cases. By moving the point where entry point parameters get their offsets assigned, it was necessary to move around some of the logic that removes varying parameter usage (and other things that shouldn't "leak" out of an entry point) to a different point in the entry point layout process. While adding these various pieces does not quite enable us to support explicit bindings on entry point parameters (e.g., putting `uniform Texture2D t : register(t0)` in an entry point parameter list) or in `struct` types (e.g., explicit `packoffset` annotations on fields), it starts to provide some of the infrastructure that we'd need in order to support those cases.	10 February 2020, 18:25:29 UTC
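The two-phase layout described in the entry above (collect explicit bindings first, then place the remaining parameters with a "first fit" search) can be sketched roughly as follows. This is a minimal illustration in Python, not the actual `SimpleScopeLayoutBuilder` code: the `Param` class and the `assign_locations` function are hypothetical names, each parameter is assumed to occupy a contiguous run of `size` locations, and the separate `index` dimension used for dual-source blending is ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Param:
    """Hypothetical stand-in for a varying parameter."""
    name: str
    size: int = 1                    # number of consecutive locations needed
    explicit: Optional[int] = None   # explicit location, if one was annotated


def assign_locations(params: List[Param]) -> Dict[str, int]:
    """Return a map from parameter name to its first assigned location."""
    used: Set[int] = set()
    result: Dict[str, int] = {}

    # Phase 1: record explicit bindings and mark their ranges as used.
    for p in params:
        if p.explicit is not None:
            wanted = range(p.explicit, p.explicit + p.size)
            if any(loc in used for loc in wanted):
                raise ValueError(f"explicit binding for {p.name!r} overlaps another")
            used.update(wanted)
            result[p.name] = p.explicit

    # Phase 2: first fit for every parameter without an explicit binding.
    for p in params:
        if p.explicit is None:
            start = 0
            while any(loc in used for loc in range(start, start + p.size)):
                start += 1
            used.update(range(start, start + p.size))
            result[p.name] = start

    return result
```

With `color` left implicit and `normal` pinned to location 0, the explicit binding is honored first and `color` falls into the first free slot after it, which is one plausible answer to the "what `location` should `color` get?" question when both phases run over the same parameter list.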
0eed012	jsmall-nvidia	08 February 2020, 16:19:31 UTC	Fixes to make all CPU compute shaders work on CUDA (#1211) * Launch CUDA test taking into account dispatch size. * Enable isCPUOnly hack to work on CUDA. * Rename 'isCPUOnly' hack to 'onlyCPULikeBinding'. * Add $T special type. Support SampleLevel on CUDA. * Fix typo.	08 February 2020, 16:19:31 UTC
7de90c1	jsmall-nvidia	07 February 2020, 20:03:30 UTC	Code standard changes for Lexer (#1209) * Upper camel -> lowerCamel m_ prefix members where appropriate _ prefix module local functions * m_ prefix members in Lexer. Fits standard because type has methods/ctor.	07 February 2020, 20:03:30 UTC
199d1f5	jsmall-nvidia	07 February 2020, 18:10:04 UTC	HLSL Intrinsic coverage test improvements (#1206) * Fix CPP construct when matrix type. * Test intrinsics on float matrices. * Fix typo in _areNearlyEqual test. Increased default sensitivity. Added matrix-float test. * Matrix double test. Fixed some issues with CUDA. * Added reduced intrinsic version of matrix-double test. * Improve matrix double coverage. Test reflect/length etc on vector float. * * Added literal-float test. * Added vector double test * Improved coverage of vector/matrix tests * Disable Dx11 double-vector test because fails on CI. * Disable literal-float, because on CI fails.	07 February 2020, 18:10:04 UTC
af84d85	Tim Foley	07 February 2020, 16:45:32 UTC	Change handling of strings for HLSL/GLSL targets (#1204) * Change handling of strings for HLSL/GLSL targets This change switches our handling of string literals and `getStringHash` to something that is more streamlined at the cost of potentially being less general/flexible. * `String` is now allowed as a parameter type in user-defined functions * `getStringHash` is now allowed to apply to `String`-type values that aren't literals * The list of strings in an IR module is now generated during IR lowering as part of lowering a string literal expression, rather than being defined by recursively walking the IR of the module looking for `getStringHash` calls. The public API still refers to these as "hashed" strings, but they are realistically now "static strings." * When emitting code for HLSL/GLSL, the `String` type emits as `int`, and `getStringHash(x)` emits as `x`. In terms of implementation, the choice was whether to translate `String` over to `int` in an explicit IR pass, or to lump it into the emit pass. While adding the logic to emit clutters up an already complicated step, it is ultimately much easier to make the change there than to write a clean IR pass to eliminate all `String` use. Note that other targets that can handle a more full-featured `String` type are not addressed by this change and thus do not support `String` at all. It may be worth emitting `String` as `const char` on those targets, and emitting string literals directly, but the `getStringHash` function would need to be implemented in the "prelude" then, and we probably want to pick a well-known/-documented hash algorithm before we go that far. This change also brings along some clean-ups to the `gpu-printing` example, since it can now take advantage of the new functionality of `String`. 
Fix up tests for new string handling * Add global string literal list to string-literal test (since we now list all static string literals and not just those passed to `getStringHash`) * Disable `getStringHash` test on CPU, since we don't have a working `String` on that platform right now (only HLSL/GLSL) Co-authored-by: Tim Foley <tim.foley.is@gmail.com>	07 February 2020, 16:45:32 UTC
7981da5	jsmall-nvidia	06 February 2020, 23:16:36 UTC	Float matrix intrinsic test/fixes (#1203) * Fix CPP construct when matrix type. * Test intrinsics on float matrices. * Fix typo in _areNearlyEqual test. Increased default sensitivity. Added matrix-float test.	06 February 2020, 23:16:36 UTC
d3331fb	jsmall-nvidia	06 February 2020, 19:31:09 UTC	Literal handling improvements (#1202) * WIP: 64 literal diagnostic and truncation. * Improve how integer truncation is handled/supported. Added literal-int64.slang test. Set a suffix on all literals. Fixed problem on C++ based targets where l suffix was not the same as int() cast. So on C++ derived emitters, int() is used instead of l suffix to have same behavior across targets. * Add literal diagnostic testing. * Allow lexer to lex - in front of literals. * Fix lexing and converting int literal with -. * Too large small values of floats become inf. Handling writing inf types out on different targets. Add function to determine a float literal's kind. * Roll back the support of lexer lexing negative literals. * Fixed tests broken because of diagnostics numbers. Improved _isFinite * Fix compilation on linux. * Fix problem with abs on linux - use Math::Abs. * Fix typo. * * Improve warnings for float literals zeroed * Improved 64 bit type documentation * Handle half * Improved comments * Fixed tests broken * Use capital letters for suffixes. * Make default behavior on outputting an int literal that is an 'int32_t' is cast (not suffix) to avoid platform inconsistencies. Improve documentation for 64 bit types. Make tests cover material in docs. * Fixed tests. * Rename FloatKind::Normal -> Finite * Fix half zero check.	06 February 2020, 19:31:09 UTC
9c84cce	Tim Foley	06 February 2020, 16:38:46 UTC	Improve checks and diagnostics around redeclarations (#1201) * Improve checks and diagnostics around redeclarations This change turns checking for redeclarations into a dedicated phase of semantic checking, and ensures that it applies to the main categories of declarations: functions, types, and variables. Note that "variables" here includes function parameters and `struct` fields in addition to the more obvious global and local variables. Some of the logic for checking redeclarations already existed for functions, and was refactored to deal with other cases of declarations. The checking for functions still needs to be special-cased because functions are much more flexible about the kinds of redeclarations that are allowed. In addition to improving the diagnosis of redeclaration itself, this change also changes the error message that is produced when referencing a symbol that is ambiguous due to being redeclared. This is a small quality-of-life fix, and has the benefit of being much easier to implement than robust tracking of what variables have had redeclaration errors issued so that we can skip emitting an ambiguity error at the use site. A new test case was added to cover the redeclaration cases for variables (but not types or functions), and the test for function parameters was updated to account for the new more universal diagnostic message (since function parameters used to have special-case redeclaration checking). * fixup: missing file	06 February 2020, 16:38:46 UTC
b42b865	Tim Foley	05 February 2020, 21:26:39 UTC	Improve behavior when undefined identifier is a contextual keyword (#1200) The HLSL language has keywords with very common names like `triangle`, and Slang doesn't want to preclude users from using such names for their variables/functions/etc. In addition, Slang adds new keywords on top of HLSL (like `extension`) and we don't want those to prevent us from compiling existing code. As a result, almost all keywords in Slang are contextual keywords, and they can be shadowed by user variables. The down-side to making all keywords contextual is that in a case like this: ``` int test() { return triangle; } ``` The identifier `triangle` is not undefined as far as lookup (it is defined as a modifier keyword), so the existing "undefined identifier" logic gets bypassed, and instead we ran into an internal compiler error trying to construct an expression that refers to a modifier keyword. Fortunately, the internal compiler error in that case was overkill, and the compiler already had defensive logic to produce an expression with an error type if it couldn't figure out what the type of a declaration reference should be. The main fix here is thus to emit an "undefined identifier" error instead of an internal compiler error at the point where we see an attempt to reference a declaration that shouldn't be available in an expression context. In order to improve the quality of the diagnostic, the code for constructing declaration references was updated to pass along a source location to be used in error messages.	05 February 2020, 21:26:39 UTC
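The contextual-keyword behavior described in the entry above can be illustrated with a small sketch. This is a hypothetical model in Python, not Slang's actual lookup code: the `Scope` class, the `check_name_expr` function, and the two declaration kinds are assumptions made for illustration. It shows a name lookup that succeeds (because the keyword is declared in an outer scope) but still produces an "undefined identifier" diagnostic when the found declaration is not usable in an expression.

```python
from typing import Dict, List, Optional

# Hypothetical declaration kinds for this sketch.
VALUE = "value"                        # variables, functions, etc.
MODIFIER_KEYWORD = "modifier-keyword"  # contextual keywords such as `triangle`


class Scope:
    """A lexical scope that maps names to declaration kinds."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.parent = parent
        self.decls: Dict[str, str] = {}

    def declare(self, name: str, kind: str) -> None:
        self.decls[name] = kind

    def lookup(self, name: str) -> Optional[str]:
        # Inner scopes shadow outer ones, so user variables hide keywords.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.decls:
                return scope.decls[name]
            scope = scope.parent
        return None


def check_name_expr(scope: Scope, name: str, line: int, diagnostics: List[str]) -> str:
    """Return "ok" for a usable reference, or "error" after reporting a diagnostic."""
    kind = scope.lookup(name)
    if kind != VALUE:
        # Covers both "not found" and "found, but not valid in an expression".
        diagnostics.append(f"line {line}: undefined identifier '{name}'")
        return "error"
    return "ok"
```

Passing the line number into `check_name_expr` mirrors the note in the commit message about threading a source location through so that the diagnostic can point at the use site.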
