Revision history - None - origin: https://github.com/shader-slang/slang

visit type:

Revision	Author	Date	Message	Commit Date
fcac02e	jsmall-nvidia	21 August 2020, 20:04:42 UTC	Vulkan update/NVAPI support (#1511) * First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md.	21 August 2020, 20:04:42 UTC
49067fd	Tim Foley	21 August 2020, 16:47:29 UTC	Fix stdlib declarations for texture Gather() (#1510) Fixes #1507 These operations were failing to take into account the way that array textures require an extra coordinate to be passed in for the primary location (but not the additional offsets). Adding `isArray` to the component count is the existing solution used for similar intrinsics elsewhere in the stdlib, and it is adopted here. Because our test framework isn't really set up to do a lot of texture testing (including having no support for texture arrays), the test added here is just a cross-compilation test that compares output with fxc for comparable input.	21 August 2020, 16:47:29 UTC
5e64ae7	Tim Foley	21 August 2020, 14:33:43 UTC	Another fix for overriding property decls (#1509) * Another fix for overriding property decls The central problem we keep running into with `property` decls in `interface`s comes down to two choices: 1. When a member lookup `obj.someName` or a simple lookup for `someName` produces an overloaded result, we make no attempt to resolve the overloading right away, and instead postpone disambiguation until the point where that expression gets used, in case the context where it gets used can help in disambiguation (a notable case being when there is a call expression `obj.someName(...)` or `someName(...)`). 2. When looking up members in a the scope of a type (either for `obj.someName` or `someName` in the context of a method), we include all results from base types in the set of overloads returned, even in cases where the type has a direct member that "overrides" the inherited one. The combination of these factors means that when a `struct` type implements a `property` to satisfy a requirement of an inherited `interface`, then references to `obj.someProp` end up being ambiguous between the property in the concrete `struct` type and the property it inherits through the `interface`. There is no quick fix possible for issue (2). It might seem that we could just skip over members inherited through `interface`s when doing lookup in a type, but that solution wouldn't apply to inheritance from another `struct` type, or any future scenario where we support default implementations of methods in interfaces. The simple idea of saying that a derived-type member named `M` hides all inherited members named `M` is possible, but would lead to a bad user interface when a type wants to support both a core "bottleneck" method and a bunch of convenience overloads with the same name. That leaves us with issue (1), and trying to find a reasonable fix for it. The common case is that any expression `e` eventually gets used in a context where it will be be subject to disambiguation: * If we form a call expression `e(...)`, then the overload resolution logic will (obviously) work to disambiguate which `e` was meant. * If `e` is used as an argument to another call (`f(... e ...)` or `... + e`), then `e` will be coerced to the expected parameter type for its argument position, and that coercion will disambiguate it (this is the bit that was fixed in #1501) * If `e` is used in another context where a type is expected/known, it will also be coerced: `if(e)`, `int v = e`, etc. The problem case that is left behind is any scenario where `e` is not subject to one of the above resolution cases, which mostly amounts to cases where an expression is never coerced to a single fixed type. There are a few important cases where this occurs today: * When the expression is used as the left-hand side of an assignement (`e = ...`). * When an expression is used to initialize a variable with an implicit type (`let v = e`). * When inferring generic arguments from the value arguments at a call site (`f(e)` where `f` is defined as `f<T>(T v)`) The key connecting thread in each of these cases is that the front-end needs to determine the type of `e` to make progress. Our semantic checking logic already has functions that try to draw a distinction between the two cases: * The `CheckTerm()` operation is supposed to be used when we expect that we will eventually coerce or otherwise diambiguate the term, and also in cases where we don't yet know if a term should name a type or a value * The `CheckExpr()` operation is supposed to be used when we do not expect that we will apply coercion/disambiguation to a term, and need to have assurances that it has been coerced into a non-overloaded expression with a reasonable type The simple part of the fix made here is to make `CheckExpr()` actually do part of what it is suppsoed to (attempt to disambiguate overloaded terms), and then audit all the call sites to `CheckExpr()` to make sure they are actually ones that intend to opt into that logic. The messier part of the fix is dealing with generic argument inference, because we need to extract the type of the disambiguated expression for the purposes of inference, but we don't want to disturb the actual argument list at a call site (because type coercion of the arguments is supposed to handle the disambiguation). This part is done with a bit of special-casing in the overload-resolution context, by adding a method that gets the type or an argument after disambiguation (when possible). * fixup Co-authored-by: Yong He <yonghe@outlook.com>	21 August 2020, 14:33:43 UTC
a8bc598	Yong He	21 August 2020, 08:10:45 UTC	Allow calling a generic function with an existential value (dynamic dispatch) (#1508) * Allow calling a generic function with an existential value (dynamic dispatch). * Fixes per review comments. * Clean up implementation by having `openExistential` return `ExtractExistentialType` instead of a DeclRef to the interface with a `ThisTypeSubstitution`. * More cleanups Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> Co-authored-by: Yong He <yhe@nvidia.com>	21 August 2020, 08:10:45 UTC
11748a7	Tim Foley	20 August 2020, 15:23:51 UTC	Initial support for a using construct (#1506) The basic idea is that if you have a namespace: namespace MyCoolNamespace { void f() { ... } ... } then you can bring the declarations from that namespace into scope with: using MyCoolNamespace; f(); The `using` construct is allowed in any scope where declarations are allowed. As an additional feature, the construct allows and then ignores the keyword `namespace` if it occurs right after `using`: using namespace MyCoolNamespace; Note that unlike in C++, `using` a namespace inside another namespace doesn't implicitly make the symbols available to clients of that namespace: namespace hidden { void secret() {...} ... } namespace api { using hidden; ... } api.secret(); // ERROR: `secret()` isn't a member of `api` The implementation of this feature was relatively simple, although it does leave out more advanced features that might be desirable in the future: * No support for `using MCN = MyCoolNamespace` sorts of tricks to define a short name * No support for `using` anything that isn't a namespace (e.g., to make the members of a type available without qualification) * No support for cases where multiple visible modules have a namespace of the same name (or dealing with overloaded namespaces in general)	20 August 2020, 15:23:51 UTC
b5a4161	jsmall-nvidia	19 August 2020, 19:51:43 UTC	Remove IncludeHandler. (#1505) nvAPI -> NVAPI nvAPIPath -> nvapiPath DxcIncludeHandler don't reference count. nv-api-path -> nvapi-path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	19 August 2020, 19:51:43 UTC
2fffbc5	jsmall-nvidia	19 August 2020, 18:15:36 UTC	Int64 atomic add RWByteAddressBuffer support (#1504) * Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but is necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * WIP adding support for InterlockedAddFp32 * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * WIP - atomicAdd. This solution can't work as we can't do (float) in glsl. WIP on atomic float ops. * Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl. * Split __atomicAdd so extensions are applied appropriately. * Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi Added -nv-api-path * Refactor around IncludeHandler and impl of IncludeSystem * slang-include-handler -> slang-include-system Have IncludeHandler/Impl defined in slang-preprocessor * Small comment improvements. * Document atomic float add addition in target-compatibility.md. * CUDA float atomic support on RWByteAddressBuffer. * Add atomic-float-byte-address-buffer-cross.slang * Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one. * Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target. * Int64 atomic add RwByteAddressBuffer support. * Fix typo in stdlib for int atomic ByteAddressBuffer. * Small fixes to int64 atomic test. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	19 August 2020, 18:15:36 UTC
b820f34	Yong He	18 August 2020, 20:08:45 UTC	Support initializing an existential value from a generic value. (#1503) * Support initializing an existential value from a generic value. * Remove trailing spaces and clean up debugging code.	18 August 2020, 20:08:45 UTC
9abcb6e	jsmall-nvidia	18 August 2020, 17:42:46 UTC	Support for float atomics on RWByteAddressBuffer (#1502) * Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but is necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * WIP adding support for InterlockedAddFp32 * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * WIP - atomicAdd. This solution can't work as we can't do (float) in glsl. WIP on atomic float ops. * Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl. * Split __atomicAdd so extensions are applied appropriately. * Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi Added -nv-api-path * Refactor around IncludeHandler and impl of IncludeSystem * slang-include-handler -> slang-include-system Have IncludeHandler/Impl defined in slang-preprocessor * Small comment improvements. * Document atomic float add addition in target-compatibility.md. * CUDA float atomic support on RWByteAddressBuffer. * Add atomic-float-byte-address-buffer-cross.slang * Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one. * Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	18 August 2020, 17:42:46 UTC
697e7fb	Tim Foley	17 August 2020, 22:28:05 UTC	Attempt to fix lookup for members that "override" (#1501) Our current lookup process always finds all members of a type, which can include both an inherited member (e.g., from an `interface`) and one that logically overrides/implements it. If something downstream doesn't filter this result down and favor the derived member, then an ambiguity error will result. To date, this has mostly been a non-issue because we haven't emphasized inheritance, and the main case we did support (`struct` types implemented `interface` methods) gets disambiguated as part of overload resolution for function calls. Recent changes to support `property` declarations to `interface`s add the possibility for ambiguity between a "base" and "derived" declaration that can't rely on overload resolution for disambiguation. The approach in this PR is to add disambiguation logic to the other main place where the results of lookup get used. If a lookup result is being assigned to a variable, passed to a function, or otherwise used in a case where a value of a specific type is needed, it will be "coerced" to the desired type. This change makes it so that the first step in the coercion logic is to try to disambiguate the expression that is being coerced. In order to ensure that an overloaded expression can be detected and resolved even when just checking if coercion is possible, I needed to update the `canCoerce()` functions to also take the expression that is being tested for coercibility, and not just its type. There is only one case (that I saw) where coercion checks were being made without an expression value available, and that case didn't actually need/want to handle overloading. In order to test the fixes here, I added logic to the `property`-in-`interface` test to make sure that the critical cases work as expected (references to a derived member using "dot syntax" and "implicit `this`" syntax). Alternatives Considered ----------------------- The first attempt at this fix took a simpler approach: I added the disambiguation logic as a post-process on member lookup. That is, given `obj.foo` I would take the `LookupResult` for `foo` and immediately try to filter it to include only the most-derived members. This approach has the major benefit of catching even more use cases of values (and thus helping to ensure that we don't spend forever chasing down more of these ambiguity errors), but it also has two critical problems: 1. If we only trigger disambiguation when looking up `obj.foo`, then we can't do anything to help when `foo` is looked up as an ordinary identifier, but is actually equivalent to `this.foo`. A full fix would require doing this disambiguation on every* name lookup, which leads to the second issue: 2. It is important that for a method call like `obj.m(...)` we do not disambiguate when looking up `obj.m`, and instead let the overload resolution for the call resolve things. That choice is what makes it possible to call an inherited `m` declaration even when there is a derived `m` with a different signature. Issue (1) is covered by the test case that was added here, but we should probably have a test case for (2) to make sure we don't break that use case. Caveats ------- An important case that we don't solve in this PR is when the result of a lookup is captured in a variable without an explicit type: let f = obj.foo; That case also needs disambiguation, and should be addressed in a later change. A secondary issue is that our approach to prioritizing declarations during lookup is still quite naive. We really need a way for lookup to attach information about nesting of scopes to results (to be clear that results from inner scopes should be preferred over those from outer scopes), as well as have a robust mechanism for comparing the priority of members based on the inheritance graph of a type. This change doesn't do anything to make the situation better or worse.	17 August 2020, 22:28:05 UTC
ff2d490	Dietrich Geisler	17 August 2020, 16:50:44 UTC	GPU Foreach Loop (#1498) * GPU Foreach Loop This PR introduces the completed GPU foreach loop and updates the heterogeneous-hello-world example to use it. This PR builds on the previous introduction of the GPU Foreach loop parsing and semantic checking PR (#1482) by introducing IR lowering and emmitting. THe new feature can be used by having a GPU_Foreach loop interacting with a named non-CPP entry point, and using the -heterogeneous flag. * Fix to path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	17 August 2020, 16:50:44 UTC
0640a10	Yong He	14 August 2020, 19:56:43 UTC	Fix tuple type lowering (#1499) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	14 August 2020, 19:56:43 UTC
b37a777	Yong He	14 August 2020, 17:04:32 UTC	Lower existential types. (#1497) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	14 August 2020, 17:04:32 UTC
99366e7	Tim Foley	14 August 2020, 15:31:59 UTC	Fix an issue with explicit enum tag types (#1495) The basic problem here was that in a declaration like: ```hlsl enum Color : uint { Red, Orange, ... } ``` The `: uint` bit is represented as an `InheritanceDecl`, because that is what we use to represent the syntactic form of inheritance clauses like that. At the point where we parse the `InheritanceDecl` we don't yet know whether it represents a base interface or a "tag type" like `uint` in this case. The root problem that is then created is: an `enum` type is not a subtype of its "tag type," and treating it like a subtype can create problems. The main problem that arises is that looking in a type like `Color` will find both the members of color and the members of `uint`. In the case of things like `__init` declarations, that creates a problem where the `Color` type has two different `__init`s that take a `uint`: * The one it inherits from `uint` via that `InheritanceDecl` (even though it shouldn't) * The one it gets via an extension just for conforming to `__EnumType` (a non-user-exposed `interface` in the standard library) Because both of those `__init`s are inherited, neither is preferred over the other one and they create an ambiguity if somebody tries to write: ```hlsl uint u = ...; Colorc = Color(u); ``` The solution used in this PR is to add a compiler-internal modifier to the `InheritanceDecl` that introduces a "tag type" to an `enum`, in an early phase of checking (one of the ones that occurs before it is legal to enumerate the bases of a type). Then the lookup process is modified to ignore `InheritanceDecl`s with that modifier when doing lookup in super-types (since the declaration does not indicate a subtype/supertype relationship). This appears to get the basic feature working again, although it is possible that there are other parts of the compiler that use `InheritanceDecl`s and mistakently assume that all `InheritanceDecl`s introduce subtype/supertype relationships. We probably need to do a significant audit of the code to start being more clear about the nature of the relationships such declarations introduce. Such steps are left to future changes. Co-authored-by: Yong He <yonghe@outlook.com>	14 August 2020, 15:31:59 UTC
2bfe62a	Tim Foley	14 August 2020, 00:56:20 UTC	Support property declarations in interfaces (#1494) There are two main features in this change. First, we allow for `interface`s to declare `property` requirements, which can be satisfied by matching `property` declarations in a type that conforms to the interface: interface IRectangle { property float width { get; } property float height { get; } } struct Square : IRectangle { float size; property float width { get { return size; } } property float height { get { return size; } } } Second, we allow a type to satisfy a `property` requirement with an ordinary field of the same name: struct Rectangle : IRectangle { float width; float height; // no explicit `property` declarations needed } The implementation of these features is mostly in `slang-check-decl.cpp` in the logic for checking conformance of a type to an interface. The first feature simply requires adding logic to checking whether a candidate satisfying `property` declaration matches a required `property` declaration. To do so, it must have the same type, and an accessor to satisfy each of the required accessors. The second feature requires adding logic to synthesize an AST `property` declaration for a type, based on a required `property` declaration and its accessors. This means that, more or less, any type where `this.name` yields a storage location that does what is needed can satisfy a property requirement (there is no specific rule that says the storage needs to be a field, although that is the most likely case). The way that witnesses are stored for property declarations probably merits some description. During IR lowering, an abstract storage declaration like a subscript or `property` more or less desugars away, so that the actual interface requirements correspond to the accessors within it (the `get`, `set`, etc.). This means that a witness table should have entries/keys corresponding to the accessors and not the property itself. The process of finding/recording witnesses for `property` requirements thus installs entries for the individual accessors (with care taken to only install accessor witnesses once we are sure we have witnesses for all the requirements). Currently, the code also installs an entry for the property itself, although that is not strictly required, and might not be something we continue to do long-term. (Aside: it was somewhat surprising that an end-to-end test of `property` declarations in `interface`s Just Worked without any changes to IR lowering.) As we continue to write more code that synthesizes and checks AST expressions/statements, it becomes necessary to refactor the semantic checking logic so that it splits the recursive part (e.g., checking the operands of an assignment) from the validation part (e.g., checking that the assignment itself is valid). It is probably too big of a change to justify at this point, but it might be valuable in the future to have distinct hierarchies that represent unchecked and checked ASTs, with semantic checking mostly being a transformation from one to the other. The benefit of such a change is we could factor out a distinct "builder" API for constructing validated/checked AST nodes, with both semantic checking and AST synthesis being clients of that API.	14 August 2020, 00:56:20 UTC
482fd16	jsmall-nvidia	13 August 2020, 21:28:07 UTC	Added WavePrefixCountBits test. (#1493) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	13 August 2020, 21:28:07 UTC
876968c	Yong He	13 August 2020, 19:17:59 UTC	IR support for Tuple types. (#1492) * Tuple types. * Fix x86 warning * Improved deduplication Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	13 August 2020, 19:17:59 UTC
09adf10	Tim Foley	13 August 2020, 17:36:55 UTC	Allow both traditional and modern property syntax (#1487) The initial change to introduce `property` declarations tied them to a "modern" syntax: property width : float { ... } In practice, a great majority of users assume that properties in Slang will be declared like those in C#: property float height { ... } This change allows both options to parse correctly. The choice made here is to only parse as the "modern" syntax when it can be detected from lookahead (an identifier followed by a `:`), and fall back to the "traditional" syntax otherwise. That choice might not produce the best diagnostic messages around syntax errors in codebases that use the modern syntax, but it is the easiest trade-offs to make. We also add similar disambiguation logic for the `newValue` parameter of a `set` declaration (and other "modern"-style parameters). This strategy cannot be applied to all function parameters in general, because traditional parameter lists can still use `:` to introduce a semantic. Note: the same disambiguation strategy applied here could be used for `let` and `var` declarations: let a : int = 1; let int b = 2; This change does not try to introduce flexibility like that, because it seems unlikely for users to care.	13 August 2020, 17:36:55 UTC
e1ea7ed	Dietrich Geisler	12 August 2020, 17:39:08 UTC	GPU Foreach Parsing and Checking (#1482) This PR introduces parsing and semantic checking for a GPU foreach loop for heterogeneouis programming. A GPU foreach loop takes the form: ``` __GPU_FOREACH(renderer, gridDims, LAMBDA(uint3 dispatchThreadID) { kernelCall(args, ...); }); ``` And will allow the host code to call into a kernel with the correct renderer and grid dimensions. This commit also introduces a hack to unify types in the heterogeneous hello world file, which will hopefully be amended in the future. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	12 August 2020, 17:39:08 UTC
12b0fc6	jsmall-nvidia	12 August 2020, 13:30:25 UTC	Fix based on review comment problem. (#1490) Better split of responsibilites around _begin/_endInst	12 August 2020, 13:30:25 UTC
ec095a3	jsmall-nvidia	11 August 2020, 22:19:11 UTC	Bugfix: WaveActiveCountBits on glsl output. (#1488) * Fix WaveActiveCountBits on glsl output. * Fix warning `could not be inlined because the return instruction is not at the end of the function. This could be fixed by running merge-return before inlining.` from glslang - because we weren't including the CreateMergeReturnPasss on default optimization, and it's assumed in InlineExhaustivePass. * Keep WaveActiveCountBits use the default WaveMask impl. * Fix WaveCountBits calculation. Use WaveActiveBallot instead of the _WaveActiveBallot.	11 August 2020, 22:19:11 UTC
cc64c61	jsmall-nvidia	11 August 2020, 20:16:22 UTC	slang-glslang binaries update, including fixes from PR Bugfix: WaveActiveCountBits on glsloutput. Slang PR #1488 (#1489)	11 August 2020, 20:16:22 UTC
96805c7	jsmall-nvidia	11 August 2020, 15:37:38 UTC	Improvements to Casting (#1483) * Improve handling of cast detection when have a more complex cast than just a single identifier. * Improve comments around heuristic for casting * Added nested enum test. * Improve comments * Define function like - output change. * Use lookup for types in determining if cast or not. * Add _isCast function * Add heuristic test to nested-enum.slang that works if the type test fails. * Change hueristic based on review. Allow (..)( to always be an expression, because if it's a type it will be turned into a cast later. * Fix output of define-function-like.slang - which changes again with improved casting support. * Improve testing for type in cast - if we find a decl and it's not a type, then we know it's not a cast.	11 August 2020, 15:37:38 UTC
2903eb5	jsmall-nvidia	10 August 2020, 21:07:40 UTC	Slang binaries including #1485 fix (#1486) * Fix the minProgramTexelOffset should be -. * Improve comments. * Upgrade of slang-binaries which contains slang-glslang with the texel offset fix. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	10 August 2020, 21:07:40 UTC
023622c	jsmall-nvidia	10 August 2020, 19:52:32 UTC	Glslang texel offset bug fix (#1485) * Fix the minProgramTexelOffset should be -. * Improve comments.	10 August 2020, 19:52:32 UTC
dd980b4	Yong He	08 August 2020, 01:36:01 UTC	AnyValue packing/unpacking pass. (#1480) * AnyValue packing/unpacking pass. * Add diagnostic for types that does not fit in required AnyValueSize. * Add expected test result * Fix warnings.	08 August 2020, 01:36:01 UTC
20af567	jsmall-nvidia	06 August 2020, 21:15:55 UTC	Emit spir-v using MemoryArena to stop memory leak (#1479) * Use m_style for OSFindFilesResult * Refactor of FindFilesResult. * Fixes on linux for FindFiles. * Simplify FindFilesState, and linux support for pattern matching. * Small fixes to linux FindFilesState * Fix typo on linux FindFiles * Fix typo in linux FindFiles. * Renamed some variables, and improved comments on FindFiles. * Improve comments on FildFiles * Small improvements around FindFiles. * Refactor FindFiles again.. into a visitor and function in Path. * Fix some problems on linux. * Fix linux typo. * Renamed os -> find-file-util * find-file-utl -> directory-util * Make delete of PathInfo explicit. * Initialize alwaysCreateCollectedParam . * WIP spir-v emit using MemoryArena * Fix bug in spirv emit. * Fix bug with handling null termination on strings in spirv emit. * Small improvements in comments around emit spirv * Remove the 'dst' from emitOperand - we can only emit to the current inst. * Improve SpirV emit comments. * Don't store the created instruction in the InstConstructScope - as it's always the m_currentInst. Don't return the instruction after _beginInst. Slight comment improvements.	06 August 2020, 21:15:55 UTC
3231048	jsmall-nvidia	05 August 2020, 20:12:55 UTC	Refactor enumerating directory contents (#1478) * Use m_style for OSFindFilesResult * Refactor of FindFilesResult. * Fixes on linux for FindFiles. * Simplify FindFilesState, and linux support for pattern matching. * Small fixes to linux FindFilesState * Fix typo on linux FindFiles * Fix typo in linux FindFiles. * Renamed some variables, and improved comments on FindFiles. * Improve comments on FildFiles * Small improvements around FindFiles. * Refactor FindFiles again.. into a visitor and function in Path. * Fix some problems on linux. * Fix linux typo. * Renamed os -> find-file-util * find-file-utl -> directory-util Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	05 August 2020, 20:12:55 UTC
e713b56	Tim Foley	05 August 2020, 18:47:18 UTC	Change the policy for entry-point uniform parameters on Vulkan (#1476) Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal. Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward: * For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.) * For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets. * For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms. * For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record." * For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types). * The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan. The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently. The big picture is then that a kernel like: ```hlsl void computeMain(uniform float someValue) { ... } ``` will map to output GLSL like: ```glsl layout(push_constant) uniform { float someValue; } U; void main() { ... } ``` If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit: ```hlsl struct Params { float someValue; } void computeMain(uniform ConstantBuffer<Params> params) { ... } ``` (Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.) This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case). The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support: * Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly. * The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer` * The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in. * On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering). * On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer. The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed. A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added. An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding. Co-authored-by: Yong He <yonghe@outlook.com>	05 August 2020, 18:47:18 UTC
6fb2aa7	Yong He	05 August 2020, 17:32:52 UTC	`AnyValue` based dynamic dispatch code gen (#1477) * AnyValue based dynamic code gen * Fix aarch64 build error	05 August 2020, 17:32:52 UTC
092337a	jsmall-nvidia	04 August 2020, 23:34:33 UTC	Sampler Feedback improvements (#1475) * Add the Feedback texture types. Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK. * Starting point to test sampler feedback. * WIP on FeedbackSampler. * Use __target_intrinsic to override the output of sampler feedback types. * Use newer generic syntax for FeedbackTexture. * Reflects Feedback type. * SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK * Added reflection test. * Reneable issue with generics in sampler-feedback-basic.slang * Add methods to FeedbackTexture2D/Array. Make test cover test cases. * Sampler feedback produces DXC code. * Disabled Sampler feedback test - as requires newer version of DXC. * Fix bug in reflection tool output. * Fix problem with direct-spirv-emit.slang.expected due to update to glslang. * Fix direct-spirv-emit.slang * Use SLANG_RESOURCE_EXT_SHAPE_MASK again * Make Feedback be emitted as a textue type prefix. * Add support for GetDimensions to FeedbackTexture2D * WIP on CPU sampler feedback. Update of target compatibility. * Fix some bugs in C++ feedback sampler. Fix GetDimensions for FeedbackTextures. Run 'Compile' test for CPU compute feedback texture test. Update target-compatability.md * Fix GetDimensions call on feedback sampler. * Small documentation improvements. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	04 August 2020, 23:34:33 UTC
de309d9	Yong He	04 August 2020, 18:57:45 UTC	Fix leaks in slang-generate (#1472) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	04 August 2020, 18:57:45 UTC
00b1fe0	Tim Foley	04 August 2020, 17:43:05 UTC	Fix stdlib declarations of bit-shift ops (#1471) The declarations of the left- and right-shift operations in the Slang standard library were set up identically to the declarations of operator binary (and compound binary) operations. A consequence of this choice was that both operands to a shift were expected to have the same type, which can lead to a confusing result. If the user wrote a shift of the form `int >> uint`, then the ordinary promotion rules for Slang would decide to perform the operation on `uint` value, so it would change to `uint(int) >> uint` and perform an unsigned shift, which isn't what the user would expect. The fix implemented here is to make the shift operations be declared separately from the other binary operations, with two generic type parameters instead of one: distinct parameters for the left-hand-side and right-hand side types. Each parameter is only constrained to be a built-in integer type.	04 August 2020, 17:43:05 UTC
79ba927	jsmall-nvidia	03 August 2020, 19:46:16 UTC	First pass support for Sampler Feedback (#1470) * Add the Feedback texture types. Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK. * Starting point to test sampler feedback. * WIP on FeedbackSampler. * Use __target_intrinsic to override the output of sampler feedback types. * Use newer generic syntax for FeedbackTexture. * Reflects Feedback type. * SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK * Added reflection test. * Reneable issue with generics in sampler-feedback-basic.slang * Add methods to FeedbackTexture2D/Array. Make test cover test cases. * Sampler feedback produces DXC code. * Disabled Sampler feedback test - as requires newer version of DXC. * Fix bug in reflection tool output. * Fix problem with direct-spirv-emit.slang.expected due to update to glslang. * Fix direct-spirv-emit.slang * Use SLANG_RESOURCE_EXT_SHAPE_MASK again * Make Feedback be emitted as a textue type prefix. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	03 August 2020, 19:46:16 UTC
9ac5c51	Yong He	01 August 2020, 01:30:10 UTC	Add [anyValueSize] attribute to interfaces and propagate that in the IR. (#1469) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	01 August 2020, 01:30:10 UTC
bb42514	Tim Foley	31 July 2020, 23:33:30 UTC	Fix issues arising around DXR 1.1 RayQuery usage (#1468) This change includes a few different fixes for issues that arose in a user shader that made use of DXR 1.1. The existing solution we had for handling the DXR 1.1 `RayQuery` type relied on the fact that a declaration like: ```hlsl RayQuery<0> myRayQuery; ``` Looks like an undefined variable to existing Slang, while to dxc it is a variable declaration that runs an implicit default constructor (sneaking a bit of C++ into HLSL, but only in a way the standard library can use). Slang was getting away with the fact that this maps to an undefined variable because it turns out that our emit logic would output the exact same declaration for an undefined value (since declaring a variable without initializing it is the simplest way to get an undefined value of a given type in a C-like language). The main bug that arose here was that if the `RayQuery<...>`-typed variable was declared under control flow, then the `undefined` instructions introduced by our SSA pass would actually get inserted into the wrong block. Basically, when a block was trying to read a variable, and there was no preceding `store` to that variable in the block, we'd start looking for incoming values from its predecessor block(s). In the case where the variable never gets stored to, this search would eventually reach the first block of the function, where we'd realize the value must be `undefined`. The result was that we might insert an `undefined` instruction of some `T` into the first block of a function, but the type `T` might be the result of a lookup operation performed later in that function. This ends up creating a use of `T` that isn't dominated by the definition, which violates the SSA property. This violation of the SSA properties lead to us generating incorrect code in a later pass that deals with scoping differences between SSA form and our structured output statements; that code would end up creating a local variable to hold a type instead of a value. The main fix is in `slang-ir-ssa.cpp`, where we catch the case of trying to read a variable in the block that declares it, if there we no preceding `store`s. We simply insert an `undefined` instruction before the first such read and write that out as the value of the variable to be used for subsequent instructions (up to the next `store`). This fixes the SSA dominance property for the `undefined` values that get introduced and thus technically fixes the output code for the user shader. A secondary issue is that it is kind of gross to be relying on the behavior of `undefined` instructions in the IR for the semantics of an important standard library type like `RayQuery<...>`. A preceding change already added basic support for Slang to run default initializers (declared as `__init()`) on variables that are declared without an initial-value expression. This change adds such a default initializer to `RayQuery<...>` and maps it to a dedicated IR instruction that is intended to represent the idea of running a C++-style default constructor to produce a value. It turns out that the code we need to emit in that cse is identical to what we currently emit for `undefined` instructions, so that is helpful. A tertiary issue is that when trying to run the user shader in debug mode, I ran into an assertion because our type layout logic for reflection had never dealt with the issue of user-defined `enum` types being used in constant buffers or other memory that needs layout. I added a quick fix that lays out any `enum` types as their "tag type" (which defaults to `int`). Unfortunately, there is no easy way to check in a regression test for the user issue, because official `dxcompiler` versions with support for DXR 1.1 are not yet released (at least as of last time I checked).	31 July 2020, 23:33:30 UTC
011a743	Dietrich Geisler	31 July 2020, 21:51:52 UTC	Binary for Heterogeneous Example (#1467) * Binary Heterogeneous Example This PR introduces the ability to insert the binary of a non-CPU target by using the -heterogeneous flag. Specifically, this PR updates the emitting logic to produce a variable of name `__[name_of_entryPoint]` when the heterogeneous flag is present. * Prelude path fix Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	31 July 2020, 21:51:52 UTC
4549597	jsmall-nvidia	31 July 2020, 20:31:30 UTC	Upgrade to Glslang 11.0.0 (#1466) * Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but is necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * Small improvements to glslang building process documentation. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	31 July 2020, 20:31:30 UTC
fc8b575	jsmall-nvidia	31 July 2020, 16:25:58 UTC	Fix for bug where memory that has been allocated with new T[] (within a list) is freed with free in the RiffContainer. (#1473)	31 July 2020, 16:25:58 UTC
659d8ea	Yong He	29 July 2020, 00:05:13 UTC	Generalize lowerSimpleIntrinsicType to include generic arguments (#1464) * Generalize lowerSimpleIntrinsicType to include generic arguments * Use recursion instead of loop to get the correct ordering for nested generics	29 July 2020, 00:05:13 UTC
cd10673	Tim Foley	28 July 2020, 22:14:31 UTC	Change parameter passing convention for CUDA (#1463) The Big Picture =============== Given input Slang code like: ```hlsl Texture2D gA; [shader("compute")] void kernelFunc(uniform Texture2D b, uint3 tid : SV_DispatchThreadID) { ... } ``` the existing CUDA code generation strategy would always generate a kernel with a signature like: ```c++ struct GlobalParams { Texture2D gA; } struct EntryPointParams { Texture2D b; } extern "C" __global__ void kernelFunc(EntryPointParams* entryPointParams, GlobalParams* globalParams) { ... } ``` This choice was consistent with the conventions of the CPU kernel target, and shares the advantage that it is easy for the user to data-drive the logic for filling in parameters and then invoking a kernel. However, the approach outlined above has two serious problems when used for CUDA kernels: * First, it defies the programmer's expectation about what an "equivalent" CUDA kernel signature would be, which makes it awkward for a developer to invoke this kernel from CUDA C++ host code (especially in the context of an app that might also run hand-written CUDA kernels). * Second, the performance of this approach suffers because every access to a global or entry point parameter turns into a load from global memory. In contrast, a typical hand-written CUDA kernel passes its parameters via an implementation-specific path that (for current CUDA platforms) seems to be equivalent to `__constant__` memory in performance. This change alters the convention so that the Slang compiler takes the code from the top of this message and translates it into something like: ```c++ struct GlobalParams { Texture2D gA; } __constant__ GlobalParams SLANG_globalParams; extern "C" __global__ void kernelFunc( Texture2D b ) { ... } ``` This translation alleviates both problems with the current translation: * The signature of the generated CUDA kernel function is as close to that of the original as is possible (we had to eliminate the `SV_`-semantic varying inputs), and should directly match what the programmer would expect in common cases. Entry-point parameters are passed via CUDA kernel parameters, and should thus match in performance. Global parameters are passed via a variable in `__constant__` memory, and thus should also perform as well as possible/expected. Detailed Changes ================ * Disable the `collectEntryPointUniformParams` pass for CUDA, so that entry-point `uniform` parameters are not bundles into a single `struct` and/or `ConstantBuffer`. * When targeting CUDA, disable the logic for generating an entry-point parameter for passing in the global shader parameter(s) * Allow `CLikeSourceEmitter` subclasses to override the name generated for entry-point symbols, and use this to add the required prefix for each OptiX kernel type when translating a ray-tracing kernel. * Add logic to emit "parameter groups" in a specialized way for CUDA (this is the same approach that allows us to generate `cbufffer { ... }` declarations for fxc). A global-scope parameter group will turn into a global `__constant__` variable called `SLANG_globalParams` (that name becomes part of the ABI for Slang-compiled shaders). * Update the logic in `render-test` for loading and invoking CUDA kernels to handle the new policy. The last bullet there merits expansion, since it is indicative of the work a client using Slang would have to go through to use our generated kernels with the new policy: * When loading a CUDA module with one or more kernels, we also use `cuModuleGetGlobal` to query the address of the `SLANG_globalParams` symbol in that CUDA module. That pointer needs to be used when setting global parameter values to be used by kernels in that CUDA odule. * Because our existing `BindPoint` logic for CUDA always sets up parameter data in GPU memory, we end up having to copy the entry-point parameter data from GPU memory to host memory. This step would ideally be skipped in a codebase that understands the correct policy, but it is a bit unfortunate that it is no longer trivially correct for an application to store all parameter data in GPU memory. * Before invoking the kernel, we need to use a `cudaMemcpyAsync` to copy from the prepared GPU memory for global parameters over to the `SLANG_globalParams` symbol associated with the kernel to be invoked. Because this operations is issued on the same CUDA stream as the kernel call, it is guaranteed to not overlap with GPU kernel execution. * When invoking the kernel, we take advantage of the seldom-used `CU_LAUNCH_PARAM_BUFFER_POINTER` facility to specify a contiguous memory region with all the entry-point parameters in it instead of passing each entry-point parameter separately. Given Slang reflection it is also possible to query the offset of each entry-point parameter in the buffer, so we could invoke the kernel in the traditional fashion as well. The choice here is up to the application. Caveats ======= * This is a breaking change, and any subsequent release will need to reflect that fact. Any customers who rely on Slang's current CUDA codegen strategy are likely to be surprised by this change, and I don't see an easy way to give them a more gentle transition. * This change does not remove the logic that introduces a `KernelContext` type for code that requires it. That means that things like `static` global variables can continue to work on CUDA for now, but we know that those are not going to be something we can support in the long-term with separate compilation. * While the policy implemented in this change is a reasonable default, it is still not going to perfectly match expecations for some developers. In particular, some developers who are familiar with both D3D and CUDA will likely wonder why a global `cbuffer` in Slang translates to a global-memory pointer in the output CUDA instead of one global `__constant__` variable per `cbuffer`. A more detailed alternate translation would generate a distinct global `__constant__` variable for each top-level constant buffer or parameter block. We may need to refine the translation even more based on feedback from users who care about how we handle global-scope parameters. * Recent changes in Slang have broken the logic that handles the OptiX "shader record" as an alternative mechanism for passing entry-point parameters. In order to get any level of OptiX support up and running we will have to change the IR passes that run on CUDA kernels to actually run the "collection" of `uniform` parameters for ray tracing stages, and then to replace references to the resulting parameter with a call to the function to access the shader record. * The use of `SLANG_globalParams` here works well enough in the case of whole-program compilation; every `CUmodule` ends up with (zero or) one parameter with this name, and an application can just hard-code it. As a mechanism it wouldn't work in the presence of separately-compiled modules that might introduce their own global parameters (including cases like constant lookup tables that really want to be at the global scope). An alternative approach would have Slang generate output PTX for each module, where a module has an optional global symbol for its own global-scope parameters (with a mangled name that is based on the module name), and then a linked CUDA binary has all of those distinct symbols. Such an approach would be compatible with module-at-a-time reflection and parameter binding, but would lead to another breaking change down the line for code that switches to `SLANG_globalParams`.	28 July 2020, 22:14:31 UTC
dce1d35	Tim Foley	28 July 2020, 14:55:41 UTC	Fix support for nested generic intrinsics (#1462) The logic that detects intrinsic functions during emit was not able to properly detect an intrinsic generic method nested in a generic type. The basic problem was that this led to a `specialize(specialize(...), ...)` in the IR, which wasn't being handled (only one level of `specialize` was handled). The fix is local and simple. The larger issue was that the author of this commit had thought our IR ruled out nested generics like this, when in fact that is precisely how we handle nested generics throughout the IR. This oversight/misunderstanding means that we might have broken passes in other places that assume nested generics cannot happen. This change doesn't pretend to fix that other issue, but we should pay attention to it.	28 July 2020, 14:55:41 UTC
348058f	Dietrich Geisler	27 July 2020, 16:14:17 UTC	Baseline Heterogeneous Example (#1460) * Baseline Heterogeneous Example This PR introduces a baseline heterogeneous example, including both a Slang file and an associated C++ helper file. This refactoring primarily moves the Slang file "into the driver's seat" while maintaining that the C++ side still does most of the actual work. * Fix to prelude path	27 July 2020, 16:14:17 UTC
87940a6	Tim Foley	25 July 2020, 01:12:41 UTC	Fix bugs related to mutating implementations of interface methods (#1461) There are two main bug fixes here: * We were failing to diagnose when code calls a `[mutating]` method on a value that doesn't support mutation (that is an r-value instead of an l-value). * We had a bug in the synthesis logic for interface requirements where we used the result type of the requirement in place of each of the parameter types. The second bug made synthesis often produce incorrect signatures with `void` parameters. The first bug meant that even though a `[mutating]` method should not be able to satisfy a non-`[mutating]` method (and we had code to enforce this for the "exact match" case), when we go on to try and synthesize a non-`[mutating]` method that satisfies the requirement by delegating to the user-written one, it would end up succeeding, because nothing was stopping a non-`[mutating]` method from calling a `[mutating]` one. In each case this code adds a fix and a test case to confirm it.	25 July 2020, 01:12:41 UTC
261fe75	Yong He	24 July 2020, 23:37:51 UTC	Ensure labels are dumped in `lower-to-ir` (#1459) * Ensure labels are dumped in `lower-to-ir`. There is a `dumpIR` function that accepts a label parameter already in slang-emit.cpp. This change moves it to slang-ir.cpp so it may be called from other files. * update expected test result Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 July 2020, 23:37:51 UTC
17d0da2	jsmall-nvidia	24 July 2020, 20:41:43 UTC	Enable CUDA for active-mask tests. (#1458) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 July 2020, 20:41:43 UTC
eaf3f04	Tim Foley	24 July 2020, 19:07:39 UTC	Handle case of no global parameters for CPU/CUDA (#1457) The IR pass that introduces an explicit `KernelContext` for the CPU/CUDA back-ends was also responsible for adding an explicit parameter to the kernel entry point to receive the constant buffer (pointer) with all the global uniform parameters. However, if there were no global uniform parameters, this parameter wasn't getting introduced, which changed the signature/ABI of the generated entry point function. This change makes it so that the pass unconditionally adds a parameter. In the case where there are no global uniforms it just adds a `void*` parameter that never gets used. In order to avoid future regressions, this change also adds a test case to confirm that things work correctly when a kernel has only entry-point parameters and no global parameters.	24 July 2020, 19:07:39 UTC
ef9d76c	Yong He	24 July 2020, 17:18:22 UTC	`InterlockedAdd` CPU intrinsic implementation. (#1455) Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 July 2020, 17:18:22 UTC
cb0a08b	jsmall-nvidia	24 July 2020, 15:12:58 UTC	Test frame work improvements (#1452) * Add -hide-ignored Made API filter when enbled filter out non API tests. * Add ability to set categories at file level. Added wave, wave-mask and wave-active categories. * Added -api-only flag. * Don't synthesize tests from only CPU tests. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 July 2020, 15:12:58 UTC
7e952cd	Dietrich Geisler	24 July 2020, 05:50:53 UTC	CPU/GPU Compute Shader Example (#1451) * CPU/GPU Compute Shader Example This PR introduces an example to run a simple compute shader on the GPU in the heterogeneous-hello-world example. All loading code is currently run in C++, so the heterogeneity of this example is still a work in progress. This change updates exactly this example, and so should not cause issues elsewhere in the codebase. * Small fix * Added gfx to help the linker * Added back the struct * Updated premake to respect windows conditions * Completely removed het-example * Re-added example Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 July 2020, 05:50:53 UTC
61be38f	Yong He	23 July 2020, 22:33:04 UTC	Run array specialization in a sperate pass. (#1449) * Run array specialization in a sperate pass. * rename specializeFunctionCall->specializeFunctionCalls Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	23 July 2020, 22:33:04 UTC
fed4292	Yong He	23 July 2020, 20:47:12 UTC	Run SSA pass to clean up temporary variables during generics lowering. (#1447) * Run SSA pass to clean up generic temporary variables during lowering. * Fix `undefined` emitting logic. * revert dumpir control flag * Defer fold decision of `undefined` values after special case logic for GLSL and HLSL. * Update expected test result. * Manually update raygen.slang.glsl to minimize change. * fix formatting Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	23 July 2020, 20:47:12 UTC
e93d3a4	Tim Foley	23 July 2020, 16:20:28 UTC	Fix the way extension declarations are cached for lookup (#1450) During semantic checking, the compiler used to link together `ExtensionDecl`s into a singly-linked list dangling off of the `AggTypeDecl` that they applied to. This approach made lookup relatively easy, because given a `DeclRef` to an `AggTypeDecl` one could easily find and walk the list of candidate extensions. Unfortunately, the simple approach has two major strikes against it: * First, as we recently ran into, it creates a lifetime/ownership problem, in cases where the `ExtensionDecl` is outlived by the `AggTypeDecl` it applies to. This creates the one and only place in the compiler today where an "old" AST node might point to a "new" AST node, and it resulted in use-after-free problems in client code. * Second, the scoping of `extension`s ends up being completely wrong. All of the `extension` methods on a type end up being visible in all cases, instead of just in the context of modules where the `extension` itself is visible. The comparable feature in C# (static extension methods) is careful to not make scoping mistakes like this. The Swift langauge has loose scoping for `extension` more akin to what we have in Slang today, but the maintainers seem to consider it a misfeature. This change attempts to clean up both issues by changing the way that extension declarations are stored. There are two main pieces: 1. The primary "source of truth" for extension lookup has been moved to the `ModuleDecl`, where a module is responsible for storing a cache of the extensions declared within that module (keyed by the declaration of the type being extended). This cache is updated at the same point where the old code would mutate the AST node being depended on. 2. A secondary aggregated cache is added to the `SharedSemanticsContext` used during semantic checking. This cache includes entries from across multiple modules, and is intended to be invalidated and rebuilt on demand if new modules are added during checking. Access to the candidate extensions has now been put behind subroutines that require a semantics-checking context to be passed in (there was always one available in contexts that care about extensions). In addition, the operation for looking up members including those from extensions was refactored heavily to involve internal rather than external iteration and, more importantly, was changed so that it actually tests whether the `ExtensionDecl`s it loops over apply to the type in question, rather than blindly letting extensions members be looked up in ways that don't make sense. There are three test cases added here to confirm aspects of the fix: * First, I added a test that reproduces the crash that was being seen, so that we have a regression test for the fix. * Second, I added a basic semantic-checking test to confirm that an `extension` from an `import`ed module is still visible/usable, to confirm that I didn't break existing valid uses of extensions. * Third, I added a diagnostic test that ensures that we correctly ignore extensions that should not be visible in a given context as a result of `import` declarations. Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>	23 July 2020, 16:20:28 UTC
cf50355	jsmall-nvidia	23 July 2020, 13:37:58 UTC	Fix for vulkan tests failing (#1456) * Clean up device when VKRenderer dtor is run. Added destroy methods to VulkanSwapChain & VulkanDeviceQueue * Small fixes around testing if DeviceQueue is valid. * Disable active-mask tests. Different drivers appear to change the results.	23 July 2020, 13:37:58 UTC
1159204	Dietrich Geisler	20 July 2020, 18:53:23 UTC	Multiple Entry Point Backend (#1437) * Multiple Entry Point Backend This PR introduces changes to the IR linking, emitting, and options for multiple entry points. Specifically, this PR updates several locations to support a (potentially empty) list of entry points, adding list infrastructure and looping over entry points as appropriate. * Formatting change * Updated unknown target case to not require an entry point * Formatting and list consts updates Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	20 July 2020, 18:53:23 UTC
975c5db	jsmall-nvidia	17 July 2020, 20:38:18 UTC	Disable specializing function calls if they have a struct param, that contains an array (#1448) * This code is disabled, it was part of the optimization `Specialize function calls involving array arguments. (#1389)` on github. It is disabled here because it causes a problem when a struct is passed to a function that contains a structured buffer and an array. It is specialized on the struct type, and so those types become parameters to the function. If the struct contains a structured buffer this is a problem on GLSL/VK based targets because currently structured buffers cannot be function parameters. The fix for now is to just disable this optimization. * Fix typo in name of test expected values.	17 July 2020, 20:38:18 UTC
ee75558	jsmall-nvidia	16 July 2020, 20:55:31 UTC	Running generators as separate premake project step (#1441) * Put the running of generators into a separate project, to try and sure the generated products are available for other dependencies when compiling with multiple threads on linux. * Made paths Strings in slang-generate. Made paths use / for path separators (rather than \ on windows which causes some problems with #line). * Make the run-generators proj a utility step. * Made run-generators a StaticLib. * Fix problem with generating when not necessary. * Trying to get abspath to work on linux. * Add run-generator-main.cpp dummy file. * Add comment about the issues around linux and correct build triggering. * Add updated projects. * Remove the run-generators-main.cpp as no longer needed for 'run-generators' tool. Removed the adding of files by default from baseSlangProject Made the run generators project use slang-string.cpp as the file it builds from core. * Add the run-generators VS project.	16 July 2020, 20:55:31 UTC
62079c5	Yong He	16 July 2020, 20:09:17 UTC	Support associatedtype local variables and return values in dynamic dispatch code (#1444) * Refactor lower-generics pass into separate subpasses. * IR pass to generate witness table wrappers. * Support associatedtype local variables and return values in dynamic dispatch code.	16 July 2020, 20:09:17 UTC
5758d16	Yong He	15 July 2020, 19:48:56 UTC	IR pass to generate witness table wrappers. (#1443) * Refactor lower-generics pass into separate subpasses. * IR pass to generate witness table wrappers. * Re-generate vs project files. * Fix x86 build error.	15 July 2020, 19:48:56 UTC
e9d5ecb	Yong He	15 July 2020, 18:39:11 UTC	Refactor lower-generics pass into separate subpasses. (#1442)	15 July 2020, 18:39:11 UTC
723c9b1	Tim Foley	15 July 2020, 16:31:27 UTC	Remove KernelContext wrapper from CPU/CUDA emit (#1440) * Remove KernelContext wrapper from CPU/CUDA emit Currently, the CPU and CUDA C++ targets rely on a `KernelContext` type that is generated during emit, as a way to provide implicit access to things that were global in the input Slang code, but that can't actually be emitted as globals in the target language (because the semantics of global declarations differ). For example, input like: ```hlsl ConstantBuffer<Stuff> gStuff; // shader parameter groupshared int gData[1024]; // thread-group shared variable static int gCounter = 0; // "thread-local" global-scope variable void subroutine() { ... } [shader("compute")] void computeMain() { ... } ``` would translate to output C++ for CPU a bit like: ```c++ struct KernelContext { ConstantBuffer<Stuff> gStuff; int gData[1024]; int gCounter = 0; void subroutine() { ... } void computeMain() { ... } }; ``` Note that both `computeMain()` and `subroutine()` are non-`static` members functions on `KernelContext`, so they have an implicit `this` parameter of type `KernelContext`, which allows the bodies of those functions to implicitly reference `gStuff`, etc. by name in their bodies. Because `KernelContext::computeMain()` is a member function, we end up emitting an additional global-scope function to expose the entry point to the outside world, and that function is responsible for declaring a local `KernelContext` and invoking the generated entry point on it. This approach has several important drawbacks: * It complicates the emit logic for CPU and CUDA, with many special cases around when/how things get emitted * It complicates the implementation of dynamic dispatch, because what seems like a function pointer in Slang IR needs to be a pointer-to-member-function in C++. * It makes it difficult to have a non-kernel-oriented mode of compilation for CPU where a Slang function with a given signature gets output as a C++ CPU function with the "same" signature (not wrapped up as a member function of `KernelContext`. This change makes a step toward addressing these issues by making the introducing of the `KernelContext` type be something that is done in an explicit IR pass instead of being handled as part of the last-mile emit logic. The most important change is the removal of code related to `KernelContext` from the `slang-emit-{cpp,cuda}.{h,cpp}` files, with the equivalent logic instead being handled in a new pass in `slang-ir-explicit-global-context.{h,cpp}`. It should be noted that further cleanups to the emit logic should now be possible; in particular, both the CPU and CUDA emit paths are manually sequencing the `EmitAction`s instead of relying on the default logic, but at this point they should be able to just use the default. The additional cleanups are left for future work. The explicit IR pass does more or less what one would expect: it identifies global-scope entities (global variables and parameters) that need to be wrapped and turns them into fields of a `KernelContext` type. It then modifies all entry points to initialize a `KernelContext` as part of their startup. Finally, any code that used to refer to the global entities is changed to refer to a field of the context, with the context passed via new function parameters (the new parameter is only added to functions that need it for now). Transforming global variables into fields of a `KernelContext` type in the IR pass ends up dropping their initial-value expressions (since those were attached as basic blocks on the `IRGlobalVar`). To avoid breaking code that relies on global-scope (but thread-local) variables, this change also adds an explicit pass that takes the initialization logic on all global variables and moves it to explicit logic that runs at the start of every entry point in a linked module (`slang-ir-explicit-global-init.{h,cpp}`). This pass would also be useful when we get back to direct SPIR-V emit, since SPIR-V also requires initialization logic for globals to be emitted into entry points. One complication that arises when the IR is introducing the types for entry-point parameters, global-scope parameters, and the `KernelContext` type is that it becomes harder for the emit logic to utter the names of those types (they might not even have names, since `IRNameHint`s might get stripped). This created a problem since the wrapper operations that were being generated for CPU were taking `void` parameters and casting them to the appropriate type. To work around this issue, we have added an explicit IR pass (`slang-ir-entry-point-raw-ptr-params.{h,cpp}`) that transforms the signature of entry points so that any pointer parameters instead become raw pointer (`void`) parameters, with the casting being handled inside the entry point itself. One consequence of all the above changes is that for the CUDA target we no longer need a wrapper function to invoke the generated entry point any more, because the IR function for the entry point ends up having the correct/expected signature already. This is also the case for CPU when it comes to the `_Thread` wrapper function, but this change doesn't try to eliminate the wrapper because of a belief that the `_Thread`-level interface is going away anyway. Because the IR is now responsible for ensuring the signature of the IR entry point for CUDA and CPU is what is expected, I needed to modify the `slang-ir-entry-point-uniforms` pass to always create an explicit parameter for the entry point uniforms when compiling for CUDA/CPU, even if there were no `uniform` parameters on the entry point as written. This also ended up requiring some tweaks to the parameter layout logic to ensure that CPU/CUDA targets always treat `ConstantBuffer<T>` as a `T` even in the case where `T` is an empty `struct` type (which happens when we construct a `struct` type to represent the uniform parameters of an entry point with no uniform parameters...). There are several future changes that can/should build on this work: We should change the generated signatures for CUDA kernels, so that they don't rely on `KernelContext` for global-scope parameters. At that point we can avoid generating a `KernelContext` at all for CUDA, except when a program uses global-scope thread-local variables. * We should figure out how to make the "ABI" for dynamic-dispatch calls ensure that the kernel context is either always passed, or always not passed. Making a hard-and-fast rule as part of the calling convention for dynamic calls would ensure that they access through the context continues to work with dynamic calls (this change might break it in some cases). * We should figure out how to handle the layout for the `KernelContext` in cases where a program is composed of multiple separately-compiled modules. Right now the layout of the `KernelContext` requires global knowledge (as does the pass that introduces explicit initialization for global-scope thread-locals). * We should try to further clean up the CPU/CUDA C++ emit logic to fall back on the default emit behavior more, now that the various special-case approaches that were taken are no longer needed * fixup: restore build files to default configuration	15 July 2020, 16:31:27 UTC
48f26ef	Yong He	13 July 2020, 22:16:09 UTC	Dynamic code gen for functions returning generic types. (#1439) * Dynamic code gen for functions returning generic types. * Add expected test result.	13 July 2020, 22:16:09 UTC
249f48d	Tim Foley	10 July 2020, 21:30:57 UTC	CUDA/CPU varying compute inputs as IR pass (#1438) The main change here is that the CPU and CUDA C++ emit paths now rely on an earlier IR pass to legalize the varying parameter list of a kernel and translate references to varying parameters with semantics like `SV_DispatchThreadID`. Doing so removes a lot of special-case logic from the emit passes. This work moves us even closer to being able to eliminate `KernelContext` from the CPU/CUDA emit logic, because it removes the issue of state related to varying inputs being stored in `KernelContext`. The new pass that handles the legalization is in `slang-ir-legalize-varying-params.cpp`, and it borrows heavily from the existing `slang-ir-glsl-legalize.cpp` pass. The new pass factors out the target-independent and target-dependent logic, so that both CPU and CUDA can share much of the same code despite having very different rules for how the system-value parameters are being provided. An eventual goal is to have the new pass also handle the GLSL case, but doing so requires copying even more logic out of the GLSL-specific pass, and doing so seemed like a step to far for what was meant to be a stepping-stone change as part of other work. As a result of the incomplete nature of the pass, certain cases don't work for compute shader inputs for CPU/CUDA (e.g., wrapping your varying inputs in a `struct` type parameter), but those were cases that also didn't work in the existing `emit`-based logic. One major consequence of this change is that the logic for emitting the various different functions that represent an entry point for our CPU back-end has been streamlined and simplified. The original logic had a fair bit of cleverness built in to try and avoid unnecessary math ops when computing the various IDs/indices, while the new logic is much more simplistic (the main dispatch function loops over threadgroups with a triply-nested `for` and then delegates to the group-level function loops over threads with its own nested `for`s). Longer term, it will be important to simplify the CPU functions we emit further, by eliminating things like the `_Thread` function that should never really be exposed to users (the minimum granularity of invoking a CPU compute kernel should be a single threadgroup). We may eventually decide to synthesize all of the extra code that is being generated in the `emit` pass as IR instead.	10 July 2020, 21:30:57 UTC
6aad38a	Tim Foley	10 July 2020, 18:14:11 UTC	Fix a preprocessor bug affecting X-macros (#1436) * Fix a preprocessor bug affecting X-macros Fixes #1435 This bug exhibited as nondeterministic output from the preprocessor in release builds, but using a debug build it was narrowed down to a use-after-free issue. The core problem is subtle, but relates to how we set up the linked list that represents the "busy" status of macros in a particular expansion environment. Consider this scenario: ```hlsl X(A) ``` The flow we expect from the preprocessor is something like: 1. Read the `X` token in `X(A)` and recognize the start of a function-like macro invocation. Create an expansion environment for `X`, with the global environment as a parent, read in the arguments (just `A`), and push that expansion onto the stack. 2. Read the `M` token that starts the expansion of `M`, and recognize it as an invocation of the object-like macro representing the argument `M`. Create an expansion environment for the definition of `M` (which is just `A`), and push it onto the stack. 3. Read the token `A` from the expansion for the argument `M`, and recognize it as an invocation of the function-like macro `A`. Create an expansion environemnt for `A`, with the current environment as its parent, read in the arguments (just `0`), and push that expansion onto the stack. 4. Read the token `y` from the expansion for `A`, and recognize it as an invocation of the object-like macro representing the argument `y`. Create an expansion environment for the definition of `y` (which is just `0`) and push it onto the stack. 5. Read `0`. 6. Read a bunch of end-of-file tokens that cause all of these expansions to be popped. That all looks fine as written, but the gotcha is that the input stream for the expansion in step (2) is only a single token (`A`), which means that during step (3) the current input stream at the time we create the macro expansion for `A` is at the end of its input, and by the time we've read in the macro arguments that expansion will have been popped. The problem, then, is that the logic for setting up the stack of "busy" macros was being performed at the beginning of the expansion (the part referred to as "create an expansion" above), when it should only have been set up as part of pushing the xpansion onto the stack (since at that point we have a guarantee that the parent expansion cannot be popped until the child expansion has been). The fix here is thus pretty simple: we already have distinct operations for `initializeMacroExpansion()` and `pushMacroExpansion()`, and I simply moved the logic for setting up the "busy" state from the former to the latter. * fixup: typo	10 July 2020, 18:14:11 UTC
2503280	Yong He	10 July 2020, 16:13:50 UTC	Dynamic code gen for generic local variables. (#1434) * Dynamic code gen for generic local variables. * Fixes to function calls with generic typed `in` argument. * Fixes per code review comments	10 July 2020, 16:13:50 UTC
a5a67aa	Yong He	08 July 2020, 20:53:10 UTC	Checkin .clang-format and an example file for discussion (#1373) * Checkin .clang-format and an example file for discussion * Update clang-format settings per discussion comments * update .clang-format * Move .clangformat file to extras/ folder Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	08 July 2020, 20:53:10 UTC
9590948	Tim Foley	08 July 2020, 20:52:40 UTC	Add support for global uniform shader parameters (#1433) * Adding support for global uniform shader parameters This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope: ```hlsl uniform float gScaleFactor; void main() { ... = gScaleFactor; ... } ``` The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of: ```hlsl struct GlobalParams { float gScaleFactor; } cbuffer globalParams { GlobalParams globalParams; } void main() { ... = globalParams.gScaleFactor; ... } ``` The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation). The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic. A more detailed guide to the changes included follows: * The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters). * To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module. * Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR). * Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T`). Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing. * Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels. * Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step. * Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level. * Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding. * Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information. An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking. There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information. Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation or we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases. * fixup: missing files * fixup: comment the new code	08 July 2020, 20:52:40 UTC
cfb41bb	Dietrich Geisler	07 July 2020, 21:46:02 UTC	Public Keyword for Functions (#1432) This PR introduces support for the public modifier for functions. This keyword allows labelled functions to be written to the compiled without having a link to an entry point. The goal of this change is to help support heterogeneous design of Slang by permitting C++ code to interact with CPU slang functions. Internally, this PR adds the public decoration to the IR and defines a lowering from the public modifier in the AST to this decoration. Additionally, the Keep Alive decoration is added to any public modifier being lowered, which prevents DCE from eliminating functions labelled with the public keyword. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	07 July 2020, 21:46:02 UTC
f8cc28c	Yong He	07 July 2020, 20:25:47 UTC	Add a test case for dynamic dispatch with `This` type in interface decl. (#1431) * Add a test case for dynamic dispatch with `This` type in interface decl. * Update comments * fix typo in comments Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	07 July 2020, 20:25:47 UTC
1301f6b	Dietrich Geisler	07 July 2020, 15:57:56 UTC	Multiple Entry Point Cleanup (#1427) * Multiple Entry Point Cleanup This commit provides some in-code cleanup of the previous multiple entry point PR (#1411). Specifically, this PR provides refactoring of multiple entry point functions into helper functions, the removal of the EntryPointAndIndex struct, and various stylistic improvements. * Minor updates Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	07 July 2020, 15:57:56 UTC
cf62f13	Yong He	06 July 2020, 18:58:14 UTC	ShortList<T> and core.natvis improvements. (#1430) * ShortList<T> and core.natvis improvements. * Fix gcc build. * add `getBuffer()` accessor to `GetArrayViewResult`	06 July 2020, 18:58:14 UTC
ffd0b9c	Yong He	03 July 2020, 19:37:17 UTC	Emit pointers for CPU target. (#1418) Co-authored-by: Yong He <yhe@nvidia.com>	03 July 2020, 19:37:17 UTC
dfc9100	jsmall-nvidia	02 July 2020, 20:32:16 UTC	Bug fix in C++ extractor (#1429) * Fix bug from change in diagnostics. * Catch exceptions and display a message on problem with C++ extractor.	02 July 2020, 20:32:16 UTC
5fc0185	Tim Foley	02 July 2020, 18:45:59 UTC	Attempt to silence some warnings (#1428) * Attempt to silence some warnings This is an attempt to change code in `slang-ast-serialize.cpp` so that it doesn't trigger a warning(-as-error) in one of our build configurations. The original code is fine in terms of expressing its intent, so the right answer might actually be to silence the warning. * fixup: make sure to actually initialize	02 July 2020, 18:45:59 UTC
54675a3	jsmall-nvidia	02 July 2020, 13:09:52 UTC	Only call m_api functions if m_api has been validly set on dtor of VulkanDeviceQueue. (#1426)	02 July 2020, 13:09:52 UTC
6cbb88f	jsmall-nvidia	01 July 2020, 21:44:46 UTC	Disable dynamic dispatch tests on CUDA - as fails with exception about unhandled op. (#1425)	01 July 2020, 21:44:46 UTC
8c33e7b	jsmall-nvidia	01 July 2020, 20:11:56 UTC	Ignore tests that don't have all the rendering APIs they require available. (#1419)	01 July 2020, 20:11:56 UTC
5c15329	jsmall-nvidia	01 July 2020, 18:20:42 UTC	Fix bug in slang-dxc-support where it didn't get the source path correctly (#1420) * Fix handling of UniformState from #1396 * * Fix bug in slang-dxc-support where it didn't get the source path correctly * Make entryPointIndices const List<Int>&	01 July 2020, 18:20:42 UTC
69a0595	jsmall-nvidia	01 July 2020, 16:45:02 UTC	Fix handling of UniformState from #1396 (#1417)	01 July 2020, 16:45:02 UTC
8ced9d2	Tim Foley	30 June 2020, 19:25:27 UTC	Clean up unused code for IR object ownership (#1416) There was a small but non-trivial amount of code across `IRModule`, the `ObjectScopeManager`, and `StringRepresentationCache` that had to do with managing the lifetimes of `RefObject`s that might be referenced by IR instructions (and thus need to be kept alive for the lifetime of the IR module). We have long since migrated to a model where IR instruction do not include owned references to `RefObject`s, so these facilities weren't actually needed. This streamlines `IRModule`'s declaration, and trims code that we aren't actually using. One note for the future is that the `StringRepresentationCache` no longer does what its name implies (it is not a cache of `StringRepresentation`s), so we should consider giving it a more narrowly scoped name. I didn't include that in this change because I wanted to keep the diffs narrow and easy to review. A follow-on renaming change should be trivial if/when we can agree on what the type should be called at this point. Alternatively, we could simply bake the functionality of `StringRepresentationCache` into he IR deserialiation logic itself, since that is the only code using it.	30 June 2020, 19:25:27 UTC
dc44b08	Tim Foley	30 June 2020, 17:01:09 UTC	Initial work on property declarations (#1410) * Initial work on property declarations Introduction ============ The main feature added here is support for `property` declarations, which provide a nicer experience for working with getter/setter pairs. If existing code had something like this: ```hlsl struct Sphere { float4 centerAndRadius; // xyz: center, w: radius float3 getCenter() { return centerAndRadius.xyz; } void setCenter(float3 newValue) { centerAndRadius.xyz = newValue; } // similarly for radius... } void someFunc(in out Sphere s) { float3 c = s.getCenter(); s.setCenter(c + offset); } ``` It can be expressed instead using a `property` declaration for `center`: ```hlsl struct Sphere { float4 centerAndRadius; // xyz: center, w: radius property center : float3 { get { return centerAndRadius.xyz; } set(newValue) { centerAndRadius.xyz = newValue; } } // similarly for radius... } void someFunc(in out Sphere s) { float3 c = s.center; s.center = c + offset; } ``` The benefits at the declaration site aren't that signficiant (e.g., in the example above we actually have slightly more lines of code), but the improvement in code clarity for users is significant. Having `property` declarations should also make it easier to migrate from a simple field to a property with more complex logic without having to first abstract the use-site code using a getter and setter. An important future benefit of `property` syntax will be if we allow `interface`s to include `property` requirements, and then also allow those requirements to be satisfied by ordinary fields in concrete types. Subscripts ---------- The Slang compiler already has limited (stdlib-use-only) support for `__subscript` declarations, which are conceptually similar to `operator[]` from the C++ world, but are expressed in a way that is more in line with `subscript` declarations in Swift. A `SubscriptDecl` in the AST contains zero or more `AccessorDecl`s, which correspond to the `get` and `set` clauses inside the original declaration (there is also a case for a `__ref` accessor, to handle the case where access needs to return a single address/reference that can be atomically mutated). A major goal of the implementation here is to re-use as much of the infrastructure as possible for `__subscript` declarations when implementing `property` declarations. Nonmutating Setters ------------------- One additional thing added in this change is the ability to mark a `set` accessor on either a subscript or a property as `[nonmutating]`, and indeed all of the existing `set` accessors declared in the stdlib have been marked this way. The need for this modifier is a bit subtle. If we think about a typical subscript or property: ```hlsl struct MyThing { int f; property p : int { get { return f; } set(newValue) { f = newValue; } } } ``` it is clear we want the `set` accessor to translate to output HLSL as something like: ``` void MyThing_p_set(inout MyThing this, int newValue) { this.f = newValue; } ``` Note how the implicit `this` parameter is `inout` even though we didn't mark anything as `[mutating]`. This is the obvious thing a user would expect us to generate given a property declaration. Now consider a case like the following: ```hlsl struct MyThing { RWStructuredBuffer<int> storage; property p : int { get { return storage[0]; } set(newValue) { storage[0] = newValue; } } } ``` This new declaration doesn't require (or want) an `inout` `this` parameter at all: ``` void MyThing_p_set(MyThing this, int newValue) { this.storage[0] = newValue; } ``` In fact, given the limitations in the current Slang compiler around functions that return resource types (or use them for `inout` parameters), we can only support a `set` operation like this if we can ensure that the `this` parameter is considered to be `in` instead of `inout`. This is exactly the behavior we allow users to opt into with a `[nonmutating] set` declaration. All of the subscript operations in the stdlib today have `set` accessors that don't actually change the value of `this` that they act on (e.g., storing into a `RWStructuredBuffer` using its `operator[]` doesn't change the value of the `RWStructuredBuffer` variable -- just its contents). We'd gotten away without this detail so far just because `set` accessors were only being declared in the stdlib and they were all implicitly `[nonmutating]` anyway, so it never surfaced as an issue that the code we generated assumed a setter wouldn't change `this`. Implementation ============== Parser and AST -------------- Adding a new AST node for `PropertyDecl` and the relevant parsing logic was mostly straightforward. The biggest change was allowing a `set` declaration to introduce an explicit name for the parameter that represents the new value to be set. This change also adds a `[nonmutating]` attribute as a dual to `[mutating]`, for reasons I will get to later. Semantic Checking ----------------- The `getTypeForDeclRef` logic was updated to allow references to `property` declarations. Some of the semantic checking work for subscripts was pulled out into re-usable subroutines to allow it to be shared by `__subscript` and `property` declarations. The checking of accessor declarations, which sets their result type based on the type of the outer `__subscript` was changed to also handle an outer `property`. Some special-case logic was added for checking of `set` declarations to make sure that their parameter is given the expected type. Some logic around deciding whether or not `this` is mutable had to be updated to correctly note that `this` should be mutable by default in a `set` accessor, with an explicit `[nonmutating]` modifier required to opt out of this default. (This is the inverse of how a typical method or `get` accessor works). IR Lowering ----------- The good news is that after IR lowering, access to properties turns into ordinary function calls (equivalent to what hand-written getters and setters would produce), so that subsequent compiler steps (including all the target-specific emit logic) doesn't have to care about the new feature. The bad news is that adding `property` declarations has revealed a few holes in how IR lowering was handling `__subscript` declarations and their accessors, so that it didn't trivially work for the new case as-is. The IR lowering pass already has the `LoweredValInfo` type that abstractly represents a value that resulted from lowering some AST code to the IR. One of the cases of `LoweredValInfo` was `BoundSubscript` that represented an expression of the form `baseVal[someIndex]` where the AST-level expression referenced a `__subscript` declaration. The key feature of `BoundSubscript` is that it avoided deciding whether to invoke the getter, the setter, or both "too early" and instead tried to only invoke the expected/required operations on-demand. This change generalizes `BoundSubscript` to handle `property` references as well, so it changes to `BoundStorage`. Making the type handle user-defined property declarations required fixing a bunch of issues: * When building up argument lists in the IR, we need to know whether an argument corresponds to an `in` or an `out`/`inout` parameter, to decide whether to pass the value directly or a pointer to the value. Some of the logic in the lowering pass had been playing fast and loose with this, so this change tries to make sure that whenever we care computing a list of `IRInst` that represent the arguments to a call we have the information about the corresponding parameter. Similarly, when emitting a call to an accessor in the IR, the information about the expected type of the callee was missing/unavailable, and the code was incorrectly building up the expected type of the callee based on the types of the arguments at the call site. The logic has been changed so that we can extract the expected signature of an accessor (how it will be translated to the IR) using the same logic that is used to produce the actual `IRFunc` for the accessor (so hopefully both will always agree). * Dealing with `in` vs. `inout` differences around parameters means also dealing with the "fixup" code that is used to assign from the temporary used to pass an `inout` argument back into the actual l-value expression that was used. That logic has all been hoisted out of the expression visitor(s) and into the global scope. Future Work =========== The entire approach to handling l-values in the IR lowering pass is broken, and it is in need a of a complete rewrite based on new first-principles design goals. While something like `LoweredValInfo` is decent for abstracting over the easy cases of r-values, addresses, and a few complicated l-value cases like swizzling, it just doesn't scale to highly abstract l-values like we get from `__subcript` and `property` declarations, nor other corner cases of l-values that we need to handle (e.g., passing an `int` to an `inout float` parameter is allowed in HLSL, and performs conversions in both directions!). It Should be Easy (TM) to extend the logic that tries to synthesize an interface conformance witness method when there isn't an exact match to also support synthesizing a property declaration (plus its accessors) to witness a required property when the type has a field of the same name/type. * fixup: pedantic template parsing error (thanks, clang!) * fixup: cleanups and review feedback * Removed some `#ifdef`'d out code from merge change * Added proper diagnostics for accessor parameter constraints, which led to some fixes/refactorings * Added a test case for the accessor-related diagnostics	30 June 2020, 17:01:09 UTC
47b43f8	Dietrich Geisler	29 June 2020, 21:42:12 UTC	Backend for Multiple Entry Points (#1411) * Backend for Multiple Entry Points Introduces the basic backend on the compiler for zero or more entry points. Entry points have been extended to lists for several functions, with loopFunctions have been extended to take in entry points and indices as appropriate, to allow for multiple entry points once the frontend is expanded. Several functions are currently being assumed to have a single entry point for simplicity and provide a work in progress commit. * Progress on debugging fixes * Tests passing * Refactored emitEntryPoints * Updated lists to be by constant reference * Fixes to formatting * Refactoring updates for the compiler * Fix for compilation errors * Reformatting * More reformatting * Moved struct around to help with compilation Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	29 June 2020, 21:42:12 UTC
3e8bdb6	Yong He	26 June 2020, 18:59:33 UTC	Merge pull request #1408 from csyonghe/dyndispatch2 Dynamic dispatch for generic interface requirements and `associatedtype`	26 June 2020, 18:59:33 UTC
4e44398	Yong He	26 June 2020, 17:20:01 UTC	Merge remote-tracking branch 'official/master' into dyndispatch2	26 June 2020, 17:20:01 UTC
d084f63	jsmall-nvidia	26 June 2020, 16:40:31 UTC	AST serialize improvements (#1412) * Try to fix problem with C++ extractor concating tokens producing an erroneous result. * Improve naming/comments around C++ extractor fix. * Another small improvement around space concating when outputing token list. * Handle some more special cases for consecutive tokens for C++ extractor concat of tokens. * WIP AST serialization. * Comment out so compile works. * More work on AST serialization. * WIP AST serialize. * WIP AST Serialization - handling more types. * WIP: Compiles but not all types are converted, as not all List element types are handled. * Compiles with array types. * Finish off AST serialization of remaining types. * Remove ComputedLayoutModifier and TupleVarModifier. * Add fields to ASTSerialClass type. * Construct AST type layout. * AST Serialization working for writing to ASTSerialWriter. * Removed call to ASTSerialization::selfTest in session creation. * Fixes for gcc. * Diagnostics handling - better handling of dashify. * Improve comment around DiagnosticLookup. * Updated VS project. * Write out as a Stream, taking into account alignment. * First pass at serializing in AST. * Added support for deserializing arrays. * Small bug fixes. * Fix problem calculating layout. Split out loading on entries. * Fix typo in AST conversion. * Add some flags to control AST dumping. * Fix bug from a typo. * Special case handling of Name* in AST serialization. * Special case handling of Token lexemes, make Names on read. * Documentation on AST serialization. * ASTSerialTestUtil - put AST testing functions. Fix typo that broke compilation. * Fix typo.	26 June 2020, 16:40:31 UTC
4cf7119	Yong He	26 June 2020, 02:56:39 UTC	Add a TODO comment for generic interface requirement key	26 June 2020, 02:56:39 UTC
dd88ba1	Yong He	26 June 2020, 02:03:51 UTC	Fixes	26 June 2020, 02:03:51 UTC
5b57195	Yong He	26 June 2020, 01:26:12 UTC	Fixes.	26 June 2020, 01:26:12 UTC
09c64ac	Yong He	25 June 2020, 23:29:39 UTC	Merge remote-tracking branch 'official/master' into dyndispatch2	25 June 2020, 23:29:39 UTC
218a39b	Yong He	25 June 2020, 23:29:07 UTC	remove ThisPointerDecoration, generate IRInterfaceType in one pass	25 June 2020, 23:29:07 UTC
509e36b	Yong He	25 June 2020, 21:01:33 UTC	Remove interfaceType operand from lookup_witness_method inst	25 June 2020, 21:01:33 UTC
892acc4	jsmall-nvidia	25 June 2020, 20:41:14 UTC	AST Serialize Reading (#1409) * Try to fix problem with C++ extractor concating tokens producing an erroneous result. * Improve naming/comments around C++ extractor fix. * Another small improvement around space concating when outputing token list. * Handle some more special cases for consecutive tokens for C++ extractor concat of tokens. * WIP AST serialization. * Comment out so compile works. * More work on AST serialization. * WIP AST serialize. * WIP AST Serialization - handling more types. * WIP: Compiles but not all types are converted, as not all List element types are handled. * Compiles with array types. * Finish off AST serialization of remaining types. * Remove ComputedLayoutModifier and TupleVarModifier. * Add fields to ASTSerialClass type. * Construct AST type layout. * AST Serialization working for writing to ASTSerialWriter. * Removed call to ASTSerialization::selfTest in session creation. * Fixes for gcc. * Diagnostics handling - better handling of dashify. * Improve comment around DiagnosticLookup. * Updated VS project. * Write out as a Stream, taking into account alignment. * First pass at serializing in AST. * Added support for deserializing arrays. * Small bug fixes. * Fix problem calculating layout. Split out loading on entries. * Fix typo in AST conversion.	25 June 2020, 20:41:14 UTC
a1fed5e	Yong He	25 June 2020, 20:19:45 UTC	Partial fixes to code review comments	25 June 2020, 20:23:28 UTC
ffa9a35	Yong He	25 June 2020, 01:09:40 UTC	Fix `lowerFuncType` and small bug fixes.	25 June 2020, 03:25:49 UTC
161c525	Yong He	24 June 2020, 21:22:52 UTC	Fixes.	25 June 2020, 01:10:15 UTC
0ca75fe	Yong He	24 June 2020, 20:16:11 UTC	Dynamic dispatch for generic interface requirements. -Lower interfaces into actual `IRInterfaceType` insts. -Lower `DeclRef<AssocTypeDecl>` into `IRAssociatedType` -Generate proper IRType for generic functions. -Add a test case exercising dynamic dispatching a generic static function through an associated type. -Bug fixes for the test case.	25 June 2020, 01:10:15 UTC
3fe4f53	Dietrich Geisler	24 June 2020, 21:22:58 UTC	Heterogeneous example (#1399) * Introduced heterogeneous example. Example includes C++ source and header files, and does not currently make use of the associated slang file when building. The intent of this commit is to introduce the example as a baseline for later updates as the heterogeneous model is expanded. * Changing namespace * Renamed and rewrote README * Updated example to account for compiler updates * Updated path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 June 2020, 21:22:58 UTC
ae41db8	jsmall-nvidia	24 June 2020, 17:56:06 UTC	AST Serialization writing (#1407) * Try to fix problem with C++ extractor concating tokens producing an erroneous result. * Improve naming/comments around C++ extractor fix. * Another small improvement around space concating when outputing token list. * Handle some more special cases for consecutive tokens for C++ extractor concat of tokens. * WIP AST serialization. * Comment out so compile works. * More work on AST serialization. * WIP AST serialize. * WIP AST Serialization - handling more types. * WIP: Compiles but not all types are converted, as not all List element types are handled. * Compiles with array types. * Finish off AST serialization of remaining types. * Remove ComputedLayoutModifier and TupleVarModifier. * Add fields to ASTSerialClass type. * Construct AST type layout. * AST Serialization working for writing to ASTSerialWriter. * Removed call to ASTSerialization::selfTest in session creation. * Fixes for gcc. * Diagnostics handling - better handling of dashify. * Improve comment around DiagnosticLookup. * Updated VS project.	24 June 2020, 17:56:06 UTC
b595dd0	Yong He	19 June 2020, 20:16:22 UTC	Merge pull request #1403 from tfoleyNV/struct-inheritance-and-interfaces Work on struct inheritance and interfaces	19 June 2020, 20:16:22 UTC
1988011	Tim Foley	19 June 2020, 18:50:15 UTC	fixup: review feedback	19 June 2020, 18:50:15 UTC

Newer
Older