Revision history - None - origin: https://github.com/shader-slang/slang

visit type:

Revision	Author	Date	Message	Commit Date
8a70e20	jsmall-nvidia	06 October 2020, 17:30:55 UTC	InterlockedExchangeU64 support on RWByteAddressBuffer (#1572) * #include an absolute path didn't work - because paths were taken to always be relative. * Added [__requiresNVAPI] to functions that need nvapi support. * Added support for InterlockedExchangeU64 Added exchange-int64-byte-address-buffer test Fixed typo in cas-int64-byte-address-buffer test * Improve comment around NVAPI usage in hlsl.meta.slang	06 October 2020, 17:30:55 UTC
b6ad8df	jsmall-nvidia	06 October 2020, 13:47:12 UTC	Added [__requiresNVAPI] to functions missing it (#1571) * #include an absolute path didn't work - because paths were taken to always be relative. * Added [__requiresNVAPI] to functions that need nvapi support.	06 October 2020, 13:47:12 UTC
41d8610	Tim Foley	05 October 2020, 18:10:53 UTC	Small fixes for CUDA code emit (#1564) * Small fixes for CUDA code emit * Add a CUDA translation to `GroupMemoryBarrierWithWaveSync()`. We map this to `__syncwarp()` for CUDA (with no mask, implying a full-warp sync). * Consistently use `SLANG_PRELUDE_ASSERT` for assertions introduced in code emit (rather than just using the bare `assert(...)` function, which is not included by our CUDA prelude by default) * Add a new `SLANG_CUDA_STRUCTURED_BUFFER_NO_COUNT` flag to the CUDA prelude that allows the `count` field to be omitted from `(RW)StructuredBuffer<T>`. This is a bit of a hacky because the computed layouts will still assume the `count` field is present, but this feature is required by at least one client application for now. A better long-term fix will take more time to design and implement. * fixup: CUDA prelude code fix for pedantic compilers Co-authored-by: Tim Foley <tim.foley.is@gmail.com> Co-authored-by: Yong He <yonghe@outlook.com>	05 October 2020, 18:10:53 UTC
d930c65	Yong He	05 October 2020, 17:30:27 UTC	Update the type of a call inst during specialization. (#1569)	05 October 2020, 17:30:27 UTC
3321df7	Yong He	04 October 2020, 08:40:58 UTC	Handle partial existential parameter type specialization. (#1568) * Specialize exsitentials parameters in struct fields. * Cleanup. * Handle partial existential parameter type specialization. Co-authored-by: Yong He <yhe@nvidia.com>	04 October 2020, 08:40:58 UTC
24ecd1f	Yong He	02 October 2020, 19:53:29 UTC	Use new vulkan debug layer. (#1566) * Use new vulkan debug layer. * Try use VK_LAYER_KHRONOS_validation when it exists. Co-authored-by: Tim Foley <tim.foley.is@gmail.com>	02 October 2020, 19:53:29 UTC
aadf600	Yong He	02 October 2020, 16:49:18 UTC	Specialize exsitentials parameters in struct fields. (#1565) * Specialize exsitentials parameters in struct fields. * Cleanup. Co-authored-by: Yong He <yhe@nvidia.com>	02 October 2020, 16:49:18 UTC
274c20a	jsmall-nvidia	30 September 2020, 17:28:56 UTC	Generalizing Serialization (#1563) * First pass at generalizing serializer. * Split out ReflectClassInfo * Use the general ReflectClassInfo * Fix some typos in debug generalized serialization. * Add calculation of classIds. Make distinct addCopy/add on SerialClasses. * Write up of more generalized serialization * WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types. * Improvements to SerialExtraObjects. Keep RefObjects in scope in factory * Compiles with Serial refactor - doesn't quite work yet. * First pass serialization appears to work with refector. * Split out type info for general slang types. * Split out slang-serialize-misc-type-info.h * DebugSerialData -> SerialSourecLocData DebugSerialReader -> SerialSourceLocReader DebugSerialWriter -> SerialSourceLocWriter * Remove unused template that only compiles on VS. * Fix warning around unused function on non-VS.	30 September 2020, 17:28:56 UTC
94d3f2b	Yong He	27 September 2020, 03:09:50 UTC	Add API for whole program compilation. (#1562) * Add API for whole program compilation. This change exposes a new target flag: `SLANG_TARGET_FLAG_GENERATE_WHOLE_PROGRAM` that can be set on a target with `spSetTargetFlags`. When this flag is set, `spCompile` function generates target code for the entire input module instead of just the specified entrypoints. The resulting code will include all the entrypoints defined in the input source. The resulting whole program code can be retrieved with two new functions: `spGetTargetCodeBlob` and `spGetTargetHostCallable`. This change also cleans up the unnecessary `entryPointIndices` parameter of `TargetProgram::getOrCreateWholeProgramResult`, and modifies the `cpu-hello-world` example to make use of the new whole-program compilation API to simplify its logic. * Update comments.	27 September 2020, 03:09:50 UTC
b72353e	Yong He	24 September 2020, 21:30:12 UTC	Enable default cpp prelude. (#1560) * Enable default cpp prelude. * Print the "#include" line as a normal source if the file does not exist. * Bug fix * Fix. * Fix c++ prelude header. * Remove unnecessary fopen call.	24 September 2020, 21:30:12 UTC
150218b	Tim Foley	24 September 2020, 20:09:40 UTC	Refactor preprocessor API to avoid coupling (#1559) Based on review feedback from #1556, this change updates the Slang preprocessor so that it is no longer coupled to policy details from higher levels of the software stack. In particular, the preprocessor used to: * Deal with updating the list of file paths that a `Module` depends on. * (As of #1556) detect NVAPI-related macro definitions and use them to construct an AST-level `Modifier` attached to the `ModuleDecl`. This change introduces a callback interface where the `Preprocessor` calls out to a `PreprocessorHandler` at certain points during execution, allowing the handler to introduce custom logic that suits a particular high-level use case. This change also removes the dependence of the preprocessor on the `Linkage`, because in practice only a small number of its sub-objects were needed. As a convenience, a wrapper function that takes a `Linkage` was left in place so that the existing call sites didn't have to change very much.	24 September 2020, 20:09:40 UTC
fd2ac53	Tim Foley	24 September 2020, 03:49:35 UTC	Fix GLSL output for byte-address loads of vectors (#1558) While working on #1557, it became clear that something was going wrong when using `*ByteAddressBuffer.Load<T>` to load a vector type on GLSL/SPIR-V targets. The root problem was that the IR-level layout logic (which computes the "natural" layout of a type) had not yet been extended to handle vectors. The fix is simple enough, but it highlights the fact that we probably need to go ahead and "complete" that layout logic sooner or later. This change includes a test case that covers the behavior added here, as well as the case that #1557 fixes. Unfortunately, due to CI system limitations, the HLSL/dxc part of the test is not yet enabled.	24 September 2020, 03:49:35 UTC
8954052	Tim Foley	23 September 2020, 22:47:14 UTC	Simplify workflow when using NVAPI (#1556) In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code generated by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI and they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come before the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).	23 September 2020, 22:47:14 UTC
3d063a7	Tim Foley	23 September 2020, 19:24:16 UTC	Fix a bug around byte-address buffer loads of vectors (#1557) This problem is only visible when * using `RWByteAddressBuffer.Load<V>`, * when `V` is `vector<T,N>`, * and `V` is a non-32-bit type In such a case, the Slang compiler generates output HLSL like: someBuffer.Load<vector<T,N>>(someOffset); and dxc balks because it fails to parse the `>>` as closing the generics, and instead parses it as a shift operator. The solution here is simple: add a space before the closing `>` when emitting a `.Load<T>` operation. Note that this change does not come with a test fix yet because writing the test case exposed a more complicated issue with GLSL codegen for this same scenario. This change includes the simple single-byte fix, to unblock users while we work on a fix for the GLSL case (and that fix will include the test coverage).	23 September 2020, 19:24:16 UTC
a5b0cde	jsmall-nvidia	21 September 2020, 19:45:35 UTC	Allow #include of absolute paths (#1555) * #include an absolute path didn't work - because paths were taken to always be relative. * Improve comments. * Small comment improvement.	21 September 2020, 19:45:35 UTC
83514bd	Yong He	21 September 2020, 15:27:10 UTC	Enable all dynamic dispatch tests on CUDA. (#1552) * Enable all dynamic dispatch tests on CUDA. * Fix expected cross-compile test results.	21 September 2020, 15:27:10 UTC
21339e8	jsmall-nvidia	18 September 2020, 17:35:45 UTC	Serialization fixes based on review of #1547 (#1551) * Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization. * Typo fix for Clang/Linux. * Fixes that came out of review * TranslationUnit -> Module * TargetModule -> TargetComponent * PAYLOAD_MIN_ALIGNMENT -> kPayloadMinAlignment	18 September 2020, 17:35:45 UTC
9a6eec6	jsmall-nvidia	18 September 2020, 15:02:06 UTC	Control container serialization with SerialOptionFlags (#1550) * Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization. * Typo fix for Clang/Linux.	18 September 2020, 15:02:06 UTC
2ddca33	Yong He	17 September 2020, 23:46:23 UTC	Initial attempt to enable CUDA dynamic dispatch codegen (#1549) * Front-load cuda module loading to fill in RTTI pointers. * Enable dynamic dispatch codegen for CUDA.	17 September 2020, 23:46:23 UTC
017acb3	Yong He	17 September 2020, 22:07:08 UTC	Front-load cuda module loading to fill in RTTI pointers. (#1548) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	17 September 2020, 22:07:08 UTC
b9cddcb	jsmall-nvidia	17 September 2020, 20:47:57 UTC	Share debug information between AST and IR (#1547) * Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Typo fix for Clang/Linux. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	17 September 2020, 20:47:57 UTC
bbf492a	Tim Foley	17 September 2020, 19:13:50 UTC	Embed default prelude for CUDA (#1546) * Embed default prelude for CUDA Slang supports the notion of a "prelude" that gets prepended to the source code we generate in language. For some targets, a prelude is not necessary (e.g., we compile to HLSL/GLSL and then on to DXBC/DXIL/SPIR-V just fine without a prelude), but some targets have been implemented in a way that makes a prelude necessary (notably CPU and CUDA). For the targets that require a prelude, the Slang codebase includes usable preludes under the `prelude/` directory. Prior to this change, if a user was compiling for such a target (whether via command-line or API), there had to take responsibility for specifying the prelude to use (usually by passing in the contents of the prelude file(s) already included in the Slang distribution). It is reasonable for a user to expect an out-of-the-box experience where compilation to CUDA PTX or native CPU code should Just Work, similarly to how compilation to SPIR-V Just Works. This change is a step in the direction of providing a user experiene that Just Works for common cases. The main addition here is a tool called `slang-embed` that we run during our build to turn the `prelude/.h` files into `prelude/.h.cpp` files that embed the contents of the original `.h` file as a `const` variable. By compiling and linking in the generated `.h.cpp` file for the CUDA prelude, we are then able to set the default prelude to use for CUDA at the time a session/linkage is created. That default prelude will be used unless the user manually specifies their own prelude (which current users of the CUDA back-end must be doing). This change only sets up a default prelude for CUDA because of the way that the CPU prelude is split across multiple files. A strategy that provides a good default prelude for CPU may take more work, but that work might also be unnecessary if we switch to a strategy of using LLVM to generate native code. The implementation of the `slang-embed` tool is intentionally simple, and it will likely run into issues if/when we need to embed binary files or larger text files. The assumption being made here is that we can address those issues when they arise, and there is no reason to over-engineer the tool right now. The way that `slang-embed` is integrated into our build process is likely to require some iteration to make sure that it works across all platforms. I expect that this change will have multiple follow-up fixes related to trying to get the build to work as expected across all targets on CI. * fixup: trying to ensure that embedded prelude gets compiled into slang * fixup: properly clean up allocations in slang-embed * fixup: fix double free introduced by previous change * fixup: off-by-one allocation error	17 September 2020, 19:13:50 UTC
0124bf2	Tim Foley	17 September 2020, 16:54:49 UTC	Fix an issue with double-counting uniform data for CUDA/CPU (#1545) The `SimpleScopeLayoutBuilder` helper that is used to build up binding information for entry-point parameter lists has logic to try to support both explicit and implicit binding of parameters. This logic was added as part of supporting dual-source color blending on Vulkan. The basic approach is similar to that used for the global scope, where parameters with explicit binding first "carve out" the ranges they claim via a `UsedRangeSet`, and then parameters without explicit binding allocate space from what is left. The logic is (seemingly by accident) also applied to uniform/ordinary data, which creates a problem because the `ScopeLayoutBuilder` base type is also responsible for computing a layout for uniform/ordinary data that is 100% implicit (while dealing with all the relevant alignment restrictions). That logic goes on to add the computed uniform/ordinary resource usage to the computed type layout, but because such a layout has already been computed (albeit without taking alignment into account), the result is that the uniform/ordinary usage is reported at approximately double what it should be. The fix here is to skip uniform/ordinary resource usage when doing the explicit/implicit dance in `SimpleScopeLayoutBuilder`. This approach means that explicit bindings on entry-point `uniform` parameters will only apply to resources (which matches our rules for the global scope, where we don't allow for explicit binding on uniform/ordinary parameters). This is appropriate since the only reason we are supported explicit layout at all is for dual-source color blending (in general, we only support explicit `register` and `[[vk::binding(...)]]` modifiers on global parameters; users are stuck with our computed layouts in all other cases. Co-authored-by: Yong He <yonghe@outlook.com>	17 September 2020, 16:54:49 UTC
3e2cb34	Tim Foley	17 September 2020, 00:45:07 UTC	Fix some issues around dim3 for CUDA (#1544) The logic we use to compute `SV_DispatchThreadID` and friends for CUDA makes use of the `gridDim` and `blockDim` built-in variables in CUDA. These variables have type `dim3` which is similar to `uint3` but is considered a distinct type for some reason. The logic for computing the `SV`s currently pretends that `gridDim` and `blockDim` are `uint3`s, and this means that the code they emit doesn't always compile cleanly (although it does in our existing test cases...). This change adds a few overloads that work on `dim3` to the CUDA prelude and that seem to make the code we emit work again. Note: This change should be seen as a somewhat hacky quick fix rather than a real resolution to the underlying issue. It is probably better if we emit code that replaces uses of `gridDim` with `uint3(gridDim.x, gridDim.y, gridDim.z)` instead, to ensure that we get the typing correct, even if the result looks less idiomatic.	17 September 2020, 00:45:07 UTC
8dd0d26	Tim Foley	16 September 2020, 21:19:39 UTC	Search for multiple NVRTC versions (#1543) * Search for multiple NVRTC versions The main change here is that when locating the NVRTC compiler we try multiple library names and take the first one that loads successfully (with an ordering that means we try newer versions before older ones). In order to support this change, I needed to fix the wrapping logic that invokes the downstream compiler "locator" function, so that it does not report every failed dynamic library load as an error diagnostic (leading to compilation failure), but instead only reports such failures once the locator has reported failure. The form of the diagnostic output for failures is also changed, in that we now report a single umbrella error about failing to load a downstream compiler, and then report the actuall dynamic library load failures as notes on that diagnostic instead of errors of their own. This choice seems appropriate since for cases like NVRTC it is not the case that each failed library load is a compilation error. We only need one of the listed libraries to be loadable, so that reporting them all as errors risks confusing users. One wrinkle that arose during testing is that the 11.0 release of NVRTC dropped support for the `compute_30` target, which had previously been the minimum and default. I had to add logic to check for versions of 11 or greater and switch to `compute_35` as the default. Similar changes may be required as part of supporting newer NVRTC versions if support for more architectures gets deprecated and removed. A more complete implementation of this logic might try to load multiple NVRTC versions such that the Slang compiler can identify a suitable compiler based on the minimum feature level that code actually requires. That kind of cleanup is left as future work, since for most users the current approach will be sufficient. * testing: use verbose mode for running tests by default * fixup: guard against null diagnostic sink	16 September 2020, 21:19:39 UTC
0305099	Yong He	14 September 2020, 19:26:54 UTC	Support shader parameters that are an array of existential type. (#1542) * Support shader parameters that are an array of existential type. * Rename to getFirstNonExistentialValueCategory Co-authored-by: Yong He <yhe@nvidia.com>	14 September 2020, 19:26:54 UTC
2e3688a	Yong He	14 September 2020, 17:28:22 UTC	Dynamic dispatch bug fixes. (#1541) Co-authored-by: Yong He <yhe@nvidia.com>	14 September 2020, 17:28:22 UTC
5461671	Tim Foley	14 September 2020, 15:42:41 UTC	Change the layout we compute/store for parameter groups (#1540) The type layouts we store for parameter group types (`ConstantBuffer<T>` and `ParameterBlock<T>`) has a somewhat complicated internal structure, which we've slowly built up and evolved as we learned more about what was actually needed: * There is the outer `ParameterGroupTypeLayout` that represents the whole type (e.g., the `ParameterBlock<Something>`), and has resource usage/bindings based on what needs to be accounted for by anything (like a program) that uses the type. E.g., for a parameter block on Vulkan, the resource usage at this level would usually just be a single descriptor `set`. * There is the inner "element" layout, which represents the layout for the `Something` in `ParameterBlock<Something>`. This was initially just stored as a type layout (and an extra type layout is stored for backwards compatibility), but we later realized we needed to store a `VarLayout` for the element, to deal with the fact that it might have a non-zero offset. * Finally, there is the inner "container" layout, which represents the resource usage/bindings that are introduced by the block/buffer/group itself. In the case of a `ParameterBlock<Something>` this would include any "default" constant buffer that is needed in order to store the uniform/ordinary data from type `Something`. On targets like Vulkan and D3D12 such a buffer would no show up as part of the resource usage of the overall `ParameterBlock`, nor would it be expected to show as part of the "element" layout. The above is just setting the stage so that we can cover the design choice that this change centers around: for each of the above layouts, what should the type stored with that layout be? The answers seem simple at first: * The type for the outer `ParameterGroupTypeLayout` should clearly be the whole type (e.g., the `ParameterBlock<Something>`) * The type for the inner "element" layout should clearly be the element type (e.g., the `Something` in `ParameterBlock<Something>`) * The type for the inner "container" layout should be... hmm... That last question is the thorny one. There are two main options, each with trade-offs: 1. What is being done in the code before this change is to store the whole type (e.g., `ParameterGroup<Something>`) as the type of the "container" layout. This makes some superficial sense (the type of the container should be a container type). 2. What this change switches to is the type of the "container" layout being null (it could equivalently be any sentinel that represents the absence of a meaningful type). While option (1) seems like it would make sense, it risks creating an infinite regress for client application code. If they have a recursive routine that walks the Slang reflection hierarchy, then it will probably key off of the kind of each type it visits. Such a recursive walk would end up trying to treat the outer layout and the inner "container" layout equivalently, when they aren't really representing the same concepts. Even if it seems like this approach defendings against null-pointer crashes in client code, it really only delays them, since the inner "container" layout would yield a null type layout when asked for its element layout. In contrast, option (2) more accurately reflects the reality that the container layout is a `VarLayout` and `TypeLayout` that correspond to no variable/type in practice. Clients of the Slang reflection API already have to deal with `VarLayout`s that have no variable, so it is reasonable for them to deal with `TypeLayout`s that have no type. While the above statements may sound strange, it really comes down to the fact that a "type layout" is really just a way of encoding the "size" of something (where size can encapsulate all the different kinds of resources something can consume on our various targets), and a "variable layout" is really just a way of encoding the "offset" of something (again, where there can be different offsets per consumable resource). In that light, it makes sense that the "container" layout for a parameter group is really just a way of representing the resource allocation of the container itself, and is not associated with any type or variable. This change is technically a breaking change for clients of the reflection API, so it will need to be rolled out with an appropriate change to our version number.	14 September 2020, 15:42:41 UTC
459e788	Tim Foley	11 September 2020, 20:21:10 UTC	Remove some "do what I mean" logic from reflection API (#1539) The reflection API had a bit of DWIM (Do What I Mean) logic in that a client could query the resource usage/bindings of a `ParameterBlock<X>` and see not only the register `space` or descriptor `set` for the block itself, but also the constant buffer `register` or `binding` for its default constant buffer (if any). The reason for this behavior was that there was existing client code in Falcor that relied on that behavior for parameter blocks, and even after changing the way that parameter block layouts were computed and stored we sought to maintain backwards compatibility with that client code. The trouble is that the weird behavior then goes on to cause confusion for other clients of the Slang reflection API. This change removes the special-case logic, and fixes up our reflection tests to mirror the new (correct) information that we return. When this change is released, it will be a breaking change for any client code that still relies on the old behavior. We will need to coordinate with client application developers to fix their reflection logic. Note that all the same information can still be accessed, simply by using new reflection API that we have added.	11 September 2020, 20:21:10 UTC
e5b796d	Yong He	10 September 2020, 22:10:11 UTC	Allow existential types in `StructuredBuffer` element type. (#1536) * Allow existential types in `StructuredBuffer` element type. * Handle StructuredBuffer.Load/.Consume methods * Clean up unnecessary changes * Code cleanup * Update test comment	10 September 2020, 22:10:11 UTC
d6a2d29	Tim Foley	10 September 2020, 21:12:43 UTC	Add a pass to support resource return values (#1537) A long-standing problem for the Slang implementation has been that some targets (notably GLSL/SPIR-V) do not support treating resources (textures, buffers, samplers, etc.) as first-class types. Resource types on such platforms are restricted so that they may not be used as the type of: 1. fields of aggregate types (`struct`s) 2. local variables 3. function results or `out`/`inout` parameters Issue (1) is handled by our "type legalization" pass today, by splitting aggregates that contain resources into separate fields/variables/parameters. Issue (2) is worked around by putting code into SSA form and promoting local variables to SSA temporaries when possible; the net result is that many local variables of texture type are eliminated (that pass is not perfect, though, and it is possible for users to get errors when it doesn't fully clean up local variables of texture type). Issue (3) is a much more complicated matter, and it is what this change is concerned with. A typical solution to issue (3) is to simply inline all of the code in a program, at which point function results and `out`/`inout` parameters will no longer exist to cause problems. We reject such solutions for two reasons. First, there are limitations on control-flow structure in HLSL/GLSL/SPIR-V that mean they cannot express certain programs after inlining has been performed. Second, and more importantly, the philosophy of the Slang compiler is to perform as little duplication of code as possible, so that we do not accidentally contribute to binary size bloat. Instead, this change tackles the problem of functions that output resource types by adding a new specialization pass. The pass detects functions that ought to be specialized (because they have resource-type outputs), and inspects their bodies to see if the values they output have a predicatable structure that can be replicated outside of the function body. The same logic that inspects the function body also rewrites (a copy of) the function to not have the offending outputs. Finally, all the call sites to a function that is rewritten in this way also get rewritten so that instead of using output values from the function itself, they reproduce the expected output value(s) in their own code. The pass as presented here is intentionally limited in the scope of what it can optimize away (and the test case only touches on that specific functionality). The goal is to get a basic version of this pass in place and evaluated, and then to expand on its functionality incrementally over time.	10 September 2020, 21:12:43 UTC
3a34db6	jsmall-nvidia	08 September 2020, 18:48:05 UTC	Test if blob is returned. (#1535)	08 September 2020, 18:48:05 UTC
8740252	jsmall-nvidia	04 September 2020, 21:15:00 UTC	Use glslang linux binaries build on TC. (#1534)	04 September 2020, 21:15:00 UTC
8f96289	Yong He	04 September 2020, 17:18:44 UTC	Allow mixing unspecialized and specialized existential parameters. (#1533) * Allow mixing unspecialized and specialized existential parameters. * Fixes.	04 September 2020, 17:18:44 UTC
5e10f1b	Tim Foley	04 September 2020, 06:00:53 UTC	Fix a crashing issue for non-end-to-end compilation (#1532) The refactorings that added support for multiple entry points in an output file seemingly introduced a regression such that we crash on compilation that is not "end-to-end." Unfortunately, all of our testing only covers end-to-end compilation, and many users only use that mode. I've added a fix for the issue I ran into, but I haven't addressed the testing gap in this change. Without adding testing for non-end-to-end compilation, I expect further regressions to slip in over time. Co-authored-by: Yong He <yonghe@outlook.com>	04 September 2020, 06:00:53 UTC
5534b0d	Yong He	03 September 2020, 16:35:48 UTC	Rework type layout for ExistentialSpecializedType (#1531)	03 September 2020, 16:35:48 UTC
44929d9	Yong He	02 September 2020, 20:32:53 UTC	[slang-cpp-extractor] Don't modify generated source if there is no change. (#1530)	02 September 2020, 20:32:53 UTC
a2a7c4d	Yong He	02 September 2020, 19:21:28 UTC	Allow unspecialized existential shader parameters (dynamic dispatch). (#1529) * Allow unspecialized existential shader parameters (dynamic dispatch). * Fixes. * Fixes * disable cuda test	02 September 2020, 19:21:28 UTC
7f567df	Tim Foley	02 September 2020, 16:51:25 UTC	Add support for (undocumented) HLSL 16-bit bit-cast ops (#1528) As of SM 6.2, the dxc compiler added support for a set of 16-bit bit-cast operations to mirror the `asuint`, `asfloat`, and `asint` operations that were provided for 32-bit scalar types. These operations are not publicly documented, so we didn't think to add them. It should be noted that there was already a similar operation in HLSL, called `f32tof16`, that took as input a `float` and then packed a half-precision version of it into the low bits of a `uint`. The problem is that using that operation for `half`->`uint16_t` conversion required a round trip through a `float`, and downstream compilers seemingly can't optimize away that conversion. This change adds the new operations along with a test that tries to make use of them to ensure the results are what is expected. There are enough cases to cover that I had to write the test in a way where each thread only writes out a subset of the required output. There are two other changes here are that are not directly related to the main feature: First, it seems like the `[__forceInlineEarly]` attribute on some of these overloads interacts poorly with generics, and results in an `IRVectorType` appearing at local scope in the output code. That is semantically reasonable given our IR model, but it would ideally be something that gets eliminated as a result of deduplication of types. For now I've introduced a slight hack to make types always get inlined into their use sites during emission, which should handle the case of locally-defined types. I'm not 100% happy with that solution, but it seemed better than introducing a bunch of unrelated fixes into this PR. Second, the way that conversion operations were being declared for matrix types seems to have been incorrect: we had a single explicit initializer added to matrix types via an `extension` that allowed them to be initialized from other matrix types with the same size and any element type. In order to support implicit conversions of matrix types, I cribbed the code we were already using to introduce implicit conversion operations for vector types.	02 September 2020, 16:51:25 UTC
c2873f4	Tim Foley	02 September 2020, 01:37:54 UTC	Mark f32tof16 and f16tof32 as HLSL intrinsics (#1526) Fixes GitLab issue 85 These functions are intrinsic for HLSL, but were not marked as such, leading to emitting code that manually loops for the vector case. The looping code resulted in lower performance for some users, because apparently dxc was unable (or unwilling?) to unroll the loop, and ended up generating temporary ("stack-allocated") arrays for the vectors produced. As a longer-term solution, we may need to consider how the `VECTOR_MAP...` and `MATRIX_MAP...` idioms used in the stdlib get lowered, so that we can emit fully-unrolled versions in cases where the vector/matrix shape is known at the time we generate code. This PR does not attempt to address that larger issue.	02 September 2020, 01:37:54 UTC
5c56479	Yong He	01 September 2020, 20:47:26 UTC	Support dynamic existential shader parameters in render-test (#1525) * Support dynamic existential shader parameters in render-test * Fix linux build error. * Fixes. * Fix code review issues. * Fix gcc error. * More fixes. * More fixes.	01 September 2020, 20:47:26 UTC
69025ad	jsmall-nvidia	31 August 2020, 17:02:55 UTC	AST Serialization in Modules (#1524) * First pass at filter for AST serial writing. * Serialization of AST for modules. * Removed some commented out source. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	31 August 2020, 17:02:55 UTC
baa789e	Yong He	28 August 2020, 21:56:53 UTC	Add OrderedDictionary to core. (#1523)	28 August 2020, 21:56:53 UTC
9a2a35f	Tim Foley	28 August 2020, 18:21:40 UTC	Avoid nondeterministic ordering of output (#1522) Most people agree that it is a Good Thing when compilers are deterministic: the exact same input bits produce the exact same output bits every time the compiler is run. Bonus points are awarded if the results are independent of the platform the compiler was compiled for and run on. One of the easiest kinds of nondeterminism to have sneak into a compiler is for it to produce the "same" code inside functions, but sometimes emits functions or other global symbols in a different order from run to run. Right now, the Slang compiler has some of this kind of nondeterminism. The main way (but not necessarily the only way) that a compiler ends up producing output with a different ordering across runs is by iterating over the contents of a hash-based container (in our codebase, a `Dictionary` or `HashSet`), where the keys make use of pointers. Most operating systems intentionally try to randomize the address space of processes across runs (as a security feature), so that exact pointer values are not stable across runs, and thus hash value are not stable across runs, and thus the ordering of entries is not stable across runs. This change identifies a few cases of iterating over dictionaries or sets that could have produced output non-determinism: * The `HLSLIntrinsicSet` was using a `Dictionary` to store intrinsics that had been referenced, and would later produce a linear list of those intrinsics based on their order in the dictionary. * The `WitnessTable`s produced by the front-end stored a `Dictionary` or requirements, and lowering from AST->IR was iterating over that dictionary to ensure that everythign got emitted. * The `SharedSemanticsContext` was tracking a `HashSet` of modules that were imported into scope (so that their `extension`s should be visible), and an iteration over that list was used when producing candidate extensions during lookup. This case is unlikely to cause any nondeterminism in final output, but could lead to nondeterministic ordering in diagnostic messages for ambiguous reference/overload cases. * The IR linker maintains a `Dictionary` of symbols based on their mangled names, and iterates over it in code that clones all witness tables into the linked IR whether or not they are referenced. For most of these cases the fix is simple: * Keep both a `Dictionary`/`HashSet` and a `List` of the appropriate type * Whenever adding to the hash-based container also add to the list * Whenever iterating, iterate over the list In the final case of the IR linker, the relevant code was marked with a `TODO` comment noting that it shouldn't actually be needed, so I simply dropped it and the change doesn't seem to break any of our tests. I've been fairly confident that code wasn't needed for a while. This change isn't exactly elegant, and a better long term solution might be to introduce two new types, `OrderedDictionary` and `OrderedSet`, which are similar to `Dictionary` and `HashSet` except that they guarantee a deterministic order of enumeration of their contents, based on insertion order. (Note that a `SortedDictionary` and/or `SortedSet` that use something like a binary tree to produce a "determinsitc" sorted order wouldn't actually help here, because sorting entries by pointer values wouldn't solve the underlying problem that the pointer values aren't stable across runs) I've chosen to avoid adding new types to `core` in the interest of making the change as small as possible. If we all agree that new types are warranted, it should be easy to clean up these use cases. Testing this change is difficult, because we can't produce a reliable test to rule out nondeterminism. I have done best-effort testing by hand by crafting shaders that show output nondeterminism, and then compiling them both with and without these changes.	28 August 2020, 18:21:40 UTC
ab5b0a7	Yong He	28 August 2020, 16:04:55 UTC	Enable lower-generics pass universally. (#1518) * Enable lower-generics pass universally. * Exclude builtin interfaces and functions from lower-generics pass. * Update stdlib. * Fixup. * Fixes handling of nested intrinsic generic functions. * Fixes. * Fixes.	28 August 2020, 16:04:55 UTC
e9bf8de	Tim Foley	27 August 2020, 20:49:00 UTC	Enable simple extensions of interface types (#1521) The big picture here is that an `extension` can now apply to an interface type and provide convenience methods for all types that implement that interface. Suppose you have an interface for counters: interface ICounter { [mutating] void add(int val); } and a type that implements it: struct SimpleCounter : ICounter { int _state = 0; ... } If a common operation in your codebase is to increment a counter by adding one, you would be faced with the problem of either: * Add the `increment()` operation to `ICounter`, and force every implementation to implement the new requirement * Add the `increment()` operation to concrete counter types as needed, and thus not be able to use it in generic code * Make `increment()` a global ("free") function, and force clients of counters to have to know which operations use member syntax (`c.add(...)`) and which use global function call syntax (`increment(c)`). The whole idea of `extension`s is to allow for another option that is better than all of the above: extension ICounter { [mutating] void increment() { this.add(1); } } The core of the implementation is relatively straightforward, and consists of two complementary pieces. The first piece is that when emitting a concrete IR entity (function/type/whatever) we treat any enclosing `interface` type (or `extension` thereof) a bit like an enclosing `GenericDecl`, and introduce an `IRGeneric` to wrap things. The generic `IRGeneric` has parameters representing the `This` type for the interface, along with the witness table that shows how `This` conforms to the interface itself. We thus end up with an IR version of `increment()` something like: void increment<This : ICounter>(This this) { this.add(1); } The second (complementary) fix is that when there is code that references this `increment()` operation, we don't treat it like an interface requirement (look up based on its key), and instead treat it like a generic (since that is how it is lowered now) and speciaize it to the information we can glean from the `ThisTypeSubstitution`. A related fix that is required here is that within the body of `increment`, when we perform `this.add`, we need to ensure that the lookup of `add` in the base interface properly takes into account the subtype relationship (`This : ICounter`) and encodes it into the lookup result, so that we get `((ICounter) this).add`, and properly generate code that looks up the `add` method in the witness table for `This`.	27 August 2020, 20:49:00 UTC
bc0d0f9	Tim Foley	27 August 2020, 17:46:50 UTC	Clean up the way that lookup "through" a base type is encoded (#1519) * Clean up the way that lookup "through" a base type is encoded In order to undestand this change, it is important to undestand how lookup through base interfaces works prior to this change. In order to understand that it helps to be reminded of how inheritance relationships get encoded in the AST. Suppose the user writes: struct Base { int val; } struct Derived : Base { ... } ... Derived d = ...; int v = d.val; The question is how an expression like `d.val` gets semantically checked, and how it is encoded into the IR after semantic checking. You might assume it gets checked and encoded so that we end up with: int v = ((Base) d).val; and that seems like it should Just Work... so of course that isn't what Slang has been doing. Instead, we relied on the fact that the inheritance relationship `Derived : Base` is represented as an `InheritanceDecl` member of the `Derived` type, and we ended up checking the code into something like: int v = d.<anonymous>.val; where `<anonymous>` stands in for the name of the `InheritanceDecl` that represents inheritance from `Base`. This design choice makes a limited amount of sense when you consider how inheritance would typically be lowered to a C-like output language: // struct Derived : Base { ... } // => struct Derived { Base base; ... } The problem with that encoding is that it really doesn't make sense for almost any other scenario. In particular, if you have a generic type parameter `T` that was constrianed with `T : ISomething`, then the constraint isn't even technically a member of the type parameter `T`, so expressing thing as a member reference in the AST is completely incorrect. Unfortunately, by the time it was clear that we needed something better, a bunch of implementation work was done based on the existing representation. This change tries to clean things up so that lookup of a super-type member through a value of a sub-type does the obvious thing: cast the value to the super-type and then look up the member (as in `((Base) d).val`). The core of the change is that in lookup, instead of creating `Constraint` breadcrumbs whenever we are looking up in a super-type (with a reference to the `TypeConstraintDecl` being used) we instead use `SuperType` breadcrumbs (with a reference to a `SubtypeWitness`). Then when we create the expression from a `LookupResultItem`, we translate any `SuperType` breadcrumbs into `CastToSuperTypeExpr`s (an expression type that already existed). This change also adds support for lookup through the `This` type in the context of an interface, and in order for that to work we need a new kind of subtype witness to represent the knowledge that a `This` type is a subtype of the enclosing interface. Making that work forces us to change the representation of `TransitiveSubtypeWitness` so that it takes a pair of subtype witnesses (and not one subtype witness plus one `TypeConstraintDecl`). For the most part this is a small change, but it raises the possibility that some pieces of the code aren't going to be robust against all possible shapes of subtype witnesses. The IR lowering logic has relied on the weird `d.<anonymous>` representation in order to ensure that when looking up interface members we weren't always casting to the interface type (which would create a `makeExistential` instruction), and then calling using that. Basically, the IR lowering would ignore the `d.<anonymous>` part and just emit `d`, but we can't do that for `((Base) d)` or `((IThing) d)` because whehter or not we should actually perform the cast depends on context. For now we solve that problem by adding specific logic to ignore up-casts to interface types when they appear in member expressions or method calls. A more robust solution might be needed down the line, but this seems to work in practice. All of this work is cleanup that I found was needed in order to make `extension`s of `interface` types workable. * fixup: disable an incorrect test	27 August 2020, 17:46:50 UTC
c936433	jsmall-nvidia	27 August 2020, 15:49:20 UTC	Allow use of lib6_6 profile for DXC compilation. (#1520)	27 August 2020, 15:49:20 UTC
3cbff11	jsmall-nvidia	27 August 2020, 14:03:06 UTC	Removed use of pthreads in glslang (#1517) * Try removing pthreads from glslang. * Update slang-binaries to use glslang that doesn't use pthreads. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	27 August 2020, 14:03:06 UTC
ec55ac4	Yong He	26 August 2020, 22:05:59 UTC	Reorder existential tuple elements. (#1516) Co-authored-by: Tim Foley <tim.foley.is@gmail.com>	26 August 2020, 22:05:59 UTC
2dc1f89	jsmall-nvidia	26 August 2020, 18:38:24 UTC	Added more Atomic support for int64 types on RWByteAddressBuffer (#1515) * Support for more 64 bit atomics on ByteAddressBuffer. * min max 64bit test. * Disable CUDA version of min max 64 bit test - as produces the wrong output. * Update target-compatibility.md with added 64 bit atomics. Co-authored-by: Yong He <yonghe@outlook.com>	26 August 2020, 18:38:24 UTC
b8702df	Yong He	26 August 2020, 04:55:05 UTC	Export witness table and RTTI objects in compiled libraries. (#1514) * Export witness table objects in compiled code. - Ensure that witness tables are preceeded with `extern "C"` modifier in the generated C++ code. - RTTI objects use the mangled name of the type directly, so that can be queried using the type's mangled name directly from the resulting DLL. - Expose `Linkage::getTypeConformanceWitnessMangledName` to return the mangled name of witness tables to the host. - Ensure that all witness tables (including those for associated types) have proper mangled name. * Fix GCC error in Slang generated code.	26 August 2020, 04:55:05 UTC
4804753	jsmall-nvidia	24 August 2020, 19:23:40 UTC	RWByteAddressBuffer::InterlockedCompareExchangeU64 (#1513) * First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md. * Fixed NVAPI working on D3D12. * Test for specific NVAPI features. * Remove requiredFeatures from Renderer::Desc as was ignored. Tried to document more around nvapiExtnSlot. * Readded requiredFeatures to Renderer::Desc * Improve comments in the tests. * Rename Fp32 -> F32 Added cas-int64-byte-address-buffer.slang test Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 August 2020, 19:23:40 UTC
67ca549	jsmall-nvidia	24 August 2020, 18:26:27 UTC	NVAPI improvements (#1512) * First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md. * Fixed NVAPI working on D3D12. * Test for specific NVAPI features. * Remove requiredFeatures from Renderer::Desc as was ignored. Tried to document more around nvapiExtnSlot. * Readded requiredFeatures to Renderer::Desc * Improve comments in the tests.	24 August 2020, 18:26:27 UTC
fcac02e	jsmall-nvidia	21 August 2020, 20:04:42 UTC	Vulkan update/NVAPI support (#1511) * First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md.	21 August 2020, 20:04:42 UTC
49067fd	Tim Foley	21 August 2020, 16:47:29 UTC	Fix stdlib declarations for texture Gather() (#1510) Fixes #1507 These operations were failing to take into account the way that array textures require an extra coordinate to be passed in for the primary location (but not the additional offsets). Adding `isArray` to the component count is the existing solution used for similar intrinsics elsewhere in the stdlib, and it is adopted here. Because our test framework isn't really set up to do a lot of texture testing (including having no support for texture arrays), the test added here is just a cross-compilation test that compares output with fxc for comparable input.	21 August 2020, 16:47:29 UTC
5e64ae7	Tim Foley	21 August 2020, 14:33:43 UTC	Another fix for overriding property decls (#1509) * Another fix for overriding property decls The central problem we keep running into with `property` decls in `interface`s comes down to two choices: 1. When a member lookup `obj.someName` or a simple lookup for `someName` produces an overloaded result, we make no attempt to resolve the overloading right away, and instead postpone disambiguation until the point where that expression gets used, in case the context where it gets used can help in disambiguation (a notable case being when there is a call expression `obj.someName(...)` or `someName(...)`). 2. When looking up members in a the scope of a type (either for `obj.someName` or `someName` in the context of a method), we include all results from base types in the set of overloads returned, even in cases where the type has a direct member that "overrides" the inherited one. The combination of these factors means that when a `struct` type implements a `property` to satisfy a requirement of an inherited `interface`, then references to `obj.someProp` end up being ambiguous between the property in the concrete `struct` type and the property it inherits through the `interface`. There is no quick fix possible for issue (2). It might seem that we could just skip over members inherited through `interface`s when doing lookup in a type, but that solution wouldn't apply to inheritance from another `struct` type, or any future scenario where we support default implementations of methods in interfaces. The simple idea of saying that a derived-type member named `M` hides all inherited members named `M` is possible, but would lead to a bad user interface when a type wants to support both a core "bottleneck" method and a bunch of convenience overloads with the same name. That leaves us with issue (1), and trying to find a reasonable fix for it. The common case is that any expression `e` eventually gets used in a context where it will be be subject to disambiguation: * If we form a call expression `e(...)`, then the overload resolution logic will (obviously) work to disambiguate which `e` was meant. * If `e` is used as an argument to another call (`f(... e ...)` or `... + e`), then `e` will be coerced to the expected parameter type for its argument position, and that coercion will disambiguate it (this is the bit that was fixed in #1501) * If `e` is used in another context where a type is expected/known, it will also be coerced: `if(e)`, `int v = e`, etc. The problem case that is left behind is any scenario where `e` is not subject to one of the above resolution cases, which mostly amounts to cases where an expression is never coerced to a single fixed type. There are a few important cases where this occurs today: * When the expression is used as the left-hand side of an assignement (`e = ...`). * When an expression is used to initialize a variable with an implicit type (`let v = e`). * When inferring generic arguments from the value arguments at a call site (`f(e)` where `f` is defined as `f<T>(T v)`) The key connecting thread in each of these cases is that the front-end needs to determine the type of `e` to make progress. Our semantic checking logic already has functions that try to draw a distinction between the two cases: * The `CheckTerm()` operation is supposed to be used when we expect that we will eventually coerce or otherwise diambiguate the term, and also in cases where we don't yet know if a term should name a type or a value * The `CheckExpr()` operation is supposed to be used when we do not expect that we will apply coercion/disambiguation to a term, and need to have assurances that it has been coerced into a non-overloaded expression with a reasonable type The simple part of the fix made here is to make `CheckExpr()` actually do part of what it is suppsoed to (attempt to disambiguate overloaded terms), and then audit all the call sites to `CheckExpr()` to make sure they are actually ones that intend to opt into that logic. The messier part of the fix is dealing with generic argument inference, because we need to extract the type of the disambiguated expression for the purposes of inference, but we don't want to disturb the actual argument list at a call site (because type coercion of the arguments is supposed to handle the disambiguation). This part is done with a bit of special-casing in the overload-resolution context, by adding a method that gets the type or an argument after disambiguation (when possible). * fixup Co-authored-by: Yong He <yonghe@outlook.com>	21 August 2020, 14:33:43 UTC
a8bc598	Yong He	21 August 2020, 08:10:45 UTC	Allow calling a generic function with an existential value (dynamic dispatch) (#1508) * Allow calling a generic function with an existential value (dynamic dispatch). * Fixes per review comments. * Clean up implementation by having `openExistential` return `ExtractExistentialType` instead of a DeclRef to the interface with a `ThisTypeSubstitution`. * More cleanups Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> Co-authored-by: Yong He <yhe@nvidia.com>	21 August 2020, 08:10:45 UTC
11748a7	Tim Foley	20 August 2020, 15:23:51 UTC	Initial support for a using construct (#1506) The basic idea is that if you have a namespace: namespace MyCoolNamespace { void f() { ... } ... } then you can bring the declarations from that namespace into scope with: using MyCoolNamespace; f(); The `using` construct is allowed in any scope where declarations are allowed. As an additional feature, the construct allows and then ignores the keyword `namespace` if it occurs right after `using`: using namespace MyCoolNamespace; Note that unlike in C++, `using` a namespace inside another namespace doesn't implicitly make the symbols available to clients of that namespace: namespace hidden { void secret() {...} ... } namespace api { using hidden; ... } api.secret(); // ERROR: `secret()` isn't a member of `api` The implementation of this feature was relatively simple, although it does leave out more advanced features that might be desirable in the future: * No support for `using MCN = MyCoolNamespace` sorts of tricks to define a short name * No support for `using` anything that isn't a namespace (e.g., to make the members of a type available without qualification) * No support for cases where multiple visible modules have a namespace of the same name (or dealing with overloaded namespaces in general)	20 August 2020, 15:23:51 UTC
b5a4161	jsmall-nvidia	19 August 2020, 19:51:43 UTC	Remove IncludeHandler. (#1505) nvAPI -> NVAPI nvAPIPath -> nvapiPath DxcIncludeHandler don't reference count. nv-api-path -> nvapi-path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	19 August 2020, 19:51:43 UTC
2fffbc5	jsmall-nvidia	19 August 2020, 18:15:36 UTC	Int64 atomic add RWByteAddressBuffer support (#1504) * Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but is necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * WIP adding support for InterlockedAddFp32 * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * WIP - atomicAdd. This solution can't work as we can't do (float) in glsl. WIP on atomic float ops. * Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl. * Split __atomicAdd so extensions are applied appropriately. * Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi Added -nv-api-path * Refactor around IncludeHandler and impl of IncludeSystem * slang-include-handler -> slang-include-system Have IncludeHandler/Impl defined in slang-preprocessor * Small comment improvements. * Document atomic float add addition in target-compatibility.md. * CUDA float atomic support on RWByteAddressBuffer. * Add atomic-float-byte-address-buffer-cross.slang * Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one. * Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target. * Int64 atomic add RwByteAddressBuffer support. * Fix typo in stdlib for int atomic ByteAddressBuffer. * Small fixes to int64 atomic test. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	19 August 2020, 18:15:36 UTC
b820f34	Yong He	18 August 2020, 20:08:45 UTC	Support initializing an existential value from a generic value. (#1503) * Support initializing an existential value from a generic value. * Remove trailing spaces and clean up debugging code.	18 August 2020, 20:08:45 UTC
9abcb6e	jsmall-nvidia	18 August 2020, 17:42:46 UTC	Support for float atomics on RWByteAddressBuffer (#1502) * Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but is necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * WIP adding support for InterlockedAddFp32 * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * WIP - atomicAdd. This solution can't work as we can't do (float) in glsl. WIP on atomic float ops. * Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl. * Split __atomicAdd so extensions are applied appropriately. * Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi Added -nv-api-path * Refactor around IncludeHandler and impl of IncludeSystem * slang-include-handler -> slang-include-system Have IncludeHandler/Impl defined in slang-preprocessor * Small comment improvements. * Document atomic float add addition in target-compatibility.md. * CUDA float atomic support on RWByteAddressBuffer. * Add atomic-float-byte-address-buffer-cross.slang * Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one. * Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	18 August 2020, 17:42:46 UTC
697e7fb	Tim Foley	17 August 2020, 22:28:05 UTC	Attempt to fix lookup for members that "override" (#1501) Our current lookup process always finds all members of a type, which can include both an inherited member (e.g., from an `interface`) and one that logically overrides/implements it. If something downstream doesn't filter this result down and favor the derived member, then an ambiguity error will result. To date, this has mostly been a non-issue because we haven't emphasized inheritance, and the main case we did support (`struct` types implemented `interface` methods) gets disambiguated as part of overload resolution for function calls. Recent changes to support `property` declarations to `interface`s add the possibility for ambiguity between a "base" and "derived" declaration that can't rely on overload resolution for disambiguation. The approach in this PR is to add disambiguation logic to the other main place where the results of lookup get used. If a lookup result is being assigned to a variable, passed to a function, or otherwise used in a case where a value of a specific type is needed, it will be "coerced" to the desired type. This change makes it so that the first step in the coercion logic is to try to disambiguate the expression that is being coerced. In order to ensure that an overloaded expression can be detected and resolved even when just checking if coercion is possible, I needed to update the `canCoerce()` functions to also take the expression that is being tested for coercibility, and not just its type. There is only one case (that I saw) where coercion checks were being made without an expression value available, and that case didn't actually need/want to handle overloading. In order to test the fixes here, I added logic to the `property`-in-`interface` test to make sure that the critical cases work as expected (references to a derived member using "dot syntax" and "implicit `this`" syntax). Alternatives Considered ----------------------- The first attempt at this fix took a simpler approach: I added the disambiguation logic as a post-process on member lookup. That is, given `obj.foo` I would take the `LookupResult` for `foo` and immediately try to filter it to include only the most-derived members. This approach has the major benefit of catching even more use cases of values (and thus helping to ensure that we don't spend forever chasing down more of these ambiguity errors), but it also has two critical problems: 1. If we only trigger disambiguation when looking up `obj.foo`, then we can't do anything to help when `foo` is looked up as an ordinary identifier, but is actually equivalent to `this.foo`. A full fix would require doing this disambiguation on every* name lookup, which leads to the second issue: 2. It is important that for a method call like `obj.m(...)` we do not disambiguate when looking up `obj.m`, and instead let the overload resolution for the call resolve things. That choice is what makes it possible to call an inherited `m` declaration even when there is a derived `m` with a different signature. Issue (1) is covered by the test case that was added here, but we should probably have a test case for (2) to make sure we don't break that use case. Caveats ------- An important case that we don't solve in this PR is when the result of a lookup is captured in a variable without an explicit type: let f = obj.foo; That case also needs disambiguation, and should be addressed in a later change. A secondary issue is that our approach to prioritizing declarations during lookup is still quite naive. We really need a way for lookup to attach information about nesting of scopes to results (to be clear that results from inner scopes should be preferred over those from outer scopes), as well as have a robust mechanism for comparing the priority of members based on the inheritance graph of a type. This change doesn't do anything to make the situation better or worse.	17 August 2020, 22:28:05 UTC
ff2d490	Dietrich Geisler	17 August 2020, 16:50:44 UTC	GPU Foreach Loop (#1498) * GPU Foreach Loop This PR introduces the completed GPU foreach loop and updates the heterogeneous-hello-world example to use it. This PR builds on the previous introduction of the GPU Foreach loop parsing and semantic checking PR (#1482) by introducing IR lowering and emmitting. THe new feature can be used by having a GPU_Foreach loop interacting with a named non-CPP entry point, and using the -heterogeneous flag. * Fix to path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	17 August 2020, 16:50:44 UTC
0640a10	Yong He	14 August 2020, 19:56:43 UTC	Fix tuple type lowering (#1499) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	14 August 2020, 19:56:43 UTC
b37a777	Yong He	14 August 2020, 17:04:32 UTC	Lower existential types. (#1497) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	14 August 2020, 17:04:32 UTC
99366e7	Tim Foley	14 August 2020, 15:31:59 UTC	Fix an issue with explicit enum tag types (#1495) The basic problem here was that in a declaration like: ```hlsl enum Color : uint { Red, Orange, ... } ``` The `: uint` bit is represented as an `InheritanceDecl`, because that is what we use to represent the syntactic form of inheritance clauses like that. At the point where we parse the `InheritanceDecl` we don't yet know whether it represents a base interface or a "tag type" like `uint` in this case. The root problem that is then created is: an `enum` type is not a subtype of its "tag type," and treating it like a subtype can create problems. The main problem that arises is that looking in a type like `Color` will find both the members of color and the members of `uint`. In the case of things like `__init` declarations, that creates a problem where the `Color` type has two different `__init`s that take a `uint`: * The one it inherits from `uint` via that `InheritanceDecl` (even though it shouldn't) * The one it gets via an extension just for conforming to `__EnumType` (a non-user-exposed `interface` in the standard library) Because both of those `__init`s are inherited, neither is preferred over the other one and they create an ambiguity if somebody tries to write: ```hlsl uint u = ...; Colorc = Color(u); ``` The solution used in this PR is to add a compiler-internal modifier to the `InheritanceDecl` that introduces a "tag type" to an `enum`, in an early phase of checking (one of the ones that occurs before it is legal to enumerate the bases of a type). Then the lookup process is modified to ignore `InheritanceDecl`s with that modifier when doing lookup in super-types (since the declaration does not indicate a subtype/supertype relationship). This appears to get the basic feature working again, although it is possible that there are other parts of the compiler that use `InheritanceDecl`s and mistakently assume that all `InheritanceDecl`s introduce subtype/supertype relationships. We probably need to do a significant audit of the code to start being more clear about the nature of the relationships such declarations introduce. Such steps are left to future changes. Co-authored-by: Yong He <yonghe@outlook.com>	14 August 2020, 15:31:59 UTC
2bfe62a	Tim Foley	14 August 2020, 00:56:20 UTC	Support property declarations in interfaces (#1494) There are two main features in this change. First, we allow for `interface`s to declare `property` requirements, which can be satisfied by matching `property` declarations in a type that conforms to the interface: interface IRectangle { property float width { get; } property float height { get; } } struct Square : IRectangle { float size; property float width { get { return size; } } property float height { get { return size; } } } Second, we allow a type to satisfy a `property` requirement with an ordinary field of the same name: struct Rectangle : IRectangle { float width; float height; // no explicit `property` declarations needed } The implementation of these features is mostly in `slang-check-decl.cpp` in the logic for checking conformance of a type to an interface. The first feature simply requires adding logic to checking whether a candidate satisfying `property` declaration matches a required `property` declaration. To do so, it must have the same type, and an accessor to satisfy each of the required accessors. The second feature requires adding logic to synthesize an AST `property` declaration for a type, based on a required `property` declaration and its accessors. This means that, more or less, any type where `this.name` yields a storage location that does what is needed can satisfy a property requirement (there is no specific rule that says the storage needs to be a field, although that is the most likely case). The way that witnesses are stored for property declarations probably merits some description. During IR lowering, an abstract storage declaration like a subscript or `property` more or less desugars away, so that the actual interface requirements correspond to the accessors within it (the `get`, `set`, etc.). This means that a witness table should have entries/keys corresponding to the accessors and not the property itself. The process of finding/recording witnesses for `property` requirements thus installs entries for the individual accessors (with care taken to only install accessor witnesses once we are sure we have witnesses for all the requirements). Currently, the code also installs an entry for the property itself, although that is not strictly required, and might not be something we continue to do long-term. (Aside: it was somewhat surprising that an end-to-end test of `property` declarations in `interface`s Just Worked without any changes to IR lowering.) As we continue to write more code that synthesizes and checks AST expressions/statements, it becomes necessary to refactor the semantic checking logic so that it splits the recursive part (e.g., checking the operands of an assignment) from the validation part (e.g., checking that the assignment itself is valid). It is probably too big of a change to justify at this point, but it might be valuable in the future to have distinct hierarchies that represent unchecked and checked ASTs, with semantic checking mostly being a transformation from one to the other. The benefit of such a change is we could factor out a distinct "builder" API for constructing validated/checked AST nodes, with both semantic checking and AST synthesis being clients of that API.	14 August 2020, 00:56:20 UTC
482fd16	jsmall-nvidia	13 August 2020, 21:28:07 UTC	Added WavePrefixCountBits test. (#1493) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	13 August 2020, 21:28:07 UTC
876968c	Yong He	13 August 2020, 19:17:59 UTC	IR support for Tuple types. (#1492) * Tuple types. * Fix x86 warning * Improved deduplication Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	13 August 2020, 19:17:59 UTC
09adf10	Tim Foley	13 August 2020, 17:36:55 UTC	Allow both traditional and modern property syntax (#1487) The initial change to introduce `property` declarations tied them to a "modern" syntax: property width : float { ... } In practice, a great majority of users assume that properties in Slang will be declared like those in C#: property float height { ... } This change allows both options to parse correctly. The choice made here is to only parse as the "modern" syntax when it can be detected from lookahead (an identifier followed by a `:`), and fall back to the "traditional" syntax otherwise. That choice might not produce the best diagnostic messages around syntax errors in codebases that use the modern syntax, but it is the easiest trade-offs to make. We also add similar disambiguation logic for the `newValue` parameter of a `set` declaration (and other "modern"-style parameters). This strategy cannot be applied to all function parameters in general, because traditional parameter lists can still use `:` to introduce a semantic. Note: the same disambiguation strategy applied here could be used for `let` and `var` declarations: let a : int = 1; let int b = 2; This change does not try to introduce flexibility like that, because it seems unlikely for users to care.	13 August 2020, 17:36:55 UTC
e1ea7ed	Dietrich Geisler	12 August 2020, 17:39:08 UTC	GPU Foreach Parsing and Checking (#1482) This PR introduces parsing and semantic checking for a GPU foreach loop for heterogeneouis programming. A GPU foreach loop takes the form: ``` __GPU_FOREACH(renderer, gridDims, LAMBDA(uint3 dispatchThreadID) { kernelCall(args, ...); }); ``` And will allow the host code to call into a kernel with the correct renderer and grid dimensions. This commit also introduces a hack to unify types in the heterogeneous hello world file, which will hopefully be amended in the future. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	12 August 2020, 17:39:08 UTC
12b0fc6	jsmall-nvidia	12 August 2020, 13:30:25 UTC	Fix based on review comment problem. (#1490) Better split of responsibilites around _begin/_endInst	12 August 2020, 13:30:25 UTC
ec095a3	jsmall-nvidia	11 August 2020, 22:19:11 UTC	Bugfix: WaveActiveCountBits on glsl output. (#1488) * Fix WaveActiveCountBits on glsl output. * Fix warning `could not be inlined because the return instruction is not at the end of the function. This could be fixed by running merge-return before inlining.` from glslang - because we weren't including the CreateMergeReturnPasss on default optimization, and it's assumed in InlineExhaustivePass. * Keep WaveActiveCountBits use the default WaveMask impl. * Fix WaveCountBits calculation. Use WaveActiveBallot instead of the _WaveActiveBallot.	11 August 2020, 22:19:11 UTC
cc64c61	jsmall-nvidia	11 August 2020, 20:16:22 UTC	slang-glslang binaries update, including fixes from PR Bugfix: WaveActiveCountBits on glsloutput. Slang PR #1488 (#1489)	11 August 2020, 20:16:22 UTC
96805c7	jsmall-nvidia	11 August 2020, 15:37:38 UTC	Improvements to Casting (#1483) * Improve handling of cast detection when have a more complex cast than just a single identifier. * Improve comments around heuristic for casting * Added nested enum test. * Improve comments * Define function like - output change. * Use lookup for types in determining if cast or not. * Add _isCast function * Add heuristic test to nested-enum.slang that works if the type test fails. * Change hueristic based on review. Allow (..)( to always be an expression, because if it's a type it will be turned into a cast later. * Fix output of define-function-like.slang - which changes again with improved casting support. * Improve testing for type in cast - if we find a decl and it's not a type, then we know it's not a cast.	11 August 2020, 15:37:38 UTC
2903eb5	jsmall-nvidia	10 August 2020, 21:07:40 UTC	Slang binaries including #1485 fix (#1486) * Fix the minProgramTexelOffset should be -. * Improve comments. * Upgrade of slang-binaries which contains slang-glslang with the texel offset fix. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	10 August 2020, 21:07:40 UTC
023622c	jsmall-nvidia	10 August 2020, 19:52:32 UTC	Glslang texel offset bug fix (#1485) * Fix the minProgramTexelOffset should be -. * Improve comments.	10 August 2020, 19:52:32 UTC
dd980b4	Yong He	08 August 2020, 01:36:01 UTC	AnyValue packing/unpacking pass. (#1480) * AnyValue packing/unpacking pass. * Add diagnostic for types that does not fit in required AnyValueSize. * Add expected test result * Fix warnings.	08 August 2020, 01:36:01 UTC
20af567	jsmall-nvidia	06 August 2020, 21:15:55 UTC	Emit spir-v using MemoryArena to stop memory leak (#1479) * Use m_style for OSFindFilesResult * Refactor of FindFilesResult. * Fixes on linux for FindFiles. * Simplify FindFilesState, and linux support for pattern matching. * Small fixes to linux FindFilesState * Fix typo on linux FindFiles * Fix typo in linux FindFiles. * Renamed some variables, and improved comments on FindFiles. * Improve comments on FildFiles * Small improvements around FindFiles. * Refactor FindFiles again.. into a visitor and function in Path. * Fix some problems on linux. * Fix linux typo. * Renamed os -> find-file-util * find-file-utl -> directory-util * Make delete of PathInfo explicit. * Initialize alwaysCreateCollectedParam . * WIP spir-v emit using MemoryArena * Fix bug in spirv emit. * Fix bug with handling null termination on strings in spirv emit. * Small improvements in comments around emit spirv * Remove the 'dst' from emitOperand - we can only emit to the current inst. * Improve SpirV emit comments. * Don't store the created instruction in the InstConstructScope - as it's always the m_currentInst. Don't return the instruction after _beginInst. Slight comment improvements.	06 August 2020, 21:15:55 UTC
3231048	jsmall-nvidia	05 August 2020, 20:12:55 UTC	Refactor enumerating directory contents (#1478) * Use m_style for OSFindFilesResult * Refactor of FindFilesResult. * Fixes on linux for FindFiles. * Simplify FindFilesState, and linux support for pattern matching. * Small fixes to linux FindFilesState * Fix typo on linux FindFiles * Fix typo in linux FindFiles. * Renamed some variables, and improved comments on FindFiles. * Improve comments on FildFiles * Small improvements around FindFiles. * Refactor FindFiles again.. into a visitor and function in Path. * Fix some problems on linux. * Fix linux typo. * Renamed os -> find-file-util * find-file-utl -> directory-util Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	05 August 2020, 20:12:55 UTC
e713b56	Tim Foley	05 August 2020, 18:47:18 UTC	Change the policy for entry-point uniform parameters on Vulkan (#1476) Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal. Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward: * For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.) * For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets. * For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms. * For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record." * For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types). * The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan. The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently. The big picture is then that a kernel like: ```hlsl void computeMain(uniform float someValue) { ... } ``` will map to output GLSL like: ```glsl layout(push_constant) uniform { float someValue; } U; void main() { ... } ``` If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit: ```hlsl struct Params { float someValue; } void computeMain(uniform ConstantBuffer<Params> params) { ... } ``` (Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.) This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case). The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support: * Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly. * The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer` * The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in. * On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering). * On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer. The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed. A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added. An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding. Co-authored-by: Yong He <yonghe@outlook.com>	05 August 2020, 18:47:18 UTC
6fb2aa7	Yong He	05 August 2020, 17:32:52 UTC	`AnyValue` based dynamic dispatch code gen (#1477) * AnyValue based dynamic code gen * Fix aarch64 build error	05 August 2020, 17:32:52 UTC
092337a	jsmall-nvidia	04 August 2020, 23:34:33 UTC	Sampler Feedback improvements (#1475) * Add the Feedback texture types. Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK. * Starting point to test sampler feedback. * WIP on FeedbackSampler. * Use __target_intrinsic to override the output of sampler feedback types. * Use newer generic syntax for FeedbackTexture. * Reflects Feedback type. * SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK * Added reflection test. * Reneable issue with generics in sampler-feedback-basic.slang * Add methods to FeedbackTexture2D/Array. Make test cover test cases. * Sampler feedback produces DXC code. * Disabled Sampler feedback test - as requires newer version of DXC. * Fix bug in reflection tool output. * Fix problem with direct-spirv-emit.slang.expected due to update to glslang. * Fix direct-spirv-emit.slang * Use SLANG_RESOURCE_EXT_SHAPE_MASK again * Make Feedback be emitted as a textue type prefix. * Add support for GetDimensions to FeedbackTexture2D * WIP on CPU sampler feedback. Update of target compatibility. * Fix some bugs in C++ feedback sampler. Fix GetDimensions for FeedbackTextures. Run 'Compile' test for CPU compute feedback texture test. Update target-compatability.md * Fix GetDimensions call on feedback sampler. * Small documentation improvements. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	04 August 2020, 23:34:33 UTC
de309d9	Yong He	04 August 2020, 18:57:45 UTC	Fix leaks in slang-generate (#1472) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	04 August 2020, 18:57:45 UTC
00b1fe0	Tim Foley	04 August 2020, 17:43:05 UTC	Fix stdlib declarations of bit-shift ops (#1471) The declarations of the left- and right-shift operations in the Slang standard library were set up identically to the declarations of operator binary (and compound binary) operations. A consequence of this choice was that both operands to a shift were expected to have the same type, which can lead to a confusing result. If the user wrote a shift of the form `int >> uint`, then the ordinary promotion rules for Slang would decide to perform the operation on `uint` value, so it would change to `uint(int) >> uint` and perform an unsigned shift, which isn't what the user would expect. The fix implemented here is to make the shift operations be declared separately from the other binary operations, with two generic type parameters instead of one: distinct parameters for the left-hand-side and right-hand side types. Each parameter is only constrained to be a built-in integer type.	04 August 2020, 17:43:05 UTC
79ba927	jsmall-nvidia	03 August 2020, 19:46:16 UTC	First pass support for Sampler Feedback (#1470) * Add the Feedback texture types. Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK. * Starting point to test sampler feedback. * WIP on FeedbackSampler. * Use __target_intrinsic to override the output of sampler feedback types. * Use newer generic syntax for FeedbackTexture. * Reflects Feedback type. * SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK * Added reflection test. * Reneable issue with generics in sampler-feedback-basic.slang * Add methods to FeedbackTexture2D/Array. Make test cover test cases. * Sampler feedback produces DXC code. * Disabled Sampler feedback test - as requires newer version of DXC. * Fix bug in reflection tool output. * Fix problem with direct-spirv-emit.slang.expected due to update to glslang. * Fix direct-spirv-emit.slang * Use SLANG_RESOURCE_EXT_SHAPE_MASK again * Make Feedback be emitted as a textue type prefix. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	03 August 2020, 19:46:16 UTC
9ac5c51	Yong He	01 August 2020, 01:30:10 UTC	Add [anyValueSize] attribute to interfaces and propagate that in the IR. (#1469) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	01 August 2020, 01:30:10 UTC
bb42514	Tim Foley	31 July 2020, 23:33:30 UTC	Fix issues arising around DXR 1.1 RayQuery usage (#1468) This change includes a few different fixes for issues that arose in a user shader that made use of DXR 1.1. The existing solution we had for handling the DXR 1.1 `RayQuery` type relied on the fact that a declaration like: ```hlsl RayQuery<0> myRayQuery; ``` Looks like an undefined variable to existing Slang, while to dxc it is a variable declaration that runs an implicit default constructor (sneaking a bit of C++ into HLSL, but only in a way the standard library can use). Slang was getting away with the fact that this maps to an undefined variable because it turns out that our emit logic would output the exact same declaration for an undefined value (since declaring a variable without initializing it is the simplest way to get an undefined value of a given type in a C-like language). The main bug that arose here was that if the `RayQuery<...>`-typed variable was declared under control flow, then the `undefined` instructions introduced by our SSA pass would actually get inserted into the wrong block. Basically, when a block was trying to read a variable, and there was no preceding `store` to that variable in the block, we'd start looking for incoming values from its predecessor block(s). In the case where the variable never gets stored to, this search would eventually reach the first block of the function, where we'd realize the value must be `undefined`. The result was that we might insert an `undefined` instruction of some `T` into the first block of a function, but the type `T` might be the result of a lookup operation performed later in that function. This ends up creating a use of `T` that isn't dominated by the definition, which violates the SSA property. This violation of the SSA properties lead to us generating incorrect code in a later pass that deals with scoping differences between SSA form and our structured output statements; that code would end up creating a local variable to hold a type instead of a value. The main fix is in `slang-ir-ssa.cpp`, where we catch the case of trying to read a variable in the block that declares it, if there we no preceding `store`s. We simply insert an `undefined` instruction before the first such read and write that out as the value of the variable to be used for subsequent instructions (up to the next `store`). This fixes the SSA dominance property for the `undefined` values that get introduced and thus technically fixes the output code for the user shader. A secondary issue is that it is kind of gross to be relying on the behavior of `undefined` instructions in the IR for the semantics of an important standard library type like `RayQuery<...>`. A preceding change already added basic support for Slang to run default initializers (declared as `__init()`) on variables that are declared without an initial-value expression. This change adds such a default initializer to `RayQuery<...>` and maps it to a dedicated IR instruction that is intended to represent the idea of running a C++-style default constructor to produce a value. It turns out that the code we need to emit in that cse is identical to what we currently emit for `undefined` instructions, so that is helpful. A tertiary issue is that when trying to run the user shader in debug mode, I ran into an assertion because our type layout logic for reflection had never dealt with the issue of user-defined `enum` types being used in constant buffers or other memory that needs layout. I added a quick fix that lays out any `enum` types as their "tag type" (which defaults to `int`). Unfortunately, there is no easy way to check in a regression test for the user issue, because official `dxcompiler` versions with support for DXR 1.1 are not yet released (at least as of last time I checked).	31 July 2020, 23:33:30 UTC
011a743	Dietrich Geisler	31 July 2020, 21:51:52 UTC	Binary for Heterogeneous Example (#1467) * Binary Heterogeneous Example This PR introduces the ability to insert the binary of a non-CPU target by using the -heterogeneous flag. Specifically, this PR updates the emitting logic to produce a variable of name `__[name_of_entryPoint]` when the heterogeneous flag is present. * Prelude path fix Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	31 July 2020, 21:51:52 UTC
4549597	jsmall-nvidia	31 July 2020, 20:31:30 UTC	Upgrade to Glslang 11.0.0 (#1466) * Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but is necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * Small improvements to glslang building process documentation. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	31 July 2020, 20:31:30 UTC
fc8b575	jsmall-nvidia	31 July 2020, 16:25:58 UTC	Fix for bug where memory that has been allocated with new T[] (within a list) is freed with free in the RiffContainer. (#1473)	31 July 2020, 16:25:58 UTC
659d8ea	Yong He	29 July 2020, 00:05:13 UTC	Generalize lowerSimpleIntrinsicType to include generic arguments (#1464) * Generalize lowerSimpleIntrinsicType to include generic arguments * Use recursion instead of loop to get the correct ordering for nested generics	29 July 2020, 00:05:13 UTC
cd10673	Tim Foley	28 July 2020, 22:14:31 UTC	Change parameter passing convention for CUDA (#1463) The Big Picture =============== Given input Slang code like: ```hlsl Texture2D gA; [shader("compute")] void kernelFunc(uniform Texture2D b, uint3 tid : SV_DispatchThreadID) { ... } ``` the existing CUDA code generation strategy would always generate a kernel with a signature like: ```c++ struct GlobalParams { Texture2D gA; } struct EntryPointParams { Texture2D b; } extern "C" __global__ void kernelFunc(EntryPointParams* entryPointParams, GlobalParams* globalParams) { ... } ``` This choice was consistent with the conventions of the CPU kernel target, and shares the advantage that it is easy for the user to data-drive the logic for filling in parameters and then invoking a kernel. However, the approach outlined above has two serious problems when used for CUDA kernels: * First, it defies the programmer's expectation about what an "equivalent" CUDA kernel signature would be, which makes it awkward for a developer to invoke this kernel from CUDA C++ host code (especially in the context of an app that might also run hand-written CUDA kernels). * Second, the performance of this approach suffers because every access to a global or entry point parameter turns into a load from global memory. In contrast, a typical hand-written CUDA kernel passes its parameters via an implementation-specific path that (for current CUDA platforms) seems to be equivalent to `__constant__` memory in performance. This change alters the convention so that the Slang compiler takes the code from the top of this message and translates it into something like: ```c++ struct GlobalParams { Texture2D gA; } __constant__ GlobalParams SLANG_globalParams; extern "C" __global__ void kernelFunc( Texture2D b ) { ... } ``` This translation alleviates both problems with the current translation: * The signature of the generated CUDA kernel function is as close to that of the original as is possible (we had to eliminate the `SV_`-semantic varying inputs), and should directly match what the programmer would expect in common cases. Entry-point parameters are passed via CUDA kernel parameters, and should thus match in performance. Global parameters are passed via a variable in `__constant__` memory, and thus should also perform as well as possible/expected. Detailed Changes ================ * Disable the `collectEntryPointUniformParams` pass for CUDA, so that entry-point `uniform` parameters are not bundles into a single `struct` and/or `ConstantBuffer`. * When targeting CUDA, disable the logic for generating an entry-point parameter for passing in the global shader parameter(s) * Allow `CLikeSourceEmitter` subclasses to override the name generated for entry-point symbols, and use this to add the required prefix for each OptiX kernel type when translating a ray-tracing kernel. * Add logic to emit "parameter groups" in a specialized way for CUDA (this is the same approach that allows us to generate `cbufffer { ... }` declarations for fxc). A global-scope parameter group will turn into a global `__constant__` variable called `SLANG_globalParams` (that name becomes part of the ABI for Slang-compiled shaders). * Update the logic in `render-test` for loading and invoking CUDA kernels to handle the new policy. The last bullet there merits expansion, since it is indicative of the work a client using Slang would have to go through to use our generated kernels with the new policy: * When loading a CUDA module with one or more kernels, we also use `cuModuleGetGlobal` to query the address of the `SLANG_globalParams` symbol in that CUDA module. That pointer needs to be used when setting global parameter values to be used by kernels in that CUDA odule. * Because our existing `BindPoint` logic for CUDA always sets up parameter data in GPU memory, we end up having to copy the entry-point parameter data from GPU memory to host memory. This step would ideally be skipped in a codebase that understands the correct policy, but it is a bit unfortunate that it is no longer trivially correct for an application to store all parameter data in GPU memory. * Before invoking the kernel, we need to use a `cudaMemcpyAsync` to copy from the prepared GPU memory for global parameters over to the `SLANG_globalParams` symbol associated with the kernel to be invoked. Because this operations is issued on the same CUDA stream as the kernel call, it is guaranteed to not overlap with GPU kernel execution. * When invoking the kernel, we take advantage of the seldom-used `CU_LAUNCH_PARAM_BUFFER_POINTER` facility to specify a contiguous memory region with all the entry-point parameters in it instead of passing each entry-point parameter separately. Given Slang reflection it is also possible to query the offset of each entry-point parameter in the buffer, so we could invoke the kernel in the traditional fashion as well. The choice here is up to the application. Caveats ======= * This is a breaking change, and any subsequent release will need to reflect that fact. Any customers who rely on Slang's current CUDA codegen strategy are likely to be surprised by this change, and I don't see an easy way to give them a more gentle transition. * This change does not remove the logic that introduces a `KernelContext` type for code that requires it. That means that things like `static` global variables can continue to work on CUDA for now, but we know that those are not going to be something we can support in the long-term with separate compilation. * While the policy implemented in this change is a reasonable default, it is still not going to perfectly match expecations for some developers. In particular, some developers who are familiar with both D3D and CUDA will likely wonder why a global `cbuffer` in Slang translates to a global-memory pointer in the output CUDA instead of one global `__constant__` variable per `cbuffer`. A more detailed alternate translation would generate a distinct global `__constant__` variable for each top-level constant buffer or parameter block. We may need to refine the translation even more based on feedback from users who care about how we handle global-scope parameters. * Recent changes in Slang have broken the logic that handles the OptiX "shader record" as an alternative mechanism for passing entry-point parameters. In order to get any level of OptiX support up and running we will have to change the IR passes that run on CUDA kernels to actually run the "collection" of `uniform` parameters for ray tracing stages, and then to replace references to the resulting parameter with a call to the function to access the shader record. * The use of `SLANG_globalParams` here works well enough in the case of whole-program compilation; every `CUmodule` ends up with (zero or) one parameter with this name, and an application can just hard-code it. As a mechanism it wouldn't work in the presence of separately-compiled modules that might introduce their own global parameters (including cases like constant lookup tables that really want to be at the global scope). An alternative approach would have Slang generate output PTX for each module, where a module has an optional global symbol for its own global-scope parameters (with a mangled name that is based on the module name), and then a linked CUDA binary has all of those distinct symbols. Such an approach would be compatible with module-at-a-time reflection and parameter binding, but would lead to another breaking change down the line for code that switches to `SLANG_globalParams`.	28 July 2020, 22:14:31 UTC
dce1d35	Tim Foley	28 July 2020, 14:55:41 UTC	Fix support for nested generic intrinsics (#1462) The logic that detects intrinsic functions during emit was not able to properly detect an intrinsic generic method nested in a generic type. The basic problem was that this led to a `specialize(specialize(...), ...)` in the IR, which wasn't being handled (only one level of `specialize` was handled). The fix is local and simple. The larger issue was that the author of this commit had thought our IR ruled out nested generics like this, when in fact that is precisely how we handle nested generics throughout the IR. This oversight/misunderstanding means that we might have broken passes in other places that assume nested generics cannot happen. This change doesn't pretend to fix that other issue, but we should pay attention to it.	28 July 2020, 14:55:41 UTC
348058f	Dietrich Geisler	27 July 2020, 16:14:17 UTC	Baseline Heterogeneous Example (#1460) * Baseline Heterogeneous Example This PR introduces a baseline heterogeneous example, including both a Slang file and an associated C++ helper file. This refactoring primarily moves the Slang file "into the driver's seat" while maintaining that the C++ side still does most of the actual work. * Fix to prelude path	27 July 2020, 16:14:17 UTC
87940a6	Tim Foley	25 July 2020, 01:12:41 UTC	Fix bugs related to mutating implementations of interface methods (#1461) There are two main bug fixes here: * We were failing to diagnose when code calls a `[mutating]` method on a value that doesn't support mutation (that is an r-value instead of an l-value). * We had a bug in the synthesis logic for interface requirements where we used the result type of the requirement in place of each of the parameter types. The second bug made synthesis often produce incorrect signatures with `void` parameters. The first bug meant that even though a `[mutating]` method should not be able to satisfy a non-`[mutating]` method (and we had code to enforce this for the "exact match" case), when we go on to try and synthesize a non-`[mutating]` method that satisfies the requirement by delegating to the user-written one, it would end up succeeding, because nothing was stopping a non-`[mutating]` method from calling a `[mutating]` one. In each case this code adds a fix and a test case to confirm it.	25 July 2020, 01:12:41 UTC
261fe75	Yong He	24 July 2020, 23:37:51 UTC	Ensure labels are dumped in `lower-to-ir` (#1459) * Ensure labels are dumped in `lower-to-ir`. There is a `dumpIR` function that accepts a label parameter already in slang-emit.cpp. This change moves it to slang-ir.cpp so it may be called from other files. * update expected test result Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 July 2020, 23:37:51 UTC
17d0da2	jsmall-nvidia	24 July 2020, 20:41:43 UTC	Enable CUDA for active-mask tests. (#1458) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>	24 July 2020, 20:41:43 UTC

Newer
Older