Revision history - None - origin: https://github.com/shader-slang/slang

visit type:

Revision	Author	Date	Message	Commit Date
e21d5ad	jsmall-nvidia	21 November 2018, 18:41:34 UTC	Feature/early depth stencil (#727) * First pass support for early depth stencil. * Add a simple test to check if output has attributes. * Use cross compilation to test [earlydepthstencil] on glsl. * If target is dxil, use dxc to test against. Add hlsl to test earlydepthstencil against. * * Added spSessionHasCompileTargetSupport * Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run	21 November 2018, 18:41:34 UTC
9bb11b6	Tim Foley	21 November 2018, 16:15:33 UTC	Add support for unbounded arrays as shader parameters (#725) * Add support for unbounded arrays as shader parameters With this change, Slang shaders can use unbounded-size arrays as parameters, e.g.: ```hlsl Texture2D t[] : register(t3, space2); SamplerState s[]; ``` As shown in the above example, Slang supports both explicit `register` declarations on unbounded-size arrays and also implicit binding. When doing automatic parmaeter binding, Slang will allocate a full register space to an unbounded-size array of textures/smaplers, starting at register zero. Note that for the Vulkan target, an array of descriptors of any size (including unbounded size) consumes only a single `bindign`, so much of this logic is specific to D3D targets. Details on the changes made: * The single biggest change is a new `LayoutSize` type that is used to store a value that can either be a finite unsigned integer or a dedicated "infinite" value (which is stored as the all-bits-set `-1` value). This is used in places where a size could either be a finite value or an "unbounded" value, to both try to make standard math robust against the infinite case, and also to force code to deal with both the finite and infinite cases more explicitly when they care about the difference. * The public API was documented so that unbounded-size arrays report their size as `-1`. We should probably change this function to return a signed value instead of `size_t`, but that would technically be a source-breaking change, so we want to make sure we stage it appropriately. * The code that invokes fxc was updated so that it passes the appropriate flag to enable unbounded arrays of descriptors. I haven't looked yet at whether dxc needs such a flag, so there may need to be a follow-on change to add that. * The logic in the `UsedRanges::Add` method for tracking what registers have been claimed was rewritten because the previous version had some subtle bugs. The new version includes more detailed comments that attempt to explain why I think the new logic works. * The top-level logic for auto-assigning bindings to parameters has been overhauled to deal with the fact that a parameter that needs "infinite" amounts of a resource should be claiming a full register space for those resources instead. Whenever a parameter allocates any register spaces we want them all to be contiguous, so we have a loop that counts the requirements and allocates the spaces before we go along and dole them out. * When computing the layout for an array type, we need to carefully deal with unbounded-size arrays. In the case of an unbounded array of a "simple" resource type (e.g., `Texture2D[]`), we opt to expose the type layout as consuming an infinite number of the appropriate register, while in the case of a complex type (say, a `struct` with two texture fields), we need to instead allocate whole spaces for those fields. The logic here is more subtle than I would like, and interacts with the existing code that "adjusts" the element type of an array in order to make standard indexing math Just Work. * Similarly, when a `struct` type has unbounded-array fields, then we need to transform any field with infinite register requirements to instead consume a space in the resulting aggregate type. This case is comparatively easier than the array case. * The test case for unbounded arrays covers both explicit and implicit bindings, and also the case of an unbounded array over a `struct` type (it does not cover the case of a `struct` contianing unbounded arrays, so that will need to be added later). For this test we are both validation the output reflection data and that we produce the same code as fxc (with explicit bindings in the fxc case). * The reflection test app was modified to use the new API contract and detect when a parameter consumes `SLANG_UNBOUNDED_SIZE` resources. * Fixup: ensure unbounded size is defined at right bit width	21 November 2018, 16:15:33 UTC
7f97f35	jsmall-nvidia	20 November 2018, 21:00:48 UTC	Keep resources in scope for Vk in DescriptorSet. (#726)	20 November 2018, 21:00:48 UTC
1f4b495	Tim Foley	19 November 2018, 17:08:26 UTC	Fix declaration of RWTexture*.Load() operations (#722) The `Texture2D.Load()` operation takes a `uint3` of coordinates, with the `.xy` part holding the texel coordinates and the `.z` part holding a mip level. In contrast, the `RWTexture2D.Load()` operation only takes a `uint2`. This isn't clearly specified on MSDN, so Slang failed to get the declaration right. This change fixes it so that we only add the extra coordinate for the mip level on non-multisample, read-only texture types (the previous code only checked for multisample-ness). I also changed the logic that outputs swizzles to extract the coordinates and mip level so that it only applies when there is a mip level being passed through (this code should never actually be applied, though, because we shouldn't be generating `texelFetch` for RW texures anyway...). One final change that sneask in here is making the `offset` parameter for one of the load-with-offset cases correctly use the base coordinate count for the texture type (e.g., 2D even for `Texture2DArray`). That is an unrelated fix, but I thought I'd sneak it in here rather than do a separate PR.	19 November 2018, 17:08:26 UTC
7e0c5ad	Tim Foley	19 November 2018, 17:07:34 UTC	Add Vulkan cross-compilation for byte-address buffers (#721) * Add Vulkan cross-compilation for byte-address buffers This covers `ByteAddressBuffer`, `RWByteAddressBuffer`, and `RasterizerOrderedByteAddressBuffer`. A declaration of any of these types translates to a GLSL `buffer` declaration with a single `uint` array of data. Most of the methods on these types then have straightforward translations to operations on the array. The overall translation is similar to what was already being done for structured buffers. While implementing GLSL translation for the various atomic (`Interlocked`) methods, I discovered that some of these included declarations that aren't actually included in HLSL. I cleaned these up, including in the declarations of the global `Interlocked` functions. The test case that is included here covers only the most basic functionality: `Load`, `Load2`, `Load4` and `Store`. We should try to back-fill tests for the remaining methods when we have time. Two large caveats with this work: 1. We don't handle arrays of byte-address buffers, just as we don't handle arrays of structured buffers. That will take additional work. 2. We don't handle byte-address (or structured) buffers being passed as function parameters, since the parameter would need to be declared as a bare `uint[]` array. * Fixup: don't lump raytracing acceleration structures in with buffers Raytracing acceleration structures share a common base class with byte-address buffers (they are both buffer resources without a specific element type), and I was mistakently matching on this base class in an attempt to have a catch-all that applied to all byte-address buffers. The fix here was to add a distinct base class for all byte-address buffers and catch that instead. * Fixup: typos	19 November 2018, 17:07:34 UTC
0f67d92	jsmall-nvidia	16 November 2018, 21:04:56 UTC	Bug fix - vk::binding on structured buffers (#720) * Fix output of binding of structured buffer on GLSL. * Added test to check vk binding is coming thru. * Fix closethit binding inconsistency.	16 November 2018, 21:04:56 UTC
cfb1f61	jsmall-nvidia	13 November 2018, 22:02:27 UTC	* Fix bug outputing dxbc assembly (#719) * Make the disassembly methods returns SlangResult and String as last output param so as to make error case clear.	13 November 2018, 22:02:27 UTC
039c233	Tim Foley	12 November 2018, 17:57:46 UTC	Add callable shader support for Vulkan ray tracing (#718) * Add callable shader support for Vulkan ray tracing This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec. One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines. Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`. We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind). Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms). * Typo fix: assocaited->associated One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot. * Typo fix: defintiion->definition	12 November 2018, 17:57:46 UTC
c07f60a	Tim Foley	09 November 2018, 21:24:28 UTC	Update Vulkan ray tracing support to final extension spec (#717) * Update version of glslang used * Update VK raytracing support for final extension spec A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`. The Slang standard library and the GLSL baselines for the tests are consistently updated. The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in: ```glsl int payloadID = 0; traceNV(..., payloadID); // ERROR ``` In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here: 1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression. 2. We could introduce a pass to replace the payload-location instructions with literal integers. 3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site) 4. We could somehow mark the builtin function as not having side effects. We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM. We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response. This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.	09 November 2018, 21:24:28 UTC
d782a16	jsmall-nvidia	09 November 2018, 15:23:37 UTC	Feature/teamcity output (#715) * First pass support for TeamCity compatible output. * Remove reset of counters on starting suite - so summary is over all suites.	09 November 2018, 15:23:37 UTC
de6ca4d	jsmall-nvidia	07 November 2018, 00:17:10 UTC	Made sameStageSpecifiedMoreThanOnce a warning instead of an error. (#714)	07 November 2018, 00:17:10 UTC
02e18b5	Tim Foley	07 November 2018, 00:16:52 UTC	Translate NonUniformResourceIndex() calls to Vulkan GLSL (#713) These calls translate to uses of the `nonuniformEXT` qualifier introduced by the `GL_EXT_nonuniform_qualifier` extension. The standard library changes in this case are straightforward uses of existing compiler mechanisms. The test case is one of the less pleasant ones where we compare SPIR-V output against SPIR-V generated from a hand-coded GLSL baseline. This is a case where a simpler test type that just checks for specific textual matches in the output (and not whole files) would be better, but that is out of scope for this change.	07 November 2018, 00:16:52 UTC
1185bd4	jsmall-nvidia	06 November 2018, 19:28:25 UTC	Feature/shared library refactor (#712) * * Added ISlangSharedLibraryLoader and ISlangSharedLibrary * Implemented default implementations * Added slang API function to get/set the ISlangSharedLibraryLoader on the session * Put function caching onto the Session - so that if the loader is chaged, its easy to reset the shared libraries, and functions * Run premake. * Fix problem with setting null, would cause an unnecessary function/shared lib flush. * * Unload SharedLibrary when DefaultSharedLibrary is deleted. * Make SharedLibrary handle unload safely if already unloaded. * Refactor SharedLibrary, such that it becomes a utility class - simplifying it's semantics. * Simplified ISlangSharedLibrary such that doesn't have unload and isLoaded so easier to implement. Use updated SharedLibrary impl. * Disable aarch64 on windows * Premake windows files without aarch64 build. * Moved slang-shared-library to core (so can be used in code outside of main slang) Fixed problem in premake5 where on windows projects were incorrectly constructed * Allowed RefObject to base class of com types Added ConfigurableSharedLibraryLoader Added -dxc-path -fxc-path -glslang-path Fix problem with dxc-path not honoring it's path when loading dxil * Added documentation for command line control of dll loading paths. * Remove some tabbing issues. * Change name of include guard.	06 November 2018, 19:28:25 UTC
4533319	Tim Foley	01 November 2018, 22:08:54 UTC	Add support for a "strict" floating-point mode (#709) This change adds an API function and command line options for controlling the default floating-point behavior for a target, with options for "fast" and "precise" computation. The "precise" option gets mapped to the "IEEE strictness" mode in `fxc` and `dxc` (there is currently no equivalent option for glslang that I could find).	01 November 2018, 22:08:54 UTC
82f57ae	Tim Foley	01 November 2018, 20:23:45 UTC	Disable warnings when compiling through dxc (#710) I had hoped to just disable certain warnings (false positives that trigger based on how Slang outputs HLSL code), but dxc doesn't support fine-grained control over warnings from the command line (we might investigate `#pragma`-based options later). There are enough user complaints about how downstream compiler errors come mixed with tons of distracting warnings that disabling them will lead to a better user experience for now. This change also fixes a long-standing bug where apparently dxc does not guaranteee that the diagnostic blob it returns is nul-terminated (despite this convention being established by fxc), so that we need to nul-terminate it ourselves before emitting any diagnostic messages produced by dxc.	01 November 2018, 20:23:45 UTC
e55d8dc	jsmall-nvidia	01 November 2018, 17:58:24 UTC	Newer versions of gcc, optimize away tests for this being null, because this being null is defined as undefined behavior in the standard. This is a workaround that disables that optimization for now. (#708)	01 November 2018, 17:58:24 UTC
a73b24f	jsmall-nvidia	31 October 2018, 22:25:35 UTC	Added support for cross-compilation. (#707)	31 October 2018, 22:25:35 UTC
2993dd8	Tim Foley	31 October 2018, 14:01:07 UTC	Fix a precedence bug in code emit (#705) * Fix a precedence bug in code emit Given code like the following: ```hlsl float a = ...; float3 b = pow(a, 2.0); float3 c = b.xyz; ``` There is an implicit cast from `float` to `float3` in the computation of `b`, that Slang will always make explicit in the output. Slang will also tend to pull the computation of `b` into the next expression if it has no other use sites in the same function. When it does, the compiler was failing to parenthesize the result correctly, and yielded (more or less): ```hlsl float a = ...; float3 c = (float3) pow(a,2.0).xyz; ``` As you can see, the swizzle ended up attached to the `pow()` call instead of the cast, and the downstream compiler luckily complained that we couldn't apply an `.xyz` swizzle to a scalar value. This change adds the missing parentheses-insertion logic for that case of emitting a cast expression, so that we instead get: ```hlsl float a = ...; float3 c = ((float3) pow(a,2.0)).xyz; ``` I added a test case to catch this specific issue, but there is of course no guarantee that we haven't missed other cases in the emit logic. This is why I held out so long on getting to the "why so many parentheses?" complaints... * remove commented-out code from test program	31 October 2018, 14:01:07 UTC
f0f1571	Tim Foley	31 October 2018, 00:50:12 UTC	Fix handling of DXR profiles (#704) The logic in `getEffectiveProfile()` function was mapping these to use `Stage::Unknown` in an early attempt to handle the way that dxc requires the `lib_` profile for DXR shaders, instead of anything that mentions the stage name (in constrast to, e.g., `vs_5_1`). At the same time, the `GetHLSLProfileName()` function was updated to explicitly handle the DXR shaders and map anything it doesn't expect (including `Stage::Unknown`) to a profile named `unknown`, which dxc obviously doesn't like. This change tries to fix both issues by: Having `getEffectiveProfile()` no longer clobber the stage part of a profile for DXR shaders. * Having `GetHLSLProfileName()` map all unhandled cases to the `lib_*` profiles, since that seems likely to be how any future stages will need to be handled as well (based on the precedent with DXR) Along the way, I also fixed a bug where invoking command-line `slangc` with no `-stage` options and then relying on `[shader(...)]` attributes to pick up the entry points would lead to a crash since the array of per-entry-point output paths on each target would not be sized appropriately.	31 October 2018, 00:50:12 UTC
098dd5d	Tim Foley	30 October 2018, 21:21:34 UTC	Fix a crash on function-static variables with initializers (#703) This code path hadn't been used, and it had a crash due to not inserting the basic blocks it created (for initializing the variable) into the parent function. The fix adds a bit more smarts to the `IRBuilder` to help with inserting basic blocks into the flow of a function. The actual user issue was around `static const` declarations, and it is clear that the code is incorrectly treating a function local `static const` as if it were just `static`. That will need to be fixed in another change.	30 October 2018, 21:21:34 UTC
baf0608	jsmall-nvidia	30 October 2018, 19:31:27 UTC	Feature/serial string pool refactor (#702) * Ongoing serialization for full debug work. * Use StringRepresentationCache and StringSlicePool for serialization. * Removed some older path handling for serialization which had some wrong underlying assumptions. * Builds with refactored use of SubStringPool in ir-serialize. * Removed prohibitedCategories because not used anywhere. * Add category 'compatibility-issue' * Remove work in progress on debug serialization.	30 October 2018, 19:31:27 UTC
2a76449	Tim Foley	30 October 2018, 16:05:40 UTC	Fix declarations of InterlockedMin/Max (#700) Fixed #699 These functions were declared with an `in out` parameter (copy in, copy out) where they should have used `__ref` (true by-reference parameter passing).	30 October 2018, 16:05:40 UTC
d7c0970	Tim Foley	30 October 2018, 14:04:52 UTC	Fix implementation of >= in preprocessor conditionals (#701) By a copy-paste error, `>=` in a preprocessor condition was behaving as `<=`.	30 October 2018, 14:04:52 UTC
7259855	Tim Foley	29 October 2018, 21:44:39 UTC	Rework command-line options handling for entry points and targets (#697) * Rework command-line options handling for entry points and targets Overview: * The biggest functionality change is that the implicit ordering constraints when multiple `-entry` options are reversed: any `-stage` option affects the `-entry` to its left instead of to its right as it used to. This is technically a breaking change, but I expect most users aren't using this feature. * The options parsing tries to handle profile versions and stages as distinct data (rather than using the combined `Profile` type all over), and treats a `-profile` option that specifies both a profile version and a stage (e.g., `-profile ps_5_0`) as if it were sugar for both a `-profile` and a `-stage` (e.g., `-profile sm_5_0 -stage fragment`). * We now technically handle multiple `-target` options in one invocation of `-slangc`, but do not advertise that fact in the documentation because it might be confusing for users. Similar to the relationship between `-stage` and `-entry`, any `-profile` option affects the most recent `-target` option unless there is only one `-target`. * The logic for associating `-o` options with corresponding entry points and targets has been beefed up. The rule is that a `-o` option for a compiled kernel binds to the entry point to its left, unless there is only one entry point (just like for `-stage`). The associated target for a `-o` option is found via a search, however, because otherwise it would be impossible to specify `-o` options for both SPIR-V and DXIL in one pass. * The handling of output paths for entry points in the internal compiler structures was changed, because previously it could only handle one output path per entry point (even when there are multiple targets). The new logic builds up a per-target mapping from an entry point to its desired output path (if any). Details: * Support for formatting profile versions, stages, and compile targets (formats) was added to diagnostic printing, so that we can make better error messages. This is fairly ad hoc, and it would be nice to have all of the string<->enum stuff be more data-driven throughout the codebase. * Test cases were added for (almost) all of the error conditions in the current options validation. The main one that is missing is around specifying an `-entry` option before any source file when compiling multiple files. This is because the test runner is putting the source file name first on the command line automatically, so we can't reproduce that case. * Several reflection-related tests now reflect entry points where they didn't before, because the logic for detecting when to infer a default `main` entry point have been made more loose * On the dxc path, beefed up the handling of mapping from Slang `Profile`s to the coresponding string to use when invoking dxc. * A bunch of tests cases were in violation of the newly imposed rules, so those needed to be cleaned up. * There were also a bunch of test cases that had accidentally gotten "disabled" at some point because there were comparing output from `slangc` both with and without a `-pass-through` option, but that meant that any errors in command-line parsing produced the same error output in both the Slang and pass-through cases. This change updates `slang-test` to always expect a successful run for these tests, and then manually updates or disables the various test cases that are affected. * When merging the updated test for matrix layout mode, I found that the new command-line logic was failing to propagate a matrix layout mode passed to `render-test` into the compiler. This was because the `-matrix-layout` options were implemented as per-target, but the target was being set by API while the option came in via command line (passed through the API). It seems like we want matrix layout mode to be a global option anyway (rather than per-target), so I made that change here. Add missing expected output files * A 64-bit fix * Remove commented-out code noted in review	29 October 2018, 21:44:39 UTC
56c9de0	jsmall-nvidia	26 October 2018, 21:27:45 UTC	Premake improvements (#696) * Make CacheFileSystem dtor virtual. * Fixing problems around build.linux and windows intermediate files being placed in obj.	26 October 2018, 21:27:45 UTC
36742ae	Tim Foley	26 October 2018, 17:58:08 UTC	Work around dxc matrix layout behavior (#694) The Slang compiler allows the default matrix layout convention (row-major vs. column-major) to be specified via the command line or API. When generating output HLSL, Slang emits a `#pragma pack_matrix` directive for the chosen default convention, so that a user can generate plain HLSL output and still have it encode their desired defaults. The problem that has arisen is that many released versions of dxc (including those in the most recent Windows SDK at this time) ignore the `#pragma pack_matrix` directive (the feature has since been added to top-of-tree dxc). The main fix here is to instead pass the `-Zpr` option in to dxc when invoking it if the row-major (non-default) convention is requested. This will solve the problem for clients that use Slang to generate DXIL, but not for clients who use Slang to generate plain HLSL that they then pass into dxc (those clients are assumed to be able to work around the problem for themselves). In order to test the change, I added a test that fills a constant buffer with sequential integers, and then reads out the rows/columns of an `int3x4` matrix with both row- and column-major layout, as well as an integer placed after the matrix, so we can see the offset it was given. The `render-test` application did not yet support generating code via dxc/DXIL, so I added an option for that. This ends up assuming that anybody who is running the D3D12 tests will also have a version of dxc available.	26 October 2018, 17:58:08 UTC
cb9d679	jsmall-nvidia	26 October 2018, 12:16:54 UTC	Feature/file system cache (#692) * First pass at caching file system. * default-file-system -> slang-file-system fix problem with location("build.linux") confusing windows build for now. * Added CompressedResult Fix problem in Result construction with it being unsigned * Add support for Path simplification. * Testing for Path::Simplify. * Refactored CacheFileSystem - automatically handles ISlangFileSystem or ISlangFileSystemExt appropriately. Removed WrapFileSystem - because wasn't possible to emulate some of the behavior if just loadFile is implemented. Split out StringBlob - so that no need to convert between ISlangBlob and String repeatidly. * Remove unwanted code in ~CompileRequest	26 October 2018, 12:16:54 UTC
ad47fe7	Tim Foley	25 October 2018, 19:49:36 UTC	Fix Vulkan codegen for image atomics (#690) The basic problem was that the front-end was generating code that used a `uint` vector for the coordinates, while GLSL requires an `int` vector. Without support for implicit type conversions, this leads to GLSL compilation failure. The fix here is to insert the type conversion as late as possible (during GLSL emit). This isn't a pretty solution, but it is the easiest one to implement in the current compiler. A more forward-looking approach would be to support "force inline" functions in the stdlib, so that we can implement the conversion logic in a stdlib implementation specialized for the Vulkan/GLSL target. At the moment, everything to do with image atomics is all sleight of hand anyway, so making it incrementally messier isn't a bit hit.	25 October 2018, 19:49:36 UTC
4f0415e	jsmall-nvidia	25 October 2018, 15:24:16 UTC	Feature/premake linux (#689) * Premake work in progress for linux. * Added dump function. * Remove examples on linux Small warning fix. * * Don't build render-test on linux * Removed work around virtual destructor warning, and just used virtual dtor for simplicity * Git ignore obj directories * Fix premake working on windows. * * Fix sprintf_s functions * Make generates arg parsing more robust * Added FloatIntUnion to avoid type punning/strong aliasing issues, and repeated union definitions. * Work around problems building on linux with getClass claiming a strict aliasing issue. * Fix for targetBlock appearing potentiall used unintialized to gcc. * Linux slang link options -fPIC to make dll. * Add -fPIC to build options on linux. * Add -ldl for linux on slang. * Fixes to try and get premake working with .so on linux. * Make core compile with -fPIC * Try to fix linux linking with --no-as-needed before -ldl * Add rpath back. * Remove render-gl from linux build. * Re-add location for linux. * Don't include <malloc.h> except on windows. * Remove unused line to fix warning on osx. * Remove ambiguity on OSX for operator <<. * Fixing ambiguity with operator overloading and Int types for OSX. * Fix ambiguity around UInt and operator * Fix ambiguity of UInt conversion for OSX. * Added UnambiguousInt and UnambiguousUInt to make it easier to work around OSX integer coercion for UInt/Int types.	25 October 2018, 15:24:16 UTC
2700a89	jsmall-nvidia	23 October 2018, 00:48:59 UTC	Fix problem with __import not working (#688) * Added getPathType to ISlangFileSystemExt. This is needed so that when searching for a file it's existance can be tested without loading the file. On some platforms a getCanonicalPath can do this - but depending on how getCanonicalPath is implemented, it may not do. This test is made after the relative path is produced before finding the canonical path. * Test for importing along search path. * Added comment to explain the issue around WrapFileSystem impl of getPathType. * Make search path use / not \	23 October 2018, 00:48:59 UTC
cda9c3b	Matt Pharr	22 October 2018, 14:45:09 UTC	Osx build fixes (#681) * Remove 'register' qualifiers. These will be illegal come c++17 and give a warning on OSX. * Add UNREACHABLE_RETURNs to silence compiler warnings. * Make FileStream::GetPosition() compile on OSX (w.r.t. the linux build, I believe that strictly-speaking, fpos64_t is specified as an opaque type and the cast to an Int64 is not necessarily well-defined.) * Avoid an inadvertent trigraph.	22 October 2018, 14:45:09 UTC
53731f6	Tim Foley	19 October 2018, 20:53:18 UTC	Vulkan implicit sampler fixups (#686) * Fix sampler-less texture functions (#685) * Fix sampler-less texeture functions I'm honestly not sure how the original work on this feature in #648 worked at all (probably insufficient testing). We have these front-end modifiers to indicate that a particular function definition requires a certain GLSL version, or a GLSL extension in order to be used, and they are supposed to be automatically employed by the logic in `emit.cpp` to output `#extension` lines in the output GLSL. However, it turns out that nothing is actually wired up right now, so that adding the modifiers to a declaration is a placebo. This change propagates the modifiers through as decorations, and then uses them during GLSL code emit, which allows the functions that require `EXT_samplerless_texture_functions` to work. * fixup: 32-bit warning * Add serialization support for GLSL extension/version decorations	19 October 2018, 20:53:18 UTC
3a5214b	Tim Foley	18 October 2018, 16:30:38 UTC	Add support for static methods in interfaces (#680) This change allows an interface to include `static` methods as requirements, so that types that conform to the interface will need to satisfy the requirement with a `static` method. The essence of the check is simple: when checking that a method satisfies a requirement, we enforce that both are `static` or both are non-`static`. Making that simple change and adding a test change broke a few other places in the compiler that this change tries to fix. The main fix is to handle cases where we might look up an "effectively static" member of a type through an instance, and to make sure that we replace the instance-based lookup with type-based lookup. There was already logic along these lines in `lower-to-ir.cpp`, so this change centralizes it in `check.cpp` where it seems to logically belong.	18 October 2018, 16:30:38 UTC
f9710d5	jsmall-nvidia	17 October 2018, 15:57:33 UTC	IncludeFileSystem -> DefaultFileSystem (#677) Improvements in 'singleton'ness of DefaultFileSystem Made WrapFileSystem a stand alone type - to remove 'odd' aspects of deriving from DefaultFileSystem (such as inheriting getSingleton method/fixing ref counting) Simplified CompileRequest::loadFile - becauce fileSystemExt is always available.	17 October 2018, 15:57:33 UTC
3e74d39	jsmall-nvidia	16 October 2018, 22:49:11 UTC	Feature/include refactor (#675) * Refactor of path handling. * Added PathInfo * Changed ISlangFileSystem - such that has separate concepts of reading a file, getting a relative path and getting a canonical path * Added support for getting a canonical path for windows/linux * Made maps/testing around canonicalPaths * User output remains around 'foundPath' - which is the same as before * Small improvements around PathInfo * Added a type and make constructors to make clear the different 'path' uses * Fixed bug in findViewRecursively * Checking and reporting for ignored #pragma once. * Removed SLANG_PATH_TYPE_NONE as doesn't serve any useful purpose. * Improve comments in slang.h aroung ISlangFileSystem * Remove the need for <windows.h> in slang-io.cpp * Ran premake5. * Improvements and fixes around PathInfo. * Fix typo on linix GetCanonical * Make the ISlangFileSystem the same as before, and ISlangFileSystem contain the new methods. Internally it always uses the ISlangFileSystemExt, and will wrap a ISlangFileSystem with WrapFileSystem, if it is determined (via queryInterface) that it doesn't implement the full interface.	16 October 2018, 22:49:11 UTC
204fb3c	Tim Foley	16 October 2018, 21:07:24 UTC	Don't emit extra parentheses when outputting HLSL/GLSL (#674) When emitting high-level expressions we obviously need to avoid emitting expressions that mean something different than the original code. So if we had IR like: ``` %t0 = add %a %b %t1 = mul %t0 c ``` and `t0` isn't used anywhere else in the code, we can emit HLSL like: ```hlsl float t1 = (a + b) * c; ``` but not: ```hlsl float t1 = a + b c; // oops! precedence changed the meaning! ``` The existing strategy in Slang has been to be overly conservative about when it emits parentheses to guarantee precedence is respected, so you might get something like: ```hlsl float t1 = ((a) + (b)) (c); ``` which not only looks silly, but also leads to unhelpful diagnostics from the clang-based dxc, about all those extra parentheses being redundant. This change follows the pattern of (and actually reuses some code from) the old AST emit logic from earlier in the compiler's life (before the IR was introduced). The basic idea is that when emitting an expression we track the precedence of the expressions to its left and right, and use those to decide if parentheses are needed, and then to compute equivalent precedences to apply for sub-expressions. Applied correcty, this approach lets us emit minimal "unparsed" expressions. Applied incorrectly it could lead to subtle errors where the code Slang outputs doesn't faithfully represent the input. Here's hoping we get this right.	16 October 2018, 21:07:24 UTC
5b306c9	jsmall-nvidia	15 October 2018, 17:46:58 UTC	Feature/source loc improvements (#673) * Fix comment to better explain usage. * For getting the type string use a temporary SourceManager.	15 October 2018, 17:46:58 UTC
a7a87ce	jsmall-nvidia	13 October 2018, 02:02:09 UTC	Feature/source loc review (#672) * Fixes/improvements based around review comments. * SourceUnit -> SourceView * * Removed the HumaneSourceLoc as it's POD-like ness seemed to make that unnecessary * Made exposed member variables in SourceManager protected - so make clear where/how can be accesed * Improved description about SourceLoc and associated structures * Changed SourceLocType to 'Actual' and 'Nominal'. * Improved a comment.	13 October 2018, 02:02:09 UTC
c9ad368	Tim Foley	12 October 2018, 19:23:14 UTC	Add a warning on missing return, and initial SCCP pass (#671) * Add a warning on missing return, and initial SCCP pass The user-visible feature added here is a diagnostic for functions with non-`void` return type where control flow might fall off the end. This sounds like a trivial diagnostic to add as part of the front-end AST checking, but that can run afoul of really basic stuff like: ```hlsl int thisFunctionisOkay(int a) { while(true) { if(a > 10) return a; a = a2 + 1; } // no return here! } ``` This function "obviously" doesn't need to have a `return` statement at the end there, but realizing this fact relies on the compiler to understand that the `while(true)` loop can't exit normally, and doesn't contain any `break` statement. One can write "obvious" examples that need more and more complex analysis to rule out. The answer Slang uses for stuff like this is to do the analysis at the IR level right after initial code generation (this would be before serialization, BTW, so that attached `IRHighLevelDeclDecoration`s can be used). When lowering the AST to the IR, we always emit a `missingReturn` instruction (a subtype of `IRUnreachable`) at the end of its body if it isn't already terminated. The IR analysis pass to detect missing `return` statements is then as simple as just walking through all the functions in the module and making sure they don't contain `missingReturn` instructions. For that simple pass to work, we first need to make some effort to remove dead blocks that control flow can never reach. This change adds a very basic initial implementation of Spare Conditional Constant Propagation (SCCP), which is a well-known SSA optimization that combines constant propagation over SSA form with dead code elimination over a CFG to achieve optimizations that are not possible with either optimization along. For the moment, we don't actually implement any constant folding* as part of the SCCP pass, so we can eliminate the dead block in a case like the function above (and those in the test case added in this change), but will not catch things like a `while(0 < 1)` loop. Handling more "obvious" cases like that is left for future work. * fixup: warning on unreachable code * Handle case where user of an inst isn't in same function/code The code as assuming any instruction in the SSA work list has to come from the function/code being processed, but this misses the case where an instruction in a generic has a use inside the function that the generic produces. This change adds code to guard against that case.	12 October 2018, 19:23:14 UTC
13c6b69	Tim Foley	11 October 2018, 18:17:40 UTC	Fix error when one constant is defined equal to another (#670) * Fix error when one constant is defined equal to another Fixes #666 When a user declares one constant (usually a `static const` variable) to be exactly equal to another by name: ```hlsl static const a = 999; static const b = a; ``` Then the IR-level representation of `b` is an `IRGlobalConstant` whose value expression is just a pointer to the definition of `a`. The logic in `emitIRGlobalConstantInitializer()` was trying to always call `emitIRInstExpr` to emit the value of the constant as an expression, but that function only handles complex/compound expressions and not the case of simple named values (e.g., constants like `a`). The intention is for code to call `emitIROperand()` instead, and let it decide whether to emit an expression or a named reference using its own decision-making. The `IRGlobalConstant` case really just wants to pass in the "mode" flag it uses to influence that decision-making, but shouldn't be working around it. This change just replaces the `emitIRInstExp()` call with `emitIROpernad()` and adds a test case to confirm that this fixes the reported problem. * Fixups for bugs in previous change The first problem was that certain instruction ops were being special-cased to opt out of "folding" into expressions before we make the universal check to always fold when inside an initializer for a global constant. The second problem is that the `emitIROperand()` logic was always putting expressions around sub-expressions, which breaks parsing when the sub-expression is an initializer list (`{...}`). This fixup is pretty much a hack, but will be something we can remove once we don't emit unncessary parentheses overall, which is a better fix.	11 October 2018, 18:17:40 UTC
dcd9f78	Tim Foley	11 October 2018, 16:20:10 UTC	Add basic support for [mutating] methods (#667) By default, when writing a "method" (aka "member function") in Slang, the `this` parameter is implicitly an `in` parameter. So this: ```hlsl struct Foo { int state; int getState() { return state; } void setState(int s) { state = s; } }; ``` is desugared into something like this: ```hlsl struct Foo { int state }; int Foo_getState(Foo this) { return this.state; } // BAD: void Foo_setState(Foo this, int s) { this.state = s; } ``` That "setter" doesn't really do what was intended. It modifies a local copy of type `Foo`, because `in` parameters in HLSL represent by-value copy-in semantics, and are mutable in the body the function. Slang was updated to give a static error on the original code to catch this kind of mistake (so that `this` parameters are unlike ordinary function parameters, and no longer mutable). Of course, sometimes users want a mutable `this` parameter. Rather than make a mutable `this` the default (there are arguments both for and against this), this change adds a new attribute `[mutating]` that can be put on a method (member function) to indicate that its `this` parameter should be an `in out` parameter: ```hlsl [mutating] void setState(int s) { state = s; } ``` The above will translate to, more or less: ```hlsl void Foo_setState(inout Foo this, int s) { this.state = s; } ``` One added detail is that `[mutating]` can also be used on interface requirements, with the same semantics. A `[mutating]` requirement can be satisfied with a `[mutating]` or non-`[mutating]` method, while a non-`[mutating]` requirement can't be satisfied with a `[mutating]` method (the call sites would not expect mutation to happen). The design of `[mutating]` here is heavily influenced by the equivalent `mutating` keyword in Swift. Notes on the implementation: * Adding the new attribute was straightforward using the existing support, but I had to change around where attributes get checked in the overall sequencing of static checks, because attributes were being checked after function bodies, but with this change I need to look at semantically-checked attributes to determine the mutability of `this` * The check to restrict it so that `[mutating]` methods cannot satisfy non-`[mutating]` requirements was easy to add, but it points out the fact that there is a huge TODO comment where the actual checking of method signatures is supposed to happen. That is a bug waiting to bite users and needs to be fixed! * While we had special-case logic to detect attempts to modify state accessed through an immutable `this` (e.g., `this.state = s`), that logic didn't trigger when the mutation happened through a function/operator call (e.g., `this.state += s`), so this change factors out the validation logic for that case and calls through to it from both the assignment and `out` argument cases. * The error message for the special-case check was updated to note that the user could apply `[mutating]` to their function declaration to get rid of the error. * The semantic checking logic for an explicit `this` expression was already walking up through the scopes (created during parsing) and looking for a scope that represents an outer type declaration that `this` might be referring to. We simply extend it to note when it passes through the scope for a function or similar declaration (`FunctionDeclBase`) and check for the `[mutating]` attribute. If the attribute is seen, it returns a mutable `this` expression, and otherwise leaves it immutable. * The IR lowering logic then needed to be updated so that when adding an IR-level parameter to represent `this`, it gives it the appropriate "direction" based on the attributes of the function declaration being lowered. The rest of the IR logic works as-is, because it will treat `this` just like an other parameter (whether it is `in` or `inout`). * This biggest chunk of work was the "implicit `this`" case, because ordinary name lookup may resolve an expression like `state` into `this.state`, so that the `this` expression comes out of "thin air." To handle this case, I extended the structure of the "breadcrumbs" that come along with a lookup result (the breadcrumbs are used for any case where a single identifier like `state` needs to be embellished to a more complex expression as a result of lookup), so that it can identify whether a `Breadcrumb::Kind::This` node comes from a `[mutating]` context or not. Similar to the logic for an explicit `this`, we handle this by noting when we pass through a `FunctionDeclBase` when moving up through scopes, and look for the `[mutating]` attribute on it. The rest of the work was just plumbing the additional state through.	11 October 2018, 16:20:10 UTC
879ec1b	jsmall-nvidia	10 October 2018, 17:56:25 UTC	Feature/source loc refactor (#668) * * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Initial look at at variable byte encoding, and simple unit test. * Fixing problems with comparison due to naming differences with slang/fxc. * * More tests and perf improvements for byte encoding. * Mechanism to detect processor and processor features in main slang header. * Split out cpu based defines into slang-cpu-defines.h so do not polute slang.h * Support for variable byte encoding on serialization. * Removed unused flag. * Fix warning. * Fix calcMsByte32 for 0 values without using intrinsic. * Fix a mistake in calculating maximum instruction size. * Introduced the idea of SourceUnit. * Small improvements around naming. Add more functionality - including getting the HumaneLoc. * Add support for #line default * Compiling with new SourceLoc handling. * Fix off by one on #line directives. * Can use 32bits for SourceLoc. Fix serialize to use that. * Small fixes and comment on usage. * Premake run. * Fix signed warning. * Fix typo on StringSlicePool::has found in review.	10 October 2018, 17:56:25 UTC
60a91d6	jsmall-nvidia	09 October 2018, 19:09:32 UTC	Added -serial-ir command line option (#664) * Added -serial-ir option, to make generateIR always serialize in and out before further processing. Testing out serialization, and adding a kind of 'firewall' between compiler front end and backend. * Reduce peak memory usage, by discarding IR when stored in serialized form. Typo fix.	09 October 2018, 19:09:32 UTC
7ea9ff0	jsmall-nvidia	09 October 2018, 15:51:41 UTC	Feature/var byte encoding (#665) * * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Initial look at at variable byte encoding, and simple unit test. * Fixing problems with comparison due to naming differences with slang/fxc. * * More tests and perf improvements for byte encoding. * Mechanism to detect processor and processor features in main slang header. * Split out cpu based defines into slang-cpu-defines.h so do not polute slang.h * Support for variable byte encoding on serialization. * Removed unused flag. * Fix warning. * Fix calcMsByte32 for 0 values without using intrinsic. * Fix a mistake in calculating maximum instruction size.	09 October 2018, 15:51:41 UTC
4cb2a19	Tim Foley	04 October 2018, 23:05:23 UTC	Support cross-compilation of ray tracing shaders to Vulkan (#663) * Move to newer glslang * Support cross-compilation of ray tracing shaders to Vulkan This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay"). * The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized. * "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled. * The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed. The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations. In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial. The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses. The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL. GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable. That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function. The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types). Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic. For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a definition marked with the `__specialized_for_target(glsl)` modifier. The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin). Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output. There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works. The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global. We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable. We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below. We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types). This ends up removing some cases in the AST and IR type representations. By itself this change would break HLSL emit, because in that case the types really are intrinsic. We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types. The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters. This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline. In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters. We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output). The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now). To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout. A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection. The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader. While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case. Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders. The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation. This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient. Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`. The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage. All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline. The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process. This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports both DXR and VKRay. A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way: The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while. It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in all cases (including for HLSL output) rather than only on GLSL targets. The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic). The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions. A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic). The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization and information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code. This needs more work in a follow-on fix, but for now target specialization should again be working. The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding. The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels. This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s. The tricky bit to making that all work is that the same AST-level generic type parameter will then map to different IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function. The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts." This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file. Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals. The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its own `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the functions `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`. An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers. These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects). The lowering strategy here translates any `static` variable with an initializer into two globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet. There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes. We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level. Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point. The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage. Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass. * In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures. * A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code. * Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h` * Added a `$z` suffix to the GLSL translation of `Texture.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now. * The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout. * In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete. * Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important. * Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc. * Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460. * Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing. * Fixup: typos	04 October 2018, 23:05:23 UTC
84a4847	jsmall-nvidia	04 October 2018, 14:25:36 UTC	Remove need of IRHightLevelDecoration in emit (#660) * * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Fixing problems with comparison due to naming differences with slang/fxc.	04 October 2018, 14:25:36 UTC
eaafafe	Tim Foley	03 October 2018, 23:03:37 UTC	Update DXR API definitions for final spec. (#659) * Update DXR API definitions for final spec. The final version of the DXR API has changed the result type of the `DispatchRaysIndex()` and `DispatchRaysDimensions()` builtins to `uint3` (from `uint2`). * Add updates for DXR object<->world transformations The `ObjectToWorld()` and `WorldToObject()` functions were renamed to `ObjectToWorld3x4()` and `WorldToObject3x4()`, resepctively, and then new functions `ObjectToWorld4x3()` and `WorldToObject4x3()` were added to give convenient access to the transpose of these matrices. (No, I'm not clear on why user's couldn't just call `transpose()`, either) I've left the old function names in the standard library as forwarding functions just so that we don't break existing DXR code that relied on the old names.	03 October 2018, 23:03:37 UTC
cde0dec	jsmall-nvidia	02 October 2018, 21:22:15 UTC	Feature/ir serial debug (#657) * * Change the layout of IROp such that 'main' IROps are 0-x. * Removed MANUAL_RANGE instuction types, as no longer needed. * Work in prog on optimizing. * * Constant time lookup for IROpInfo * Refactor and document a little more the IROp layout * Mark ops that use 'other' bits * Fix typo in definition of kIROpFlag_UseOther * First pass at working out serialization structure. * Work in progress on ir-serialize * Storing strings in IRSerialInfo Split out IRSerialInfo from the IRSerializer - to make more explicit what is actually saved. * First pass at serializing out data. * First pass at serialize reading. * Fix riff fourcc mark order. * First pass at reconstructing IRInst / IRDecoration from serialized data. * Handling of TextureBaseType * Deserializing of constants. * Small changes around ir serialization. * Changed StringIndex indexing to not be an offset into the m_strings array, but an index into strings in order. Doing so makes cache lookup much faster, and makes the 'indicies' themselves smaller and therefore more compressible. * Removed the need for m_arena in IRSerialWriter. Previously it's purpose was to store the string contents that were being used to lookup UnownedStringSlice. Now we keep the StringRepresentation in scope and reference that, and so don't need the copy. * Don't need to construct the IRModuleInst as is created and set on createModule call. * Remove test code for testing serialization. * Fix problem with release build in ir-serialize causing warning. * Use SLANG_OFFSET_OF for offsets in non pod classes to avoid gcc/clang warning. Give storage to integral static variables to avoid linkage problems with gcc/clang. * Fix warnings under x86 win32 debug. * Small improvements around IR serialization. * * Support for serializing SourceLoc. * Small improvements around serialization. * RawSourceLoc allows for regular SourceLoc information to be held (and serialized) as is. This is only really useful for the 'passthru' mode as there needs to be a more compact mechanism to encode source locations. * Small fixes around comments for SourceLoc serializing.	02 October 2018, 21:22:15 UTC
852ea40	Yong He	28 September 2018, 17:17:20 UTC	Update README.md	28 September 2018, 17:17:20 UTC
648fc9b	jsmall-nvidia	28 September 2018, 13:39:08 UTC	Feature/ir serialize improvements (#655) * * Change the layout of IROp such that 'main' IROps are 0-x. * Removed MANUAL_RANGE instuction types, as no longer needed. * Work in prog on optimizing. * * Constant time lookup for IROpInfo * Refactor and document a little more the IROp layout * Mark ops that use 'other' bits * Fix typo in definition of kIROpFlag_UseOther * First pass at working out serialization structure. * Work in progress on ir-serialize * Storing strings in IRSerialInfo Split out IRSerialInfo from the IRSerializer - to make more explicit what is actually saved. * First pass at serializing out data. * First pass at serialize reading. * Fix riff fourcc mark order. * First pass at reconstructing IRInst / IRDecoration from serialized data. * Handling of TextureBaseType * Deserializing of constants. * Small changes around ir serialization. * Changed StringIndex indexing to not be an offset into the m_strings array, but an index into strings in order. Doing so makes cache lookup much faster, and makes the 'indicies' themselves smaller and therefore more compressible. * Removed the need for m_arena in IRSerialWriter. Previously it's purpose was to store the string contents that were being used to lookup UnownedStringSlice. Now we keep the StringRepresentation in scope and reference that, and so don't need the copy. * Don't need to construct the IRModuleInst as is created and set on createModule call. * Remove test code for testing serialization. * Fix problem with release build in ir-serialize causing warning. * Use SLANG_OFFSET_OF for offsets in non pod classes to avoid gcc/clang warning. Give storage to integral static variables to avoid linkage problems with gcc/clang. * Fix warnings under x86 win32 debug. * Small improvements around IR serialization.	28 September 2018, 13:39:08 UTC
d06bd7d	jsmall-nvidia	27 September 2018, 15:36:35 UTC	First pass implementation of IR serialization (#653) * * Change the layout of IROp such that 'main' IROps are 0-x. * Removed MANUAL_RANGE instuction types, as no longer needed. * Work in prog on optimizing. * * Constant time lookup for IROpInfo * Refactor and document a little more the IROp layout * Mark ops that use 'other' bits * Fix typo in definition of kIROpFlag_UseOther * First pass at working out serialization structure. * Work in progress on ir-serialize * Storing strings in IRSerialInfo Split out IRSerialInfo from the IRSerializer - to make more explicit what is actually saved. * First pass at serializing out data. * First pass at serialize reading. * Fix riff fourcc mark order. * First pass at reconstructing IRInst / IRDecoration from serialized data. * Handling of TextureBaseType * Deserializing of constants. * Small changes around ir serialization. * Changed StringIndex indexing to not be an offset into the m_strings array, but an index into strings in order. Doing so makes cache lookup much faster, and makes the 'indicies' themselves smaller and therefore more compressible. * Removed the need for m_arena in IRSerialWriter. Previously it's purpose was to store the string contents that were being used to lookup UnownedStringSlice. Now we keep the StringRepresentation in scope and reference that, and so don't need the copy. * Don't need to construct the IRModuleInst as is created and set on createModule call. * Remove test code for testing serialization. * Fix problem with release build in ir-serialize causing warning. * Use SLANG_OFFSET_OF for offsets in non pod classes to avoid gcc/clang warning. Give storage to integral static variables to avoid linkage problems with gcc/clang. * Fix warnings under x86 win32 debug.	27 September 2018, 15:36:35 UTC
ee54994	jsmall-nvidia	25 September 2018, 21:59:16 UTC	Improve IROp lookup (#650) * * Change the layout of IROp such that 'main' IROps are 0-x. * Removed MANUAL_RANGE instuction types, as no longer needed. * Work in prog on optimizing. * * Constant time lookup for IROpInfo * Refactor and document a little more the IROp layout * Mark ops that use 'other' bits * Fix typo in definition of kIROpFlag_UseOther	25 September 2018, 21:59:16 UTC
4f979d7	Tim Foley	25 September 2018, 02:17:12 UTC	Fixes around atomic operations (#652) * Fixes around atomic operations Work on #651 The existing handling of atomic operations had a few issues: * The HLSL atomic functions (`Interlocked`) didn't have mappings to GLSL Atomic operations on images weren't supported at all because the subscript operation on `RWTexture` types didn't provide a `ref` acessor The HLSL atomic functions were only providing the overloads that return the previous value through an `out` parameter, and not the ones that ignore the previous value. This change fixes these issues with the following changes: * `RWTexture` types now have a `ref` accessor on their subscript operation which maps to a new `imageSubscript` operation in the IR. By default this translates back to `tex[idx]` in output HLSL, but it makes a custom mapping possible for GLSL The `Interlocked` function definitions were expanded to include the overloads without the `out` parameter GLSL translations were added for the `Interlocked` functions. These mappings use some new customization points in the intrinsic operation emit logic to support outputting calls to either `atomic` or `imageAtomic` as required, and to expand an argument that is a subscript into an image as multiple arguments. This whole approach is quite hacky, and it doesn't seem like the approach we should take in the long run. Fix: typo in InterlockedAnd lowering One of the cases of `InterlockedAnd` was lowering to `atomicAnd` with a `$0` where we wanted the `$A` substitution to handle the possibility of an image.	25 September 2018, 02:17:12 UTC
32c8479	jsmall-nvidia	24 September 2018, 21:20:32 UTC	Remap IROp value ranges * Change the layout of IROp such that 'main' IROps are 0-x. (#649) * Removed MANUAL_RANGE instuction types, as no longer needed.	24 September 2018, 21:20:32 UTC
7250ed1	Tim Foley	21 September 2018, 18:12:23 UTC	Remove the "hack sampler" workaround (#648) * Update glslang version * Fix build for new glslang The latest glslang required a few changes to our manual build for their code (because we are not taking a dependency on CMake). * Rebuild project files using premake, which picks up a few files added to glslang, but also a few diffs in Slang's own project files in cases where they were edited manually instead of using premake. * Fix up the declaration our our device limits (which are inentionally set to not limit what code passes through our glslang), because the underlying structure definition in glslang has changed. This is a kludgy bit of glslang's design, but it doesn't make sense for us to invest in a more serious workaround. * Remove the "hack sampler" workaround When the `GL_KHR_vulkan_glsl` spec was introduced to allow GLSL to be compiled for Vulkan SPIR-V, it made an annoying mistake by leaving a few builtins as taking `sampler2D`, etc. when the equivalent SPIR-V operations only require a `texture2D`, etc. The relevant builtins are: * `textureSize` * `textureQueryLevels` * `textureSamples` * `texelFetch` * `texelFetchOffset` This means that shader code that wanted to use those operations needed to conspire to have a `sampler` handy so they could write, e.g.: ```glsl vec4 val = texelFetch(sampler2D(myTexture, someRandomSampler), p, lod); ``` when what they really wanted was this: ```glsl vec4 val = texelFetch(myTexture, p, lod); ``` That is annoying but probably something each to work around for a GLSL programmer, but when cross-compiling from HLSL, you might have an operation like: ```hlsl float4 val = myTexure.Load(p); ``` in which case a cross-compiler needs to manufacture a sampler out of thin air. If the shader happened to use a sampler for something else you could snag that, but in the worse case you had to cross-compile to GLSL that declared a new sampler. Slang did this by declaring a sampler called `SLANG_hack_samplerForTexelFetch` (because `texelFetch` is the operation that first surfaced the issue). For complex reasons we always define this sampler, even if we turn out not to need it in a particular output kernel. This choice has a bunch of annoying consequences: * There is always a sampler defined in descriptor set zero, because that's where we put the hack sampler, so a user-defined parameter block always has a set number of 1 or greater (see #646). * The hack sampler shows up in reflection output because users need to size their descriptor sets appropriately to pass along this sampler that won't actually be used if they don't want to get debug spew from the validation layers. We filed an issue on glslang about this problem, and eventually some kind folks from the gamedev community (who also saw the same problem) defined an extension spec (`GL_EXT_samplerless_texture_functions`) to fix the underlying issue and contributed a patch to glslang to make it support that extension. This change just backs the hack out of Slang now that we have a glslang version that supports the extension to get past the defect in the original GLSL-for-Vulkan definition. Besides yanking out the code for the hack, we also change the relevant builtins to declare that they require this new GLSL extension (so that we properly request it from glslang when the builtins are used), and fix some reflection test cases that exposed the existence of the "hack sampler." * Fixup: syntax error in stdlib generator files * Remove more code for hack sampler There was logic to ensure we always have a "default" register space/set when cross-compiling, because the hack sampler would need it. This is no longer necessary once we remove the hack sampler. * Fix expected test output. Fixing the root cause of issue #646 means that one of our test cases that tickles that issue now produces different output (luckily it can now be used as a regression test for the issue).	21 September 2018, 18:12:23 UTC
738bcb8	Tim Foley	20 September 2018, 15:14:25 UTC	Improve support for non-32-bit types. (#643) The main change here is to fill out the `BaseType` enumeration so that it covers the full range of 8/16/32/64-bit signed and unsigned integers, as well as 16/32/64-bit floating-point numbers, and then propagate that completion through various places in the code. More details: * The current `half`, `float`, `double`, `int`, and `uint` types are still the default names for their types, so things like `float16_t` and `int32_t` were added as `typedef`s. * We still need to generate the full gamut of vector/matrix `typedef`s for the new types, so that things like `float16_t4x3` will work (yes, I know that is ugly as sin, but that's the HLSL syntax...). * A few pieces of dead code from earlier in the compiler's life got removed, since I did a find-in-files for `BaseType::` and tried to either update or delete every site. * A few call sites that were enumerating integer base types in an ad-hoc fashion were changed to use a single `isIntegerBaseType()` function that I added in `check.cpp` * When compiling with dxc for shader model 6.2 and up, we enable the compiler's support for native 16-bit types via a flag. * The public API enumeration for reflection of scalar types added cases for 8- and 16-bit integers (it already exposed the other cases we need) * The lexer was updated to be extremely liberal in what kinds of suffixes it allows on literals. I also removed the logic that was treating, e.g., `0f` as a floating-point literal (it doesn't seem to be the right behavior). That would now be an integer literal with an invalid suffix. * The logic in the parser that applies types to literals was updated to handle a few more cases: `LL` and `ULL` for 64-bit integers, and `H` for 16-bit floats. * The mangling logic needed to be updated to handle the new cases, and I consolidated the handling of those types in their front-end and IR forms. * Removed the explicit `BasicExpressionType::ToString` logic, since all basic types are `DeclRefType`s in the front end, and we can just print them out as such. * As a bit of a gross hack, fudged the conversion costs so that `int` to `int64_t` conversion is a bit more costly. The problem there is that given an operation like `int(0) + uint(0)`, the best applicable candidates ended up being `+(uint,uint)` and `+(int64_t,int64_t)` because the cost of a single `int`-to-`uint` conversion was the same as the sum of the cost of an `int`-to-`int64_t` and a `uint`-to-`int64_t`. A better long-term fix here is to completely change our overload resolution strategy, but that is obviously way too big to squeeze into this change. * Type layout computation was updated to handle all the new types and give them their natural size/alignment. Note that this does not work for down-level HLSL where `half` is treated as a synonym for `float`. It also doesn't deal with the fact that many of these types aren't actually allowed in constant buffers for certain shader models. A future change should work to add error messages for unsupported stuff during type layout (or just make the types themselves require support for certain capabilities)	20 September 2018, 15:14:25 UTC
653fe97	jsmall-nvidia	19 September 2018, 19:27:50 UTC	Support for IRStringLit (#645) * * Added support for strings in IR with IRStringLit - with storage of chars after it * Added kIRDecorationOp_Transitory - can be used for detecting instructions constructed on stack * Made IRConstant hashing work off type * Fix comment that is out of date about how an instruction is determines to hold a transitory string.	19 September 2018, 19:27:50 UTC
a37b353	Tim Foley	19 September 2018, 15:32:28 UTC	Warn when undefined identifier used in preprocessor conditional (#642) This can mask an error when the user either typos a macro name when writing a conditional, or (as was the case for the user who pointed out this issue) they mistakenly assume that a `#define` in an `import`ed file has been made visible to them. This change just adds the warning in the obvious place, with a test code to ensure it triggers.	19 September 2018, 15:32:28 UTC
091f89a	jsmall-nvidia	17 September 2018, 16:01:52 UTC	Remove IRDeclRef as recommended in comments for PR #635 as no longer used. (#638)	17 September 2018, 16:01:52 UTC
8a150a9	jsmall-nvidia	17 September 2018, 14:28:11 UTC	Control unit tests being run with -category -exclude and using prefix. (#637) Unit tests appear in unit-test category Unit tests 'appear' in a directory unit-tests Removed the -unitTests option	17 September 2018, 14:28:11 UTC
24ad492	jsmall-nvidia	17 September 2018, 13:18:57 UTC	Hotfix/fixing warnings (#636) * * Remove dispose from IRInst * Use MemoryArena instead of MemoryPool * Make all IRInst not require Dtor - by having ref counted array store ptrs that need freeing * Increase block size - typically compilation is 2Mb of IR space(!) * Fix issues around StringRepresentation::equal because null has special meaning. * Don't bother to construct as String to compare StringRepresentation, just used UnownedStringSlice. * Added fromLiteral support to UnownedStringSlice and use instead of strlen version. * Use more conventional way to test StringRepresentation against a String. * Fix gcc/clang template problem with cast. * Fix warnings.	17 September 2018, 13:18:57 UTC
3c505c2	jsmall-nvidia	14 September 2018, 18:16:28 UTC	Improvements around IR representation and memory usage (#635) * * Remove dispose from IRInst * Use MemoryArena instead of MemoryPool * Make all IRInst not require Dtor - by having ref counted array store ptrs that need freeing * Increase block size - typically compilation is 2Mb of IR space(!) * Fix issues around StringRepresentation::equal because null has special meaning. * Don't bother to construct as String to compare StringRepresentation, just used UnownedStringSlice. * Added fromLiteral support to UnownedStringSlice and use instead of strlen version. * Use more conventional way to test StringRepresentation against a String. * Fix gcc/clang template problem with cast.	14 September 2018, 18:16:28 UTC
e1c9349	Tim Foley	13 September 2018, 21:32:13 UTC	Add a better error message for common global generic failure (#634) A common mistake that seems to come up when using global generic type parameters: ```hlsl interface IHero { ... } type_param H : IHero; ParameterBlock<H> gHero; ``` is to accidentally try to specialize the type parameter `H` using `H` itself as the argument (instead of some concrete type like `Batman`). The current front-end checks naively let this pass, because `H` satisfies all the requirements (it sure does declare that it implements `IHero`, which is the only requirement we have). This currently leads to downstream failure when we generate code with generic type parameters still left in the IR. This change implements a simple fix which is to: - Check when we are trying to specialize a global generic parameter using another global generic parameter, since this is currently always a mistake - Add a special-case diagnostic for the 99% case of this failure, which is specializing a type parameter to itself This fix is primarily motivated by the way generics support will initially be implemented in Falcor.	13 September 2018, 21:32:13 UTC
929745d	jsmall-nvidia	13 September 2018, 18:02:33 UTC	Feature/memory arena improvements (#633) * First pass at MemoryArena. * First pass at RandomGenerator. * Extract TestContext into external source file. * Fix warning on printf. * Use enum classes for Test enums. OutputMode -> TestOutputMode. * First pass at FreeList unit test. * Auto registering tests. Improvements to RandomGenerator. * Remove the need for unitTest headers - cos can use registering. * Added unitTest for MemoryArena. * Do unit tests. * Fix typo. * Fix problem limiting errors from TestContext. * Refactor of MemoryArena * Removed the ability to rewind (to improve memory usage/simplify) * Better memory usage - around oversized blocks + Will keep allocating from a normal block if more than 1/3 memory left, or an oversided block is allocated * Better unitTest coverage for MemoryArena. * Fixes based on code review * Remove e prefix from enum class types for TestContext * Added extra checking for allocations sizes * Fixed some typos * Added std::is_pod test to allocateAndCopyArray * Add include for is_pod needed for linux build.	13 September 2018, 18:02:33 UTC
f60135c	jsmall-nvidia	12 September 2018, 20:27:42 UTC	Feature/memory arena (#631) * First pass at MemoryArena. * First pass at RandomGenerator. * Extract TestContext into external source file. * Fix warning on printf. * Use enum classes for Test enums. OutputMode -> TestOutputMode. * First pass at FreeList unit test. * Auto registering tests. Improvements to RandomGenerator. * Remove the need for unitTest headers - cos can use registering. * Added unitTest for MemoryArena. * Do unit tests. * Fix typo.	12 September 2018, 20:27:42 UTC
9a97330	Tim Foley	27 August 2018, 19:33:35 UTC	Add basic support for #pragma once (#630) * Improve diagnostic messages for function redefinition The front-end was using internal "not implemented" errors instead of friendly user-facing errors to handle: * Redefinition of a function (same signature and both have bodies) * Multiple function declarations/definitions with the same parameter signature, but differnet return types This change simply turns both of these into reasonably friendly errors that explain what went wrong and point to the previous definition/declaration as appropriate. * Add support for detecting #pragma directives and handling them The logic here mirrors what was set up for preprocessor directives, just for "sub-directives" in this case. The only case here is the default one, which now reports a warning for directives we don't understand. * Add basic support for #pragma once Fixes #494 The approach here is simplistic in the extreme. When we see a `#pragma once` directive, we put the current file path (the location of the `#pragma` directive, as reported by our source manager) into a set, and then any paths in that set are ignored by subsequent `#include` directives. This should work for simple cases of `#pragma once`, but it is likely to fail in a variety of cases because our filesystem layer currently makes no attempt to normalize/canonicalize paths. Improving the robustness of the solution is left to future work. This change includes a simple test case to confirm that a second `#include` of a file with a `#pragma once` is successfully ignored.	27 August 2018, 19:33:35 UTC
6a8ad6e	jsmall-nvidia	22 August 2018, 15:36:02 UTC	Support for [[vk::push_constant]] (#629) * Support for attributed [[vk::push_constant]] and [[push_constant]]. Can also use layout(push_constant). * Fix test so matches the expected output. * Add expected output to binding-push-constant-gl.hlsl * Trivial change to force travis rebuild to test the gcc linux build really has a problem.	22 August 2018, 15:36:02 UTC
0ce1131	Tim Foley	21 August 2018, 15:40:25 UTC	Add support for more RasterizerOrdered types (#628) Fixes #627 The front-end has support for `RasterizerOrderedBuffer` and `RasterizerOrderedTexture`, but left out support for: `RasterizerOrderedByteAddressBuffer` * `RasterizerOrderedStructuredBuffer` [Nitpick: these tyeps are all amazingly annoying to type. It is easy to want to write `RasterOrdered` instead of the bulkier `RasterizerOrdered`, and almost everybody does in casual speech. There's already the issue of wanting to type `StructureBuffer` (a buffer of structures) instead of `StructuredBuffer` (a buffer that is... structured?). Then you have `ByteAddressBuffer` which is just adding to the confusion because it is nominally a "byte addressable" buffer (so that `ByteAddressedBuffer` would actually make sense), but then actually isn't byte addressable in practice.] There were a few `TODO` comments related to this already, and this change was mostly a matter of doing a find-in-files for `RWByteAddressBuffer` and `RWStructuredBuffer` and adding matching `RasterizerOrdered` cases. The test I added just checks that these types make it through the front-end, and doesn't do any actual confirmation that they work as intended. It is worth noting that the handling of ordering in GLSL/VK is different from in HLSL ("pixel shader interlock" instead of "rasterizer ordered views"), so coming up with a cross-compilation story would need to be a later step.	21 August 2018, 15:40:25 UTC
56d8a75	Tim Foley	11 August 2018, 05:21:44 UTC	Improve model-viewer support for lights (#626) * Improve model-viewer support for lights The main visible change here is that the model-viewer example supports multiple light sources, with a basic UI for adding new light sources to the scene, and for manipulating the ones that are there. Along the way I also refactored the `IMaterial` decomposition to be a bit less naive, while still only including a completely naive Blinn-Phong implementation. I also went ahead and spruced up the `cube.obj` file so that it has multiple materials, although it is still a completely uninteresting asset. * Fixup: Windows SDK version	11 August 2018, 05:21:44 UTC
73ff690	Tim Foley	06 August 2018, 22:52:38 UTC	Add basic support for "Dear IMGUI" (#625) This isn't being made visible just yet, but it will allow us to have a simple UI for loading models into the model-viewer example. In order to support rendering with IMGUI I had to add the following to the `Renderer` layer: * viewports * scissor rects * blend support These are really only fully implemented for D3D11, but adding them to the other back-ends should be a reasonably small task.	06 August 2018, 22:52:38 UTC
68d705f	Tim Foley	03 August 2018, 15:39:28 UTC	Major overhaul of Renderer abstraction, to support a new example (#624) The original goal here was to bring up a second example program: `model-viewer`. While the existing `hello-world` example is enough to get somebody up to speed with the basics of the Slang API (as a drop-in replacement for `D3DCompile` or similar), it doesn't really show any of the big-picture stuff that Slang is meant to enable. There wasn't any use of D3D12/Vulkan descriptor tables/sets, and there wasn't any use of interfaces, generics, or `ParameterBlock`s in the shader code. The `model-viewer` example addresses these issues. Its shader code involves generics, interfaces, and multiple `ParameterBlock`s, and the host-side code demonstrates a few key things for working with Slang: * There is an application-level abstraction for parameter blocks, that combines the graphics-API descriptor set object with Slang type information * There is a shader cache layer used to look up an appropriate variant of a rendering effect by using parameter block types to "plug in" global type variables * There is a clear separation between the phases of compilation: a first phase that does semantic checking and enables reflection-based allocation of graphics API objects, followed by one or more code generation passes for specialized kernels. This example is certainly not perfect, and it will need to be revamped more going forward. In particular: * The output picture is ugly as sin. We need a plan for how to get this to load better content, perhaps even popping up an error message to note that the required input data isn't present in the basic repository. * The shader code is too simplistic. There isn't any real material variety, and the `IMaterial` abstraction is completely wrong. * The use of parameter blocks is facile because there are no resource parameters right now. Fixing that will likely expose issues around interfacing with Slang's reflection API. * The whole example exposes the issue that Slang's current APIs aren't really designed for the benefit of two-phase compilation (since our many client application has been stuck on one-phase compilation). * Global type parameters are actually a Bad Idea that we only did for compatibility with existing codebases. We should not be showing them off in an example of the Right Way to use Slang, but the language support for type parameters on entry points is still not complete. Of course, the majority of the changes here are not inside the example applications, and instead involve a major overhaul of the `Renderer` abstraction that is used for both tests and examples. The main thrust of the change is to make the abstraction layer be closer to the D3D12/Vulkan model than to a D3D11-style model. This is important for the `model-viewer` example, since it aspires to show how Slang can be incorporated into a renderer that targets a modern API. The most important bit is actually the use of descriptor sets and "pipeline layouts" a la Vulkan, since without these Slang's `ParameterBlock` abstraction won't make a lot of sense. Implementation of the abstraction for the various APIs has very much been on an as-needed basis. The current implementation is just enough for the two examples to work, plus enough to get all the tests to pass in both debug and release builds on Windows. A big missing feature in the API abstraction right now is memory lifetime management. The code had been trending toward something D3D11-like where a constant buffer could be mapped per-frame with the implementation doing behind-the-scenes allocation for targets like D3D12/Vulkan. I'd like to shift more toward a model of just exposing "transient" allocations that are only valid for one frame, because these are more representation of how an efficient renderer for next-generation APIs will work. That transition isn't actually complete, though, so there are problems with the existing examples where `hello-world` is actually scribbling into memory that the GPU might still be using, while `model-viewer` is doing full-on heavy-weight allocations on a per-frame basis with no real concern for the performance implications. All together, there are a lot of things here that need more work, but this branch has been way too long-lived already, and so I'd like to get this checked in as long as all the tests pass.	03 August 2018, 15:39:28 UTC
5ea746a	Tim Foley	31 July 2018, 16:02:22 UTC	Fix imageStore output for types other than 4-vectors (#622) Fixes issue #620 Given a `RWTexture*` store operation like: ```hlsl RWTexture3D a<float>; ... float x = 1.0f; a[crd] = x; ``` We were generating output GLSL like: ```glsl layout(rgba32f) image3D a; ... float x = 1.0f; imageStore(a, crd, x); ``` but in that case, the `imageStore` operation expected a `vec4` and not a `float` for the last argument, and we fail GLSL compilation. This change extends our handling of the `imageStore` operation in the stdlib so that we pad out the last argument if it is not a 4-vector. We also flesh out the code that was picking a `layout(...)` modifier for image formats so that it doesn't just blindly use `layout(rgba32f)` and instead takes the element type fed to `RWTexture3D<...>` into account. With these two changes, the above HLSL/Slang code now translates to: ```glsl layout(r32f) image3D a; ... float x = 1.0f; imageStore(a, crd, vec4(x, float(0), float(0), float(0))); ``` Note that we are padding out the `x` argument to a full vector, and also that we declare the image with `layout(r32f)` to reflect the fact that it has only as single channel.	31 July 2018, 16:02:22 UTC
9914ca8	jsmall-nvidia	31 July 2018, 15:03:53 UTC	Feature/attributed binding (#621) * Typo fix, and added dxc to command line documentation. * Fix small typos. Added support for Scope to lexer. Fix bug in Token ctor. * Add support for attribute names that are scoped. * Added GLSLBindingAttribute. Make binding work through core.met.slang. * Allow [[gl::binding(binding, set)]] [[vk::binding(binding,set)]]	31 July 2018, 15:03:53 UTC
171f524	Tim Foley	26 July 2018, 21:48:48 UTC	Fix translation of RWTexture subscript operations for Vulkan (#618) Partially fixes #615 There's kind of a mess going on here, and it is difficult to be sure which of the changes here are strictly necessary. Also, our testing isn't setup to run tests that use `RWTexture2D`, so the only testing I can really run is manual tests using Falcor. The most basic issue here is that in an earlier change I added `ref` accessors for the subscript operation on various `RW` types in the standard library, and that included `RWTexture2D` (and the other `RWTexture` types). The compiler ended up favoring a `ref` accessor over a `set` accessor even when the `set` would suffice, but only the `set` accessor could be lowerd to GLSL/SPIR-V. This change ends up implementing two different fixes for the same problem: * Logic has been added to try and favor a `set` accessor over a `ref` accessor in the cases where either could be used (but still require a `ref` accessor to be used when it is really needed) * The `ref` accessor for `RWTexture*` has been removed, since it turns out that the operations that might have benefited from it (atomics, and component-granularity stores) aren't actually allowed on typed UAVs anyway. There is a deeper issue here that somebody needs to go through and rationalize our representation and handling of accessors like this, but I'm not going to be able to do that in the time I can put into this PR.	26 July 2018, 21:48:48 UTC
66f5f18	Tim Foley	26 July 2018, 20:36:28 UTC	Fix implicit flat interpolation for GLSL output (#619) There was some logic called `maybeEmitGLSLFlatModifier()` that was supposed to emit an implicit `flat` modifier for any varying shader parameter with an integer type that wasn't qualified as `nointerpolation` in the input HLSL/Slang (where `nointerpolation` is the equivalent of `flat`). This wasn't being triggered because I apparently added code to only emit the implicit modifier if there was no explicit one, but then I had this code: ```c++ bool anyModifiers = false; anyModifiers = true; ... if(!anyModifiers && ...) { maybeEmitGLSLFlatModifier(); } ``` Unsurprisingly, the `anyModifiers = true` line meant that things never actually triggered. Once I fixed that issue the next problem that arose was that the `maybeEmitGLSLFlatModifier()` logic was being applied to any varying integer parameter, which includes fragment outputs, but GLSL forbids the `flat` modifier on fragment outputs and so gave an error on a shader that wrote to an integer target. I fixed up the logic to take computed layout for a shader parameter into account, and only emit the `flat` modifier for fragment inputs. As the `TODO` commend at that location notes, there may be some arcane rules about how a vertex shader also needs to use `flat` when declaring the matching output, so we may need to make that test more careful down the road. For now the shader that originally surfaced this problem now works under Vulkan.	26 July 2018, 20:36:28 UTC
dff216c	jsmall-nvidia	25 July 2018, 15:57:44 UTC	Improved command line control for apis and synthesized tests (#616) * Parsing of control of api parameters no longer needs comma separator. Parsing of API list now can take an initial state. Document the command line option. * * Proper handling of 'default' (or initialFlags) - by using if the first token is an operator. * Clarified parsing of api flags. * Now 'vk' will mean just use vk. +vk will mean the defaults plus vk, and -vk is the defaults -vk. * Improve README.md on api expressions. Improve error text for failure to parse api expressions.	25 July 2018, 15:57:44 UTC
990ed73	jsmall-nvidia	17 July 2018, 17:36:19 UTC	Fix problem when doing options parse, that failure doesn't leave appropriate message in diagnostic string. (#612)	17 July 2018, 17:36:19 UTC
7b2a549	jsmall-nvidia	06 July 2018, 15:51:19 UTC	spCompile/spProcessCommandLineArguments return SlangResult (#610) * * Make spCompile return SlangResult * Make spProcessCommandLineArguments return SlangResult (and not internally exit) * Remove calls to exit() * Fix typos * Make all output from spProcessCommandLineArguments get sent to diagnostic sink.	06 July 2018, 15:51:19 UTC
338a770	Tim Foley	28 June 2018, 19:33:03 UTC	Fix up definitions of half and double in stdlib (#608) An earlier change made sure that the `half` and `double` types properly conform to the `__BuiltinFloatingPointType` and `__BuiltinRealType` interfaces, but somehow that change modified only the generated source file (`core.meta.slang.h`) and not the source that fed into the generator (`core.meta.slang`). This meant that when building the compiler, we'd end up with spurious diffs because we'd run the generation logic and clobber the (correct) output file with freshly generated (wrong) code. This change adds the missing lines to the source file to fix up the issue.	28 June 2018, 19:33:03 UTC
dfe13b5	Tim Foley	28 June 2018, 18:14:48 UTC	Share graphics API layer between tests/examples (#603) The `render-test` project has an in-progress graphics API abstraction layer, and it makes sense to share this code with our examples rather than write a bunch of redundant code between examples and tests. Most of this change is just moving files from `tools/render-test/` to a new library project at `tools/slang-graphics/`. The most complicated code change there is renaming from `render_test` to `slang_graphics`. The existing `hello` example was ported to use the graphics API layer instead of raw D3D11 API calls. It is still hard-coded to use the D3D11 back-end and the `SLANG_DXBC` target, so more work is needed if we want to actually support multiple APIs in the examples. I also went ahead and implemented an extremely rudimentary set of APIs to abstract over the Windows platform calls that were being made in the example, so that we could potentially run that same example on other platforms. I did not* port `render-test` to use those APIs, and I also did not implement them for anything but Windows (my assumption is that for most other platforms we would just use SDL2, and require people to ensure it is installed to their machine before building Slang examples).	28 June 2018, 18:14:48 UTC
22033f0	jsmall-nvidia	27 June 2018, 20:53:48 UTC	Support for Tessellation (#607) * Fix typo OuptutTopologyAttribute -> OutputTopologyAttribute First pass support for handing tesselation shaders - domain and hull. * Added attribute PatchConstantFuncAttribute * Added visitHLSLPatchType(HLSLPatchType* type) such that the patch type template parameters are handled * Added IRNotePatchConstantFunc - such that the patch constant function is referenced within IR * Added support for outputing typical tesselation attributes (although minimal validation is performed) * Added findFunctionDeclByName * Small improvements to diagnostic. * Improved diagnostics and checking for geometry shader attributes. * Added diagnostic if patchconstantfunc is not found Handle assert failure when outputing a domain shader alone and therefore attr->patchConstantFuncDecl is not set. * Simple script tess.hlsl to test out domain/hull shaders. * Added url for where hull shader attributes are defined. * Fix unsigned/signed comparison warning. * Restore removal of fix in "Improve generic argument inference for builtins (#598)" * Update tessellation test case to compare against fxc The test was previously comparing against fixed expected DXBC output, but this caused problems when the test runner tried to execute the test on Linux (where there is no fxc to invoke...), and would also be a potential source of problems down the road if different users run using different builds of fxc. The simple solution here is to convert the test to compare against fxc output generated on the fly. That test type is already filtered out on non-Windows builds, so it eliminates the portability issue (in a crude way). I also changed the test to compile both entry points in one compiler invocation, just to streamline things into fewer distinct tests. * Eliminate unnecessary call to `lowerFuncDecl` In a very obscure case this could cause a bug, if the patch-constant function had somehow already been lowered (because it was called somewhere else in the code). The call should not be needed because `ensureDecl` will lower a declaration on-demand if required, so eliminating it causes no problems for code that wouldn't be in that extreme corner case.	27 June 2018, 20:53:47 UTC
4bbd0e7	jsmall-nvidia	22 June 2018, 21:36:59 UTC	Feature/com helper (#606) * Added Result definitions to the slang.h * Removed slang-result.h and added slang-com-helper.h * Move slang-com-ptr.h to be publically available. * Add SLANG_IUNKNOWN macros to simplify implementing interfaces. Use the SLANG_IUNKNOWN macros to in slang.c * Removed slang-defines.h added outstanding defines to slang.h * Include slang-com-ptr.h and slang-com-helper.h in archives built with CI. * Use spaces instead of tabs on appveyor.yml * Put operator== and != for Guid in global namespace. * Fix binary windows archive to have all the slang headers.	22 June 2018, 21:36:59 UTC
4fa0111	jsmall-nvidia	22 June 2018, 19:06:41 UTC	Feature/com helper (#605) * Added Result definitions to the slang.h * Removed slang-result.h and added slang-com-helper.h * Move slang-com-ptr.h to be publically available. * Add SLANG_IUNKNOWN macros to simplify implementing interfaces. Use the SLANG_IUNKNOWN macros to in slang.c * Removed slang-defines.h added outstanding defines to slang.h * Include slang-com-ptr.h and slang-com-helper.h in archives built with CI. * Use spaces instead of tabs on appveyor.yml	22 June 2018, 19:06:41 UTC
d0c9571	jsmall-nvidia	22 June 2018, 17:09:01 UTC	Expose macros/functionality for defining interfaces (#604) * Added Result definitions to the slang.h * Removed slang-result.h and added slang-com-helper.h * Move slang-com-ptr.h to be publically available. * Add SLANG_IUNKNOWN macros to simplify implementing interfaces. Use the SLANG_IUNKNOWN macros to in slang.c * Removed slang-defines.h added outstanding defines to slang.h	22 June 2018, 17:09:01 UTC
e66d66b	Tim Foley	14 June 2018, 18:56:31 UTC	Add support for "blobs" and a file-system callback (#596) * Add support for "blobs" and a file-system callback The most obvious change here is that the Slang header now includes a few COM-style interfaces that can be used for communication between the application and compiler. In order to support the declaration of COM-like interfaces, several platform-detection macros were lifted out of `slang-defines.h` and into the public `slang.h` header. As it exists right now, this change makes the Slang API C++-only, but a C-compatible version can be defined later with the help of lots of macros (and/or something like an IDL compiler). The two big interfaces introduced are: * The `ISlangBlob` interface, which is compatible with `ID3DBlob`, `IDxcBlob`, etc. This is used to pass ownership of source/compiled code across the API boundary without copies. New versions of various entry points have been added to allow passing blobs: e.g., `spAddTranslationUnitSourceBlob` and `spGetEntryPointCodeBlob`. * The `ISlangFileSystem` interface, which is used to allow applications to intercept any attempt by the Slang compiler to load a file (input source files, include files, etc.). This is not the same as the `IDxcIncludeHandler` interface, because it assumes UTF-8 encoded path names, instead of the 16-bit encoding that dxc/Windows prefer. It is also not very similar to `ID3DInclude` as used by fxc, because this callback interface is not responsible for handling the search through include paths, etc. - it is just a file-system abstraction layer. Internally, a few different parts of the compiler were changed to either store data in blob form all the time, or to be able to synthesize a blob on-demand. Because our internal `String` type is a reference-counted copy-on-write type, using a `SlangStringBlob` to hold string data should achieve transfer of ownership back to the application without extraneous copies. There is plenty of room to clean up the architecture of some of these internal pieces if they know that their data will end up in a blob. The existing Slang testing doesn't touch any of the APIs introduced here, so they can only confirm that existing functionality hasn't been broken. The new ability to return code blobs has been tested by integration of that feature into Falcor, but there has been zero testing of the ability to pass in source code as blobs, and the ability to hook file loading. Future changes will need to add test coverage for the new features. * fixup: define SLANG_NO_THROW for non-Windows builds * fixup: header copy-paste error caught by clang/gcc * Cleanup: return reference-counted objects via output parameters Returning a reference-counted object through the API as a raw pointer creates challenges. The "obvious" answer is that the returned pointer should have an added reference (it is returned at "+1"), and the caller is responsible for releasing that reference. This makes sense when using raw pointers on the calling side: ```c++ IFoo* foo = spGetFoo(...); ... foo->Release(); ``` However, as soon as smart pointers start getting involved (to handle releasing reference counts when we are done with things), the picture gets more complicated: ```c++ MySmartPtr<IFoo> foo = spGetFoo(...); ... ``` The intention of code like that is that `foo` gets released when the smart pointer goes out of scope, but this probably doesn't happen with most smart pointer implementations. If the `MySmartPtr` constructor that takes a raw pointer retains it, then the destructor will only release that reference, and so the object will leak. It is possible that the user will have a smart pointer type where the constructor that takes a raw pointer doesn't retain it, but in general such types introduce the potential for errors of their own, and no matter what the Slang API shouldn't go in assuming any particular policy. This change makes it so that any reference-counted objects that are logically returned from a call are returned through output pointers. This design makes the leak-free cases easy (enough) to implement with raw pointers or smart pointers: ```c++ // raw pointer IFoo* foo = nullptr; spGetFoo(..., &foo); ... foo->Release(); // smart pointer MySmartPtr<IFoo> foo; spGetFoo(..., foo.writeableRef()); ... ``` The only assumption here is that any COM smart-pointer type needs to provide an operation like `writableRef` that is suitable for using that pointer as an output parameter. Given that COM loves output parameters, this seems like a safe assumption (at the very least, anybody who interacts with COM would be used to this convention). Future changes might introduce inline convenience methods for various operations that return results more directly, possibly by introducing a minimal smart-pointer type in the `slang.h` header (without prescribing that clients must use it...). * fixup: another error caught by gcc/clang	14 June 2018, 18:56:31 UTC
126e75d	Tim Foley	14 June 2018, 14:29:45 UTC	Improve generic argument inference for builtins (#598) Fixes #487 The basic problem here is that the user writes something like: ```hlsl float invSqrt2 = 1 / sqrt(2); ``` In this case the user knows that `sqrt()` is only defined for floating-point types, so they expect this to compile something like: ```hlsl float invSqrt2 = float(1) / sqrt(float(2)); ``` The challenge this creates for the Slang compiler is that we use generics to streamline our declarations of all the builtins, so that the scalar `sqrt()` function is actually declared as: ```hlsl T sqrt<T:__BuiltinFloatingPointType>(T value); ``` The `__BuiltinFloatingPointType` is an `interface` defined as part of the standard library, such that only built-in floating-point types conform to it (that is, `half`, `float`, and `double`). When generic argument inference applies to a call like `sqrt(2)`, we see an argument of type `int`, and try to infer `T=int`, which leads to a failure because `int` does not conform to `__BuiltinFloatingPointType`. The point where this currently fails in in the logic to "join" two types for inference, which is supposed to pick the best type that can represent both of two input types. E.g., a join between `float` and `int3` would be `float3`, since both of those types can convert to it, and it is the "minimal" type with that property. So, the goal here is simple: we want a "join" between `int` and `__BuiltinFloatingPointType` to yield the `float` type. The way we handle that in this change is to special case the join of a basic scalar type and an interface, by enumerating all the basic scalar types, filtering them for ones that support the chosen interface and can be implicitly converted from the argument type, and then picking the "best" of them (the comments in the code explain what "best" means in this context). The technique used here could be generalized in the future to deal with user-defined types or more cases, but that would risk slowing down overload resolution even more, which is already the most expensive part of our semantic checking pass. A test case has been added for the specific case of `sqrt()` applied to an `int` argument.	14 June 2018, 14:29:45 UTC
77562ef	Tim Foley	13 June 2018, 22:39:04 UTC	Make render-test use Slang for all shader compilation (#597) * Make render-test use Slang for all shader compilation This streamlines the code for render-test by having all its shader compilation go through the Slang API, so that it doesn't have to deal with custom logic to compile HLSL->DXBC and HLSL->DXIL. We were already leaning on Slang to generate SPIR-V for Vulkan, so this makes all the paths more consistent. My original plan with this change was to make the D3D12 render path start using DXIL at this point, since the change would make that easy, but it turns out that some aspects of how we handle parameter binding are not compatible with that right now, so it would need to come as a later change. There's a lot of details here, so I will try to walk through the changes, including the incidental ones: * Add logic to `premake5.lua` so that we copy the necessary libraries for HLSL shader compilation to our target directory from the Windows SDK. This is necessary so that our tests can actually invoke `dxcompiler.dll` * Re-run Premake to generate new project files. This moves around a few files that I manually added in previous changes without re-running Premake. * When invoking `fxc` as a pass-through compiler, be sure to pass along any macros defines via API or command-line. This isn't a strictly required change with how things worked out, but it is a positive one anyway, because it makes `slangc -pass-through fxc` more useful. * Don't print output from a downstream `fxc` invocation if it produces warnings but no errors. The main reason for this is so that our tests don't fail because of `fxc` warnings on Slang's output (which then don't match the baselines), but it can also be rationalized as not wanting to confuse users with warnings that don't come from the "real" compiler they are using. This probably needs fine-tuning as a policy. * Add the HLSL `NonUniformResourceIndex` function. This was an oversight because it isn't documented as a builtin on MSDN, and only gets mentioned obliquely when they talk about resource indexing. * Add `glsl_<version>` profiles to match our `sm_<version>` profiles, so that it is easy for a user to use the profile mechanism to request a specific GLSL version without also specifying a stage name. * Update the render-test logic so that there is a single `ShaderCompiler` implementation that always uses Slang, and get rid of all of the renderer-specific `ShaderCompiler` implementations. * Update logic in render-test `main.cpp` to select the options to use for the eventual Slang compile based on the choice of renderer and input language. I didn't change the options that render-test exposes, even though they are getting increasingly silly (e.g., `-glsl-rewrite` doesn't use GLSL as its input...). * Note: the D3D12 renderer will still use fxc, DXBC, and SM 5.0 for now, since trying to update it to switch to dxc, DXIL, and SM 6.0 didn't work well at the time. * Add a bit of supporting D3D12 code to make sure that we don't allocate a structured buffer when a buffer has a format. * Make sure to also define the `__HLSL__` macro when compiling Slang code, because otherwise a bunch of tests don't work (I'm not clear on how it worked before...). * fixup: missing file	13 June 2018, 22:39:04 UTC
a4dd936	Tim Foley	13 June 2018, 20:56:30 UTC	Fix some issues around codegen for l-values and assignment (#601) The problem here arose when a complicated l-value was formed like: ```hlsl struct Foo { float4 a; } RWStructuredBuffer<Foo> gBuffer; gBuffer[index].a.xz += whatever; ``` In this case the `gBuffer[index].a.xz` expression is a complex l-value in multiple ways: * The `gBuffer[index]` subscript could be routed to either a `get` accessor or a `ref` accessor (and maybe also a `set` accessor if we add one to the stdlib definition), and we defer the choice of which to call until as late as possible in codegen today. * The `_.a` part then becomes a "bound member acess" because we can't actually produce a direct pointer until we've resolved how to implement the subscript operation. * The `_.xz` part becomes a "swizzled l-value" because there is no way to materialize it as a pointer to contiguous storage in the orignal object (the `x` and `z` components of a vector aren't contiguous). Recent changes to support atomic operations on buffer elements introduced the `ref` accessor on `RWStructuredBuffer`, which made it possible to form a pointer to a buffer element in the IR. This interacted with some code for the "bound member" case that was trying to only introduce a temporary when absolutely necessary, and was doing so by assuming anything with an address didn't need to be moved into a temporary. The first fix is to clean up that logic in the bound-member case for assignment: always create a temporary, rather than do it conditionally. The second fix here is more systemic: we add logic to try to coerce the representation of an l-value during codegen into being a simple address, and employ that in cases where we know an address is desired. In a case like the above this helps to get things into the form that is required, so that a swizzled store can be issued. There is still some potential for cleanup in this logic, but I don't want to introduce more changes than seem necessary to fix the original problem.	13 June 2018, 20:56:30 UTC
860b0d6	Tim Foley	13 June 2018, 18:18:44 UTC	Fixes related to handling of empty types (#600) PR #577 tries to eliminate empty `struct` types by replacing them with a `LegalType::tuple` with zero elements, but this seems to run into problems in some cases, where we end up trying to match up `::none` values with empty `::tuple`s. An alternative way to handle this is to never create empty `LegalType::tuple`s (and the same for `LegalVal::tuple`), and instead create `LegalType::none` and `LegalVal::none`. PR #577 avoided this because there were various cases in the legalization logic that didn't robustly handle `LegalType::Flavor::none`. This PR thus includes two main changes: 1. Construct a `::none` type when we have an empty `struct` type. 2. Survery all places that handle the `::tuple` case and extend them to handle the `::none` case if it was missing. This fixes an issue filed in Falcor's internal GitLab as number 424.	13 June 2018, 18:18:44 UTC
167d857	Tim Foley	12 June 2018, 21:59:13 UTC	Initial support for enum declarations (#599) Slang `enum` declarations will always be scoped, e.g.: ```hlsl enum Color { Red, Green = 2, Blue, } Color c = Color.Red; // Not just `Red` ``` A user can write `enum class` as a placebo for now (to ease sharing of headers with C++). Slang does not currently support the `::` operator for static member lookup, so it must be `Color.Green` and not `Color::Green`. Support for `::` as an alternate syntax could be added later if there is strong user demand. An `enum` type can have a declared "tag type" using syntax like C++ `enum class`: ```hlsl enum MyThings : uint { First = 0, // ... } ``` The `enum` cases will store their values using that type. An `enum` that doesn't declare a tag type will use the type `int` by default. Enum cases are assigned values just like in C/C++: cases can have explicit values, but otherwise default to one more than the previous case, or zero for the first case. All `enum` types will automatically conform to a standard-library `interface` called `__EnumType`, which is used so that basic operators like equality testing can be defined generically for all `enum` types. This change only adds one operator at first (the `==` comparison), but other should be added later. An `enum` case needs to be explicitly converted to an integer where needed (e.g., `int(Color.Red)`). This is implemented by having the main integer types (`int` and `uint`) support built-in initializers that can work for any `enum` type (or rather, anything conforming to `__EnumType`). Eventually these will be restricted so that an `enum` type can only be converted to its associated tag type. IR code generation completely eliminates `enum` types and their cases. The `enum` type will be replaced with its tag type, and the cases will be replaced with the tag values. Currently this could leave some mess in the IR where cast operations are applied between values that actually have the same type.	12 June 2018, 21:59:13 UTC
7852a2b	Tim Foley	06 June 2018, 18:59:32 UTC	Add basic support for Shader Model 6.3 profiles (#594) * Add basic support for Shader Model 6.3 profiles This adds `vs_6_3` and friends as available profiles, but doesn't add any new builtins specific to Shader Model 6.3. In order to better support the ray tracing shader stages, Slang will not automatically map any attempt to compile a DXR shader up to SM 6.3 (the shader model officially required for these stages) and to the `lib_` profiles (because there are no stage-specific profiles for these cases). As an added detail, when invoking `dxcompiler.dll` to generate DXIL for DXR shaders, specify an empty entry-point name, since that is expected for `lib_` profiles. * Fixup: don't drop [shader(...)] attributes The previous change makes the "effective profile" for DXR compiles no longer include a stage, but we had been using the stage stored on the effective profile in exactly one place: when determining what to output for a `[shader("...")]` attribute. This fixup makes it so that we use the stage from the profile on the entry-point layout instead, which seems like the right choice anyway, if we are ever going to emit multiple entry points at once.	06 June 2018, 18:59:32 UTC
1a69812	Tim Foley	06 June 2018, 04:35:48 UTC	Fix atomic operations on RWBuffer (#593) * Fix atomic operations on RWBuffer An earlier change added support for passing true pointers to `__ref` parameters to fix the global `Interlocked()` functions when applied to `groupshared` variables or `RWStructureBuffer<T>` elements. That change didn't apply to `RWBuffer<T>` or `RWTexture2D<T>`, etc. because those types had so far only declared `get` and `set` accessors, but not any `ref` accessors (which return a pointer). The main fixes here are: Add `ref` accessors to the subscript oeprations on the `RW` resource types Adjust the logic for emitting calls to subscript accessors so that we don't get quite as eager about invoking a `ref` accessor, and instead try to invoke just a `get` or `set` accessor when these will suffice. This is important for Vulkan cross-compilation, where we don't yet support the semantics of our `ref` accessors. * Add a test case for atomics on a `RWBuffer` * Fix up `render-test` so that we can specify a format for a buffer resource, which allows us to use things other than `StructuredBuffer` and `ByteAddressBuffer`. The work there is probably not complete; I just did what I could to get the test working. * A bunch of files got whitespace edits thanks to the fact that I'm using editorconfig and others on the project seemingly arent... * fixup: remove ifdefed-out code	06 June 2018, 04:35:48 UTC
8b16bbf	Tim Foley	04 June 2018, 22:12:56 UTC	Emit directives to control matrix layout (#590) The HLSL/GLSL output by Slang should try to be robust against whatever flags somebody uses to compile it. Therefore, we will go ahead and output a target-language-specific directive to control the default matrix layout mode so that we can override whatever might be specified via flags. Also, as long as we are at it, this change goes ahead and makes Slang unconditionally emit row/column-major layout modifiers on all matrices (and arrays of matrices) whereas before these were only being output sometimes (the code to do it seemed buggy to me...).	04 June 2018, 22:12:56 UTC
698ba86	jsmall-nvidia	01 June 2018, 14:41:13 UTC	1st stage renderer binding refactor (#587) * First pass at support for textures in vulkan. * Binding state has first pass support for VkImageView VkSampler. * Split out _calcImageViewType * Fix bug in debug build around constant buffer being added but not part of the binding description for the test. * Offset recalculated for vk texture construction just store the texture size for each mip level. * When outputing a vector type with a size of 1 in GLSL, it needs to be output as the underlying type. For example vector<float,1> should be output as float in GLSL. * Vulkan render-test produces right output for the test tests/compute/textureSamplingTest.slang -slang -gcompute -o tests/compute/textureSamplingTest.slang.actual.txt -vk * Small improvement around xml encoding a string. * More generalized test synthesis. * Fix image usage flags for Vulkan. * Improvements to what gets synthesized vulkan tests. * Do transition on all mip levels. * Fixing problems appearing from vulkan debug layer. * Disable Vulkan synthesized tests for now. * Add Resource::Type member to Resource::DescBase. * Removed the CompactIndexSlice from binding. Just bind the indices needed. * BindingRegister -> RegisterSet * RegisterSet -> RegisterRange * Typo fix for debug build. * Remove comment that no longer applied.	01 June 2018, 14:41:13 UTC
8d77db3	Tim Foley	31 May 2018, 17:50:28 UTC	Add options to control matrix layout rules (#583) * Add options to control matrix layout rules Up to this point, the Slang compiler has assumed that the default matrix layout conventions for the target API will be used. This means column-major layout for D3D, and row major layout for GL/Vulkan (note that while GL/Vulkan describe the default as "column major" there is an implicit swap of "row" and "column" when mapping HLSL conventions to GLSL). This commit introduces two main changes: 1. The default layout convention is switched to column-major on all targets, to ensure that D3D and GL/Vulkan can easily be driven by the same application logic. I would prefer to make the default be row-major (because this is the "obvious" convention for matrices), but I don't want to deviate from the defaults in existing HLSL compilers. 2. Command-line and API options are introduced for setting the matrix layout convention to use (by default) for each code generation target. It is still possible for explicit qualifiers like `row_major` to change the layout from within shader code. I also added an API to query the matrix layout convention that was used for a type layout (which should be of the `SLANG_TYPE_KIND_MATRIX` kind), but this isn't yet exercised. I added a reflection test case to make sure that the offsets/sizes we compute for matrix-type fields are appropriately modified by the flag that gets passed in. In a future change we could possibly switch the default convention to row-major, if we also changed our testing to match, since there are currently not many clients to be adversely impacted by the change. * Fixup: silence 64-bit build warning	31 May 2018, 17:50:28 UTC
8c593ae	jsmall-nvidia	30 May 2018, 15:32:26 UTC	GroupMemoryBarrierWithGroupSync only works on groupshared memory - it doesn't block on global memory accesses. The fix is to copy the values to be processed by InterlockedAdd into shared array. The previous test ran successfully on Dx11, but broke on Dx12. (#586)	30 May 2018, 15:32:26 UTC
8b67c7b	jsmall-nvidia	29 May 2018, 20:48:04 UTC	Feature/vulkan texture (#579) * First pass at support for textures in vulkan. * Binding state has first pass support for VkImageView VkSampler. * Split out _calcImageViewType * Fix bug in debug build around constant buffer being added but not part of the binding description for the test. * Offset recalculated for vk texture construction just store the texture size for each mip level. * When outputing a vector type with a size of 1 in GLSL, it needs to be output as the underlying type. For example vector<float,1> should be output as float in GLSL. * Vulkan render-test produces right output for the test tests/compute/textureSamplingTest.slang -slang -gcompute -o tests/compute/textureSamplingTest.slang.actual.txt -vk * Small improvement around xml encoding a string. * More generalized test synthesis. * Fix image usage flags for Vulkan. * Improvements to what gets synthesized vulkan tests. * Do transition on all mip levels. * Fixing problems appearing from vulkan debug layer. * Disable Vulkan synthesized tests for now.	29 May 2018, 20:48:04 UTC
e7a8332	Tim Foley	29 May 2018, 18:39:55 UTC	Fix global atomic functions (#582) Fixes #581 This change adds a new parameter passing mode `__ref` to exist alongisde `in`, `out`, and `inout`. The `__ref` modifier indicates true by-reference parameter passing (whereas `inout` is copy-in-copy-out). This is not intended to be something that users interact with directly, but rather a low-level feature that lets us provide a correct signature for the `Interlocked*()` operations in the standard library. Most of the support for passing what are logically addresses around already exists in the IR, so the majority of the work here is just in introducing the new type `Ref<T>` and then using it appropriately when lowering `__ref` parameters/arguments to the IR.	29 May 2018, 18:39:55 UTC
ace9a8d	Yong He	25 May 2018, 14:01:34 UTC	Fixes 574. Eliminate empty structs during type legalization (#577)	25 May 2018, 14:01:34 UTC
18709fb	Tim Foley	25 May 2018, 02:20:11 UTC	A bunch of work to resolve #569 (#576) * render-test should not fail on HLSL compiler warnings The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails). Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings. * Be more careful about deleting IR instructions There was an `IRInst::deallocate()` method that had a precondition that the instruction should already be removed from its parent and clear out all its operands before calling, but it wasn't checking this and the few call sites weren't doing things right either. I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clear out operands, and then deallocates. I also made sure to clear out the type operand. This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions. * Don't emit bitwise not for non-Boolean types It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a & 1)` is very different from `!(a & 1)`, even when interpreted as a condition). It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values). Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense. This caused an actual problem for one of my test cases, so it was important to fix it now. * Fix issue with cached resolution for overoaded operators The basic problem was that the lookup logic was forming a key based on the first definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the postfix definition of `operator++` and decide to use its opcode as the key. This "fixes" the logic by looking for the first definition with a "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode. A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix). * Introduce an intermediate structured control-flow representation The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements. It would do this by recognizing operations like coniditional branches that it could turn into high-level `if` statements, etc. The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure. This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG. There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`. The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region. Splitting things in this way opens up some opportunities for future changes: * We can expand the set of IR control-flow constructs allowed, so long as we can still generate structure `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm. * We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the strucured control-flow information from these `Region`s. * We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks. During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case. The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change. * Add a pass to resolve scoping issues in generated code Fixes #569 The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR. In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider: ```hlsl for(;;) { int a = xyz; if(a) { int b = a; break; } int c = a; } int d = b; ``` This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominated the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it. The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like: ```hlsl int tmp; for(;;) { int a = xyz; if(a) { int b = a; tmp = b; break; } int c = a; } int d = tmp; ``` That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site. A few details that came up during the implementation: * Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc. * We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation). There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into: ```hlsl for(;;) { int a = xyz; if(a) { break; } int c = a; } int b = a; int d = b; ``` Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a new scoping issue for `a`, when it gets used by `b`. Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work. This change includes a test case the reproduces the original issue, so that we can confirm the fix works.	25 May 2018, 02:20:11 UTC

Newer
Older