Revision history - None - origin: https://github.com/shader-slang/slang

visit type:

Revision	Author	Date	Message	Commit Date
a08a314	Tim Foley	22 January 2019, 22:57:25 UTC	Clean up variable declaration class hierarchy (#787) The AST class hierarchy for variable declarations had a few messy bits. First, there are more subclasses of `VarDeclBase` than seem strictly necessary; especially for stuff like `struct` member variable which use `StructField` even for `static` fields (which are effectively globals). Second, the AST node type for the "cases" within an `enum` was made a subclass of `VarDeclBase` for expediency, but this isn't really semantically accurate (and doesn't seem to be paying off much in deduplication of code). This change tries to address both of those problems. First, we replace the existing `Variable` and `StructField` cases with a single `VarDecl` case that covers globals, locals, and member variables. I haven't gone so far as to replace function parameters or generic value parameters, but that might be worth considering as a further clean-up. Second, we change `EnumCaseDecl` to inherit directly from `Decl` instead of `VarDeclBase` and add an explicit case for handling them where they were previously handled as if they were variable declarations (this was done by manually surveying all locations in the code that referenced `VarDeclBase`).	22 January 2019, 22:57:25 UTC
a005007	jsmall-nvidia	22 January 2019, 21:15:15 UTC	* Fix comment on VerbosePath flag (#790) * Only output a 'verbose path' if it's different from the already output nominal path	22 January 2019, 21:15:15 UTC
bd815f0	jsmall-nvidia	21 January 2019, 21:41:54 UTC	Feature/file unique identity (#789) * * Fix memory bug around expanding va_args - needed buffer to have space for terminating 0 * Fix problem with FileWriter defaults being globals, as memory they allocate, will only be freed after return from main - work around by making StdWriters RefObject derived, and kept in scope such the writers are destroyed before checks for leaks is found * Added SimplifyPathAndHash mode for CacheFileSystem - will simplify the path and see if simplified path is in cache before reading file (limiting amout of underlying file requests) * * Added calcReplaceChar * Renamed DefaultFileSystem to OSFileSystem * Made OSFileSystem convert windows \ to / on linux * Simplified logic for caching in CacheFileSystem. * Added pragma-once-c to add extra test, but also so there is an 'include' directory in preprocessor tests. * Small fixes in pragma once test. * Simplified cache handling path, so that paths/simplified paths area always added. * Improve naming of methods for different caches. * Removed references to 'canonicalPath' and made 'uniqueIdentity' * * Re-add support for canonicalPath to ISlangFileSystem -> not for uniqueIdentifier but as a way to display 'canonicalPath' * Added peliminary support for being able to display verbose paths in a diagnostic * Added 'clearCache' support * Added verbose path support to SourceManager (now needs a ISlangFileSystemExt to do this) * Added support for '-verbose-path' option to slangc and slang-test.	21 January 2019, 21:41:54 UTC
0a3ef7b	jsmall-nvidia	21 January 2019, 20:33:59 UTC	Path simplification/hash mode, plus bug fixes (#788) * * Fix memory bug around expanding va_args - needed buffer to have space for terminating 0 * Fix problem with FileWriter defaults being globals, as memory they allocate, will only be freed after return from main - work around by making StdWriters RefObject derived, and kept in scope such the writers are destroyed before checks for leaks is found * Added SimplifyPathAndHash mode for CacheFileSystem - will simplify the path and see if simplified path is in cache before reading file (limiting amout of underlying file requests) * * Added calcReplaceChar * Renamed DefaultFileSystem to OSFileSystem * Made OSFileSystem convert windows \ to / on linux * Simplified logic for caching in CacheFileSystem. * Added pragma-once-c to add extra test, but also so there is an 'include' directory in preprocessor tests. * Small fixes in pragma once test. * Simplified cache handling path, so that paths/simplified paths area always added. * Improve naming of methods for different caches.	21 January 2019, 20:33:59 UTC
c55b462	Tim Foley	18 January 2019, 18:53:09 UTC	Delete what appears to be stray code in options validation (#784) The code for options validation had something like: ```c++ { if(rawTarget.redundantProfileSet) if(rawTarget.conflictingProfileSet) { ... } else if(rawTarget.redundantProfileSet) { ... } } ``` The first `if` there seems to be stray code left over from some edit to this logic, since its case is handled later on.	18 January 2019, 18:53:09 UTC
3c7e1be	jsmall-nvidia	17 January 2019, 22:50:48 UTC	Feature/hash for source identity (#786) * * Added COMMAND_LINE_SIMPLE test type * Made how spawning works controllable by paramter/type SpawnType * Made break-outside-loop and global-uniform run command line slangc * calcRelativePath -> calcCombinedPath * Add 64 bit version of GetHash. * Add support for Hash based mode for CacheFileSystem.	17 January 2019, 22:50:48 UTC
668078a	Tim Foley	17 January 2019, 19:43:20 UTC	Fix diagnostic output for command-line slangc (#783) Somehow the work to support output redirection when running things in `slang-test` ended up interfering with the API-specified diagnostic callback in the `slangc` code. This change doesn't include fixes to the testing infrastructure to prevent a regression like this happening again.	17 January 2019, 19:43:20 UTC
0db6e24	jsmall-nvidia	17 January 2019, 02:49:38 UTC	Not finding dxil no longer an error. Outputs a warning. (#781) * * Allow dxc compilation to take place if dxil is not found. * Output a warning that output will not be signed. * Remove .dll from dxil in warning so more applicable cross platform.	17 January 2019, 02:49:38 UTC
aedf617	Tim Foley	16 January 2019, 20:48:11 UTC	Initial support for dynamic dispatch using "tagged union" types (#772) * Initial support for dynamic dispatch using "tagged union" types Suppose a user declares some generic shader code, like the following: ```hlsl interface IFrobnicator { ... } type_param T : IFrobincator; ParameterBlock<T : IFrobnicator> gFrobnicator; ... gFrobincator.frobnicate(value); ``` and then they have some concrete implementations of the required interface: ```hlsl struct A : IFrobnicator { ... } struct B : IFrobnicator { ... } ``` The current Slang compiler allows them to generate distinct compiled kernels for the case of `T=A` and the case of `T=B`. This means that the decision of which implementation to use must be made at or before the time when a shader gets bound in the application. This change adds a new ability where the Slang compiler can generate code to handle the case where `T` might be either `A` or `B`, and which case it is will be determined dynamically at runtime. This means a single compiled kernel can handle both cases, and the decision about which code path to run can be made any time before the shader executes. This new option is supported by defining a tagged union type. Via the API, the user specifies that `T` should be specialized to `__TaggedUnion(A,B)` (the double underscore indicates that this is an experimental and unsupported feature at present). We refer to the types `A` and `B` here as the "case" types of the tagged union. Conceptually, the compiler synthesizes a type something like: ```hlsl struct TU { union { A a; B b; } payload; uint tag; } ``` The user can then allocate a constant buffer to hold their tagged union type, and when they pick a concrete type to use (say `B`), they fill in the first `sizeof(B)` bytes of their buffer with data describing a `B` instance, and then set the `tag` field to the appopriate 0-based index of the case type they chose (in this case the `B` case gets the tag value `1`). Actually implementing tagged unions takes a few main steps: * Type parsing was extended to special-case `__TaggedUnion` as a contextual keyword. This is really only intended to be used when parsing types from the API or command-line, and Bad Things are likely to happen if a user ever puts it directly in their code. Eventually construction of tagged unions should be an API feature and not part of the language syntax. * Semantic checking was extended to recognize that a tagged union like `__TaggedUnion(A,B)` shoud support an interface like `IFrobnicator` whenever all of the case types suport it, as long as the interface is "safe" for use with tagged unions (which means it doesn't use a few of the advancd langauge features like associated types). * The IR was extended with instructions to represent tagged union types and to extract their tag and the payload for the different cases as needed. * IR generation was extended to synthesize implementations of interface methods for any interface that a tagged union needs to support. Right now the implementation is simplistic and only handles simple method requirements, which it does by emitting a `switch` instruction to pick between the different cases. * A new IR pass was introduced to "desugar" any tagged union types used in the code. The downstream HLSL and GLSL compilers don't support `union`s, so we have to instead emit a tagged union as a "bag of bits" and implement loading the data for particular cases from it manually. * Final code emit mostly Just Works after the above steps, but we had to introduce an explicit IR instruction for bit-casting to handle the output of the desugaring pass. There are a bunch of gaps and caveats in this implementation, but that seems reasonable for something that is an experimental feature. The various `TODO` comments and assertion failures in unimplemented cases are intended, so that this work can be checked in even if it isn't feature-complete. * fixup: missing files * fixup: typos	16 January 2019, 20:48:11 UTC
8e47a38	Tim Foley	16 January 2019, 19:50:14 UTC	Improve handling of {} initializer list expressions (#778) Fixes #775 It was reported (in #775) that Slang doesn't handle initializer-list syntax when initializing matrix variables. When starting on a fix for that it became apparent that the time was right to fix two broad issues in the compiler's current handling of `{}`-enclosed initializer lists. The first issue was that the front-end checking of initializer lists wasn't handling the C-style behavior where an initializer list can either contain nested `{}`-enclosed lists for sub-arrays/-structures, or directly contain "leaf" values for initializing those aggregates. For example, the following two variable declarations ought to be equivalent: ```hlsl int4 a[] = { {1, 2, 3, 4}, {5, 6, 7, 8} }; int4 b[] = { 1, 2, 3, 4, 5, 6, 7, 8 }; ``` Getting this distinction right is important because we want to support initializing a matrix either from a list of vectors for its rows, or a list of scalars for its elements (in row-major order). The front-end semantic checking logic for initializer lists was revamped so that it conceptually tries to "read" an expression of a desired type from the initializer list, and decides at each step whether to consume a single expression by coercing it to the desired type, or to recursively read multiple sub-values to construct the type as an aggregate. The logic for deciding between direct vs aggregate initialization could potentially use some tweaking, but luckily it should always handle the case where users introduce explicit `{}`-enclosed sub-lists to make their intention clear, so that existing Slang code should continue to work as before. The second issue was that initializers without the expected number of elements weren't implemented in code generation, so they would lead to internal compiler errors. This change revamps the codegen logic for initializer lists so that it can synthesize default values for fields/elements that were left out during initialization. This includes an attempt to support default initialization of `struct` fields based on explicitly written initialization expressions.	16 January 2019, 19:50:14 UTC
86e11e0	jsmall-nvidia	16 January 2019, 18:31:42 UTC	Feature/external compiler reporting (#776) * Added support for converting SlangResult to string in PlatformUtil. * * Added reportExternalCompilerError * Made external compilers use this * Made DiagnosticSink accept UnownedStringSlice * Made emitXXX compiler functions return SlangError * Use smart pointers to handle life of Com interfaces * * Make SlangResult compatible with HRESULT for some common cases. * Make PlatformUtil::appendResult return SlangResult * Compile check SLANG_RESULT. * Add tests for checking diagnostics from external compilers. * * Make external compiler tests only run on windows for now. * Added 'windows' and 'unix' categories * Added categories based on what backends are available. Will make more tests run on linux and handle case where dxcompiler is not available on appveyor. * * Added spSessionCheckPassThroughSupport * Use to determine whats available for categories for tests * Add support for outputting source filename/s when using pass through.	16 January 2019, 18:31:42 UTC
c260e6a	Tim Foley	16 January 2019, 17:15:02 UTC	Fix a bug in IR linking (#777) The IR linking logic was recently rewritten to use the (optional) `IRLinkageDecoration`s instead of assuming `IRGlobalVals` always have a mangled name field, and in that process a bug seems to have crept in where in the case that an instruction that would usually quality as a "global value" does not have linkage, we were failing to register the instruction we create in the output module as a replacement for the original instruction. This problem affects `static` variables inside of functions, leading to them potentially getting emitted multiple times.	16 January 2019, 17:15:02 UTC
f0633a7	Tim Foley	16 January 2019, 16:16:53 UTC	Add proper IR codegen support for local static const variables (#779) Previously the IR codegen logic was treating function-scope `static const` variables just like `static` variables, which results in them generating less efficient output HLSL/GLSL. This change special-cases function-local `static const` variables with logic that mirrors how we handle global-scope `static const` variables. The approach in this change attempts to find a simpler solution to deal with `static const` variables inside of generic functions than what is currently done for `static` variables in generic functions, but I haven't tested whether that works in practice, so I didn't apply the same approach to the plain `static` case. That would make a good follow-on change. I've included a single test case to demonstrate that with this fix the Slang compiler generates output DXBC that uses an indexable "immediate" constant buffer, whereas without the fix it generates an array in local memory (slow).	16 January 2019, 16:16:53 UTC
29ffac9	Tim Foley	15 January 2019, 21:15:09 UTC	Fix up declaration checking order for enums (#774) The logic in `check.cpp` for declaration checking is very messy and needs to be re-written, but in the interim we need to be careful to avoid any cases where a declaration, or some piece of it, gets redundantly checked multiple times. The way the logic had been working, the different "cases" in an `enum` type were being checked twice, and that meant that any initialization expression for a case would be type-checked the first time (potentially leading to a new AST) and then the checked AST would be checked again. This created a problem if the first round of checking introduced any AST nodes that the checking logic would not expect to see (because the parser cannot possibly produce them). The fix here is to follow the style of the other declaration checking cases, where checking is separated into two distinct phases (the "header" phase makes the declaration usable by others, while the "body" phase checks its implementation details for internal consistency). This change includes a test case that produced an internal compiler error before, and compiles without error now.	15 January 2019, 21:15:09 UTC
5b0c507	Tim Foley	14 January 2019, 22:27:44 UTC	Add an error for global uniform parameter declarations (#773) A global uniform parameter in HLSL might canonically be defined like this: ```hlsl uniform float gSomeParameter; ``` The fxc and dxc compilers automatically collect all such parameters into a synthesized constant buffer, along the lines of: ```hlsl cbuffer $Globals { float gSomeParameter; } ``` Slang currently supports parsing and semantic checking of declarations like the above, and computes shader parameter layout/binding information that is appropriate for a constant buffer like `$Globals` above, but it does not include the support to emit HLSL or GLSL code that matches that layout, so that use of global uniforms in Slang is silently unsupported. Making this problem worse, the HLSL language is quite lax, and will parse the following as shader parameters as well: ```hlsl int gCounter = 0; const float kScaleFactor = 2.0f; ``` Each of those declarations introduces a global shader parameter, and then provides a default value for it via the initializer. These declarations do not introduce an ordinary global variable or constant as might be expected. (For anybody who wants to know, `static` is required to introduce a "real" global variable (although it will be a thread-local global in practice), while `static const` is required to introduce a global constant) I was not too worried about users trying to use global-scope uniforms and failing (since that has fallen out of common HLSL/GLSL practice), but the possibility that users might try to declare global variables/constants and get shader parameters by mistake creates more of a risk so that this hole is worth plugging. The right long-term fix is of course to support the intended semantics of global-scope uniforms, but that feature needs to be prioritized against other requests. A few of the Slang tests were unwittingly relying on this functionality, including some compute tests that seemingly got away with it based on the DXBC generated from the HLSL output by Slang just happening to match the layout they expected. These tests have all been tweaked to use explicit `cbuffer`s or `ParameterBlock`s instead.	14 January 2019, 22:27:44 UTC
35dfe37	Tim Foley	11 January 2019, 23:20:37 UTC	Fix some subtle bugs in D3D constant buffer layout (#771) * Fix some subtle bugs in D3D constant buffer layout The root of the issue here is that the D3D constant buffer layout rules require 16-byte alignment for arrays and structures, but they do not round up the size of an array/structure type to account for that alignment. That means that in cases like the following: ```hlsl cbuffer C0 { float3 a[2]; float c0; } struct A { float4 x; float3 y; }; cbuffer C1 { A a; float c1; } ``` The `c0` and `c1` fields get an offset of 28 and not 32 like you might expect if the preceding array/structure field `a` had been padded out to match its 16-byte alignment. The actual fix here is relatively simple, and mostly amount to shuffling around some code in `type-layout.cpp` to ensure that the D3D constant buffer layout don't inherit the logic that was rounding up array/structure sizes. Along the way I took the opportunity to clean up the inheritance hierarchy by making the GLSL-family layout rules not try to share anythign with the D3D family (not that there is very little to share), which in turn allowed for some simplification of the GLSL side of things. Fixing this behavior changed the output of a few reflection tests. In the case of `tests/reflection/arrays.hlsl` the output confirmed that we had been producing bad reflection information in these kinds of cases. The output for `tests/reflection/matrix-layout.slang` also showed some bugs in our reflection, but these were overall more minor: we mis-reported the size of certain matrices as 64 bytes instead of 60, and as a result also computed the size of the overall constant buffer as 4 bytes bigger than needed. In all of these cases I double-checked the expected output against dxc to make sure that the new offsets/sizes are what we should have been producing in the first place. I also updated the reflection test harness to start outputting layout information for the element type of a structured buffer, which changed the output of `tests/reflection/structured-buffer.slang`, but this didn't show any change in what we reported: it is just information that wasn't in the output to begin with. Finally, I added two new tests around these subtle cases of buffer layout behavior (especially subtle because it varies across target APIs). The `tests/compute/buffer-layout.slang` test simply sets up a type to ilustrate the troublesome scenarios and then embeds it in both a constant buffer and structured buffer that will be backed by memory with sequential `int` values. We then read out the value of a field as a way to probe its de facto offset at runtime. This test doesn't really stress the Slang compiler (except for our ability to pass through the same type declarations to downstream compilers), but it is useful to confirm our expectations about where things land in memory. The `tests/reflection/buffer-layout.slang` test then uses the reflection test infrastructure to confirm that the same type declarations used in the compute test produce the expected offsets in our reported reflection information. Before the fixes in this change this test showed us producing dangerously incorrect results in our D3D reflection information, which has now been fixed to match the empirically-determined offsets from the compute test. * fixups based on review feedback	11 January 2019, 23:20:37 UTC
7ba0a80	jsmall-nvidia	10 January 2019, 21:42:24 UTC	Fixes problem when passing in nullptr for the channel - we need the DiagnosticSink.writer to be nullptr so that we get the buffered result. Unfortunately here, it was set to the default for the channel, which in diagnostics case meant throwing away the diagnostic. (#770)	10 January 2019, 21:42:24 UTC
dbf5f41	jsmall-nvidia	10 January 2019, 21:01:05 UTC	Improvements around review of debug serialization info (#769) * * Make SourceView and SourceFile no longer derive from RefObject * Both have life time now managed by SourceManager * Tidied up a little around the serialization test code - just create the IRModule once * Simplified code around deleting SourceView/File. * Looked into generateIRForTranslationUnit - seems reasonable to just call it once, because it has side effects.	10 January 2019, 21:01:05 UTC
eb33144	jsmall-nvidia	07 January 2019, 14:31:31 UTC	Feature/serialization debug info (#767) * Remove AppContext. Use StdChannels to hold writers, and TestToolUtil to hold test tool specific functionality. * StdChannels -> StdWriters * getStdOut -> getOut, getStdError -> getError * Renamed main.cpp files of tools to try and stop visual studio getting confused between files - such that clicking on an error takes editor to the right location. * Work in progress on being able to serialize debug information. * * Added MemoryStream * First pass converting to IRSerialData * Able to read and write IRSerialData with debug data * Start at reconstruting IR serialized data. * First pass of generation debug SourceLocs from debug data. Works for test set for line nos. * Bug fixes. Moved testing of serialization into IRSerialUtil * Work around problem with irModule = generateIRForTranslationUnit(translationUnit); two times in a row produces different output(!). Fix by just creating once. * Remove problem with use of ternary op in slang.cpp on gcc/clang. * Added -verify-debug-serial-ir option that makes IR modules go through full serialization with debug information and verification. * Add a test that does serial debug verification that is run by default on linux.	07 January 2019, 14:31:31 UTC
d155eaa	jsmall-nvidia	07 January 2019, 14:14:01 UTC	Feature/unique tool source names (#766) * Remove AppContext. Use StdChannels to hold writers, and TestToolUtil to hold test tool specific functionality. * StdChannels -> StdWriters * getStdOut -> getOut, getStdError -> getError * Renamed main.cpp files of tools to try and stop visual studio getting confused between files - such that clicking on an error takes editor to the right location.	07 January 2019, 14:14:01 UTC
efa2c8f	jsmall-nvidia	21 December 2018, 16:00:28 UTC	Feature/remove app context (#765) * Remove AppContext. Use StdChannels to hold writers, and TestToolUtil to hold test tool specific functionality. * StdChannels -> StdWriters * getStdOut -> getOut, getStdError -> getError	21 December 2018, 16:00:28 UTC
b5bda9b	jsmall-nvidia	21 December 2018, 14:28:05 UTC	* Made 'sub command' git-like - with parameters before going to slang-test and after going to the tool (#764) * Document some of the changes to command line invocation * Make -v option display the effective command that is being used to run the test	21 December 2018, 14:28:05 UTC
02e44ba	jsmall-nvidia	20 December 2018, 18:23:58 UTC	Feature/lex memory reduction (#762) * Only do scrubbing if needed. When allocating content try to limit size (with scrubbing each token takes up 1k), now it's 16 bytes min size. * Don't allocate for every call to write on the CallbackWriter - use the m_appendBuffer. * Don't allocate memory for CallbackWriter use m_appendBuffer. * Use UnownedStringSlice for suffix output for parsing float/int literals. Fix typo in invalidFloatingPointLiteralSuffix * Using memory arena to hold tokens that are not in SourceManager. * Improve comment on lexing. * Make UnownedStringSlice allocation simpler on SourceManager. * Fix error on gcc around UnownedStringSlice - because VC converted string + UnownedStringSlice automatically into a String. * Fix generateName needing concat string for gcc. * When constructing a Token in parseAttributeName - because it's a Identifier, we have to set the Name. * Remove translation through String on getIntrinsicOp * Make func-cbuffer-param disablable with -exclude compatibility-issue * Move memory leak in render-test. * From review - can just use "?:" instead of performing a concat.	20 December 2018, 18:23:58 UTC
332056a	Tim Foley	19 December 2018, 22:38:04 UTC	Refactor several IR passes (#761) * Refactor several IR passes This change takes some IR passes that lived together in `ir.cpp` and moves them into their own files to improve clarity. In most cases these were passes introduced early in the life of the IR, so that it didn't seem like a big deal to have them all in one file, but now that `ir.cpp` has grown unwieldly this seems like an important cleanup to make. To give a quick rundown of the passes involved: * The IR "linking" step has been pulled out to `ir-link.{h,cpp}`. This code for this pass is pretty much identical to what was in `ir.cpp`, and no attempt has been made to clean up or refactor it in the current change. * The GLSL legalization step has been pulled out to `ir-glsl-legalize.{h,cpp}`. This used to be invoked directly from the linking step, but has been made a new top-level pass invoked from `emit.cpp`. Just like with the linking, the code in the new file is just a copy-paste of what was in `ir.cpp`, and no attempt at cleanup has been made. Also note that it might be a good idea to move this pass later in the overall sequence, but this PR doesn't attempt to do that as it could change results. * The generic specialization step has been pulled out to `ir-specialize.{h,cpp}`. The file name does not explicitly reference generic specialization because I anticipate this pass having to perform other kinds of specialization as well. The code in this case amounts to a heavy cleanup/refactoring pass and thus deserves careful scrutiny. The reason for the cleanup is that the generic specialization step used to be part of the "linking" step long ago, and continued to share infrastructure with it long after that stopped making sense. The newly cleaned up pass has much simpler logic that should be easy enough to follow from the comments. * In order to reduce code dulication, the IR "cloning" part of the `ir-specialize-resources.{h,cpp}` pass was pulled into its own files (`ir-clone.{h,cpp}`) that both the generic specialization step and the resource-based specialization step now share. The remaining changes then pertain to deleting a bunch of code out of `ir.cpp` and adding the new files to the build. The only test that needed updating was `vkray/raygen`, where some subtle ordering change in the refactored generic specialization logic has lead to the relative order of the specialized `TraceRay` and `saturate` functions beind reversed. * fixup: typo in assert * fixup: typos in comments	19 December 2018, 22:38:04 UTC
b6a5474	Tim Foley	18 December 2018, 14:57:18 UTC	Fix for byte-address buffers on Vulkan (#760) * Fix output comparison for compute tests There was some vestigial logic there that was first doing a string-based comparison of actual/expected output, and then falling back to a path that parsed the expected output as a float, then converted that to an integer, then printed that integer in hex, and did the comparison with the result of that conversion. I'm not even clear on what that code was trying to accomplish, but its main effect was allowing a test failure to slide by unnoticed becaues somehow an all-zeroes actual output file was matching an expected output file with no zeros. My understanding is that it went something like this: * The first line of expected output was `A` (as in hexidecimal for the decimal integer `10`), and the first line of actual output was `0`. * The `StringToFloat` function was failing on the input string `"A"` and returned `0.0` to indicate failure (rather than reporting any kind of error) * We then converted the `0.0` to integer `0` and converted it to a base-16 string `"0"` * The comparison to the actual output passed, and then a careless early exit in the comparison loop meant that a full test would pass as soon as one line of output passed according to this "second change" logic. This change removes the broken code in the test runner since nothing was relying on it, other than the one broken test case we wanted to fix anyway. * Fix the declarations of byte-address buffer methods for Vulkan The HLSL `ByteAddressBuffer` and `RWByteAddressBuffer` types have methods `Load` and `Store` that take byte offsets from the start of the buffer, but we translate them into GLSL that uses `uint[]` array, so that indexing into that array will be off by a factor of four. Somehow the code for mutable byte address buffers was written to add 4, 8, and 12 bytes to the base offset of a vector to get to its subsequent components, but I never thought about the implications this would have for the base address itself. This change includes the following fixes: * Any place in the translation of a byte-address `Load` or `Store` method that was using the address/offset value has been changed to use `$1 / 4` instead of `$1`. * The offsets of 4, 8, and 12 have been changed to 1, 2, and 3 since they are now being added to an index instead of a byte offset * The `GetDimensions` methods have introduced a factor of `* 4` to account for the fact that they need to return a byte size and not a count of elements. With this change the existing `byte-address-buffer` test now produces the desired output under Vulkan.	18 December 2018, 14:57:18 UTC
583c72a	Tim Foley	18 December 2018, 03:26:32 UTC	First step toward supporting use of interfaces as existential types (#716) * First step toward supporting use of interfaces as existential types Traditional generics involve universal quantification. E.g., a declaration like: ``` void drive<T : IVehicle>(T vehicle); ``` indicates for for all types `T` that implement the `IVehicle` interface, the `drive()` function is available. In contrast, whend directly using an interface type like: ``` IVehicle v = ...; v.doSomething(); ``` we only know that there exists some concrete type (we could call it `E`) such that `v` refers to a value of type `E`, and `E` implements the `IVehicle` interface. In order to perform an operation like `v.doSomething()` we need to "open" the existential value so that we can look at the concrete type and how it implements the `IVehicle.doSomething` requirement. This change adds a very explicit representation of existentials to Slang's IR. An operation like `e = makeExistential(v, w)` creates a value of some existential type (interfaces being our only existential types for now), by wrapping a concrete value `v` (the type of `v` can be seen as an implicit operand) and a witness table `w` showing that the type of `v` implements the requirements of the chosen interface type. In turn, opening of an existential is handled with operations `extractExistential{Value\|Type\|WitnessTable}` which pull the corresponding piece of information out of a value of existential type (which somewhere in the code had to have been created with `makeExistential`). The change includes a trivial simplification pass that can detect cases where an `extractExistential` operation is applied direclty to a `makeExistential` operation, so that there is only one possible result that could be extracted. This allows for simplification of existential types used in trivial ways for local variables (this is mostly so I can check in a functional test, rather than to actually support useful code involving interfaces right now). The logic in the semantic checking phase of the compiler is comparatively more complex. When we are about to perform member lookup given an expression like `obj.member` we will first check if `obj` has an existential type, and if it does we will construct a suitable local context in which we extract the value, type, and witness table from the existential (these all become explicit AST expression nodes), and then use the extracted value as the base of the lookup operation. The nature of existential values is that two different values with the same existential (interface) type could wrap concrete values with differnt types, so that we need to carefully refer only to the extracted type/value/witness-table of specific values. We handle this right now by conceptually moving the existential-type value into a local variable (by introducing a `LetExpr` that amounts to `let v = <init> in <body>`) and then require that the extract expressions must refer to the (immutable) variable declaration from which they are extracting a value. (Eventually we should expand this so that when using an immutable local variable of existential type we just use that variable as-is rather than introduce a new temporary) A simple test case is included that uses an interface type in an almost trivial way for a local variable; this test can be run and produces the expected results. A more complex test case that passes an existential into a function is included, but left disabled because a more aggressive simplification approach is required to generate working code from it. Add missing file for expected test output * Fixups for merge from top-of-tree	18 December 2018, 03:26:32 UTC
3a02c59	Tim Foley	17 December 2018, 22:48:48 UTC	Specialize away resource-type function parameters (#759) * Specialize away resource-type function parameters Work on #397. Introduction ------------ Suppose a user writes a function that takes a resource type as a parameter: ```hlsl float4 getThing(RWStructuredBuffer<float4> buffer, int index) { return buffer[index]; } ``` This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`: ```hlsl RWStructuredBuffer<float4> gBuffer; ``` translates to a global GLSL `buffer` declaration: ```hlsl buffer _S0 { float4 _data[]; } gBuffer; ``` There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function. (Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy) The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`. The solution implemented in this change works around some, but not all, of the problems with full inlining. The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites. Thus if we have code like: ```hlsl RWStructuredBuffer<float4> gA; RWStructuredBuffer<float4> gB[10]; ... getThing(gA, x); getThing(gA, y); getThing(gB[someVal], z); ``` we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`: ```hlsl float4 getThing_gA(int index) { return gA[index]; } float4 getThing_gB(int _val, int index) { return gB[_val][index]; } ``` and the call sites will change to match: ```hlsl getThing_gA(x); getThing_gA(y); getThing_gB(someVal, z); ``` Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it. While this description motivates the change based on GLSL output, the same basic issue can arise for other targets. For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters. Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types. Implementation -------------- The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand. The code in this pass is heavily commented, so hopefully it serves to explain itself all right. After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output. To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`. This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live. When I first added the DCE pass it eliminated everything because there were no "roots" for liveness. I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on). A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate. I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive as if it used it, without actually using it. There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced. Testing ------- There are three different tests added as part of this change: * The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization. * The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic. * The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets. * fixup: warnings/errors from other compilers * fixup: typos and cleanup * fixup: typos	17 December 2018, 22:48:48 UTC
d2ddc59	jsmall-nvidia	17 December 2018, 14:22:14 UTC	Feature/test tool shared libraries (#758) * Remove circular reference to renderer on Vk & D3D12 DescriptorSetImpl * Refactor Stbi image loading such that memory is correctly freed when goes out of scope. Added Crt memory dump at termination. Reduced erroneous reporting by scoping TestContext. * Used capitalized acronym for STBImage to keep Tim happy. * Split out TestReporter - to just handle reporting test results Split out Options Made TestContext hold options, and the reporter Removed remaining memory leaks. * Small optimization for rawWrite, such that it directly writes over print.. * Improve comments on TestCategorySet * Fix typos in TestCategorySet * Made slangc a cpp file as part of slang-test (removing need for separate project/shared library). * * Made all test tools only available as dlls. * Made possible to invoke test tool dll from command line slang-test slangc [--bindir xxx] options to slangc * Fix Visual Studio projects that are no longer needed.	17 December 2018, 14:22:14 UTC
d43c566	jsmall-nvidia	14 December 2018, 20:24:21 UTC	Fix memory leaks around slang-test (#757) * Remove circular reference to renderer on Vk & D3D12 DescriptorSetImpl * Refactor Stbi image loading such that memory is correctly freed when goes out of scope. Added Crt memory dump at termination. Reduced erroneous reporting by scoping TestContext. * Used capitalized acronym for STBImage to keep Tim happy. * Split out TestReporter - to just handle reporting test results Split out Options Made TestContext hold options, and the reporter Removed remaining memory leaks. * Small optimization for rawWrite, such that it directly writes over print.. * Improve comments on TestCategorySet * Fix typos in TestCategorySet	14 December 2018, 20:24:21 UTC
ec745c0	Tim Foley	14 December 2018, 19:00:02 UTC	Represent global shader parameters explicitly in the IR (#756) Before this change, global shader parameters were represented in the IR as just being ordinary global variables. The only indication that a particular global represented a parameter was when it got a layotu attached to it (as part of back-end processing), and we've had a number of bugs related to layouts being dropped so that what should have been a shader parameter turned into an ordinary global variable in the output. This change is more strongly motivated by the fact that making shader parameters look like globals means that we cannot easily reason about their value when doing IR transformations. If we see two `load`s from the same global variable can we assume they yield the same value? In the general case we cannot, and this means that any transformation that wants to rely on the fact that an input `Texture2D` shader parameter can't actually change over the life of the program needs to do extra work. The fix here is to introduce a new kind of IR instruction that represents a global shader parameter directly (not a pointer to it as a global would), at which point there isn't even such a notion as a "load" from the parameter, since it represents the value directly. In several cases logic that used to apply to global variables in case they were shader parameters (by looking for a layout) is now moved to apply to these global parameters. The biggest source of issues in this change was that switching from pointers to plain values to represent these shader parameters stresses different cases in type legalization. I also had to deal with the case of legalization for GLSL where we actually do need global shader parameters that are writable (since varying output goes in the global scope), but in that case I borrowed the use of pointer-like `Out<...>` and `InOut<...>` types to represent that intent, which we were already using for function parameters representing outputs. A few tests started failing because the changes lead to a slightly different order of code emission, which in some HLSL tests resulted in a function parameter named `s` getting emitted before a global parameter named `s`, leading to the latter getting the name `s_1` instead of `s_0`. A few SPIR-V tests started failing because the new approach means that we no longer end up performing a load from all varying input parameters at the start of `main` and instead reference the varying inputs directly. The resulting code is more idomatic, but it differed from the baselines for those tests.	14 December 2018, 19:00:02 UTC
11793ed	jsmall-nvidia	13 December 2018, 23:14:38 UTC	Remove memory and resource leaks (#754) * Remove circular reference to renderer on Vk & D3D12 DescriptorSetImpl * Refactor Stbi image loading such that memory is correctly freed when goes out of scope. Added Crt memory dump at termination. Reduced erroneous reporting by scoping TestContext. * Used capitalized acronym for STBImage to keep Tim happy.	13 December 2018, 23:14:38 UTC
765c87e	Yong He	13 December 2018, 20:57:50 UTC	Merge pull request #755 from csyonghe/dictfix Fix dictionary performance issue	13 December 2018, 20:57:50 UTC
970b58a	Yong He	13 December 2018, 20:26:36 UTC	Merge branch 'master' into dictfix	13 December 2018, 20:26:36 UTC
822ed70	Tim Foley	13 December 2018, 20:13:58 UTC	Move mangled name out of IRGlobalValue (#752) * Move mangled name out of IRGlobalValue Previously the `IRGlobalValue` type was used as a root for all IR instructions that can have "linkage," in the sense that a definition in one module can satisfy a use in another module. The mangled symbol name was stored in state directly on each `IRGlobalValue`, which created some complications, and also forced IR instructions that wanted to support linkage to wedge into the hierarchy at that specific point. This change moves the mangled name out into a decoration: either an `IRImportDecoration` or an `IRExportDecoration`, both of which inherit from `IRLinkageDecoration` which exposes the mangled name. This change has a few benefits: * We can now have any kind of instruction be exported/imported, without having to inherit from `IRGlobalValue`. This could potentially let `IRStructType` and `IRWitnessTable` be simplified to just have operand lists instead of dummy chldren as they do today. * We can now easily have "global values" like functions that explicitly don't get linkage, instead of using a null or empty mangled name as a marker. * We can use the exact opcode on a linkage decoration to distinguish imports from exports, which could be used to more accurately resolve symbols during the linking step. Other than adding the decorations and making sure that AST->IR lowering adds them, the main changes here are around any code that used `IRGlobalValue`. Variables and parameters of type `IRGlobalValue` were changed to `IRInst` easily, so the main challenge was around code that casts to `IRGlobalValue. In cases where a cast to `IRGlobalValue` also performed a test for the mangled name being non-null/non-empty, we simply switched the code to check for the presence of an `IRLinkageDecoration`, since that is the new way of indicating a value with linakge. Most of the serious complications arose in `ir.cpp` around the "linking"/target-specialization and generic specialization steps. The "linking" logic was checking for `IRGlobalValue` to opt into some more complicated cloning logic, and just checking for a linkage decoration here wasn't sufficient since the front-end does* produce global values without linkage in some cases (e.g., for a function-`static` variable we produce a global variable without linkage). This logic was updated to just check for the cases that used to amount to `IRGlobalValue`s directly by opcode. It might be simpler in the short term to have kept `IRGlobalValue` around to make the existing casts Just Work, but I'm confident that this logic could actually be rewritten for much greater clarity and simplicity and that is the better way forward. The generic specialization logic was using some really messy code to generate a new mangled name to represent the specialized symbol, and then searching for an existing match for that name. The original idea there was that an IR module could include "pre-specialized" versions of certain generics to speed up back-end compilation by eliminating the need to specialize in some cases, but this feature has never been implemented so the overhead here is just a waste. Instead, I moved generic specialization to use a simpler dictionary to map the operands to a `specialize` instruction over to the resulting specialized value. This allows for some simplifications in the name mangling logic, because it no longer needs to figure out how to produce mangled names from IR instructions representing types/values. As part of this change I also overhauled the IR emit logic to produce cleaner output by default, borrowing some of the ideas from the logic in `emit.cpp`. IR values are now automatically given names based on their "name hint" decoration, if any, to make the code easier to follow, and I also made it so that types and literals get collapsed into their use sites in a new "simplified" IR dump mode (which is currently the default, with no way to opt into the other mode without tweaking the code). The resulting IR dumps are much nicer to look at, but as a result the one test that involves IR dumping (`ir/string-literal`) doesn't really test what it used to. One weird issue that came up during testing is that the `transitive-interface` test had previously been producing output that made no sense (that is, the expected output file wasn't really sensible), and somehow these changes were altering its behavior. Changing the test to use `int` values instead of `float` was enough to make the output be what I'd expect, and hand inspection of generating DXBC has me convinced we were compiling the `float` case correctly too. There appears to be some issue around tests with floating-point outputs that we should investigate. * fixup: C++ declaration order	13 December 2018, 20:13:58 UTC
8d13c06	Yong He	05 December 2018, 23:31:20 UTC	Fix dictionary performance issue	13 December 2018, 19:44:31 UTC
76280cf	jsmall-nvidia	12 December 2018, 20:17:54 UTC	Remove the valist from the ISlangWriter interface. Replace with begin/endAppendBuffer. (#751)	12 December 2018, 20:17:54 UTC
24c34cb	Tim Foley	12 December 2018, 18:15:33 UTC	Make shader parameters not writable (#750) There was already logic in place to make `const` variables not be writable, and this change extends it to also consider global shader parameters (in HLSL, any global variable that is not marked `static` or `groupshared`) to be rvalues instead of lvalues. As noted in the comments on this change, we could lift that restriction later on, if we have good handling of global variables with resource types in the back-end, but for now it is safer to just rule this out in the front-end.	12 December 2018, 18:15:33 UTC
49ed6b6	jsmall-nvidia	12 December 2018, 13:57:48 UTC	Running tests in slang-test process (#740) * First pass at having an interface to write text to that can be replaced. Simplifed and made more rigerous the interface used to write formatted strings. * Added AppContext to simplify setting up and parsing around of streams. * Added more simplified way to get the std error/out from AppContext. * Work in progress using dll for tools to speed up testing. * First pass at ISlangWriter interface. * Added support for writing VaArgs. Added NullWriter. * Use ISlangWriter for output. * Use ISlangWriter for output - replacing OutputCallback. Make IRDump go to ISlangWriter * SlangWriterTargetType -> SlangWriterChannel Improvements around AppContext * Shared library working with slang-reflection-test. * Dll testing working for render-test. * Include va_list definintion from header. * Fix errors from clang. * Fix typo for linux. * Added -usexes option * Fix typo. * Fix arguments problem on linux. * Fix typo for linux. * Add windows tool shared library projects. * Fix warning from x86 win build. Fix signed warning from slang-test/main.cpp * First attempt at getting premake to work on travis, and run tests. * Try moving build out into script. * Invoke bash scripts so they don't have to be executable. * Drive configuration/tests from env parameters set by travis * Try using source to run travis tests. * Remove the build.linux directory - but doing so will overwrite Makefile. * Made -fno-delete-null-pointer-checks gcc only. * Try to fix warning from -fno-delete-null-pointer-checks * Turn of warnings for unknown switches. * Try to make premake choose the correct tooling. * Disabled missing braces warning. * Disable -Wundefined-var-template on clang. * -Wunused-function disabled for clang. * Fix typo due to SlangBool. * Remove this nullptr tests. * "-Wno-unused-private-field" for clang. * Added "-Wno-undefined-bool-conversion" * Add DominatorList::end fix. * Split scripts into travis_build.sh travis_test.sh * Fix gcc/clang template pre-declaration issue around QualType. * Fix premake to build such that pthread correctly links with slang-glslang	12 December 2018, 13:57:48 UTC
62d3e38	Tim Foley	11 December 2018, 23:17:55 UTC	Decorations are instructions (#748) * Make a test case use IR serialization * Make all IR instructions usable as parents This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`. Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over. This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later. * Change IR decorations to be instructions. The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations and children. At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children. The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration->insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list. All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working. The `IRBuilder` was extended with factory functions to make the various decoration types and attach them. All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code. A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass where we remove any decorations from a module that shouldn't be allowed in the serialized code. The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration). One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function). The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on. I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next. * fixup: 64-bit issues * fixup: forward declaration issues	11 December 2018, 23:17:55 UTC
565200f	jsmall-nvidia	11 December 2018, 17:31:40 UTC	Looks like a typo on DominatorList::end() (#749)	11 December 2018, 17:31:40 UTC
b299717	Tim Foley	10 December 2018, 20:42:15 UTC	Remove the "VM" and "bytecode" features (#745) * Remove the "VM" and "bytecode" features The "bytecode" in `bc.{h,cpp}` was an initial attempt at a serialized encoding for the Slang IR, but we now have the `ir-serialize.{h,cpp}` approach which was has been kept up to date much better. Similarly, the "VM" in `vm.{h,cpp}` was intended to be a system for interpreting Slang code in the bytecode format directly (so that you could load and evaluate code in a Slang module in a lightweight fashion). This never got used past a single test, which we eventually disabled. There are good ideas in some of this code, but at this point the implementations have bit-rotted to a point where trying to maintain it is more costly than it would be to re-created it if/when we ever decide these features are important again. * fixup: remove slang-eval-test from Makefile	10 December 2018, 20:42:15 UTC
32f57c3	Tim Foley	10 December 2018, 17:13:20 UTC	Tweak coding conventions (#747) After a lot of discussion, I'm backpedaling and making this document what the (main) conventions of the codebase are rather than trying to prescribe what I think they should be. In particular: * Go back to `m_` for implementation-details member names, since that is the main convention in place for them * Put the `k` prefix back on C-style `enum` types since that is how they are currently being defined. My inclination is to just stick with what we have for now to avoid extended debates. Its a bit of an inconsistent mess, but its enough for getting work done.	10 December 2018, 17:13:20 UTC
135eaff	Tim Foley	07 December 2018, 21:31:06 UTC	Change how buffers are emitted (#741) * Change how buffers are emitted This is a change with a lot of pieces, which can't always be separated out cleanly. I'm going to walk through them in what I hope is a logical order. The main goal of this change was to allow arrays of structured buffers to translate to Vulkan. Consider two declarations of structured buffers in HLSL/Slang: ```hlsl StructuredBuffer<X> single; StructuredBuffer<Y> multiple[10]; ``` The current translation logic was handling `single` by translating it into an unnamed GLSL `buffer` block like: ```glsl layout(std430) buffer _S1 { X single[]; }; ``` That syntax allows an expression like `single[i]` in Slang to be translated simply as `single[i]` in GLSL. But that naive translating doesn't work for `multiple`, since we need to declare a array of blocks in GLSL, which requires giving the whole thing a name: ```glsl layout(std430) buffer _S2 { Y _data[]; } multiple[10]; ``` Now a reference to `multiple[i][j]` in Slang needs to become `multiple[i]._data[j]` in GLSL. To avoid having way too many special cases around single structured buffers vs. arrays, it makes sense to allows emit things in the latter form, so that we instead lower `single` as: ```glsl layout(std430) buffer _S1 { X _data[]; } single; ``` So that now a reference to `single[i]` becomes `single._data[i]` in GLSL. Most of that can be handled in the standard library translation of the structured buffer indexing operations. The only wrinkle there is that there were some old special-case instructions in the IR intended to handle buffer load/store operations (these were added back when I was trying to keep the "VM" path working). These aren't really needed to have structured-buffer operations work; they can be handled as ordinary functions as far as the stdlib is concerned. I removed the old instructions. Along the way, it became clear that a few other cases follow the same pattern. Byte-addressed buffers are an obvious case. We were lowering HLSL/Slang: ```hlsl ByteAddressBuffer b; ... uint x = b.Load(0); ``` to GLSL like: ```glsl layout(std430) buffer _S1 { uint b[]; }; ... uint x = b[0]; ``` That logic would fail for arrays the same way that the structured buffer case was failing. The fix is the same: use named `buffer` blocks and then introduce an explicit `_data` field: ```glsl layout(std430) buffer _S1 { uint _data[]; } b; ... uint x = b._data[0]; ``` Just like with structured buffers, all of the VK translation for operations on byte-addressed buffers can be implemented directly in teh stdlib, so once the emit logic was changed it was just a matter of adding `._data` to a bunch of VK tranlsations. It turns out that arrays of constant buffers have more or less the same problem, and furthermore we have some problems with any code that directly uses the modern HLSL `ConstantBuffer<T>` type. Note: the emit logic around constant buffers sometimes refers to "parameter groups" because that is being used in the compiler as a catch-all term for constant buffers, texture buffers, and parameter blocks. The existing code was going out of its way to reproduce the way that constant buffer declarations are implicitly referenced in HLSL: ```hlsl cbuffer C { float f; } ... float tmp = f; // No reference to `C` here ``` This can be seen in the emit logic with the `isDerefBaseImplicit` function, which is used to take the internal IR representation for a reference to `f` (which is closer to the expression `(C).f` or `C->f`) and leave off any reference to `C` so that we emit just `f`. That kind of logic just flat out doesn't work in some important cases. Arrays of constant buffers are a clear one: ```hlsl ConstantBuffer<X> cbArray[3]; ... X x = cbArray[0]; ``` There is no way to translate that to an ordinary `cbuffer` declaration at all. The same problem can be created without arrays, though: ```hlsl ConstantBuffer<X> singleCB; ... X x = singleCB; ``` The current strategy for translating constant buffers was translating `singleCB` into a `cbuffer` declaration that reproduced the fields of `X` as its members, which just wouldn't work: ```hlsl cbuffer singleCB { float f; // field of `X` } ... X x = singleCB; // ERROR: there is nothing named `singleCB` in this HLSL ``` The new strategy is more consistent. We still generate a `cbuffer` declaration for a single constant buffer, but we always give it a single field of the chosen element type: ```hlsl cbuffer singleCB { X singleCB; } ... X x = singleCB; // this works fine! ``` And in the array case we generate code that uses the explicit `ConstantBuffer<T>` type: ```hlsl ConstantBuffer<X> cbArray[3]; ... X x = cbArray[0]; ``` The GLSL output is more complicated because unlike with HLSL there is no implicit conversion from a uniform block to its element type (there is no notion of an element type). The array case thus needs a `_data` field similar to what we do for structured buffers: ```glsl layout(std140) uniform _S3 { X _data; } cbArray[3]; ... X x = cbArray[0]._data; ``` And then the non-array case needs to have a similar `_data` field for consistency: ```glsl layout(std140) uniform _S1 { X _data; } singleCB; ... X x = singleCB._data; ``` This is handled by inserting the necessary reference to `_data` whenever we dereference a constant buffer, either as part of a load instruction (loading from the whole CB as a pointer), or an `IRFieldAddress` instruction which forms a pointer into the CB (e.g., `&(singleCB->f)` becomes `singleCB._data.f`). The current emit logic handles `ParameterBlock<X>` differently from `ConstantBuffer<X>`, but really only to allow parameter blocks to be explicitly named in the output, while constant buffers were left implicit by default. Thus the only difference was a legacy one (from back when trying to exactly reproduce the HLSL text we got as input was considered an important goal), and the new approach to emitting constant buffers would get rid of it. I removed the separate logic for emitting `ParameterBlock<X>` and just let the handling for constant buffers deal with it. Note that any resource types inside of a `ParameterBlock<X>` would have been moved out as part of legalization, so that a parameter block is 100% equivalent to a constant buffer when it comes time to emit code. Unsurprisingly, changing the way we generate HLSL and GLSL output for all these buffer types meant that any tests that were directly comparing the output of `slangc` against `fxc`, `dxc`, or `glslang` broke. The basic approach to fixing the breakage in GLSL tests was to update the GLSL baseline to reflect the new output startegy. In some cases I used macros to name the various `_S<digits>` temporaries so that future renaming will hopefully be easier (it would be great if we auto-generated temporary names with a bit more context). There was one GLSL test (`tests/bugs/vk-structured-buffer-binding`) that was using raw GLSL expected output, and this was changed to use a GLSL baseline to generate SPIR-V for comparison. For HLSL tests we were sometimes running the same input file through `slangc` and `fxc`/`dxc`, and in these cases I macro-ized the various `cbuffer` declarations to generate different declarations depending on the compiler. I completely dropped the tests coming from the D3D SDK because they aren't providing much coverage, and updating them would change them so far from the original code that the purported benefit (using a body of existing shaders) would be lost. I also dropped the explicit matrix layout qualifiers in the `matrix-layout` test because the new output strategy breaks those for GLSL (you can't put matrix layout qualifiers on `struct` fields, and now the body of every constant buffer is inside a `struct`). This isn't as big of a loss as it seems, because our handling of those qualifiers wasn't really right to begin with. Slang users should only be setting the matrix layout mode globally (and we should probably switch to error out on the explicit qualifiers for now). The other thing that got dropped is tests involving `packoffset` modifiers. Slang already warns that it doesn't support these, and the way they were used in the test cases is actually misleading. For the binding/layout-related tests, the goal was to show that Slang reproduces the same layout as fxc, in which case explicitly enforcing a layout via `packoffset` seems like cheating (are we sure we enforced the layout fxc would have produced?). The real reason was that Slang used to emit explicit `packoffset` on every* field of a `cbuffer` it would output, because of an `fxc` bug where you couldn't use `register` on textures/samplers declared inside a `cbuffer` unless every field in the `cbuffer` used a `register` or `packoffset` modifier. Slang hasn't required that behavior in a while because it now splits textures and samplers, and the one test case where we needed `packoffset` to work around the `fxc` bug in the baseline HLSL has been macro-ified even more to work around the bug. The amount of churn in the test cases is unfortunate, but it continues to point at the weakness of any testing strategy that checks for exact equivalent between Slang's output and that of other compilers. We need to keep working to replace these tests with better alternatives. In `check.cpp` there is logic to perform implicit dereferencing, so that if you write `obj.f` where `obj` is a `ConstantBuffer<X>` (or some other "pointer-like" type) and `f` is a field in `X`, then this effectively translates as `(obj).f`. That is, we dereference the value of type `ConstantBuffer<X>` to get a value of type `X`, and then refer to the field of the `X` value. There was a problem where the logic to insert that kind of implicit dereference operation was using a reference (`auto& type = ...`) for the type of the expression being dereferenced, and then clobbering it. This would mean that an expression of type `ConstantBuffer<X>` would have its type overwritten to be just `X` and then codegen would break later on. I'm not sure how we haven't run into that before. The `array-of-buffers` test case was added to confirm that we now support arrays of constant, structured, and byte-address buffers for both DXIL and SPIR-V output. Okay, so that was a lot of stuff, but hopefully it is clear how this all works to make the output of the compiler more consistent and explicit, while also supporting the required new functionality. fixup: review feedback	07 December 2018, 21:31:06 UTC
b0c2423	Tim Foley	06 December 2018, 21:30:34 UTC	Add more design documents (#742) * Add more design documents This change adds a set of documents about the design/implementation of the Slang compiler intended for helping to onboard new contributors, explain some of the choices that go into the current implementation, and outline the broad strokes of some of our future feature goals. As explained in the `README` for these docs, I don't expect them to be kept in sync with the code in the long run; there is no expectation that people making a PR for code changes also go and fix up these documents. The purpose of these is to add a bit of a "historical record" of design choices and project goals to the project itself, so they can be browsed in a way that is more convenient than the issue tracker or source control history. The "Coding Conventions" document is very much a first attempt, and I'm not 100% wedded to any of the decisions in there. We haven't enforced consistent conventions so far, but the importance of having something in place will only grow with the number of contributors. * Edits based on review * Fix many typos (thanks @jsmall-nvidia) * Change up some language in coding conventions based on discussions since when it was originally written. I'm still not 100% sure about some of the choices in there, but we need to get something established.	06 December 2018, 21:30:34 UTC
4f69056	Tim Foley	04 December 2018, 21:56:59 UTC	Fix an SSA construction bug (#739) Fixes #723 This fixes a case where the SSA construction pass wasn't dealing with the possibility of a phi node that it had provisionally introduced being replaced later. The result was invalid IR (caught with `-validate-ir`) that referenced an instruction nowhere in the IR module (because it was dropped). The fix centralizes the code for dealing with phi nodes that have been replaced, so that the two different paths where variables get "read" during SSA construction can both use the same logic.	04 December 2018, 21:56:59 UTC
3e96221	Tim Foley	04 December 2018, 02:27:28 UTC	A few extra functions cross-compiled to Vulkan. (#738) This is building on the work done in #737, and just borrows translations from [this](https://anteru.net/blog/2016/mapping-between-hlsl-and-glsl/index.html) blog post. One thing that I don't try to get right here, and that seems to be a recurring pattern is that there are something HLSL functions that operate on matrix types where the GLSL equivalents only work on vectors. These can be handled in "user space" by writing an explicit (profile-specific) overload that just calls the vector function on each row, but that adds complexity in the stdlib that I'm avoiding for now (since almost nobody performs these operations on matrices anyway).	04 December 2018, 02:27:28 UTC
d161d97	Tim Foley	03 December 2018, 22:57:19 UTC	Add Vulkan translations for HLSL barrier operations (#737) Fixes #730 The most basic of these already had translations added, and the Slang approach to cross-compilation made adding the new ones almost trivial. I also took the time to confirm that using "operator comma" for the sequencing (since the single HLSL function call needs to expand to a sequence of GLSL function calls) works fine with glslang. I added a test case to confirm that all of these operations can compile down to SPIR-V. I am not personally confident that these translations are perfect, but they are based on what was [linked](https://anteru.net/blog/2016/mapping-between-hlsl-and-glsl/index.html) from #730. The one difference is where that blog post was listing `memoryBarrierImage, memoryBarrierImage` I assumed they meant `memoryBarrierImage, memoryBarrierBuffer`.	03 December 2018, 22:57:19 UTC
3d60cc0	Tim Foley	01 December 2018, 01:52:52 UTC	Add support for Vulkan raytraicng "shader record" (#735) The syntax for this is a placeholder for now, since we will probably want to migrate to whatever gets decided on for dxc. To declare that some data should be part of the "shader record" use `layout(shaderRecordNV)` to mirror the GLSL raytracing extension: ```hlsl layout(shaderRecordNV) cbuffer MyShaderRecord { float4 someColor; uint someValue; } ``` The intention (not enforced) is that an application would map `MyShaderRecord` to "root constants" in the "local root signature" when compiling for DXR, while the output code in GLSL will always map to the shader record in Vulkan raytracing: ```glsl layout(shaderRecordNV) buffer MyShaderRecord { float4 someColor; uint someValue; }; ``` This change does not support declaring a global value of `struct` type with `layout(shaderRecordNV)` (or a `ParameterBlock` with the modifiers, although that would be a nice-to-have feature) and it does not support having the contents of the shader record be mutable (even if GLSL/Vulkan allows it). Those can/should be added in future changes. In terms of implementation, this closely mirrors the way that `layout(push_constant)` buffers were being handled, where the data inside the `ConstantBuffer<X>` (the value of type `X`) gets laid out using ordinary rules (and consuming ordinary `UNIFORM` storage, while the buffer itself is given a different layout resource to reflect that fact that it does not consume a VK `binding` any more, but a different conceptual resource. Note: an alternative design here (that might actually be preferrable) would be to have both push-constant and shader-record buffers be handled as alternative aliases for `ConstantBuffer` (or maybe `ParameterBlock`) so that you have, e.g.: ```hlsl PushConstantBuffer<X> myPushConstants; ShaderRecord<Y> myShaderRecord; ``` This alternative design avoids API-specific decorations on the declarations, and reflects the intent of the programmer very directly, even when they are compiling for a target like D3D that doesn't reflect these choices at the IL level (it could still be exposed through the Slang reflection API).	01 December 2018, 01:52:52 UTC
7f2d2c9	Tim Foley	30 November 2018, 23:03:31 UTC	Allow parameter blocks to be explicitly bound to spaces (#736) * Don't look at VK bindings when compiling for D3D and vice versa The compiler had been looking at all the modifiers on a declaration when piecing together binding information, whether or not those modifiers should apply on the chosen target API. This was working in practice because the "layout resource kinds" used by each API target were disjoint, for the most part. This change ensures that we don't even look at modifiers that don't apply on the chosen target, and furthermore adds a new warning that applies if the user is compiling a shader with explicit `register` bindings for Vulkan, if there are no corresponding `[[vk::binding(...)]]` attributes (under the assumption that if they want to be explicit in one case, they probably want to be explicit in all cases). * Allow explicit space/set bindings on parameter blocks The syntax for the D3D case is to specify a `space` in a `register` modifier, without any other register class: ```hlsl ParameterBlock<X> myBlock : regsiter(space999); ``` In the Vulkan case, the user must apply the `[[vk::binding(...)]]` attribute and is expected to use a `binding` of zero: ```hlsl [[vk::binding(0,999)]] ParameterBlock<X> myBlock; ``` This change includes a reflection test for the new capability (where we also confirm that it produces the expected output when compared with fxc), and a test for the diagnostic messages when the user messes up bindings for Vulkan. The implementation itself is fairly straightforward, since the compiler already treats registe spaces/sets as a resource that parameters can consume directly. Note: the test case for explicit parameter block space/set bindings includes some commented out code that lead to a compiler crash. I would like to fix the underlying issue, but it seemed sensible to keep the bug fix out of a change like this that is adding functionality.	30 November 2018, 23:03:31 UTC
e40c106	Tim Foley	30 November 2018, 18:42:42 UTC	Use register spaces by default for D3D12 targets (#734) The change here is that the logical that used to be controlled by `-parameter-blocks-use-register-spaces` is now turned on unconditionally, meaning that a `ParameterBlock<X>` will get its own register `space` by default when targetting D3D Shader Model 5.1 and later. I had originally made this feature optional because I wasn't sure whether Shader Model 5.1 should default to using register spaces or not, because D3D 11 doesn't support spaces at the API level and MSDN documetnation made it sound like SM5.1 was available for D3D11 as well. Subsequent reading has led me to understand that MSDN is wrong on this front, and SM5.1 and later are D3D12-only, so it is always safe to use spaces. The new logic is now that we automatically use spaces for parameter blocks any time it is possible (SM5.1+ and any Vulkan target), and otherwise fall back to not using spaces (SM5.0 and earlier). I updated a reflection test case that was covering parameter blocks to confirm the output differs between SM5.0 and 5.1.	30 November 2018, 18:42:42 UTC
b51a582	jsmall-nvidia	29 November 2018, 18:55:45 UTC	Use hardware if available for Dx11 testing (#733) * Try using hardware device before reference on dx11 * Output error string on renderer construction failure	29 November 2018, 18:55:45 UTC
e5cc466	Tim Foley	29 November 2018, 15:48:38 UTC	Fix uses of dynamic_cast on types in reflection API (#731) The `Type` infrastructure uses a class hierarchy, but blindly `dynamic_cast`ing to a desired case doesn't always give the expected result, because a `Type` could represent a `typedef` (a `NamedExpressionType`) that itself resolves to, e.g, a vector type (a `VectorExpressionType`). In that case a `dynamic_cast<VectorExpressionType*>(someType)` would fail, even though the type logically represents a vector. The `Type::As<T>()` method is designed to handle this case, by "looking through" simple `typedef`s to get at the real definition of a type. The fix in this case is to use `Type::As<T>()` at various points in the reflection code (`reflection.cpp`) instead of `dynamic_cast`. This problem surfaced with a `StructuredBuffer<float2>` not reflecting correctly, because the element type (`float2`) is actually a `typedef` (for `vector<float,2>`), so I've included a test case that stresses that case. Getting the right output in the test required tweaking the `slang-reflection-test` tool to produce additional output for resource types (currently narrowed down to only affect structured buffers to avoid large diffs in expected test outputs).	29 November 2018, 15:48:38 UTC
c3c34bf	Tim Foley	29 November 2018, 15:48:23 UTC	Add support for globallycoherent modifier (#732) The `globallycoherent` modifier indicates that resource might be read or written by threads outside of the current thread group, so that any memory barriers that affect it should guarantee coherency at the global memory scope, and not just thread-group scope. The equivalent GLSL modifier appears to be `coherent`. This change adds the front-end modifier, transforms it into an IR-level decoration during lowering, and then checks for the modifier during code emit. Note: this logic may not behave correctly when `globallycoherent` is added to a field in a `struct`, since the modifier would then need to be propagated to any variables created during type legalization. Checking up on that is left to future work. Note: it isn't entirely clear if `globallycoherent` should be treated as a declaration modifier or a type modifier. The point is moot for now because Slang doesn't have any support for type modifiers, but when we get around to that we will need to make a decision.	29 November 2018, 15:48:23 UTC
7f0ccc5	jsmall-nvidia	28 November 2018, 22:56:32 UTC	* Renamed spSessionHasCompileTargetSupport to spSessionCheckCompileTargetSupport. (#728) * Improved return codes from spSessionCheckCompileTargetSupport	28 November 2018, 22:56:32 UTC
e21d5ad	jsmall-nvidia	21 November 2018, 18:41:34 UTC	Feature/early depth stencil (#727) * First pass support for early depth stencil. * Add a simple test to check if output has attributes. * Use cross compilation to test [earlydepthstencil] on glsl. * If target is dxil, use dxc to test against. Add hlsl to test earlydepthstencil against. * * Added spSessionHasCompileTargetSupport * Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run	21 November 2018, 18:41:34 UTC
9bb11b6	Tim Foley	21 November 2018, 16:15:33 UTC	Add support for unbounded arrays as shader parameters (#725) * Add support for unbounded arrays as shader parameters With this change, Slang shaders can use unbounded-size arrays as parameters, e.g.: ```hlsl Texture2D t[] : register(t3, space2); SamplerState s[]; ``` As shown in the above example, Slang supports both explicit `register` declarations on unbounded-size arrays and also implicit binding. When doing automatic parmaeter binding, Slang will allocate a full register space to an unbounded-size array of textures/smaplers, starting at register zero. Note that for the Vulkan target, an array of descriptors of any size (including unbounded size) consumes only a single `bindign`, so much of this logic is specific to D3D targets. Details on the changes made: * The single biggest change is a new `LayoutSize` type that is used to store a value that can either be a finite unsigned integer or a dedicated "infinite" value (which is stored as the all-bits-set `-1` value). This is used in places where a size could either be a finite value or an "unbounded" value, to both try to make standard math robust against the infinite case, and also to force code to deal with both the finite and infinite cases more explicitly when they care about the difference. * The public API was documented so that unbounded-size arrays report their size as `-1`. We should probably change this function to return a signed value instead of `size_t`, but that would technically be a source-breaking change, so we want to make sure we stage it appropriately. * The code that invokes fxc was updated so that it passes the appropriate flag to enable unbounded arrays of descriptors. I haven't looked yet at whether dxc needs such a flag, so there may need to be a follow-on change to add that. * The logic in the `UsedRanges::Add` method for tracking what registers have been claimed was rewritten because the previous version had some subtle bugs. The new version includes more detailed comments that attempt to explain why I think the new logic works. * The top-level logic for auto-assigning bindings to parameters has been overhauled to deal with the fact that a parameter that needs "infinite" amounts of a resource should be claiming a full register space for those resources instead. Whenever a parameter allocates any register spaces we want them all to be contiguous, so we have a loop that counts the requirements and allocates the spaces before we go along and dole them out. * When computing the layout for an array type, we need to carefully deal with unbounded-size arrays. In the case of an unbounded array of a "simple" resource type (e.g., `Texture2D[]`), we opt to expose the type layout as consuming an infinite number of the appropriate register, while in the case of a complex type (say, a `struct` with two texture fields), we need to instead allocate whole spaces for those fields. The logic here is more subtle than I would like, and interacts with the existing code that "adjusts" the element type of an array in order to make standard indexing math Just Work. * Similarly, when a `struct` type has unbounded-array fields, then we need to transform any field with infinite register requirements to instead consume a space in the resulting aggregate type. This case is comparatively easier than the array case. * The test case for unbounded arrays covers both explicit and implicit bindings, and also the case of an unbounded array over a `struct` type (it does not cover the case of a `struct` contianing unbounded arrays, so that will need to be added later). For this test we are both validation the output reflection data and that we produce the same code as fxc (with explicit bindings in the fxc case). * The reflection test app was modified to use the new API contract and detect when a parameter consumes `SLANG_UNBOUNDED_SIZE` resources. * Fixup: ensure unbounded size is defined at right bit width	21 November 2018, 16:15:33 UTC
7f97f35	jsmall-nvidia	20 November 2018, 21:00:48 UTC	Keep resources in scope for Vk in DescriptorSet. (#726)	20 November 2018, 21:00:48 UTC
1f4b495	Tim Foley	19 November 2018, 17:08:26 UTC	Fix declaration of RWTexture*.Load() operations (#722) The `Texture2D.Load()` operation takes a `uint3` of coordinates, with the `.xy` part holding the texel coordinates and the `.z` part holding a mip level. In contrast, the `RWTexture2D.Load()` operation only takes a `uint2`. This isn't clearly specified on MSDN, so Slang failed to get the declaration right. This change fixes it so that we only add the extra coordinate for the mip level on non-multisample, read-only texture types (the previous code only checked for multisample-ness). I also changed the logic that outputs swizzles to extract the coordinates and mip level so that it only applies when there is a mip level being passed through (this code should never actually be applied, though, because we shouldn't be generating `texelFetch` for RW texures anyway...). One final change that sneask in here is making the `offset` parameter for one of the load-with-offset cases correctly use the base coordinate count for the texture type (e.g., 2D even for `Texture2DArray`). That is an unrelated fix, but I thought I'd sneak it in here rather than do a separate PR.	19 November 2018, 17:08:26 UTC
7e0c5ad	Tim Foley	19 November 2018, 17:07:34 UTC	Add Vulkan cross-compilation for byte-address buffers (#721) * Add Vulkan cross-compilation for byte-address buffers This covers `ByteAddressBuffer`, `RWByteAddressBuffer`, and `RasterizerOrderedByteAddressBuffer`. A declaration of any of these types translates to a GLSL `buffer` declaration with a single `uint` array of data. Most of the methods on these types then have straightforward translations to operations on the array. The overall translation is similar to what was already being done for structured buffers. While implementing GLSL translation for the various atomic (`Interlocked`) methods, I discovered that some of these included declarations that aren't actually included in HLSL. I cleaned these up, including in the declarations of the global `Interlocked` functions. The test case that is included here covers only the most basic functionality: `Load`, `Load2`, `Load4` and `Store`. We should try to back-fill tests for the remaining methods when we have time. Two large caveats with this work: 1. We don't handle arrays of byte-address buffers, just as we don't handle arrays of structured buffers. That will take additional work. 2. We don't handle byte-address (or structured) buffers being passed as function parameters, since the parameter would need to be declared as a bare `uint[]` array. * Fixup: don't lump raytracing acceleration structures in with buffers Raytracing acceleration structures share a common base class with byte-address buffers (they are both buffer resources without a specific element type), and I was mistakently matching on this base class in an attempt to have a catch-all that applied to all byte-address buffers. The fix here was to add a distinct base class for all byte-address buffers and catch that instead. * Fixup: typos	19 November 2018, 17:07:34 UTC
0f67d92	jsmall-nvidia	16 November 2018, 21:04:56 UTC	Bug fix - vk::binding on structured buffers (#720) * Fix output of binding of structured buffer on GLSL. * Added test to check vk binding is coming thru. * Fix closethit binding inconsistency.	16 November 2018, 21:04:56 UTC
cfb1f61	jsmall-nvidia	13 November 2018, 22:02:27 UTC	* Fix bug outputing dxbc assembly (#719) * Make the disassembly methods returns SlangResult and String as last output param so as to make error case clear.	13 November 2018, 22:02:27 UTC
039c233	Tim Foley	12 November 2018, 17:57:46 UTC	Add callable shader support for Vulkan ray tracing (#718) * Add callable shader support for Vulkan ray tracing This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec. One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines. Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`. We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind). Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms). * Typo fix: assocaited->associated One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot. * Typo fix: defintiion->definition	12 November 2018, 17:57:46 UTC
c07f60a	Tim Foley	09 November 2018, 21:24:28 UTC	Update Vulkan ray tracing support to final extension spec (#717) * Update version of glslang used * Update VK raytracing support for final extension spec A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`. The Slang standard library and the GLSL baselines for the tests are consistently updated. The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in: ```glsl int payloadID = 0; traceNV(..., payloadID); // ERROR ``` In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here: 1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression. 2. We could introduce a pass to replace the payload-location instructions with literal integers. 3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site) 4. We could somehow mark the builtin function as not having side effects. We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM. We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response. This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.	09 November 2018, 21:24:28 UTC
d782a16	jsmall-nvidia	09 November 2018, 15:23:37 UTC	Feature/teamcity output (#715) * First pass support for TeamCity compatible output. * Remove reset of counters on starting suite - so summary is over all suites.	09 November 2018, 15:23:37 UTC
de6ca4d	jsmall-nvidia	07 November 2018, 00:17:10 UTC	Made sameStageSpecifiedMoreThanOnce a warning instead of an error. (#714)	07 November 2018, 00:17:10 UTC
02e18b5	Tim Foley	07 November 2018, 00:16:52 UTC	Translate NonUniformResourceIndex() calls to Vulkan GLSL (#713) These calls translate to uses of the `nonuniformEXT` qualifier introduced by the `GL_EXT_nonuniform_qualifier` extension. The standard library changes in this case are straightforward uses of existing compiler mechanisms. The test case is one of the less pleasant ones where we compare SPIR-V output against SPIR-V generated from a hand-coded GLSL baseline. This is a case where a simpler test type that just checks for specific textual matches in the output (and not whole files) would be better, but that is out of scope for this change.	07 November 2018, 00:16:52 UTC
1185bd4	jsmall-nvidia	06 November 2018, 19:28:25 UTC	Feature/shared library refactor (#712) * * Added ISlangSharedLibraryLoader and ISlangSharedLibrary * Implemented default implementations * Added slang API function to get/set the ISlangSharedLibraryLoader on the session * Put function caching onto the Session - so that if the loader is chaged, its easy to reset the shared libraries, and functions * Run premake. * Fix problem with setting null, would cause an unnecessary function/shared lib flush. * * Unload SharedLibrary when DefaultSharedLibrary is deleted. * Make SharedLibrary handle unload safely if already unloaded. * Refactor SharedLibrary, such that it becomes a utility class - simplifying it's semantics. * Simplified ISlangSharedLibrary such that doesn't have unload and isLoaded so easier to implement. Use updated SharedLibrary impl. * Disable aarch64 on windows * Premake windows files without aarch64 build. * Moved slang-shared-library to core (so can be used in code outside of main slang) Fixed problem in premake5 where on windows projects were incorrectly constructed * Allowed RefObject to base class of com types Added ConfigurableSharedLibraryLoader Added -dxc-path -fxc-path -glslang-path Fix problem with dxc-path not honoring it's path when loading dxil * Added documentation for command line control of dll loading paths. * Remove some tabbing issues. * Change name of include guard.	06 November 2018, 19:28:25 UTC
4533319	Tim Foley	01 November 2018, 22:08:54 UTC	Add support for a "strict" floating-point mode (#709) This change adds an API function and command line options for controlling the default floating-point behavior for a target, with options for "fast" and "precise" computation. The "precise" option gets mapped to the "IEEE strictness" mode in `fxc` and `dxc` (there is currently no equivalent option for glslang that I could find).	01 November 2018, 22:08:54 UTC
82f57ae	Tim Foley	01 November 2018, 20:23:45 UTC	Disable warnings when compiling through dxc (#710) I had hoped to just disable certain warnings (false positives that trigger based on how Slang outputs HLSL code), but dxc doesn't support fine-grained control over warnings from the command line (we might investigate `#pragma`-based options later). There are enough user complaints about how downstream compiler errors come mixed with tons of distracting warnings that disabling them will lead to a better user experience for now. This change also fixes a long-standing bug where apparently dxc does not guaranteee that the diagnostic blob it returns is nul-terminated (despite this convention being established by fxc), so that we need to nul-terminate it ourselves before emitting any diagnostic messages produced by dxc.	01 November 2018, 20:23:45 UTC
e55d8dc	jsmall-nvidia	01 November 2018, 17:58:24 UTC	Newer versions of gcc, optimize away tests for this being null, because this being null is defined as undefined behavior in the standard. This is a workaround that disables that optimization for now. (#708)	01 November 2018, 17:58:24 UTC
a73b24f	jsmall-nvidia	31 October 2018, 22:25:35 UTC	Added support for cross-compilation. (#707)	31 October 2018, 22:25:35 UTC
2993dd8	Tim Foley	31 October 2018, 14:01:07 UTC	Fix a precedence bug in code emit (#705) * Fix a precedence bug in code emit Given code like the following: ```hlsl float a = ...; float3 b = pow(a, 2.0); float3 c = b.xyz; ``` There is an implicit cast from `float` to `float3` in the computation of `b`, that Slang will always make explicit in the output. Slang will also tend to pull the computation of `b` into the next expression if it has no other use sites in the same function. When it does, the compiler was failing to parenthesize the result correctly, and yielded (more or less): ```hlsl float a = ...; float3 c = (float3) pow(a,2.0).xyz; ``` As you can see, the swizzle ended up attached to the `pow()` call instead of the cast, and the downstream compiler luckily complained that we couldn't apply an `.xyz` swizzle to a scalar value. This change adds the missing parentheses-insertion logic for that case of emitting a cast expression, so that we instead get: ```hlsl float a = ...; float3 c = ((float3) pow(a,2.0)).xyz; ``` I added a test case to catch this specific issue, but there is of course no guarantee that we haven't missed other cases in the emit logic. This is why I held out so long on getting to the "why so many parentheses?" complaints... * remove commented-out code from test program	31 October 2018, 14:01:07 UTC
f0f1571	Tim Foley	31 October 2018, 00:50:12 UTC	Fix handling of DXR profiles (#704) The logic in `getEffectiveProfile()` function was mapping these to use `Stage::Unknown` in an early attempt to handle the way that dxc requires the `lib_` profile for DXR shaders, instead of anything that mentions the stage name (in constrast to, e.g., `vs_5_1`). At the same time, the `GetHLSLProfileName()` function was updated to explicitly handle the DXR shaders and map anything it doesn't expect (including `Stage::Unknown`) to a profile named `unknown`, which dxc obviously doesn't like. This change tries to fix both issues by: Having `getEffectiveProfile()` no longer clobber the stage part of a profile for DXR shaders. * Having `GetHLSLProfileName()` map all unhandled cases to the `lib_*` profiles, since that seems likely to be how any future stages will need to be handled as well (based on the precedent with DXR) Along the way, I also fixed a bug where invoking command-line `slangc` with no `-stage` options and then relying on `[shader(...)]` attributes to pick up the entry points would lead to a crash since the array of per-entry-point output paths on each target would not be sized appropriately.	31 October 2018, 00:50:12 UTC
098dd5d	Tim Foley	30 October 2018, 21:21:34 UTC	Fix a crash on function-static variables with initializers (#703) This code path hadn't been used, and it had a crash due to not inserting the basic blocks it created (for initializing the variable) into the parent function. The fix adds a bit more smarts to the `IRBuilder` to help with inserting basic blocks into the flow of a function. The actual user issue was around `static const` declarations, and it is clear that the code is incorrectly treating a function local `static const` as if it were just `static`. That will need to be fixed in another change.	30 October 2018, 21:21:34 UTC
baf0608	jsmall-nvidia	30 October 2018, 19:31:27 UTC	Feature/serial string pool refactor (#702) * Ongoing serialization for full debug work. * Use StringRepresentationCache and StringSlicePool for serialization. * Removed some older path handling for serialization which had some wrong underlying assumptions. * Builds with refactored use of SubStringPool in ir-serialize. * Removed prohibitedCategories because not used anywhere. * Add category 'compatibility-issue' * Remove work in progress on debug serialization.	30 October 2018, 19:31:27 UTC
2a76449	Tim Foley	30 October 2018, 16:05:40 UTC	Fix declarations of InterlockedMin/Max (#700) Fixed #699 These functions were declared with an `in out` parameter (copy in, copy out) where they should have used `__ref` (true by-reference parameter passing).	30 October 2018, 16:05:40 UTC
d7c0970	Tim Foley	30 October 2018, 14:04:52 UTC	Fix implementation of >= in preprocessor conditionals (#701) By a copy-paste error, `>=` in a preprocessor condition was behaving as `<=`.	30 October 2018, 14:04:52 UTC
7259855	Tim Foley	29 October 2018, 21:44:39 UTC	Rework command-line options handling for entry points and targets (#697) * Rework command-line options handling for entry points and targets Overview: * The biggest functionality change is that the implicit ordering constraints when multiple `-entry` options are reversed: any `-stage` option affects the `-entry` to its left instead of to its right as it used to. This is technically a breaking change, but I expect most users aren't using this feature. * The options parsing tries to handle profile versions and stages as distinct data (rather than using the combined `Profile` type all over), and treats a `-profile` option that specifies both a profile version and a stage (e.g., `-profile ps_5_0`) as if it were sugar for both a `-profile` and a `-stage` (e.g., `-profile sm_5_0 -stage fragment`). * We now technically handle multiple `-target` options in one invocation of `-slangc`, but do not advertise that fact in the documentation because it might be confusing for users. Similar to the relationship between `-stage` and `-entry`, any `-profile` option affects the most recent `-target` option unless there is only one `-target`. * The logic for associating `-o` options with corresponding entry points and targets has been beefed up. The rule is that a `-o` option for a compiled kernel binds to the entry point to its left, unless there is only one entry point (just like for `-stage`). The associated target for a `-o` option is found via a search, however, because otherwise it would be impossible to specify `-o` options for both SPIR-V and DXIL in one pass. * The handling of output paths for entry points in the internal compiler structures was changed, because previously it could only handle one output path per entry point (even when there are multiple targets). The new logic builds up a per-target mapping from an entry point to its desired output path (if any). Details: * Support for formatting profile versions, stages, and compile targets (formats) was added to diagnostic printing, so that we can make better error messages. This is fairly ad hoc, and it would be nice to have all of the string<->enum stuff be more data-driven throughout the codebase. * Test cases were added for (almost) all of the error conditions in the current options validation. The main one that is missing is around specifying an `-entry` option before any source file when compiling multiple files. This is because the test runner is putting the source file name first on the command line automatically, so we can't reproduce that case. * Several reflection-related tests now reflect entry points where they didn't before, because the logic for detecting when to infer a default `main` entry point have been made more loose * On the dxc path, beefed up the handling of mapping from Slang `Profile`s to the coresponding string to use when invoking dxc. * A bunch of tests cases were in violation of the newly imposed rules, so those needed to be cleaned up. * There were also a bunch of test cases that had accidentally gotten "disabled" at some point because there were comparing output from `slangc` both with and without a `-pass-through` option, but that meant that any errors in command-line parsing produced the same error output in both the Slang and pass-through cases. This change updates `slang-test` to always expect a successful run for these tests, and then manually updates or disables the various test cases that are affected. * When merging the updated test for matrix layout mode, I found that the new command-line logic was failing to propagate a matrix layout mode passed to `render-test` into the compiler. This was because the `-matrix-layout` options were implemented as per-target, but the target was being set by API while the option came in via command line (passed through the API). It seems like we want matrix layout mode to be a global option anyway (rather than per-target), so I made that change here. Add missing expected output files * A 64-bit fix * Remove commented-out code noted in review	29 October 2018, 21:44:39 UTC
56c9de0	jsmall-nvidia	26 October 2018, 21:27:45 UTC	Premake improvements (#696) * Make CacheFileSystem dtor virtual. * Fixing problems around build.linux and windows intermediate files being placed in obj.	26 October 2018, 21:27:45 UTC
36742ae	Tim Foley	26 October 2018, 17:58:08 UTC	Work around dxc matrix layout behavior (#694) The Slang compiler allows the default matrix layout convention (row-major vs. column-major) to be specified via the command line or API. When generating output HLSL, Slang emits a `#pragma pack_matrix` directive for the chosen default convention, so that a user can generate plain HLSL output and still have it encode their desired defaults. The problem that has arisen is that many released versions of dxc (including those in the most recent Windows SDK at this time) ignore the `#pragma pack_matrix` directive (the feature has since been added to top-of-tree dxc). The main fix here is to instead pass the `-Zpr` option in to dxc when invoking it if the row-major (non-default) convention is requested. This will solve the problem for clients that use Slang to generate DXIL, but not for clients who use Slang to generate plain HLSL that they then pass into dxc (those clients are assumed to be able to work around the problem for themselves). In order to test the change, I added a test that fills a constant buffer with sequential integers, and then reads out the rows/columns of an `int3x4` matrix with both row- and column-major layout, as well as an integer placed after the matrix, so we can see the offset it was given. The `render-test` application did not yet support generating code via dxc/DXIL, so I added an option for that. This ends up assuming that anybody who is running the D3D12 tests will also have a version of dxc available.	26 October 2018, 17:58:08 UTC
cb9d679	jsmall-nvidia	26 October 2018, 12:16:54 UTC	Feature/file system cache (#692) * First pass at caching file system. * default-file-system -> slang-file-system fix problem with location("build.linux") confusing windows build for now. * Added CompressedResult Fix problem in Result construction with it being unsigned * Add support for Path simplification. * Testing for Path::Simplify. * Refactored CacheFileSystem - automatically handles ISlangFileSystem or ISlangFileSystemExt appropriately. Removed WrapFileSystem - because wasn't possible to emulate some of the behavior if just loadFile is implemented. Split out StringBlob - so that no need to convert between ISlangBlob and String repeatidly. * Remove unwanted code in ~CompileRequest	26 October 2018, 12:16:54 UTC
ad47fe7	Tim Foley	25 October 2018, 19:49:36 UTC	Fix Vulkan codegen for image atomics (#690) The basic problem was that the front-end was generating code that used a `uint` vector for the coordinates, while GLSL requires an `int` vector. Without support for implicit type conversions, this leads to GLSL compilation failure. The fix here is to insert the type conversion as late as possible (during GLSL emit). This isn't a pretty solution, but it is the easiest one to implement in the current compiler. A more forward-looking approach would be to support "force inline" functions in the stdlib, so that we can implement the conversion logic in a stdlib implementation specialized for the Vulkan/GLSL target. At the moment, everything to do with image atomics is all sleight of hand anyway, so making it incrementally messier isn't a bit hit.	25 October 2018, 19:49:36 UTC
4f0415e	jsmall-nvidia	25 October 2018, 15:24:16 UTC	Feature/premake linux (#689) * Premake work in progress for linux. * Added dump function. * Remove examples on linux Small warning fix. * * Don't build render-test on linux * Removed work around virtual destructor warning, and just used virtual dtor for simplicity * Git ignore obj directories * Fix premake working on windows. * * Fix sprintf_s functions * Make generates arg parsing more robust * Added FloatIntUnion to avoid type punning/strong aliasing issues, and repeated union definitions. * Work around problems building on linux with getClass claiming a strict aliasing issue. * Fix for targetBlock appearing potentiall used unintialized to gcc. * Linux slang link options -fPIC to make dll. * Add -fPIC to build options on linux. * Add -ldl for linux on slang. * Fixes to try and get premake working with .so on linux. * Make core compile with -fPIC * Try to fix linux linking with --no-as-needed before -ldl * Add rpath back. * Remove render-gl from linux build. * Re-add location for linux. * Don't include <malloc.h> except on windows. * Remove unused line to fix warning on osx. * Remove ambiguity on OSX for operator <<. * Fixing ambiguity with operator overloading and Int types for OSX. * Fix ambiguity around UInt and operator * Fix ambiguity of UInt conversion for OSX. * Added UnambiguousInt and UnambiguousUInt to make it easier to work around OSX integer coercion for UInt/Int types.	25 October 2018, 15:24:16 UTC
2700a89	jsmall-nvidia	23 October 2018, 00:48:59 UTC	Fix problem with __import not working (#688) * Added getPathType to ISlangFileSystemExt. This is needed so that when searching for a file it's existance can be tested without loading the file. On some platforms a getCanonicalPath can do this - but depending on how getCanonicalPath is implemented, it may not do. This test is made after the relative path is produced before finding the canonical path. * Test for importing along search path. * Added comment to explain the issue around WrapFileSystem impl of getPathType. * Make search path use / not \	23 October 2018, 00:48:59 UTC
cda9c3b	Matt Pharr	22 October 2018, 14:45:09 UTC	Osx build fixes (#681) * Remove 'register' qualifiers. These will be illegal come c++17 and give a warning on OSX. * Add UNREACHABLE_RETURNs to silence compiler warnings. * Make FileStream::GetPosition() compile on OSX (w.r.t. the linux build, I believe that strictly-speaking, fpos64_t is specified as an opaque type and the cast to an Int64 is not necessarily well-defined.) * Avoid an inadvertent trigraph.	22 October 2018, 14:45:09 UTC
53731f6	Tim Foley	19 October 2018, 20:53:18 UTC	Vulkan implicit sampler fixups (#686) * Fix sampler-less texture functions (#685) * Fix sampler-less texeture functions I'm honestly not sure how the original work on this feature in #648 worked at all (probably insufficient testing). We have these front-end modifiers to indicate that a particular function definition requires a certain GLSL version, or a GLSL extension in order to be used, and they are supposed to be automatically employed by the logic in `emit.cpp` to output `#extension` lines in the output GLSL. However, it turns out that nothing is actually wired up right now, so that adding the modifiers to a declaration is a placebo. This change propagates the modifiers through as decorations, and then uses them during GLSL code emit, which allows the functions that require `EXT_samplerless_texture_functions` to work. * fixup: 32-bit warning * Add serialization support for GLSL extension/version decorations	19 October 2018, 20:53:18 UTC
3a5214b	Tim Foley	18 October 2018, 16:30:38 UTC	Add support for static methods in interfaces (#680) This change allows an interface to include `static` methods as requirements, so that types that conform to the interface will need to satisfy the requirement with a `static` method. The essence of the check is simple: when checking that a method satisfies a requirement, we enforce that both are `static` or both are non-`static`. Making that simple change and adding a test change broke a few other places in the compiler that this change tries to fix. The main fix is to handle cases where we might look up an "effectively static" member of a type through an instance, and to make sure that we replace the instance-based lookup with type-based lookup. There was already logic along these lines in `lower-to-ir.cpp`, so this change centralizes it in `check.cpp` where it seems to logically belong.	18 October 2018, 16:30:38 UTC
f9710d5	jsmall-nvidia	17 October 2018, 15:57:33 UTC	IncludeFileSystem -> DefaultFileSystem (#677) Improvements in 'singleton'ness of DefaultFileSystem Made WrapFileSystem a stand alone type - to remove 'odd' aspects of deriving from DefaultFileSystem (such as inheriting getSingleton method/fixing ref counting) Simplified CompileRequest::loadFile - becauce fileSystemExt is always available.	17 October 2018, 15:57:33 UTC
3e74d39	jsmall-nvidia	16 October 2018, 22:49:11 UTC	Feature/include refactor (#675) * Refactor of path handling. * Added PathInfo * Changed ISlangFileSystem - such that has separate concepts of reading a file, getting a relative path and getting a canonical path * Added support for getting a canonical path for windows/linux * Made maps/testing around canonicalPaths * User output remains around 'foundPath' - which is the same as before * Small improvements around PathInfo * Added a type and make constructors to make clear the different 'path' uses * Fixed bug in findViewRecursively * Checking and reporting for ignored #pragma once. * Removed SLANG_PATH_TYPE_NONE as doesn't serve any useful purpose. * Improve comments in slang.h aroung ISlangFileSystem * Remove the need for <windows.h> in slang-io.cpp * Ran premake5. * Improvements and fixes around PathInfo. * Fix typo on linix GetCanonical * Make the ISlangFileSystem the same as before, and ISlangFileSystem contain the new methods. Internally it always uses the ISlangFileSystemExt, and will wrap a ISlangFileSystem with WrapFileSystem, if it is determined (via queryInterface) that it doesn't implement the full interface.	16 October 2018, 22:49:11 UTC
204fb3c	Tim Foley	16 October 2018, 21:07:24 UTC	Don't emit extra parentheses when outputting HLSL/GLSL (#674) When emitting high-level expressions we obviously need to avoid emitting expressions that mean something different than the original code. So if we had IR like: ``` %t0 = add %a %b %t1 = mul %t0 c ``` and `t0` isn't used anywhere else in the code, we can emit HLSL like: ```hlsl float t1 = (a + b) * c; ``` but not: ```hlsl float t1 = a + b c; // oops! precedence changed the meaning! ``` The existing strategy in Slang has been to be overly conservative about when it emits parentheses to guarantee precedence is respected, so you might get something like: ```hlsl float t1 = ((a) + (b)) (c); ``` which not only looks silly, but also leads to unhelpful diagnostics from the clang-based dxc, about all those extra parentheses being redundant. This change follows the pattern of (and actually reuses some code from) the old AST emit logic from earlier in the compiler's life (before the IR was introduced). The basic idea is that when emitting an expression we track the precedence of the expressions to its left and right, and use those to decide if parentheses are needed, and then to compute equivalent precedences to apply for sub-expressions. Applied correcty, this approach lets us emit minimal "unparsed" expressions. Applied incorrectly it could lead to subtle errors where the code Slang outputs doesn't faithfully represent the input. Here's hoping we get this right.	16 October 2018, 21:07:24 UTC
5b306c9	jsmall-nvidia	15 October 2018, 17:46:58 UTC	Feature/source loc improvements (#673) * Fix comment to better explain usage. * For getting the type string use a temporary SourceManager.	15 October 2018, 17:46:58 UTC
a7a87ce	jsmall-nvidia	13 October 2018, 02:02:09 UTC	Feature/source loc review (#672) * Fixes/improvements based around review comments. * SourceUnit -> SourceView * * Removed the HumaneSourceLoc as it's POD-like ness seemed to make that unnecessary * Made exposed member variables in SourceManager protected - so make clear where/how can be accesed * Improved description about SourceLoc and associated structures * Changed SourceLocType to 'Actual' and 'Nominal'. * Improved a comment.	13 October 2018, 02:02:09 UTC
c9ad368	Tim Foley	12 October 2018, 19:23:14 UTC	Add a warning on missing return, and initial SCCP pass (#671) * Add a warning on missing return, and initial SCCP pass The user-visible feature added here is a diagnostic for functions with non-`void` return type where control flow might fall off the end. This sounds like a trivial diagnostic to add as part of the front-end AST checking, but that can run afoul of really basic stuff like: ```hlsl int thisFunctionisOkay(int a) { while(true) { if(a > 10) return a; a = a2 + 1; } // no return here! } ``` This function "obviously" doesn't need to have a `return` statement at the end there, but realizing this fact relies on the compiler to understand that the `while(true)` loop can't exit normally, and doesn't contain any `break` statement. One can write "obvious" examples that need more and more complex analysis to rule out. The answer Slang uses for stuff like this is to do the analysis at the IR level right after initial code generation (this would be before serialization, BTW, so that attached `IRHighLevelDeclDecoration`s can be used). When lowering the AST to the IR, we always emit a `missingReturn` instruction (a subtype of `IRUnreachable`) at the end of its body if it isn't already terminated. The IR analysis pass to detect missing `return` statements is then as simple as just walking through all the functions in the module and making sure they don't contain `missingReturn` instructions. For that simple pass to work, we first need to make some effort to remove dead blocks that control flow can never reach. This change adds a very basic initial implementation of Spare Conditional Constant Propagation (SCCP), which is a well-known SSA optimization that combines constant propagation over SSA form with dead code elimination over a CFG to achieve optimizations that are not possible with either optimization along. For the moment, we don't actually implement any constant folding* as part of the SCCP pass, so we can eliminate the dead block in a case like the function above (and those in the test case added in this change), but will not catch things like a `while(0 < 1)` loop. Handling more "obvious" cases like that is left for future work. * fixup: warning on unreachable code * Handle case where user of an inst isn't in same function/code The code as assuming any instruction in the SSA work list has to come from the function/code being processed, but this misses the case where an instruction in a generic has a use inside the function that the generic produces. This change adds code to guard against that case.	12 October 2018, 19:23:14 UTC
13c6b69	Tim Foley	11 October 2018, 18:17:40 UTC	Fix error when one constant is defined equal to another (#670) * Fix error when one constant is defined equal to another Fixes #666 When a user declares one constant (usually a `static const` variable) to be exactly equal to another by name: ```hlsl static const a = 999; static const b = a; ``` Then the IR-level representation of `b` is an `IRGlobalConstant` whose value expression is just a pointer to the definition of `a`. The logic in `emitIRGlobalConstantInitializer()` was trying to always call `emitIRInstExpr` to emit the value of the constant as an expression, but that function only handles complex/compound expressions and not the case of simple named values (e.g., constants like `a`). The intention is for code to call `emitIROperand()` instead, and let it decide whether to emit an expression or a named reference using its own decision-making. The `IRGlobalConstant` case really just wants to pass in the "mode" flag it uses to influence that decision-making, but shouldn't be working around it. This change just replaces the `emitIRInstExp()` call with `emitIROpernad()` and adds a test case to confirm that this fixes the reported problem. * Fixups for bugs in previous change The first problem was that certain instruction ops were being special-cased to opt out of "folding" into expressions before we make the universal check to always fold when inside an initializer for a global constant. The second problem is that the `emitIROperand()` logic was always putting expressions around sub-expressions, which breaks parsing when the sub-expression is an initializer list (`{...}`). This fixup is pretty much a hack, but will be something we can remove once we don't emit unncessary parentheses overall, which is a better fix.	11 October 2018, 18:17:40 UTC
dcd9f78	Tim Foley	11 October 2018, 16:20:10 UTC	Add basic support for [mutating] methods (#667) By default, when writing a "method" (aka "member function") in Slang, the `this` parameter is implicitly an `in` parameter. So this: ```hlsl struct Foo { int state; int getState() { return state; } void setState(int s) { state = s; } }; ``` is desugared into something like this: ```hlsl struct Foo { int state }; int Foo_getState(Foo this) { return this.state; } // BAD: void Foo_setState(Foo this, int s) { this.state = s; } ``` That "setter" doesn't really do what was intended. It modifies a local copy of type `Foo`, because `in` parameters in HLSL represent by-value copy-in semantics, and are mutable in the body the function. Slang was updated to give a static error on the original code to catch this kind of mistake (so that `this` parameters are unlike ordinary function parameters, and no longer mutable). Of course, sometimes users want a mutable `this` parameter. Rather than make a mutable `this` the default (there are arguments both for and against this), this change adds a new attribute `[mutating]` that can be put on a method (member function) to indicate that its `this` parameter should be an `in out` parameter: ```hlsl [mutating] void setState(int s) { state = s; } ``` The above will translate to, more or less: ```hlsl void Foo_setState(inout Foo this, int s) { this.state = s; } ``` One added detail is that `[mutating]` can also be used on interface requirements, with the same semantics. A `[mutating]` requirement can be satisfied with a `[mutating]` or non-`[mutating]` method, while a non-`[mutating]` requirement can't be satisfied with a `[mutating]` method (the call sites would not expect mutation to happen). The design of `[mutating]` here is heavily influenced by the equivalent `mutating` keyword in Swift. Notes on the implementation: * Adding the new attribute was straightforward using the existing support, but I had to change around where attributes get checked in the overall sequencing of static checks, because attributes were being checked after function bodies, but with this change I need to look at semantically-checked attributes to determine the mutability of `this` * The check to restrict it so that `[mutating]` methods cannot satisfy non-`[mutating]` requirements was easy to add, but it points out the fact that there is a huge TODO comment where the actual checking of method signatures is supposed to happen. That is a bug waiting to bite users and needs to be fixed! * While we had special-case logic to detect attempts to modify state accessed through an immutable `this` (e.g., `this.state = s`), that logic didn't trigger when the mutation happened through a function/operator call (e.g., `this.state += s`), so this change factors out the validation logic for that case and calls through to it from both the assignment and `out` argument cases. * The error message for the special-case check was updated to note that the user could apply `[mutating]` to their function declaration to get rid of the error. * The semantic checking logic for an explicit `this` expression was already walking up through the scopes (created during parsing) and looking for a scope that represents an outer type declaration that `this` might be referring to. We simply extend it to note when it passes through the scope for a function or similar declaration (`FunctionDeclBase`) and check for the `[mutating]` attribute. If the attribute is seen, it returns a mutable `this` expression, and otherwise leaves it immutable. * The IR lowering logic then needed to be updated so that when adding an IR-level parameter to represent `this`, it gives it the appropriate "direction" based on the attributes of the function declaration being lowered. The rest of the IR logic works as-is, because it will treat `this` just like an other parameter (whether it is `in` or `inout`). * This biggest chunk of work was the "implicit `this`" case, because ordinary name lookup may resolve an expression like `state` into `this.state`, so that the `this` expression comes out of "thin air." To handle this case, I extended the structure of the "breadcrumbs" that come along with a lookup result (the breadcrumbs are used for any case where a single identifier like `state` needs to be embellished to a more complex expression as a result of lookup), so that it can identify whether a `Breadcrumb::Kind::This` node comes from a `[mutating]` context or not. Similar to the logic for an explicit `this`, we handle this by noting when we pass through a `FunctionDeclBase` when moving up through scopes, and look for the `[mutating]` attribute on it. The rest of the work was just plumbing the additional state through.	11 October 2018, 16:20:10 UTC
879ec1b	jsmall-nvidia	10 October 2018, 17:56:25 UTC	Feature/source loc refactor (#668) * * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Initial look at at variable byte encoding, and simple unit test. * Fixing problems with comparison due to naming differences with slang/fxc. * * More tests and perf improvements for byte encoding. * Mechanism to detect processor and processor features in main slang header. * Split out cpu based defines into slang-cpu-defines.h so do not polute slang.h * Support for variable byte encoding on serialization. * Removed unused flag. * Fix warning. * Fix calcMsByte32 for 0 values without using intrinsic. * Fix a mistake in calculating maximum instruction size. * Introduced the idea of SourceUnit. * Small improvements around naming. Add more functionality - including getting the HumaneLoc. * Add support for #line default * Compiling with new SourceLoc handling. * Fix off by one on #line directives. * Can use 32bits for SourceLoc. Fix serialize to use that. * Small fixes and comment on usage. * Premake run. * Fix signed warning. * Fix typo on StringSlicePool::has found in review.	10 October 2018, 17:56:25 UTC
60a91d6	jsmall-nvidia	09 October 2018, 19:09:32 UTC	Added -serial-ir command line option (#664) * Added -serial-ir option, to make generateIR always serialize in and out before further processing. Testing out serialization, and adding a kind of 'firewall' between compiler front end and backend. * Reduce peak memory usage, by discarding IR when stored in serialized form. Typo fix.	09 October 2018, 19:09:32 UTC
7ea9ff0	jsmall-nvidia	09 October 2018, 15:51:41 UTC	Feature/var byte encoding (#665) * * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Initial look at at variable byte encoding, and simple unit test. * Fixing problems with comparison due to naming differences with slang/fxc. * * More tests and perf improvements for byte encoding. * Mechanism to detect processor and processor features in main slang header. * Split out cpu based defines into slang-cpu-defines.h so do not polute slang.h * Support for variable byte encoding on serialization. * Removed unused flag. * Fix warning. * Fix calcMsByte32 for 0 values without using intrinsic. * Fix a mistake in calculating maximum instruction size.	09 October 2018, 15:51:41 UTC
4cb2a19	Tim Foley	04 October 2018, 23:05:23 UTC	Support cross-compilation of ray tracing shaders to Vulkan (#663) * Move to newer glslang * Support cross-compilation of ray tracing shaders to Vulkan This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay"). * The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized. * "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled. * The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed. The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations. In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial. The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses. The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL. GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable. That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function. The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types). Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic. For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a definition marked with the `__specialized_for_target(glsl)` modifier. The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin). Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output. There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works. The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global. We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable. We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below. We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types). This ends up removing some cases in the AST and IR type representations. By itself this change would break HLSL emit, because in that case the types really are intrinsic. We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types. The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters. This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline. In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters. We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output). The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now). To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout. A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection. The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader. While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case. Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders. The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation. This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient. Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`. The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage. All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline. The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process. This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports both DXR and VKRay. A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way: The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while. It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in all cases (including for HLSL output) rather than only on GLSL targets. The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic). The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions. A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic). The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization and information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code. This needs more work in a follow-on fix, but for now target specialization should again be working. The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding. The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels. This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s. The tricky bit to making that all work is that the same AST-level generic type parameter will then map to different IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function. The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts." This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file. Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals. The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its own `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the functions `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`. An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers. These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects). The lowering strategy here translates any `static` variable with an initializer into two globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet. There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes. We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level. Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point. The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage. Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass. * In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures. * A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code. * Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h` * Added a `$z` suffix to the GLSL translation of `Texture.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now. * The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout. * In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete. * Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important. * Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc. * Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460. * Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing. * Fixup: typos	04 October 2018, 23:05:23 UTC
84a4847	jsmall-nvidia	04 October 2018, 14:25:36 UTC	Remove need of IRHightLevelDecoration in emit (#660) * * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Fixing problems with comparison due to naming differences with slang/fxc.	04 October 2018, 14:25:36 UTC

Newer
Older