https://github.com/shader-slang/slang

sort by:
Revision Author Date Message Commit Date
b8e1f62 Fix a subtle bug introduced into type legalization (#1632) The refactor of type legalization in PR #1594 introduced a subtle problem where an IR instruction might be removed from the hierachy (perhaps because its parent was removed during legalization) but would still be on the work list. Legalization of such instructions is wasteful (since it would never impact the output), but it also creates a problem if we try to insert new legalized instructions next to such a removed instruction. The logic for inserting an instruction before/after another asserts that the sibling instruction must have a parent, and leads to a failure in debug builds and a potential crash in release builds. This change adds a bit of defensive code to skip any instructions that appear to have been removed from the hierarchy (because they have no parent and are not the root/module instruction). An alternative approach would be to try to detect these instructions at the point where they would be added to the work list, but this approach seems simpler and more general. 08 December 2020, 02:18:31 UTC
0404fef Fix mistake where some public API functions weren't extern "C" (#1631) This was a serious problem that meant that some of our public API functions (exported from the DLL) had mangled C++ symbol names (which are not guaranteed to be portable across different compilers). The fix here is the expedient one: just add `extern "C"` to the relevant functions. The simplicity has a cost, though, in that this change introduces a significant break in binary compatibility. Source code should be compatible before/after this change, but it would be necessary to match an application and Slang shared library so that they agree on the mangled names of these symbols. We will need to treat this as a breaking release when this change goes out. Note that because of the way that C++ mangled names were being used, we *already* have introduced breaking changes to the binary interface in a recent change that made `SlangCompileRequest` into a `typedef` instead of a forward-declared `struct` type. To be clear, the breakage was due to the missing `extern "C"` and *not* the content of that change; the change would have been binary-compatible if the symbol names were correct. Given that the cat is out of the bag to some extent (our next release is going to break compatibility one way or another), it seems best to just get the whole thing sorted at once. 07 December 2020, 22:17:17 UTC
d7ce74a "Shader Toy" example and related fixes (#1629) * "Shader Toy" example and related fixes This change introduces a new `shader-toy` example program that is primarily designed to show how Slang's features for type-based encapsulation and modularity can be applied to modularity for effects along the lines of those from `shadertoy.com`. The Example ----------- The example is being checked in with an example "toy" effect that I hastily put together, so that it would not be encumbered with any IP concerns. I wrote the effect using the shadertoy.com editor, so I can be sure it is valid GLSL. During bringup of the application I used a pre-existing and larger effect for testing, so some of the support code that was added is not being used at present. The big-picture idea here is to have an exmaple that shows how to modularize things using Slang interfaces and generics, and then to use the Slang compiler API to manage the compilation, composition, specialization, and linking steps. For better or worse this leads to the sequence of API calls involved being much longer than what was in something like the `hello-world` example. Future Work (Example) --------------------- There is a lot of room for improvement and expansion here, so this should be viewed as a checkpoint of work in progress rather than something I'm claiming as a finalized demonstration of all we'd like to achieve. Areas for future work include: * We need to copy the integration of "Dear, IMGUI" that was already done for the `model-viewer` example so that this example can have a UI. * Now that the compilation flow is broken into all these additional steps, it should be possible to have the application load multiple effects as distinct modules, and then provide a UI for switching between them. The chosen effect module would be used to specialize the top-level shader(s) before kernel generation. * The checked-in logic includes a compute shader that can execute an effect, but that hasn't been tested nor has it been wired up to any kind of UI. We should have a way to switch between multiple execution methods, with a goal of eventually including CPU execution. * The "GLSL compatibility" code needs a lot of improvements before it is likely to be usable for a nontrivial number of shaders. Some of that work is waiting on Slang compiler fixes, though. * We should consider allowing the individual "toy" effects to define their own uniform parameters and expose those via a UI and reflection. The catch in this case is not that this would be difficult to do, but that it would be a semantic change to how shader toy effects currently work. The Compiler Fixes ------------------ Doing this work exposed a few bugs in Slang, and this change includes fixes for the ones that were quick to address. We already had logic in `slang-check-shader.cpp` that was validating the entry points in a compile request - either by checking the explicitly-listed entry points, or by scanning for `[shader("...")]` attributes. The problem is that the routine that did that checking was not being invoked on all compiles. The logic that handled entry points was only being run for manual compiles using `SlangCompileRequest`, while anything using `import` or `loadModule` would ignore entry points. I refactored the relevant code into a subroutine that will be invoked in all compilation scenarios. There were already `TODO` comments in `SpecializedComponentType` which made the point about how a specialized entry point like `myShader<YourType>` would need to properly show that it has dependencies on both the module that defines `myShader` *and* the module that defines `YourType`, while only the former was being handled at present. I went ahead and implemented the logic to scan the generic arguments for a specialized compoment type in order to determine what module(s) the arguments depend on (both type arguments and witness tables). With that change, using `IComponentType::link` on a specialized component will properly pull in the module(s) that the generic arguments come from. In `slang-ir-legalize-types.cpp` we could run into assertion failures in debug builds because of code trying to legalize layout `IRAttr`s for fields or parameters with types that need legalization. In practice it is safe to skip these layout attributes, because legalization of the fields/parameters they pertain to would result in creation of entirely new layout attributes, and the old ones would then be unreferenced. Future Work (Fixes) ------------------- There are other compiler bugs that this work exposed, but which this change does not address. These will need to be resolved as part of subsequent changes: * Slang allows for default-initialization of variables of a generic type. That is, given `<T : ISomething>` a user is allowed to declare `T x = {};` and the Slang front-end does not complain. Instead, this leads to an internal compiler error during IR lowering. * The Slang `__init()` feature probably needs to be upgraded to a properly supported feature, and we probably need a way to make implementing default-initialization an easy thing (e.g., any `struct` type that has initial-value expressions for all its fields should automatically and implicitly satsify an `init();` requirement declared in an interface) * Iniside an `__init()` definition, code has mutable access to members of the enclosing type, but for some reason the front-end is incorrectly treating `this` as immutable in those contexts. As a result you can write to `someField` but not `this.someField`. * User-defined operator overloads flat out don't work (which isn't surprising given that no clients have decided to use them yet, and we have no test coverage for them). This is actually due to the shadowing rules being used for lookup right now, so a fix for this issue is going to have far-reaching consequences around what overloads are visible where (and anything that impacts overload resolution is a big can of worms, including around performance). * fixup: test case had missing main function 07 December 2020, 17:29:37 UTC
e98c32f add windows release script (#1627) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 04 December 2020, 18:44:03 UTC
47ed0f6 Projects in 'build' and Slang API separation (#1624) * #include an absolute path didn't work - because paths were taken to always be relative. * Move reflection to reflection-api. * Slight reorg to pull out potentially Slang internal functions from the reflection API impls. * Remove visual studio projects * Fix for slang-binaries copy. * Add the visual studio projects in build/visual-studio * Remove miniz project. * Differentiate the linePath from the filePath. * Improve comment in premake5.lua + to kick of CI. * Kick CI. 04 December 2020, 18:03:29 UTC
277780a Add github action to verify vs project file consistency. (#1625) * Add github action to verify vs project file consistency. * fix solution files * fix project files 03 December 2020, 22:48:42 UTC
a827798 Added miniz Visual Studio Project (#1623) 03 December 2020, 17:55:29 UTC
44c0a56 Add shader object parameter binding to renderer_test. (#1622) * Add shader object parameter binding to renderer_test. * remove multiple-definitions.hlsl * Fix cuda implementation. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 03 December 2020, 16:23:05 UTC
ad5dda9 Fix [mutating] generic methods (#1618) Slang generates code that turns the implicit `this` parameter of a method into an explicit parameter. The logic that decides whether that parameter should be `inout` is a bit involved, and there was a bug where a generic method would lead to the use of an `in` modifier (the default) and override the `inout` modifier that was requested by the method itself. This change fixes the logic to treat generic declarations in the parent chain of a leaf method as having no bearing on whether an implicit `this` parameter should be `inout` or not. A test case is included that breaks with the old behavior, and demonstrates that a generic `[mutating]` method can now work correctly. 02 December 2020, 22:12:51 UTC
ae222bf Zip FileSystem support (#1617) * #include an absolute path didn't work - because paths were taken to always be relative. * Add miniz * Fix for separator in CacheFileSystem. Add compression unit test for zip. * Put zip compression into core. * Remove delimiter stripping if simplifying a path - as stripping will fix delimiters. * ZipFileSystem WIP. * More ZipFileSystem working. * Added isEmpty. Fixed small bug is contains. * First pass support for mutability on zip. * Improvements to File::read/writeAllBytes * Can access and save archive - but has memory leaks. * Fix memory leak. * Some ZIP compression tests. * Fix memory leak on ScopedAllocation. Fix off by one bug on UIntSet * Bug fix in UIntSet * Fix remaining ZipFileSystem issues. Adde stand alone unit-test. * Turn tabs to spaces in slang-io.h * Renamed mode ReadWrite (instead of just Write) * Make miniz it's own project. * Fix windows warning on win32. * Remove warnings needed when miniz was included as a header library. * Set the C++ standard via 'flags' in premake. * Add support for 'implicit' paths. * Add testing for implicit directories. Better handling of implicit directories. * Improve comments in ZipFileSystem. * Update comment around reader/writer transformation. 02 December 2020, 16:29:38 UTC
e631a25 Fix for TC problem with Path unit test (#1621) * #include an absolute path didn't work - because paths were taken to always be relative. * Hopefully fix for TC issue where canonical path causes problems - perhaps because on test machines visibility of paths outside the build environment is limited. 02 December 2020, 13:45:08 UTC
200e236 Make SlangCompileRequest COM type (#1620) * #include an absolute path didn't work - because paths were taken to always be relative. * WIP for COM CompileRequest. * Add more methods to IGlobalSession. * Fix createCompileRequest. Made slangc tool use COM style methods. * m_ prefix variables in EndToEndCompileRequest 01 December 2020, 22:29:01 UTC
339422d Enable all dynamic-dispatch tests on D3D/VK. (#1615) 30 November 2020, 16:59:34 UTC
de8dbdd Re-enable `interface-shader-param` tests. (#1614) 30 November 2020, 16:59:02 UTC
c0fab43 Bug fixes: Memory leak/off by one on UIntSet (#1616) * #include an absolute path didn't work - because paths were taken to always be relative. * Fix memory leak on ScopedAllocation. Fix off by one bug on UIntSet 20 November 2020, 22:24:35 UTC
ee5842a Make witness and RTTI handles lower to `uint2`. (#1613) * Make witness and RTTI handles lower to `uint2`. And enable some dynamic dispatch tests on D3D/VK. * Bug fixes. 20 November 2020, 16:34:14 UTC
4459d44 Unify handling of static and dynamic dispatch for interfaces (#1612) Overview ======== Prior to this change, we had two different code generation strategies for interface/existential types in Slang, that didn't always play nicely together: * The "legacy" static specialization approach could handle plugging in an arbitrary concrete type for an existential type parameter (including types with resources, etc.), but wouldn't work well with things like a `StructuredBuffer<>` of an interface type, and requires somewhat counter-intuitive layout rules to make work. * The new dynamic dispatch approach produces simpler, more easily understood layouts by assuming that values of interface type can fit into a fixed number of bytes. The tradeoff there is that it cannot handle types that include resources (only POD types). The goal of this change is to make it so that the two strategies can co-exist. In particular, in cases where a shader is amenable to both static specialization and dynamic dispatch, the type layouts should agree. In order to make the type layouts agree, we: * Declare that *all* values of existential type reserve storage according to the dynamic-dispatch rules (so 16 bytes for the RTTI and witness-table information, plus whatever bytes are needed to story "any value" of a conforming type). * Then we modify the "legacy" layout rules so that if a value of concrete type can fit in the reserved "any value" space for a given interface, then it is laid out there exactly like the dynamic dispatch rules would do. Otherwise, we fall back to the previous legacy rules (since we don't need to agree with the dynamic-dispatch layout on types that can't be used with dynamic dispatch). Details ======= * Renamed `ExistentialBox` to `BoundInterfaceType` to better clarify how it relates to `BindExistentialsType` * Unconditionally apply the `lowerGenerics` pass during emit, since it is now responsible for aspects of the lowering of existential types when specialization is used. * Made IR type layout take the target into account, so that the layout of resource types can vary by target (e.g., being POD on some targets, and invalid on others) * Cleaned up some issues around using global shader parameters as the "key" for their layout information in the global-scope layout (only comes up when there are global-scope `uniform` parameters) * Made there be a default any-value size (16) instead of making it be an error to leave out. This was the simplest option; we could try to go back to having an error, but we'd need to only issue it if we are sure a type/interface is being used with dynamic dispatch, since static dispatch doesn't have to obey the restrictions. * Changed lowering of existential types to tuples so that bound interfaces where the concrete type won't fit use a "pseudo-pointer" instead of an "any-value" to hold the payload * Changed IR type legalization to handle the "pseudo-pointer" case and apply layout information from an interface type over to the payload part when static specialization was used. * Changed some details of how witness tables were being lowered, so that we didn't have to create "proxy" witness tables for the constraints on associated types (just use the actual requirement entries we generate) * Changed witness tables so that they know the subtype doing the conforming * Added logic so that we don't generate pack/unpack logic and witness table wrapper functions for types that are incompatible with any-value/dynamic dispatch for a given interface. * Changed the core AST-level type layout logic to use the dynamic-dispatch layout in case things fit, and the legacy static specialization case when things don't (while also reserving space for the dynamic-dispatch fields) * Changed a bunch of test cases for static specialization to properly use the new layout (which introduces new buffers in some cases, and moves data around in others). Future Work =========== The experience of trying to reconcile our older way of handling interface-type specialization with our newer model (that supports dynamic dispatch) makes it clear that we really need to make similar changes to our handling of generic type parameters on entry points and at the global scope. A future change should make it so that a global type parameter is lowered with a type layout similar to a value parameter of interface type, including the RTTI and witness-table pieces, and just leaving out the "any value" piece. A similar translation strategy should apply to entry-point generic parameters (mirroring how we lower generic functions for dynamic dispatch already), and value specialization parameters. Co-authored-by: Yong He <yonghe@outlook.com> 19 November 2020, 09:26:43 UTC
b594510 File system refactor (#1611) * #include an absolute path didn't work - because paths were taken to always be relative. * WIP FileSystem refactor. * Made loadFile load the file in binary mode. * Fixed some comments. Fixed typo in RelativePath - not used 'fixedPath'. 19 November 2020, 09:14:48 UTC
ac41b99 Fix constant folding in attributes (#1610) * Fix constant folding in attributes * remove unnecessary change * remove unnecessary change * remove unnecessary change * Fixed circular checking issue. * cleanup * more cleanup * minimize diff * minimize diff * minimize diff 19 November 2020, 09:14:34 UTC
e140c49 Test for serializing out and reading back Stdlib (#1605) * #include an absolute path didn't work - because paths were taken to always be relative. * Mangling/module name extraction for GenericDecl * Add comment on SerialFilter to explain re-enabling Stmt. * Support setting up SyntaxDecl when reconstructed after deserialization. * Improvements to setup SyntaxDecl. * Fix typo so can read compressed SourceLocs. * Fix issue with SourceManger. * Simple test for serializing out stdlib and reading back in. * Fix calling convention. * Add override to StdLib impls. * Fix typo. * Apply testing to an actual compute test when using load-stdlib Make -load/compile-stdlib processable by Slang Move out testing into util into TestToolUtil so can be shared. * Slightly more concise setup of session. * Fix some errors introduced with session handling. * Made setup for compile same across slangc and slangc-tool. 18 November 2020, 23:12:43 UTC
d898d56 Serialized stdlib working (#1603) * #include an absolute path didn't work - because paths were taken to always be relative. * Mangling/module name extraction for GenericDecl * Add comment on SerialFilter to explain re-enabling Stmt. * Support setting up SyntaxDecl when reconstructed after deserialization. * Improvements to setup SyntaxDecl. * Fix typo so can read compressed SourceLocs. * Fix issue with SourceManger. 18 November 2020, 19:52:58 UTC
bdc589b Switch CI to github actions. (#1609) * Remove travis config files * change github build script * skip non-tag build on appveyor 17 November 2020, 20:03:45 UTC
7dd0ff9 Integrate github actions for linux deployment. (#1607) 17 November 2020, 16:57:13 UTC
39709fb Integrate github actions for build+test on Windows. (#1606) 17 November 2020, 16:56:33 UTC
f4dbe7d Fix premake5.lua for profile (#1604) * #include an absolute path didn't work - because paths were taken to always be relative. * The Profile project wasn't including the generated prelude. 17 November 2020, 14:49:03 UTC
46e1901 Fix VS2017 Warnings (#1602) * Fix VS2017 Warnings * Update slang-visitor.h Co-authored-by: jsmall-nvidia <jsmall@nvidia.com> 16 November 2020, 19:18:55 UTC
2c893d3 Integrate github action for linux build+test. (#1601) 11 November 2020, 20:33:32 UTC
8f0895e Include hierarchy output (#1595) * #include an absolute path didn't work - because paths were taken to always be relative. * Improve diagnostic for token pasting. * Token paste location test. * Output include hierarchy. * WIP on includes hierarchy. * Improved include hierarchy output - to handle source files without tokens. Improved test case. * Small comment improvements. Fixed a typo with not returning a reference. * Slight simplification of the ViewInitiatingHierarchy, by adding GetOrAddValue to Dictionary. * Remove the need for ViewInitiatingHierarchy type. * Improve output of path in diagnostic for includes hierarchy. * Remove comment in diagnostic for token-paste-location.slang * Update command line docs to include `-output-includes` Co-authored-by: Yong He <yonghe@outlook.com> 11 November 2020, 14:56:50 UTC
7bcc2b1 Use integer RTTI/witness handles in existential tuples. (#1598) * Use integer RTTI/witness handles in existential tuples. * Fix clang error. * Fix IR serialization to use 16bits for opcode. * Undo accidental comment change. * Use variable length encoding for opcode. * Fix compile error. * Fixing issues * Fix code review issues. 10 November 2020, 22:55:36 UTC
1c4d768 Fix IR serialization to use variable length encoding for opcode. (#1599) * Fix IR serialization to use 16bits for opcode. * Undo accidental comment change. * Use variable length encoding for opcode. * Fixing issues 10 November 2020, 21:07:42 UTC
c1e0a9d Fix comments. "white-list" -> "allow-list". (#1597) 06 November 2020, 18:31:15 UTC
444ff4d Specialize witness table lookups. (#1596) * Specialize witness table lookups. * Remove generated files from vcxproj * Fix call to generic interface methods. 06 November 2020, 18:26:27 UTC
94861d5 Set theme jekyll-theme-tactile 06 November 2020, 17:12:14 UTC
0f6765b Refactor the flow of type legalization (#1594) The existing type legalization logic worked as a single preorder pass over the IR tree. This could create problems in cases where an instruction might be processed before one of its operands (e.g., a function that references a global shader parameter is processed before that parameter). This change makes it so that type legalization uses a work list, and only adds instructions to the work list once their parent, type, and operands have been processed. As a result, we should be able to guarantee that an instruction will only be processed once all of its operands have been. One wrinkle here is that in the current IR it is possible to end up with a cycle of uses for global-scope instructions, specifically around interface types and their list of requirements. This change includes a short-term kludge to break those cycles and allow the pass to complete. As it stands, this is simply a refactoring pass and no new functionality is introduced. The changes are necessary to unblock work in a feature branch that depends on type legalization being more robust against IR that might use an unexpected ordering. 06 November 2020, 00:11:56 UTC
c985f5f Standard library save/loadable (#1592) * #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project. * Remove ImportExternalDecl type Added addImportSymbol and ImportSymbol type Fixed bug in container which meant it wouldn't read back AST module * Because of change of how imports and handled, store objects as SerialPointers. * First pass symbol lookup from mangled names. * Cache current module looked up from mangled name. * Fix SourceLoc bug. Improve comments. * Added diagnostic on mangled symbol not being found * Fix typo. * WIP serializing stdlib. * WIP serializing stdlib in. * Fix problem serializing arrays that hold data that is already serialized. * Remove clash of names in MagicTypeModifier. * Make conversion from char to String explicit. Fix reference count issue with SerialReader. * Add code to save/load stdlib. * Use return code to avoid warning - SerialContainerUtil::write(module, options, &stream)) * Make all String numeric ctors explicit. Added isChar to UnownedStringSlice. Added operator== for UnownedStringSlice to String to avoid need to convert to String and allocate. * Add error check to readAllText. * tabs -> spaces on String.h * tab -> spaces String.cpp * Remove msg for StringBuilder, just build inplace for exceptions. * Check SerialClasses - for name clashes. Renamed Modifier::name as Modifier::keywordName * Handling of extensions when deserializing AST - updating the moduleDecl->mapTypeToCandidateExtensions Co-authored-by: Tim Foley <tim.foley.is@gmail.com> 05 November 2020, 18:43:00 UTC
8d4c0ea Improve insertion location for "hoistable" instructions (#1593) The Slang IR builder has a notion of "hoistable" instructions, which are basically those instructions that represent a pure side-effect-free operation on their operands, and which can and should be deduplicated. Most types are "hoistable" instructions. In order to make deduplication of hoistable instructions work, we need to emit them at the right location. Consider if we had: ```hlsl void myFunc<T>(...) { if(someCondition) { vector<T, 4> a = ...; ... } else { vector<T, 4> b = ...; } } ``` The IR instruction that represents `vector<T,4>` can't be inserted at the global scope, because then the parameter `T` would not be visible to it. That instruction also shouldn't be inserted into the same block that declares `a`, because then the instruction itself wouldn't be visible at the point where `b` is declared. The IR builder already has logic to pick the right parent instruction. In the example given, the IR instruction for `vector<T,4>` should be inserted into the body of the IR generic, but outside of the IR function that represents `myFunc`. The problem this change fixes is that while the logic was picking the *parent* for a hoistable instruction correctly, it wasn't putting much care into pick the insertion *location*. The existing strategy amounted to: * If the IR builder was set with an insertion location inside the chosen parent, then use that insertion location * Otherwise, insert at the end of the chosen parent Neither of those options is perfect. Either could lead to an instruction being inserted after one of its uses, and the second option could even lead to a type being inserted *after* the `return` instruction in a function/generic, which violates another structural invariant of our IR (that every block must end with a terminator, and terminators must only appear at the end of blocks). This change updates the rules as follows: * If the type of the instruction being created, or any of its operands are in the chosen parent, then insert immediately after whichever of those instructions is last in that parent. * Otherwise, insert before the first non-decoration, non-parameter child of the chosen parent The combined effect of these two rules is now that we insert any hoistable instruction as early as we can in its parent, without violating the structural validity rules. (One small exception to these rules is that if the parent is the module then we don't worry about ordering and just insert at the end, since order-of-declaration isn't significant at module scope in our IR) All of our existing tests work with this new behavior, although there could conceivably be future cases that lead to complicated breakage. For example, if a pass looks at the first "ordinary" instruction in a block and saves it to use as an insertion point for parameter, and then proceeds to manipulate code in the block before going back and inserting parameters at the chosen location, there is a chance that a hoistable instruction might have been inserted before the chosen insertion point, leading to a parameter being inserted after an ordinary instruction. In general, though, code that works like that would already be playing a dangerous game in that it is manipulating instructions in a block while assuming the first instruction will remain fixed. This change is currently just a refactor, but the underlying issue surfaced as a bug when I made other changes in a feature branch. 05 November 2020, 02:40:57 UTC
0600716 Generate `switch` based dynamic dispatch logic. (#1591) Co-authored-by: Tim Foley <tim.foley.is@gmail.com> 29 October 2020, 17:21:07 UTC
494e09a Handling imported/exporting symbols from serialized modules (#1589) * #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project. * Remove ImportExternalDecl type Added addImportSymbol and ImportSymbol type Fixed bug in container which meant it wouldn't read back AST module * Because of change of how imports and handled, store objects as SerialPointers. * First pass symbol lookup from mangled names. * Cache current module looked up from mangled name. * Fix SourceLoc bug. Improve comments. * Added diagnostic on mangled symbol not being found * Fix typo. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 29 October 2020, 15:45:56 UTC
1d7a7f2 Add sequential ID cache in Linkage for witness tables and RTTI objects. (#1590) 28 October 2020, 16:38:56 UTC
13945a5 Value type serialization via C++ Extractor (#1588) * #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project. 26 October 2020, 21:10:24 UTC
e702b70 Serialization design doc first pass (#1587) * #include an absolute path didn't work - because paths were taken to always be relative. * WIP on serialization design doc. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. 23 October 2020, 20:39:18 UTC
051b20c C++ extractor fix for access modifiers (#1586) * #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * Remove project references to previously generated files. 23 October 2020, 19:07:10 UTC
6d1fe29 Generate `if` based dispatch logic on GPU targets. (#1585) 23 October 2020, 06:44:11 UTC
10e1bae Single pass C++ extraction (#1583) * #include an absolute path didn't work - because paths were taken to always be relative. * Added CharUtil. Added TypeSet to extractor. First pass at being able to specify all headers for multiple output headers. * Fix includes for new C++ extractor convension. Update premake5 to use new extractor mechanisms. * Small improvements around StringUtil. * Split out NameConventionUtil. * Use a 'convert' to convert between convention types. * Fix output of build message for C++ extractor. Improve NameConventionUtil interface. * Improve comments. * Fix warning on gcc. * Fix clang warning. * Fix some typos in NameConventionUtil. * Small fix to premake5.lua * Fix generated includes. * Remove m_reflectType as no longer applicable with TypeSet. * Fix .gitignore for slang-generated-* files. Added getConvention to determine convention from slice. Add versions of split and convert that infer the from convention * Fix typo in spliting camel. * LineWhitespace -> HorizontalWhitespace * Improve CharUtil comments. 22 October 2020, 12:46:12 UTC
c094366 Bottleneck interface dispatch calls through a single function. (#1584) 21 October 2020, 02:07:14 UTC
624809a Small improvement in AST serialization (#1582) * #include an absolute path didn't work - because paths were taken to always be relative. * Make AST serialization types, marker include _AST_. Ie SLANG_CLASS -> SLANG_AST_CLASS and SLANG_ABSTRACT_CLASS -> SLANG_ABSTRACT_AST_CLASS 20 October 2020, 13:44:48 UTC
9b25d4a Fix saving Repro files on Linux (#1581) * #include an absolute path didn't work - because paths were taken to always be relative. * Ascii mode not always set on FileStream. Remove this-> if not needed. Simplify setting of m_fileAccess. * Fix typo. * Fix typo. * Clear up default FileAccess calculation. * Convert tabs to spaces. * Small naming improvements in FileStream::seek. 19 October 2020, 18:35:20 UTC
acf94e7 Hotfix: Crash due to ContainerDecl->members being altered whislt iterated over (#1580) * #include an absolute path didn't work - because paths were taken to always be relative. * Access the members iteration in _ensureAllDeclsRec via indices to avoid a change in the array invalidating the list. * Fix another iterator of members in SemanticVisitor * Slight improvements to comments - main purpose is to kick a new build. 19 October 2020, 16:05:18 UTC
d3e255b Fix a bug in IR lowering (#1578) The basic problem here is that when a function has multiple declarations with matching signatures (e.g., a forward declaration and then a later definition with a body), the IR lowering logic would lower all declarations whenever the first one was encountered, but then would only register an IR value as the lowered version of the first declaration. Other matching declarations would then run the risk of being lowered again, and in the case where they included features like loops with break/continue labels, that would create the risk of keys getting inserted into certain dictionaries more than one, leading to exceptions. This change ensures that when lowering a function that has multiple matching declarations to IR, we register an IR value for all of those declarations and not just the first. I have added a test case that leads to a crash without this change, to ensure that we don't introduce a regression down the line. 15 October 2020, 20:13:49 UTC
4149bf2 Fix Vk leak (#1579) * #include an absolute path didn't work - because paths were taken to always be relative. * Handle scope of VkShaderModule. * Fix tabbing issue. 15 October 2020, 15:32:34 UTC
4375ceb Add reflection API access to global params type layout (#1577) This change adds a single new entry point to our reflection API that allows an application to query the `TypeLayout` that represents the global-scope shader parameters. This can be used by the application in order to detect when the global parameters have required allocation of a default constant buffer, or simply to unify the handling of the global scope with handling of other kinds of parameters. 14 October 2020, 20:55:45 UTC
9cb3174 Repro test that loads repro (#1576) * #include an absolute path didn't work - because paths were taken to always be relative. * Slang repro test that reloads and runs compiled code. 13 October 2020, 20:30:25 UTC
fab1c9f Support CUDA bindless texture in dynamic dispatch code. (#1575) 09 October 2020, 18:29:11 UTC
11f3317 Make RTTI objects __constant__ in CUDA (#1573) Co-authored-by: Yong He <yhe@nvidia.com> 09 October 2020, 16:55:32 UTC
66ab6f4 NVAPI support doc (#1574) * #include an absolute path didn't work - because paths were taken to always be relative. * Split out NVAPI documentation. Attempt to describe updated usage. * Discuss downstream compiler include paths issues. * Fix links . * Apparently github supports relative links... * Fix typo. 09 October 2020, 13:54:16 UTC
55cd421 Fix C++ emit for `bit_cast` inst. (#1570) Co-authored-by: Yong He <yhe@nvidia.com> 07 October 2020, 07:30:10 UTC
4ad2e52 Use Reflection for (Serial)RefObject Serialization (#1567) * First pass at generalizing serializer. * Split out ReflectClassInfo * Use the general ReflectClassInfo * Fix some typos in debug generalized serialization. * Add calculation of classIds. Make distinct addCopy/add on SerialClasses. * Write up of more generalized serialization * WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types. * Improvements to SerialExtraObjects. Keep RefObjects in scope in factory * Compiles with Serial refactor - doesn't quite work yet. * First pass serialization appears to work with refector. * Split out type info for general slang types. * Split out slang-serialize-misc-type-info.h * DebugSerialData -> SerialSourecLocData DebugSerialReader -> SerialSourceLocReader DebugSerialWriter -> SerialSourceLocWriter * Remove unused template that only compiles on VS. * Fix warning around unused function on non-VS. * Improve output of type names that are in scopes in C++ extractor. Update premake5.lua to run generation for RefObject derived types. * C++ extractor working on RefObject type. * Split out serialization functionality that spans different types into slang-serialization-factory.cpp/.h Put AST type info into header. Removed RefObjectSerialSubType - use RefObjectType Add filtering for RefObject derived types Remove construction and filteringhacks. * Set up field serialization for SerialRefObject derived types. * Fix template problem compiling on Clang/Gcc * Work in progress to make Value types work. * Added slang-value-reflect.cpp 06 October 2020, 21:07:22 UTC
8a70e20 InterlockedExchangeU64 support on RWByteAddressBuffer (#1572) * #include an absolute path didn't work - because paths were taken to always be relative. * Added [__requiresNVAPI] to functions that need nvapi support. * Added support for InterlockedExchangeU64 Added exchange-int64-byte-address-buffer test Fixed typo in cas-int64-byte-address-buffer test * Improve comment around NVAPI usage in hlsl.meta.slang 06 October 2020, 17:30:55 UTC
b6ad8df Added [__requiresNVAPI] to functions missing it (#1571) * #include an absolute path didn't work - because paths were taken to always be relative. * Added [__requiresNVAPI] to functions that need nvapi support. 06 October 2020, 13:47:12 UTC
41d8610 Small fixes for CUDA code emit (#1564) * Small fixes for CUDA code emit * Add a CUDA translation to `GroupMemoryBarrierWithWaveSync()`. We map this to `__syncwarp()` for CUDA (with no mask, implying a full-warp sync). * Consistently use `SLANG_PRELUDE_ASSERT` for assertions introduced in code emit (rather than just using the bare `assert(...)` function, which is not included by our CUDA prelude by default) * Add a new `SLANG_CUDA_STRUCTURED_BUFFER_NO_COUNT` flag to the CUDA prelude that allows the `count` field to be omitted from `(RW)StructuredBuffer<T>`. This is a bit of a hacky because the computed layouts will still assume the `count` field is present, but this feature is required by at least one client application for now. A better long-term fix will take more time to design and implement. * fixup: CUDA prelude code fix for pedantic compilers Co-authored-by: Tim Foley <tim.foley.is@gmail.com> Co-authored-by: Yong He <yonghe@outlook.com> 05 October 2020, 18:10:53 UTC
d930c65 Update the type of a call inst during specialization. (#1569) 05 October 2020, 17:30:27 UTC
3321df7 Handle partial existential parameter type specialization. (#1568) * Specialize exsitentials parameters in struct fields. * Cleanup. * Handle partial existential parameter type specialization. Co-authored-by: Yong He <yhe@nvidia.com> 04 October 2020, 08:40:58 UTC
24ecd1f Use new vulkan debug layer. (#1566) * Use new vulkan debug layer. * Try use VK_LAYER_KHRONOS_validation when it exists. Co-authored-by: Tim Foley <tim.foley.is@gmail.com> 02 October 2020, 19:53:29 UTC
aadf600 Specialize exsitentials parameters in struct fields. (#1565) * Specialize exsitentials parameters in struct fields. * Cleanup. Co-authored-by: Yong He <yhe@nvidia.com> 02 October 2020, 16:49:18 UTC
274c20a Generalizing Serialization (#1563) * First pass at generalizing serializer. * Split out ReflectClassInfo * Use the general ReflectClassInfo * Fix some typos in debug generalized serialization. * Add calculation of classIds. Make distinct addCopy/add on SerialClasses. * Write up of more generalized serialization * WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types. * Improvements to SerialExtraObjects. Keep RefObjects in scope in factory * Compiles with Serial refactor - doesn't quite work yet. * First pass serialization appears to work with refector. * Split out type info for general slang types. * Split out slang-serialize-misc-type-info.h * DebugSerialData -> SerialSourecLocData DebugSerialReader -> SerialSourceLocReader DebugSerialWriter -> SerialSourceLocWriter * Remove unused template that only compiles on VS. * Fix warning around unused function on non-VS. 30 September 2020, 17:28:56 UTC
94d3f2b Add API for whole program compilation. (#1562) * Add API for whole program compilation. This change exposes a new target flag: `SLANG_TARGET_FLAG_GENERATE_WHOLE_PROGRAM` that can be set on a target with `spSetTargetFlags`. When this flag is set, `spCompile` function generates target code for the entire input module instead of just the specified entrypoints. The resulting code will include all the entrypoints defined in the input source. The resulting whole program code can be retrieved with two new functions: `spGetTargetCodeBlob` and `spGetTargetHostCallable`. This change also cleans up the unnecessary `entryPointIndices` parameter of `TargetProgram::getOrCreateWholeProgramResult`, and modifies the `cpu-hello-world` example to make use of the new whole-program compilation API to simplify its logic. * Update comments. 27 September 2020, 03:09:50 UTC
b72353e Enable default cpp prelude. (#1560) * Enable default cpp prelude. * Print the "#include" line as a normal source if the file does not exist. * Bug fix * Fix. * Fix c++ prelude header. * Remove unnecessary fopen call. 24 September 2020, 21:30:12 UTC
150218b Refactor preprocessor API to avoid coupling (#1559) Based on review feedback from #1556, this change updates the Slang preprocessor so that it is no longer coupled to policy details from higher levels of the software stack. In particular, the preprocessor used to: * Deal with updating the list of file paths that a `Module` depends on. * (As of #1556) detect NVAPI-related macro definitions and use them to construct an AST-level `Modifier` attached to the `ModuleDecl`. This change introduces a callback interface where the `Preprocessor` calls out to a `PreprocessorHandler` at certain points during execution, allowing the handler to introduce custom logic that suits a particular high-level use case. This change also removes the dependence of the preprocessor on the `Linkage`, because in practice only a small number of its sub-objects were needed. As a convenience, a wrapper function that takes a `Linkage` was left in place so that the existing call sites didn't have to change very much. 24 September 2020, 20:09:40 UTC
fd2ac53 Fix GLSL output for byte-address loads of vectors (#1558) While working on #1557, it became clear that something was going wrong when using `*ByteAddressBuffer.Load<T>` to load a vector type on GLSL/SPIR-V targets. The root problem was that the IR-level layout logic (which computes the "natural" layout of a type) had not yet been extended to handle vectors. The fix is simple enough, but it highlights the fact that we probably need to go ahead and "complete" that layout logic sooner or later. This change includes a test case that covers the behavior added here, as well as the case that #1557 fixes. Unfortunately, due to CI system limitations, the HLSL/dxc part of the test is not yet enabled. 24 September 2020, 03:49:35 UTC
8954052 Simplify workflow when using NVAPI (#1556) In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code *generated* by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI *and* they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come *before* the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined). 23 September 2020, 22:47:14 UTC
3d063a7 Fix a bug around byte-address buffer loads of vectors (#1557) This problem is only visible when * using `RWByteAddressBuffer.Load<V>`, * when `V` is `vector<T,N>`, * and `V` is a non-32-bit type In such a case, the Slang compiler generates output HLSL like: someBuffer.Load<vector<T,N>>(someOffset); and dxc balks because it fails to parse the `>>` as closing the generics, and instead parses it as a shift operator. The solution here is simple: add a space before the closing `>` when emitting a `.Load<T>` operation. Note that this change does not come with a test fix *yet* because writing the test case exposed a more complicated issue with GLSL codegen for this same scenario. This change includes the simple single-byte fix, to unblock users while we work on a fix for the GLSL case (and that fix will include the test coverage). 23 September 2020, 19:24:16 UTC
a5b0cde Allow #include of absolute paths (#1555) * #include an absolute path didn't work - because paths were taken to always be relative. * Improve comments. * Small comment improvement. 21 September 2020, 19:45:35 UTC
83514bd Enable all dynamic dispatch tests on CUDA. (#1552) * Enable all dynamic dispatch tests on CUDA. * Fix expected cross-compile test results. 21 September 2020, 15:27:10 UTC
21339e8 Serialization fixes based on review of #1547 (#1551) * Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization. * Typo fix for Clang/Linux. * Fixes that came out of review * TranslationUnit -> Module * TargetModule -> TargetComponent * PAYLOAD_MIN_ALIGNMENT -> kPayloadMinAlignment 18 September 2020, 17:35:45 UTC
9a6eec6 Control container serialization with SerialOptionFlags (#1550) * Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization. * Typo fix for Clang/Linux. 18 September 2020, 15:02:06 UTC
2ddca33 Initial attempt to enable CUDA dynamic dispatch codegen (#1549) * Front-load cuda module loading to fill in RTTI pointers. * Enable dynamic dispatch codegen for CUDA. 17 September 2020, 23:46:23 UTC
017acb3 Front-load cuda module loading to fill in RTTI pointers. (#1548) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 17 September 2020, 22:07:08 UTC
b9cddcb Share debug information between AST and IR (#1547) * Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Typo fix for Clang/Linux. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 17 September 2020, 20:47:57 UTC
bbf492a Embed default prelude for CUDA (#1546) * Embed default prelude for CUDA Slang supports the notion of a "prelude" that gets prepended to the source code we generate in language. For some targets, a prelude is not necessary (e.g., we compile to HLSL/GLSL and then on to DXBC/DXIL/SPIR-V just fine without a prelude), but some targets have been implemented in a way that makes a prelude necessary (notably CPU and CUDA). For the targets that require a prelude, the Slang codebase includes usable preludes under the `prelude/` directory. Prior to this change, if a user was compiling for such a target (whether via command-line or API), there had to take responsibility for specifying the prelude to use (usually by passing in the contents of the prelude file(s) already included in the Slang distribution). It is reasonable for a user to expect an out-of-the-box experience where compilation to CUDA PTX or native CPU code should Just Work, similarly to how compilation to SPIR-V Just Works. This change is a step in the direction of providing a user experiene that Just Works for common cases. The main addition here is a tool called `slang-embed` that we run during our build to turn the `prelude/*.h` files into `prelude/*.h.cpp` files that embed the contents of the original `.h` file as a `const` variable. By compiling and linking in the generated `.h.cpp` file for the CUDA prelude, we are then able to set the default prelude to use for CUDA at the time a session/linkage is created. That default prelude will be used unless the user manually specifies their own prelude (which current users of the CUDA back-end must be doing). This change only sets up a default prelude for CUDA because of the way that the CPU prelude is split across multiple files. A strategy that provides a good default prelude for CPU may take more work, but that work might also be unnecessary if we switch to a strategy of using LLVM to generate native code. The implementation of the `slang-embed` tool is intentionally simple, and it will likely run into issues if/when we need to embed binary files or larger text files. The assumption being made here is that we can address those issues when they arise, and there is no reason to over-engineer the tool right now. The way that `slang-embed` is integrated into our build process is likely to require some iteration to make sure that it works across all platforms. I expect that this change will have multiple follow-up fixes related to trying to get the build to work as expected across all targets on CI. * fixup: trying to ensure that embedded prelude gets compiled into slang * fixup: properly clean up allocations in slang-embed * fixup: fix double free introduced by previous change * fixup: off-by-one allocation error 17 September 2020, 19:13:50 UTC
0124bf2 Fix an issue with double-counting uniform data for CUDA/CPU (#1545) The `SimpleScopeLayoutBuilder` helper that is used to build up binding information for entry-point parameter lists has logic to try to support both explicit and implicit binding of parameters. This logic was added as part of supporting dual-source color blending on Vulkan. The basic approach is similar to that used for the global scope, where parameters with explicit binding first "carve out" the ranges they claim via a `UsedRangeSet`, and then parameters without explicit binding allocate space from what is left. The logic is (seemingly by accident) also applied to uniform/ordinary data, which creates a problem because the `ScopeLayoutBuilder` base type is also responsible for computing a layout for uniform/ordinary data that is 100% implicit (while dealing with all the relevant alignment restrictions). That logic goes on to add the computed uniform/ordinary resource usage to the computed type layout, but because such a layout has already been computed (albeit without taking alignment into account), the result is that the uniform/ordinary usage is reported at approximately double what it should be. The fix here is to skip uniform/ordinary resource usage when doing the explicit/implicit dance in `SimpleScopeLayoutBuilder`. This approach means that explicit bindings on entry-point `uniform` parameters will only apply to resources (which matches our rules for the global scope, where we don't allow for explicit binding on uniform/ordinary parameters). This is appropriate since the only reason we are supported explicit layout at all is for dual-source color blending (in general, we only support explicit `register` and `[[vk::binding(...)]]` modifiers on global parameters; users are stuck with our computed layouts in all other cases. Co-authored-by: Yong He <yonghe@outlook.com> 17 September 2020, 16:54:49 UTC
3e2cb34 Fix some issues around dim3 for CUDA (#1544) The logic we use to compute `SV_DispatchThreadID` and friends for CUDA makes use of the `gridDim` and `blockDim` built-in variables in CUDA. These variables have type `dim3` which is similar to `uint3` but is considered a distinct type for some reason. The logic for computing the `SV`s currently pretends that `gridDim` and `blockDim` are `uint3`s, and this means that the code they emit doesn't always compile cleanly (although it does in our existing test cases...). This change adds a few overloads that work on `dim3` to the CUDA prelude and that seem to make the code we emit work again. Note: This change should be seen as a somewhat hacky quick fix rather than a real resolution to the underlying issue. It is probably better if we emit code that replaces uses of `gridDim` with `uint3(gridDim.x, gridDim.y, gridDim.z)` instead, to ensure that we get the typing correct, even if the result looks less idiomatic. 17 September 2020, 00:45:07 UTC
8dd0d26 Search for multiple NVRTC versions (#1543) * Search for multiple NVRTC versions The main change here is that when locating the NVRTC compiler we try multiple library names and take the first one that loads successfully (with an ordering that means we try newer versions before older ones). In order to support this change, I needed to fix the wrapping logic that invokes the downstream compiler "locator" function, so that it does not report every failed dynamic library load as an error diagnostic (leading to compilation failure), but instead only reports such failures once the locator has reported failure. The form of the diagnostic output for failures is also changed, in that we now report a single umbrella error about failing to load a downstream compiler, and then report the actuall dynamic library load failures as notes on that diagnostic instead of errors of their own. This choice seems appropriate since for cases like NVRTC it is *not* the case that each failed library load is a compilation error. We only need one of the listed libraries to be loadable, so that reporting them all as errors risks confusing users. One wrinkle that arose during testing is that the 11.0 release of NVRTC dropped support for the `compute_30` target, which had previously been the minimum and default. I had to add logic to check for versions of 11 or greater and switch to `compute_35` as the default. Similar changes may be required as part of supporting newer NVRTC versions if support for more architectures gets deprecated and removed. A more complete implementation of this logic might try to load multiple NVRTC versions such that the Slang compiler can identify a suitable compiler based on the minimum feature level that code actually requires. That kind of cleanup is left as future work, since for most users the current approach will be sufficient. * testing: use verbose mode for running tests by default * fixup: guard against null diagnostic sink 16 September 2020, 21:19:39 UTC
0305099 Support shader parameters that are an array of existential type. (#1542) * Support shader parameters that are an array of existential type. * Rename to getFirstNonExistentialValueCategory Co-authored-by: Yong He <yhe@nvidia.com> 14 September 2020, 19:26:54 UTC
2e3688a Dynamic dispatch bug fixes. (#1541) Co-authored-by: Yong He <yhe@nvidia.com> 14 September 2020, 17:28:22 UTC
5461671 Change the layout we compute/store for parameter groups (#1540) The type layouts we store for parameter group types (`ConstantBuffer<T>` and `ParameterBlock<T>`) has a somewhat complicated internal structure, which we've slowly built up and evolved as we learned more about what was actually needed: * There is the outer `ParameterGroupTypeLayout` that represents the whole type (e.g., the `ParameterBlock<Something>`), and has resource usage/bindings based on what needs to be accounted for by anything (like a program) that uses the type. E.g., for a parameter block on Vulkan, the resource usage at this level would usually just be a single descriptor `set`. * There is the inner "element" layout, which represents the layout for the `Something` in `ParameterBlock<Something>`. This was initially just stored as a type layout (and an extra type layout is stored for backwards compatibility), but we later realized we needed to store a `VarLayout` for the element, to deal with the fact that it might have a non-zero offset. * Finally, there is the inner "container" layout, which represents the resource usage/bindings that are introduced by the block/buffer/group itself. In the case of a `ParameterBlock<Something>` this would include any "default" constant buffer that is needed in order to store the uniform/ordinary data from type `Something`. On targets like Vulkan and D3D12 such a buffer would no show up as part of the resource usage of the overall `ParameterBlock`, nor would it be expected to show as part of the "element" layout. The above is just setting the stage so that we can cover the design choice that this change centers around: for each of the above layouts, what should the *type* stored with that layout be? The answers seem simple at first: * The type for the outer `ParameterGroupTypeLayout` should clearly be the whole type (e.g., the `ParameterBlock<Something>`) * The type for the inner "element" layout should clearly be the element type (e.g., the `Something` in `ParameterBlock<Something>`) * The type for the inner "container" layout should be... hmm... That last question is the thorny one. There are two main options, each with trade-offs: 1. What is being done in the code before this change is to store the whole type (e.g., `ParameterGroup<Something>`) as the type of the "container" layout. This makes some superficial sense (the type of the container should be a container type). 2. What this change switches to is the type of the "container" layout being null (it could equivalently be any sentinel that represents the absence of a meaningful type). While option (1) seems like it would make sense, it risks creating an infinite regress for client application code. If they have a recursive routine that walks the Slang reflection hierarchy, then it will probably key off of the kind of each type it visits. Such a recursive walk would end up trying to treat the outer layout and the inner "container" layout equivalently, when they aren't really representing the same concepts. Even if it seems like this approach defendings against null-pointer crashes in client code, it really only delays them, since the inner "container" layout would yield a null type layout when asked for its element layout. In contrast, option (2) more accurately reflects the reality that the container layout is a `VarLayout` and `TypeLayout` that correspond to no variable/type in practice. Clients of the Slang reflection API already have to deal with `VarLayout`s that have no variable, so it is reasonable for them to deal with `TypeLayout`s that have no type. While the above statements may sound strange, it really comes down to the fact that a "type layout" is really just a way of encoding the "size" of something (where size can encapsulate all the different kinds of resources something can consume on our various targets), and a "variable layout" is really just a way of encoding the "offset" of something (again, where there can be different offsets per consumable resource). In that light, it makes sense that the "container" layout for a parameter group is really just a way of representing the resource allocation of the container itself, and is not associated with any type or variable. This change is technically a breaking change for clients of the reflection API, so it will need to be rolled out with an appropriate change to our version number. 14 September 2020, 15:42:41 UTC
459e788 Remove some "do what I mean" logic from reflection API (#1539) The reflection API had a bit of DWIM (Do What I Mean) logic in that a client could query the resource usage/bindings of a `ParameterBlock<X>` and see not only the register `space` or descriptor `set` for the block itself, but also the constant buffer `register` or `binding` for its default constant buffer (if any). The reason for this behavior was that there was existing client code in Falcor that relied on that behavior for parameter blocks, and even after changing the way that parameter block layouts were computed and stored we sought to maintain backwards compatibility with that client code. The trouble is that the weird behavior then goes on to cause confusion for other clients of the Slang reflection API. This change removes the special-case logic, and fixes up our reflection tests to mirror the new (correct) information that we return. When this change is released, it will be a breaking change for any client code that still relies on the old behavior. We will need to coordinate with client application developers to fix their reflection logic. Note that all the same information can still be accessed, simply by using new reflection API that we have added. 11 September 2020, 20:21:10 UTC
e5b796d Allow existential types in `StructuredBuffer` element type. (#1536) * Allow existential types in `StructuredBuffer` element type. * Handle StructuredBuffer.Load/.Consume methods * Clean up unnecessary changes * Code cleanup * Update test comment 10 September 2020, 22:10:11 UTC
d6a2d29 Add a pass to support resource return values (#1537) A long-standing problem for the Slang implementation has been that some targets (notably GLSL/SPIR-V) do not support treating resources (textures, buffers, samplers, etc.) as first-class types. Resource types on such platforms are restricted so that they may not be used as the type of: 1. fields of aggregate types (`struct`s) 2. local variables 3. function results or `out`/`inout` parameters Issue (1) is handled by our "type legalization" pass today, by splitting aggregates that contain resources into separate fields/variables/parameters. Issue (2) is worked around by putting code into SSA form and promoting local variables to SSA temporaries when possible; the net result is that many local variables of texture type are eliminated (that pass is not perfect, though, and it is possible for users to get errors when it doesn't fully clean up local variables of texture type). Issue (3) is a much more complicated matter, and it is what this change is concerned with. A typical solution to issue (3) is to simply inline all of the code in a program, at which point function results and `out`/`inout` parameters will no longer exist to cause problems. We reject such solutions for two reasons. First, there are limitations on control-flow structure in HLSL/GLSL/SPIR-V that mean they cannot express certain programs after inlining has been performed. Second, and more importantly, the philosophy of the Slang compiler is to perform as little duplication of code as possible, so that we do not accidentally contribute to binary size bloat. Instead, this change tackles the problem of functions that output resource types by adding a new specialization pass. The pass detects functions that ought to be specialized (because they have resource-type outputs), and inspects their bodies to see if the values they output have a predicatable structure that can be replicated outside of the function body. The same logic that inspects the function body also rewrites (a copy of) the function to not have the offending outputs. Finally, all the call sites to a function that is rewritten in this way also get rewritten so that instead of using output values from the function itself, they reproduce the expected output value(s) in their own code. The pass as presented here is intentionally limited in the scope of what it can optimize away (and the test case only touches on that specific functionality). The goal is to get a basic version of this pass in place and evaluated, and then to expand on its functionality incrementally over time. 10 September 2020, 21:12:43 UTC
3a34db6 Test if blob is returned. (#1535) 08 September 2020, 18:48:05 UTC
8740252 Use glslang linux binaries build on TC. (#1534) 04 September 2020, 21:15:00 UTC
8f96289 Allow mixing unspecialized and specialized existential parameters. (#1533) * Allow mixing unspecialized and specialized existential parameters. * Fixes. 04 September 2020, 17:18:44 UTC
5e10f1b Fix a crashing issue for non-end-to-end compilation (#1532) The refactorings that added support for multiple entry points in an output file seemingly introduced a regression such that we crash on compilation that is not "end-to-end." Unfortunately, all of our testing only covers end-to-end compilation, and many users only use that mode. I've added a fix for the issue I ran into, but I haven't addressed the testing gap in this change. Without adding testing for non-end-to-end compilation, I expect further regressions to slip in over time. Co-authored-by: Yong He <yonghe@outlook.com> 04 September 2020, 06:00:53 UTC
5534b0d Rework type layout for ExistentialSpecializedType (#1531) 03 September 2020, 16:35:48 UTC
44929d9 [slang-cpp-extractor] Don't modify generated source if there is no change. (#1530) 02 September 2020, 20:32:53 UTC
a2a7c4d Allow unspecialized existential shader parameters (dynamic dispatch). (#1529) * Allow unspecialized existential shader parameters (dynamic dispatch). * Fixes. * Fixes * disable cuda test 02 September 2020, 19:21:28 UTC
7f567df Add support for (undocumented) HLSL 16-bit bit-cast ops (#1528) As of SM 6.2, the dxc compiler added support for a set of 16-bit bit-cast operations to mirror the `asuint`, `asfloat`, and `asint` operations that were provided for 32-bit scalar types. These operations are not publicly documented, so we didn't think to add them. It should be noted that there was already a similar operation in HLSL, called `f32tof16`, that took as input a `float` and then packed a half-precision version of it into the low bits of a `uint`. The problem is that using that operation for `half`->`uint16_t` conversion required a round trip through a `float`, and downstream compilers seemingly can't optimize away that conversion. This change adds the new operations along with a test that tries to make use of them to ensure the results are what is expected. There are enough cases to cover that I had to write the test in a way where each thread only writes out a subset of the required output. There are two other changes here are that are not directly related to the main feature: First, it seems like the `[__forceInlineEarly]` attribute on some of these overloads interacts poorly with generics, and results in an `IRVectorType` appearing at local scope in the output code. That is semantically reasonable given our IR model, but it would ideally be something that gets eliminated as a result of deduplication of types. For now I've introduced a slight hack to make types always get inlined into their use sites during emission, which should handle the case of locally-defined types. I'm not 100% happy with that solution, but it seemed better than introducing a bunch of unrelated fixes into this PR. Second, the way that conversion operations were being declared for matrix types seems to have been incorrect: we had a single *explicit* initializer added to matrix types via an `extension` that allowed them to be initialized from other matrix types with the same size and *any* element type. In order to support implicit conversions of matrix types, I cribbed the code we were already using to introduce implicit conversion operations for vector types. 02 September 2020, 16:51:25 UTC
c2873f4 Mark f32tof16 and f16tof32 as HLSL intrinsics (#1526) Fixes GitLab issue 85 These functions are intrinsic for HLSL, but were not marked as such, leading to emitting code that manually loops for the vector case. The looping code resulted in lower performance for some users, because apparently dxc was unable (or unwilling?) to unroll the loop, and ended up generating temporary ("stack-allocated") arrays for the vectors produced. As a longer-term solution, we may need to consider how the `VECTOR_MAP...` and `MATRIX_MAP...` idioms used in the stdlib get lowered, so that we can emit fully-unrolled versions in cases where the vector/matrix shape is known at the time we generate code. This PR does not attempt to address that larger issue. 02 September 2020, 01:37:54 UTC
5c56479 Support dynamic existential shader parameters in render-test (#1525) * Support dynamic existential shader parameters in render-test * Fix linux build error. * Fixes. * Fix code review issues. * Fix gcc error. * More fixes. * More fixes. 01 September 2020, 20:47:26 UTC
69025ad AST Serialization in Modules (#1524) * First pass at filter for AST serial writing. * Serialization of AST for modules. * Removed some commented out source. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 31 August 2020, 17:02:55 UTC
baa789e Add OrderedDictionary to core. (#1523) 28 August 2020, 21:56:53 UTC
back to top