https://github.com/shader-slang/slang

8ec293c Merge pull request #1390 from csyonghe/glsl-loop Emit [[dont_unroll]] GLSL attribute for [loop] attribute. 16 June 2020, 01:10:02 UTC
926e4bb Merge branch 'master' into glsl-loop 15 June 2020, 20:56:11 UTC
3461ed4 Specialize function calls involving array arguments. (#1389) Fixes #890. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 15 June 2020, 20:55:56 UTC
d84cfb7 Remove implicit conversions to `void` (#1388) * Remove implicit conversions to `void` Fixes #1372 The standard library code had accidentally introduced implicit-conversion `__init` operations on the `void` type that accepted each of the other basic types, so that a function written like: ```hlsl void bad() { return 1; } ``` would translate to: ```hlsl void bad() { return (void)1; } ``` The dual problems are that the input code should have produced a diagnostic of some kind, and the output code doesn't appear to compile correctly through fxc. This change introduces several fixes aimed at this issue: * First, the problem in the stdlib code is plugged: we don't introduce implicit conversion operations *to* or *from* `void` (we'd only been banning it in one direction before) * Next, an explicit `__init` was added to `void` that accepts *any* type so that existing HLSL code that might do `(void) someExpression` to ignore a result will continue to work. This is a compatibility feature, and it might be argued that we should at least warn when it is used. Note that this function is expected to never appear in output HLSL/GLSL because its result will never be used, and it is marked `[__readNone]` allowing calls to it to be eliminated as dead code. * During IR lowering, we now take care to only emit the `IRReturnVal` instruction type if there is a non-`void` value being returned, and use `IRReturnVoid` for both the case where no expression was used in the `return` statement *and* the case where an expression of type `void` is returned. * A test case was added to confirm that returning `1` from a `void` function isn't allowed, while returning `(void) 1` *is*. The net result of these changes is that we now produce an error for the bad input code, we allow explicit casts to `void` as a compatibility feature, and we are more robust about treating `void` as if it is an ordinary type in the front-end. * fixup: missing file 15 June 2020, 19:05:04 UTC
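The lowering rule described in that commit can be sketched in a few lines. This is a minimal Python illustration of the rule, not Slang's actual C++ implementation; the function and instruction-name strings are stand-ins for the real IR types.

```python
# Illustrative sketch of the IR-lowering rule described above: emit a
# value-returning instruction only when the returned expression has a
# non-void type; bare `return;` and `return (void) x;` both lower the
# same way.

VOID = "void"

def lower_return(expr_type):
    """Pick the IR return instruction for a `return` statement.

    expr_type is None for a bare `return;`, otherwise the type of the
    returned expression.
    """
    if expr_type is None or expr_type == VOID:
        # Both `return;` and `return (void) x;` lower to IRReturnVoid.
        return "IRReturnVoid"
    return "IRReturnVal"
```

Under this rule, a `return 1;` inside a `void` function never reaches lowering at all, because the front-end now diagnoses it as an error.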
7e7425d Merge branch 'master' into glsl-loop 15 June 2020, 16:05:49 UTC
90444f8 Generate IRType for interfaces, and reference them as `operand[0]` in IRWitnessTable values (#1387) * Generate IRType for interfaces, and use them as the type of IRWitnessTable values. This results in the following IR for the included test case: ``` [export("_S3tu010IInterface7Computep1pii")] let %1 : _ = key [export("_ST3tu010IInterface")] [nameHint("IInterface")] interface %IInterface : _(%1); [export("_S3tu04Impl7Computep1pii")] [nameHint("Impl.Compute")] func %Implx5FCompute : Func(Int, Int) { block %2( [nameHint("inVal")] param %inVal : Int): let %3 : Int = mul(%inVal, %inVal) return_val(%3) } [export("_SW3tu04Impl3tu010IInterface")] witness_table %4 : %IInterface { witness_table_entry(%1,%Implx5FCompute) } ``` * Fixes per code review comments. Moved interface type reference in IRWitnessTable from their type to operand[0]. * Fix typo in comment. 15 June 2020, 16:04:53 UTC
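The shape of the witness table in that IR dump can be modeled simply. This is a hypothetical Python model for illustration only, assuming nothing beyond what the commit describes: the implemented interface lives in `operand[0]`, and each entry maps a requirement key to the value that satisfies it.

```python
# Hypothetical model of the IR structure described above: a witness table
# references the interface it implements as operand[0] and maps requirement
# keys to satisfying implementations. The string names are invented stand-ins.

class WitnessTable:
    def __init__(self, interface):
        self.operands = [interface]   # operand[0] references the interface type
        self.entries = {}             # requirement key -> implementation

    @property
    def interface(self):
        return self.operands[0]

    def add_entry(self, key, impl):
        self.entries[key] = impl

# Mirrors the dumped IR: witness_table %4 : %IInterface { ... }
table = WitnessTable("IInterface")
table.add_entry("IInterface.Compute", "Impl.Compute")
```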
04a81ab Emit [[dont_unroll]] attribute in GLSL 13 June 2020, 07:25:12 UTC
36a06f1 Diagnose circularly-defined constants (#1384) * Diagnose circularly-defined constants Work on #1374 This change diagnoses cases like the following: ```hlsl static const int kCircular = kCircular; static const int kInfinite = kInfinite + 1; static const int kHere = kThere; static const int kThere = kHere; ``` By diagnosing these as errors in the front-end we protect against infinite recursion leading to stack overflow crashes. The basic approach is to have front-end constant folding track variables that are in use when folding a sub-expression, and then diagnosing an error if the same variable is encountered again while it is in use. In order to make sure the error occurs whether or not the constant is referenced, we invoke constant folding on all `static const` integer variables. Limitations: * This only works for integers, since that is all front-end constant folding applies to. A future change can/should catch circularity in constants at the IR level (and handle more types). * This only works for constants. Circular references in the definition of a global variable are harder to diagnose, but at least shouldn't result in compiler crashes. * This doesn't work across modules, or through generic specialization: anything that requires global knowledge won't be checked * fixup: missing files * fixup: review feedback 12 June 2020, 20:30:32 UTC
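The detection scheme in that commit, tracking variables that are "in use" during folding, is a standard cycle check. Here is a minimal Python sketch of the idea, not Slang's actual front-end code; the constant names and the `defs` encoding are invented for illustration.

```python
# Illustrative sketch of the circularity check described above: constant
# folding tracks which variables are currently being folded, and reports
# an error if one is re-entered while still in use.

class CircularDefinitionError(Exception):
    pass

def fold(name, defs, in_use=None):
    """Fold the integer constant `name`, where `defs` maps each constant
    to either an int literal or the name of another constant."""
    if in_use is None:
        in_use = set()
    if name in in_use:
        # Re-entering a constant we are already folding => circular definition.
        raise CircularDefinitionError(name)
    value = defs[name]
    if isinstance(value, int):
        return value
    in_use.add(name)
    try:
        return fold(value, defs, in_use)
    finally:
        in_use.remove(name)

# Mirrors the examples from the commit message.
defs = {"kOk": 3, "kAlias": "kOk", "kHere": "kThere", "kThere": "kHere"}
```

Because the check is driven by invoking folding on every `static const` integer, the `kHere`/`kThere` cycle is diagnosed even if neither constant is ever referenced.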
2359921 Merge pull request #1383 from csyonghe/dyndispatch Add compiler flag to disable specialization pass. 12 June 2020, 00:13:27 UTC
8452129 Merge branch 'master' into dyndispatch 11 June 2020, 18:10:40 UTC
1c77c44 Fix problem with C++ extractor erroneously concatenating type tokens (#1382) * Try to fix problem with C++ extractor concatenating tokens and producing an erroneous result. * Improve naming/comments around C++ extractor fix. * Another small improvement around space concatenation when outputting the token list. * Handle some more special cases for consecutive tokens in C++ extractor concatenation of tokens. 11 June 2020, 18:06:27 UTC
8de0a2e Add compiler flag to disable specialization pass. 10 June 2020, 21:57:30 UTC
98459ba Merge pull request #1381 from csyonghe/master Generate .tar.gz file in linux release 09 June 2020, 17:35:26 UTC
00e0e25 Generate .tar.gz file in linux release 08 June 2020, 23:17:34 UTC
78696a6 Small fixes/improvements based on review. (#1379) 08 June 2020, 19:28:48 UTC
b3fbb92 Merge pull request #1378 from csyonghe/fix Filter lookup results from interfaces in `visitMemberExpr`. 08 June 2020, 16:28:03 UTC
956ede9 Filter lookup results from interfaces in `visitMemberExpr`. Fixes #1377 06 June 2020, 02:43:30 UTC
7d4432b Merge pull request #1375 from csyonghe/findtypebynamefix Fix FindTypeByName reflection API not finding stdlib types. 06 June 2020, 02:34:55 UTC
52026c7 Merge branch 'master' into findtypebynamefix 06 June 2020, 01:34:24 UTC
43c1467 ASTNodes use MemoryArena (#1376) * Add an ASTBuilder to a Module Only construct on a valid ASTBuilder (was being called on nullptr on occasion) * Add nodes to ASTBuilder. * Compiles with RefPtr removed from AST node types. * Initialize all AST node pointer variables in headers to nullptr. * Initialize AST node variables as nullptr. Make ASTBuilder keep a ref on node types. Make SyntaxParseCallback return a NodeBase * Don't release canonicalType on dtor (managed by ASTBuilder). * Give ASTBuilders a name and id, to help in debugging. For now destroy the session TypeCache, to stop it holding things released when the compile request destroys ASTBuilders. * Moved the TypeCheckingCache over to Linkage from Session. * NodeBase no longer derived from RefObject. * Only add/dtor nodes that need destruction. First pass compile on linux. 05 June 2020, 22:20:09 UTC
92fc3aa Merge branch 'master' into findtypebynamefix 05 June 2020, 20:01:06 UTC
e3e1cf2 Merge pull request #1371 from csyonghe/loop_attrib Emit [loop] attribute to output HLSL. 05 June 2020, 20:00:32 UTC
389be08 Fix FindTypeByName reflection API not finding stdlib types. 05 June 2020, 19:57:57 UTC
00db821 Merge branch 'master' into loop_attrib 05 June 2020, 17:00:29 UTC
3bb7807 Fixes for active mask synthesis + tests (#1370) * Fixes for active mask synthesis + tests There are two fixes here: * The code generation that follows active mask synthesis was requiring CUDA SM architecture version 7.0 for one of the introduced instructions, but not all of them. This change centralizes the handling of upgrading the required CUDA SM architecture version, and makes sure that the instructions introduced by active mask synthesis request version 7.0. * The tests for active mask synthesis were not flagged as requiring the `cuda_sm_7_0` feature when invoking `render-test-tool`, which meant they would run but produce unexpected results when invoked on a GPU without the required semantics for functions like `__ballot_sync()`. This change adds the missing `-render-feature cuda_sm_7_0` to those tests. * fixup: mark more tests that rely on implicit active mask 05 June 2020, 17:00:15 UTC
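The "centralized upgrade" described in the first fix is just a monotonic maximum over requested versions. A minimal sketch, assuming invented helper and instruction names (Slang's real pass is in C++):

```python
# Sketch of centralizing the "required CUDA SM architecture version" logic:
# every instruction that needs a capability routes through one helper that
# can only raise the requirement, never lower it.

def require_sm_version(current, needed):
    """Upgrade the required SM architecture version, never downgrading."""
    return max(current, needed)

required = 3.0  # hypothetical baseline
for inst in ["__ballot_sync", "__match_any_sync"]:
    # Instructions introduced by active-mask synthesis all request SM 7.0,
    # so none of them can be missed individually.
    required = require_sm_version(required, 7.0)
```

Routing every requirement through one helper is what prevents the original bug, where one instruction requested 7.0 but others did not.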
ecac0c7 Add an ASTBuilder to a Module (#1369) Only construct on a valid ASTBuilder (was being called on nullptr on occasion) 04 June 2020, 21:15:38 UTC
899824e Add missing loop-attribs.slang.hlsl for the test case 04 June 2020, 21:07:30 UTC
2c30d36 Remove aborting in emitLoopControlDecoration default case. 04 June 2020, 21:06:10 UTC
5e52b33 Change loop-attrib test case to CROSS_COMPILE per review comments 04 June 2020, 21:04:54 UTC
4e2f277 Emit [loop] attribute to output HLSL. 04 June 2020, 19:42:17 UTC
f3d637b First steps toward inheritance for struct types (#1366) * First steps toward inheritance for struct types This change adds the ability for a `struct` type to declare a base type that is another `struct`: ```hlsl struct Base { int baseMember; } struct Derived : Base { int derivedMember; } ``` The semantics of the feature are that code like the above desugars into code like: ```hlsl struct Base { int baseMember; } struct Derived { Base _base; int derivedMember; } ``` At points where a member from the base type is being projected out, or the value is being implicitly cast to the base type, the compiler transforms the code to reference the implicitly-generated `_base` member. That means code like this: ```hlsl void f(Base b); ... Derived d = ...; int x = d.baseMember; f(d); ``` gets transformed into a form like this: ```hlsl void f(Base b); ... Derived d = ...; int x = d._base.baseMember; f(d._base); ``` Note that as a result of this choice, the behavior when passing a `Derived` value to a function that expects a `Base` (including to inherited member functions) is that of "object shearing" from the C++ world: the called function can only "see" the `Base` part of the argument, and any operations performed on it will behave as if the value was indeed a `Base`. There is no polymorphism going on because Slang doesn't currently have `virtual` methods. 
In an attempt to work toward inheritance being a robust feature, this change adds a bunch of more detailed logic for checking the bases of various declarations: * An `interface` declaration is only allowed to inherit from other `interface`s * An `extension` declaration can only introduce inheritance from `interface`s * A `struct` declaration can only inherit from at most one other `struct`, and that `struct` must be the first entry in the list of bases This change also adds a mechanism to control whether a `struct` or `interface` in one module can inherit from a `struct` or `interface` declared in another module: * If the base declaration is marked `[open]`, then the inheritance is allowed * If the base declaration is marked `[sealed]`, then the inheritance is not allowed * If it is not marked otherwise, a `struct` is implicitly `[sealed]` * If it is not marked otherwise, an `interface` is implicitly `[open]` These seem like reasonable defaults. In order to safeguard the standard library a bit, the interfaces for builtin types have been marked `[sealed]` to make sure that a user cannot declare a `struct` and then mark it as a `BuiltinFloatingPointType`. This step should bring us a bit closer to being able to document and expose these interfaces for built-in types so that users can write code that is generic over them. There are some big caveats with this work, such that it really only represents a stepping-stone toward a usable inheritance feature. The most important caveats are: * If a `Derived` type tries to conform to an interface, such that one or more interface requirements are satisfied with members inherited from the `Base` type, that is likely to cause a crash or incorrect code generation. * If a `Derived` type tries to inherit from a `Base` type that conforms to one or more interfaces, the witness table generated for the conformance of `Derived` to that interface is likely to lead to a crash or incorrect code generation. 
It is clear that solving both of those issues will be necessary before we can really promote `struct` inheritance as a feature for users to try out. * fixup: trying to appease clang error * fixups: review feedback 04 June 2020, 18:53:13 UTC
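The desugaring and "shearing" behavior described in that commit can be modeled directly. This is a hypothetical Python sketch of the semantics, not compiler code; the class and helper names are invented for illustration.

```python
# Model of the desugaring described above: `Derived : Base` becomes a
# struct holding an implicitly-generated `_base` member, member lookup
# routes inherited members through `_base`, and implicit conversion to
# Base yields only the Base part (C++-style "object shearing").

class Base:
    def __init__(self):
        self.baseMember = 1

class Derived:
    def __init__(self):
        self._base = Base()       # implicitly-generated base member
        self.derivedMember = 2

def lookup(value, member):
    """Resolve a member access, falling back to the `_base` part."""
    if hasattr(value, member):
        return getattr(value, member)
    return getattr(value._base, member)

def to_base(value):
    """Implicit conversion Derived -> Base: only the Base part is visible,
    so a callee taking Base cannot observe derivedMember."""
    return value._base if isinstance(value, Derived) else value
```

So `d.baseMember` becomes `d._base.baseMember`, and `f(d)` becomes `f(d._base)`, exactly as in the transformed code shown above.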
1b8731c Devirtualize AST types (#1368) * Make getSup work with more general non-virtual 'virtual' mechanism. * WIP: Non virtual AST types. * Project change. * Type doesn't implement equalsImpl * Fix macro invocation Make overridden functions public to make them simply accessible from base types. * Use SLANG_UNEXPECTED. * GetScalarType -> getScalarType Use SLANG_UNEXPECTED instead of ASSERT in NamedExpressionType and TypeType 03 June 2020, 21:22:48 UTC
4e3e7f2 Disable CUDA active mask tests as failing on CI. (#1367) 03 June 2020, 13:41:31 UTC
d386e27 Added spGetBuildTagString. (#1365) 02 June 2020, 19:26:51 UTC
f87b632 Make stdlib path just be the filename. (#1364) * Made bad-operator-call available on all targets. Fix the line filename to not include the path, to avoid paths being absolute and the value therefore being host-environment dependent (causing tests to fail). * Disable on linux because there's still a problem on gcc x86 where the file path is different. * Fix some typos in bad-operator-call.slang * Fix diagnostic for bad-operator-call.slang 02 June 2020, 18:05:35 UTC
926a0c5 Working matrix swizzle (#1354) * Working matrix swizzle. Supports one and zero indexing and multiple elements. Performs semantic checking of the swizzle. Matrix swizzles are transformed into a vector of indexing operations during lowering to the IR. This change does not handle matrix swizzle as lvalues. * Renaming * Added missing semicolon * Initialize variable for gcc * Added the expect file for diagnostics * Matrix swizzle updated per PR feedback * Stylistic fix * Formatting fixes * Fix compiling with AST change. Change indentation. Co-authored-by: jsmall-nvidia <jsmall@nvidia.com> 02 June 2020, 16:12:35 UTC
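The lowering described in that commit, turning a matrix swizzle into a vector of element-indexing operations, can be sketched as follows. This is an illustrative Python sketch of the idea under the usual HLSL swizzle forms (zero-based `_m<row><col>` and one-based `_<row><col>`); it is not the compiler's actual semantic-checking code.

```python
import re

# Sketch of matrix-swizzle lowering: a swizzle such as `_m00_m11`
# (zero-based) or `_12_21` (one-based) becomes a list of (row, col)
# index pairs, i.e. a vector of per-element indexing operations.

def lower_matrix_swizzle(swizzle):
    if re.fullmatch(r"(?:_m\d\d)+", swizzle):
        pairs = re.findall(r"_m(\d)(\d)", swizzle)
        return [(int(r), int(c)) for r, c in pairs]
    if re.fullmatch(r"(?:_\d\d)+", swizzle):
        # One-based form: shift indices down to zero-based.
        pairs = re.findall(r"_(\d)(\d)", swizzle)
        return [(int(r) - 1, int(c) - 1) for r, c in pairs]
    raise ValueError("not a matrix swizzle: " + swizzle)
```

Each resulting pair corresponds to one element load from the matrix; as the commit notes, lvalue use of such swizzles was not yet handled.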
8acb704 Fix bug with ray tracing from a fragment shader (#1362) * Added GLSL_460 if ray tracing is used in a fragment shader. Moved GLSL-specific setup to an init function. * Split out _requireRayTracing method. 29 May 2020, 19:56:28 UTC
9773495 NodeBase types constructed with astNodeType member set (#1363) * Made Substitutions derived from NodeBase * Add astNodeType field to NodeBase * Make Substitutions derived from NodeBase * Make all construction through ASTBuilder * Make getClassInfo non virtual (just uses the astNodeType) 29 May 2020, 18:26:48 UTC
45e414f Update target compat doc to include Wave intrinsics for CUDA (#1361) 29 May 2020, 15:01:24 UTC
f3d7042 Feature/ast syntax standard (#1360) * Small improvements to documentation and code around DiagnosticSink * Made methods/functions in slang-syntax.h be lowerCamel Removed some commented out source (was placed elsewhere in code) * Making AST related methods and function lowerCamel. Made IsLeftValue -> isLeftValue. 29 May 2020, 12:36:10 UTC
95597d7 Small improvements to documentation and code around DiagnosticSink (#1359) 28 May 2020, 19:32:19 UTC
c2d3134 WIP: ASTBuilder (#1358) * Compiles. * Small tidy up around session/ASTBuilder. * Tests are now passing. * Fix Visual Studio project. * Fix using new X to use builder when protectedness of Ctor is not enough. Substitute->substitute * Add some missing ast nodes created outside of ASTBuilder. * Compile time check that ASTBuilder is making an AST type. * Moved findClassInfo and findSyntaxClass (essentially the same thing) to SharedASTBuilder from Session. 28 May 2020, 18:01:51 UTC
e5d0f33 Synthesize "active mask" for CUDA (#1352) * Synthesize "active mask" for CUDA The Big Picture =============== The most important change here is to `hlsl.meta.slang`, where the declaration of `WaveGetActiveMask()` is changed so that instead of mapping to `__activemask()` on CUDA (which is semantically incorrect) it maps to a dedicated IR instruction. The other `WaveActive*()` intrinsics that make use of the implicit "active mask" concept had already been changed in #1336 so that they explicitly translate to call the equivalent `WaveMask*()` intrinsic with the result of `WaveGetActiveMask()`. As a result, all of the `WaveActive*()` functions are now no different from a user-defined function that uses `WaveGetActiveMask()`. The bulk of the work in this change goes into an IR pass that replaces the new instruction for getting the active mask with appropriately computed values before we generate output CUDA code. That work is in `slang-ir-synthesize-active-mask.{h,cpp}`. Utilities ========= There are a few pieces of code that were helpful in writing the main pass but that can be explained separately: * IR instructions were added corresponding to the Slang `WaveMaskBallot()` and `WaveMaskMatch()` functions, which map to the CUDA `__ballot_sync()` and `__match_any_sync()` operations, respectively. These are only implemented for the CUDA target because they are only being generated as part of our CUDA-only pass. * The `IRDominatorTree` type was updated to make it a bit more robust in the presence of unreachable blocks in the CFG. It is possible that the same ends could be achieved more efficiently by folding the corner cases into the main logic, but I went ahead and made things very explicit for now. * I added an `IREdge` utility type to better encapsulate the way that certain code operating on the predecessors/successors of an `IRBlock` was using an `IRUse*` to represent a control-flow edge. 
The `IREdge` type makes the logic of those operations more explicit. A future change should probably change it so that `IRBlock::getPredecessors()` and `getSuccessors()` are instead `getIncomingEdges()` and `getOutgoingEdges()` and work as iterators over `IREdge` values, given the way that the predecessor and successor lists today can contain duplicates. * Using the above `IREdge` type, the logic for detecting and breaking critical edges was broken down into something that is a bit more clear (I hope), and that also factors out the breaking of an edge (by inserting a block along it) into a reusable subroutine. The Main Pass ============= The implementation of the new pass is in `slang-ir-synthesize-active-mask.cpp`, and that file attempts to include enough comments to make the logic clear. A brief summary for the benefit of the commit history: * The first order of business is to identify functions that need to have the active mask value piped into them, and to add an additional parameter to them so that the active mask is passed down explicitly. Call sites are adjusted to pass down the active mask which can then result in new functions being identified as needing the active mask. * The next challenge is for a function that uses the active mask, to compute the active mask value to use in each basic block. The entry block can easily use the active mask value that was passed in, while other blocks need more work. * When doing a conditional branch, we can compute the new mask for the block we branch to as a function of the existing mask and the branch condition. E.g., the value `WaveMaskBallot(existingMask, condition)` can be used as the mask for the "then" block of an `if` statement. 
* When control flow paths need to "reconverge" at a point after a structured control-flow statement, we need to insert logic to synchronize and re-build the mask that will execute after the statement, while also excluding any lanes/threads that exited the statement in other ways (e.g., an early `return` from the function). The explanation here is fairly hand-wavy, but the actual pass uses much more crisp definitions, so the code itself should be inspected if you care about the details. Tests ===== The tests for the new feature are all under `tests/hlsl-intrinsic/active-mask/`. Most of them stress a single control-flow construct (`if`, `switch`, or loop) and write out the value of `WaveGetActiveMask()` at various points in the code. In practice, our definition of the active mask doesn't always agree with what D3D/Vulkan implementations seem to produce in practice, and as a result a certain amount of effort has gone into adding tweaks to the tests that force them to produce the expected output on existing graphics APIs. These tweaks usually amount to introducing conditional branches that aren't actually conditional in practice (the branch condition is always `true` or always `false` at runtime), in order to trick some simplistic analysis approaches that downstream compilers seem to employ. One test case currently fails on our CUDA target (`switch-trivial-fallthrough.slang`) and has been disabled. This is an expected failure, because making it produce the expected value requires a bit of detailed/careful coding that would add a lot of additional complexity to this change. It seemed better to leave that as future work. Future Work =========== * As discussed under "Tests" above, the handling of simple `switch` statements in the current pass is incomplete. * There's an entire can of worms to be dealt with around the handling of fall-through for `switch`. 
* The current work also doesn't handle `discard` statements, which is unimportant right now (CUDA doesn't have fragment shaders), but might matter if we decide to synthesize masks for other targets. Similar work would probably be needed if we ever have `throw` or other non-local control flow that crosses function boundaries. * An important optimization opportunity is being left on the floor in this change. When a block that comes "after" a structured control-flow region (which is encoded explicitly in Slang IR and SPIR-V) post-dominates the entry block of the region, then we know that the active mask when exiting the region must be the same as the mask when entering the region, and there is no need to insert explicit code to cause "re-convergence." This should be addressed in a follow-on change once we add code to Slang for computing a post-dominator tree from a function CFG. * Related to the above, the decision-making around whether a basic block "needs" the active mask is perhaps too conservative, since it decides that any block that precedes one needing the active mask also needs it. This isn't true in cases where the active mask for a merge block can be inferred by post-dominance (as described above), so that the blocks that branch to it don't need to compute an active mask at all. * If/when we extend the CPU target to support these operations (along with SIMD code generation, I assume), we will also need to synthesize an active mask on that platform, but the approach taken here (which pretty much relies on support for CUDA "cooperative groups") wouldn't seem to apply in the SIMD case. * Similarly, the approach taken to computing the active mask here requires a new enough CUDA SM architecture version to support explicit cooperative groups. If we want to run on older CUDA-supporting architectures, we will need a new and potentially very different strategy. 
* Because the new pass here changes the signature of functions that require the active mask (and not those that don't), it creates possible problems for generating code that uses dynamic dispatch (via function pointers). In principle, we need to know at a call site whether or not the callee uses the active mask. There are multiple possible solutions to this problem, and they'd need to be worked through before we can make the implicit active mask and dynamic dispatch be mutually compatible. * Related to changing function signatures: no effort is made in this pass to clean up the IR type of the functions it modifies, so there could technically be mismatches between the IR type of a function and its actual signature. If/when this causes problems for downstream passes we probably need to do some cleanup. * fixup: backslash-escaped lines I did some "ASCII art" sorts of diagrams to explain cases in the CFG, and some of those diagrams used backslash (`\`) characters as the last character on the line, causing them to count as escaped newlines for C/C++. The gcc compiler apparently balked at those lines, since they made some of the single-line comments into multi-line comments. I solved the problem by adding a terminating column of `|` characters at the end of each line that was part of an ASCII art diagram. * fixup: typos Co-authored-by: jsmall-nvidia <jsmall@nvidia.com> 26 May 2020, 22:11:22 UTC
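The core mask computation the pass performs at a conditional branch can be simulated with plain bitmasks. This is a simplified Python simulation of the idea, where `ballot` stands in for `WaveMaskBallot` / CUDA `__ballot_sync`; the lane count and condition values are invented for illustration.

```python
# Simulation of the per-block mask computation described above: given the
# mask of lanes entering an `if`, the mask for the "then" block is the
# ballot of active lanes whose condition is true, and the "else" mask is
# the remaining active lanes.

def ballot(active_mask, conditions):
    """Return the mask of active lanes whose condition is true."""
    mask = 0
    for lane, cond in enumerate(conditions):
        if (active_mask >> lane) & 1 and cond:
            mask |= 1 << lane
    return mask

entry_mask = 0b1111                       # four lanes enter the `if`
conds = [True, False, True, True]         # per-lane branch condition
then_mask = ballot(entry_mask, conds)     # lanes taking the "then" branch
else_mask = entry_mask & ~then_mask       # lanes taking the "else" branch
```

Re-convergence after the statement then has to rebuild a mask from both paths while excluding lanes that left the construct by other means (such as an early `return`), which is the harder part the pass handles.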
b136904 Improvements around hashing (#1355) * Fields from upper to lower case in slang-ast-decl.h * Lower camel field names in slang-ast-stmt.h * Fix fields in slang-ast-expr.h * slang-ast-type.h make fields lowerCamel. * slang-ast-base.h members functions lowerCamel. * Method names in slang-ast-type.h to lowerCamel. * GetCanonicalType -> getCanonicalType * Substitute -> substitute * Equals -> equals ToString -> toString * ParentDecl -> parentDecl Members -> members * * Make hash code types explicit * Use HashCode as return type of GetHashCode * Added conversion from double to int64_t * Split Stable from other hash functions * toHash32/64 to convert a HashCode to the other styles. GetHashCode32/64 -> getHashCode32/64 GetStableHashCode32/64 -> getStableHashCode32/64 * Other Get/Stable/HashCode32/64 fixes * GetHashCode -> getHashCode * Equals -> equals * CreateCanonicalType -> createCanonicalType * Catches of polymorphic types should be through references otherwise slicing can occur. * Fixes for newer version of gcc. Fix hashing problem on gcc for Dictionary. * Another fix for GetHashPos * Fix signed issue around GetHashPos 26 May 2020, 17:53:10 UTC
ee2ec68 Merge pull request #1356 from csyonghe/master Remove non-ascii characters from source file. 25 May 2020, 21:53:15 UTC
fd28dcf Remove non-ascii characters from source file. 25 May 2020, 20:07:26 UTC
076a4c0 Tidy up around AST nodes (#1353) * Fields from upper to lower case in slang-ast-decl.h * Lower camel field names in slang-ast-stmt.h * Fix fields in slang-ast-expr.h * slang-ast-type.h make fields lowerCamel. * slang-ast-base.h members functions lowerCamel. * Method names in slang-ast-type.h to lowerCamel. * GetCanonicalType -> getCanonicalType * Substitute -> substitute * Equals -> equals ToString -> toString * ParentDecl -> parentDecl Members -> members 22 May 2020, 18:21:37 UTC
daf53bb Non virtual accept implementation on AST types (#1351) * First pass impl of making accept on AST node types non virtual. * A single switch for ITypeVistor on Val type. * Use ORIGIN to choose ITypeVisitor dispatch. * Don't use ORIGIN - we don't need special handling for ITypeVisitor on Val derived types. 21 May 2020, 19:36:02 UTC
d90ae36 AST dump improvements (#1350) * Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes. * First pass at field dumping. * Update project for field dumping. * WIP AST Dumper. * More AST dump compiling. * Fix bug in StringSlicePool where it doesn't use the copy of the UnownedStringSlice in the map. * Add support for SLANG_REFLECTED and SLANG_UNREFLECTED More AST dump support. * Support for hierarchical dumping/flat dumping. Use SourceWriter to dump. * Add -dump-ast command line option. * Add fixes to VS project to include AST dump. * Fix compilation on gcc. * Add fix for type ambiguity issue on x86 VS. * Fixes from merge of reducing Token size. * Fix comment about using SourceWriter. * Improvements to AST dumping around: * Pointers (written as hex) * Scope * Turn off some unneeded fields in AST hierarchy * Only output the initial module in full. 21 May 2020, 18:06:18 UTC
96a00c8 AST dumping via C++ Extractor reflection (#1348) * Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes. * First pass at field dumping. * Update project for field dumping. * WIP AST Dumper. * More AST dump compiling. * Fix bug in StringSlicePool where it doesn't use the copy of the UnownedStringSlice in the map. * Add support for SLANG_REFLECTED and SLANG_UNREFLECTED More AST dump support. * Support for hierarchical dumping/flat dumping. Use SourceWriter to dump. * Add -dump-ast command line option. * Add fixes to VS project to include AST dump. * Fix compilation on gcc. * Add fix for type ambiguity issue on x86 VS. * Fixes from merge of reducing Token size. * Fix comment about using SourceWriter. 20 May 2020, 14:56:49 UTC
c54c957 Reduce the size of Token (#1349) * Token size on 64 bits is 24 bytes (from 40). On 32 bits is 16 bytes from 24. * Added hasContent method to Token. Some other small improvements around Token. 19 May 2020, 19:37:40 UTC
1de1431 Use 'balance' for extracting array suffix in C++ extractor (#1346) * Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes. 18 May 2020, 19:24:05 UTC
4397d90 Add a few missing enum cases to slang.h (#1347) These are cases where the C-style API had the appropriate `SLANG_*` case defined, but the wrapper C++ `enum class` declaration didn't have a matching case. 15 May 2020, 21:42:02 UTC
d41f5d4 Change to make a single implementation of SLANG_ABSTRACT_CLASS and SLANG_CLASS, to simplify macro injection and not require macro redefinition in each file. (#1345) 14 May 2020, 22:28:50 UTC
daaae74 Add support for parsing array types to C++ extractor. (#1343) 14 May 2020, 18:21:38 UTC
f5dfa1e Add GLSL translation for HLSL fmod() (#1342) The existing code was assuming `fmod()` was available as a builtin in GLSL, which isn't true. It also isn't possible to translate the HLSL `fmod()` to the GLSL `mod()` since the two have slightly different semantics. This change introduces a definition for `fmod(x,y)` that amounts to `x - y*trunc(x/y)` which should agree with the HLSL version except in corner cases (e.g., there are some cases where the HLSL version returns `-0` and this one will return `0`). 11 May 2020, 21:57:05 UTC
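The translation in that commit is easy to check numerically. Here is the `x - y*trunc(x/y)` definition modeled in Python for illustration (the helper name is ours, not part of either language):

```python
import math

# The GLSL translation described above: fmod(x, y) is emitted as
# x - y * trunc(x / y), which matches HLSL fmod semantics (the result
# takes the sign of x), unlike GLSL mod(x, y) = x - y * floor(x / y),
# whose result takes the sign of y.

def hlsl_fmod(x, y):
    return x - y * math.trunc(x / y)

def glsl_mod(x, y):
    return x - y * math.floor(x / y)
```

For negative `x` the two differ, which is why a direct `fmod` -> `mod` mapping would be wrong: `hlsl_fmod(-5.5, 2.0)` is `-1.5` while `glsl_mod(-5.5, 2.0)` is `0.5`. As the commit notes, signed-zero corner cases can still diverge from HLSL.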
798f3bc AST nodes using C++ Extractor (#1341) * Extractor builds without any reference to syntax (as it will be helping to produce this!). * Change macros to include the super class. * WIP replacing defs files. * Added indexOf(const UnownedSubString& in) to UnownedSubString. Refactored extractor * Output a macro for each type with the extracted info - can be used during injection in class * Simplify the header file - as can get super type and last from macro now * Store the 'origin' of a definition * Some small tidy ups to the extractor. * Improve comments on the extractor options. * Made CPPExtractor own SourceOrigins * Small fixes around SourceOrigin. * Small tidy up around macroOrigin * WIP Visitor seems now to work correctly. Split out types used by ast into slang-ast-support-types.h * Fix remaining problems with C++ extractor being used with AST nodes. Add CountOf to extractor type ids. Added ReflectClassInfo::getInfo to turn an ASTNodeType into a ReflectClassInfo * Fix compiling on linux. Fix typo in memset. * Small tidy up around comments/layout. Moved NodeBase casting to NodeBase. * Make premake generate project that builds with cpp-extractor for AST. * Get the source directory from the filter in premake. * Fix typo in source path * Explicitly set the source path for premake generation for AST. * Special case handling of override to appease Clang. * Use a more general way to find the slang-ast-reflect.h file to run the extractor. * Appveyor is not triggering slang-cpp-extractor - try putting dependson together. * Put building slang-cpp-extractor first. * Disable some project options to stop MSBuild producing internal compiler errors. * Try reordering the projects in premake5.lua * Hack to try and make slang-cpp-extractor build on appveyor. * Disable flags - not required for MSBuild on appveyor. * Disable flags not required for build on AppVeyor. * Updated Visual Studio projects with slang-cpp-extractor. * Added Visual Studio slang-cpp-extractor project. 
08 May 2020, 18:31:40 UTC
c16abd4 Enhanced C++ extractor (#1340) * Extractor builds without any reference to syntax (as it will be helping to produce this!). * Change macros to include the super class. * Added indexOf(const UnownedSubString& in) to UnownedSubString. Refactored extractor * Output a macro for each type with the extracted info - can be used during injection in class * Simplify the header file - as can get super type and last from macro now * Store the 'origin' of a definition * Some small tidy ups to the extractor. * Improve comments on the extractor options. * Made CPPExtractor own SourceOrigins * Small fixes around SourceOrigin. * Small tidy up around macroOrigin 07 May 2020, 19:00:33 UTC
9245460 Try to stop AppVeyor from clobbering release logs (#1339) We trigger releases by creating tags on GitHub, and ideally we try to include a commit log of stuff that changed since the previous release. Unfortunately, the way that AppVeyor CI is currently set up, it clobbers the information for a release (including its description) as part of the deployment step, so all releases just list `Slang <version>` as the description even if the programmer manually created a suitable release log message. This change tweaks a setting in `appveyor.yml` that seems (AFAICT from the documentation and various issues I've been reading) to make it so that AppVeyor will simply push the binary artifacts to the release if it already exists, rather than set all of the release information from scratch. We will have to wait until we next do a release to see how this works. 05 May 2020, 16:36:34 UTC
2dcfce4 Disable a test that was breaking on CI (#1338) The CI system doesn't have a new enough dxc to support 16-bit load/store from byte-addressed buffers, so I am disabling the test for now. A better long-term fix is to put an appropriate version of dxc into `slang-binaries` and use that for our CI tests. 04 May 2020, 20:10:35 UTC
f599788 C++ Extractor (#1337) * WIP: Doing lexing using slang's lexer for cpp-extractor * Node tree for C++ extraction. * Bug fixing. Add dump of hierarchy. * First pass at extracting fields. * Parse template types. * Use diagnostics defs for C++ extractor. * Simplify Diagnostic Defs. * Remove the brace stack. * Added IdentifierLookup. * Add handling for >> style template close. * Improved identifier handling/keywords. * Added ability to check if reader is at cursor position. * Handling of an unspecified root type. * Parsing code comments. Tidy up some parsing - to use advanceIf functions more. * Improve path handling. * Fixes around changes to Path interface. * Working Range, Type and Scope header. * Extract the middle part of marker and put in output. Gives more flexibility at macro injection, and in class definitions. * Split DERIVED types into its own macro, to provide a way to generate for derived types. * Fix clang/g++ compile issue. * Tabs -> spaces. * Fix small bug in getFileNameWithoutExt * Small improvement around naming. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 04 May 2020, 17:46:24 UTC
5d3a737 Make stdlib WaveActive* call WaveMask* (#1336) This change makes the various `WaveActive*()` functions have default implementations that call `WaveMask*()` passing `WaveGetActiveMask()`. The new definitions will be used during CUDA code generation, which simplifies some of the duplication that was occurring in the `__target_intrinsic` modifiers. This change does *not* add logic to make computation of `WaveGetActiveMask()` correct on CUDA, so these functions will still fail to provide the behavior that users need/expect. A future change will need to add logic to synthesize the value of `WaveGetActiveMask()` automatically. 04 May 2020, 16:06:55 UTC
c697fe5 Improve GLSL coverage of boolean binary ops (#1335) * Improve GLSL coverage of boolean binary ops This change ensures that the `&&`, `||`, `&`, `|`, and `^` apply correctly to vectors of `bool` values when targeting GLSL. Most of the changes are in the GLSL emit path, where the IR instructions for these operators are bottlenecked through a small set of helper routines to cover the different cases. In general: * The vector variants of the operations are implemented by casting to `uint` vectors, performing bitwise ops, then casting back * The scalar variants are handled by converting the bitwise operations to their equivalent logical operator (the one interesting case there is bitwise `^` where the equivalent logical operation on `bool` is `!=`) This change makes it clear that our IR really shouldn't have distinct opcodes for logical vs. bitwise and/or/xor, and instead should just have a single family of operations where the behavior differs based on the type of the operand. That is already *de facto* the way things work (a user can always write `&`, `|` and `^` and expect them to work on `bool` and vectors of `bool`), so that the GLSL output path has to deal with the overlap. Having two sets of IR ops here actually makes for more code instead of less. * Fixups: review feedback and test ! operator 01 May 2020, 15:53:11 UTC
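A small Python sketch of the two emit strategies the message describes (illustrative only, not the Slang emit code): scalar bitwise `^` on `bool` maps to logical `!=`, and vector variants round-trip each lane through an unsigned integer to apply the bitwise op.

```python
def bool_xor(a: bool, b: bool) -> bool:
    # Scalar case: bitwise ^ on bool is equivalent to logical !=.
    return a != b

def vec_and(a, b):
    # Vector case: cast bool lanes to uint, apply the bitwise op
    # per lane, then cast back -- mirroring the GLSL emit strategy.
    return [bool(int(x) & int(y)) for x, y in zip(a, b)]
```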
bd8562c Add support for generic load/store on byte-addressed buffers (#1334) * Add support for generic load/store on byte-addressed buffers Introduction ============ The HLSL `*ByteAddressBuffer` types originally only supported loading/storing `uint` values or vectors of the same, using `Load`/`Load2`/`Load3`/`Load4` or `Store`/`Store2`/`Store3`/`Store4`. More recent versions of dxc have added support for generic `Load<T>` and `Store<T>`, which adds two main pieces of functionality for users. The first and more fundamental feature is that `T` can be a type that isn't 32 bits in size (or a vector with elements of such a type), thus exposing a capability that is difficult or impossible to emulate on top of 32-bit load/store (depending on what guarantees `*StructuredBuffer` makes about the atomicity of loads/stores). The secondary benefit of having a generic `Load<T>` and `Store<T>` is that it becomes possible to load/store types like `float` without manual bit-casting, and also becomes possible to load/store `struct` types so long as all the fields are loadable/storable. This change adds generic `Load<T>` and `Store<T>` to the Slang standard library definition of byte-address buffers, and tries to bring those same benefits to as many targets as possible. In particular, the secondary benefits become available on all targets, including DXBC: byte-address buffers can be used to directly load/store types other than `uint`, including user-defined `struct` types, so long as all of the fields of those types can be loaded/stored. The ability to load/store non-32-bit types depends on target capabilities, and so is only available where direct support for those types is available. For 16-bit types like `half` this includes both Vulkan and D3D12 DXIL with appropriate extensions or shader models. The implementation is somewhat involved, so I will try to explain the pieces here. 
Standard Library ================ The changes to the Slang standard library in `hlsl.meta.slang` are pretty simple. We add new `Load<T>` and `Store<T>` generic methods to `*ByteAddressBuffer`, and route them through to a new IR opcode. Right now the generic `Load<T>` and `Store<T>` do *not* place any constraints on the type `T`, although in practice they should only work when `T` is a fixed-size type that only contains "first class" uniform/ordinary data (so no resources, unless the target makes resource types first class). Our front-end checking cannot currently represent first-class-ness and validate it (nor can it represent fixed-size-ness), so these gaps will have to do for now. Rather than directly translate `Load<T>` or `Store<T>` calls into a single instruction, we instead bottleneck them through internal-use-only subroutines. The design choice here is intended to ensure that for some large user-defined type like `MassiveMaterialStruct` we only emit code for loading all of its fields *once* in the output HLSL/GLSL rather than once per load site. While downstream compilers are likely to inline all of this logic anyway, we are doing what we can to avoid generating bloated code. Emit and C++/CUDA ================= Over in `slang-emit-c-like.cpp` we translate the new ops into output code in a straightforward way. A call like `obj.Load<Foo>(offset)` will eventually output as a call like `obj.Load<Foo>(offset)` in the generated code, by default. For the CPU C++ and CUDA C++ codegen paths, this is enough to make a workable implementation, and we add suitable templated `Load<T>` and `Store<T>` declarations to the prelude for those targets. Legalization ============ For targets like DXBC and GLSL there is no way to emit a load operation for an aggregate type like a `struct`, so we introduce a legalization pass on the IR that will translate our byte-address-buffer load/store ops into multiple ops that are legal for the target. 
Scalarization ------------- The big picture here is easy enough to understand: when we see a load of a `struct` type from a byte-address buffer, we translate that into loads for each of the fields, and then assemble a new `struct` value from the results. We do similar things for arrays, matrices, and optionally for vectors (depending on the target). Bit Casting ----------- After scalarization alone, we might have a load of a `float` or a `float3` that isn't legal for D3D11/DXBC, but that *would* be legal if we just loaded a `uint` or `uint3` and then bit-casted it. The legalization pass thus includes an option to allow for loads/stores to be translated to operate on a same-size unsigned integer type and then to bit-cast. To make this actually usable, I had to add some more details to the implementation of the bit-cast op during HLSL emit and, more importantly, I had to customize the way that the byte-address buffer load/store ops get emitted to HLSL so that it prefers to use the existing operations like `Load`/`Load2`/`Load3`/`Load4` instead of the generic one, whenever operating on `uint`s or vectors of `uint`. Translation to Structured Buffers --------------------------------- Even after scalarizing all byte-address-buffer loads/stores, we still have a problem for GLSL targets, because a single global `buffer` declaration used to back a byte-address buffer can only have a single element type (currently always `uint`), so the granularity of loads/stores it can express is fixed at declaration time. If we want to load a `half` from a byte-address buffer, we need a dedicated `buffer` declaration in the output GLSL with an element type of `half`. The solution we employ here is to translate all byte-address buffer loads into "equivalent" structured-buffer ops when targeting GLSL. 
We add logic to find the underlying global shader parameter that was used for a load/store and introduce a new structured-buffer parameter with the desired element type (e.g., `half`) and then rewrite the load/store op to use that buffer instead. We copy layout information from the original buffer to the new one, so that in the output GLSL all the various `buffer`s will use a single `binding` and thus alias "for free." We don't want to create a new global buffer for every load/store, so we try to cache these "equivalent" structured buffers as best as we can. For the caching I ended up needing a pair to use as a key, so I tweaked the `KeyValuePair<K,V>` type in `core` so that it could actually work for that purpose. Because we are working at the level of IR instructions instead of stdlib functions, for this work I had to add new IR opcodes to represent structured-buffer load/store that only (currently) apply to GLSL. Layout ====== In order to translate a load/store of a `struct` type into per-field load/store we need a way to access layout information for the types of the fields. Previously layout information has been an AST-level concern that then gets passed down to the IR only when needed and only on global parameters, so layout information isn't always available in cases like this, at the actual load/store point. As an expedient move for now I've introduced a dedicated module that does IR-level layout and caches its results on the IR types themselves. This approach *only* supports the "natural" layout of a type, and thus is usable for structured buffers and byte-address buffers (or general pointer load/store on targets that support it), but which is *not* usable for things like constant buffer layout. We've known for a while that the Right Way to do layout going forward is to have an IR-based layout system, and this could either be seen as a first step toward it, or else as a gross short-term hack. YMMV. 
Details ======= The GLSL "extension tracker" stuff around type support needed to be tweaked to recognize that types like `int16_t` aren't actually available by default. I switched it from using a "black list" of unavailable types at initialization time over to using a "white list" of types that are known to always be available without any extensions. Tests ===== There are two tests checked in here: one for the basic case of a `struct` type that has fields that should all be natively loadable, and one that stresses 16-bit types. Each test uses both load and store operations. Future Directions ================= Right now we translate vector load/store to GLSL as load/store of individual scalars, which means the assumed alignment is just that of the scalars (consistent with HLSL byte-address buffer rules). We could conceivably introduce some controls to allow outputting the vector load/store ops more directly to GLSL (e.g., declaring a `buffer` of `float4`s), which might enable more efficient load/store based on the alignment rules for `buffer`s. The IR layout work has a number of rough edges, but the most worrying is probably the assumption that all matrices are laid out in row-major order. Slang really needs an overhaul of its handling of matrices and matrix layout, so I don't know if we can do much better in the near term. At some point the IR-based layout system needs to be reconciled with our current AST-based layout, and we need to figure out how "natural" layout and the currently computed layouts co-exist (in particular, we need to make sure that the IR-based layout and the existing layout logic for structured buffers will agree). This probably needs to come along once we have moved the core layout logic to operate on IR types instead of AST types (a change we keep talking about). As part of this work I had to touch the implementation of bit-casting for HLSL, and it seems like that logic has some serious gaps. 
We really ought to consider a separate legalization pass that can turn IR bitcast instructions into the separate ops that a target actually supports so that we can implement `uint64_t`<->`double` and other conversions that are technically achievable, but which are hard to express in HLSL today. * fixup: missing files 27 April 2020, 22:17:54 UTC
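The scalarization and layout pieces above can be sketched in a few lines of Python. This is purely illustrative (the names and the size table are assumptions, not Slang's actual IR-layout module): "natural" layout aligns each field to its own size, and the legalization pass uses those offsets to turn one `Load<Struct>` into one load per field.

```python
# Hypothetical per-scalar sizes; each field is aligned to its size.
SIZES = {"uint": 4, "float": 4, "half": 2}

def natural_layout(fields):
    # fields: list of (name, type) pairs. Returns per-field byte
    # offsets under "natural" layout, plus the total size consumed.
    offsets, offset = {}, 0
    for name, ty in fields:
        size = SIZES[ty]
        offset = (offset + size - 1) // size * size  # align up
        offsets[name] = offset
        offset += size
    return offsets, offset

def scalarize_load(struct_fields, base_offset):
    # One struct load becomes one load per field at base + offset;
    # the results would then be reassembled into a struct value.
    offsets, _ = natural_layout(struct_fields)
    return [(name, base_offset + off) for name, off in offsets.items()]
```

For example, a struct with a `half` followed by a `float` places the `float` at offset 4 (aligned past the 2-byte `half`), so a load at base offset 16 scalarizes into loads at 16 and 20.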
6f5c250 Small improvements around atomics (#1333) * Use the original value in the test. Run test on VK. * Added RWBuffer and Buffer types to C++ prelude. * Add vk to atomics.slang tests * Update target-compatibility around atomics. Where tests are disabled in atomics-buffer.slang, explained why. * tabs -> spaces. * Small docs improvement. 23 April 2020, 18:40:01 UTC
806ab08 Fix typo in TypeTextUtil::getCompileTargetName(SlangCompileTarget target) (#1332) 23 April 2020, 14:04:10 UTC
e45f8c1 Disable OptiX tests by default. (#1331) When running `slang-test`, the OptiX tests will be skipped by default for now, and must be explicitly enabled by adding `-category optix` on the command line. I will need to add a better discovery mechanism down the line, closer to how support for different graphics APIs is being tested, but for now this should be enough to unblock our CI builds. 22 April 2020, 18:09:09 UTC
58904b5 Diagnose attempts to call instance methods from static methods (#1330) Currently we fail to diagnose code that calls an instance method from a static method using implicit `this`, and instead crash during lowering of the AST to the IR. This change introduces a bit more detail to the "this parameter mode" that is computed during lookup, so that it differentiates three cases. The existing two cases of a mutable `this` and immutable `this` remain, but we add a third case where the "this parameter mode" only allows for a reference to the `This` type. When turning lookup "breadcrumb" information into actual expressions, we respect this setting to construct either a `This` or `this` expression. In order to actually diagnose the incorrect reference, I had to add code around an existing `TODO` comment that noted how we should diagnose attempts to refer to instance members through a type. Enabling that diagnostic revealed a missing case needed by generics (including those in the stdlib) - a type-constraint member is always referenced statically. Putting the diagnostic for a static reference to a non-static member in its new bottleneck location meant that some code higher up the call stack that handles explicit static member references had to be tweaked to not produce double error messages. This change includes a new diagnostic test to show that we now give an error message on code that makes this mistake, instead of crashing. 21 April 2020, 22:41:52 UTC
2c55d37 Fix for a generic definition, followed by a declaration with target intrinsic causing a crash (#1329) * Make a 'definition' if a function has a body or a target intrinsic defined * Added test for this situation. * Fix tab. * Fix single-target-intrisic.slang expected output. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 21 April 2020, 21:16:22 UTC
77d5971 Small Improvements around Wave Intrinsics (#1328) * Fix issues in wave-mask/wave.slang tests. WaveGetActiveMask -> WaveGetConvergedMask. Update target-compatibility.md * First pass at wave-intrinsics.md documentation. Write up around WaveMaskSharedSync. * Added more of the Wave intrinsics as WaveMask intrinsics. Improvements to documentation around wave-intrinsics. * Add the Wave intrinsics for SM6.5 for WaveMask Expand WaveMask intrinsics Improve WaveMask documentation * Added WaveMaskIsFirstLane. * Added WaveGetConvergedMask for glsl and hlsl. Added wave-get-converged-mask.slang test. * WaveGetActiveMask/Multi and WaveGetConvergedMask/Multi * Improve Wave intrinsics docs. Added WaveGetActiveMulti, WaveGetConvergedMulti, WaveGetActiveMask (for vk/hlsl). * Enable GLSL WaveMultiPrefixBitAnd. * Re-add definitions of f16tof32 and f32tof16 from #1326 * Remove multiple definition of f32tof16 Disable optix call to Ray trace test, if OPTIX not available. * Improve wave intrinsics documentation - remove the __generic as part of definitions, small improvements. * Change comment to try and trigger build. 21 April 2020, 18:09:36 UTC
7de5f63 WaveMask remaining intrinsics and tests (#1327) * Fix issues in wave-mask/wave.slang tests. WaveGetActiveMask -> WaveGetConvergedMask. Update target-compatibility.md * First pass at wave-intrinsics.md documentation. Write up around WaveMaskSharedSync. * Added more of the Wave intrinsics as WaveMask intrinsics. Improvements to documentation around wave-intrinsics. * Add the Wave intrinsics for SM6.5 for WaveMask Expand WaveMask intrinsics Improve WaveMask documentation * Added WaveMaskIsFirstLane. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 21 April 2020, 13:32:21 UTC
6d4fa92 Fix stdlib definitions of half<->float conversion (#1326) These ended up being additional cases where we needed to use an explicit loop over components in the stdlib in order to produce valid GLSL output, but the existing declarations weren't doing it. I added a very minimal cross-compilation test just to confirm that we generate valid SPIR-V for code using `f16tof32()`. 20 April 2020, 23:24:49 UTC
c4441d8 Feature/wave mask review (#1325) * Fix issues in wave-mask/wave.slang tests. WaveGetActiveMask -> WaveGetConvergedMask. Update target-compatibility.md * First pass at wave-intrinsics.md documentation. Write up around WaveMaskSharedSync. * Added more of the Wave intrinsics as WaveMask intrinsics. Improvements to documentation around wave-intrinsics. 20 April 2020, 17:03:18 UTC
acb1c39 Add support for global shader parameters to OptiX path (#1323) There are two main pieces here. First, we specialize the code generation for CUDA kernels to account for the way that shader parameters are passed differently for ordinary compute kernels vs. ray-tracing kernels. Both global and entry-point shader parameters in Slang are translated to kernel function parameters for CUDA compute kernels, while for OptiX ray tracing kernels we need to use a global `__constant__` variable for the global parameters, and the SBT data (accessed via an OptiX API function) for entry-point shader parameters. This choice bakes in a few pieces of policy when it comes to how Slang ray-tracing shaders translate to OptiX: * It fixes the name used for the global `__constant__` variable for global shader parameters to be `SLANG_globalParams`. Since that name has to be specified when creating a pipeline with the OptiX API, the choice of name effectively becomes an ABI contract for Slang's code generation. * It fixes the choice that global parameters in Slang map to per-launch parameters in OptiX, and entry-point parameters in Slang map to SBT-backed parameters in OptiX. This is a reasonable policy, and it is also one that we are likely to need to codify for Vulkan as well, but it is always a bit unfortunate to bake policy choices like this into the compiler (especially when shaders compiled for D3D can often decouple the form of their HLSL/Slang code from how things are bound in the API). The second piece is a lot of refactoring of the logic in `render-test/cuda/cuda-compute-util.cpp`, so that the logic for setting up (and reading back) the buffers of parameter data can be shared between the compute and ray-tracing paths. The result may not be a true global optimum for how the code is organized, but it at least serves the goal of not duplicating the parameter-binding logic between compute and ray-tracing. 17 April 2020, 15:53:41 UTC
12b30af Workaround for matching of dxc diagnostics (#1324) * Specialized handling for comparison of dxc output that ignores line/column numbers. * Simplify areAllEqualWithSplit. 16 April 2020, 22:25:53 UTC
b5a5317 Added wave.slang.expected.txt (#1322) 16 April 2020, 13:57:38 UTC
d5d3222 First support for 'WaveMask' intrinsics (#1321) * WIP tests to confirm divergence on CUDA. * Added wave.slang test that uses masks. Made all CUDA intrinsic impls take a mask explicitly. Added initial WaveMaskXXX intrinsics. * Added WaveMaskSharedSync. * Improvements around WaveMaskSharedSync/WaveMaskSync * Remove tabs. 15 April 2020, 18:14:58 UTC
fbac017 CUDA global scope initialization of arrays without function calls. (#1320) * Fix CUDA output of a static const array if values are all literals. * Fix bug in Convert definition. * Output makeArray such that it is deconstructed on CUDA to fill in based on what the target type is. Tries to expand such that there are no function calls so that static const global scope definitions work. * Fix unbounded-array-of-array-syntax.slang to work correctly on CUDA. * Remove tabs. * Check works with static const vector/matrix. * Fix typo in type comparison. * Shorten _areEquivalent test. * Rename _emitInitializerList. Some small comment fixes. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 14 April 2020, 21:00:11 UTC
cbdee1b Fix front-end handling of generic static methods (#1319) * Fix front-end handling of generic static methods The front-end logic that was testing if a member was usable as a static member neglected to unwrap any generic-ness and look at the declaration inside (the parser currently puts all modifiers on the inner declaration instead of the outer generic). The test case included here is not a full compute test so that it only runs the front-end checking logic (where we had the bug). * fixup: tabs->spaces 14 April 2020, 17:55:10 UTC
79f6a01 Change rules for layout of buffers/blocks containing only interface types (#1318) TL;DR: This is a tweak to the rules for layout that only affects a corner case for people who actually use `interface`-type shader parameters (which for now is just our own test cases). The tweaked rules seem like they make it easier to write the application code for interfacing with Slang, but even if we change our minds later the risk here should be low (again: nobody is using this stuff right now). Slang already has a rule that a constant buffer that contains no ordinary/uniform data doesn't actually allocate a constant buffer `binding`/`register`: struct A { float4 x; Texture2D y; } // has uniform/ordinary data struct B { Texture2D u; SamplerState v; } // has none ConstantBuffer<A> gA; // gets a constant buffer register/binding ConstantBuffer<B> gB; // does not There is similar logic for `ParameterBlock`, where the feature makes more sense. A user would be somewhat surprised if they declared a parameter block with a texture and a sampler in it, but then the generated code reserved Vulkan `binding=0` for a constant buffer they never asked for. The behavior in the case of a plain `ConstantBuffer` is chosen to be consistent with the parameter block case. (Aside: all of this is a non-issue for targets with direct support for pointers, like CUDA and CPU. On those platforms a constant buffer or parameter block always translates to a pointer to the contained data.) Now, suppose the user declares a constant buffer with an interface type in it: interface IFoo { ... } ConstantBuffer<IFoo> gBuffer; When the layout logic sees the declaration of `gBuffer` it doesn't yet know what type will be plugged in as `IFoo` there. Will it contain uniform/ordinary data, such that a constant buffer is needed? 
The existing logic in the type layout step implemented a complicated rule that amounted to: * A `ConstantBuffer` or `cbuffer` that only contains `interface`/existential-type data will *not* be allocated a constant buffer `register`/`binding` during the initial layout process (on unspecialized code). That means that any resources declared after it will take the next consecutive `register`/`binding` without leaving any "gap" for the `ConstantBuffer` variable. * After specialization (e.g., when we know that `Thing` should be plugged in for `IFoo`), if we discover that there is uniform/ordinary data in `Thing` then we will allocate a constant buffer `register`/`binding` for the `ConstantBuffer`, but that register/binding will necessarily come *after* any `register`s/`binding`s that were allocated to parameters during the first pass. * Parameter blocks were intended to work the same way when it comes to whether or not they allocate a default `space`/`set`, but that logic appears to not have worked as intended. These rules make some logical sense: a `ConstantBuffer` declaration only pays for what the element type actually needs, and if that changes due to specialization then the new resource allocation comes after the unspecialized resources (so that the locations of unspecialized parameters are stable across specializations). The problem is that in practice it is almost impossible to write client application code that uses the Slang reflection API and makes reasonable choices in the presence of these rules. A general-purpose `ShaderObject` abstraction in application code ends up having to deal with multiple possible states that an object could be in: 1. An object where the element type `E` contains no uniform/ordinary data, and no interface/existential fields, so a constant buffer doesn't need to be allocated or bound. 2. An object where the element type `E` contains no uniform/ordinary data, but has interface/existential fields, with two sub-cases: a. 
When no values bound to interface/existential fields use uniform/ordinary data, then the parent object must not bind a buffer b. When the type of value bound to an interface/existential field uses uniform/ordinary data, then the parent object needs to have a buffer allocated, and bind it. 3. When the element type `E` contains uniform/ordinary data, then a buffer should be allocated and bound (although its size/contents may change as interface/existential fields get re-bound) Needing to deal with a possible shift between cases (2a) and (2b) based on what gets bound at runtime is a mess, and it is important to note that even though both (2a) and (3) require a buffer to be bound, the rules about *where* the buffer gets bound aren't consistent (so that the application needs to understand the distinction between "primary" and "pending" data in a type layout). This change introduces a different rule, which seems to be more complicated to explain, but actually seems to simplify things for the application: * A `ConstantBuffer` or `cbuffer` that only contains `interface`/existential-type data always has a constant buffer `register`/`binding` allocated for it "just in case." * If after specialization there is any uniform/ordinary data, then that will use the buffer `register`/`binding` that was already allocated (that's easy enough). * If after specialization there *isn't* any uniform/ordinary data, then the generated HLSL/GLSL shader code won't declare a buffer, but the `register`/`binding` is still claimed. * A `ParameterBlock` behaves equivalently, so that if it contains any `interface`/existential fields, then it will always allocate a `space`/`set` "just in case" The effect of these rules is to streamline the cases that an application needs to deal with down to two: 1. If the element type `E` of a shader object contains no uniform/ordinary or interface/existential fields, then no buffer needs to be allocated or bound 2. 
If the element type `E` contains *any* uniform/ordinary or interface/existential fields, then it is always safe to allocate and bind a buffer (even in the cases where it might be ignored). Furthermore, the reflection data for the constant buffer `register`/`binding` becomes consistent in case (2), so that the application can always expect to find it in the same way. 14 April 2020, 15:48:54 UTC
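The simplified two-case rule the message arrives at reduces to a single predicate on the element type. A Python sketch (field-kind names here are illustrative, not Slang reflection API names):

```python
def needs_buffer(field_kinds):
    # New rule: allocate and bind a buffer if and only if the element
    # type contains any uniform/ordinary or interface/existential
    # fields; pure-resource types need no buffer.
    return any(k in ("uniform", "existential") for k in field_kinds)
```

Under the old rules the answer for existential-only types depended on what got bound at runtime (cases 2a vs. 2b above); under the new rule it is a property of the type alone, which is what makes the application-side `ShaderObject` logic tractable.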
b2c9fcc Remove Not constant folding - because it doesn't take into account the type change. (#1317) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 13 April 2020, 16:34:20 UTC
4a5c606 Fix CUDA build of render-test (#1316) The CUDA build of the render-test tool had been broken in a fixup change to #1307 (which was ostensibly adding features for the CUDA path). The fix is a simple one-liner. 10 April 2020, 21:15:08 UTC
2d16fcd Fix crashing bug when using overloaded name as generic arg (#1315) If somebody defines two `struct` types with the same name: ```hlsl struct A {} // ... struct A {} ``` and then tries to use that name when specializing a generic function: ```hlsl void doThing<T>() { ... } // ... doThing<A>(); ``` then the Slang front-end currently crashes, which leads to it not diagnosing the original problem (the conflicting declarations of `A`). This change fixes up the checking of generic arguments so that it properly fills in dummy "error" arguments in place of missing or incorrect arguments, and thus guarantees that the generic substitution it creates will at least be usable for the next steps of checking (rather than leaving null pointers in the substitution). This change also fixes up the error message for the case where a generic application like `F<A>` is formed where `F` is not a generic. We already had a more refined diagnostic defined for that case, but for some reason the site in the code where we ought to use it was still issuing an internal compiler error around an unimplemented feature. This change includes a diagnostic test case to cover both of the above fixes. 10 April 2020, 16:20:36 UTC
a01c09c Literal folding on other operators (#1314) * Fold prefix operators if they prefix an int literal. * Make test case a bit more convoluted. * Remove ++ and -- as not appropriate for folding of literals. * Set output buffer name. 09 April 2020, 19:43:09 UTC
78acd32 Replace /* unhandled */ in source emit with a real error (#1313) For a long time the various source-to-source back-ends have emitted the text `/* unhandled */` when they encounter an IR instruction opcode that didn't have any emit logic implemented. This choice had two apparent benefits:

* In most common cases, emitting `/* unhandled */` in place of an expression would lead to downstream compilation failure, so the most catastrophic cases seemed to work as desired (e.g., if we emit `int i = /* unhandled */;` we get a downstream parse error and know something is wrong).
* In a few cases, if a dead-but-harmless instruction slips through (e.g., a type formed in the body of a specialized generic function), we would emit `/* unhandled */;`, which is a valid empty statement.

It is already clear from the above write-up that the benefits of the policy aren't really that compelling, and where it has recently turned out to be a big liability is that there are actually plenty of cases where emitting `/* unhandled */` instead of a sub-expression won't cause downstream compilation failure, and will instead silently compute incorrect results:

* Emitting `/* unhandled */ + b` instead of `a + b`
* Emitting `/* unhandled */(a)` instead of `f(a)`, or even `/* unhandled */(a, b)` instead of `f(a, b)`
* Emitting `f(/* unhandled */)` instead of `f(a)` in cases where `f` is a built-in with both zero-argument and one-argument overloads

The right fix here is simple: where we would have emitted `/* unhandled */` to the output we should instead diagnose an internal compiler error, thus leading to compilation failure. This change appears to pass all our current tests, but it is possible that there are going to be complicated cases in user code that were relying on the previous lax behavior. I know from experience that we sometimes see `/* unhandled */` in output for generics, and while we have eliminated many of those cases I don't have confidence we've dealt with them all.
When this change lands we should make sure that the first release that incorporates it is marked as potentially breaking for clients, and we should make sure to test the changes in the context of the client codebases before those codebases integrate the new release. 08 April 2020, 22:30:12 UTC
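A hypothetical sketch of how `/* unhandled */` could slip through as a silently wrong expression (illustrative output, not actual compiler emit):

```hlsl
// Intended emit for some function body:
int good(int a, int b) { return a + b; }

// Hypothetical broken emit before this change: `+b` parses as a
// unary plus, so the downstream compiler accepts the code and the
// function silently returns b instead of a + b.
int bad(int a, int b) { return /* unhandled */ +b; }
```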
6274e17 Initial work to support OptiX output for ray tracing shaders (#1307) * Initial work to support OptiX output for ray tracing shaders This change represents in-progress work toward allowing Slang/HLSL ray-tracing shaders to be cross-compiled for execution on top of OptiX. The work as it exists here is incomplete, but the changes are incremental and should not disturb existing supported use cases. One major unresolved issue in this work is that the OptiX SDK does not appear to set an environment variable that could be used to locate it.

Changes include:

* Modified the premake script to support new options for adding OptiX to the build. Right now the default path to the OptiX SDK is hard-coded because the installer doesn't seem to set an environment variable. We will want to update that to have a reasonable default path for both Windows and Unix-y platforms in a later change.
* I ran the premake generator on the project since I added new options, which resulted in a bunch of diffs to the Visual Studio project files that are unrelated to this change. Many of the diffs come from previous edits that added files using only the Visual Studio IDE rather than by re-running premake, so it is arguably better to have the checked-in project files more accurately reflect the generated files used for CI builds.
* The "downstream compiler" abstraction was extended to have an explicit notion of the kind of pipeline that shaders are being compiled for (e.g., compute vs. rasterization vs. ray tracing). This option is used to tell the NVRTC case when it needs to include the OptiX SDK headers in the search path for shader compilation (and also when it should add a `#define` to make the prelude pull in OptiX). This code again uses a hard-coded default path for the OptiX SDK; we will need to modify that to have a better discovery approach and also to support an API or command-line override.
* One note for the future is that instead of passing down a "pipeline type" we could instead pass down the list/set of stages for the kernels being compiled, and the OptiX support could be enabled whenever there is *any* ray tracing entry point present in a module. That approach would allow mixing RT and compute kernels during downstream compilation. We will need to revisit these choices when we start supporting code generation for multiple entry points at a time.
* The CUDA emit logic is currently mostly unchanged. The biggest difference is that when emitting a ray-tracing entry point we prefix the name of the generated `__global__` function with a marker for its stage type, as required by the OptiX runtime (e.g., a `__raygen__` prefix is required on all ray-generation entry points).
* The `Renderer` abstraction had a bare minimum of changes made to be able to understand that ray-tracing pipelines exist, and also that some APIs will require the name of each entry point along with its binary data in order to create a program.
* The `ShaderCompileRequest` type was updated so that only a single "source" is supported (rather than distinct source for each entry point), and also the entry points have been turned into a single list where each entry identifies its stage instead of a fixed list of fields for the supported entry-point types.
* The CUDA compute path had a lot of code added to support execution for the new ray-tracing pipeline type. The logic is mostly derived from the `optixHello` example in the OptiX SDK, and at present only supports running a single ray-generation shader with no parameters. The code here is not intended to be ready for use, but represents a significant amount of learning-by-doing.
* The `slang-support.cpp` file in `render-test` was updated so that instead of having separate compilation logic for compute vs. rasterization shaders (which would mean adding a third path for ray tracing), there is now a single flow to the code that works for all pipeline types and any kind of entry points.
* Implicit in the new code is dropping support for the way GLSL was being compiled for pass-through render tests, which means pass-through GLSL render tests will no longer work. It seems like we didn't have any of those to begin with, though, so it is no great loss.
* Also implicit are some new invariants about how shaders without known/default entry points need to be handled. For example, the ray tracing case intentionally does not fill in entry points on the `ShaderCompileRequest` and instead fully relies on the Slang compiler's support for discovering and enumerating entry points via reflection. As a consequence of those edits the `-no-default-entry-point` flag on `render-test` is probably not working, but it seems like we don't have any test cases that use that flag anyway.

Given the seemingly breaking changes in those last two bullets, I was surprised to find that all our current tests seem to pass with this change. If there are things that I'm missing, I hope they will come up in review.

* fixup: issues from review and CI
* Some issues noted during the review process (e.g., a missing `break`)
* Fix logic for render tests with `-no-default-entry-point`. I had somehow missed that we had tests reliant on that flag. This required a bit of refactoring to pass down the relevant flag (luckily the function in question was already being passed most of what was in `Options`, so just passing that in directly actually simplifies the call sites a bit).
* There was a missing line of code to actually add the default compute entry points to the compile request. I think this was a problem that slipped in as part of some pre-PR refactoring/cleanup changes that I failed to re-test. 08 April 2020, 20:57:24 UTC
f38c082 Fix expected output for dxc-error test. (#1312) I'm not sure how this slipped in, but I know that I missed this when testing all my recent PRs because I end up having a bunch of random not-ready-to-commit repro tests in my source tree, which means I always get at least *some* test failures and have to scan them for the ones that are real. Somehow I have had a blind spot for this one. 08 April 2020, 19:37:16 UTC
a53f817 Fixes for IR generics (#1311) * Fixes for IR generics There are a few different fixes going on here (and a single test that covers all of them).

1. Fix optionality of trailing semicolon for `struct`s
======================================================

We have logic in the parser that tries to make a trailing `;` on a `struct` declaration optional. That logic is a bit subtle and could potentially break non-idiomatic HLSL input, so we try to only trigger it for files written in Slang (and not HLSL). For command-line `slangc` this is based on the file extension (`.slang` vs. `.hlsl`), and for the API it is based on the user-specified language. The missing piece here was that the path for handling `import`ed code was *not* setting the source language of imported files at all, and so those files were not getting opted into the Slang-specific behavior. As a result, `import`ed code couldn't leave off the semicolon.

2. Fix generic code involving empty `interface`s
================================================

We have logic that tries to only specialize "definitions," but the definition-vs-declaration distinction at the IR level has historically been slippery. One corner case was that a witness table for an interface with no methods would always be considered a declaration, because it was empty. The notion of what is/isn't a definition has been made more nuanced so that it amounts to two main points:

* If something is decorated as `[import(...)]`, it is not a definition
* If something is a generic/func (a declaration that should have a body), and it has no body, it is a declaration

Otherwise we consider anything a definition, which means that non-`[import(...)]` witness tables are now definitions whether or not they have anything in them.

3. Fix IR lowering for members of generic types
===============================================

The IR lowering logic was trying to be a little careful in how it recursively emitted "all" `Decl`s to IR code. In particular, we don't want to recurse into things like function parameters, local variables, etc., since those can never be directly referenced by external code (they don't have linkage). The existing logic was basically emitting everything at global scope, and then only recursing into (non-generic) type declarations. This created a problem where a method declared inside a generic `struct` would not be emitted to the IR for its own module at all *unless* it happened to be called by other code in the same module. The fix here was to also recurse into the inner declaration of `GenericDecl`s. I also made the code recurse into any `AggTypeDeclBase` instead of just `AggTypeDecl`s, which means that members in `extension` declarations should now properly be emitted to the IR.

Conclusion
==========

These fixes should clear up some (but not all) cases where we might emit an `/* unhandled */` into output HLSL/GLSL. A future change will need to make that path a hard error and then clean up the remaining cases. * fixup: tabs->spaces 08 April 2020, 16:41:59 UTC
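The first two fixes could be exercised by a hypothetical imported module like this (file and type names are made up):

```hlsl
// helper.slang -- an imported file
interface IEmpty {}        // empty interface: its witness tables now
                           // count as definitions and get specialized
struct Thing : IEmpty
{
    int value;
}                          // trailing `;` may now be omitted even when
                           // this file is reached via `import`

// main.slang
import helper;
T identity<T : IEmpty>(T t) { return t; }
```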
a9214f3 Remove static struct members from layout and reflection (#1310) * Added MemberFilterStyle - controls action of FilteredMemberList and FilteredMemberRefList * Split out template implementations * Use more standard method names for FilteredMemberRefList * Added reflect-static.slang test * Added isNotEmpty/isEmpty to filtered lists * Added ability to index into filtered list (so as not to require building an array) * Default MemberFilterStyle to All. * Remove explicit MemberFilterStyle::All 08 April 2020, 14:56:00 UTC
ba232e4 Fix a bug around generic functions using DXR RayQuery (#1309) The DXR `RayQuery` type is our first generic type defined in the stdlib that is marked as a target intrinsic but does *not* map to a custom `IRType` case. Instead, a reference to `RayQuery<T>` is encoded in the IR as an ordinary `specialize` instruction. Unfortunately, this doesn't play nice with the current specialization logic, which considered a `specialize` instruction to not represent a "fully specialized" instruction, which then inhibits specialization of generics/functions that use such an instruction. The fix here was to add more nuanced logic to the check for "fully specialized"-ness, so that it considers a `specialize` to already be fully specialized when the generic it applies to represents a target intrinsic. I also added a note that the whole notion of "fully specialized"-ness that we use isn't really the right thing for the specialization pass, and how we really ought to use a notion of what is or is not a "value." This change doesn't include a test because the only way to trigger the issue is using the DXR 1.1 `RayQuery` type, and that type is not supported in current release versions of DXC. 07 April 2020, 20:56:01 UTC
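The kind of code that previously inhibited specialization might look like this hypothetical sketch (requires DXR 1.1 and a DXC build with `RayQuery` support; the function is illustrative):

```hlsl
// `RayQuery<FLAGS>` is encoded in the IR as an ordinary `specialize`
// instruction, which used to make `trace` look "not fully specialized"
// and blocked specialization of this generic function.
float trace<let FLAGS : uint>(RaytracingAccelerationStructure scene, RayDesc ray)
{
    RayQuery<FLAGS> q;
    q.TraceRayInline(scene, 0, 0xff, ray);
    q.Proceed();
    return q.CommittedRayT();
}
```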
c5db04b Fix WaveGetLaneIndex for glsl (#1306) * Fix typo in stdlib around WaveGetLaneIndex and WaveGetLaneCount * Reorder emit so #extensions come before layout * Added wave-get-lane-index.slang test. 02 April 2020, 23:52:12 UTC
00e1dba Optimize creation of memberDictionary (#1305) * Improve performance of building members dictionary by adding when needed. * Fix unbounded-array-of-array-syntax.slang, since DISABLE_TEST now uses up an index. Use IGNORE_TEST instead. * Improve variable name. Small improvements. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 02 April 2020, 21:06:16 UTC
487d4a4 Add basic support for namespaces (#1304) This change adds logic for parsing `namespace` declarations, referencing them, and looking up their members.

* The parser changes are a bit subtle, because that is where we deal with the issue of "re-opening" a namespace. We kludge things a bit by re-using an existing `NamespaceDecl` in the same parent if one is available, and thereby ensure that all the members in the same namespace can see one another.
* In order to allow namespaces to be referenced by name they need to have a type so that a `DeclRefExpr` to them can be formed. For this purpose we introduce `NamespaceType`, which is the (singleton) type of a reference to a given namespace.
* The new `NamespaceType` case is detected in the `MemberExpr` checking logic and routed to the same logic that `StaticMemberExpr` uses, and the static lookup logic was extended with support for looking up in a namespace (a thin wrapper around one of the existing worker routines in `slang-lookup.cpp`).
* I made `NamespaceDecl` have a shared base class with `ModuleDecl` in the hopes that this would allow us to allow references to modules by name in the future. That hasn't been tested as part of this change.
* I cleaned up a bunch of logic around `ModuleDecl` holding a `Scope` pointer that was being used for some of the more ad hoc lookup routines in the public API. Those have been switched over to something that is a bit more sensible given the language rules and that doesn't rely on keeping state sitting around on the `ModuleDecl`.
* I added a test case to make sure the new functionality works, which includes re-opening a namespace, and it also tests both `.` and `::` operations for lookup in a namespace.
* The main missing feature here is the ability to do something like C++ `using`. It would probably be cleanest if we used `import` for this, since we already have that syntax (and having both `import` and `using` seems like a recipe for confusion).
Most of the infrastructure is present to support `import`ing one namespace into another (in a way that wouldn't automatically pollute the namespace for clients), but some careful thought needs to be put into how import of namespaces vs. modules should work. 02 April 2020, 15:52:42 UTC
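The new functionality might be exercised like this hypothetical Slang snippet (names are made up):

```hlsl
namespace gfx
{
    int add(int a, int b) { return a + b; }
}

// Re-opening the same namespace: members can see one another.
namespace gfx
{
    int addTwice(int a, int b) { return add(a, b) + add(a, b); }
}

// Both `.` and `::` work for lookup in a namespace.
int test() { return gfx.add(1, 2) + gfx::addTwice(3, 4); }
```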
5e73e98 Improve diagnostic parsing from GCC. (#1303) Enable x86_64 CPU tests on TC. 31 March 2020, 18:06:34 UTC
ea76905 CUDA version handling (#1301) * render feature for CUDA compute model. * Use SemanticVersion type. * Enable CUDA wave tests that require CUDA SM 7.0. Provide mechanism for DownstreamCompiler to specify version numbers. * Enabled wave-equality.slang * Make CUDA SM version major version not just a single digit. * Fix assert. * DownstreamCompiler::Version -> CapabilityVersion 30 March 2020, 23:23:09 UTC
ad5b60c Add a test for static const declarations in structure types (#1300) The functionality already appears to work, and this test is just to make sure we don't regress on it. The most interesting thing here is that I'm using this change to pitch a new organization for tests around what part of the language they cover (rather than the kind of test they are), since the `tests/compute/` directory is getting overly full and is hard to navigate. We can consider moving individual tests into more of a hierarchy at some later point. 30 March 2020, 19:47:43 UTC
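The functionality being locked in by the test might look like this hypothetical snippet:

```hlsl
struct Limits
{
    // A static const member belongs to the type, so it should not
    // appear in the struct's layout or reflection data.
    static const int kMaxLights = 16;
};

// Static members are referenced through the type, not an instance.
int lightCount() { return Limits::kMaxLights; }
```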
6f43b26 WaveBroadcastAt/WaveShuffle (#1299) * Support for WaveReadLaneAt with dynamic (but uniform across Wave) laneId on Vk by enabling VK1.4. Fixed wave-lane-at.slang test to test with laneId that is uniform across the Wave. * Added WaveShuffle intrinsic. Test for WaveShuffle intrinsic. * Added some documentation on WaveShuffle * Fix that version required for subgroupBroadcast to be non constexpr is actually 1.5 * Added WaveBroadcastLaneAt Documented WaveShuffle/BroadcastLaneAt/ReadLaneAt * Update docs around WaveBroadcast/Read/Shuffle. Use `_waveShuffle` as the name in the CUDA prelude to better describe its more flexible behavior. 27 March 2020, 22:35:06 UTC
e267ce2 Adds WaveShuffle intrinsic (#1298) * Support for WaveReadLaneAt with dynamic (but uniform across Wave) on Vk by enabling VK1.4. Fixed wave-lane-at.slang test to test with laneId that is uniform across the Wave. * Added WaveShuffle intrinsic. Test for WaveShuffle intrinsic. * Added some documentation on WaveShuffle * Fix that version required for subgroupBroadcast to be non constexpr is actually 1.5 27 March 2020, 20:16:27 UTC
5b0b843 Support for WaveReadLaneAt with dynamic (but uniform across Wave) on Vk by enabling VK1.4. (#1297) Fixed wave-lane-at.slang test to test with laneId that is uniform across the Wave. 27 March 2020, 18:49:41 UTC
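A hypothetical compute kernel exercising the dynamically uniform case that now works on Vulkan (buffer and entry-point names are made up):

```hlsl
RWStructuredBuffer<int> outputBuffer;

[numthreads(32, 1, 1)]
void computeMain(uint3 tid : SV_DispatchThreadID)
{
    // laneId is computed at runtime rather than being a compile-time
    // constant, but it is uniform across the wave, which the GLSL
    // subgroupBroadcast path can now accept.
    uint laneId = WaveGetLaneCount() - 1;
    int value = WaveReadLaneAt(int(tid.x), laneId);
    outputBuffer[tid.x] = value;
}
```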
cc753f3 Disable CPU tests on TC. (#1295) Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> 26 March 2020, 13:35:35 UTC