Revision - 1b89f78 - Capabilities System, CapabilitySet Logic Overhaul (#4145)

Revision 1b89f78cd1762aa08402bd656e807b66833b11d0 authored by ArielG-NV on 16 May 2024, 04:04:12 UTC, committed by GitHub on 16 May 2024, 04:04:12 UTC

Capabilities System, CapabilitySet Logic Overhaul (#4145)

* Capabilities System, Backing Logic Overhaul

Fixes #4015

Problems to address:
1. Currently the capabilities system spends anywhere from 25-50% of compile time in the CapabilityVisitor. Most of this time is spent on join logic: (a) finding abstract atoms and (b) comparing list1<->list2. This can and should be made significantly faster.
2. The error system does not produce errors with auxiliary information. This will require a partial redesign to provide more useful semantic information for debugging.

What was addressed:
1. The array-backed `CapabilityConjunctionSet` was replaced in favor of a `UIntSet`-backed `CapabilityTargetSets`. The design is described below.
Design:
* `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for 2 reasons: (a) a dictionary makes it easy to figure out which target is missing between two `CapabilityTargetSets`; (b) statically allocating an array requires the preprocessor to manually annotate which Capability is a target and link that Capability to an index, so a dictionary is required for lookup regardless of implementation.
* `CapabilityTargetSet` is an intermediate representation of all capabilities for a single `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary of all stage-specific capability sets, for fast lookup of the stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces the number of sets searched.
* `CapabilityStageSet` is an intermediate representation of all capabilities for a singular `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios (as an example): `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched.
* `UIntSet` is the main reason for the redesign for better performance and memory usage. All set operations only require a few operations, making all set logic trivial and with minimal cost to run. All algorithms were modified to focus around `UIntSet` operations.
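The layout described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `BitSet` stands in for `UIntSet`, `std::map` stands in for the compiler's `Dictionary`, and the `Atom` enum values, `supports()`, and all member names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical atom ids; the real list is generated from the capability definitions.
enum Atom : uint32_t { kGlsl, kHlsl, kVertex, kFragment, kGlsl130, kGlsl150, kAtomCount };

const std::size_t kWordCount = (kAtomCount + 63) / 64;

// Stand-in for UIntSet: one bit per atom, packed into 64-bit words.
struct BitSet
{
    std::array<uint64_t, kWordCount> words{};

    void add(Atom a) { words[a / 64] |= uint64_t(1) << (a % 64); }
    bool contains(Atom a) const { return ((words[a / 64] >> (a % 64)) & 1) != 0; }

    // Union costs one OR per word.
    void unionWith(const BitSet& other)
    {
        for (std::size_t i = 0; i < kWordCount; ++i)
            words[i] |= other.words[i];
    }

    // `this` is a subset of `other` if no bit set here is missing there.
    bool isSubsetOf(const BitSet& other) const
    {
        for (std::size_t i = 0; i < kWordCount; ++i)
            if ((words[i] & ~other.words[i]) != 0)
                return false;
        return true;
    }
};

// All disjoint atom sets for one stage atom.
struct StageSet { std::vector<BitSet> disjointSets; };
// All stage sets for one target atom, keyed by stage atom.
struct TargetSet { std::map<Atom, StageSet> stages; };
// All target sets, keyed by target atom.
typedef std::map<Atom, TargetSet> TargetSets;

// True if some set stored under (target, stage) contains every atom in `required`.
inline bool supports(const TargetSets& sets, Atom target, Atom stage, const BitSet& required)
{
    TargetSets::const_iterator t = sets.find(target);
    if (t == sets.end())
        return false; // target missing entirely
    std::map<Atom, StageSet>::const_iterator s = t->second.stages.find(stage);
    if (s == t->second.stages.end())
        return false; // stage missing for this target
    for (const BitSet& candidate : s->second.disjointSets)
        if (required.isSubsetOf(candidate))
            return true;
    return false;
}
```

The two dictionary lookups narrow the search to a single stage set before any bitwise comparison runs, which is the "reduces the number of sets searched" effect described in the bullets.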

2. Errors
* Semantic information is now better linked to the calling function, providing a function<->function_body connection when saving semantic information for errors.
* Missing targets now print errors much like other error codes, by finding code which could be a cause of the incompatibility.

What is missing:
1. Add non-naive support for non-stage-specific capabilities such as `{hlsl, _sm_5_0}`. Currently, non-stage-specific targets emulate the behavior by assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removing this behavior would avoid the redundant shader stage sets made at construction time (~80% of the new implementation's runtime). This is an addition, not an overhaul.
2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately since `UIntSet` is already not a performance constraint.
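To illustrate the per-stage emulation from item 1, here is a small sketch. It uses `std::set<std::string>` in place of the real atom sets, and `expandToEveryStage` is a hypothetical helper name, not a function from this PR.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> AtomSet;

// Emulates a stage-less capability set by copying it once per stage.
// The copies differ only in the stage atom, which is the redundancy
// that item 1 proposes to remove.
inline std::vector<AtomSet> expandToEveryStage(const AtomSet& stageless,
                                               const std::vector<std::string>& stages)
{
    std::vector<AtomSet> result;
    for (const std::string& stage : stages)
    {
        AtomSet copy = stageless;
        copy.insert(stage);
        result.push_back(copy);
    }
    return result;
}
```

With N stages, a single `{hlsl, _sm_5_0}` becomes N nearly identical sets, so construction cost grows with the stage count even though no per-stage information was supplied.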

Notes:
* UIntSet had implementation bugs which were fixed in this PR.
* The old capabilities system had bugs which were fixed in this PR when transforming to the new implementation.

* fix .natvis debug view

* Small optimizations I found while working on the addition

The AST building pass now looks like this:
1% = ~capabilitySet
2% = capabilitySet()
1.5% = capabilitySet::unionWith()
0.8% = capabilitySet::join()
1.5% = auxiliary info for debugging
~0.5-1% = extra visitor overhead

~5% total for the visitor
~6.5% for total runtime costs

* fix caps which were wrong but worked

* push minor syntax fix (still looking for why other tests fail)

* perf & bug fixes

1. isBetterForTarget was not properly remade for the case where `this` is empty and `that` is Invalid. This is the best case in this scenario.
2. Remade the serializer for stdlib generation. Faster (more direct) & cleaner code.

NOTE: did not address review comments

* fix glsl.meta caps error

* fixing findBest logic again & UIntSet wrapper

findBest was not checking for 'more specialized' targets & the element counter was flawed

* faster getElements algorithm + natvis for UIntSet + wrong warning

* type incompatibility of bitscanForward implementations
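For context on the last two items, a bit-scan-based element enumeration over a word-packed set might look like the sketch below. It is an assumption about the general technique, not the code in this PR: the function names mirror the commit message, but the signatures and the `std::vector<uint64_t>` storage are hypothetical. The MSVC intrinsic writes its result through an `unsigned long*` while the GCC/Clang builtin returns an `int`, which is the kind of type mismatch a wrapper has to hide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Index of the lowest set bit. Precondition: word != 0.
inline uint32_t bitscanForward(uint64_t word)
{
    assert(word != 0);
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
    unsigned long index = 0; // the intrinsic reports the index through a pointer
    _BitScanForward64(&index, word);
    return uint32_t(index);
#elif defined(__GNUC__) || defined(__clang__)
    return uint32_t(__builtin_ctzll(word)); // the builtin returns an int
#else
    // Portable fallback: shift until the lowest bit is set.
    uint32_t index = 0;
    while ((word & 1) == 0)
    {
        word >>= 1;
        ++index;
    }
    return index;
#endif
}

// Collects the index of every set bit, visiting only set bits rather than all 64 per word.
inline std::vector<uint32_t> getElements(const std::vector<uint64_t>& words)
{
    std::vector<uint32_t> out;
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        uint64_t w = words[i];
        while (w != 0)
        {
            out.push_back(uint32_t(i * 64) + bitscanForward(w));
            w &= w - 1; // clear the lowest set bit
        }
    }
    return out;
}
```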

* try to fix warnings again

* remove ptr for clang intrinsic

* add missing header

* ifdef to allow clang compile

* compiler hackery to fix up platform/type independent operations

* bracket

* fix MSVC error

* missing template

* change types out again

* changes to fix compiling

* adjustment to parameter for Clang/GCC

* added iterator to delay processing all atomSets of a CapabilitySet

* add a few missing consts

* ensure we never have more than 1 disjointSet

Added a wrapper + assert + union functionality to all possible disjoint sets. This was done in favor of a removal of the LinkedList for 2 reasons:
1. We still need 0-1 set functionality.
2. Might as well keep the code, just disallow the problematic functionality.
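A rough sketch of what such a wrapper could look like is shown here, assuming a plain `uint64_t` as a stand-in for one atom set. The class and member names are hypothetical, and the real wrapper sits on top of the existing list type rather than a `std::vector`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Holds zero or one atom set. Adding to a non-empty wrapper unions into
// the existing set instead of appending a second disjoint set.
class DisjointSets
{
public:
    void add(uint64_t atoms)
    {
        if (m_sets.empty())
            m_sets.push_back(atoms);
        else
            m_sets[0] |= atoms;
        assert(m_sets.size() <= 1); // the invariant this wrapper exists to enforce
    }

    bool empty() const { return m_sets.empty(); }

    uint64_t get() const
    {
        assert(m_sets.size() == 1);
        return m_sets[0];
    }

private:
    std::vector<uint64_t> m_sets; // container kept, but never grows past one element
};
```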

* address review comments

non linked-list refactor review comments addressed; add doc comments + remove redundant code

* comments + remove isValid for bool operator

* push removal of linkedlist for capabilities

* add missing break

* address review comments

minor adjustments of syntax

* push a fix to the `CapabilitySet({shader, missing target})` code

* quality + error

1. add iterator to UIntSet
2. do not specialize target_switch if profile is derived from case (GLSL_150 is not compatible with GLSL_400)

* fix target_switch erroring + temporarily remove UIntSet::Iterator

Temporarily remove UIntSet::Iterator. It will be added back afterwards; testing the code on CI first so I can multi-task fixing the UIntSet iterator.

* fix the UIntSet iterator

* Revert "fix the UIntSet iterator" temporarily to pull from master

* add metal error as per texture.slang

(it took a while to realize this was why things were breaking; the errors should likely be adjusted to reflect this)

* Rework UIntSet to have a template for output type

This is done so it is reasonable to debug the iterator output instead of just dealing with messy ints

Fix problems with the iterators implemented + invalid capabilities handling
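The idea of a templated output type can be sketched as below. This is an illustration under assumptions, not the PR's implementation: `TypedBitSet`, the `Cap` enum, and the single 64-bit word of storage (which limits values to 0..63) are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability ids, used only for this illustration.
enum class Cap : uint32_t { glsl = 0, hlsl = 1, vertex = 5 };

template <typename T>
class TypedBitSet
{
public:
    void add(T value) { m_bits |= uint64_t(1) << uint32_t(value); }

    class Iterator
    {
    public:
        explicit Iterator(uint64_t remaining) : m_remaining(remaining) {}

        // Yields the lowest remaining element as a T, so a debugger shows
        // an enumerator name rather than a raw integer.
        T operator*() const
        {
            assert(m_remaining != 0); // must not dereference end()
            uint32_t index = 0;
            while (((m_remaining >> index) & 1) == 0)
                ++index;
            return T(index);
        }

        Iterator& operator++()
        {
            m_remaining &= m_remaining - 1; // drop the lowest set bit
            return *this;
        }

        bool operator!=(const Iterator& other) const
        {
            return m_remaining != other.m_remaining;
        }

    private:
        uint64_t m_remaining;
    };

    Iterator begin() const { return Iterator(m_bits); }
    Iterator end() const { return Iterator(0); }

private:
    uint64_t m_bits = 0;
};
```

A range-based loop such as `for (Cap c : set)` then visits elements in ascending order of their underlying value.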

* removed incorrect `__target_switch` capability

barycentric was being used in anticipation of `profile glsl450`, but this does not expand into `GL_EXT_fragment_shader_barycentric`. It instead caused an error which was hidden during cross-compile.

* remove some uses of getElements

* remove undeclared_stage for now

* remove redundant code associated with `undeclared_stage`

* remove unused variable

* address review

Specifically of note: removed a static in a thread-dangerous scope. Now using a `const static` for read-only (thread-safe) data, which precompile steps generate.
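As a general illustration of that pattern (not the generated code itself), a read-only table can be shared across threads without locking because nothing writes to it after initialization. `AtomInfo`, `kAtomInfos`, and `getAtomInfo` are hypothetical names, and the table contents are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

struct AtomInfo
{
    const char* name;
    bool isTarget;
};

// Stand-in for a table that a precompile step would emit. It is constant
// and initialized from literals, so concurrent readers never race with a writer.
static const AtomInfo kAtomInfos[] = {
    {"glsl", true},
    {"hlsl", true},
    {"vertex", false},
};

inline const AtomInfo& getAtomInfo(std::size_t atom)
{
    assert(atom < sizeof(kAtomInfos) / sizeof(kAtomInfos[0]));
    return kAtomInfos[atom];
}
```

By contrast, a mutable `static` that is filled in lazily on first use would need synchronization if two threads could reach it at the same time.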

* move GLSL_150 capdef change to sm_4_1 (more accurate)

* address most review comments

did not address: https://github.com/shader-slang/slang/pull/4145#discussion_r1602256776

* revert incorrect code review suggestion

* push changes for all code review suggestions

1 parent 3b0de8b

Files
File	Mode	Size
.github
build
cmake
deps
docs
examples
external
extras
prelude
source
tests
tools
.editorconfig	-rw-r--r--	984 bytes
.gitattributes	-rw-r--r--	95 bytes
.gitignore	-rw-r--r--	1.5 KB
.gitmodules	-rw-r--r--	1.2 KB
.mailmap	-rw-r--r--	84 bytes
CMakeLists.txt	-rw-r--r--	22.1 KB
CMakePresets.json	-rw-r--r--	4.0 KB
CODE_OF_CONDUCT.md	-rw-r--r--	3.1 KB
CONTRIBUTION.md	-rw-r--r--	10.1 KB
LICENSE	-rw-r--r--	1.1 KB
README.md	-rw-r--r--	8.5 KB
github_build.sh	-rw-r--r--	1.6 KB
github_macos_build.sh	-rw-r--r--	1.3 KB
make-slang-tag-version.bat	-rw-r--r--	210 bytes
premake.bat	-rw-r--r--	120 bytes
premake5.lua	-rw-r--r--	66.1 KB
slang-com-helper.h	-rw-r--r--	4.9 KB
slang-com-ptr.h	-rw-r--r--	5.0 KB
slang-gfx.h	-rw-r--r--	88.2 KB
slang-tag-version.h	-rw-r--r--	36 bytes
slang.h	-rw-r--r--	208.6 KB
slang.sln	-rw-r--r--	51.5 KB
test.bat	-rw-r--r--	1.4 KB
test.ps1	-rw-r--r--	154 bytes
