swh:1:snp:af87cd67498ef4fe47c76ed3e7caffe5b61facaf
Revision d56150aae496979dab2ba82105adcaecc555fdcc authored by Vincenzo Eduardo Padulano on 03 October 2023, 11:16:49 UTC, committed by Vincenzo Eduardo Padulano on 03 October 2023, 17:15:19 UTC
This is a reproducer test for sporadic CI failures such as the following:

```text
========================================================================== FAILURES ===========================================================================
_______________________________________________________ TestDefinePerSample.test_definepersample_simple _______________________________________________________

self = <check_definepersample.TestDefinePerSample object at 0x13e0c6190>, connection = <Client: 'tcp://127.0.0.1:55253' processes=2 threads=2, memory=4.00 GiB>

    def test_definepersample_simple(self, connection):
        """
        Test DefinePerSample operation on three samples using a predefined
        string of operations.
        """

        df = Dask.RDataFrame(self.maintreename, self.filenames, daskclient=connection)

        # Associate a number to each sample
        definepersample_code = """
        if(rdfsampleinfo_.Contains(\"{}\")) return 1;
        else if (rdfsampleinfo_.Contains(\"{}\")) return 2;
        else if (rdfsampleinfo_.Contains(\"{}\")) return 3;
        else return 0;
        """.format(*self.samples)

        df1 = df.DefinePerSample("sampleid", definepersample_code)

        # Filter by the sample number. Each filtered dataframe should contain
        # 10 entries, equal to the number of entries per sample
        samplescounts = [df1.Filter("sampleid == {}".format(id)).Count() for id in [1, 2, 3]]

        for count in samplescounts:
>           assert count.GetValue() == 10
E           AssertionError

check_definepersample.py:62: AssertionError
-------------------------------------------------------------------- Captured stderr setup --------------------------------------------------------------------
RDataFrame::Run: event loop was interrupted
2023-09-08 14:51:57,002 - distributed.worker - WARNING - Compute Failed
Key:       dask_mapper-a92ac090-9407-4849-921a-d187ceffd3ed
Function:  dask_mapper
args:      (EmptySourceRange(exec_id=ExecutionIdentifier(rdf_uuid=UUID('5d67c0a7-58f4-488d-8e44-bb5aa0fac480'), graph_uuid=UUID('69353465-0a90-4eef-b101-a1eb93f0c13a')), id=0, start=0, end=50))
kwargs:    {}
Exception: "RuntimeError('C++ exception thrown:\\n\\truntime_error: Graph was applied to a mix of scalar values and collections. This is not supported.')"
```

The failure happens when Dask assigns two tasks to the same worker in the test
that calls DefinePerSample. The Count operation reported the wrong number of
entries because the DefinePerSample callback was previously deleted at the end
of every event loop, and in particular at the end of the first task's event
loop. Consequently, when the second task started and picked up the same
RDataFrame to clone the action, the DefinePerSample callback was never actually
called.
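To make the failure mode concrete, here is a toy Python model of it. It assumes that both tasks on a worker share one dataframe object and that a per-sample value which was never computed falls back to 0. `ToyDataFrame`, `define_per_sample`, `run_event_loop`, `sample_id` and the sample names are all hypothetical and do not reflect ROOT's actual RDataFrame internals.

```python
from typing import Callable, Dict, List


class ToyDataFrame:
    """Hypothetical stand-in for an RDataFrame shared by tasks on one worker."""

    def __init__(self, clear_callbacks_after_loop: bool) -> None:
        self.clear_callbacks_after_loop = clear_callbacks_after_loop
        # Callbacks that (re)compute a per-sample column when a sample starts.
        self.sample_callbacks: List[Callable[[str], None]] = []
        self.columns: Dict[str, int] = {}

    def define_per_sample(self, name: str, func: Callable[[str], int]) -> None:
        # Register a callback that sets column `name` from the sample name.
        def callback(sample: str) -> None:
            self.columns[name] = func(sample)

        self.sample_callbacks.append(callback)

    def run_event_loop(self, sample: str, n_entries: int, wanted_id: int) -> int:
        """Count entries of `sample` whose 'sampleid' equals `wanted_id`."""
        # Assumption: an uncomputed per-sample value defaults to 0.
        self.columns["sampleid"] = 0
        for cb in self.sample_callbacks:
            cb(sample)
        count = sum(1 for _ in range(n_entries) if self.columns["sampleid"] == wanted_id)
        if self.clear_callbacks_after_loop:
            # Old behaviour: callbacks are deleted once the event loop ends.
            self.sample_callbacks.clear()
        return count


def sample_id(sample: str) -> int:
    # Mirrors the test's mapping of sample names to numbers (names hypothetical).
    for i, name in enumerate(["sample1", "sample2", "sample3"], start=1):
        if name in sample:
            return i
    return 0


def run_two_tasks_on_one_worker(clear: bool) -> List[int]:
    df = ToyDataFrame(clear_callbacks_after_loop=clear)
    df.define_per_sample("sampleid", sample_id)
    # Two tasks scheduled on the same worker reuse the same dataframe object.
    return [
        df.run_event_loop("sample1.root", 10, wanted_id=1),
        df.run_event_loop("sample2.root", 10, wanted_id=2),
    ]


print(run_two_tasks_on_one_worker(clear=True))   # [10, 0]
print(run_two_tasks_on_one_worker(clear=False))  # [10, 10]
```

With the callbacks cleared after the first loop, the second task counts 0 entries instead of 10, analogous to the failing `count.GetValue() == 10` assertion in the log; keeping the callbacks across event loops gives 10 for both tasks.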
1 parent eb911c6
Tip revision: 6c9118fb23c981c28a53dc215c68f2be00c04e3e authored by Jonas Rembser on 12 April 2024, 19:22:15 UTC
[RF] Enable `roofit_multiprocess` on the CI