Paul "LeoNerd" Evans
2025-01-17 20:33:46 UTC
(CC'ed to Dave M in particular, because there's a lot of overlap with
some signatures unit tests I believe you wrote)
After a bit of a Christmas-time pause, I am now back and working on
finishing my `faster-signatures` branch, the first of the 4 steps to
hopefully get us to named parameters in signatures, plus overall
performance benefits to all signatured code.
My branch now attempts to unconditionally apply the OP_MULTIPARAM
rewrite to any signatured sub, and in doing so manages to break a few of
the t/perf/opcount.t tests (which is entirely to be expected), and in
addition breaks a few of the tests in t/op/signatures.t. Aside from
these, all of core perl passes just fine - though that *may* just be
because we don't use signatures very much even within core. Perhaps
against CPAN there'd be additional issues, but hopefully nothing too
major I can't work out.
The core tests that break here are the entire range of t150 to t162 (I
won't link to exact line numbers as they may move in future):
https://github.com/Perl/perl5/blob/blead/t/op/signatures.t
The scenarios that fail are all to do with what happens if code in a
defaulting expression attempts to modify some other state somewhere,
and fall into a few different categories.
* Modifications of @_
  Scenarios t150, t151 and the range t156 to t159 all attempt some
  variation of modifying @_ during processing arguments, in order to
  see that later parameter variables that would continue to read from
  @_ indeed see that modification.
* Modifications of (lexical) parameter variables by captured closures
  Scenarios t152 to t155 all perform a weird trick, whereby a named
  sub with a signature defines within it another named sub that
  captures one of those lexical variables defined in the signature of
  the outer one. That inner function modifies the captured lexical
  variable in some way. This inner function is called as a defaulting
  expression by an earlier parameter of its outer one *before* the
  parameter that the inner one captured is processed.
      sub outer ( $x = inner(), $y ) {
          sub inner { $y = ... }
          # test logic here
      }
  ((yes; it's a nontrivial scenario to imagine, you'd best go read
  the code in the real unit test file to be clear ;) ))
* Reuse of values in captured parameter variables from lexical
  closures
  Scenarios t160 and t161 perform an even weirder trick. Here, an
  outer signatured function is defined, which contains inside it
  another named function which lexically captures one of the
  parameter variables. The inner function modifies that variable,
  then calls the outer function by passing in elements of that
  parameter variable, in order to test reuse of values into that
  variable by its own signature handling.
  ((again, this is really weird; go read the code ;) ))
Currently in my branch, all three of these categories of tests are
failing. In each case, I really don't feel that a lot of effort should
be put into trying to fix it.
The entire point of the OP_MULTIPARAM change is to get to the no-snails
situation where we're no longer populating @_ at all, so any tests for
the interaction of @_ with signatures are necessarily going to become
broken. I vote we just get rid of those tests in that first category.
The tests in the second and third categories are *already* doing
something weird that ought to provoke a warning anyway - namely, having
an outer (named) function that declares a lexical variable that's then
captured by an inner named function. It's analogous to:
    $ perl -Mv5.36 -e 'sub outer { my $var; sub inner { undef $var; } }'
    Variable "$var" will not stay shared at -e line 1.
The behaviour seen in any of the unit tests in these second two
categories isn't repeatable in general in a real program, because of
that "not stay shared" problem. In more detail: the problem is that the
inner function only captures the $var from outer at CvDEPTH==1; i.e.
the first call. If outer recurses any deeper, then inner won't see the
separate variables belonging to those deeper calls. So quite apart from
the fact that I've changed behaviour, I'm not entirely sure that these
tests are asserting on behaviour we'd want to claim is stable anyway.
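As a rough illustration of that "not stay shared" problem, here is a minimal sketch using plain (non-signatured) subs. It reuses the names `outer`, `inner` and `$var` from the one-liner above; the `$val` argument is made up for the example. It shows the symptom with two sequential calls rather than with recursion, so it is only an approximation of the CvDEPTH detail described here, not a copy of anything in t/op/signatures.t.

```perl
use strict;
use warnings;
no warnings 'closure';   # silence the "will not stay shared" warning

sub outer {
    my ($val) = @_;
    my $var = $val;
    # A named sub is compiled only once, so it binds to the $var that
    # belongs to the first call of outer() and never rebinds.
    sub inner { return $var }
    return inner();
}

print outer(1), "\n";   # prints 1
print outer(2), "\n";   # still prints 1: inner() kept the first $var
```

Any test whose result depends on `inner` and `outer` agreeing about `$var` therefore only holds for that first call.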
In more detail on why they fail: tests in the second category (t152 to
t155) fail because of the changed order of argument processing. I
accidentally foreshadowed this in one of my previous emails:
https://www.nntp.perl.org/group/perl.perl5.porters/2024/11/msg269155.html
> There is one small detail about this rewrite that occurs to me is
> *technically* a user-visible change, but the details and
> circumstances around are so obscure it makes me feel like it's one
> we don't need to care about.
The test, as written, uses the side-effect of a function invoked in the
defaulting expression of the first parameter variable to cause a
side-effect of populating the second parameter variable (which is a
slurpy array or hash). The test then asserts that this variable gets
emptied again because of parameter processing. That's because in the
currently-existing signature handling code, each parameter variable is
processed individually from start to end. In my new OP_MULTIPARAM code,
all of the assignments from argument values happen first, and only then
afterwards are defaulting expressions run. This means that in the new
code, the side-effect on that variable persists, and the asserting code
does not see it as empty, as it had been expecting. This causes the
test to now fail.
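The difference in ordering can be modelled in plain Perl. The sketch below is a hypothetical model only, not the real OP_MULTIPARAM implementation: `old_order`, `new_order` and `$poke` are invented names, and ordinary lexicals stand in for the parameters of something like `( $x = poke(), @rest )`. Each function returns how many elements the slurpy ends up holding.

```perl
use strict;
use warnings;

# Model of the existing behaviour: each parameter is fully processed
# (argument or default) before moving on to the next one.
sub old_order {
    my @args = @_;
    my ($x, @rest);
    my $poke = sub { push @rest, "side-effect"; return "default" };
    $x = @args ? shift @args : $poke->();   # parameter 1, default runs now
    @rest = @args;                          # parameter 2 overwrites the slurpy
    return scalar @rest;
}

# Model of the new behaviour: all assignments from arguments happen
# first, and only afterwards are defaulting expressions run.
sub new_order {
    my @args = @_;
    my ($x, @rest);
    my $poke = sub { push @rest, "side-effect"; return "default" };
    my $have_x = @args > 0;
    $x = shift @args if $have_x;            # phase 1: arguments
    @rest = @args;
    $x = $poke->() unless $have_x;          # phase 2: defaults
    return scalar @rest;
}

print old_order(), "\n";   # prints 0: the side-effect was wiped out
print new_order(), "\n";   # prints 1: the side-effect persists
```

When every parameter receives an argument, no default runs and the two orderings give the same result.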
Tests in the third category (t160 and t161) not only rely on this same
inverted lexical sharing with inner defined named subs, but also are
testing refcounts-on-the-stack behaviour. The tests currently fail for
me, but they would pass with a refcounted stack. I could insert some
code to artificially bump the refcount of those stack values if RC
stack is not defined, but this code would only be useful for working
around this *really really odd* inverted nesting case, and would be just
pointless slowdown in the 99.9{many more nines}% of all other cases.
At this point, I don't have a good feel for what is the best course of
action on the latter two test categories. Those tests ought not
outright break (currently some of them throw Perl internals errors and
I need to fix that), but I think the exact semantics that are currently
being asserted on are quite fragile and not reflective of any
guarantees we'd actually want to make for real-world code.
How does anyone else feel on these?
--
Paul "LeoNerd" Evans
***@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS