Update on signatures, named params, no-snails, ...

Discussion:

Paul "LeoNerd" Evans

2024-11-18 17:17:32 UTC

Work on my branch[1] continues. I believe I now have a roughly
four-step plan that ends in higher-performance signatures everywhere,
and support for named parameters. It's four steps, because the first
two steps are needed just to shuffle a bunch of things around into a
better state to then add these things later. Those first two steps
don't appear to give any immediate wins and, without the explanation of
"it's needed for later stuff", would at first glance appear to make
things worse. That is why I'm explaining all of this: hopefully it
helps those first two steps make sense.

My plan for these four steps is:

1) Create OP_MULTIPARAM as a rewrite-optimiser step

As discussed in previous emails, this is initially an optimisation
rewrite in the same style as OP_MULTIDEREF and OP_MULTICONCAT,
which collapses a bunch of existing OP_ARG* ops together into a
single op so it runs a bit faster. Crucially it also holds all the
logic in one place in one op, as will be needed for step 3 below.
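
To see the ops that this step would collapse, here is a minimal sketch. It assumes a released perl with signatures enabled (5.36 or later), not the branch. The helper name op_names is hypothetical; it relies only on the core B module.

```perl
use v5.36;
use B ();

# A sub with one mandatory parameter, one defaulted parameter and a slurpy.
sub example ($x, $y = 10, @rest) { $x + $y + @rest }

# op_names() is a hypothetical helper: it follows the op_next chain from
# the sub's first op and records each op's name. It does not follow the
# alternative branches of logical ops.
sub op_names ($code) {
    my @names;
    for (my $op = B::svref_2object($code)->START; $$op; $op = $op->next) {
        push @names, $op->name;
    }
    return @names;
}

say join ' ', op_names(\&example);
```

On released perls the printed list should include argcheck, argelem and argdefelem, one group per parameter. A perl built from the branch would presumably show a single combined op in their place.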

2) Change the parser to emit OP_MULTIPARAM directly

Once the implementation in pp_multiparam is found to be nicely
sufficient and working, it's time to change the parser
implementation. Rather than having the in-core parser create and
emit the older style OP_ARG* ops only to immediately get rewritten,
it might as well accumulate all the information it needs then emit
a single OP_MULTIPARAM directly. This saves some of the
inefficiencies of performing that two-stage rewrite, and is also
necessary for step 4 below.

3) Add CVf_NOSNAIL and the no-snails entersub performance improvement

This is the step that finally achieves the long-awaited
performance improvement in signatured subs. The overall idea is
that pp_entersub does not need to bother copying the caller
arguments off the stack into the @_ array (called the "snail"
array, hence the title of this step). Instead, those arguments can
just be left on the stack, and pp_multiparam can consume them
directly from there. By removing a whole stage of argument
copying, as well as all the fiddling with another depth of the @_
array in the first place, lots of unnecessary work can be removed,
with the user-visible upshot being that calling a sub that uses
signatures is now faster than it used to be. I don't yet have
figures on quite how much; that will need to be measured once I
get to it.

The reason this step depends on step 1 is that consuming the
arguments directly off the stack is a bit more subtle than just
popping them off (primarily because they need to be handled in
left-to-right order, so they're not being popped, but handled
inplace in order). This logic is all much easier to handle if it
all lives entirely within one pp function, rather than being split
across many in the existing pp_argcheck + N * pp_argelem approach.
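
The difference between popping and in-place handling can be sketched with a toy model. This is only an illustration in Perl, with a plain array standing in for the argument stack; the real logic lives in C inside the pp functions, and both helper names here are hypothetical.

```perl
use v5.36;

# Popping visits the arguments right-to-left, which is the wrong order.
sub consume_by_popping ($stack, $argc) {
    return map { pop @$stack } 1 .. $argc;
}

# In-place handling: find where the arguments start, visit them
# left-to-right where they sit, then discard them all in one go.
sub consume_in_place ($stack, $argc) {
    my $base = @$stack - $argc;
    my @seen = @{$stack}[ $base .. $#$stack ];
    splice @$stack, $base;
    return @seen;
}

my @stack_a = ('caller-temp', 'a', 'b', 'c');
my @stack_b = @stack_a;

say join ' ', consume_by_popping(\@stack_a, 3);   # c b a
say join ' ', consume_in_place(\@stack_b, 3);     # a b c
```

Either way the caller's own temporaries below the arguments are left untouched; only the visiting order differs.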

4) Add named parameters to parser + OP_MULTIPARAM

Finally, with all this done, it will be possible to get the parser
to recognise the additional syntax of named parameters, and to add
information into the OP_MULTIPARAM structure to implement them.
Because these don't exist as individual ops (and it wouldn't make
sense to have them as individual ops anyway), this requires the
parser to be able to emit the OP_MULTIPARAM directly, hence
depending on step 2. This step doesn't *strictly* depend on step
3, but if it's done before implementing no-snails, it just means
the work of converting to no-snails now has a lot more code to work
with. Doing it in this order means less having to write new code
only to change it shortly after.
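
For contrast, this is roughly how named arguments have to be emulated with signatures today, using a slurpy hash and unpacking it by hand. It is a sketch only: the sub name and option names are made up for illustration, and it says nothing about what the eventual named-parameter syntax will look like.

```perl
use v5.36;

# connect_to() is a hypothetical example: one positional parameter,
# followed by name => value pairs collected into a slurpy hash.
sub connect_to ($host, %opts) {
    my $port    = delete $opts{port}    // 80;
    my $timeout = delete $opts{timeout} // 30;

    # Anything left over is a name the sub does not know about.
    die "Unrecognised named arguments: @{[ sort keys %opts ]}\n" if %opts;

    return "$host:$port (timeout ${timeout}s)";
}

say connect_to('example.org', port => 8080);
```

All of the defaulting and unknown-name checking here is ordinary Perl code run on every call, which is the kind of work that could move into the op itself.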

As there are four steps here, each of which is quite a large piece of
work on its own, I think I would like to submit a PR for each step
individually, rather than try to combine them all together in one
super-huge lump at the end. This makes them somewhat easier to
review.

The main reason I'm writing this email therefore, is to point out that
this is the plan. This hopefully helps when reviewing the first two PRs,
as it gives some description of the overall direction and reason for
the changes. Otherwise, step 1 in particular would appear rather
pointless on its own, without this wider context. I'll point back at
this email thread in the PR descriptions themselves.

So far in all this plan I have put almost zero thought into how it
might interact with other CPAN modules, or even core's own 'class'
feature. It's possible some details will need a bit of adjusting or at
worst entirely rethinking in order to keep these all working.

Offhand, the things I can think of that would interact are:

core's feature 'class'

XS::Parse::Sublike, and hence:

Signature::Attribute::Alias
Signature::Attribute::Checked

Object::Pad

Future::AsyncAwait

All of those were created by me, so I am in a good place to think about
the relative trade-offs and details involved. It's entirely possible
that there could even be CPAN modules out there /not/ written by me,
that likewise would have an impact. If anyone knows about any I would
love to hear about them as then I can keep them in mind for my plans
too, and ensure not to break things. I can't guarantee not to break
modules I don't know about though.. ;)

[1]: My work-in-progress branch:
https://github.com/leonerd/perl5/tree/faster-signatures

--
Paul "LeoNerd" Evans

***@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

Paul "LeoNerd" Evans

2024-11-18 17:51:48 UTC

On Mon, 18 Nov 2024 17:17:32 +0000

Post by Paul "LeoNerd" Evans
1) Create OP_MULTIPARAM as a rewrite-optimiser step
As discussed in previous emails, this is initially an
optimisation rewrite in the same style as OP_MULTIDEREF and
OP_MULTICONCAT, which collapses a bunch of existing OP_ARG* ops
together into a single op so it runs a bit faster. Crucially it also
holds all the logic in one place in one op, as will be needed for
step 3 below.

One small detail about this rewrite occurs to me that is
*technically* a user-visible change, but the details and circumstances
around it are so obscure that I feel it's one we don't need to care
about.

This concerns our dear friend, `get` magic on scalars (often visible to
Perl code as FETCH on tied scalars) and its interaction with defaulting
expressions in parameters.

Right now in the existing signatures implementation, and also in my new
plan, any `get` magic on argument values is evaluated strictly
left-to-right in the order the caller passed the values in. That won't
change. It is also the case, for both existing and new implementations,
that any defaulting expressions defined on parameters are evaluated in
left-to-right order of how the parameter variables were written in the
subroutine.

However, in the current implementation, all processing for each
successive argument is handled together, before moving on to the next
one: assign its value if the caller passed one in or, if not, evaluate
the defaulting expression, then move to the next. This means that if
any caller-passed argument value had `get` magic, that argument value's
magic is invoked before any processing of the next argument, even to
evaluate its defaulting expression. That is, while both argument values
and defaulting expressions are handled left-to-right within themselves,
they are all interleaved together.

Consider this little time graph of the sequence that things may happen,
if we pass three argument values that all have `get` magic, into a
function that defines three parameters that all have defaulting
expressions. Time goes ---> that way.

get-ARG1             get-ARG2             get-ARG3
         eval-PARAM1          eval-PARAM2          eval-PARAM3

In the new implementation, all the handling of caller-passed values and
assigning them into variables will happen *first*, before any defaulting
expressions are evaluated. This means in the new approach, *every*
`get` magic on every caller-passed argument will have been invoked,
before *any* of the parameter defaulting expressions get to run.

Now our timing sequence will look like:

get-ARG1 get-ARG2 get-ARG3
                           eval-PARAM1 eval-PARAM2 eval-PARAM3

This technically makes a difference because now even the defaulting
expression for parameter 1 happens *after* any side-effects of the
`get` magic on arguments 2 and 3, whereas in the old code it would not
have. We've preserved the evaluation order of `get`s, and the evaluation
order of the param expressions, just changed the order they interleave
in.

"But wait", I hear you cry - "it surely doesn't matter, because we only
run defaulting expressions for parameters where the caller didn't even
pass in an argument value? We'd only run the expression for param1 if
there wasn't even an arg1 present, let alone arg2 or arg3. If those
args aren't present, they definitely don't have `get` magic on them. So
it can't matter."

Aha, this isn't quite true though. As well as `=` for defaulting
expressions, don't forget that we have `//=` (and `||=`) as well. It
could be the case that arg1 is present but undefined, and then arg2 has
`get` magic that has some side-effect. In the new implementation, the
side-effect of arg2's `get` magic will run before param1's defaulting
expression gets to run, whereas in the existing one, it does not.
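
The difference between the three defaulting operators can be seen in a short example. It assumes perl 5.38 or later, where `//=` and `||=` became available in signatures; the sub names are hypothetical.

```perl
use v5.38;

sub with_assign ($x =   'default') { $x // '(undef)' }
sub with_dor    ($x //= 'default') { $x }
sub with_or     ($x ||= 'default') { $x }

say with_assign();        # default   (no argument passed at all)
say with_assign(undef);   # (undef)   (argument present, so no defaulting)
say with_dor(undef);      # default   (present but undefined)
say with_dor(0);          # 0         (defined, so kept)
say with_or(0);           # default   (present but false)
```

Only the `//=` and `||=` forms can run a defaulting expression for a parameter whose argument was actually passed, which is what lets later arguments exist at the same time.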

All in all, this obscure corner-case requires three things:

1) argument values that have `get` magic with side-effects
2) parameter defaulting expressions that are sensitive to the
side-effects of the argument values that come *AFTER* their
corresponding value
3) these parameter expressions use the `//=` or `||=` operator
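
A minimal sketch that meets all three conditions, assuming perl 5.38 or later. The Noisy class and the demo sub are hypothetical and exist only to log the order of events.

```perl
use v5.38;

package Noisy {
    # A tied scalar whose FETCH has a visible side-effect: it logs.
    sub TIESCALAR ($class, $name, $value, $log) {
        return bless { name => $name, value => $value, log => $log }, $class;
    }
    sub FETCH ($self) {
        push $self->{log}->@*, "get-$self->{name}";
        return $self->{value};
    }
    sub STORE ($self, $new) { $self->{value} = $new }
}

my @log;

# The first parameter's //= expression also logs when it is evaluated.
sub demo ($x //= do { push @log, "eval-PARAM1"; 1 }, $y = 2) {
    return ($x, $y);
}

tie my $arg2, 'Noisy', 'ARG2', 20, \@log;

# arg1 is present but undefined; arg2 carries `get` magic.
demo(undef, $arg2);
say join ' ', @log;
```

Going by the description above, the existing implementation should log eval-PARAM1 before get-ARG2 and the new one should log them the other way round. The values the sub receives are the same in both cases.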

Owing to the fact that this is a pretty rare combination of
circumstances, and you'd have to be doing some pretty weird things in
both your `get` magic and your param defaulting expression to even
notice it, and that the visible outcome is simply a change in order of
the `get` magic's side-effects, I vote we say we just don't care about
it. It's a quirk of the implementation that users should not rely on.

If anyone has a different opinion here, please let me know and supply a
compelling counter-example that demonstrates a situation that we should
in fact care about. Otherwise, I will continue implementing in this
direction.

--
Paul "LeoNerd" Evans

***@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

Paul "LeoNerd" Evans

2025-01-20 21:35:53 UTC

On Mon, 18 Nov 2024 17:17:32 +0000

Post by Paul "LeoNerd" Evans
Work on my branch[1] continues.

...

Post by Paul "LeoNerd" Evans
1) Create OP_MULTIPARAM as a rewrite-optimiser step
As discussed in previous emails, this is initially an
optimisation rewrite in the same style as OP_MULTIDEREF and
OP_MULTICONCAT, which collapses a bunch of existing OP_ARG* ops
together into a single op so it runs a bit faster. Crucially it
also holds all the logic in one place in one op, as will be
needed for step 3 below.

...

Post by Paul "LeoNerd" Evans
https://github.com/leonerd/perl5/tree/faster-signatures

This branch has now progressed to the point that it passes all the core
tests even with the rewrites now being unconditional (I've removed the
temporary named feature for it). I did have to change a couple of
unit-tests relating to signatures, but overall it seems good.

I've installed it locally and tested it against a bunch of my
signature-using code, and so far no bad effects. I'd be interested to
hear results from anyone else testing it out too.

** Folks: Help me test this please. **

One thing I haven't fully worked out yet is the interaction between
these things and OP_METHSTART used by `method` subs. I have the start
of an experimental solution in Object::Pad but I need to continue
working on that, and see how to make it more of a core shape. I suspect

Post by Paul "LeoNerd" Evans
2) Change the parser to emit OP_MULTIPARAM directly
Once the implementation in pp_multiparam is found to be nicely
sufficient and working, it's time to change the parser
implementation. Rather than having the in-core parser create and
emit the older style OP_ARG* ops only to immediately get
rewritten, it might as well accumulate all the information it
needs then emit a single OP_MULTIPARAM directly. This saves some
of the inefficiencies of performing that two-stage rewrite, and
is also necessary for step 4 below.

And, the more I think about it, is at least related to a better
solution for OP_METHSTART and `$self` handling in method subs. This one
will relate to a fair bit of adjustment in the way perly.y parses sub
signatures, and I think will warrant its own message thread, so I'll
write that up probably tomorrow.

--
Paul "LeoNerd" Evans

***@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS