Paul "LeoNerd" Evans
2024-11-18 17:17:32 UTC
Work on my branch[1] continues. I believe I now have a roughly
four-step plan that ends in higher-performance signatures everywhere,
and support for named parameters. It's four steps, because the first
two steps are needed just to shuffle a bunch of things around into a
better state to then add these things later. Those first two stages
don't appear to give any immediate wins, and without the explanation of
"it's needed for later stuff", would at first glance appear to make
things worse. That is why I'm explaining all of this now: hopefully it
helps those first two steps make sense.
My plan for these four steps is:
1) Create OP_MULTIPARAM as a rewrite-optimiser step
As discussed in previous emails, this is initially an optimisation
rewrite in the same style as OP_MULTIDEREF and OP_MULTICONCAT,
which collapses a bunch of existing OP_ARG* ops together into a
single op so it runs a bit faster. Crucially it also holds all the
logic in one place in one op, as will be needed for step 3 below.
2) Change the parser to emit OP_MULTIPARAM directly
Once the implementation in pp_multiparam is found to be nicely
sufficient and working, it's time to change the parser
implementation. Rather than having the in-core parser create and
emit the older style OP_ARG* ops only to immediately get rewritten,
it might as well accumulate all the information it needs then emit
a single OP_MULTIPARAM directly. This saves some of the
inefficiencies of performing that two-stage rewrite, and is also
necessary for step 4 below.
3) Add CVf_NOSNAIL and the no-snails entersub performance improvement
This is the step that finally achieves the long-awaited
performance improvement in signatured subs. The overall idea is
that pp_entersub does not need to bother copying the caller
arguments off the stack into the @_ array (called the "snail"
array, hence the title of this step). Instead, those arguments can
just be left on the stack, and pp_multiparam can consume them
directly from there. By removing a whole stage of argument
copying, as well as all the fiddling with another depth of the @_
array in the first place, lots of unnecessary work can be removed,
with the user-visible upshot being that calling a sub that uses
signatures is now faster than it used to be. Quite how much, I
don't yet have figures on, and will need to be measured once I get
to it.
The reason this step depends on step 1 is that consuming the
arguments directly off the stack is a bit more subtle than just
popping them off (primarily because they need to be handled in
left-to-right order, so they're not being popped, but handled
in place, in order). This logic is all much easier to handle if it
all lives entirely within one pp function, rather than being split
across many in the existing pp_argcheck + N * pp_argelem approach.
4) Add named parameters to parser + OP_MULTIPARAM
Finally, with all this done, it will be possible to get the parser
to recognise the additional syntax of named parameters, and to add
information into the OP_MULTIPARAM structure to implement them.
Because these don't exist as individual ops (and it wouldn't make
sense to have them as individual ops anyway), this requires the
parser to be able to emit the OP_MULTIPARAM directly, hence the
dependency on step 2. This step doesn't *strictly* depend on step
3, but if it were done before implementing no-snails, the work of
converting to no-snails would then have a lot more code to deal
with. Doing it in this order means less new code gets written only
to be changed shortly after.
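As a rough illustration of steps 1 and 3 above, here is a toy model in
standalone C. It is not code from the branch or from perl core: every
name in it (toy_sv, toy_multiparam, toy_bind_old, toy_multiparam_bind
and the rest) is hypothetical, an "SV" is reduced to a plain int, and
only mandatory and optional positional parameters are modelled. The
first half mimics the existing shape (copy the arguments into an
@_-like array, then one check plus one element fetch per parameter);
the second half mimics a single op that reads the arguments in place,
left to right, with no intermediate copy.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ARGS     8   /* arbitrary limits for this sketch only */
#define TOY_MAX_OPTIONAL 4

typedef int toy_sv;          /* stand-in for an SV; a real one is far richer */

/* Hypothetical: everything one signature needs, held by a single op. */
struct toy_multiparam {
    size_t n_required;                  /* mandatory positional params */
    size_t n_optional;                  /* optional positional params  */
    toy_sv defaults[TOY_MAX_OPTIONAL];  /* one default per optional    */
};

/* --- Existing shape: copy into a "snail" array, then many small ops --- */

size_t toy_copy_to_snail(const toy_sv *stack, size_t nargs, toy_sv *snail)
{
    for (size_t i = 0; i < nargs; i++)
        snail[i] = stack[i];            /* the copy that step 3 removes */
    return nargs;
}

int toy_argcheck(size_t nargs, size_t min, size_t max)
{
    return nargs >= min && nargs <= max;
}

toy_sv toy_argelem(const toy_sv *snail, size_t nargs, size_t idx, toy_sv dflt)
{
    return idx < nargs ? snail[idx] : dflt;
}

int toy_bind_old(const struct toy_multiparam *mp,
                 const toy_sv *stack, size_t nargs, toy_sv *pad)
{
    toy_sv snail[TOY_MAX_ARGS];
    size_t total = mp->n_required + mp->n_optional;

    assert(nargs <= TOY_MAX_ARGS && mp->n_optional <= TOY_MAX_OPTIONAL);
    size_t n = toy_copy_to_snail(stack, nargs, snail);
    if (!toy_argcheck(n, mp->n_required, total))
        return 0;                       /* arity error */
    for (size_t i = 0; i < total; i++) {
        /* one "argelem" step per parameter */
        toy_sv dflt = i >= mp->n_required
                    ? mp->defaults[i - mp->n_required] : 0;
        pad[i] = toy_argelem(snail, n, i, dflt);
    }
    return 1;
}

/* --- Single-op shape: read the arguments where they already are --- */

int toy_multiparam_bind(const struct toy_multiparam *mp,
                        const toy_sv *base, size_t nargs, toy_sv *pad)
{
    size_t total = mp->n_required + mp->n_optional;

    assert(mp->n_optional <= TOY_MAX_OPTIONAL);
    if (nargs < mp->n_required || nargs > total)
        return 0;                       /* arity error */
    /* 'base' is the first argument, not the top of the stack, so the
     * walk is left to right and nothing is popped along the way. */
    for (size_t i = 0; i < total; i++)
        pad[i] = i < nargs ? base[i] : mp->defaults[i - mp->n_required];
    return 1;
}
```

Both paths fill the "pad" identically for the same arguments; the only
difference in this model is that the second never builds the
intermediate array. Whether and by how much that helps in the real
interpreter is exactly what remains to be measured.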
As there are four steps here, each of which is quite a large piece of
work in its own right, I think I would like to submit a PR for each
step individually, rather than try to combine them all together in one
super-huge lump at the end. That should make them somewhat easier to
review.
The main reason I'm writing this email, therefore, is to point out that
this is the plan. This hopefully helps when reviewing the first two PRs,
as it gives some description of the overall direction and reason for
the changes. Otherwise, step 1 in particular would appear rather
pointless on its own, without this wider context. I'll point back at
this email thread in the PR descriptions themselves.
So far in all this plan I have put almost zero thought into how it
might interact with other CPAN modules, or even core's own 'class'
feature. It's possible some details will need a bit of adjusting or at
worst entirely rethinking in order to keep these all working.
Offhand, the things I can think of that would interact are:
  core's feature 'class'
  XS::Parse::Sublike, and hence:
    Signature::Attribute::Alias
    Signature::Attribute::Checked
    Object::Pad
    Future::AsyncAwait
All of those were created by me, so I am in a good place to think about
the relative trade-offs and details involved. It's entirely possible
that there could even be CPAN modules out there /not/ written by me,
that likewise would have an impact. If anyone knows of any, I would
love to hear about them so that I can keep them in mind for my plans
too, and try not to break them. I can't guarantee not to break
modules I don't know about though.. ;)
[1]: My work-in-progress branch:
https://github.com/leonerd/perl5/tree/faster-signatures
--
Paul "LeoNerd" Evans
***@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS