Suggestions wanted: visibility of XSUB functions

Discussion:

(too old to reply)

Paul "LeoNerd" Evans

2024-10-09 00:09:36 UTC

I have a (draft) PR in progress to implement a performance boost on
code of the form

foreach my ($idx, $val) (builtin::indexed @array) {
...
}

by using the same CXt_LOOP_ARY optimisation that regular
single-variable foreach on a single array uses. Since this keeps track
of the array index anyway, we get to use it "for free" by assigning it
into the $idx var, and thus avoid the need for an entire temporary
list. Seems good.

There's one problem with this PR currently which is why it's still in
draft. In order to make it work, the code that builds the optree for
a `for` loop (which lives in `op.c`) needs to be able to check that the
CV target of an `OP_ENTERSUB` that is its iteration-list expression
definitely points to the builtin::indexed function. That function lives
in `builtin.c`, a different C file. We don't currently give XSUB
functions a prototype in a .h file, so they can't be seen.

Hence, the horrible hack in the file that is this line:

https://github.com/Perl/perl5/pull/22641/files#diff-e70a9c5c0a9ba08a5e8ac7323e5f796732577f22a3845c573f1e347aaccfb755R174-R175

I'm not currently aware of any existing precedent for doing this
another nicer way. In various other places of code where (typically)
some opchecker in op.c needs to be aware of specific optree shapes, it
can use the opcodes of the individual ops. But builtin *functions* are
just that - functions. Regular functions called with regular
OP_ENTERSUB. So it can't check the opcode. It has to rely on the CV
address instead.

I could just declare the prototype for this function in some .h file
somewhere, and call it done. But it'd be a weird oddball of a case we
don't do anywhere else.

Before I undraft the PR, does anyone have any better ideas?

--
Paul "LeoNerd" Evans

***@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

Tony Cook

2024-10-09 02:22:21 UTC

Permalink

Post by Paul "LeoNerd" Evans
I could just declare the prototype for this function in some .h file
somewhere, and call it done. But it'd be a weird oddball of a case we
don't do anywhere else.

You could just add it to embed.fnc.

It may end up with a Perl_ name and no longer use the XS() macro, but
I don't think that's a big deal.

Tony

Paul "LeoNerd" Evans

2024-10-09 19:30:49 UTC

Permalink

On Wed, 9 Oct 2024 13:22:21 +1100

Post by Tony Cook
You could just add it to embed.fnc.
It may end up with a Perl_ name and no longer use the XS() macro, but
I don't think that's a big deal.

Ahyes, that seems to work well in practice.

PR updated.

--
Paul "LeoNerd" Evans

***@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

Dave Mitchell

2024-10-09 09:29:33 UTC

Permalink

Post by Paul "LeoNerd" Evans
I have a (draft) PR in progress to implement a performance boost on
code of the form
...
}
by using the same CXt_LOOP_ARY optimisation that regular
single-variable foreach on a single array uses.

[snip]

Post by Paul "LeoNerd" Evans
needs to be able to check that the
CV target of an `OP_ENTERSUB`

Sub calls are normally based on looking up the CV in the GV at *runtime*,
so its its possible to override subs. For example:

@a = qw(a b c);
*builtin::indexed = sub (@) { qw(x y z) };
print builtin::indexed(@a), "\n";

which prints "xyz";

Can this proposed optimisation cope with the sub being swapped out? If
not, do we document anywhere that you can't always rely on it for
builtin:::indexed?

--
You're only as old as you look.

Paul "LeoNerd" Evans

2024-10-09 19:02:04 UTC

Permalink

On Wed, 9 Oct 2024 10:29:33 +0100

Post by Dave Mitchell
Sub calls are normally based on looking up the CV in the GV at
@a = qw(a b c);
which prints "xyz";
Can this proposed optimisation cope with the sub being swapped out? If
not, do we document anywhere that you can't always rely on it for
builtin:::indexed?

It's already the case that most of the simple `builtin::` functions are
replaced by direct opcodes at compiletime, so such a replacement in the
GV already doesn't work.

E.g.

$ perl -E '*builtin::reftype = sub { "BOO" }; say builtin::reftype []'
Built-in function 'builtin::reftype' is experimental at -e line 1.
Prototype mismatch: sub builtin::reftype ($) vs none at -e line 1.
ARRAY

This is because most of those simple function CVs have a callchecker
that replaces the OP_ENTERSUB with a direct opcode implementation of
the behaviour. This makes them perform much faster at runtime, avoiding
that entersub overhead.

We should probably call that out more specifically in documentation.

I think it's not entirely unreasonable that any of the functions in the
"builtin::" space don't allow being replaced in such a fashion. In fact
perhaps we could probably go further in setting the READONLY flag on all
these GVs, to prevent that kind of thing?

--
Paul "LeoNerd" Evans

***@leonerd.org.uk
http://www.leonerd.org.uk/ | https://metacpan.org/author/PEVANS

Martijn Lievaart via perl5-porters

2024-10-10 19:38:20 UTC

Permalink

Post by Paul "LeoNerd" Evans
On Wed, 9 Oct 2024 10:29:33 +0100

It's already the case that most of the simple `builtin::` functions are
replaced by direct opcodes at compiletime, so such a replacement in the
GV already doesn't work.

Then replace this one with an opcode too?

HTH,

M4