Post by Max via perl5-porters

Dear Perl 5 Porters,

I am updating from Perl 5.26.1 to 5.38.2. These versions behave
differently when interpolating strings.
construct the string thereafter returning a wrong value for the first
@{[ $Var++ ]} value at that point of interpolation.

############### TestInterpolateString.pl ####################
$Var = 0;
$String = $String . "\n";
1;
######################################################
>
0 1 1 2
> /opt/perl/5.38.2/bin/perl TestInterpolateString.pl
Interpolated string: 0 2 1 2
> /opt/perl/5.34.0/bin/perl TestInterpolateString.pl
Interpolated string: 0 2 1 2
######################################################
Would you agree that this is a bug in 5.38.2 or is the change intended?
I do prefer the 5.26.1 behavior, as it is more logical and allows
think that in future Perl versions the behavior will fall back to that
of Perl 5.26.1?

Thank you in advance.

Sincerely,
Max
The behavior you mentioned changed between 5.26 and 5.28 in the
following commit, which introduced the multiconcat op:
#####
$ gitshowf e839e6ed99c6b25aee589f56bb58de2f8fa00f41
commit e839e6ed99c6b25aee589f56bb58de2f8fa00f41
Author: David Mitchell <***@iabyn.nospamdeletethisbit.com>
AuthorDate: Tue Aug 8 18:42:14 2017 +0100
Commit: David Mitchell <***@iabyn.nospamdeletethisbit.com>
CommitDate: Tue Oct 31 15:31:26 2017 +0000
Add OP_MULTICONCAT op

Allow multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN
or OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can
make things a *lot* faster: 4x or more.

In more detail: it will optimise into a single OP_MULTICONCAT, most
expressions of the form

    LHS RHS

where LHS is one of

    (empty)
    my $lexical =
    $lexical =
    $lexical .=
    expression =
    expression .=

and RHS is one of

    (A . B . C . ...)    where A,B,C etc are expressions and/or
                         string constants

    "aAbBc..."           where a,A,b,B etc are expressions and/or
                         string constants

    sprintf "..%s..%s..", A,B,..
                         where the format is a constant string
                         containing only '%s' and '%%' elements,
                         and A,B, etc are scalar expressions (so
                         only a fixed, compile-time-known number of
                         args: no arrays or list context function
                         calls etc)
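To make the sprintf rule concrete, here is a minimal Perl sketch of the eligibility check as stated above (this is illustration, not code from the commit): a constant format qualifies only if every directive is '%s' or '%%', and it then splits into the constant segments that surround each '%s'. The helper name split_simple_format() is hypothetical, and the sketch mirrors only the stated rule, not the actual logic in S_maybe_multiconcat().

```perl
use strict;
use warnings;

# Hypothetical helper (not from perl's source): returns an array ref of the
# constant segments around each '%s' if $fmt uses only '%s' and '%%',
# or undef (in scalar context) if any other directive appears.
sub split_simple_format {
    my ($fmt) = @_;
    my @segments = ('');
    while (length $fmt) {
        if ($fmt =~ s/\A%s//) {
            push @segments, '';        # an argument goes between segments
        }
        elsif ($fmt =~ s/\A%%//) {
            $segments[-1] .= '%';      # '%%' is a literal percent sign
        }
        elsif ($fmt =~ s/\A%//) {
            return;                    # any other directive: not eligible
        }
        else {
            $fmt =~ s/\A([^%]+)//;     # run of plain text
            $segments[-1] .= $1;
        }
    }
    return \@segments;
}

my $ok  = split_simple_format('x=%s, y=%s (100%%)');
my $bad = split_simple_format('n=%d');

print join('|', @$ok), "\n";                           # x=|, y=| (100%)
print defined $bad ? "eligible\n" : "not eligible\n";   # not eligible
```

Note that a format with N '%s' directives always yields N+1 segments, which matches the "one more constant segment than arguments" shape of the aux data described further down.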

It doesn't optimise other forms, such as

    ($a . $b) . ($c. $d)

    ((($a .= $b) .= $c) .= $d);

(although sub-parts of those expressions might be converted to an
OP_MULTICONCAT). This is partly because it would be hard to maintain the
correct ordering of tie or overload calls.

The compiler uses heuristics to determine when to convert: in general,
expressions involving a single OP_CONCAT aren't converted, unless some
other saving can be made, for example if an OP_CONST can be eliminated, or
in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply
OPpTARGET_MY to, but OP_CONST can't.

The multiconcat op is of type UNOP_AUX, with the op_aux structure directly
holding a pointer to a single constant char* string plus a list of segment
lengths. So for

    "a=$a b=$b\n";

the constant string is "a= b=\n", and the segment lengths are (2,3,1).
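The constant-string-plus-lengths layout can be modelled in a few lines of Perl. This is a simplified sketch, not pp_multiconcat() itself: the sub name assemble() is hypothetical, it assumes exactly one more segment than there are arguments (as in the (2,3,1) example), and it ignores the utf8 variants, ties and overloading.

```perl
use strict;
use warnings;

# Hypothetical model of the aux data: one constant string plus segment
# lengths. Segment i is emitted before argument i; the last segment
# follows the final argument.
sub assemble {
    my ($consts, $seglens, @args) = @_;
    die "need one more segment than args\n" unless @$seglens == @args + 1;

    my ($pos, $out) = (0, '');
    for my $i (0 .. $#$seglens) {
        $out .= substr($consts, $pos, $seglens->[$i]);   # constant segment
        $pos += $seglens->[$i];
        $out .= $args[$i] if $i < @args;                 # runtime value
    }
    return $out;
}

# Models "a=$a b=$b\n" with $a = 1 and $b = 2.
my $s = assemble("a= b=\n", [2, 3, 1], 1, 2);
print $s;    # prints "a=1 b=2" followed by a newline
```

The same sub reproduces the later `multiconcat("a=, b=\n",2,4,1)` example if called as `assemble("a=, b=\n", [2, 4, 1], $a, $b)`.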

If the constant string has different non-utf8 and utf8 representations
(such as "\x80") then both variants are pre-computed and stored in the aux
struct, along with two sets of segment lengths.

For all the above LHS types, any SASSIGN op is optimised away. For a LHS
of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too.

For example where $a and $b are lexical vars, this statement:

    my $c = "a=$a, b=$b\n";

formerly compiled to

    const[PV "a="] s
    padsv[$a:1,3] s
    concat[t4] sK/2
    const[PV ", b="] s
    concat[t5] sKS/2
    padsv[$b:1,3] s
    concat[t6] sKS/2
    const[PV "\n"] s
    concat[t7] sKS/2
    padsv[$c:2,3] sRM*/LVINTRO
    sassign vKS/2

and now compiles to:

    padsv[$a:1,3] s
    padsv[$b:1,3] s
    multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY

In terms of how much faster it is, this code:

    my $a = "the quick brown fox jumps over the lazy dog";
    my $b = "to be, or not to be; sorry, what was the question again?";

    for my $i (1..10_000_000) {
        my $c = "a=$a, b=$b\n";
    }

runs 2.7 times faster, and if you throw utf8 mixtures in it gets even
better. This loop runs 4 times faster:

    my $s;
    my $a = "ab\x{100}cde";
    my $b = "fghij";
    my $c = "\x{101}klmn";

    for my $i (1..10_000_000) {
        $s = "\x{100}wxyz";
        $s .= "foo=$a bar=$b baz=$c";
    }

The main ways in which OP_MULTICONCAT gains its speed are:

* any OP_CONSTs are eliminated, and the constant bits (already in the
  right encoding) are copied directly from the constant string attached
  to the op's aux structure.

* It optimises away any SASSIGN op, and possibly a PADSV op on the LHS,
  in all cases; OP_CONCAT only did this in very limited circumstances.

* Because it has a holistic view of the entire concatenation expression,
  it can do the whole thing in one efficient go, rather than creating and
  copying intermediate results. pp_multiconcat() goes to considerable
  efforts to avoid inefficiencies. For example it will only SvGROW() the
  target once, and to the exact size needed, no matter what mix of utf8
  and non-utf8 appear on the LHS and RHS. It never allocates any
  temporary SVs except possibly in the case of tie or overloading.

* It does all its own appending and utf8 handling rather than calling
  out to functions like sv_catsv().

* It's very good at handling the LHS appearing on the RHS; for example in

      $x = "abcd";
      $x = "-$x-$x-";

  It will do roughly the equivalent of the following (where targ is $x);

      SvPV_force(targ);
      SvGROW(targ, 11);
      p = SvPVX(targ);
      Move(p, p+1, 4, char);
      Copy("-", p, 1, char);
      Copy("-", p+5, 1, char);
      Copy(p+1, p+6, 4, char);
      Copy("-", p+10, 1, char);
      SvCUR(targ) = 11;
      p[11] = '\0';

  Formerly, pp_concat would have used multiple PADTMPs or temporary SVs
  to handle situations like that.
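The offsets in the C outline above can be checked at the Perl level. The sketch below replays the same Move()/Copy() steps using lvalue substr() on $x. Unlike pp_multiconcat() it does create a temporary copy for each right-hand substr(), so it only demonstrates that the offsets produce "-abcd-abcd-", not the memory behaviour.

```perl
use strict;
use warnings;

my $x = "abcd";
my $n = length $x;                                # 4

# SvGROW(targ, 11): pad the buffer to 2*4 + 3 = 11 characters
$x .= "\0" x ($n + 3);

substr($x, 1, $n)         = substr($x, 0, $n);    # Move(p, p+1, 4, char)
substr($x, 0, 1)          = '-';                  # Copy("-", p, 1, char)
substr($x, $n + 1, 1)     = '-';                  # Copy("-", p+5, 1, char)
substr($x, $n + 2, $n)    = substr($x, 1, $n);    # Copy(p+1, p+6, 4, char)
substr($x, 2 * $n + 2, 1) = '-';                  # Copy("-", p+10, 1, char)

print "$x\n";                                     # -abcd-abcd-
```

The Move() has to come first: it shifts the original "abcd" one position right before the leading "-" overwrites p[0], which is what lets the operation work inside the target's own buffer.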

The code is quite big; both S_maybe_multiconcat() and pp_multiconcat()
(the main compile-time and runtime parts of the implementation) are over
700 lines each. It turns out that when you combine multiple ops, the
number of edge cases grows exponentially ;-)
#####
We certainly haven't had this described as a bug until now, but I'll let
Dave Mitchell and others comment further.
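To illustrate how an op-level optimisation can change a visible result, here is a simplified Perl model of one plausible mechanism. This is an assumption, not a description of perl's internals: pairwise concatenation stringifies each operand as soon as it is appended, whereas a single multi-operand concatenation evaluates all operands first and stringifies them at the end, so a side effect such as $v++ in a later operand becomes visible in an earlier plain "$v". The operand list models the hypothetical string "$v-@{[ $v++ ]}", not Max's script, and the sub names are invented for the sketch.

```perl
use strict;
use warnings;

# Simplified, hypothetical model of two concatenation strategies.
# Each operand is a code ref returning a *reference* to a scalar,
# standing in for an op that pushes the variable itself, not a copy.

sub pairwise_concat {                 # stringify each operand immediately
    my $result = '';
    for my $operand (@_) {
        my $ref = $operand->();
        $result .= "$$ref";
    }
    return $result;
}

sub deferred_concat {                 # evaluate all operands, then stringify
    my @refs = map { $_->() } @_;
    return join '', map { "$$_" } @refs;
}

# Operands for the hypothetical string "$v-@{[ $v++ ]}".
sub operands_for {
    my ($vref) = @_;
    return (
        sub { $vref },                        # "$v": the variable itself
        sub { \'-' },                         # constant "-"
        sub { my $old = $$vref++; \$old },    # @{[ $v++ ]}: copy of old value
    );
}

my $v1 = 0;
my $eager = pairwise_concat(operands_for(\$v1));

my $v2 = 0;
my $late = deferred_concat(operands_for(\$v2));

print "pairwise: $eager\n";           # pairwise: 0-0
print "deferred: $late\n";            # deferred: 1-0
```

Both strategies perform the same increment; only the moment at which "$v" is read differs. perlop's section on auto-increment notes that, as in C, Perl does not define when the increment happens relative to other uses of the variable in the same statement, so code that both modifies and reads $Var inside one interpolated string should not rely on either ordering.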