Lists, pre-increment, and post-increment

Mon Jun 25 21:45:00 2012

Lists, pre-increment and post-increment

A while back, lwm on irc.freenode.net #perl wondered at the following behaviour:

$ my $y = 20;
20

$ my @x = ($y, ++$y, $y++);
22
22
21

Which is indeed a little on the odd side.

My first reaction was "well don't rely on ordering in list construction" ... except I'm sure that perl -does- call things in order in such a situation, so I started to wonder what was going on.

LeoNerd theorised that maybe the list construction was happening in reverse order, but if we synthesise that:

$ my $y = 20;
20

$ $y++
20

$ ++$y
22

$ $y
22

you can see that the result would be (22, 22, 20) rather than (22, 22, 21) so clearly that can't be the case.

Also, adding do {} blocks around each entry like so:

$ my $y = 20;
20

$ my @x = (do { my $tmp = $y }, do { my $tmp = ++$y }, do { my $tmp = $y++ });
20
21
21

demonstrates that the expressions are evaluated in order, and that with the addition of the temporary variables the results are what we'd expect.

So why do we get the strange behaviour unless we introduce the temporaries?

The first step to understanding is to have a look at the optree for the code. The perl5 virtual machine is a stack based VM where each subroutine (or file or other unit of execution) is a tree of VM ops, and the B::Concise module will dump us a representation of them:

matthewt@sherlock$ perl -MO=Concise \
    -e'my $y = 20; my @x = ($y, ++$y, $y++)'

g  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
5     <2> sassign vKS/2 ->6
3        <$> const[IV 20] s ->4
4        <0> padsv[$y:1,3] sRM*/LVINTRO ->5
6     <;> nextstate(main 2 -e:1) v:{ ->7
f     <2> aassign[t4] vKS/COMMON ->g
-        <1> ex-list lKP ->d
7           <0> pushmark s ->8
8           <0> padsv[$y:1,3] l ->9
a           <1> preinc sK/1 ->b
9              <0> padsv[$y:1,3] sRM ->a
c           <1> postinc[t3] sK/1 ->d
b              <0> padsv[$y:1,3] sRM ->c
-        <1> ex-list lK ->f
d           <0> pushmark s ->e
e           <0> padav[@x:2,3] lRM*/LVINTRO ->f
-e syntax OK

Ok, so that's a fair bit of stuff. Let's ignore most of it for a second. There are two statements in our one liner, corresponding to two 'nextstate' ops -

  my $y = 20

2     <;> nextstate(main 1 -e:1) v:{ ->3

  my @x = ($y, ++$y, $y++)

6     <;> nextstate(main 2 -e:1) v:{ ->7

The last part of the output, '->7', indicates that the next op to be executed is the one marked 7, so let's have a look at that and its neighbours:

7           <0> pushmark s ->8
8           <0> padsv[$y:1,3] l ->9
a           <1> preinc sK/1 ->b
9              <0> padsv[$y:1,3] sRM ->a
c           <1> postinc[t3] sK/1 ->d
b              <0> padsv[$y:1,3] sRM ->c

There's three fetches of $y, and a preinc and postinc op - looks like this is the optree for the ($y, ++$y, $y++) part - roughly

list(
  padsv('$y'),
  preinc(padsv('$y')),
  postinc(padsv('$y')),
)

since the pushmark says "start of stuff" and the top of the stack indicates where it ends (approximately - imagine me waving my hands frantically at the people who know what they're talking about so they don't interrupt the narrative flow by being suddenly and unexpectedly accurate at me).

Now, we can see from the dump that this means the second padsv needs to be executed before the preinc, and the third before the postinc - we can confirm this by using Concise again but this time asking for execution order:

matthewt@sherlock$ perl -MO=Concise,-exec \
  -e'my $y = 20; my @x = ($y, ++$y, $y++)'

1  <0> enter 
2  <;> nextstate(main 1 -e:1) v:{
3  <$> const[IV 20] s
4  <0> padsv[$y:1,3] sRM*/LVINTRO
5  <2> sassign vKS/2
6  <;> nextstate(main 2 -e:1) v:{
7  <0> pushmark s
8  <0> padsv[$y:1,3] l
9  <0> padsv[$y:1,3] sRM
a  <1> preinc sK/1
b  <0> padsv[$y:1,3] sRM
c  <1> postinc[t3] sK/1
d  <0> pushmark s
e  <0> padav[@x:2,3] lRM*/LVINTRO
f  <2> aassign[t4] vKS/COMMON
g  <@> leave[1 ref] vKP/REFC
-e syntax OK

Pulling the section we care about out:

7  <0> pushmark s
8  <0> padsv[$y:1,3] l
9  <0> padsv[$y:1,3] sRM
a  <1> preinc sK/1
b  <0> padsv[$y:1,3] sRM
c  <1> postinc[t3] sK/1

Now, let's look at what happens to the stack

  Stack: ...

8  <0> padsv[$y:1,3] l

  Stack: ... 20

9  <0> padsv[$y:1,3] sRM

  Stack: ... 20 20

a  <1> preinc sK/1

... wait.

If that was literally just 20, how does preinc ++ the value in $y?

Best look at the implementation, I guess. To find this, we crack out our handy copy of the perl source code (you don't have one? shame on you!) and grep for pp_preinc - a perl5 VM op is also known as a 'push/pop function' due to it being a stack system so the C function for a given op is generally findable as pp_${opname}. We find it in pp_hot.c:

PP(pp_preinc)
{
    dVAR; dSP;
    if (SvTYPE(TOPs) >= SVt_PVAV || isGV_with_GP(TOPs))
        DIE(aTHX_ "%s", PL_no_modify);
    if (!SvREADONLY(TOPs) && SvIOK_notUV(TOPs) && !SvNOK(TOPs) && !SvPOK(TOPs)
        && SvIVX(TOPs) != IV_MAX)
    {
        SvIV_set(TOPs, SvIVX(TOPs) + 1);
        SvFLAGS(TOPs) &= ~(SVp_NOK|SVp_POK);
    }
    else /* Do all the PERL_PRESERVE_IVUV conditionals in sv_inc */
        sv_inc(TOPs);
    SvSETMAGIC(TOPs);
    return NORMAL;
}

Sweeping the details under a handy carpet, there are two code paths here:

        SvIV_set(TOPs, SvIVX(TOPs) + 1);

        sv_inc(TOPs);

both of which are doing in-place modification of TOPs, the top element of the stack.

So! What we have isn't 20, it's actually another (C-level) pointer to the same SV as $y, i.e.:

  Stack: ...

8  <0> padsv[$y:1,3] l

  Stack: ... $y              ($y == 20)

9  <0> padsv[$y:1,3] sRM

  Stack: ... $y $y           ($y == 20)

a  <1> preinc sK/1

  Stack: ... $y $y           ($y == 21)

which means ... that postinc can't be doing a toplevel modification directly. Looking at the postinc code we see:

PP(pp_postinc)
{
    dVAR; dSP; dTARGET;
    if (SvTYPE(TOPs) >= SVt_PVAV || isGV_with_GP(TOPs))
        DIE(aTHX_ "%s", PL_no_modify);
    sv_setsv(TARG, TOPs);
    if (!SvREADONLY(TOPs) && SvIOK_notUV(TOPs) && !SvNOK(TOPs) && !SvPOK(TOPs)
        && SvIVX(TOPs) != IV_MAX)
    {
        SvIV_set(TOPs, SvIVX(TOPs) + 1);
        SvFLAGS(TOPs) &= ~(SVp_NOK|SVp_POK);
    }
    else
        sv_inc(TOPs);
    SvSETMAGIC(TOPs);
    /* special case for undef: see thread at 2003-03/msg00536.html in archive */
    if (!SvOK(TARG))
        sv_setiv(TARG, 0);
    SETs(TARG);
    return NORMAL;
}

the crucial parts of which are (again with the two paths to do the increment):

    sv_setsv(TARG, TOPs);

        SvIV_set(TOPs, SvIVX(TOPs) + 1);

        sv_inc(TOPs);

    SETs(TARG);

so postinc is copying the value into TARG, incrementing the same as preinc does, but then setting the top entry of the stack to TARG - so the stack becomes:

  Stack: ... $y $y           ($y == 21)

b  <0> padsv[$y:1,3] sRM

  Stack: ... $y $y           ($y == 21)

c  <1> postinc[t3] sK/1

  Stack: ... $y $y $targ      ($y == 22, $targ == 21)

and there we have it - 22, 22, 21 - exactly what gets assigned to @x (and the = operation copies the values into new SVs so they're no longer aliased).

Note that the t3 in postinc[t3] is a name derived from how many TARG temporary variables have been allocated so far - which isn't massively relevant to us at this point but can be interesting when reading through Concise output.

Of course, the correct answer when faced with ($y, ++$y, $y++) or equivalent is still to pat the author gently on the head and then rewrite it, but the stack aliasing behaviour can be rather important. For example, consider:

sub roll_sum_1 {
  my $total = shift;
  return ($total, map { $total += $_; $total } @_);
}

sub roll_sum_2 {
  my $first = my $total = shift;
  return ($first, map { $total += $_; $total } @_);
}

Try calling both of these on, say, (5, 10, 5, 10). How does the answer differ, and why? (bonus points for working it out without the aid of the perl5 VM).

If you've enjoyed this little exploration, you'll be pleased to know that it's far from the only such article I have in mind ... but I make no guarantees as to when I'll get to writing another one, sorry.

In the mean time - happy perl hacking!

-- mst, out.