I tried adding post-increment to CPython

I implemented post-increment in CPython. In this installment I give an overview of the feature and summarize what I learned along the way. In the next installment, the implementation edition, I walk through how it was implemented in chronological order.

Series links

The series has three parts plus an extra edition:
- Overview and summary of adding post-increment to CPython
- Implementation of CPython with post-increment
- List of all changes when adding post-increment to CPython
- Extra edition of adding post-increment to CPython

Slides

First, here is a slide deck that briefly summarizes the results.

Specification

It works like this.


>>> i=0
>>> i++
0
>>> i
1


>>> lst=[x for x in range(5)]
>>> lst
[0, 1, 2, 3, 4]
>>> lst[0]++
0
>>> lst
[1, 1, 2, 3, 4]

>>> class cls:
...     a=5
...
>>> cls_obj=cls()
>>> cls_obj.a
5
>>> cls_obj.a++
5
>>> cls_obj.a
6

Increment works not only on plain variables but also on list elements and attributes (member variables). As with post-increment in C, the expression evaluates to the value before the increment, and the target is updated at the same time.
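The patch itself is C code inside CPython. To pin down the intended semantics, here is a plain-Python sketch that emulates the three REPL sessions above. The helper names post_inc_name, post_inc_item and post_inc_attr are hypothetical and are not part of the patch. Each helper reads the old value, stores old + 1 back into the same target, and returns the old value.

```python
from typing import Any, Dict, MutableSequence


def post_inc_name(namespace: Dict[str, Any], name: str) -> Any:
    """Emulate `name++`: store namespace[name] + 1 and return the old value."""
    old = namespace[name]
    namespace[name] = old + 1
    return old


def post_inc_item(seq: MutableSequence[Any], index: int) -> Any:
    """Emulate `seq[index]++` for a list element."""
    old = seq[index]
    seq[index] = old + 1
    return old


def post_inc_attr(obj: Any, attr: str) -> Any:
    """Emulate `obj.attr++` via getattr/setattr."""
    old = getattr(obj, attr)
    setattr(obj, attr, old + 1)
    return old


ns = {"i": 0}
print(post_inc_name(ns, "i"), ns["i"])  # 0 1

lst = [x for x in range(5)]
print(post_inc_item(lst, 0), lst)  # 0 [1, 1, 2, 3, 4]


class cls:
    a = 5


cls_obj = cls()
print(post_inc_attr(cls_obj, "a"), cls_obj.a, cls.a)  # 5 6 5
```

In this sketch the attribute case writes the new value onto the instance, so cls.a itself stays 5. That is the same thing cls_obj.a += 1 would do.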

What I learned

Here is what I found while reading the CPython 3.5.0 source code to implement increment.

How CPython runs a script

A Python script goes through the following stages.

Lexical analysis
- Include/tokenizer.h, Parser/tokenizer.c: split the source into tokens.

Parsing
- Grammar/Grammar: the Python grammar, written in EBNF. During make, the parser automaton that builds the CST (concrete syntax tree) is generated automatically from this file.
- Parser/Python.asdl: ASDL stands for Abstract Syntax Description Language. This file defines which AST node types exist (statements, expressions, operators and so on) and what fields they have. The C code for the AST nodes is generated from it.
- Python/ast.c: builds the AST (abstract syntax tree) from the CST produced by the parser that was generated from Grammar/Grammar.
- Modules/parsermodule.c: checks that the generated tree has the expected shape.

Compilation
- Lib/opcode.py: used to generate Include/opcode.h, which defines the opcodes, such as the STORE_* family.
- Python/compile.c: compiles the tree into bytecode.

Execution
- Python/ceval.c: the heart of the Python virtual machine, which reads and executes bytecode. The Python VM is a stack machine: it performs operations on a stack rather than on registers. When I looked into it, I found that the JVM, the .NET Framework's VES and Ruby's YARV are also stack machines.
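Because ceval.c is a stack machine, one way to think about post-increment is as a short opcode sequence. You load the target, duplicate it, add one and store the result back. The duplicated old value stays on the stack as the value of the expression. The toy VM below is a hypothetical sketch of that idea. It is not ceval.c, and it is not necessarily the bytecode my implementation emits. It only borrows opcode names that exist in CPython 3.5 (LOAD_NAME, DUP_TOP, LOAD_CONST, BINARY_ADD, STORE_NAME, RETURN_VALUE).

```python
from typing import Any, Dict, List, Tuple

Instr = Tuple[str, Any]


def run(code: List[Instr], names: Dict[str, Any]) -> Any:
    """Execute (opname, arg) pairs on a value stack, returning on RETURN_VALUE."""
    stack: List[Any] = []
    for op, arg in code:
        if op == "LOAD_NAME":
            stack.append(names[arg])
        elif op == "LOAD_CONST":
            stack.append(arg)
        elif op == "DUP_TOP":
            stack.append(stack[-1])
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE_NAME":
            names[arg] = stack.pop()
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    raise RuntimeError("code ended without RETURN_VALUE")


# Hypothetical opcode sequence for the expression `i++`
POST_INC_I: List[Instr] = [
    ("LOAD_NAME", "i"),    # stack: [old]
    ("DUP_TOP", None),     # stack: [old, old]
    ("LOAD_CONST", 1),     # stack: [old, old, 1]
    ("BINARY_ADD", None),  # stack: [old, old + 1]
    ("STORE_NAME", "i"),   # stack: [old], and i = old + 1
    ("RETURN_VALUE", None),
]

names = {"i": 0}
print(run(POST_INC_I, names), names["i"])  # 0 1
```

For comparison, dis.dis("i += 1") on a stock interpreter shows a similar load, add, store pattern. The difference is that nothing is left on the stack there, because augmented assignment is a statement rather than an expression.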

How to change the Python grammar

The Python Developer's Guide has a page called "23. Changing CPython's Grammar", which is a good reference. It is rather terse, though, so I have summarized some more concrete cases.

- Renaming a keyword, e.g. "I want to write foreach instead of for": sometimes editing Grammar/Grammar is enough. Changing 'for' to ('for' | 'foreach') works. But if you rename 'elif', the build crashes on an assert in Python/ast.c, so it seems ast.c needs changes as well (unconfirmed).
- Using a symbol that Python does not use yet, e.g. "I want to write ! instead of for": in addition to editing Grammar/Grammar as above, define a new token in Include/tokenizer.h and Parser/tokenizer.c.
- Adding syntax with a symbol that already has another meaning, e.g. allowing list comprehensions to be written as [x + 1 | x <- range(10)] (| already means bitwise OR). The tokenizer hands its tokens one at a time to the automatically generated parser. My idea was to decide before that point whether a given | is a bitwise OR or the comprehension separator. I gave up because building that from scratch would take too long. Arguably the parser should make this decision in the first place, but would that stop the Python grammar, which is LL(1), from being LL(1)? It seems I would have to read the parser generator, which I did not touch, to understand what the parser can handle. I never fully worked this out, so in this experiment I cheated and used $ instead of |.
- Defining syntactic sugar: besides tokenizer.* and Grammar/Grammar, it seems best to make Python/ast.c produce the same tree as the equivalent existing syntax.
- Adding syntax that does not fit the existing expression and statement framework, or that has no existing counterpart: besides the tokenizer, Grammar/Grammar and so on, compile.c also has to emit suitable bytecode. If necessary, define new opcodes and how they are executed.
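One way to see why reusing | is awkward is to feed the proposed comprehension syntax to an unmodified parser. The stdlib tokenizer splits <- into < and -. As a result, the whole line is already a valid list display holding one comparison, ((x + 1) | x) < (-range(10)). It only fails at runtime, because -range(10) is a TypeError. This does not settle the LL(1) question above. It does show that the stock grammar cannot reject or reroute this input just by looking at the | token. The helper parse_expr is a hypothetical name used only for this sketch.

```python
import ast


def parse_expr(src: str) -> ast.expr:
    """Parse a single expression with the stock parser and return its root node."""
    return ast.parse(src, mode="eval").body


# The proposed comprehension syntax, fed to an unmodified parser.
node = parse_expr("[x + 1 | x <- range(10)]")
elt = node.elts[0]

print(type(node).__name__)                 # List
print(type(elt).__name__)                  # Compare
print(type(elt.left.op).__name__)          # BitOr
print(type(elt.ops[0]).__name__)           # Lt
print(type(elt.comparators[0].op).__name__)  # USub
```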

Note that if you make a change that breaks existing syntax, the standard library will fail to compile during make install.

With my increment implementation, the library contained at least one expression written as ++ 2, and it raised an error. Why anyone would write it that way in the first place is a mystery.
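Here is a plausible explanation of that error. In stock Python, ++ is not a token. The tokenizer emits two separate + operators, so a ++ 2 parses as a + (+2). Once ++ becomes a single token, a longest-match tokenizer turns the same text into a, ++, 2, which the new grammar does not accept after an operand. The sketch below uses the stdlib tokenize and ast modules for the stock behavior. It uses a hypothetical regex-based toy tokenizer (toy_tokens, my own assumption and not Parser/tokenizer.c) for the modified behavior.

```python
import ast
import io
import re
import tokenize
from typing import List


def stock_op_tokens(src: str) -> List[str]:
    """Return the OP token strings the standard tokenizer produces for src."""
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return [t.string for t in toks if t.type == tokenize.OP]


# Hypothetical modified tokenizer: alternatives are tried in order,
# so "++" is matched before "+" (longest match first).
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|\+\+|\+|.)")


def toy_tokens(src: str) -> List[str]:
    """Split src into tokens, treating "++" as one token and dropping whitespace."""
    return [m.group(1) for m in TOKEN_RE.finditer(src) if m.group(1).strip()]


print(stock_op_tokens("x ++ 2"))  # ['+', '+']
print(toy_tokens("x ++ 2"))       # ['x', '++', '2']

expr = ast.parse("x ++ 2", mode="eval").body
print(type(expr.op).__name__, type(expr.right.op).__name__)  # Add UAdd
print(eval("1 ++ 2"))  # 3
```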


compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
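The if_stmt rule above is flat, with ('elif' test ':' suite)* repeated. Python's AST, however, has no elif node: If only has test, body and orelse fields. Python/ast.c therefore has to fold each elif into an If nested inside the previous orelse. That special handling is presumably why renaming 'elif' runs into asserts in ast.c, although I have not confirmed it. The stdlib ast module makes the nesting visible. if_chain_depth is a hypothetical helper for this sketch.

```python
import ast

SRC = """\
if a:
    x = 1
elif b:
    x = 2
else:
    x = 3
"""


def if_chain_depth(src: str) -> int:
    """Count If nodes chained through orelse, starting at the first statement."""
    node = ast.parse(src).body[0]
    depth = 0
    while isinstance(node, ast.If):
        depth += 1
        # An elif appears as the single If inside the previous If's orelse.
        node = node.orelse[0] if len(node.orelse) == 1 else None
    return depth


print(if_chain_depth(SRC))  # 2
```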
