Precompiled files in Python


Over the past couple of months, our team has been working hard to build a migration engine in Python for backing up drives using encryption mounts. Recently, I was asked a very interesting question about Python:

“I often find .pyc files being generated after I run Python scripts. What are .pyc files? Why are precompiled files generated when Python is interpreted? Can the .pyc files be deleted?”

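Python is usually described as an interpreted language rather than a compiled one, but CPython, the reference implementation, does both: it first compiles source code into bytecode, the internal representation of a Python program, and then interprets that bytecode. When a module is imported, its bytecode is cached in a .pyc file (in Python 3, inside a __pycache__ directory next to the source, with a name such as module.cpython-312.pyc), so the next import can skip recompiling from source. This makes loading faster, not execution: the bytecode runs at the same speed either way. The script you run directly is compiled too, but its bytecode is not cached. Python 2 also wrote .pyo files for optimized bytecode; since Python 3.5 these are ordinary .pyc files with an opt- tag in the name. Because .pyc files are only a cache, they can be deleted safely: Python recompiles the source and writes them again the next time the module is imported.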
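A minimal sketch of where CPython 3 puts the cache and what happens when it is deleted. The module name greeting and the temporary directory are hypothetical, made up for this example; importlib.util.cache_from_source and py_compile.compile are standard-library functions.

```python
import importlib
import importlib.util
import os
import py_compile
import sys
import tempfile

# Hypothetical module "greeting", written to a throwaway directory.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "greeting.py")
with open(source_path, "w") as f:
    f.write("MESSAGE = 'hello from bytecode'\n")

# Where CPython 3 caches this module's bytecode, typically
# <workdir>/__pycache__/greeting.cpython-312.pyc on CPython 3.12.
cache_path = importlib.util.cache_from_source(source_path)
print(cache_path)

# Compile ahead of time; a normal import writes the same kind of file.
py_compile.compile(source_path, cfile=cache_path, doraise=True)
print(os.path.exists(cache_path))  # True

# The cache is disposable: delete it and the import still works,
# because Python falls back to compiling greeting.py again.
os.remove(cache_path)
sys.path.insert(0, workdir)
importlib.invalidate_caches()
greeting = importlib.import_module("greeting")
print(greeting.MESSAGE)  # hello from bytecode
```

To stop Python from writing .pyc files at all, run it with python -B or set the PYTHONDONTWRITEBYTECODE environment variable.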

The CPython interpreter actually has the structure of a classic compiler. When you invoke the “python” command, the raw source code is scanned into tokens, the tokens are parsed into a tree representing the logical structure of the program, and that tree is transformed into bytecode. Finally, the bytecode is executed by a virtual machine. The process is shown in the figure below:

Figure: Flow of data in a typical parser
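The four stages can be walked through end to end with standard-library modules. This is a sketch for illustration: ast.parse runs its own tokenizer internally, so the explicit tokenize call is only there to make the first stage visible, and the two-line source string is made up.

```python
import ast
import io
import tokenize

source = "x = 2 + 3\nprint(x * 10)\n"

# 1. Lexical analysis: source text -> tokens
#    (ast.parse re-tokenizes internally; this call just makes the stage visible)
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
print([tok.string for tok in tokens if tok.string.strip()])

# 2. Parsing: tokens -> abstract syntax tree
tree = ast.parse(source)
print(type(tree).__name__)  # Module

# 3. Code generation: AST -> code object containing bytecode
code = compile(tree, "<example>", "exec")
print(type(code).__name__)  # code

# 4. Execution: the virtual machine runs the bytecode
namespace = {}
exec(code, namespace)  # prints 50
```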

Lexical analysis

A tokenizer breaks the input text into a series of tokens; this step is known as “lexical analysis” or “tokenizing”. Tokens are identified according to the lexer's rules, which may use regular expressions, specific character sequences (sometimes called flags), and separating characters called delimiters. Special characters, including punctuation, are commonly used to identify tokens because of their natural role in written and programming languages. Python's tokenizer also tracks indentation, emitting INDENT and DEDENT tokens where a block begins and ends.
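The standard-library tokenize module exposes a token stream that closely mirrors what the interpreter's tokenizer produces, except that it also keeps COMMENT and NL tokens, which the compiler discards. A short sketch on a made-up function:

```python
import io
import token
import tokenize

source = (
    "def square(n):\n"
    "    return n ** 2  # comment\n"
)

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    # tok.type is an integer code; token.tok_name maps it to a readable name
    print(f"{token.tok_name[tok.type]:<10} {tok.string!r}")
```

The output includes an INDENT token before return and a DEDENT token after the body, which is how indentation becomes block structure.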

Parser

In computing, a parser is the component of an interpreter or compiler that takes input and builds a data structure, such as a parse tree, that represents the structure of that input, checking the syntax along the way. Python's parser takes the token stream as input and, following the rules of the Python grammar, produces an Abstract Syntax Tree (AST). Since Python 3.9, CPython's PEG-based parser builds the AST directly; earlier versions first built a concrete parse tree and then converted it.
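The ast module returns the same kind of tree the compiler works from. In this sketch (the assignment is a made-up example), the parentheses in the source disappear and survive only as nesting in the tree, and invalid syntax is rejected while the tree is built:

```python
import ast

tree = ast.parse("total = price * (1 + tax)")
print(ast.dump(tree, indent=2))  # indent= needs Python 3.9+

assign = tree.body[0]
print(type(assign).__name__)              # Assign
print(type(assign.value).__name__)        # BinOp
print(type(assign.value.right).__name__)  # BinOp: the parentheses became nesting

# Syntax is checked while the tree is built.
try:
    ast.parse("total = = 1")
except SyntaxError as err:
    print("rejected:", err.msg)
```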

Code generation

The next phase of compilation is code generation, which takes the AST constructed in the previous phase and produces a PyCodeObject (a “code object” in Python terms). A code object is a self-contained unit of executable code: it bundles the bytecode with the constants, names, and other data the bytecode interpreter needs to run it. This is the object that gets serialized into a .pyc file.
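A sketch connecting code objects to .pyc files, assuming CPython 3.7 or later, where a .pyc is a 16-byte header (magic number, flags, and either the source's modification time and size or a source hash) followed by a code object serialized with the marshal module. The module pricing.py is hypothetical.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

# Hypothetical one-line module written to a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "pricing.py")
with open(src, "w") as f:
    f.write("TOTAL = 40 + 2\n")

# Code generation on its own: compile() returns a code object.
code = compile("TOTAL = 40 + 2\n", src, "exec")
print(code.co_consts, code.co_names)  # constants and names used by the bytecode

# py_compile.compile returns the path of the .pyc it wrote.
pyc = py_compile.compile(src, doraise=True)
with open(pyc, "rb") as f:
    blob = f.read()
print(blob[:4] == importlib.util.MAGIC_NUMBER)  # True

# Skip the 16-byte header and load the marshalled code object back.
cached = marshal.loads(blob[16:])
namespace = {}
exec(cached, namespace)
print(namespace["TOTAL"])  # 42
```

The header layout is an implementation detail of CPython and changes occasionally, which is why .pyc files are tied to a specific interpreter version.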

Code execution

Python bytecode is executed by the bytecode interpreter, which is a stack-based virtual machine: instructions push values (such as constants and variable contents) onto an evaluation stack, pop their operands from it, and push their results back.
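The dis module shows the real instructions for a function. The second half of this sketch is a deliberately tiny, hypothetical stack machine (its PUSH/ADD/MUL/RETURN instruction set is invented for illustration, not CPython's) that evaluates (2 + 3) * 4 the same way: operands are pushed, each operation pops its inputs and pushes its result.

```python
import dis


def add(a, b):
    return a + b


# Real CPython bytecode; opcode names differ between versions
# (e.g. BINARY_ADD before 3.11, BINARY_OP from 3.11 on).
dis.dis(add)
opnames = [ins.opname for ins in dis.get_instructions(add)]


def run(program):
    """Evaluate (opcode, argument) pairs on a value stack (toy instruction set)."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op in ("ADD", "MUL"):
            right = stack.pop()  # operands come off the stack in reverse order
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise ValueError("program ended without RETURN")


# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None), ("RETURN", None)]
print(run(program))  # 20
```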
