Python-Parsing arguments through Argparse


The argparse module makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv[1].

Argparse module automatically generates help and usage messages, making it easier to run. The module is highly user configurable. For example, user can define arguments that are mandatory, optional, or even specify default values for the arguments. Some of the features of argparse module can be illustrated from the following example:

import argparse;
import sys

argv = sys.argv[1:]

command_choices = ["create", "mount", "unmount", "delete"];

#Description of the module
parser = argparse.ArgumentParser(description="Argument parser Example")

"""
The following argument can take one of four choices,
create, mount, unmount and delete. It throws an error
if any other choice is given
"""

parser.add_argument("command",
                    choices=command_choices,
                    help="The commands are create, mount, unmount, delete")

"""
Its mandatory for the user to give the required_arg as
an argument in the command line, otherwise it throws an
error. required = True is added for this purpose
"""

parser.add_argument("--required_arg",
                    required=True,
                    help="The argument which is mandatory")

"""
The following argument is optional as 
required option is not given
"""
parser.add_argument("--optional_arg",
                    help="The argument which is optional")

"""
The argument is optional. If no value is given, the 
argument is given the default value. 
"""
parser.add_argument("--size_mb",
                    default="100",
                    help="The size in MB")

parser.add_argument("--encryption_alg",
                    default="aes",
                    help="The encryption algorithm supports aes")

args = parser.parse_args(argv)

"""
Print the values of the required arguments
"""

print "required_arg is : ", args.required_arg
print "optional_arg is : ", args.optional_arg
print "size_mb is      : ", args.size_mb
print "Command is      : ", args.command

Scripting-How is it different from programming?


What is script for a movie?? Script for a movie describes what actors do. It also describes the interaction between the actors, and connects different sequences of a movie. Scripting languages are also very similar. Scripting is used to automate the execution of tasks which could alternatively be executed one-by-one manually. Environments that can be automated through scripting include software applications, and the operating system shells (OS shells). Scripting languages play a major role in automation testing, because, from a script, you can invoke different programs and test the outputs.

scripting-languageIn my earlier organization, when I was working on projects related to Natural language processing, I could see the power of scripting languages. We were building a translation engine, for translating text from one natural language to another, for example from Urdu to Hindi or Sanskrit to Hindi. Scripting languages had the power to use the regular expressions and made text processing very much simpler. Those tasks which required design of state machines in a programming language, could be written by a single line in a scripting language!!!! Let us see an example of a word count program, written in both PERL and C.

 A word can be defined as set of characters, which are preceded by any number of blank spaces, new lines or tabs. If you had to write the same piece of code in C, a state machine has to be designed (see example in Dennis Ritchie). Scripting language solves the problem with a single line of code.

images1We found highly complex pieces of code, could be written very easily in scripting languages. It has the power to handle regular expressions and text processing.

Scripting languages are designed for “gluing” applications. Consider the scenario in which different  modules are written in different languages, like C, Java, C++ etc. and have to be integrated into a single project. Scripting languages play a key role in such type of scenarios.

Scripting languages also have the power to access OS facilities. When we were building the migration engine, we could connect to loop devices, do mounts, unmount etc., using python.

Scripting languages use typeless approaches to achieve a higher level of programming and more rapid application development than programming languages.  That means, declaration of variables is not required in Scripting. Only assignment would be sufficient.

Scripting languages make the life of developer very easy. Enjoy Scripting !!!!!

Python coding standards


Indentation is rigidly enforced in python.  Python script does not run if the indentation is not followed. Entire block of code should come under single indentation. (should have same number of spaces or tabs).

Pylint software:

Pylint is a tool that checks for errors in python code. It checks for coding conventions,  lines and indentation, etc. For example it checks for:

  1. Checks if there are unused imports.
  2. Unused variables assigned.
  3. Checks if the declared interfaces are fully implemented etc.

PEP8 coding standards:

High level of readability is one of the most important features of python. PEP8 standards provide guidelines to improve readability of code and make it consistent on wide variety of python scripts. Example:

  1. Indentation: 4 spaces for indentation.
  2. Spaces are a preferred indentation method.
  3. Limits all the lines to a maximum of 79 characters.
  4. Method definitions in a class are separated by a blank line.
  5. Imports : Imports should always be on separate lines. Imports should be grouped in the following order:
  • standard library imports
  • related third party imports
  • local application/library specific imports
  1. Avoid white space in the following situations:

Immediately inside parentheses, brackets or braces.

Yes: spam(apple[1], {mango: 2})
No: spam( apple[ 1 ], { mango: 2 } )

Immediately before a comma, semicolon, or colon:

Yes: if x == 4: print x, y; x, y = y, x
No: if x == 4 : print x , y ; x , y = y , x

Immediately before the open parenthesis that starts the argument list of a function call:

Yes: spam(1)
No: spam (1)

Immediately before the open parenthesis that starts an indexing or slicing:

Yes: dict['key'] = list[index]
No: dict ['key'] = list [index]

Immediately before the open parenthesis that starts an indexing or slicing:

Yes:
x = 1
y = 2
long_variable = 3
No:
x    = 1
y    = 2
long_variable    = 3

Other recommendations:

Always surround these binary operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), Booleans (and, or, not).

Comments:

Use inline comments sparingly.

x = x + 1 # Increment x

 A tool that automatically formats Python code to conform to the PEP 8 style guide.

To run autopep8:

$ autopep8 TARGET.py

Pre compiled files in python


In the past couple of months, our team has been working very hard to build a migration engine on python, for taking backup of drives using encryption mounts. Recently, I was asked a very interesting question about python:

“I often find .pyc files being generated after I run the python scripts. What are the .pyc files? Why are the pre compiled files being generated when python is an interpreter ?? Can the .pyc files be deleted?”

Python is an interpreted language and not a compiled one. Python uses a CPython bytecode interpreter. Python source code is compiled into bytecode, which is the internal representation of a Python program. The bytecode is also cached in .pyc and .pyo files, so that executing the same file is faster the second time (recompilation from source to bytecode can be avoided).

The Python interpreter actually has the structure of a classic compiler. When you invoke the “python” command, the raw source code is scanned for tokens, these tokens are parsed into a tree representing the logical structure of the program, which is finally transformed into bytecode. Finally, this bytecode is executed by the virtual machine. The details of the process are shown in the figure below:

Flow of data in a typical parser

Lexical analysis

A tokenizer breaks input text into a series of tokens. The process of transforming the input text into tokens is known as “lexical analysis” or “tokenizing”. Tokens are identified based on the specific rules of the lexer. Some methods used to identify tokens include regular expressions, specific sequences of characters known as a flag and specific separating characters called delimiters. Special characters, including punctuation characters, are commonly used by lexers to identify tokens because of their natural use in written and programming languages.

Parser

In computing, a parser is one of the components in an interpreter or compiler that takes input text and builds a data structure like parse tree giving a structural representation of the input, checking for correct syntax in the process. Python’s parser takes a token stream as input and based on the rules declared in the Python grammar produces an Abstract Syntax Tree (AST).

Code generation:

The next phase of compilation is code generation, which takes the AST constructed in the previous phase and produces a PyCodeObject as output. A PyCodeObject is an independent unit of executable code, containing all the data and code necessary for independent execution by the Python bytecode interpreter.

Code execution:

The execution of Python bytecode is handled by the bytecode interpreter. Python’s bytecode interpreter is a stack-based virtual machine. The process of bytecode execution manipulates a data stack, by pushing and popping instructions.

Sampling of Video signals


Sampling of an audio signal converts continuous time signal into a discrete time signal. Quantization converts continuous amplitude signal into a discrete amplitude signal. A discrete time and discrete amplitude signal is a digital signal. The present post describes the sampling of a video signal, different characteristics of the video which are considered for sampling, spatial sampling and temporal sampling. A real world video is composed of multiple objects each with their own characteristic shape, depth, texture and illumination. The color and brightness of a natural video changes with varying degrees of smoothness throughout the video (‘continuous tone’). Characteristics of a typical natural video scene that are relevant for video processing and compression include spatial characteristics (texture variation within scene, number and shape of objects, color, etc.) and temporal characteristics (object motion, changes in illumination, movement of the camera or viewpoint and so on). A natural video is spatially and temporally continuous. Representing a video in digital form involves sampling the real scene spatially and temporally.

Spatial sampling

Sampling the signal at a point in time produces a sampled image or frame that has defined values at a set of sampling points. The most common format for a sampled image is a rectangle with the sampling points positioned on a square or rectangular grid. Figure 1 shows a continuous-tone frame with two different sampling grids superimposed upon it. Sampling occurs at each of the intersection points on the grid and the sampled image may be reconstructed by representing each sample as a square picture element (pixel). The visual quality of the image is influenced by the number of sampling points. Choosing a ‘coarse’ sampling grid produces a low-resolution sampled image whilst increasing the number of sampling points slightly increases the resolution of the sampled image.

Temporal sampling

A moving video image is captured by taking a rectangular ‘snapshot’ of the signal at periodic time intervals. Playing back the series of frames produces the appearance of motion. A higher temporal sampling rate (frame rate) gives apparently smoother motion in the video scene but requires more samples to be captured and stored. Frame rates below 10 frames per second are sometimes used for very low bit-rate video communications (because the amount of data is relatively small) but motion is clearly jerky and unnatural at this rate. Between 10 and 20 frames per second is more typical for low bit-rate video communications; the image is smoother but jerky motion may be visible in fast-moving parts of the sequence. Sampling at 25 or 30 complete frames per second is standard for television pictures (with interlacing to improve the appearance of motion, see below); 50 or 60 frames per second produces smooth apparent motion (at the expense of a very high data rate).

Git clone: The remote end hung up unexpectedly!!!!


1294361190error_thumbCloning a repository on GitHub : “git clone url” will clone online repository to your hard drive so you may begin working on your modifications.  The local copy is called your local repository.Sometimes when you try to clone a repository using the command:

git clone git@github.com:thpii/rm-phoenix.git phoenix/

You get the following error:

Initialized empty Git repository in /home/innominds/phoenix/.git/
ssh: connect to host github.com port 22: Connection timed out
fatal: The remote end hung up unexpectedly

Try the following command:

ssh git@github.com

If you get the following error:

ssh: connect to host github.com port 22: Connection timed out
The port 22 on your machine needs to be enabled by your system administrator.

Steps to clone a repository on GitHub.


githubVersion control plays a major role in the industry. Version control is inevitable when a project is  collective contribution of large number of team members, where each of them check in the code into a centralized repository. The purpose of the blog is to help some of my team members to clone a repository from GitHub.

Cloning : “git clone url” will clone online repository to your hard drive so you may begin working on your modifications. The local copy is called your local repository.

1. Setting the name and default email

To associate user name with the commits,  user name can be set by the command:

git config --global user.name "Name"

To associate the email address with the commits,email id can be given by command:

git config --global user.email "email@example.com"

git_repo_marked

2. Check if the repo is public or private:
Go to the url where your repository is present in the GitHub. On the top left of the page, you can see an icon, which specifies if the repository is public or private.  (Please see figure 1 for details). If the repository is public, you don’t need the keys to clone your repo.

3. Clone the repo:

Steps to Clone the repo if it is public:
A git clone should work, either in git bash or the Linux shell. If the repo is public, you don’t need keys to clone the repo. You can clone the repo using the following command:

git clone git@github.com:FFmpeg/FFmpeg.git

Steps to clone the repo if it is private:
If the repo is private, you need to add the public key to the account.

1. Check if the keys already exist in .ssh folder, by the command:

ls ~/.ssh

2. Generate the SSH key:
If the keys are not present,  generate the ssh key with the command:

 ssh-keygen -t rsa -C "mymail@example.com"

You will find two files, id_rsa, which is private key and id_rsa.pub, which is the public key, in ~/.ssh folder.

Go to the Account settings on the top right corner in the GitHub page as shown in the figure. Click on SSH keys and Add SSH key. Copy the contents of id_rsa.pub there.

A git clone should now work!!!!