Monday, July 11, 2016

The many uses of randomness - Part 2

By Vasudev Ram


Denarius image attribution

Hi, readers,

In my previous (and first) post on randomness, titled:

The many uses of randomness,

I described some uses of random numbers related to floats, and ended by saying I would continue in the next post with other uses, such as for strings (and things).

This is that next post (delayed some, sorry about that).

Assume that the following statements have been executed first in your Python program or in your Python shell:
from __future__ import print_function
import string
from random import random, randint, randrange, choice, shuffle
Let's look now at the use of random numbers to generate random character and string data.

First, let's generate a few different kinds of random characters:

1) Random characters from the range of 7-bit ASCII characters, i.e. the characters with ASCII codes 0 to 127. This expression generates a single ASCII character:
chr(randint(0, 127))
Each time the above expression is evaluated, it will generate a random character whose code is between 0 and 127.

As a result, it may sometimes generate non-printable characters, such as the characters with codes in the range 0 to 31, and 127. See the Wikipedia article about ASCII above, for information on printable versus non-printable characters.

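To generate only printable ASCII characters, use the expression below (note that Python's string.printable also includes whitespace such as tab, newline and carriage return, which fall within codes 0 to 31):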
choice(string.printable)

We may want to generate all ASCII characters, or even all printable characters, only for some specialized purposes. More commonly, we may want to generate printable random characters from a specific subset of the complete ASCII character set. Some examples of this would be: generating random uppercase letters, random lowercase letters, random numeric digits, or combinations of those. Here are a few code snippets for those cases:
# Generate random uppercase letter.
chr(randint(ord('A'), ord('Z')))
(which relies on the fact that the ASCII codes for the characters 'A' through 'Z' are contiguous).
Or, another way:
# Generate random uppercase letter.
choice(string.ascii_uppercase)
# -------------------------------------------
# Generate random lowercase letter.
chr(randint(ord('a'), ord('z')))
Or, another way:
# Generate random lowercase letter.
choice(string.ascii_lowercase)
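The list of examples earlier also mentions random numeric digits and combinations of character classes. Here is a minimal sketch of those two cases, using the same choice() approach; the variable names (rand_digit, alphanumeric, rand_alnum) are just illustrative, not from any library:

```python
import string
from random import choice

# Generate random numeric digit (as a one-character string).
rand_digit = choice(string.digits)

# Generate random character from a combination: letters plus digits.
# (alphanumeric is an illustrative name for the combined pool.)
alphanumeric = string.ascii_letters + string.digits
rand_alnum = choice(alphanumeric)
```

Concatenating the constants from the string module is probably the simplest way to build a custom pool of characters to pick from.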
Random numbers can be used to generate random strings, where the randomness of the strings can be in either or both of two dimensions, the content or the length:

Generate strings with random character content but fixed length, e.g.: "tdczs", "ohybi", "qhmyf", "elazk"
def rand_lcase_str(n):
    '''Return string of n random lowercase letters.'''
    assert n > 0
    rand_chars = [ choice(string.ascii_lowercase) for i in range(n) ]
    return ''.join(rand_chars)

# Calls and output:
[ rand_lcase_str(3) for i in range(1, 8) ]
['xio', 'qsc', 'omt', 'fnn', 'ezz', 'get', 'frs']
[ rand_lcase_str(7) for i in range(1, 4) ]
['hazrdwu', 'sfvvxno', 'djmhxri']

Generate strings with fixed character content but random lengths, e.g.: "g", "gggg", "gg", "ggggg", "ggg"; all strings contain only letter g's, but are of different lengths.
def rand_len_fixed_char_str(c, low_len=1, high_len=256):
    '''Return a string containing a number of characters c,
    varying randomly in length between low_len and high_len'''
    assert len(c) == 1
    assert 0 < low_len <= high_len
    rand_chars = c * randint(low_len, high_len)
    return rand_chars

# Calls and output:
[ rand_len_fixed_char_str('g', 3, 8) for i in range(10) ]
['gggg',
 'ggggggg',
 'ggg',
 'ggggggg',
 'ggggg',
 'ggggg',
 'gggggg',
 'gggggg',
 'gggggg',
 'ggggg']
Generate strings with both random character content and random lengths, e.g.: "phze", "ysqhdty", "mltstwdg", "bnr", "q", "ifgcvgrey". This should be easy after the above snippets, since we can reuse parts of their logic, so it is left as an exercise for the reader.
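If you want to check your answer to the exercise, here is one possible sketch (certainly not the only way). It combines randint for the length with choice for the content; the function name rand_len_lcase_str and its default lengths are my own assumptions for illustration:

```python
import string
from random import choice, randint

def rand_len_lcase_str(low_len=1, high_len=10):
    '''Return a string of random lowercase letters, whose length
    varies randomly between low_len and high_len (inclusive).'''
    assert 0 < low_len <= high_len
    # Pick the random length first, then the random content.
    n = randint(low_len, high_len)
    return ''.join(choice(string.ascii_lowercase) for _ in range(n))

# Example call (output will differ on each run):
samples = [ rand_len_lcase_str(1, 9) for i in range(6) ]
```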

Such kinds of randomly generated data are useful for many purposes, e.g. for testing apps that read or write CSV or TSV files, fixed-length or variable-length records, spreadsheets, databases; for testing report generation logic (particularly with respect to text formatting, wrapping, centering, justification, logic related to column and line widths, etc.).
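As a rough illustration of the CSV case, here is a small sketch that builds random CSV text, which could then be fed to an app under test. It assumes Python 3 (for io.StringIO with the csv module), and the helper names rand_field and rand_csv_text are hypothetical, made up for this example:

```python
import csv
import io
import string
from random import choice, randint

def rand_field(low_len=1, high_len=8):
    '''Return a random lowercase string of random length
    between low_len and high_len (inclusive).'''
    n = randint(low_len, high_len)
    return ''.join(choice(string.ascii_lowercase) for _ in range(n))

def rand_csv_text(num_rows, num_cols):
    '''Return CSV text with num_rows rows of num_cols random fields.'''
    buf = io.StringIO()
    writer = csv.writer(buf)
    for _ in range(num_rows):
        writer.writerow([ rand_field() for _ in range(num_cols) ])
    return buf.getvalue()

# Generate 5 rows of 3 random fields each.
text = rand_csv_text(5, 3)
```

The same idea should extend to TSV (by passing delimiter='\t' to csv.writer) or to fixed-length records, by padding each field to a set width.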

All these use cases can benefit from being run on random data (maybe with some programmed constraints, as I showed above), which tests the app more thoroughly than can be done manually by typing in, say, only a few dozen variations of test data. There are at least two benefits here:

- a program can be more systematically random (if that makes sense) than a human can, thereby giving test data that provides better coverage;

- the computer can generate large volumes of random data for testing the app, much faster than a human can. It can also feed it as input to the software you want to test, faster than a human can, e.g. by reading it from a file instead of a user typing it. So overall, (parts of) your testing work can get done a lot faster.

In the next part, I'll show how, using a mathematical concept, random numbers can be used to reduce the amount of test data needed to test some apps, while still maintaining a good level of quality of testing. I will also discuss / show some other uses of randomness, such as in web development, and simulating physical events.

The image at the top of the post is of a Roman denarius in silver of Maximinus (235-238). The word denarius seems to be the origin of the word for money in multiple modern languages, according to the linked article.

- Vasudev Ram - Online Python training and consulting
