Tuesday, August 4, 2015

Converting numeric strings to integers with handrolled code


By Vasudev Ram



A while ago, I was explaining to someone how different data types, such as numbers and strings, are represented in a computer. That made me think of writing a simple program, called str_to_int.py, similar to Python's built-in int() function, to show them roughly what steps are involved in converting a numeric string, such as '123', to an integer.

There are many possible solutions to this task, and some may be more efficient or simpler than others, of course. This was just my first quick attempt at it in Python; I had not needed to do it in Python before, though I have done it in assembly language and in C (in C, it would be similar to the atoi() function in the C standard library). Note that this program does not handle all the cases that Python's int() does. It's just meant to show the basics of the process.
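In positional (base-10) notation, a string of n digit characters c_{n-1} ... c_1 c_0, where c_0 is the rightmost, stands for the sum below. That sum is what str_to_int() computes: it walks the string from the right, its counter ctr plays the role of the exponent k, and ord(c) - 48 turns a character into its digit value, because the characters '0' to '9' have the consecutive codes 48 to 57 in ASCII.

```latex
\begin{aligned}
N &= \sum_{k=0}^{n-1} d_k \cdot 10^{k}, \qquad d_k = \mathrm{ord}(c_k) - 48 \\
\texttt{'123'} &\mapsto 3 \cdot 10^{0} + 2 \cdot 10^{1} + 1 \cdot 10^{2} = 3 + 20 + 100 = 123
\end{aligned}
```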

Here is the source code for str_to_int.py:
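def str_to_int(s):
    # i accumulates the result; ctr is the power of 10 for the current digit.
    ctr = i = 0
    # Walk the string from the rightmost (least significant) digit leftwards.
    for c in reversed(s):
        # ord(c) - 48 maps the characters '0'..'9' to 0..9 (48 is ord('0')).
        i += (ord(c) - 48) * (10 ** ctr)
        ctr += 1
    return i

# Print a blank line, then each test string and its converted value
# on one line (the trailing comma suppresses the newline in Python 2).
print
for s in ('0', '1', '2', '3', '12', '123', '234', '456', '567'):
    i = str_to_int(s)
    print "s = {}, i = {} |".format(s, i),

# End the current line, then print a blank line.
print
print

# Round-trip check: convert 0..49 to strings and back again.
for i in range(50):
    s = str(i)
    j = str_to_int(s)
    print "s = {}, j = {} |".format(s, j),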

And here is its output:
$ py str_to_int.py

s = 0, i = 0 | s = 1, i = 1 | s = 2, i = 2 | s = 3, i = 3 | s = 12, i = 12 |
s = 123, i = 123 | s = 234, i = 234 | s = 456, i = 456 | s = 567, i = 567 |

s = 0, j = 0 | s = 1, j = 1 | s = 2, j = 2 | s = 3, j = 3 | s = 4, j = 4 |
s = 5, j = 5 | s = 6, j = 6 | s = 7, j = 7 | s = 8, j = 8 | s = 9, j = 9 |
s = 10, j = 10 | s = 11, j = 11 | s = 12, j = 12 | s = 13, j = 13 | s = 14, j = 14 |
s = 15, j = 15 | s = 16, j = 16 | s = 17, j = 17 | s = 18, j = 18 | s = 19, j = 19 |
s = 20, j = 20 | s = 21, j = 21 | s = 22, j = 22 | s = 23, j = 23 | s = 24, j = 24 |
s = 25, j = 25 | s = 26, j = 26 | s = 27, j = 27 | s = 28, j = 28 | s = 29, j = 29 |
s = 30, j = 30 | s = 31, j = 31 | s = 32, j = 32 | s = 33, j = 33 | s = 34, j = 34 |
s = 35, j = 35 | s = 36, j = 36 | s = 37, j = 37 | s = 38, j = 38 | s = 39, j = 39 |
s = 40, j = 40 | s = 41, j = 41 | s = 42, j = 42 | s = 43, j = 43 | s = 44, j = 44 |
s = 45, j = 45 | s = 46, j = 46 | s = 47, j = 47 | s = 48, j = 48 | s = 49, j = 49 |
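To make "does not handle all the cases that Python's int() does" concrete, here is a small check. It repeats the same str_to_int() logic (copied so that it runs on its own, and written without print statements so it also runs under Python 3). For plain digit strings it agrees with the original number; for other input it raises no error but quietly returns a wrong value, because every character, digit or not, is treated as worth ord(c) - 48. By contrast, int('-5') returns -5 and int(' 7') returns 7, while int('12a') and int('') raise ValueError.

```python
def str_to_int(s):
    # Same logic as str_to_int() in str_to_int.py: add up digit * 10**position,
    # working from the rightmost character leftwards.
    ctr = i = 0
    for c in reversed(s):
        i += (ord(c) - 48) * (10 ** ctr)
        ctr += 1
    return i


# Plain digit strings round-trip correctly, just as with int().
assert all(str_to_int(str(n)) == n for n in range(10000))

# Inputs outside that happy path give no error, only a wrong answer:
# '-' counts as -3, ' ' as -16, 'a' as 49, and '' as 0.
results = {s: str_to_int(s) for s in ("-5", " 7", "12a", "")}
# results == {'-5': -25, ' 7': -153, '12a': 169, '': 0}
```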

To get the documentation for int(), you can do this:
>>> print int.__doc__
which gives this as the output:
int(x=0) -> int or long
int(x, base=10) -> int or long

Convert a number or string to an integer, or return 0 if no arguments
are given.  If x is floating point, the conversion truncates towards zero.
If x is outside the integer range, the function returns a long instead.

If x is not a number or if base is given, then x must be a string or
Unicode object representing an integer literal in the given base.  The
literal can be preceded by '+' or '-' and be surrounded by whitespace.
The base defaults to 10.  Valid bases are 0 and 2-36.  Base 0 means to
interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
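The docstring above lists cases that str_to_int() leaves out: an optional '+' or '-', surrounding whitespace, and bases other than 10. Below is a minimal sketch of how those could be added, with a ValueError for bad input. It is an illustration only: the name str_to_int_ext is hypothetical, base 0 (reading the base from a prefix such as '0b') is deliberately not handled, and only ASCII digits and letters are accepted, so it is not a full replacement for int(). It also swaps in a different technique from the post's loop: it reads the digits left to right using Horner's scheme, n = n * base + d, which needs no powers of the base.

```python
import string

# Digit characters in order of value: '0'-'9', then 'a'-'z' for 10-35.
_DIGITS = string.digits + string.ascii_lowercase


def str_to_int_ext(s, base=10):
    """Hypothetical extension of str_to_int(): optional sign, surrounding
    whitespace, bases 2-36, and ValueError on bad input.
    Base 0 (prefix detection such as '0b100') is NOT handled here."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in the range 2-36")
    t = s.strip()  # ignore surrounding whitespace, as int() does
    sign = 1
    if t[:1] in ("+", "-"):
        if t[0] == "-":
            sign = -1
        t = t[1:]
    if not t:
        raise ValueError("no digits in %r" % s)
    n = 0
    for c in t.lower():
        d = _DIGITS.find(c)  # -1 if c is not a digit or ASCII letter
        if d < 0 or d >= base:
            raise ValueError("invalid digit %r for base %d" % (c, base))
        # Horner's scheme: shift the value so far one place left, add d.
        n = n * base + d
    return sign * n
```

For example, str_to_int_ext('  -42 ') gives -42 and str_to_int_ext('ff', 16) gives 255, the same as int() for those inputs.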
Learn more about this topic in the Wikipedia article on numeral systems.
Also check out my earlier post about Bhaskaracharya and the man who found zero.

Kthxbye :)

Vasudev Ram - Online Python training and programming

Dancing Bison Enterprises


2 comments:

Wil Cooley said...

How about `for i,c in enumerate(reversed(s)):` instead?

Vasudev Ram said...


You may want to check your code.

1. While I know about enumerate(), in this case, IMO, it complicates the code, since I was trying to explain a concept to beginners.

2. Your use of i would overwrite mine unless you use a different variable name, and I'm not sure it is useful for this program anyway.

3. Here is the output, with your suggested change - it contains many errors:

$ py str_to_int_3.py

s = 0, i = 0 | s = 1, i = 1 | s = 2, i = 2 | s = 3, i = 3 | s = 12, i = 11 |
s = 123, i = 102 | s = 234, i = 202 | s = 456, i = 402 | s = 567, i = 502 |

s = 0, j = 0 | s = 1, j = 1 | s = 2, j = 2 | s = 3, j = 3 | s = 4, j = 4 |
s = 5, j = 5 | s = 6, j = 6 | s = 7, j = 7 | s = 8, j = 8 | s = 9, j = 9 |
s = 10, j = 11 | s = 11, j = 11 | s = 12, j = 11 | s = 13, j = 11 | s = 14, j = 11 |
s = 15, j = 11 | s = 16, j = 11 | s = 17, j = 11 | s = 18, j = 11 | s = 19, j = 11 |
s = 20, j = 21 | s = 21, j = 21 | s = 22, j = 21 | s = 23, j = 21 | s = 24, j = 21 |
s = 25, j = 21 | s = 26, j = 21 | s = 27, j = 21 | s = 28, j = 21 | s = 29, j = 21 |
s = 30, j = 31 | s = 31, j = 31 | s = 32, j = 31 | s = 33, j = 31 | s = 34, j = 31 |
s = 35, j = 31 | s = 36, j = 31 | s = 37, j = 31 | s = 38, j = 31 | s = 39, j = 31 |
s = 40, j = 41 | s = 41, j = 41 | s = 42, j = 41 | s = 43, j = 41 | s = 44, j = 41 |
s = 45, j = 41 | s = 46, j = 41 | s = 47, j = 41 | s = 48, j = 41 | s = 49, j = 41 |

So, not a good idea.

- Vasudev
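As point 2 in the reply notes, the wrong values above come from the loop variable i replacing the running total on every pass. For '12', for instance, the last pass sets i to 1 and then adds 1 * 10, giving 11. With a separate name for the index, the enumerate() form from the first comment gives the same results as the original. A minimal sketch (the function name str_to_int_enum is hypothetical):

```python
def str_to_int_enum(s):
    # Hypothetical variant of str_to_int(): enumerate() yields the position
    # from the right (used as the power of 10) together with each character.
    # The index is named ctr, so it never overwrites the running total i.
    i = 0
    for ctr, c in enumerate(reversed(s)):
        i += (ord(c) - 48) * (10 ** ctr)
    return i
```

Whether this reads more clearly than the explicit counter is a matter of taste; point 1 of the reply explains why the post keeps the counter for beginners.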