For those times when '20050415115959' just takes up too much space. Useful for making your numbers shorter (like timestamps in URLs).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | NOTATION10 = '0123456789'
NOTATION70 = "!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"
class BaseConvert:
def __init__(self):
self.notation = NOTATION10
def _convert(self, n=1, b=10):
'''
Private function for doing conversions; returns a list
'''
if True not in [ isinstance(n, x) for x in [long, int, float] ]:
raise TypeError, 'parameters must be numbers'
converted = []
quotient, remainder = divmod(n, b)
converted.append(remainder)
if quotient != 0:
converted.extend(self._convert(quotient, b))
return converted
def convert(self, n, b):
'''
General conversion function
'''
nums = self._convert(n, b)
nums.reverse()
return self.getNotation(nums)
def getNotation(self, list_of_remainders):
'''
Get the notational representation of the converted number
'''
return ''.join([ self.notation[x] for x in list_of_remainders ])
class Base70(BaseConvert):
'''
>>> base = Base70()
>>> base.convert(10)
'4'
>>> base.convert(510)
'1E'
>>> base.convert(1000)
'8E'
>>> base.convert(10000000)
'N4o4'
'''
def __init__(self):
self.notation = NOTATION70
def convert(self, n):
"Convert base 10 to base 70"
return BaseConvert.convert(self, n, 70)
def _test():
import doctest, base
return doctest.testmod(base)
if __name__ == '__main__':
_test()
|
I have worked on several projects where number strings needed to be "compressed" without losing data. Two of them were separate Zope projects where the auto-generated unique IDs (datetime stamps with random numbers tacked on) used up too much URL space in the browser address bar.
As a result, we modified the unique ID generators to incorporate classes based on the above. The Base70 class was actually written for a Nevow project; some of the characters in NOTATION70 are not Zope-friendly; we thus came up with a base 65 notation that was Zope-friendly. In addition, we also had one that didn't go in URLS and was base 90. Very short number strings ;-)
Update: Fixed typo as pointed out by Anand Pillai in the comments.
Download
Copy to clipboard
You really want vowels in there? The problem with including vowels is that then your output can include natural-language words, and certain words can be offensive to certain people.
Murphy's Law: when "fuk-u" shows up in one of your URLs, the wrong person is going to notice it.
why not a two way process? The thing I notice is that you're only converting in one direction. Something like the code would be more useful if you ever want to use the base 70 number as a number. (for bases less than 37, int(n, base) is the simpler way to back-convert).
Spelling error. I think you mean "raise TypeError, 'parameters must be numbers' " not "raise TypeError, 'parameters bust be numbers'"
-Anand
struct and base64. You can accomplish something similar with the standard struct module and base64. Not quite as compact, and the numbers are padded, but it's fairly straight-forward. I have it at: http://svn.colorstudy.com/home/ianb/recipes/base64unpack.py ; but here's the raw code:
packing and base64. I really like Ian's suggestion above. Very simple and elegant solution. We emailed briefly about this after I had done some testing. First, I made a minor change to his code above to allow for doubles (and thus much longer number sequences):
But after I made that change, I ran into some other issues. Given, these will most likely be edge cases, but interesting to note and be aware of nonetheless:
Precision is good here:
But increasing the power by one at this point results in a loss of precision:
Here are a few examples of how this changes with numbers of increasing size (the numbers are test timestamps + "random" numbers):
I asked Ian about this, and he briefly touched on C and struct internals which I won't get into ;-) Something to keep in mind, though.
Only part-way two-way... I really like your convert logic. Much cleaner. I'll update the recipe with it at some point. However, your two-way doesn't do a full two-way:
Using the order of your INT_TO_DIGIT list, 100 base70 would be the string '1U'; you conversion doesn't let you go from '1U' and reobtain 100. When I get some time, I'll look into that, just for kicks (I've never needed that functionality).
Some notes about INT_TO_DIGIT: 1) strings are already indexed iterables, so you don't need a list comprehension for it; and 2) it might be a good idea to list the strings in the notation in python sorting order, that way if someone sorted out a bunch of base70 strings, they would actually list in numerical order. Here they are in python sort order:
Recursion is limiting. I'm hitting recursion limits. I suggest replacing:
with:
Any reason not to use the more general function linked to below? http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/111286
Tested various numbers with both and they give the same result. Anyone find a problem with the other one?