Generate short UUID
We all love UUIDs.
They are great because:
- they are effectively unique (Wikipedia says "the number of random version-4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion")
- they can be generated in a distributed fashion, for example on the client
But they are pretty long. If you want to include an ID in a URL, a UUID is just too long:
http://devtoolsdaily.com/examples/graphviz_examples/6572a1e2-6b2a-4588-a97a-0d685bb01d5f
- UUIDv4 is the most common version, and it's random.
- A UUID is a 128-bit number; in a version-4 UUID, 122 of those bits are random and the other 6 encode the version and variant.
- A UUID is usually represented as
123e4567-e89b-12d3-a456-426614174000
(32 hexadecimal digits + 4 hyphens).
So the UUID we usually generate is 36 characters long, and there is plenty of room to compact it if we want to use it in a URL.
We'll use Python to play with UUIDs.
>>> from uuid import uuid4
>>> id = uuid4()
>>> id
UUID('6572a1e2-6b2a-4588-a97a-0d685bb01d5f')
The string representation is 36 characters long.
>>> len(str(id))
36
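To see where those 36 characters come from, here is a small sketch using only the standard uuid module. The sample value and the variable names are just for illustration.

```python
from uuid import UUID

u = UUID("123e4567-e89b-12d3-a456-426614174000")

canonical = str(u)   # 32 hex digits plus 4 hyphens
hex_only = u.hex     # the same value without the hyphens
raw = u.bytes        # the underlying 128 bits as 16 bytes

print(len(canonical), len(hex_only), len(raw) * 8)  # 36 32 128
```

So the hyphens alone cost 4 characters, and the real payload is 16 bytes.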
If the same value is written as a decimal integer, it is even longer.
>>> id.int
134847232822826388955179878208578264415
>>> len(str(id.int))
39
Why is it longer?
Because the traditional representation uses hex (base 16), while the number above is written in base 10, so it takes more digits to represent the same number.
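As a quick sanity check, the sketch below confirms that the hex string and the decimal string are the same integer written in different bases. It reuses the sample UUID from this article; Python's built-in formatting only covers a few bases (binary, octal, decimal, hex).

```python
from uuid import UUID

u = UUID("6572a1e2-6b2a-4588-a97a-0d685bb01d5f")

# The hyphen-free hex string and the decimal string denote the same integer.
assert int(u.hex, 16) == u.int
assert int(str(u.int), 10) == u.int

# The smaller the base, the more digits the same number needs.
hex_len = len(format(u.int, "x"))   # base 16
dec_len = len(format(u.int, "d"))   # base 10
bin_len = len(format(u.int, "b"))   # base 2
print(hex_len, dec_len, bin_len)
```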
So to decrease the length, we need to increase the base. Let's try representing this number in different bases.
Python can parse a string in any base from 2 to 36 with int(), but it has no built-in function for writing an integer in an arbitrary base, so here is a snippet from Stack Overflow.
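def numberToBase(n, b):
    # Return the digits of non-negative integer n in base b, most significant first.
    if n == 0:
        return [0]
    digits = []
    while n:
        # Peel off the least significant digit, then shift n right by one digit.
        digits.append(int(n % b))
        n //= b
    return digits[::-1]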
Let's see how many characters we need with different bases:
>>> len(numberToBase(id.int, 16))
32
>>> len(numberToBase(id.int, 32))
26
>>> len(numberToBase(id.int, 52))
23
>>> len(numberToBase(id.int, 62))
22
>>> len(numberToBase(id.int, 64))
22
>>> len(numberToBase(id.int, 66))
21
>>> len(numberToBase(id.int, 128))
19
>>> len(numberToBase(id.int, 256))
16
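The counts above are for one particular UUID. As a rough sketch of the worst case, the snippet below counts the digits of the largest 128-bit value in each base; digits_needed and MAX_UUID are hypothetical names introduced here.

```python
def digits_needed(max_value, base):
    """Count the digits of max_value when written in the given base."""
    count = 0
    while max_value:
        max_value //= base
        count += 1
    return max(count, 1)

MAX_UUID = 2**128 - 1  # largest value that fits in 128 bits

for base in (16, 32, 52, 62, 64, 66, 128, 256):
    print(base, digits_needed(MAX_UUID, base))
```

For base 66 this prints 22 rather than the 21 seen for the sample: 66**21 is smaller than 2**128, so UUIDs with a large enough integer value need one more character.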
But if we want to use the generated string in a URL, we can only use a subset of characters.
Here is a list of allowed characters, from Stack Overflow or https://perishablepress.com/stop-using-unsafe-characters-in-urls/
These are a-z, A-Z, 0-9, and _, -, ~, . (66 total).
~ and . are less common in IDs, so you can exclude them if you prefer.
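If you do drop ~ and ., the remaining 64 characters (A-Z, a-z, 0-9, - and _) are the URL-safe Base64 alphabet, so the standard library can do the conversion for you. A minimal sketch, where uuid_to_b64 and b64_to_uuid are hypothetical helper names:

```python
import base64
from uuid import UUID, uuid4

def uuid_to_b64(u):
    """Encode the 16 UUID bytes as URL-safe Base64 and drop the '=' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")

def b64_to_uuid(s):
    """Restore the padding and decode back to a UUID."""
    padded = s + "=" * (-len(s) % 4)
    return UUID(bytes=base64.urlsafe_b64decode(padded))

u = uuid4()
short = uuid_to_b64(u)
print(short, len(short))
```

Because this encodes the 16 bytes directly, the result is always 22 characters long, and b64_to_uuid(short) gives back the original UUID.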
With 66 characters, let's define our alphabet and translate our number into it:
>>> urlsafe_66_alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_-.~'
>>> ''.join(urlsafe_66_alphabet[x] for x in numberToBase(id.int, 66))
'ssLIVEdkjwNEjRxFatJ7j'
Now the URL can be a little shorter:
http://devtoolsdaily.com/examples/graphviz_examples/ssLIVEdkjwNEjRxFatJ7j
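A short ID is only useful if you can turn it back into the original UUID. Here is one possible sketch of the round trip with the same 66-character alphabet; encode_uuid and decode_uuid are hypothetical helper names, not part of any library.

```python
from uuid import UUID, uuid4

URLSAFE_66 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_-.~"

def encode_uuid(u, alphabet=URLSAFE_66):
    """Write u.int in base len(alphabet), most significant digit first."""
    n, base = u.int, len(alphabet)
    chars = []
    while n:
        n, rem = divmod(n, base)
        chars.append(alphabet[rem])
    return "".join(reversed(chars)) or alphabet[0]

def decode_uuid(s, alphabet=URLSAFE_66):
    """Inverse of encode_uuid: accumulate the digits back into the integer."""
    base = len(alphabet)
    n = 0
    for ch in s:
        n = n * base + alphabet.index(ch)
    return UUID(int=n)

u = uuid4()
short = encode_uuid(u)
assert decode_uuid(short) == u
print(short, len(short))
```

The length depends on the value, so it can vary between UUIDs. If you need a fixed width, you can left-pad with the first alphabet character ("0" here); decode_uuid still works because leading zero digits do not change the number.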
This article was originally published on the DevToolsDaily blog.