Discussion:
Control of hash randomization
Aaron Meurer
2012-05-27 04:58:13 UTC
Permalink
Hi.

First, I want to apologize if this is the wrong list for this
question. If it is, kindly direct me to the correct one and I'll be on
my way.

I am attempting to "port" SymPy to Python 3.3. We already have full
support for Python 3.2, but the new hash randomization in Python 3.3
has brought up a bunch of new failures in our test suite. This is not
a surprise, as a lot of operations in SymPy are dependent on hash
values. But expecting the problem does not entirely help use to solve
it.

My question is this: is there a way to get the random seed used for
hash randomization, and to print it out, and then to later input that
seed to the interpreter? Without this ability, it becomes extremely
difficult to reproduce test failures. The best bet is to restart the
interpreter multiple times until it occurs. We already have
experienced in SymPy several odd bugs that were caused by very rare
seeds in the random module. This seed is easy to set, though, and we
print it with each test run, so the handful of issues that have
cropped up we've been able to fix. I'd like to have that same ability
with hash randomization.

I figured that this is a new feature, and people on this list are
supposed to be experts on new Python features, so my hope is that
either someone here will know the answer, or can point me in the right
direction to one.

If it is the case, as I fear/expect, that this is not possible, should
I open an issue in the Python bug tracker? Should I raise the issue
on python-dev? Has it been discussed before? I realize that there
may be a security risk (imho a low one, though) in such a feature, so
it might not be immediately accepted.

Aaron Meurer
Aaron Meurer
2012-05-27 05:26:53 UTC
Permalink
I found half of my answer. You can set it with the PYTHONHASHSEED
environment variable:

$PYTHONHASHSEED=42 python3.3
hashPython 3.3.0a3 (v3.3.0a3:0b53b70a40a0, May 1 2012, 11:39:35)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Post by Aaron Meurer
hash('a')
-5486454195961207828

$PYTHONHASHSEED=42 python3.3
Python 3.3.0a3 (v3.3.0a3:0b53b70a40a0, May 1 2012, 11:39:35)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Post by Aaron Meurer
hash('a')
-5486454195961207828

I'm still couldn't find how to actually get that seed, though.

Aaron Meurer
Post by Aaron Meurer
Hi.
First, I want to apologize if this is the wrong list for this
question. If it is, kindly direct me to the correct one and I'll be on
my way.
I am attempting to "port" SymPy to Python 3.3.  We already have full
support for Python 3.2, but the new hash randomization in Python 3.3
has brought up a bunch of new failures in our test suite.  This is not
a surprise, as a lot of operations in SymPy are dependent on hash
values.  But expecting the problem does not entirely help use to solve
it.
My question is this: is there a way to get the random seed used for
hash randomization, and to print it out, and then to later input that
seed to the interpreter?  Without this ability, it becomes extremely
difficult to reproduce test failures.  The best bet is to restart the
interpreter multiple times until it occurs.  We already have
experienced in SymPy several odd bugs that were caused by very rare
seeds in the random module.  This seed is easy to set, though, and we
print it with each test run, so the handful of issues that have
cropped up we've been able to fix.  I'd like to have that same ability
with hash randomization.
I figured that this is a new feature, and people on this list are
supposed to be experts on new Python features, so my hope is that
either someone here will know the answer, or can point me in the right
direction to one.
If it is the case, as I fear/expect, that this is not possible, should
I open an issue in the Python bug tracker?  Should I raise the issue
on python-dev?  Has it been discussed before?  I realize that there
may be a security risk (imho a low one, though) in such a feature, so
it might not be immediately accepted.
Aaron Meurer
m***@v.loewis.de
2012-05-27 06:22:27 UTC
Permalink
Post by Aaron Meurer
I'm still couldn't find how to actually get that seed, though.
In C, you can look at _Py_HashSecret. In Python, you need to write
an extension module, or use ctypes on the Python interpreter itself.

However, this is not the seed: when an RNG is used, there is no seed,
instead, the OS directly provides the hash secret. So your extension
would also have to support setting the secret, which is tricky because
the secret is already used by the time the extension gets loaded.

So you would have to change the interpreter to support such a feature.

If the crashes/test failures are frequent enough, I rather recommend
testing with PYTHONHASHSEED set to random integers.

Also: if a test fails due to hash randomization, it should normally
be possible to find the root cause by just reviewing the code (long
enough). It may not be possible to reproduce the failure, but it
should be obvious if a certain piece of code would fail under hash
randomization.

Regards,
Martin
Aaron Meurer
2012-05-27 08:07:55 UTC
Permalink
Post by m***@v.loewis.de
Post by Aaron Meurer
I'm still couldn't find how to actually get that seed, though.
In C, you can look at _Py_HashSecret. In Python, you need to write
an extension module, or use ctypes on the Python interpreter itself.
However, this is not the seed: when an RNG is used, there is no seed,
instead, the OS directly provides the hash secret. So your extension
would also have to support setting the secret, which is tricky because
the secret is already used by the time the extension gets loaded.
So you would have to change the interpreter to support such a feature.
If the crashes/test failures are frequent enough, I rather recommend
testing with PYTHONHASHSEED set to random integers.
I see. This would require spawning a new Python process, so it's not
ideal, but I guess it's the only solution. I'll give ctypes a try
too. I don't particularly feel like dabbling in C extensions just to
make our tests a little more helpful (maybe someone else more
courageous will give it a go).
Post by m***@v.loewis.de
Also: if a test fails due to hash randomization, it should normally
be possible to find the root cause by just reviewing the code (long
enough). It may not be possible to reproduce the failure, but it
should be obvious if a certain piece of code would fail under hash
randomization.
Regards,
Martin
Ha! Well, that's easy enough to say, but if all you have to work with
is an assertion that failed, and a very large code base, it might not
be so straight forward. Furthermore, such situations are very often
not obvious (or else the author probably would not have written them
in the first place).

I do grant that this is possible in principle, but pragmatically
speaking, if it's possible to consistently reproduce a bug, it's 100
times easier to fix it.

It doesn't help that quite a few Python programmers don't understand
just what is and is not guaranteed by hash dependent objects. For
example, I've seen this mistake made several times:

a = set(whatever) # or dict
b = list(a)
c = list(a)
assert b == c

The assertion does NOT have to hold, and I've seen situations where it doesn't.

That issue is pretty subtle. The more common case is iterating
through a set or dict (or a tuple that was sorted by hash, which is
the most common case for SymPy), and there is some subtle fact about
the loop that makes the result differ depending on the result of
iteration. Quite often, the result is still "correct" (in SymPy, this
generally means the answer is still mathematically correct), just not
the same as what the test expected.

Aaron Meurer
m***@v.loewis.de
2012-05-27 08:24:19 UTC
Permalink
Post by Aaron Meurer
That issue is pretty subtle. The more common case is iterating
through a set or dict (or a tuple that was sorted by hash, which is
the most common case for SymPy), and there is some subtle fact about
the loop that makes the result differ depending on the result of
iteration. Quite often, the result is still "correct" (in SymPy, this
generally means the answer is still mathematically correct), just not
the same as what the test expected.
But there are standard procedures to deal with that very phenomenon:
use a proper equality function.

People have written tests for years that somehow relied on the order
of keys in a dictionary (an issue in particular for doctest). If you
find a failed assertion, and it involves an equality test, verify that
the comparison uses "normalized" representations of the value. If not,
add the normalization to all related test cases.

Regards,
Martin
Chris Jerdonek
2012-05-27 14:11:22 UTC
Permalink
Post by m***@v.loewis.de
Also: if a test fails due to hash randomization, it should normally
be possible to find the root cause by just reviewing the code (long
enough). It may not be possible to reproduce the failure, but it
should be obvious if a certain piece of code would fail under hash
randomization.
Ha!  Well, that's easy enough to say, but if all you have to work with
is an assertion that failed, and a very large code base, it might not
be so straight forward.  Furthermore, such situations are very often
not obvious (or else the author probably would not have written them
in the first place).
Sorry if this is obvious, but another suggestion is to include more
information in the assertion error message. It's not clear from the
discussion so far if this is being done. That way you can learn more
about the state of the code during the assertion failure.

I also spot checked a few spots in SymPy and see this being done frequently--

def test_ratint():
assert ratint(S(0), x) == 0

Options include using the extended form of the assert statement, or
the msg parameter of unittest's various assertion methods.

--Chris
Aaron Meurer
2012-05-27 22:49:31 UTC
Permalink
Post by m***@v.loewis.de
use a proper equality function.
People have written tests for years that somehow relied on the order
of keys in a dictionary (an issue in particular for doctest). If you
find a failed assertion, and it involves an equality test, verify that
the comparison uses "normalized" representations of the value. If not,
add the normalization to all related test cases.
Regards,
Martin
I appreciate what you're trying to say here, but you need to
understand that in SymPy, the result of a test is a mathematical
expression (in the most literal sense). A "normalization" function in
this case would be a simplification function. Expression
simplification is very difficult. Expression normalization is even
more difficult. Technically speaking, for a large enough class of
functions, it's impossible. I can point you to papers that
demonstrate why expression simplification/normalization is so
difficult if you are interested.

In SymPy, we have many tests like

assert solve(3*x+5+2**(-5*x+3), x) in [
[-((25*log(2) - 3*LambertW(-10240*2**(Rational(1,
3))*log(2)/3))/(15*log(2)))],
[Rational(-5, 3) + LambertW(log(2**(-10240*2**(Rational(1,
3))/3)))/(5*log(2))],
[-Rational(5,3) +
LambertW(-10240*2**Rational(1,3)*log(2)/3)/(5*log(2))],
[(-25*log(2) + 3*LambertW(-10240*2**(Rational(1,
3))*log(2)/3))/(15*log(2))],
[-((25*log(2) - 3*LambertW(-10240*2**(Rational(1,
3))*log(2)/3)))/(15*log(2))],
[-(25*log(2) - 3*LambertW(log(2**(-10240*2**Rational(1,
3)/3))))/(15*log(2))],
[(25*log(2) - 3*LambertW(log(2**(-10240*2**Rational(1,
3)/3))))/(-15*log(2))]
]

This was note the test writer trying to come up with every possible
form of the expression. At some point in time, solve() returned each
one of those answers, but some subtle change in the code somewhere
resulted in it returning a different one. Actually, in this
particular case, we probably could normalize by expanding, but that's
a simple case. Here's a more advanced one (heurisch performs symbolic
integration):

assert heurisch(sin(x)*cos(x), x) in [sin(x)**2 / 2, -cos(x)**2 / 2]

This is a classic example from calculus of a function the produces a
different result depending on how you integrate it. These two
functions are not mathematically equal. That's because heurisch() is
only guaranteed to produce an antiderivative, which can differ by a
constant.

Even if simplify() were reliable, writing "assert simplify(a - b) ==
0" instead of "assert a == b" would make our tests much slower, and
would add the possibility that any test failure anywhere is the result
of a bug in simplify().

What we really need to do is change our internal ordering to not use
hashes, and to avoid iterating through a dictionary when the result
could be different (but still mathematically the same). At least hash
randomization will point us to the places where this is happening.

On Sun, May 27, 2012 at 8:11 AM, Chris Jerdonek
Post by m***@v.loewis.de
Post by m***@v.loewis.de
Also: if a test fails due to hash randomization, it should normally
be possible to find the root cause by just reviewing the code (long
enough). It may not be possible to reproduce the failure, but it
should be obvious if a certain piece of code would fail under hash
randomization.
Ha!  Well, that's easy enough to say, but if all you have to work with
is an assertion that failed, and a very large code base, it might not
be so straight forward.  Furthermore, such situations are very often
not obvious (or else the author probably would not have written them
in the first place).
Sorry if this is obvious, but another suggestion is to include more
information in the assertion error message.  It's not clear from the
discussion so far if this is being done.  That way you can learn more
about the state of the code during the assertion failure.
I also spot checked a few spots in SymPy and see this being done frequently--
       assert ratint(S(0), x) == 0
Options include using the extended form of the assert statement, or
the msg parameter of unittest's various assertion methods.
--Chris
Yes, this is basically how we do our tests, "assert
result_of_some_computation == the_correct_answer". The solution in
this case is for us to be using py.test instead of our home-grown test
runner, which provides this output for us.

Aaron Meurer

Antoine Pitrou
2012-05-27 21:03:07 UTC
Permalink
Post by Aaron Meurer
It doesn't help that quite a few Python programmers don't understand
just what is and is not guaranteed by hash dependent objects. For
a = set(whatever) # or dict
b = list(a)
c = list(a)
assert b == c
The assertion does NOT have to hold, and I've seen situations where it doesn't.
Actually, it should hold, since the iteration order of a given set or dict
doesn't change (similarly, dict.keys(), dict.values() and dict.items() should
all iterate in the same order). However, constructing the set in different ways
may lead to different iteration orders.

Regards

Antoine.
Aaron Meurer
2012-05-27 21:40:27 UTC
Permalink
Post by Antoine Pitrou
Post by Aaron Meurer
It doesn't help that quite a few Python programmers don't understand
just what is and is not guaranteed by hash dependent objects.  For
a = set(whatever) # or dict
b = list(a)
c = list(a)
assert b == c
The assertion does NOT have to hold, and I've seen situations where it doesn't.
Actually, it should hold, since the iteration order of a given set or dict
doesn't change (similarly, dict.keys(), dict.values() and dict.items() should
all iterate in the same order). However, constructing the set in different ways
may lead to different iteration orders.
Regards
Antoine.
Yes, that was a simplification of the problem. I think when I saw it
some very subtle thing was happening to the set. I can look it up if
anyone's interested.

Aaron Meurer
m***@v.loewis.de
2012-05-27 22:19:54 UTC
Permalink
Post by Aaron Meurer
Post by Antoine Pitrou
Post by Aaron Meurer
It doesn't help that quite a few Python programmers don't understand
just what is and is not guaranteed by hash dependent objects.  For
a = set(whatever) # or dict
b = list(a)
c = list(a)
assert b == c
The assertion does NOT have to hold, and I've seen situations where it doesn't.
Actually, it should hold, since the iteration order of a given set or dict
doesn't change (similarly, dict.keys(), dict.values() and
dict.items() should
all iterate in the same order). However, constructing the set in different ways
may lead to different iteration orders.
Regards
Antoine.
Yes, that was a simplification of the problem. I think when I saw it
some very subtle thing was happening to the set. I can look it up if
anyone's interested.
I was puzzled by your original claim as well. If you "admit" that the
set might have been touched somehow, I can readily believe that it gives
elements in a different order.

Regards,
Martin
Loading...