Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test suite fails #31

Closed
daniel-mohr opened this issue Jul 19, 2021 · 15 comments
Closed

test suite fails #31

daniel-mohr opened this issue Jul 19, 2021 · 15 comments
Assignees
Labels
CI platform question Further information is requested testing Issues related to units tests and their coverage

Comments

@daniel-mohr
Copy link
Contributor

daniel-mohr commented Jul 19, 2021

run-test-suite-jobs (windows-2016) and run-test-suite-jobs (windows-2019) show an error:

ERROR: test_Units_conform (test_Units.UnitsTest)
Tests the `conform` class method on `Units`.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cfunits\cfunits\cfunits\test\test_Units.py", line 151, in test_Units_conform
    x = Units.conform([360], Units("second"), Units("minute"))
  File "d:\a\cfunits\cfunits\cfunits\units.py", line 2243, in conform
    y = x.view(dtype=float)
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

In #29 this test was integrated and it was speculated that the reason is the new numpy version.

I believe now, that the use of x.dtype.char in the inplace conversion is platform dependent. The 21 different built-in types are the 21 fundamental data types of ctypes, whitch are also platform dependent. Array types and conversions between types does not show how many Bytes/Bits are used for a type. Further the following code leads to different results on windows (64-bit?) and linux (Ubuntu 18.04 (64-bit)):

On Windows:

>>> import numpy
>>> x = numpy.array([360], dtype=numpy.int32)
>>> x.dtype.char
'l'
>>> x = numpy.array([360], dtype=numpy.int64)
>>> x.dtype.char
'q'

On Linux:

>>> import numpy
>>> x = numpy.array([360], dtype=numpy.int32)
>>> x.dtype.char
'i'
>>> x = numpy.array([360], dtype=numpy.int64)
>>> x.dtype.char
'l'

Used character codes (x.dtype.char) can be found: integer types

@daniel-mohr
Copy link
Contributor Author

Regarding the code It is only checked for signed integer. Are unsigned integers not possible?

@daniel-mohr
Copy link
Contributor Author

If the input type is not specified, we could not expect a certain type.

@daniel-mohr
Copy link
Contributor Author

Do we really want to return in any cases/platforms float32?

Changing this could break usage of cfunits.

@daniel-mohr
Copy link
Contributor Author

#32 solves this issue partly. For me the open items are:

  • Regarding the code: It is only checked for signed integer. Are unsigned integers not possible?
  • Do we really want to return in any cases/platforms float32? (Changing this could break usage of cfunits.)

@sadielbartholomew sadielbartholomew added CI question Further information is requested testing Issues related to units tests and their coverage platform labels Jul 20, 2021
@sadielbartholomew
Copy link
Member

Thanks for raising this, @daniel-mohr.

I believe now, that the use of x.dtype.char in the inplace conversion is platform dependent
...

Great investigation work and thanks for providing the clear and concise evidence to back your conclusion in your comment. I am satisfied from your comments that you have found the underlying problem.

Regarding the code It is only checked for signed integer. Are unsigned integers not possible?

@davidhassell wrote most, if not all, of cfunits, and may be able to clarify further about this, but he is on leave until early next week, so I am afraid I don't know the answer until David can potentially elaborate.

I would guess that unsigned integers were, unintentionally, not considered as a possibility, and in that case it is good that you have spotted that prospect.

Do we really want to return in any cases/platforms float32? Changing this could break usage of cfunits.

Since the vast majority of our users (unless there has been a seismic shift we are not aware of) use a Linux or Mac OS, our main concern is that cfunits works for those platforms. So, whatever the solution, it is most important that it does not break cfunits for LInux or Mac. In fact, if there isn't any platform-agnostic solution, we would have to prioritise Linux and Mac even if it means we can't run cfunits on Windows, but I am confident there is at least one way to allow cfunits to work on all platforms given the small number of barriers you have unearthed.

I will start reviewing your associated PR shortly, but please note that is the main criteria that I will need to check against.

@daniel-mohr
Copy link
Contributor Author

Regarding float32 I have to clarify myself: Independent of the bit range (32-bit or 64-bit or ...) the referenced code part runs in case x is an integer (x.dtype.char == "i"). The used bit range may depend on hardware platform/architecture (x86 or x86-64 or amd64 or ...) or on implementations (Python, Cpython, PyPy, MicroPython, ...). In general the bit range should not depend on the OS.

As an example: Someone puts a numpy.int64 number in and gets back a numpy.float32 number (no inplace). If he/she/it does the same but uses inplace=True the return type is numpy.float64. And if he/she/it only choose numpy.int as input type the number of input bits depend on the hardware platform/architecture/implementation (32-bit or 64-bit system).

I'm fine with float32 in all cases but I believe that this is not expected by all users. The other way round: If someone has an implementation already using cfunits, it may be expecting float32 and changing this could break this specific implementation. Therefore my hint 'could break usage'.

And yes of course focusing on Linux and/or Mac OS is perfect!

Let's wait for David.

@daniel-mohr
Copy link
Contributor Author

Using the code

import numpy
print('[numpy.dtype](https://numpy.org/devdocs/reference/arrays.dtypes.html):')
print()
print('|            dtype             | dtype.char | dtype.kind | dtype.num |')
print('|------------------------------|------------|------------|-----------|')
for i in range(24):
    a = numpy.array([], dtype=numpy.sctypeDict[i])
    print('| {:<28} |     {:>2}     |    {:>2}      |    {:>2}     |'.format(
        str(numpy.sctypeDict[i]), a.dtype.char, a.dtype.kind, a.dtype.num))

I generated quickly a list, which I could not find somewhere. The list shows all dtypes numpy provides and list them together with
the char, kind and num:

numpy.dtype:

dtype dtype.char dtype.kind dtype.num
<class 'numpy.bool_'> ? b 0
<class 'numpy.int8'> b i 1
<class 'numpy.uint8'> B u 2
<class 'numpy.int16'> h i 3
<class 'numpy.uint16'> H u 4
<class 'numpy.int32'> i i 5
<class 'numpy.uint32'> I u 6
<class 'numpy.int64'> l i 7
<class 'numpy.uint64'> L u 8
<class 'numpy.longlong'> q i 9
<class 'numpy.ulonglong'> Q u 10
<class 'numpy.float32'> f f 11
<class 'numpy.float64'> d f 12
<class 'numpy.float128'> g f 13
<class 'numpy.complex64'> F c 14
<class 'numpy.complex128'> D c 15
<class 'numpy.complex256'> G c 16
<class 'numpy.object_'> O O 17
<class 'numpy.bytes_'> S S 18
<class 'numpy.str_'> U U 19
<class 'numpy.void'> V V 20
<class 'numpy.datetime64'> M M 21
<class 'numpy.timedelta64'> m m 22
<class 'numpy.float16'> e f 23

This shows us, that if x.dtype.kind == "i": is fullfilled for:

dtype dtype.char dtype.kind dtype.num
<class 'numpy.int8'> b i 1
<class 'numpy.int16'> h i 3
<class 'numpy.int32'> i i 5
<class 'numpy.int64'> l i 7
<class 'numpy.longlong'> q i 9

Also the new code in PR #32 only handles the cases of numpy.int32, numpy.int64 and numpy.int16. Therefore numpy.int8 and numpy.longlong are still missing and should be mapped/converted to numpy.float32 and numpy.float128. Or is numpy.float64 sufficient for numpy.longlong?

@daniel-mohr
Copy link
Contributor Author

And on a Windows machine using conda (similar to the runner here on github) we get a similar result with small differences (only list the differences:

dtype (linux) dtype (windows) dtype.char dtype.kind dtype.num
<class 'numpy.bool_'> ? b 0
<class 'numpy.int8'> b i 1
<class 'numpy.uint8'> B u 2
<class 'numpy.int16'> h i 3
<class 'numpy.uint16'> H u 4
<class 'numpy.int32'> <class 'numpy.intc'> i i 5
<class 'numpy.uint32'> <class 'numpy.uintc'> I u 6
<class 'numpy.int64'> <class 'numpy.int32'> l i 7
<class 'numpy.uint64'> <class 'numpy.uint32'> L u 8
<class 'numpy.longlong'> <class 'numpy.int64'> q i 9
<class 'numpy.ulonglong'> <class 'numpy.uint64'> Q u 10
<class 'numpy.float32'> f f 11
<class 'numpy.float64'> d f 12
<class 'numpy.float128'> <class 'numpy.longdouble'> g f 13
<class 'numpy.complex64'> F c 14
<class 'numpy.complex128'> D c 15
<class 'numpy.complex256'> <class 'numpy.clongdouble'> G c 16
<class 'numpy.object_'> O O 17
<class 'numpy.bytes_'> S S 18
<class 'numpy.str_'> U U 19
<class 'numpy.void'> V V 20
<class 'numpy.datetime64'> M M 21
<class 'numpy.timedelta64'> m m 22
<class 'numpy.float16'> e f 23

@daniel-mohr
Copy link
Contributor Author

Sorry for so many information. I find them step by step. Looking at list_numpy_dtypes you can see more differences in the github actions. I will here only provide the relevant part for integers and floats:

dtype (ubuntu 18.04 + 20.04) dtype (Mac OS X 10.15.7) dtype (Windows Server 2016 + 2019) dtype.char dtype.kind dtype.num dtype.str
<class 'numpy.int8'> b i 1
<class 'numpy.int16'> h i 3 <i2
<class 'numpy.int32'> <class 'numpy.int32'> <class 'numpy.intc'> i i 5 <i4
<class 'numpy.int64'> <class 'numpy.int64'> <class 'numpy.int32'> l i 7 <i8
<class 'numpy.int64'> <class 'numpy.longlong'> <class 'numpy.int64'> q i 9 <i8
<class 'numpy.float32'> f f 11 <f4
<class 'numpy.float64'> d f 12 <f8
<class 'numpy.float128'> <class 'numpy.float128'> <class 'numpy.longdouble'> g f 13 <f16
<class 'numpy.float16'> e f 23 <f2

Creating an numpy array with numpy.intc on such a Windows produces an numpy.int32. Therefore I do not really know, why there is a numpy.int32. Looking at the last column, I expect a similar result for Mac OS and numpy.longlong. But anyway it seems we could trust dtype.str.

I will try to provide a better and more general solution in PR #32: Just replacing 'i' in a dtype.str belonging to an integer by 'f' should be OK.

daniel-mohr added a commit to daniel-mohr/cfunits that referenced this issue Jul 22, 2021
@daniel-mohr
Copy link
Contributor Author

list_numpy_dtypes does now combine the results automatic.

The combination can be found at: https://daniel-mohr.github.io/list_numpy_dtypes/

@daniel-mohr
Copy link
Contributor Author

I now integrated 32 bit platforms in list_numpy_dtypes . This shows/proofs now, that it is not only a problem with windows. Also on Ubuntu 18.04 i386 and Debian i386 the mismatch occurred.

@daniel-mohr
Copy link
Contributor Author

daniel-mohr commented Aug 10, 2021

The pull request #32 solves this issue partly. For me the open items are still:

  • Regarding the code: It is only checked for signed integer. Are unsigned integers not possible? Similar to Make platform independent #32 I could add the code for unsigned integers. What do you think?
  • Do we really want to return in any cases/platforms float32? (Changing this could break usage of cfunits.) I believe that every 4 byte integer could be stored in a 8 byte float (double) without losing precision. Therefore it could be more accurate to use numpy.float64 in all cases. Further for an integer with more than 4 bytes a float32 is not sufficient. What do you think?
  • @sadielbartholomew I promised once to combine the github actions. But run-test-suite.yml uses a 2 dimensional matrix for different os and different python versions. run-test-suite_ubuntu_default_packages.yml and run-test-suite_windows.yml use only a 1 dimensional matrix (vector) for different os. I do not think that combing this would give a benefit. It could be interesting to put all these actions in one file. What do you think?
  • In list_numpy_dtypes I used my github action i386_debian_latest to run on a 32 bit system (at least using 32 bit libraries, the physical CPU is probable stil 64 bit). Do you think cfunits could profit by this action?

@sadielbartholomew
Copy link
Member

Hi @daniel-mohr, thanks for the excellent summary of open aspects of this issue. There's quite a lot of important decisions to make based on your comments, it seems.

Can I suggest, if you are happy to, in order to keep the discussions separate and to reflect that your PR #32 made the test suite pass in all cases, to move all items except the third to new, separate issues, then close this one? By that I mean to just lift and shift your comments above separately to the new issues, nothing that would take more than a minute.

It's helpful that you've outlined your proposed approach for each item - I'll arrange a chat with @davidhassell to briefly work out what we think should be done in each case and get back to you about it. If you did want to move to separate issues, we can comment separately on each.

As for the comment against the third bullet:

I do not think that combing this would give a benefit.

So, as I understand it, you don't think it would be worthwhile to check the test suite with Linux OS default packages and Windows against different Python versions? I would have thought it could be somewhat useful just to be rigorous in case of some strange facet of a specific Python version on a given OS that might cause a niche issue now and again. Also, the cfunits test suite is really quick relative to the Actions limits minutes we have to use (just on the free GitHub plan), so there is no real harm in running several more quick jobs, as it would require to combine the workflows so that we run both on the 2D matrix of both OS and Python version.

That said, overall if you think it is too much effort for too little benefit to combine them, that's fine and we can just leave them separate. It's no hugely important. (Alternatively we could perhaps, of the three, combine two and leave one separate?) I'll let you decide!

@daniel-mohr
Copy link
Contributor Author

OK, I created 3 new issues and will think about combining things -- maybe I will provide a pull request or I will integrate it in a pull request regarding one of the new issues #34, #35 or #36.

@daniel-mohr
Copy link
Contributor Author

@sadielbartholomew And thanks for all the nice words! ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI platform question Further information is requested testing Issues related to units tests and their coverage
Projects
None yet
Development

No branches or pull requests

2 participants