Added rle_fast C extension to improve speed #2

Open: wants to merge 5 commits into base branch develop

AshishS-1123

Changes Made

  • Created rle_fast extension
  • Added tests for testing the extension
  • Updated setup.py to build and install the extension

Why?

The algorithm for the encode and decode operations in rle/__init__.py is efficient, but it still runs slowly, because every step of the loop goes through the Python interpreter.
Python provides a C-API for writing extensions. This way, we get the speed of C with the flexibility of Python.

Here, I have used the C-API to write the same algorithm as in rle/__init__.py in C, provided code for building this extension, and written a few tests. For the few input values that I tried, the speed seems to have improved by at least 4x.
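A speedup figure like this could be checked with a small timing harness. The sketch below is only an illustration: `speedup` is a hypothetical helper, and because rle_fast has to be built before it can be imported, a hand-written summing loop and the built-in `sum` stand in for the pure-Python and C implementations.

```python
import timeit


def speedup(baseline, candidate, data, repeat=5, number=20):
    """Return how many times faster `candidate` is than `baseline` on `data`.

    Takes the best of `repeat` runs for each callable to reduce timing noise.
    """
    base = min(timeit.repeat(lambda: baseline(data), repeat=repeat, number=number))
    cand = min(timeit.repeat(lambda: candidate(data), repeat=repeat, number=number))
    return base / cand


def python_sum(seq):
    """Stand-in for a pure-Python implementation: an explicit loop."""
    total = 0
    for item in seq:
        total += item
    return total


# Stand-ins only: with the extension installed, one would pass the
# pure-Python encode and the rle_fast encode instead.
data = list(range(10_000))
ratio = speedup(python_sum, sum, data)
print(f"candidate is {ratio:.1f}x the speed of the baseline")
```

The ratio depends on the machine, the input size, and the shape of the data, so any single number should be read as a rough indication.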

How?

The code for this extension is present in the folder rle_fast.
It contains three files:

  1. rle_fast_extension.c
  2. rle_utils.h
  3. rle_docs.h

Wrapper for extension

The wrapper code for the extension is present in the file rle_fast/rle_fast_extension.c.
This code contains two methods encode_c and decode_c that will be called for encode and decode operations respectively. They are responsible for taking the arguments, parsing them, performing type checking, raising appropriate exceptions, etc.
In short, they act as an interface between Python and the C code.

Besides these functions, the file holds the function-level and module-level definitions: the names under which the functions are exposed to Python scripts, the number of arguments each one takes, the docstrings for the functions and the module, and the name of the module.

Encode and Decode Operations

The file rle_fast/rle_utils.h contains the actual algorithm for performing the encode and decode operations.
It contains two functions, encode_sequence and decode_sequence.

The algorithm used in these two functions is the same as that in rle/__init__.py.
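For readers who do not have rle/__init__.py at hand, here is a minimal pure-Python sketch of run-length encoding and decoding. It is not the code from this PR: it borrows the names encode_sequence and decode_sequence for readability and assumes that the encoded form is a pair of lists (values, counts).

```python
from typing import Any, List, Sequence, Tuple


def encode_sequence(seq: Sequence[Any]) -> Tuple[List[Any], List[int]]:
    """Collapse each run of consecutive equal items into one value and a count."""
    values: List[Any] = []
    counts: List[int] = []
    for item in seq:
        if values and values[-1] == item:
            counts[-1] += 1          # same as the previous item: extend the run
        else:
            values.append(item)      # new item: start a new run
            counts.append(1)
    return values, counts


def decode_sequence(values: Sequence[Any], counts: Sequence[int]) -> List[Any]:
    """Expand (values, counts) back into the original sequence."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    decoded: List[Any] = []
    for value, count in zip(values, counts):
        decoded.extend([value] * count)
    return decoded


values, counts = encode_sequence([1, 1, 1, 2, 2, 3])
print(values, counts)                   # [1, 2, 3] [3, 2, 1]
print(decode_sequence(values, counts))  # [1, 1, 1, 2, 2, 3]
```

Both functions make a single pass over their input, which is why a C port can keep the same structure and gain speed only from avoiding interpreter overhead.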

Documentation

The docstrings are present in the file rle_fast/rle_docs.h.
These are merely variables containing strings describing the module and the methods in it.
In rle_fast_extension.c, these docstrings are used in the module and function definitions. After building and installing the extension, they can be read with the built-in help function or the __doc__ attribute, just like with a normal Python package.
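As an illustration of how docstrings on C-implemented objects surface in Python, the sketch below reads `__doc__` from the standard `math` module, which is written in C in CPython. It is a stand-in for rle_fast, which has to be built first; `first_doc_line` is a hypothetical helper.

```python
import math


def first_doc_line(obj) -> str:
    """Return the first line of an object's docstring, or '' if it has none."""
    doc = (getattr(obj, "__doc__", None) or "").strip()
    return doc.splitlines()[0] if doc else ""


# The same attribute access should work on an installed C extension
# and on the functions it exposes.
print(first_doc_line(math))
print(first_doc_line(math.sqrt))
```

Calling `help(math.sqrt)` in an interactive session shows the same text with additional formatting.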

Installation

The code for installing the rle_fast extension is present in the setup.py file.

To build the extension,
python setup.py build

To install the extension,
python setup.py install

Usage

To import the package,

from rle.rle_fast import encode
from rle.rle_fast import decode
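Since a compiled extension may be missing on platforms where it could not be built, calling code might want to fall back to the pure-Python module. The following is a sketch of one way to do that, not part of this PR: `load_backend` is a hypothetical helper, and it assumes that both `rle.rle_fast` and `rle` expose `encode` and `decode`.

```python
import importlib


def load_backend(candidates=("rle.rle_fast", "rle")):
    """Return the first importable module that offers encode and decode.

    Returns None when no candidate can be imported.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed or not built: try the next candidate
        if hasattr(module, "encode") and hasattr(module, "decode"):
            return module
    return None


backend = load_backend()
if backend is None:
    print("no rle backend is installed")
else:
    print("using", backend.__name__)
```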

Changes as compared to PR #1

In my previous pull request, I mentioned that the extension fails for non-integer values. I have fixed that bug.

Previously, the encode_sequence function converted the elements of the input sequence to integers before comparing them. This is why the code failed for non-integer inputs.
In this version, I use the C-API function PyObject_RichCompareBool, which compares two Python objects directly.
Now the code works for almost all data types, including integers, floats, complex numbers, characters, etc.
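The difference between the two approaches can be sketched in Python. This is an illustration rather than the extension's code: `encode_int_coerced` is a hypothetical model of the integer-conversion approach, and `encode_generic` mimics an equality test done with PyObject_RichCompareBool and Py_EQ, which treats identical objects as equal before falling back to `==`.

```python
def encode_int_coerced(seq):
    """Model of the earlier approach: cast items to int before comparing."""
    values, counts = [], []
    for item in seq:
        if values and int(values[-1]) == int(item):
            counts[-1] += 1
        else:
            values.append(item)
            counts.append(1)
    return values, counts


def encode_generic(seq):
    """Model of the new approach: compare the objects themselves."""
    values, counts = [], []
    for item in seq:
        # Identity check first, then ==, like a Py_EQ rich comparison.
        if values and (values[-1] is item or values[-1] == item):
            counts[-1] += 1
        else:
            values.append(item)
            counts.append(1)
    return values, counts


floats = [1.2, 1.7, 2.5]
print(encode_int_coerced(floats))  # ([1.2, 2.5], [2, 1])  1.2 and 1.7 wrongly merged
print(encode_generic(floats))      # ([1.2, 1.7, 2.5], [1, 1, 1])

mixed = ["a", "a", 1 + 2j, 1 + 2j, 0.5]
print(encode_generic(mixed))       # (['a', (1+2j), 0.5], [2, 2, 1])
```

In this model, the coerced version either merges distinct floats that truncate to the same integer or raises an exception for items such as "a" that cannot be converted at all.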

TO-DO

  1. Write better tests for the extension
  2. Find cases where the extension might fail, and fix them.

@AshishS-1123
Author

@tnwei
I know you are super busy right now, but could you please review this PR? I am starting to wonder if there is some problem in the code.

As for the problem of distributing the package on multiple platforms, the best solution I could find was to use GitHub Actions. I am not that familiar with how to go about doing that, so I decided to look at how other packages that use C extensions build their wheels. Check out this link from scikit-learn. It looks promising.

@AshishS-1123
Author

@tnwei It's been months since this PR was opened, yet you haven't given any response.
Please comment below whether you are looking into this, or if you want to reject my contribution. Otherwise, I will have no choice but to call this a dead project and create and publish my own fork.
