add algebraic logging #2965

Open

patins1 wants to merge 104 commits into master

Conversation

patins1
Contributor

@patins1 patins1 commented Jan 26, 2024

Description

Supported by a dedicated training listener, algebraic operations executed during training can be recorded and stored as a Python program.

  • If this change is a backward incompatible change, why must this change be made?

In order not to record the concrete batch size used during training, -1 is now used in some places of the existing Java code as the value for the batch dimension (see the sketch at the end of this description). This is backwards compatible, as the underlying engines infer the right value from the size of the array.

  • Interesting edge cases to note here

In case different epochs, or even different batches within an epoch, use different prediction / loss functions, multiple prediction / loss functions are generated (a Python comment indicates how often each is "used"). The MNIST and ResNet examples each generated only one prediction / loss function, which is unit-tested and which I also verified in a TensorFlow program to yield the same results as the original DJL model. It will be interesting to test other models in the future.

Algebraic logging currently works only with MXNet, as the PyTorch engine does not build up a data structure describing the executed operation and its arguments.
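As an illustration of the -1 batch dimension (a minimal sketch against the public DJL NDArray API, not code from this PR; the concrete sizes are made up):

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class BatchDimExample {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            // A batch of 32 flattened 28x28 images.
            NDArray batch = manager.ones(new Shape(32, 28 * 28));

            // Using -1 for the batch dimension lets the engine infer it from
            // the number of elements: 32 * 784 elements / 784 = batch size 32.
            NDArray reshaped = batch.reshape(-1, 28 * 28);

            System.out.println(reshaped.getShape()); // (32, 784)
        }
    }
}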

SidneyLann and others added 30 commits September 19, 2023 17:36
@patins1 patins1 requested review from zachgk, frankfliu and a team as code owners January 26, 2024 06:15
@codecov-commenter

codecov-commenter commented Feb 2, 2024

Codecov Report

Attention: Patch coverage is 75.13514% with 92 lines in your changes missing coverage. Please review.

Project coverage is 72.33%. Comparing base (bb5073f) to head (5d65575).
Report is 1002 commits behind head on master.

Files Patch % Lines
...i/src/main/java/ai/djl/training/listener/Node.java 59.29% 75 Missing and 6 partials ⚠️
...va/ai/djl/training/listener/AlgebraicListener.java 92.71% 3 Missing and 8 partials ⚠️


Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2965      +/-   ##
============================================
+ Coverage     72.08%   72.33%   +0.24%     
- Complexity     5126     7381    +2255     
============================================
  Files           473      724     +251     
  Lines         21970    32886   +10916     
  Branches       2351     3438    +1087     
============================================
+ Hits          15838    23789    +7951     
- Misses         4925     7460    +2535     
- Partials       1207     1637     +430     


@zachgk
Contributor

zachgk commented Feb 27, 2024

Hi @patins1. I think this PR is going in a useful direction, but we may need to make some changes. Let me start by putting it into the context I am approaching it from.

A number of the other imperative deep learning frameworks (PyTorch, MXNet, etc.) eventually reached a stage where they want to convert models from being imperative (embedded into python code) into symbolic (a standalone data structure). From the symbolic format, you can do lots of useful things such as easier importing/exporting or full compiler style optimizations.

There are then two major ways this is done: tracing or scripting. In tracing, you run a forward pass on your model and observe which operations are run. From the trace you can then reconstruct it as your symbolic model. The other approach is to do static analysis and look at the python/Java code itself to convert it into the equivalent data structure format. For example, see torchscript.

In that sense, this algebraic logging PR seems to be a tracing method that exports into a python keras model. I have a few large concerns. First is that it only works on MXNet. The main MXNet project is abandoned so we want to focus development on the maintained engines. Or ideally, it should be engine agnostic rather than targeted to a particular engine. The other is that we want to design an implementation that could expand to other output formats (python with pytorch, torchscript, maybe a DJL custom format, etc).

So using the global record is probably not going to work. Not all engines support a generic invoke. My thought is that we could have a TracedNDArray that wraps around an NDArray and will execute the wrapped NDArray operations while also recording the operations executed.

Then, we probably want to do a two-step recording. The first step would record the operation name and args into some standard DJL format. In the second step, that format would be converted into the desired target (python keras). So calling the core pieces would look something like:

TracedNDArray result = myOperation(new TracedNDArray(input1), new TracedNDArray(input2));
Symbolic symbolicMyOperation = result.getTrace();
PyKerasExporter.export(symbolicMyOperation, path);

From a solution like this, it would work with all engines because it just uses the NDArray class itself. We can add some helpers onto Trainer to simplify tracing such as Symbolic symbolicMyOperation = trainer.trace(). We could add other exporter classes that are based on Symbolic. And finally, we could even try to build a scripting based strategy (perhaps leveraging known Blocks) later on. As long as that scripting targets the same Symbolic class, it could share the same pool of exporters.
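To make the proposal a bit more concrete, here is a rough sketch of such a wrapper (hypothetical: neither TracedNDArray nor a Symbolic class exists in DJL today, and a real implementation would cover the whole NDArray surface rather than two operations):

import ai.djl.ndarray.NDArray;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tracing wrapper: runs the real NDArray operation and records what was executed. */
public final class TracedNDArray {

    private final NDArray delegate;
    private final List<String> trace; // stand-in for the proposed Symbolic format

    public TracedNDArray(NDArray delegate) {
        this(delegate, new ArrayList<>());
    }

    private TracedNDArray(NDArray delegate, List<String> trace) {
        this.delegate = delegate;
        this.trace = trace;
    }

    public TracedNDArray add(TracedNDArray other) {
        return record("add", delegate.add(other.delegate), other);
    }

    public TracedNDArray matMul(TracedNDArray other) {
        return record("matMul", delegate.matMul(other.delegate), other);
    }

    private TracedNDArray record(String op, NDArray result, TracedNDArray other) {
        trace.add(result.getUid() + " = " + op + "(" + delegate.getUid() + ", " + other.delegate.getUid() + ")");
        return new TracedNDArray(result, trace);
    }

    /** The recorded, engine-agnostic trace that an exporter (e.g. to Keras) would consume in step two. */
    public List<String> getTrace() {
        return trace;
    }
}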

Does this make sense? Also, feel free to share any concerns or alternative suggestions to my proposal

@patins1
Contributor Author

patins1 commented Mar 2, 2024

Hi @zachgk, thanks for your thoughts.

You sketched a class Symbolic, which I assume would capture the DJL custom format you mentioned.
MXNet has a similar concept of symbolic model which you can activate using the symbolic-model option of ai.djl.examples.training.util.Arguments:

Use symbolic model, use imperative model if false

So I would assume that the symbolic model that can be loaded for MXNet also contains control flow statements, and this would be the difference to your Symbolic class, which I guess is the equivalent of my Node class.
So given that we would only support imperative models / graphs, I don't see that it is worth the effort to go in the direction you propose; it adds too little value, as such traces are not representative of the whole model (capturing only single execution paths by nature). Anyway, it is still interesting to just log the graphs like I did, for transparency/QA reasons.

On the other hand, DJL already supports a symbolic format at the block level, and it would be an interesting extension to DJL to write converters from it to the block-level equivalent of TensorFlow, which is Keras layers, or to the equivalent of PyTorch, which is provided by the torch.nn package. I might look into the former transformation at some time in the future.

I had no idea MXNet is abandoned, personally I use PyTorch engine when working with DJL but only for this logging thing I had to use MXNet. And that's the beauty of DJL, that I can switch easily to MXNet without changing my code, awesome!

From my discoveries implementing this feature, I realized that PyTorch and MXNet are quite alike, while TensorFlow showed major differences:

  1. For the convolutional operation, the channels have a different dimensional order, which makes an additional transpose operation necessary (see the sketch after this list). I moved this transpose operation from the forward computational graph to the weight initialization section in my last commit (it doesn't really make a difference, but now the corresponding model weight parameters have a different shape).
  2. TensorFlow needs additional tf.nn.bias_add calls for various operations, while PyTorch / MXNet have this already incorporated, e.g. in the bias parameter of torch.nn.Conv2d.
  3. My next big challenge is to log RNN operations. I thought I could use tf.compat.v1.nn.dynamic_rnn, but this is deprecated and TensorFlow wants you to use tf.keras.layers.RNN, which is a layer concept that I by design didn't want to use. To be continued.
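To make point 1 concrete, a small layout-conversion sketch using the DJL NDArray API (not code from this PR), assuming the usual weight layouts of PyTorch/MXNet Conv2d, (Cout, Cin, kH, kW), and of TensorFlow's conv2d filters, (kH, kW, Cin, Cout):

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class ConvWeightLayout {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            // PyTorch/MXNet-style convolution weight: (Cout=32, Cin=3, kH=3, kW=3)
            NDArray weight = manager.ones(new Shape(32, 3, 3, 3));

            // TensorFlow expects (kH, kW, Cin, Cout), so permute the axes once
            // at weight-initialization time instead of in every forward pass.
            NDArray tfWeight = weight.transpose(2, 3, 1, 0);

            System.out.println(tfWeight.getShape()); // (3, 3, 3, 32)
        }
    }
}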

@patins1
Contributor Author

patins1 commented Mar 2, 2024

To give an example of how the block-level model built by TrainMnist.java would be converted to TensorFlow:

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(64, activation='relu'),
  tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
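For comparison, the DJL side of this conversion is roughly the Mlp block from the basic model zoo that TrainMnist.java uses (a sketch from memory, so details may differ slightly from the actual example code):

import ai.djl.Model;
import ai.djl.basicmodelzoo.basic.Mlp;

public class MnistMlp {
    public static void main(String[] args) {
        // Roughly the block-level model that TrainMnist.java builds; the
        // generated Keras Sequential above would be its TensorFlow counterpart.
        try (Model model = Model.newInstance("mlp")) {
            model.setBlock(new Mlp(28 * 28, 10, new int[] {128, 64}));
        }
    }
}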

@zachgk
Contributor

zachgk commented Mar 4, 2024

Yeah. I borrowed the name of Symbolic from MXNet (which I used to work on). But they all have it: see the blog post What are Symbolic and Imperative APIs in TensorFlow 2.0? .

So control flow is a tricky part of the story. Symbolic formats can be viewed almost like programming languages and can contain control flow. But this is where the tracing/scripting methodologies differ the most. With tracing, it can't detect the control flow. Instead, it ends up interpreting the paths taken by the control flow as if they are hard-coded. This can work fine if the paths are fixed such as a for loop through all of the layers in the model. For paths that vary such as based on the input arguments, tracing simply won't work for those model designs. So, even if the Symbolic formats have control flow capability, the tracing methodology can't make use of it.
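As a made-up illustration (not code from DJL or this PR) of how tracing freezes a data-dependent branch:

import ai.djl.ndarray.NDArray;
import ai.djl.nn.Activation;

public class ControlFlowExample {
    // A data-dependent branch: a trace recorded with an input whose mean is
    // positive only ever sees the relu() path, so the exported graph
    // hard-codes that path and is wrong for inputs that would have taken
    // the other branch.
    static NDArray forward(NDArray x) {
        if (x.mean().getFloat() > 0) {
            return Activation.relu(x);
        }
        return x.neg();
    }
}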

This is where some of the goal for scripting comes in. Using scripting, it can recognize control flow and treat it appropriately (assuming the Symbolic format can express the necessary control flow logic). However, scripting must also deal with other logic in the source programming language (Python/Java) such as classes, function calls, recursion, other data types, etc. It also needs some avenue to be called from where it has access to the source code. This is less of a problem in dynamic Python, but in Java it would require either running before the Java compiler or using the compiled Java byte code. Overall, it is a more difficult but more powerful path.

Now, DJL blocks are not actually a symbolic format. Imperative formats still use features in their source programming languages like class hierarchies. As an example, the imperative PyTorch includes the Module class.

There are two major differences that separate the DJL blocks from a symbolic format. The first is its treatment of primitive vs compound features. In DJL, you can think of blocks as either being primitive blocks that call the actual engine operators or compound blocks that only call other blocks. If it were properly symbolic, a converter would require only defining the conversion for primitive blocks. As an analogy, a language like Java has primitives (defined in the Java language spec) and compounds (code written in the language). Tools like the Java compiler require custom handling for all primitives but work on any arbitrary Java code. However, no DJL block converter would ever be finished. It would require implementations for every block any user might create.

The second difference comes from LambdaBlocks. These are blocks that can contain arbitrary Java code. So, there is no way to write a converter that works for LambdaBlocks without going back to the methodologies of tracing or scripting to convert the arbitrary Java code into Symbolic. This is a fairly big issue as we try to use LambdaBlocks whenever no parameters are necessary including most activations and pooling in addition to other arbitrary code like reshapes, transpose, flatten, etc.
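For reference, this is roughly what such a block looks like with DJL's LambdaBlock (a sketch, not code from this PR); from a block-level converter's point of view the lambda body is opaque Java code:

import ai.djl.ndarray.NDList;
import ai.djl.nn.Block;
import ai.djl.nn.LambdaBlock;

public class FlattenBlockExample {
    public static void main(String[] args) {
        // A parameterless "flatten" step expressed as arbitrary Java code.
        // A converter only sees a java.util.function.Function here, so it
        // cannot tell that this is a flatten without tracing or scripting.
        Block flatten = new LambdaBlock(list -> new NDList(list.singletonOrThrow().flatten()));
        System.out.println(flatten);
    }
}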

@patins1
Contributor Author

patins1 commented Mar 16, 2024

Similar to your abstraction, it makes sense to divide all blocks in DJL into primitive blocks on the one side and compound / lambda blocks on the other side. If we apply this division to other engines as well, we could postulate that primitive blocks can be converted across engines easily while the other blocks can't, or we don't care about them for now. The question is then to find the set of primitive blocks that shall be supported; hopefully 90% of networks can then be transformed easily between the engines that share this common set of primitive blocks. I'll have a look at whether this is feasible for MNIST and ResNet, which are the study objects of this pull request.
