Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Add buffers as input #74

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

eWert-Online
Copy link
Collaborator

Fixes #53

@eWert-Online eWert-Online marked this pull request as draft December 21, 2022 20:28
t.is(diffPercentage, 2.85952484323);
});

test.skip("Accepts buffer as input for base and compare image at the same time", async (t) => {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is currently failing, as we can't distinguish from nodes stdin stream, where the first image ends and the second one starts.

@@ -21,7 +21,7 @@
"scripts": {
"run": "esy x ODiffBin",
"test": "esy x RunTests.exe",
"test-js": "esy ava",
"test-js": "esy ava --verbose",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the verbose flag to better see, which tests are failing and which are passing.
Also, without this flag, ava does not report that some tests timed out.

Comment on lines +2 to +5
/** The image type of the base image. This has to be set to the corresponding image format when using a buffer as input */
baseImageType?: 'filepath' | 'jpg' | 'png' | 'bmp' | 'tiff' = 'filepath';
/** The image type of the compare image. This has to be set to the corresponding image format when using a buffer as input */
compareImageType?: 'filepath' | 'jpg' | 'png' | 'bmp' | 'tiff' = 'filepath';
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like, that you are currently able to provide a buffer as input and at the same time add "filepath" as the image type.
Maybe this can be prevented with some typescript magic, but I am really not experienced in ts.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way too quickly read codec info from buffer? I understand why this is done looks like it may hurt performance to add a lookup over all the codecs.

I think this can be done by simply duplicating overrides like this

declare function compare(
  baseImage: Buffer,
  compareImage: buffer,
  baseCodec: "..",
  compareCodec: ".."
)

declare function compare(
  baseImage: string,
  compareImage: string,
)

and then in function check the type of baseImage && compareImage

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can of course try to read in the first 4 bytes and identify the magic number. Here is a list of the valid magic numbers for different image formats: https://gist.github.com/leommoore/f9e57ba2aa4bf197ebc5.

TIFF and JPG do seem to have multiple valid ones, But I think it would be simple (and probably reasonably fast) to check it.

@@ -10,15 +26,55 @@ let diffPath =
let base =
Arg.(
value
& pos(0, file, "")
& info([], ~docv="BASE", ~doc="Path to base image")
& pos(0, underscore_or(non_dir_file), "_")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are now able to add either a file or an _ (underscore) to accept a raw image buffer.
I explicitly added the non_dir_file converter, so directories are not a valid input.

Comment on lines +38 to +50
let baseType =
Arg.(
value
& opt(enum([("auto", `auto), ...supported_formats]), `auto)
& info(
["base-type"],
~docv="FORMAT",
~doc=
Printf.sprintf(
"The type of the base image (required to be not auto when a buffer is used as input).\nSupported values are: auto,%s",
supported_formats |> List.map(fst) |> String.concat(","),
),
)
Copy link
Collaborator Author

@eWert-Online eWert-Online Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are now able to explicitly set the type of the provided input. This is required for buffer inputs.
We could also try to read the type from the images header, but I think this would be too much work (and it would hurt the performance).

Comment on lines +29 to +38
/* We use 65536 because that is the size of OCaml's IO buffers. */
let chunk_size = 65536;
let buffer = Buffer.create(chunk_size);
let rec loop = () => {
Buffer.add_channel(buffer, stdin, chunk_size);
loop();
};
try(loop()) {
| End_of_file => Buffer.contents(buffer)
};
Copy link
Collaborator Author

@eWert-Online eWert-Online Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are reading from stdin until there is no more data provided.
If we wanted to support both base and compare image as buffer inputs, we would need to implement a check for some kind of separator in here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is basically why I didn't implement this from scratch. Looks incredibly tough to correctly read stdin and avoid memory copying.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the performance is not bad. The only issue is knowing when the first image ends and the second one starts.
I will need to see what I can come up with to solve this problem.

@eWert-Online
Copy link
Collaborator Author

@dmtrKovalenko
What do you think of the design so far? I may have some time to work on this this week, but I want to make sure I am on the right path before continuing 🙂

Copy link
Owner

@dmtrKovalenko dmtrKovalenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your design is really good, I am just wondering about reading stdin into ocaml buffer which involves a very big memory chunk being copied and leaved for ocaml GC which may impact performance

module IO2 = (val getIOModule(img2Path));
let img1Type =
switch (img1Type) {
| `auto when img1 == "_" =>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be named stdin instead of auto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto is the one which automatically gets the type from the filename (the currently published behaviour). The explicit types (png, jpg, ...) are the types given with the stdin input.
I think we can get rid of this though, if we are reading in the format from the magic number.

Comment on lines +2 to +5
/** The image type of the base image. This has to be set to the corresponding image format when using a buffer as input */
baseImageType?: 'filepath' | 'jpg' | 'png' | 'bmp' | 'tiff' = 'filepath';
/** The image type of the compare image. This has to be set to the corresponding image format when using a buffer as input */
compareImageType?: 'filepath' | 'jpg' | 'png' | 'bmp' | 'tiff' = 'filepath';
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way too quickly read codec info from buffer? I understand why this is done looks like it may hurt performance to add a lookup over all the codecs.

I think this can be done by simply duplicating overrides like this

declare function compare(
  baseImage: Buffer,
  compareImage: buffer,
  baseCodec: "..",
  compareCodec: ".."
)

declare function compare(
  baseImage: string,
  compareImage: string,
)

and then in function check the type of baseImage && compareImage

Comment on lines +29 to +38
/* We use 65536 because that is the size of OCaml's IO buffers. */
let chunk_size = 65536;
let buffer = Buffer.create(chunk_size);
let rec loop = () => {
Buffer.add_channel(buffer, stdin, chunk_size);
loop();
};
try(loop()) {
| End_of_file => Buffer.contents(buffer)
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is basically why I didn't implement this from scratch. Looks incredibly tough to correctly read stdin and avoid memory copying.

@dmtrKovalenko
Copy link
Owner

Makes sense, but I would really check if there would be possible to read images directly from stdin instead of creating additional ocaml buffer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

accept buffers as input
2 participants