
Holoscan SDK v2.0.0

Released by @agirault on 19 Apr 15:00 (commit 1e011a4)

Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

Core
  • make_condition, make_fragment, make_network_context, make_operator, make_resource, and
    make_scheduler now accept a non-const string or character array for the name parameter
    (see the first sketch after this list).
  • A new event-based multithread scheduler (EventBasedScheduler) is available. It is an alternative to the existing, polling-based MultiThreadScheduler and can be used as a drop-in replacement. The only parameter difference is that it does not take the check_recession_period_ms parameter, as there is no polling interval for this scheduler. It should give performance similar to the MultiThreadScheduler with a very short polling interval, but without the high CPU usage seen for the multithread scheduler in that case (due to one thread constantly polling for work). A usage sketch follows this list.
  • When an exception is raised from the Operator methods start, stop or compute, that exception will first trigger the underlying GXF scheduler to terminate the application graph and then the exception will be raised by Holoscan SDK. This resolves an issue with inconsistent behavior from Python and C++ apps on how exceptions were handled and fixes a crash in C++ apps when an operator raised an exception from the start or stop methods.
  • Now, when an exception occurs during the execution of a Holoscan application, it is propagated to
    the application's run method, allowing users to catch and manage exceptions within their
    application.
    Previously, the Holoscan runtime would catch and log exceptions, with the application continuing
    to run (in Python) or exiting (in C++) without a clear indication of the exception's origin.
    Users can catch and manage exceptions by enclosing the run method in a try block (see the
    exception-handling sketch after this list).
  • The holoscan::Fragment::run_async (C++) and holoscan.Application.run_async (Python) methods
    return std::future and concurrent.futures.Future, respectively. The revised documentation
    advises using future.get() in C++ and future.result() in Python to wait until the application
    has completed execution and to surface any exceptions that occurred.
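
A minimal sketch of the relaxed name parameter, assuming code inside a Fragment::compose() override; the operator type and config key are illustrative:

    // The name argument no longer needs to be a const string or literal.
    std::string name = "format_converter";  // non-const std::string
    name += "_0";                           // e.g., built at runtime
    auto op = make_operator<ops::FormatConverterOp>(name, from_config("format_converter"));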
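
A sketch of swapping in the new scheduler; aside from the removed check_recession_period_ms, it is assumed here to accept the same worker_thread_number parameter as MultiThreadScheduler:

    // Drop-in replacement for MultiThreadScheduler; no polling interval to tune.
    app->scheduler(app->make_scheduler<holoscan::EventBasedScheduler>(
        "event-scheduler", holoscan::Arg("worker_thread_number", 4L)));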
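
A sketch of both exception-handling patterns in C++, assuming a placeholder application class MyApp:

    auto app = holoscan::make_application<MyApp>();

    // Exceptions raised in Operator::start/compute/stop now propagate to run().
    try {
      app->run();
    } catch (const std::exception& e) {
      HOLOSCAN_LOG_ERROR("application failed: {}", e.what());
    }

    // With the asynchronous variant, future.get() blocks until the application
    // completes and rethrows any exception raised during execution.
    auto future = app->run_async();
    try {
      future.get();
    } catch (const std::exception& e) {
      HOLOSCAN_LOG_ERROR("application failed: {}", e.what());
    }

In practice an application would use one of the two patterns, not both.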
Operators
  • V4L2 Video Capture: added support to set manual exposure and gain values for cameras that support it (see the sketch after this list).
  • Inference: one can now run multiple instances of the Inference operator in a single application without resource conflicts.
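
A hedged sketch of configuring these controls in C++; the exposure_time and gain parameter names and values are assumptions for illustration, not confirmed API:

    // Manual exposure/gain for a camera that supports it (device-dependent).
    auto source = make_operator<ops::V4L2VideoCaptureOp>(
        "source",
        from_config("source"),
        Arg("exposure_time", 500u),  // assumed parameter name; device-specific units
        Arg("gain", 100u));          // assumed parameter name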
Utils
  • Can now build from source for iGPU (IGX iGPU, JetPack) from a non-iGPU system (IGX dGPU, x86_64).
  • The NGC container now supports packaging and running Holoscan Application Packages using the Holoscan CLI.
  • CLI runner: improved handling of available GPUs by reading the package manifest file and checking the system for available GPUs. A new --gpus argument can override the default values.

Breaking Changes

  • The VideoStreamRecorderOp and VideoStreamReplayerOp now work without requiring the libgxf_stream_playback.so extension. Now that the extension is unused, it has been removed from the SDK and should no longer be listed under the extensions section of application YAML files using these operators.

  • As of version 2.0, we have removed certain Python bindings to align with the unified logger interface:

    • Removed APIs:
      • holoscan.logger.enable_backtrace()
      • holoscan.logger.disable_backtrace()
      • holoscan.logger.dump_backtrace()
      • holoscan.logger.should_backtrace()
      • holoscan.logger.flush()
      • holoscan.logger.flush_level()
      • holoscan.logger.flush_on()
    • However, the following APIs remain accessible from Python (see the sketch after this list). These are intended for logging in Holoscan's core or for C++ operators (e.g., using the HOLOSCAN_LOG_INFO macro) and are not designed for Python's logging framework; Python API users are advised to use the standard logging module for their logging needs:
      • holoscan.logger.LogLevel
      • holoscan.logger.log_level()
      • holoscan.logger.set_log_level()
      • holoscan.logger.set_log_pattern()
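
    For reference, a minimal C++ sketch of the logging facilities these bindings mirror; it assumes holoscan/logger/logger.hpp exposes the set_log_level free function and LogLevel enum:

      #include <holoscan/logger/logger.hpp>

      // Native C++ logging; Python code should use the standard logging module.
      holoscan::set_log_level(holoscan::LogLevel::DEBUG);
      HOLOSCAN_LOG_INFO("initialized {} operators", 3);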
  • Several GXF headers have moved from gxf/std to gxf/core (the include-path change is sketched after this list):

    • parameter_parser.hpp
    • parameter_parser_std.hpp
    • parameter_registrar.hpp
    • parameter_storage.hpp
    • parameter_wrapper.hpp
    • resource_manager.hpp
    • resource_registrar.hpp
    • type_registry.hpp
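
    The corresponding include-path change, shown for one of the headers above:

      // Before (Holoscan SDK 1.x):
      // #include "gxf/std/parameter_parser.hpp"
      // After (2.0):
      #include "gxf/core/parameter_parser.hpp"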
  • Some C++ code for tensor interoperability has been upstreamed from Holoscan SDK into GXF. The public holoscan::Tensor class will remain, but there have been a small number of backward incompatible changes in related C++ classes and methods in this release. Most of these were used internally and are unlikely to affect existing applications (see the rename sketch after the list below).

    • supporting classes holoscan::gxf::GXFTensor and holoscan::gxf::GXFMemoryBuffer have been removed. The DLPack functionality that was formerly in holoscan::gxf::GXFTensor is now available upstream in GXF's nvidia::gxf::Tensor.
    • The struct holoscan::gxf::DLManagedTensorCtx has been renamed to holoscan::gxf::DLManagedTensorContext and is now just an alias for nvidia::gxf::DLManagedTensorContext. It also has two additional fields (dl_shape and dl_strides) to hold shape/stride information used by DLPack.
    • holoscan::gxf::DLManagedMemoryBuffer is now an alias for nvidia::gxf::DLManagedMemoryBuffer.
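
    A minimal sketch of the rename, for code that referenced the old struct directly:

      // 1.x: holoscan::gxf::DLManagedTensorCtx ctx;
      // 2.0: renamed, and now an alias for nvidia::gxf::DLManagedTensorContext.
      holoscan::gxf::DLManagedTensorContext ctx;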
  • The GXF UCX extension, used in distributed applications, now sends data asynchronously by default, which can lead to issues such as insufficient memory on the transmitter side when a memory pool is used. Specifically, the concern is only for operators that have a memory pool and connect to an operator in a separate fragment of the distributed application. As a workaround, users can increase the num_blocks parameter to a higher value in the BlockMemoryPool or use the UnboundedAllocator to avoid the problem. This issue will be addressed in a future release by providing a more robust solution to handle the asynchronous data transmission feature of the UCX extension, eliminating the need for manual intervention (see Known Issue 4601414).

    • For fragments using a BlockMemoryPool, the num_blocks parameter can be increased to a higher value to avoid the issue. For example, the following C++ snippet shows the existing BlockMemoryPool resource being created with a higher number of blocks:

      recorder_format_converter = make_operator<ops::FormatConverterOp>(
        "recorder_format_converter",
        from_config("recorder_format_converter"),
        Arg("pool") =
          //make_resource<BlockMemoryPool>("pool", 1, source_block_size, source_num_blocks));
          make_resource<BlockMemoryPool>("pool", 1, source_block_size, source_num_blocks * 2));

      The equivalent change in Python:

      source_pool_kwargs = dict(
          storage_type=MemoryStorageType.DEVICE,
          block_size=source_block_size,
          # num_blocks=source_num_blocks,
          num_blocks=source_num_blocks * 2,
      )
      recorder_format_converter = FormatConverterOp(
          self,
          name="recorder_format_converter",
          pool=BlockMemoryPool(self, name="pool", **source_pool_kwargs),
          **self.kwargs("recorder_format_converter"),
      )
    • Since the underlying UCXTransmitter attempts to send the emitted data regardless of the status of the downstream operator input port's message queue, simply doubling num_blocks may not suffice in cases where the receiving operator processes data more slowly than the sender emits it.

    • If you encounter the issue, consider using the UnboundedAllocator instead of the BlockMemoryPool to avoid the problem. The UnboundedAllocator does not have a fixed number of blocks and can allocate memory as needed, though it can cause some overhead due to the lack of a fixed memory pool size and may lead to memory exhaustion if the memory is not released in a timely manner.
      The following C++ snippet shows how to use the UnboundedAllocator:

      ...
        Arg("pool") = make_resource<UnboundedAllocator>("pool");
      ...

      And the equivalent in Python:

      from holoscan.resources import UnboundedAllocator
      ...
              pool=UnboundedAllocator(self, name="pool"),
      ...

Bug Fixes

  • 4381269: Fixed a bug that caused memory exhaustion when compiling the SDK in the VSCode Dev Container (using 'Tasks: Run Build Task') due to the missing CMAKE_BUILD_PARALLEL_LEVEL environment variable. Users can specify the number of jobs with the --parallel option (e.g., ./run vscode --parallel 16).
  • 4569102: Fixed an issue where the log level was not updated from the environment variable when multiple Application classes were created during the session. The log level set in an Application class can now be reset from the environment variable even if it was previously overridden.
  • 4578099: Fixed a segfault in FormatConverterOp when used with a BlockMemoryPool of insufficient capacity to create the output tensor.
  • 4571581: Fixed an issue where the documentation for the built-in operators was either missing or incorrectly rendered.
  • 4591763: Fixed an application crash when an exception is thrown from Operator::start or Operator::stop.
  • 4595680: Fixed an issue that caused the Inference operator to fail when multiple instances were composed in a single application graph.

Known Issues

This section supplies details about issues discovered during development and QA but not resolved in this release.

  • 4062979: When operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order in the graph is not guaranteed.
  • 4267272: AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to a missing nv-p2p.h. Expected to be addressed in IGX SW 1.0 GA.
  • 4384768: No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to a missing nv-p2p kernel module. Expected to be addressed in JetPack 6.0 GA and IGX SW 1.0 GA, respectively.
  • 4190019: Holoviz segfaults on multi-GPU setups when specifying the device using the --gpus flag with docker run. The current workaround is to use CUDA_VISIBLE_DEVICES in the container instead.
  • 4210082: The v4l_camera example segfaults at exit.
  • 4339399: High CPU usage observed with the video_replayer_distributed application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the MultiThreadScheduler (with the check_recession_period_ms parameter set to 0 by default) may still experience high CPU usage. Setting the HOLOSCAN_CHECK_RECESSION_PERIOD_MS environment variable to a value greater than 0 (e.g., 1.5) can help reduce CPU usage, though it may increase application latency unless the application switches from the MultiThreadScheduler to the new event-based EventBasedScheduler.
  • 4318442: The UCX cuda_ipc protocol doesn't work in Docker containers on x86_64. As a workaround, we are currently disabling the UCX cuda_ipc protocol on all platforms via the UCX_TLS environment variable.
  • 4325468: The V4L2VideoCapture operator only supports the YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they are assumed to be equivalent to RGBA8888.
  • 4325585: Applications using the MultiThreadScheduler may exit early due to timeouts. This occurs when the stop_on_deadlock_timeout parameter is improperly set to a value equal to or less than check_recession_period_ms, particularly if check_recession_period_ms is greater than zero.
  • 4301203: HDMI IN fails in v4l2_camera on the IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes are expected in IGX SW 1.0 GA.
  • 4384348: UCX termination (via ctrl+c, pressing 'Esc', or clicking the close button) is not smooth and can show multiple error messages.
  • 4481171: Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads.
  • 4458192: In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, "Address already in use" errors may occur. A potential solution is to assign a different port number to the HOLOSCAN_HEALTH_CHECK_PORT environment variable (default: 8777), for example, by using export HOLOSCAN_HEALTH_CHECK_PORT=8780.
  • 4601414: The UCX extension's asynchronous data transmission feature causes a regression in distributed applications, such as insufficient memory on the transmitter side. As a workaround, users can increase the num_blocks parameter in the BlockMemoryPool or use the UnboundedAllocator instead to avoid the issue.