# Holoscan SDK v2.0.0
## Release Artifacts

- 🐋 Docker container: tags `v2.0.0-dgpu` and `v2.0.0-igpu`
- 🐍 Python wheel: `pip install holoscan==2.0.0`
- 📦️ Debian packages: `2.0.0.2-1`
- 📖 Documentation
See supported platforms for compatibility.
## Release Notes

### New Features and Improvements

#### Core
- `make_condition`, `make_fragment`, `make_network_context`, `make_operator`, `make_resource`, and `make_scheduler` now accept a non-const string or character array for the `name` parameter.
- A new event-based multi-thread scheduler (`EventBasedScheduler`) is available. It is an alternative to the existing, polling-based `MultiThreadScheduler` and can be used as a drop-in replacement (see the first sketch after this list). The only difference in parameters is that it does not take a `check_recession_period_ms` parameter, as there is no polling interval for this scheduler. It should give performance similar to the `MultiThreadScheduler` with a very short polling interval, but without the high CPU usage seen for the multi-thread scheduler in that case (due to constant polling for work by one thread).
- When an exception is raised from the `Operator` methods `start`, `stop`, or `compute`, that exception will first trigger the underlying GXF scheduler to terminate the application graph, and then the exception will be raised by the Holoscan SDK. This resolves an inconsistency between how Python and C++ apps handled exceptions and fixes a crash in C++ apps when an operator raised an exception from the `start` or `stop` methods.
- When an exception occurs during the execution of a Holoscan application, it is now propagated to the application's `run` method, allowing users to catch and manage exceptions within their application. Previously, the Holoscan runtime would catch and log exceptions, and the application would continue to run (in Python) or exit (in C++) without a clear indication of the exception's origin. Users can catch and manage exceptions by enclosing the `run` method in a `try` block (see the second sketch after this list).
- The `holoscan::Fragment::run_async` and `holoscan.Application.run_async` methods for C++ and Python return `std::future` and `concurrent.futures.Future`, respectively. The revised documentation advises using `future.get()` in C++ and `future.result()` in Python to wait until the application has completed execution and to address any exceptions that occurred.
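A minimal Python sketch of swapping in the new scheduler; the `PingOp` operator, its count, and the thread count are illustrative values, not from this release:

```python
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator
from holoscan.schedulers import EventBasedScheduler


class PingOp(Operator):
    """Trivial operator that prints a message each tick."""

    def compute(self, op_input, op_output, context):
        print("ping")


class MyApp(Application):
    def compose(self):
        # Run the operator five times, then stop.
        self.add_operator(PingOp(self, CountCondition(self, 5), name="ping"))


app = MyApp()
# Drop-in replacement for the polling-based MultiThreadScheduler; note that
# there is no check_recession_period_ms parameter, since no thread polls for work.
app.scheduler(EventBasedScheduler(app, name="event_scheduler", worker_thread_number=2))
app.run()
```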
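And a sketch of the new exception behavior, reusing `MyApp` from above; the exception type caught here is illustrative (whatever the operator raised propagates in Python):

```python
app = MyApp()
try:
    # Exceptions raised from an operator's start/compute/stop now propagate
    # out of run() instead of only being logged by the runtime.
    app.run()
except Exception as err:
    print(f"application failed: {err}")

# Asynchronous variant: run_async() returns a concurrent.futures.Future;
# future.result() waits for completion and re-raises any exception that occurred.
app2 = MyApp()
future = app2.run_async()
try:
    future.result()
except Exception as err:
    print(f"application failed: {err}")
```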
#### Operators
- V4L2 Video Capture: added support for setting manual `exposure` and `gain` values for cameras that support it (see the sketch after this list).
- Inference: multiple instances of the Inference operator can now run in a single application without resource conflicts.
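A hedged Python sketch of the new manual camera controls; the parameter names `exposure_time` and `gain` and their values are assumptions for illustration, and they apply only to cameras that support manual control:

```python
from holoscan.core import Application
from holoscan.operators import HolovizOp, V4L2VideoCaptureOp
from holoscan.resources import UnboundedAllocator


class V4L2App(Application):
    def compose(self):
        source = V4L2VideoCaptureOp(
            self,
            name="source",
            allocator=UnboundedAllocator(self, name="pool"),
            device="/dev/video0",
            # Manual controls (names and values assumed for illustration);
            # ignored by devices that do not support them.
            exposure_time=500,
            gain=100,
        )
        visualizer = HolovizOp(self, name="visualizer")
        self.add_flow(source, visualizer, {("signal", "receivers")})


V4L2App().run()
```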
#### Utils
- The SDK can now be built from source for iGPU (IGX iGPU, JetPack) from a non-iGPU system (IGX dGPU, x86_64).
- The NGC container now supports packaging and running Holoscan Application Packages using the Holoscan CLI.
- CLI runner: improved handling of available GPUs by reading the package manifest file and checking the system for available GPUs. A new `--gpus` argument can be used to override the default values.
### Breaking Changes
- The `VideoStreamRecorderOp` and `VideoStreamReplayerOp` now work without requiring the `libgxf_stream_playback.so` extension. Now that the extension is unused, it has been removed from the SDK and should no longer be listed under the `extensions` section of application YAML files using these operators.
- As of version 2.0, certain Python bindings have been removed to align with the unified logger interface:
  - Removed APIs:
    - `holoscan.logger.enable_backtrace()`
    - `holoscan.logger.disable_backtrace()`
    - `holoscan.logger.dump_backtrace()`
    - `holoscan.logger.should_backtrace()`
    - `holoscan.logger.flush()`
    - `holoscan.logger.flush_level()`
    - `holoscan.logger.flush_on()`
  - However, the following APIs remain accessible from Python. They are intended for logging in Holoscan's core or from C++ operators (e.g., using the `HOLOSCAN_LOG_INFO` macro) and are not designed for Python's logging framework. Python API users are advised to use the standard `logging` module for their logging needs (see the sketch after this list):
    - `holoscan.logger.LogLevel`
    - `holoscan.logger.log_level()`
    - `holoscan.logger.set_log_level()`
    - `holoscan.logger.set_log_pattern()`
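A minimal sketch of the recommended split after this change: Python application logs go through the standard `logging` module, while the remaining `holoscan.logger` APIs only affect the C++ core and operator logs; the logger name and messages are illustrative:

```python
import logging

from holoscan.logger import LogLevel, set_log_level

# Python-side application logging: use the standard logging module.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("my_app")
logger.info("composing application")

# C++-side logging (Holoscan core and C++ operators, e.g. HOLOSCAN_LOG_INFO):
# still controlled through the remaining holoscan.logger APIs.
set_log_level(LogLevel.WARN)
```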
- Several GXF headers have moved from `gxf/std` to `gxf/core`:
  - `parameter_parser.hpp`
  - `parameter_parser_std.hpp`
  - `parameter_registrar.hpp`
  - `parameter_storage.hpp`
  - `parameter_wrapper.hpp`
  - `resource_manager.hpp`
  - `resource_registrar.hpp`
  - `type_registry.hpp`
- Some C++ code for tensor interoperability has been upstreamed from Holoscan SDK into GXF. The public `holoscan::Tensor` class will remain, but there have been a small number of backward-incompatible changes in related C++ classes and methods in this release. Most of these were used internally and are unlikely to affect existing applications.
  - The supporting classes `holoscan::gxf::GXFTensor` and `holoscan::gxf::GXFMemoryBuffer` have been removed. The DLPack functionality that was formerly in `holoscan::gxf::GXFTensor` is now available upstream in GXF's `nvidia::gxf::Tensor`.
  - The struct `holoscan::gxf::DLManagedTensorCtx` has been renamed to `holoscan::gxf::DLManagedTensorContext` and is now just an alias for `nvidia::gxf::DLManagedTensorContext`. It also has two additional fields (`dl_shape` and `dl_strides`) to hold the shape/stride information used by DLPack.
  - `holoscan::gxf::DLManagedMemoryBuffer` is now an alias for `nvidia::gxf::DLManagedMemoryBuffer`.
- The GXF UCX extension, used in distributed applications, now sends data asynchronously by default, which can lead to issues such as insufficient memory on the transmitter side when a memory pool is used. Specifically, the concern applies only to operators that have a memory pool and connect to an operator in a separate fragment of the distributed application. As a workaround, users can increase the `num_blocks` parameter to a higher value in the `BlockMemoryPool` or use the `UnboundedAllocator` to avoid the problem. This issue will be addressed in a future release by providing a more robust solution to handle the asynchronous data transmission feature of the UCX extension, eliminating the need for manual intervention (see Known Issue 4601414).
  - For fragments using a `BlockMemoryPool`, the `num_blocks` parameter can be increased to a higher value to avoid the issue. For example, the following C++ and Python snippets show the existing `BlockMemoryPool` resource being created with a higher number of blocks:

    ```cpp
    recorder_format_converter = make_operator<ops::FormatConverterOp>(
        "recorder_format_converter",
        from_config("recorder_format_converter"),
        Arg("pool") =
            // make_resource<BlockMemoryPool>("pool", 1, source_block_size, source_num_blocks));
            make_resource<BlockMemoryPool>("pool", 1, source_block_size, source_num_blocks * 2));
    ```
    ```python
    source_pool_kwargs = dict(
        storage_type=MemoryStorageType.DEVICE,
        block_size=source_block_size,
        # num_blocks=source_num_blocks,
        num_blocks=source_num_blocks * 2,
    )
    recorder_format_converter = FormatConverterOp(
        self,
        name="recorder_format_converter",
        pool=BlockMemoryPool(self, name="pool", **source_pool_kwargs),
        **self.kwargs("recorder_format_converter"),
    )
    ```
  - Since the underlying UCX transmitter attempts to send the emitted data regardless of the status of the downstream operator input port's message queue, simply doubling `num_blocks` may not suffice in cases where the receiver operator's processing time is slower than that of the sender.
  - If you encounter the issue, consider using the `UnboundedAllocator` instead of the `BlockMemoryPool`. The `UnboundedAllocator` does not have a fixed number of blocks and can allocate memory as needed, though it can introduce some overhead due to the lack of a fixed memory pool size and may lead to memory exhaustion if memory is not released in a timely manner. The following C++ and Python snippets show how to use the `UnboundedAllocator`:

    ```cpp
    ...
    Arg("pool") = make_resource<UnboundedAllocator>("pool");
    ```

    ```python
    from holoscan.resources import UnboundedAllocator
    ...
    pool=UnboundedAllocator(self, name="pool"),
    ...
    ```
### Bug fixes
| Issue | Description |
|---|---|
| 4381269 | Fixed a bug that caused memory exhaustion when compiling the SDK in the VS Code Dev Container (using "Tasks: Run Build Task") due to the missing `CMAKE_BUILD_PARALLEL_LEVEL` environment variable. Users can specify the number of jobs with the `--parallel` option (e.g., `./run vscode --parallel 16`). |
| 4569102 | Fixed an issue where the log level was not updated from the environment variable when multiple `Application` classes were created during the session. The log level setting in the `Application` class now allows a reset from the environment variable if overridden. |
| 4578099 | Fixed a segfault in `FormatConverterOp` when used with a `BlockMemoryPool` with insufficient capacity to create the output tensor. |
| 4571581 | Fixed an issue where the documentation for the built-in operators was either missing or incorrectly rendered. |
| 4591763 | Fixed a crash when an exception was thrown from `Operator::start` or `Operator::stop`. |
| 4595680 | Fixed an issue that caused the Inference operator to fail when multiple instances were composed in a single application graph. |
### Known Issues
This section supplies details about issues discovered during development and QA but not resolved in this release.
| Issue | Description |
|---|---|
| 4062979 | When operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order within the graph is not guaranteed. |
| 4267272 | AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to a missing `nv-p2p.h`. Expected to be addressed in IGX SW 1.0 GA. |
| 4384768 | No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to a missing nv-p2p kernel module. Expected to be addressed in JetPack 6.0 GA and IGX SW 1.0 GA, respectively. |
| 4190019 | Holoviz segfaults on multi-GPU setups when specifying the device using the `--gpus` flag with `docker run`. The current workaround is to use `CUDA_VISIBLE_DEVICES` in the container instead. |
| 4210082 | The `v4l_camera` example segfaults at exit. |
| 4339399 | High CPU usage observed with the `video_replayer_distributed` application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the `MultiThreadScheduler` (with the `check_recession_period_ms` parameter set to `0` by default) may still experience high CPU usage. Setting the `HOLOSCAN_CHECK_RECESSION_PERIOD_MS` environment variable to a value greater than `0` (e.g., `1.5`) can help reduce CPU usage. However, this may result in increased latency for the application until the `MultiThreadScheduler` switches to an event-based multithreaded scheduler. |
| 4318442 | The UCX `cuda_ipc` protocol doesn't work in Docker containers on x86_64. As a workaround, the UCX `cuda_ipc` protocol is currently disabled on all platforms via the `UCX_TLS` environment variable. |
| 4325468 | The `V4L2VideoCapture` operator only supports the YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they are assumed to be equivalent to RGBA8888. |
| 4325585 | Applications using `MultiThreadScheduler` may exit early due to timeouts. This occurs when the `stop_on_deadlock_timeout` parameter is improperly set to a value equal to or less than `check_recession_period_ms`, particularly if `check_recession_period_ms` is greater than zero. |
| 4301203 | HDMI IN fails in `v4l2_camera` on the IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes are expected in IGX SW 1.0 GA. |
| 4384348 | UCX termination (either `ctrl+c`, pressing `Esc`, or clicking the close button) is not smooth and can show multiple error messages. |
| 4481171 | Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads. |
| 4458192 | In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, "Address already in use" errors may occur. A potential solution is to assign a different port number to the `HOLOSCAN_HEALTH_CHECK_PORT` environment variable (default: `8777`), for example, by using `export HOLOSCAN_HEALTH_CHECK_PORT=8780`. |
| 4601414 | The UCX extension's asynchronous data transmission feature causes a regression in distributed applications, such as insufficient memory on the transmitter side. As a workaround, users can increase the `num_blocks` parameter in the `BlockMemoryPool` or use the `UnboundedAllocator` instead of the `BlockMemoryPool`. |