Releases: google-ai-edge/mediapipe
MediaPipe v0.10.18
Build changes
- Follow up the open-sourcing of WebGPU by open-sourcing one of its dependencies, third_party/emscripten
- Add pillow, pyyaml, and requests to model_maker BUILD
Framework and core calculator improvements
- Loading resources through calculator and subgraph contexts and configuring through kResourcesService.
- Use std::make_unique
- Moves OnDiskCacheHelper class into a separate file / compilation target
- Pools: report buffer specs on failure, fix status propagation, fix includes
- Open-Source MediaPipe's WebGPU helpers.
- BatchMatMul uses transpose parameter.
- Introduce Resource to represent a generic resource (file content, embedded/in-memory resource) for reading.
- Bump up the version number to 0.10.16
- Migrate from AdapterProperties to AdapterInfo
- Migrate from Resource::ReadContents to Resources::Get (using ForEachLine where required)
- Update Resources docs to mention ForEachLine (so devs don't fall back to ReadContents in such a case)
- Adjust WebGPU device registration
- Fix includes/copies/checks for BuildLabelMapFromFiles
- Migrate to BuildLabelMapFromFiles.
- Update Python version requirements in setup.py
- Introduce Resources with mapping, so graphs can use placeholders instead of actual resource paths.
- Remove Resources::ReadContents & add Resource::TryReleaseAsString.
- Fix ports for multi side outputs.
- Update solution android apps with explicit exported attribute.
- Ensure kResourcesService is set before CalculatorGraph is initialized (otherwise subgraphs/nodes may get the wrong default resources).
- Switch inference tests to ResourceProviderCalculator & update builder to refer to MODEL_RESOURCE.
- Migrate modules to use ResourceProviderCalculator.
- Support single tensor input in TensorsToImageCalculator
- Migrate TfLiteModelLoader to use MP Resources.
- Remove deprecated TfLiteModelLoader::LoadFromPath.
- Fix for isIOS() platform util on worker and non-worker contexts
- Support single tensor input in TensorsToSegmentationCalculator
- Makes CalculatorContext::GetGraphServiceManager() private
- BatchMatMul can handle cases where ndims != 4 and quantization
- RmsNorm has an optional scale parameter.
- Allowed variable audio packet size by setting num_samples to null.
- Fix technically correct but confusing example in top level comments.
- Removing ReturnType helper, since it's part of the standard now.
- Update XNNPack to 9/24
- Enable LoRA conversion support for Gemma2-2B
- Improve warning when InferenceCalculator backends are not linked
- Bump MediaPipe version to 0.10.17.
- Update OpenCV to a version that compiles with C++ 17
- Force xnnpack when CPU inference is enforced
- Install PyBind before TensorFlow to get the MediaPipe version
- Change MP version to 0.10.18
- Add validation to LLM bundler, alternative takePicture method to support custom thread executor, CopySign op, const Spec() method to OutputStreamManager, support for converting SRGBA ImageFrame to YUVImage, model configuration parameters for Gemma2-2B, menu for the default demo app and option to Close processor/graph and Exit gracefully, ngrammer, per layer embeddings and Relu1p5 fields to llm_params and update from Proto, a special InMemory Resources (current use case is in tests, but may be needed for some simple things as well), ResourceProviderCalculator (replacement for LocalFileContentsCalculator), Resource support into TfliteModelCalculator and a flag to set the default number of XNNPACK threads.
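The "Resources with mapping" entry above says graphs can use placeholders instead of actual resource paths. A minimal Python sketch of that idea follows. It is not MediaPipe's C++ API: `resolve_resource_id`, the `$MODEL` placeholder and the example path are hypothetical names chosen for illustration, and letting unmapped ids pass through unchanged is an assumption.

```python
from typing import Dict


def resolve_resource_id(resource_id: str, mapping: Dict[str, str]) -> str:
    """Return the mapped location for a placeholder, or the id unchanged.

    Hypothetical helper: the graph config only ever mentions the
    placeholder, and the host application decides where it points.
    """
    return mapping.get(resource_id, resource_id)


# Hypothetical mapping supplied by the application, not by the graph.
mapping = {"$MODEL": "/data/models/detector.tflite"}

resolved = resolve_resource_id("$MODEL", mapping)
untouched = resolve_resource_id("labels.txt", mapping)
```

The point of the indirection is that the same graph config can be reused across platforms or tests, with only the mapping changing.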
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.
Android
- Initialize new members in LlmModelSettings
- Create an implicit session for all requests to generateResponse()
- Change session management so that all JNI calls come from the same thread.
- Add Session API support to LLM Java API
iOS
- Updated name of iOS audio classifier delegate
- Fixed incorrect stream mode in iOS audio classifier options
- Added method to ios audio task runner
- Updated iOS audio classifier BUILD file
- Fixed buffer length calculation in iOS MPPAudioData
- Updated iOS audio data tests to fix issue in buffer length calculation
- Revert "Added method for getting interleaved float32 pcm buffer from audio file"
- Updated comments in iOS LlmInference
- Dropped Refactored suffix for modified files in iOS genai
- Updated documentation of LlmTaskRunner
- Removed allocation of LlmInference Options
- Updated the response generation queue to be serial in iOS LlmInference
- Updated documentation of iOS LlmInference, documentation of LlmInference+Session
- Fixed marking of response generation completed control flow in LlmInference+Session.
- LlmInference.Options: remove unnecessary numOfSupportedLoraRanks parameter.
- Add activation data type to LlmInference.Options.
- Added more methods to iOS AVAudioPCMBuffer+TestUtils, a few basic iOS audio classifier tests, options tests to iOS audio classifier, utils for AVAudioFile, test for score threshold to MPPAudioClassifierTests, constants in MPPAudioClassifierTests, close method to iOS audio classifier, iOS MPPAudioData test utils, stream mode tests for iOS audio classifier, iOS audio classifier to cocoapods build, audio record creation tests to MPPAudioClassifierTests, close method to MPPAudioEmbedder, iOS audio embedder tests, more utility methods to MPPAudioEmbedderTests, streams mode tests for iOS audio embedder, iOS audio embedder to cocoapods build, comments to MPPAudioClassifierTests, iOS audio embedder header and implementation, iOS audio classifier implementation file, method for getting interleaved float32 pcm buffer from audio file, refactored iOS LlmTaskRunner, iOS LlmSessionRunner, more errors to GenAiInferenceError, refactored LlmInference, iOS session runner to build files, extra safeguards for response context in LlmSessionRunner, LlmInference+Session.swift and documentation regarding session and inference lifetimes to iOS LLM Inference.
- Fixed issue with iOS audio embedder result parsing, iOS audio embedder options processing, index error in AVAudioFile+TestUtils, audio classifier result processing in stream mode, error handling in MPPAudioData, microphone recording issues in iOS MPPAudioRecord, documentation of iOS Audio Record, iOS audio record and audio data tests by avoiding audio engine running state checks and iOS audio embedder result helpers and bug due to simultaneous response generation calls across sessions.
- Updated method signatures in iOS audio classifier tests
- Fixed flow limiting in iOS audio classifier
- Removed duplicate test from MPPAudioClassifierTests
- Updated comments in AVAudioFile+TestUtils
- Changed the name of iOS audio classifier async test helper
- Update comment for LlmInference.Session.clone() method.
- Marked inits unavailable in MPPFloatBuffer
- Updated documentation of iOS audio record
- Adds LlmInference.Metrics for providing some key performance metrics (initialization time, response generation time) of the LLM inference.
- Removed unwanted imports from iOS audio data tests
- Cleaned ios audio test utils BUILD file
- Remove the activation data type from the Swift API. We don't expect users to set it directly.
- Use seconds instead of milliseconds for latency metrics.
Javascript
- Add comments to generateResponses method.
- Migrate to ForEachLine to have a single source of truth for getting file contents lines.
- Workaround for multi-output web LLM issue where last response can get corrupted when numResponses is odd.
- Quick fix for wrong number of multi-outputs sometimes when streaming
Python
- Add a flag in the converter config for generating fake weights. When it is set to true, all weights will be filled with zeros.
- Update text embedder test to match the output after XNNPack upgrade.
- Update remaining data in text embedder test to match the output after XNNPack upgrade.
- Update the expected value of the text embedder test.
- Add python pip deps to WORKSPACE
- Fix pip_deps targets.
Model Maker changes
- Undo dynamic sequence length for export_model api because it doesn't work with MediaPipe.
- Replace mock with unittest.mock in model_maker tests.
- Move tensorflow lite python calls to ai-edge-litert.
MediaPipe Dependencies
- Update WASM files
MediaPipe v0.10.15
Build changes
- Fix unwanted dependency on GPU libraries.
- Adds TwoTapFirFilterCalculator.
- Add public visibility to graph_service headers.
- Disable ASAN, TSAN and MSAN tests which take more than 10 minutes.
Framework and core calculator improvements
- Update PointToForeign with an optional cleanup object.
- Enable BeginLoopCalculator for move-only types (e.g. Tensor) without Packet::Consume usage and copyable types without copying unless it's a fundamental type.
- Ensure proper release of resources in case of multiple AHWB reads.
- Enables the configuration of GpuBufferPool options via GpuResources::Create();
- Bugfix to correctly handle landmark projection in the non-square case.
- add utility to wait for a sync (represented by FD)
- Change a RET_CHECK to RET_CHECK_EQ
- KinematicPathSolver: Avoid overshooting target
- Introduce GetDefaultGpuExecutor(GpuResources) to allow executing all calculators on MP GPU thread.
- No destruction for static ahwb_usage_track_.
- Unbind framebuffer in Affine Transformation Runner GL
- Move/isolate ahwb_usage_track_ into tensor_ahwb
- Guard ahwb_tensor_track_ with mutex.
- Add SidePacketConnectionTest
- Update C++ Graph Builder to support executors and support input/output stream handlers.
- Node::Input/OutputStreamHandler -> Node::SetInput/OutputStreamHandler
- Add Packet::Share() method in replacement of SharedPtrWithPacket() function.
- Default to high-performance power preference hint for WebGL contexts. For some computers with dual GPUs (like MBP2019), this will more frequently give us the higher performance GPU, which is generally preferable for most of our use cases (realtime rendering and ML), since speed is more critical than power consumption. If necessary, the user can override this setting by requesting their canvas' WebGL context manually before initializing the graph.
- Introduce input_scale parameter to SpectrogramCalculator.
- Improve documentation of graph options
- Add an option to PackMediaSequenceCalculator to add empty clip labels instead of ignoring them. This is useful when we want to distinguish processing errors from no-detections.
- Updates language detection headers
- Fix dangling error reporter pointer in memory mapped models
- Fix for possible infinite stall using setOptions immediately before a loadLoraModel call.
- Add relu1p5 op, abs op, Log op, mdspan and Lhs Broadcast Sub with test
- Fix missing member move in Tensor class
- Add support for single Tensor output streams for ImageToTensorCalculator.
- Fix some compilation errors in WebGPU code. These changes are all minor.
- Add single tensor output support to tensor_converter_calculator.
- Replace QCHECK with ABSL_QCHECK and CHECK with ABSL_CHECK.
- Fix a bug in TensorAHWB that triggers a crash with multiple delayed AHWB readers followed by a CPU reader.
- Fixes an unnecessary allocation of GraphServiceManager in case it is adopted from the calculator context.
- Fix triggering of DFATAL message.
- Remove xnn_enable_avx512fp16=false from .bazelrc
- Replace uses of TfLiteOperatorCreate with TfLiteOperatorCreateWithData
- Compile with '--keep_going' in setup.py
- Update ndk version so that our open source users get the best possible performance out of mediapipe.
- Correct address of android ndk
- Replace absl::make_unique with std::make_unique in tensor.cc and tensor_ahwb.cc.
- LLM decode benchmarks fill the cache with a predefined number of tokens before starting decoding.
- Add logic to drop the offending non-monotonically increasing timestamp in the MicrophoneHelper.
- Make packet payload const.
- Pass flag to indicate that consuming op may support prepacked GEMM.
- Get timestamp from OpenCV VideoCapture after first frame is read.
- Update XNNPack and cpuinfo
- Update TensorFlow to 2024-07-18.
- Remove deprecated TfLiteOperatorCreateWithData function
- Add option to use shifted window in SpectrogramCalculator.
- Move AhwbUsage struct and helper methods into a separate library.
- Make fields in PacketGetter.Pair public.
- The GraphProfiler may be destroyed before the task is executed in the executor.
- Introduce flag in MicrophoneHelper to drop non-increasing timestamps.
- llm_test - add batch size of 8 for BM_Llm_QCINT8/512/128
- Add method to create MP Tensor from TfLite tensor specs
- Refactors AHardwareBufferView class to be instantiated with a TensorAhwbUsage pointer.
- Refactor LlmBuilder to have one graph
- Add expected_seq_len param to ComputeLogits()
- Fix mediapipe::file::Exists() for >2GB files on Windows.
- Bump XNNPACK and KleidiAI versions.
- Update MP demo app to acquire wake lock
- Replace mediapipe::StatusOr with absl::StatusOr
- Sync on ssbo_writte_ before mapping an AHWB to a CpuReadView.
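Two entries above describe dropping packets whose timestamps are not monotonically increasing in MicrophoneHelper. The general idea can be sketched in Python as below. This is an illustration, not the MicrophoneHelper code: `drop_non_increasing` and the `(timestamp, payload)` tuple layout are assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple


def drop_non_increasing(
    packets: Iterable[Tuple[int, bytes]]
) -> List[Tuple[int, bytes]]:
    """Keep only packets whose timestamp is strictly greater than the last kept one."""
    kept: List[Tuple[int, bytes]] = []
    last_ts: Optional[int] = None
    for ts, payload in packets:
        if last_ts is not None and ts <= last_ts:
            # Offending packet: equal or older timestamp, so drop it.
            continue
        kept.append((ts, payload))
        last_ts = ts
    return kept


packets = [(0, b"a"), (10, b"b"), (10, b"c"), (5, b"d"), (20, b"e")]
kept = drop_non_increasing(packets)
kept_timestamps = [ts for ts, _ in kept]
```

Dropping the offending packet, rather than failing, keeps the stream usable when the audio source occasionally reports a repeated or stale timestamp.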
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.
Android
- Bump targetSdkVersion to 34 throughout MediaPipe.
iOS
- Updated documentation in iOS audio classifier
- Added iOS holistic landmarker to vision framework build
- Changed method name in MPPAudioClassifierResult
- Added audio classifier options helpers
- Added audio classifier result helpers
- Added method to create audio record MPPAudioTaskRunner
- Removed unused imports in MPPAudioTaskRunner
- Added iOS audio embedder result, classifier result, classifier options, embedder options, embedder options helpers, classifier header and embedder result helpers
- Add missing argument for num_draft_tokens.
Javascript
- Set quantization bits for LoRA weight conversion to match those specified
- Warn on adding packets to a closed input stream instead of silently dropping packets.
- Enable experimental support for Chromium WGSL subgroups in LLM API, when available.
- Support multi-response generation.
Python
- Add prompt template to llm bundler.
Bug fixes
- class_weights flag causes a crash for multiclass case
Model Maker changes
- Rename old BinaryAUC metric to BinarySparseAUC(used by text_classifier) and create a new BinaryAUC metric which does not expect sparse inputs.
- Allow configuration of num_parallel_calls and cycle_length in hparams
- Improve python code format.
- Use tf.io.gfile.GFile for writing metadata file in image classifier.
- Change SparsePrecision metric to BinarySparsePrecision metric, and same for SparseRecall->BinarySparseRecall in the core library. We only care about these metrics in the binary case, so this change makes the metric classnames more accurate for its intended usage.
- Support multilabel model training in text classifier
- Create and add metrics for multi-class case
- Support a customized best model monitor for multiclass cases
MediaPipe Dependencies
- Update WASM files
MediaPipe v0.10.14
Framework and core calculator improvements
- Expose Lora ranks.
- Update C API documentation to make it clear that the callback is invoked multiple times
- Do not free response in PredictAsync callback
- Enable usage of DRISHTI_PROFILING from non mediapipe namespaces.
- Add model type to ImageGeneratorOptions.
- Allow casting Stream->Stream
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.
iOS
- Added iOS audio data tests
- Removed unused methods in AVAudioPCMBufferTestUtils
- Added read at offset tests to MPPAudioRecordTests
- Renamed property in MPPAudioData
- Added iOS Audio Packet Creator
- Added iOS audio running mode
- Added iOS Packet Creator
- Added iOS audio task runner
- Updated documentation of MPPAudioPacketCreator
Javascript
- Allow models to be uploaded via ReadableStreamDefaultReader
- Allow all tasks to use a ReadableStreamDefaultReader
- Expose Web LoRA API.
- Raise WebGPU errors to JavaScript.
- Update GenAI Experimental README
- Update GenAI README
Python
- Fixed result_callback() argument
MediaPipe Dependencies
- Flatbuffers upgrade to 24.3.7
- Update TF and FlatBuffer dependency to latest.
MediaPipe v0.10.13
Build changes
- Make Holistic C++ graph public until we have a C++ API
- Added a test image to tasks/testdata/vision
- Update dependency in inference_calculator_metal to make it OSS compatible
- Add build rule for gpt2_unicode_mapping_calculator to ODML repo
- Update TF patch to match new version
- Make model_asset_bundle_resources public
Framework and core calculator improvements
- Added Interactive Segmenter C Tasks API and updated Image Segmenter + Pose Landmarker API/tests
- Moved some utility functions used by the segmenter APIs to a shared test namespace
- Added Face Stylizer C API
- Add config options to disable default service in mediapipe vision tasks.
- Fix race condition in GetCurrentThreadId
- Update base Docker image to Ubuntu 22.04
- Adding support for boolean tensor inputs to InferenceInterpreterDelegateRunner
- Updates text_embedder_cosing_similarity signature to use Embedding pointers
- Fix mediapipe/framework/packet.h build failure on C++20.
- Finish allowing "direct Tensor" inputs and outputs in all InferenceCalculator variants.
- Add SizeInTokens API to C layer
- Remove dependency on "torch" for MediaPipe Python package
- Add option for allowing cropping beyond image borders in ContentZoomingCalculator
- Update XNNPACK
- Add support for loading models from memory mapped files
- InferenceCalculator: Add option to use mmap for model loading
- Workaround the flaky status of XNN_FLAG_KEEP_DIMS
- Add an HasError method and a test for ErrorReporter
- Update cached_kernel_path option doc
- Adds MemoryManager to several Tensor-generating calculators
- Add support for mmapping models to more inference calculators
- Removes InferenceRunner interface from InferenceCalculatorNodeImpl
- ContentZoomingCalculator: Fix initial state for "last measured" rect
- Reduce memory usage of LLM Web API
- Propagate packet timestamps to Android surfaces
- Make previous_log_index_ atomic to fix some race condition issues in the MediaPipe Profiler
- Adds MemoryManager to TensorConverter Calculator
- Fix template_parser's crash when destructing stowed_messages_ for proto3.
- ContentZoomingCalculator: Don't clamp when allow_cropping_outside_frame is set
- Refactor Metal path out of TensorsToSegmentationCalculator main file.
- Update Protobuf dependency to 4.x
- Expand AssetManager docs to provide JNI initialization method and proper usage pattern through GetResourceContents.
- Enables reordering of input and output tensor streams in InferenceCalculators
- Add CopyCpuInputIntoTfLiteTensor
- Add CopyTfLiteTensorIntoCpuOutput
- Add int64_t to MP tensor.
- Make it clear sinks should outlive graph initialized with the corresponding config
- Update initializeNativeAssetManager docs - singleton + MediaPipe usage
- Deprecated ImageSource in favor of standard TexImageSource.
- Add support for additional tensor_data_type for tensor conversion calculators.
- Fix TensorsToSegmentationConverterMetal RunInGlContext().
- Change the naming of converters of ImageToTensorCalculator and TensorsToSegmentationCalculator.
- Update WebGL2 on OffscreenCanvas support check to include Safari 17+
- Use TextFormat for serialization
- Add IsConnected() to graph builder SideOut
- Use "ahwb" prefix for "release_callback" to disambiguate ahwb vs. non ahwb callbacks.
- Allow multiple AHWB release callbacks.
- Add itemized loop calculators
- Add support for a Vector string packet to the constant_side_packet_calculator.
- Fix an issue in BeginItemLoopCalculator
- Allow arbitrary timestamp changes in BeginItemLoopCalculator
- Add unsigned int type to Mediapipe-Web binding.
- Add the ability to load a drishti graph template from a byte array.
- Add error handling to CreateSession in C API
- Report received dims size in the error.
- Adds conditional TFLITE_CONDITIONAL_NAMESPACE namespace to .cc implementations
- Adds support for tensor scalar output to VectorIntToTensorCalculator.
- Parse num classes per detection from TFLite_Detection_PostProcess op.
- Output error status int in case AHWB allocation fails.
- Support more types for inference_calculator_util tensor copying functions.
- Upgrade TensorFlow
- Fix ASAN error by removing tensor data filling for kNone in test.
- Added warning when MultiPoolOptions.keep_count is reached
- Updated the safetensor converter to support Gemma 7B mode
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.
Android
- Add sizeInTokens API to the Java LLM Inference Engine
- Set a default empty lora path for LlmInference.
iOS
- Updated iOS vision task runner to support tasks without norm rect stream
- Added iOS holistic landmarker result helpers and implementation
- Add async stream API to LlmInference for better Swift compatibility.
- Remove duplicate symbols from MediaPipeTasksGenAIC
- Apply iOS build fixes
- Revert avoid_deps in from MediaPipeTasksGenAI_framework and MediaPipeTasksGenAIC_framework.
- Fixed missing method in iOS vision task runner
- Fixed condition check in MPPVisionTaskRunner
- Fixed incorrect types in MPPHolisticLandmarkerResult
- Added init with proto utility to MPPHolisticLandmarkerResult
- Added MPPHolisticLandmarker helper for initialization from protobuf text file
- Added Holistic Landmarker Objective C Tests
- Added size in tokens API to iOS LlmInference
- Fixed type of holistic landmarker pose segmentation mask
- Added optional initialization of face blendshapes from protobuf file
- Added video mode and option tests to MPPHolisticLandmarker tests
- Updated documentation of MPPHolisticLandmarkerResult+Helpers.h
- Added iOS face stylizer implementation, options helpers, Result Helpers
- Updated iOS MPPImage Utils to support creation of output images from C++ RGB images
- Updated constants in MPPImage+Utils
- Added iOS Face Stylizer tests
- Updated documentation of iOS MPPFaceStylizer
- Added missing connections to iOS Pose Landmarker
- Added iOS MPPFloatBuffer, MPPFloatRingBuffer, ring buffer tests, MPPAudioData
- Update swift name of MPPAudioDataFormat
- Added live stream mode tests to iOS holistic landmarker
- Updated method signature in MPPHolisticLandmarkerResult+Helpers
- Added test for nil modelPath to face stylizer
- Fixed memory deallocation issues when creating images using MPPImage+Utils
- Exposed iOS Face Stylizer headers in xcframework build
- Added iOS MPPAudioRecord
- Updated method signature in MPPFloatRingBuffer
- Added audio error codes to MPPCommon.h
- Move MPPAudioDataFormat to a new file
- Updated method signature in MPPAudioRecord
- Added test utils for AVAudioPCMBuffer
- Added basic failure tests for MPPAudioRecord
- Add support for static LoRA on iOS.
- Updated AVAudioPCMBuffer convenience initializer to a class method
- Fix LlmTaskRunner.swift
- Added buffer loading tests to MPPAudioRecord
- Added method to load from audio record in MPPAudioData
- Fix LoRA integration in LlmInference.swift
- Add error handling to GenAI's Swift API
Javascript
- Add export to GenAI Fileset API
- Return the full string from the model
- Fix code snippet in NPM Readme
- Add Holistic Landmarker to NPM Readme
- Update npm README
- Add tokenizer normalization node to LLM web graph.
- Add Matrix to vision.d.ts
Python
- Fix API documentation link for ImageProcessing Options
- Disable text_embedder and text_classifier tests for Python
- Update safetensors converter for LoRA weights conversion for GEMMA 2B.
- Update model converter to support Phi-2 LoRA
- Mark Optional for landmark_drawing_spec argument
- Add LoRA options to converter.
Model Maker changes
- Keep tensorflow and tf-models-official to be <2.16. tensorflow-addons breaks with tensorflow 2.16.
- Read from default checkpoint path when training MobileBERT.
- Add checkpoint_frequency in model maker.
- Add repeat field in hyperparameters in model maker classifier.
- add mobilenet_v2 keras model spec.
- Only use auc, precision, and recall for binary classification problems.
- Drop remainder from datasets in text_classifier. This helps deal with issues on TPU training that results in NaN loss.
- Disable object detector oss test due to flakiness
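The "drop remainder" entry above refers to discarding the final, smaller batch so that every batch has the same size. A plain-Python sketch of that behaviour follows. The `batch` helper is hypothetical and is not Model Maker code. In `tf.data` the same behaviour is available through the `drop_remainder` argument of `Dataset.batch`, though whether Model Maker wires it exactly that way is an assumption here.

```python
from typing import List, Sequence


def batch(examples: Sequence[int], batch_size: int, drop_remainder: bool) -> List[List[int]]:
    """Split examples into consecutive batches, optionally dropping a short final batch."""
    batches = [
        list(examples[i:i + batch_size])
        for i in range(0, len(examples), batch_size)
    ]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        # The last batch is smaller than batch_size, so discard it.
        batches.pop()
    return batches


full_only = batch(range(10), 4, drop_remainder=True)
with_partial = batch(range(10), 4, drop_remainder=False)
```

With ten examples and a batch size of four, the first call yields two full batches and the second also keeps the trailing batch of two.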
MediaPipe Dependencies
- Update WASM files for 0.10.13 release
- Update WASM files to fix issues in the LLM Inference API
MediaPipe v0.10.11
Build changes
- Updated genai C package visibility
Framework and core calculator improvements
- Prevent UnpackMediaSequenceCalculator from segfaulting on a type of malformed input.
- Updated import statements of llm_inference_engine.h to support C
- Refactor OpenGL 3.1 path out of TensorsToSegmentationCalculator main file.
- Add addRawDataSpanToInputSidePacket and addRawDataSpanToInputStream binding functions.
- Update DotAttention interface to take SelfAttentionWeights
- Remove customized DotAttention
- Update Tensorflow dependency to latest release
- Update InferenceCalculator documentation on DELEGATE side input.
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.
Android
- Add LlmInference stats logging
- Changed max_sequence_length naming to max_tokens
iOS
- Added iOS LlmInferenceError, LlmInference, LlmTaskRunner
- Removed init with default params in LlmInferenceOptions
- Updated iOS task runner to delete C LLM Session on deallocation
- Updated variable names in iOS LlmTaskRunner
- Updated access of C LLMSession to fileprivate
- Updated access of some constants in iOS LlmInference
- Removed unwanted iOS BUILD targets
- Added iOS Gen AI build scripts
- Added iOS Gen AI files
- Updated parameter names in LlmSessionConfig create
- Added asynchronous predict and generate function to iOS LLM Task Runner
- Removed decoded response from iOS LlmTaskRunner
- Updated iOS error enum cases
- Updated response generation state logic in iOS LlmInference
- Fixed error handling in iOS LlmInference
- Updated error message in iOS GenAiInferenceError
- Fixed uninitialized response array in iOS LlmTaskRunner
- Added podspec templates of the iOS Gen AI framework
Javascript
- Add visibility field in landmark.d.ts
- Update landmark_result.ts with visibility support
Python
- Register model_ckpt_util in Python framework
- Expose HolisticLandmarker module as other modules
- Create empty module if ENABLE_ODML_CONVERTER is not set
- Optimized memory usage for conversion script
Model Maker changes
- Remove jax and torch from model maker requirements.txt
- Use tf model optimization < 0.8.0 due to tf.keras and tf_keras compat issues
MediaPipe Dependencies
- Update WASM files for 0.10.11 release
MediaPipe v0.10.10
Build changes
- Fix TensorsToSegmentationCalculator gpu dependencies.
- Open Source build rules for quantization_util
- Added the binary the converter factory to run the model weight conversion.
- Integrates the kMemoryManagerService into ImageToTensorCalculator and InferenceCalculatorDarwinn.
- Updated iOS OpenCV source build to exclude highgui and videoio
- Open source some BUILD rules for Converter package
Framework and core calculator improvements
- Added Face Landmarker C Tasks API and tests
- Added Pose Landmarker C Tasks API
- Use memcpy now for copying data and indicate how the data is stored
- Remove superfluous glFlush().
- Added Face Detector C Tasks API
- Add mediapipe::file::IsDirectory helper
- Deprecate ImageFrame::ByteDepth
- Added files for the Image Segmenter C Tasks API
- Add general support for PathToResourceAsFile to TfLiteModelLoader
- Add CalculatorGraph::SetErrorCallback to receive errors in case of async graph use cases.
- Add JAX as requirements for MediaPipe python package
- Introduces HardwareBufferPool based on the ReusablePool and MultiPool
- Added the base classes for the LLM weight converter.
- Add stdbool import to C API
- Introduces MemoryManagerService with HardwareBufferPool and integrates it into the Tensor class.
- Added the model writer that writes to the weight binary files.
- Fix GlContext (attachments) cleanup in case of a failing GlContext initialization.
- Add option for using variable XNNPACK operators to MediaPipe XNNPACK flags
- Make InferenceCalculatorDarwinn support float and int32 as input data type.
- Adds VectorToTensorCalculator
- Enable HardwareBufferPool only if MEDIAPIPE_TENSOR_USE_AHWB is enabled
- Enables MultiPool and ReusablePool to pass on absl::Status returns originating from object factory methods.
- Adding TENSOR to InferenceCalculatorCpu to remove vector encumbrances
- Update Clang to version 16
- Add ability to preserve output format to GlScalerCalculator
- Add explicit dependency on XNNPACK & cpuinfo
- Update TensorFlow and Android NDK dependency
- Adds MEDIAPIPE_ANDROID_LINK_NATIVE_WINDOW condition to hardware_buffer_android
- Support interpolate flags in image_to_tensor_converter_opencv.
- Add MODEL_VIEW side input to tflite_model_calculator
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.
Android
- Added HolisticLandmarker
- Migrate TextGenerator Java API to C Wrapper
- Add LlmTaskRunner to TextGenerator sources
- Don't cache the JNI environment for async calls
- Simplified api interface
- Updates to LLM JNI Layer
- Handle model loading on Android
- Support custom cache dir.
- Pass cacheDir to LLM engine from Java API
- Add "done" field to the Java LLM API
- Removed backend option from JNI layer
iOS
- Updated supported pixel formats in iOS image classifier Documentation
- Removed support for CVPixelBuffer of type 32RGBA
- Added support for creating CVPixelBuffer from C++ Images to iOS MPPImage Utils
- Updated implementation of MPPImage Utils to reduce lines of code
- Added iOS interactive segmenter options, helpers, implementation and basic tests
- Added iOS Image Embedder API
- Enabled stream mode on iOS pose landmarker
- Fixed issue with iOS Language Detector Prediction Count
- Added iOS language detector to Cocoapods build
- Added cosine similarity method to iOS MPPImageEmbedder
- Updated method signature of MPPImageEmbedderResult initializer
- Added packet validation in MPPImageEmbedderResult+Helpers
- Added tests for creating MPPImage with source type UIImage from C++ Image
- Renamed methods in MPPImageUtilsTests
- Added a new class for iOS Interactive Segmenter Results
- Added iOS interactive segmenter to cocoapods build
- Added method for initializing MPPImages of all source types from MPPImage+TestUtils
- Added support for creating MPImages of sample buffer source type from C++ Images in MPPImage+Utils
- Added provision to initialize MPImages of source type sample buffer to MPImage+TestUtils
- Added tests for initialization of MPImages with source type sample buffer from MPImage+Utils
- Updated documentation of iOS hand landmarker, image embedder, image segmenter, interactive segmenter, object detector
- Added provision to create graph config from task options that use any proto for iOS tasks
- Added new methods to MPPTaskOptionsProtocol.h
- Updated iOS task runner to initialize tasks using MPPTaskInfo
- Updated iOS text task runner to initialize tasks from task Info.
- Updated iOS vision task runner to use new methods from MPPTaskRunner
- Updated MPPTaskOptionsProtocol
- Updated MPPTaskRunner initializer
- Fixed iOS framework conflicts with TensorFlowLiteC and OpenCV CocoaPods
- Fixed issue in installing iOS tasks text and vision libraries in a single project
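The iOS list above adds a cosine similarity method to the image embedder. For readers unfamiliar with the measure, cosine similarity of two embeddings is their dot product divided by the product of their Euclidean norms. The Python sketch below shows that formula only. `cosine_similarity` here is an illustrative helper, not the MediaPipe method, and raising on zero-norm input is an assumption about error handling.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return dot(u, v) / (|u| * |v|) for two equal-length embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The measure is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for zero-norm embeddings")
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing the same way score close to 1 regardless of magnitude, and orthogonal vectors score 0, which is why the measure suits comparing embeddings.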
Javascript
- Extend verifyGraph to be compatible with proto3.
- Add Holistic Landmarker Web API
- Add export declarations to PoseLandmakerResult
- TypeScript: adding VideoFrame typings support to video input
- Guard WaitOnGpu with extra OpenGL checks.
- Explicitly cast at callsite of WebGL context creation to avoid compilation errors with newer Emscripten versions.
Python
- Added Holistic Landmarker Python API
- Support both proto2 and proto3 in task subgraph options configuration, and revised the Holistic Landmarker API's implementation
- Update holistic_landmarker.py
- Documented HolisticLandmarker
- Fixing delegate passing argument in BaseOptions
- Add model_ckpt_util to Python Build script
- Use pybind_library for GenAI Converter build
MediaPipe Dependencies
- Expose MediaPipe's ABSL and Sentencepiece as shared dependencies
- Remove Sentencepiece's LOG function
- Removed unwanted headers from opencv_ios_xcframework_files.bzl
- Update WASM files for 0.10.10 release
- Upgrade TypeScript to 5.3.3
MediaPipe v0.10.9
Build changes
- Add libtext and libvision build rules
- Add lib targets for all C vision tasks
Framework and core calculator improvements
- Added files for the Image Embedder C API and tests
- Pass Model Asset Buffer as byte array + length
- Drop default arguments in C API
- Updated the Image Embedder C API and added tests for cosine similarity
- Drop default arguments in Image Embedder C API
- Remove const from input types of C API
- Resolved issues and added a common header to hold all the necessary structures for the vision tasks
- Refactor OpenCV path out of TensorsToSegmentationCalculator main file.
- Add some convenience getters to EglManager.
- Added files for the Object Detector C Tasks API
- Explicitly delete some copy operations to improve compile errors.
- Extract CPU conversion methods into a separate library & add test
- Updated components and their tests in the C Tasks API
- Ensure that releaseGl() is called if prepareGl throws
- Adding a GpuTestWithParamBase test class to support value parameterized tests
- Added Gesture Recognizer C API and tests
- Holistic Landmarker C++ Graph
- Revised Gesture Recognizer API implementation and associated tests
- Added FreeMemory test for GestureRecognizerResult
- Refactor GestureRecognizerResult conversion for default initialization
- Move LanguageDetectorResult converter to LanguageDetector task
- Add TensorsToSegmentationCalculator test utilities.
- Added Hand Landmarker C Tasks API and tests
- Export java package for hand_roi_refinement_graph_options.
- Fix naming in different files
- Create an explicit GlRuntimeException class
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.
Android
- Create shared utilities to construct category lists
- Create shared utilities to construct landmark lists
- Add the result class for the HolisticLandmarker Java API
- HolisticLandmarker Java API
- Add dependency on hand_roi_refinement_graph_options_proto
- Use Java Proto Lite Target for Hand ROI Refinement proto
- Move hand_roi_refinement_graph_options_java_proto_lite to vision lib
iOS
- Added iOS MPPPoseLandmarker.mm
- Added null check for segmentation masks in pose landmarker helper initializer
- Added pose landmarker protobuf utils
- Fixed graph name in iOS language detector
- Added iOS language detector tests
- Added iOS Objective C Pose Landmarker Tests
- Added iOS interactive segmenter options
- Added iOS region of interest
- Added iOS region of interest helpers
- Updated iOS vision/core to add methods for processing region of interest
- Added iOS interactive segmenter header
Javascript
- Creates GpuBuffers around pre-allocated AHardware_Buffer objects.
- Add drawConfidenceMask() to our public API
- Use gl.LINEAR interpolation for confidence masks
- Add missing export declarations to DrawingUtils
Python
- Example updated for mp.Image in documentation
- Added image classifier benchmark
- Updated copyright
- Documented the return value and added percentile to argparser
- Allowed a default value for the model argument
- Added more benchmark scripts for the Tasks Python API
- Code cleanup and revised benchmarking API
- Removed unused param
Model Maker changes
- Add option to omit the checkpoint callback in text classifier.
- Add BinaryAUC metric and Best Checkpoint callback to Text Classifier
- Remove batch dimension from the output of tflite_with_tokenizer in text classifier.
MediaPipe v0.10.8
Build changes
- Allow Python to be built on Mac with GPU support
Bazel changes
- Adds an empty skeleton project for iOS docgen.
- Remove pinned versions from deps
- Added files for the Language Detector C API and tests
- Add OnCameraBoundListener and support for landscape orientation to CameraXPreviewHelper
- Removed language_detection_result and moved the necessary containers to language_detector.h
- Detection postprocessing supports quantized tensors.
- Adding vector versions of input calls to TS GraphRunner API
- Introduce AlignHandToPoseInWorldCalculator
- Add check to avoid doing illegal memory access from an invalid iterator from std::prev()
- GPU_ORIGIN configurable through base options proto.
- Introduce FixGraphBackEdges utils function.
- Migrate ParseTagAndName to use absl::string_view
- Plumb an optional default Executor and set of input side packets
- Add implementation and tests for Image Classifier C API
- Allow GPU Origin Proto to be built by Maven
- Add a field to GPUBuffer C struct so FFIGen can handle it
- Add scaling support to surface view renderer.
- Remove objc_library from Python build path for Mac GPU build
- Fix internal inconsistency in parsing code
- Add CPU tests for TensorsToSegmentationCalculator
- Speed up Python build by only building binary graph
- Don't drop status message in ConvertFromImageFrame
- Use designated initializers for TensorsToSegmentationCalculator tests.
- Adding two new immutable texture GpuBufferFormat types
- TensorsToDetectionsCalculator supports multiple classes for a bbox.
- Move filtering logic of score to ConvertToDetection.
- Add video and live stream processing and tests for Image Classifier C API
- Upgrade to use Gradle 8.4
- Add AT_FIRST_TICK processing to SidePacketToStreamCalculator.
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.
Android
- Add GPU Origin proto to Java Tasks Library
iOS
- Added iOS Pose Landmarker result, options and helpers
- Added iOS language detector options, results, and helpers
- Added property to get labels from iOS Image Segmenter
- Added a test for getting labels from iOS image segmenter
- Updated iOS Image Segmenter documentation to use Swift names
- Added pose landmarker result helpers
- Added iOS pose landmarks connections
- Added iOS pose landmarker header
- Updated documentation
- Added language detector result helpers
- Added iOS language detector implementation
- Fixed extra condition check in iOS Image Segmenter Result Helper
- Added iOS Image Segmenter to CocoaPods build
- Fixed deletion of iOS output MPImage buffer in MPImage Utils
- Added GPU support
Javascript
- Add drawCategoryMask() to our public API
- Creates GpuBuffers around pre-allocated AHardware_Buffer objects.
- Allow OffscreenCanvas to be used by DrawingUtils
Python
- Added files for Face Stylizer Unit Tests
- Allow Mac to use GPU Delegate
- Use mp.ImageFormat instead of just ImageFormat
- Support 3-channel RGB images for Mac Python
- Added GPU support on Mac and Linux
Dependency changes
- Update WASM files for 0.10.8 release
MediaPipe v0.10.7
Framework and core calculator improvements
- Fix win32 build break in mediapipe.
- Remove 'awaiting' labels when user issue/PR updated.
- Fix glScalerCalculator not clearing background in FIT mode
- Add cc_binary target for C Libraries
- Only recreate immutable texture when necessary for Android TensorsToSegmentationCalculator.
- Update PackMediaSequenceCalculator to support index feature inputs on the CLIP_MEDIA_ input tag.
- Added concatenate stream, get_vector_item stream, landmarks_to_tensor stream, tensor_to_joints stream utility functions.
- Introduce TensorToJointsCalculator and LandmarksTransformationCalculator
- smoothing stream utility function.
- Don't convert nullptr to std::string in C layer
- Fix memory access issue in C layer
- segmentation smoothing stream utility function.
- Populate the classification result output param instead of a copy
- Add tests for C API containers
- Add unit tests for C layer for the input types of Text Classifier
- Add End to End test for Text Classifier C API
- Add error handling to C API
- Added files for the TextEmbedder C API and tests
- Set memory of freed result to nullptr
- Smooth pose landmarks
- GlSurfaceViewRenderer: Capture graph output texture
- Prefix status macro implementation with MP_.
- Introduce CombineJointsCalculator and SetJointsVisibilityCalculator
- Add stream API presence utils.
- Fixed some issues with documentation
- Add stream API merge utils.
- Update glog to latest commit
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.
Android
- Do not convert milliseconds to microseconds twice
- Fix bug missing SHOW_RESULT in image generator
- Fix depth condition bug when only depth condition is configured.
iOS
- Added iOS face stylizer result, options and header
- Added iOS MPPFileInfo for tests
- Added new initializers for iOS MPPImage in test utils
- Added iOS MPPMask test utils
- Added iOS image segmenter basic Objective C tests
- Updated multiply function in iOS Image Segmenter tests to use C++ vectors
- Fixed premature deallocation of C++ masks in iOS Image Segmenter
- Updated interface of iOS image segmenter
- Added selfie segmentation and running mode tests to image segmenter
- Uncommented live stream test in iOS image segmenter tests
- Updated iOS Face Detector Objective C API names
- Updated iOS Face Landmarker, Hand Landmarker, Object Detector Objective C API names
- Added iOS Image Segmenter tests for methods with completion handlers
- Added methods to create iOS MPImage with source type UIImage from a C++ image.
- Changed de-allocation method in data provider release callback
- Fixed error messages
- Updated error messages in MPPImage Utils
Javascript
- Add helper to create Connection array
- Add export declaration for FaceDetector
- Add export declaration to FaceDetector.detect()
- Do not use full filename when FileLocator decides which asset to load
Bug fixes
- Fixed Pose Landmarker jittering issue
Model Maker changes
- Add export_model_with_tokenizer to Text Classifier API.
MediaPipe Dependencies
- Update WASM files for 0.10.7 release
MediaPipe v0.10.5
Framework and core calculator improvements
- Fix crash in SavePngTestOutput
- Log stack traces for combined CalculatorGraph statuses
- Add a GpuOrigin parameter to TensorConverterCalculator
- Replace some size EXPECTs by ASSERTs
- Add a support for label annotations (image/label/string and image/label/confidence). Also fixed some clang tidy issues.
- Set confidence score of the bounding box label.
- Add setGpuBufferVerticalFlip to GraphRunner TS API
- Remove unsafe cast.
- apply affine transform before drawing, in order to keep constant line width regardless of face cropping.
- Migrate packet messages auto registration to rely on MEDIAPIPE_STATIC_REGISTRATOR_TEMPLATE
- add end loop calculator for image size
- Provide a way to disable static registration using MEDIAPIPE_DISABLE_STATIC_REGISTRATION
- Header for callback_packet_calculator to allow dynamic registration for superusers
- Support more GPU formats in tensor converter calculator.
- Expose stream handlers in headers to allow dynamic registration for superusers
- Expose tool calculators in headers to enable dynamic registration by superusers.
- Dry-Run mode for static registration to make it easier to find all required static registrations
- Fix MediaPipe build in Chromium.
- Swap left and right hand labels.
- Don't access "document" in WebWorker
- Update PackMediaSequenceCalculator to support adding clip/media/id to the MediaSequence.
- update pose rendering
- Update the header information for EnsureMinimumDefaultExecutorStackSize.
- Move stream API loopback to third_party.
- Add pose landmarks constants
- Add an API in model_task_graph to create or use cached model resources.
- Move stream API image_size to third_party.
- Add C++ converters for C Text Classifier API
- Move stream API rect_transformation to third_party.
- Change the image label input from Classification to Detection.
- Update port includes with IWYU to fix clang warnings in code where corresponding ports are used.
- New image test utilities and memory management fixes.
- Add a custom op resolver for fused batch norm.
- Improving throttling logs by providing a node info corresponding to a throttling stream.
- Use ABSL_LOG in MediaPipe.
- Remove reference pointer to prevent using a constant reference in the looped iteration variable
- Remove unnecessary includes in threadpool_std_thread_impl.cc.
- Make cache writes optional in InferenceCalculatorAdvancedGL
- Update PackMediaSequenceCalculator to support setting clip/media/string, clip/media/confidence and clip/label/index.
- Some spelling and grammar fixes in the comments.
- Add notes/warnings for calculators which use dedicated GL contexts.
- Remove video and stream mode in face stylizer.
- Move stream API landmarks_projection to third_party.
- Remove video and streaming mode for face stylizer.
- landmarks_to_detection stream utility function.
- Ensure that C headers don't import C++ types
- Splitting GraphRunner into public API declared interfaces and private TS impls
- Add option for nearest neighbor interpolation.
- Fixes two issues with file handling on Windows:
- Remove unconditional texture params reset to make float textures handled correctly.
- Fixes the non-unicode path of file_helpers on Windows
- Modifying tensor_to_vector_float_calculator to take in D_BFLOAT16 values
- Don't define field in ExternalFileHandler that's not used on Windows.
- Clean up TensorConverterCalculator flipping behavior
- Fix win32 build break in mediapipe.
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.
Android
- Adds option to use tensor_ahwb in Android vendor processes
- Add output size as parameters in Java ImageSegmenter
- Change SegmentationOptions.builder() to be public
- ImageGenerator Java API
- Provide API/options to show intermediate results and generating progress for Java Image Generator.
- Set enableFlowLimiting to false since only Image mode is supported for face stylizer.
- Move loading tasks-vision-jni to individual vision task class
iOS
- Added refactored iOS vision task runner sources
- Removed convenience initializer from refactored MPPVisionTaskRunner
- Updated iOS docs to use swift names in place of objective c names
- Added gesture recognizer and hand landmarker to iOS vision framework
- Fixed directory creation issues in build_ios_framework.sh
- Changed delegate method to optional
- Added iOS image segmenter implementation file
- Updated image segmenter bazel target to add MPPImageSegmenter.mm
- Renamed option in MPPImageSegmenterOptions
- Updated iOS face detector to use refactored vision task runner
- Updated iOS image classifier to use refactored vision task runner
- Changed order of methods in MPPImageSegmenter.mm
- Fixed method call in MPPImageSegmenter.mm
- Updated face landmarker, gesture recognizer, hand landmarker, object detector to use refactored vision task runner
- Replaced the old iOS vision task runner with the refactored task runner
- Updated iOS gesture recognizer documentation to use Swift names
- Updated iOS hand landmarker documentation to use swift names
- Moved iOS MPPHandLandmark enum to MPPHandLandmarker.h
- Fixes iOS hand landmarker connections
Javascript
- vlog default executor and its config usage
- Updates the runners to support wasm-style binary assets files, and allows their URLs to be explicitly specified as part of the WasmFileset.
- Add 'types' to package.json
- Add externs to js_library targets
- Add API exports for MPMask and MPImage
- Add Handedness to JS, C++ and Android API
- Fix missing exports for FilesetResolver and static constants
- Add exports to ImageSegmenterResult and InteractiveSegmenterResult
Python
- Set the default running mode to Image for face stylizer.
Bug fixes
- Internal fixes
Model Maker changes
- Add tensorflow-addons to model_maker requirements.txt
- Change to add the w_avg latent code to style encoding before layer swapping. This is a bug in the previous code. Also set training=True for encoder since this affects the encoding performance.
- Add metadata writer into face stylizer.
- Refactor text_classifier preprocessor to move away from using classifier_data_lib
- Import image_util for using it in mediapipe face stylizer open sourcing.
- Fix image_util shortcut import line
- Change supported_ops to a Tuple instead of List to match the API definition.
- Add a new from_image API to create face stylizer dataset from a single image. Also deprecate the from_folder API since we only support one-shot use case now.
- Add an API to run inference with face stylizer TF model.
- Check if the image contains a valid face that can be aligned for stylization. If not, throw an exception for invalid input image. This is applied to both input stylized face and raw face.
- Add allow_custom_ops to model_util.convert_to_tflite and enable custom ops for face stylizer.
MediaPipe Dependencies
- Update WASM files for 10.5 release