MediaPipe v0.10.18

@dbcp1 dbcp1 released this 07 Nov 23:34

Build changes

  • Follow the open-sourcing of WebGPU by open-sourcing one of its dependencies, third_party/emscripten
  • Add pillow, pyyaml, and requests to model_maker BUILD

Framework and core calculator improvements

  • Loading resources through calculator and subgraph contexts and configuring through kResourcesService.
  • Use std::make_unique
  • Moves OnDiskCacheHelper class into a separate file / compilation target
  • Pools: report buffer specs on failure, fix status propagation, fix includes
  • Open-Source MediaPipe's WebGPU helpers.
  • BatchMatMul uses transpose parameter.
  • Introduce Resource to represent a generic resource (file content, embedded/in-memory resource) for reading.
  • Bump up the version number to 0.10.16
  • Migrate from AdapterProperties to AdapterInfo
  • Migrate from Resource::ReadContents to Resources::Get (using ForEachLine where required)
  • Update Resources docs to mention ForEachLine (so devs don't fall back to ReadContents in such a case)
  • Adjust WebGPU device registration
  • Fix includes/copies/checks for BuildLabelMapFromFiles
  • Migrate to BuildLabelMapFromFiles.
  • Update Python version requirements in setup.py
  • Introduce Resources with mapping, so graphs can use placeholders instead of actual resource paths.
  • Remove Resources::ReadContents & add Resource::TryReleaseAsString.
  • Fix ports for multi side outputs.
  • Update solution android apps with explicit exported attribute.
  • Ensure kResourcesService is set before CalculatorGraph is initialized (otherwise subgraphs/nodes may get the wrong default resources).
  • Switch inference tests to ResourceProviderCalculator & update builder to refer to MODEL_RESOURCE.
  • Migrate modules to use ResourceProviderCalculator.
  • Support single tensor input in TensorsToImageCalculator
  • Migrate TfLiteModelLoader to use MP Resources.
  • Remove deprecated TfLiteModelLoader::LoadFromPath.
  • Fix for isIOS() platform util on worker and non-worker contexts
  • Support single tensor input in TensorsToSegmentationCalculator
  • Makes CalculatorContext::GetGraphServiceManager() private
  • BatchMatMul can handle cases where ndims != 4 and quantization
  • RmsNorm has an optional scale parameter.
  • Allowed variable audio packet size by setting num_samples to null.
  • Fix technically correct but confusing example in top level comments.
  • Removing ReturnType helper, since it's part of the standard now.
  • Update XNNPack to 9/24
  • Enable LoRA conversion support for Gemma2-2B
  • Improve warning when InferenceCalculator backends are not linked
  • Bump MediaPipe version to 0.10.17.
  • Update OpenCV to a version that compiles with C++ 17
  • Force xnnpack when CPU inference is enforced
  • Install PyBind before TensorFlow to get the MediaPipe version
  • Change MP version to 0.10.18
  • Add validation to LLM bundler
  • Add alternative takePicture method to support custom thread executor
  • Add CopySign op
  • Add const Spec() method to OutputStreamManager
  • Add support for converting SRGBA ImageFrame to YUVImage
  • Add model configuration parameters for Gemma2-2B
  • Add menu for the default demo app and option to Close processor/graph and Exit gracefully
  • Add ngrammer, per layer embeddings and Relu1p5 fields to llm_params and update from Proto
  • Add a special InMemory Resources (current use case is in tests, but may be needed for some simple things as well)
  • Add ResourceProviderCalculator (replacement for LocalFileContentsCalculator)
  • Add Resource support into TfliteModelCalculator
  • Add a flag to set the default number of XNNPACK threads.

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.

Android

  • Initialize new members in LlmModelSettings
  • Create an implicit session for all requests to generateResponse()
  • Change session management so that all JNI calls come from the same thread.
  • Add Session API support to LLM Java API

iOS

  • Updated name of iOS audio classifier delegate
  • Fixed incorrect stream mode in iOS audio classifier options
  • Added method to iOS audio task runner
  • Updated iOS audio classifier BUILD file
  • Fixed buffer length calculation in iOS MPPAudioData
  • Updated iOS audio data tests to fix issue in buffer length calculation
  • Revert "Added method for getting interleaved float32 pcm buffer from audio file"
  • Updated comments in iOS LlmInference
  • Dropped Refactored suffix for modified files in iOS genai
  • Updated documentation of LlmTaskRunner
  • Removed allocation of LlmInference Options
  • Updated the response generation queue to be serial in iOS LlmInference
  • Updated documentation of iOS LlmInference, documentation of LlmInference+Session
  • Fixed marking of response generation completed control flow in LlmInference+Session.
  • LlmInference.Options: remove unnecessary numOfSupportedLoraRanks parameter.
  • Add activation data type to LlmInference.Options.
  • Added more methods to iOS AVAudioPCMBuffer+TestUtils, few basic iOS audio classifier tests, options tests to iOS audio classifier, utils for AVAudioFile, test for score threshold to MPPAudioClassifierTests, constants in MPPAudioClassifierTests, close method to iOS audio classifier, iOS MPPAudioData test utils, stream mode tests for iOS audio classifier, iOS audio classifier to cocoapods build, audio record creation tests to MPPAudioClassifierTests, close method to MPPAudioEmbedder, iOS audio embedder tests, more utility methods to MPPAudioEmbedderTests, streams mode tests for iOS audio embedder, iOS audio embedder to cocoapods build, comments to MPPAudioClassifierTests, iOS audio embedder header and implementation, iOS audio classifier implementation file, method for getting interleaved float32 pcm buffer from audio file, refactored iOS LlmTaskRunner, iOS LlmSessionRunner, more errors to GenAiInferenceError, refactored LlmInference, iOS session runner to build files, extra safeguards for response context in LlmSessionRunner, LlmInference+Session.swift and documentation regarding session and inference life times to iOS LLM Inference.
  • Fixed iOS audio embedder result parsing, iOS audio embedder options processing, an index error in AVAudioFile+TestUtils, audio classifier result processing in stream mode, error handling in MPPAudioData, microphone recording issues in iOS MPPAudioRecord, documentation of iOS Audio Record, iOS audio record and audio data tests (by avoiding audio engine running state checks), iOS audio embedder result helpers, and a bug due to simultaneous response generation calls across sessions.
  • Updated method signatures in iOS audio classifier tests
  • Fixed flow limiting in iOS audio classifier
  • Removed duplicate test from MPPAudioClassifierTests
  • Updated comments in AVAudioFile+TestUtils
  • Changed the name of iOS audio classifier async test helper
  • Update comment for LlmInference.Session.clone() method.
  • Marked inits unavailable in MPPFloatBuffer
  • Updated documentation of iOS audio record
  • Adds LlmInference.Metrics, which provides some key performance metrics (initialization time, response generation time) of the LLM inference.
  • Removed unwanted imports from iOS audio data tests
  • Cleaned iOS audio test utils BUILD file
  • Remove the activation data type from the Swift API. We don't expect users to set it directly.
  • Use seconds instead of milliseconds for latency metrics.

JavaScript

  • Add comments to generateResponses method.
  • Migrate to ForEachLine to have a single source of truth for getting file contents lines.
  • Workaround for multi-output web LLM issue where last response can get corrupted when numResponses is odd.
  • Quick fix for wrong number of multi-outputs sometimes when streaming

Python

  • Add a flag in the converter config for generating fake weights. When it is set to true, all weights will be filled with zeros.
  • Update text embedder test to match the output after XNNPack upgrade.
  • Update remaining data in text embedder test to match the output after XNNPack upgrade.
  • Update the expected value of the text embedder test.
  • Add python pip deps to WORKSPACE
  • Fix pip_deps targets.
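
The fake-weights flag above can be pictured with a minimal sketch. The notes do not give the flag's name or the converter's internals, so `make_fake_weights` and the tensor names below are hypothetical and only illustrate the idea of replacing every weight tensor with zeros of the same shape:

```python
def make_fake_weights(shapes):
    """Hypothetical helper: build zero-filled weights for each named shape.

    `shapes` maps a tensor name to a tuple of dimensions. The result maps
    the same names to nested lists of 0.0 with matching dimensions.
    This is an illustration only, not the converter's actual code.
    """

    def zeros(shape):
        # An empty shape denotes a scalar.
        if not shape:
            return 0.0
        return [zeros(shape[1:]) for _ in range(shape[0])]

    return {name: zeros(tuple(shape)) for name, shape in shapes.items()}


# Example with made-up tensor names and shapes.
fake = make_fake_weights({"embedding": (2, 3), "bias": (3,)})
```

Zero-filled weights keep the tensor shapes intact, which is presumably what makes such a flag useful for exercising a conversion pipeline without real checkpoints.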

Model Maker changes

  • Undo dynamic sequence length for export_model api because it doesn't work with MediaPipe.
  • Replace mock with unittest.mock in model_maker tests.
  • Move tensorflow lite python calls to ai-edge-litert.
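
As a rough illustration of the `unittest.mock` change above, a test that used the third-party `mock` package can import the standard-library module instead. The function under test, `load_labels`, is a hypothetical stand-in and not part of model_maker:

```python
import unittest
from unittest import mock  # standard library, replaces `import mock`


def load_labels(read_file):
    """Hypothetical helper: return the non-empty lines from read_file()."""
    return [line for line in read_file().splitlines() if line]


class LoadLabelsTest(unittest.TestCase):

    def test_reads_labels_from_injected_reader(self):
        # mock.Mock records calls and returns the configured value.
        fake_reader = mock.Mock(return_value="cat\ndog\n")
        self.assertEqual(load_labels(fake_reader), ["cat", "dog"])
        fake_reader.assert_called_once_with()
```

The test can be run with `python -m unittest`; only the import line differs from a test written against the external `mock` package.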

MediaPipe Dependencies

  • Update WASM files