OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Updated Jul 23, 2024 - Python
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle aims to let agents handle any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.
Aircraft design optimization made fast through modern automatic differentiation. Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
MATLAB implementation to simulate the non-linear dynamics of a fixed-wing unmanned aerial glider. Includes tools to calculate aerodynamic coefficients using a vortex lattice method implementation, and to extract longitudinal and lateral linear systems around the trimmed gliding state.
Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Awesome LLM-related papers and repos covering a comprehensive range of topics.
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Famous Vision Language Models and Their Architectures
Vortex lattice method for inviscid lifting-surface aerodynamics
Phi-3 for Mac: Locally-run Vision and Language Models for Apple Silicon
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.