implement a full screen memcpy optimization.
publish wincam 0.0.3
lovettchris committed Jul 16, 2024
1 parent bdf9571 commit e02d76f
Showing 5 changed files with 68 additions and 39 deletions.
7 changes: 6 additions & 1 deletion build.cmd
@@ -1,3 +1,8 @@
-call src/build.cmd
+pushd %~dp0\src
+call build.cmd
+if ERRORLEVEL 1 goto :eof
+popd
+if ERRORLEVEL 1 goto :eof
python -m build --outdir dist
+if ERRORLEVEL 1 goto :eof
python -m twine upload --repository pypi dist/*
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
# project metadata
[project]
name = "wincam"
-version = "0.0.2"
+version = "0.0.3"
authors = [
{name = "lovettchris"}
]
47 changes: 29 additions & 18 deletions src/ScreenCapture/SimpleCapture.cpp
@@ -371,32 +371,43 @@ void SimpleCapture::ReadPixels(ID3D11Texture2D* acquiredDesktopImage) {
}

if (m_buffer) {
-// Copy the cropped image out of the full monitor frame.
char* ptr = m_buffer;
char* src = reinterpret_cast<char*>(resource.pData);
int x = m_bounds.left;
int y = m_bounds.top;
int w = (m_bounds.right - m_bounds.left);
int h = m_bounds.bottom - m_bounds.top;
-auto expectedSize = w * h * CHANNELS;
-if (expectedSize != m_size) {
-printf("buffer too small\n");
-return;
-}
-
-int actualHeight = (int)desc.Height;
-int xBytes = x * CHANNELS;
-int targetRowBytes = w * CHANNELS;
-int srcRowBytes = desc.Width * CHANNELS;
-if (xBytes + targetRowBytes > srcRowBytes) {
-targetRowBytes = srcRowBytes - xBytes;
-}
-src += (lBmpRowPitch * y); // skip to top row.
-for (int row = y; row < actualHeight && row < y + h; row++){
-::memcpy(ptr, src + xBytes, targetRowBytes);
-src += lBmpRowPitch;
-ptr += targetRowBytes;
+// Handle full screen with a single memcpy, technically it can
+// handle any height so long as it is full width and starting at top left.
+if (x == 0 && y == 0 && w == desc.Width)
+{
+::memcpy(ptr, src, min(m_size, captureSize));
+}
+else
+{
+// Copy the cropped image out of the full monitor frame.
+auto expectedSize = w * h * CHANNELS;
+if (expectedSize != m_size) {
+printf("buffer too small\n");
+return;
+}
+
+int actualHeight = (int)desc.Height;
+int xBytes = x * CHANNELS;
+int targetRowBytes = w * CHANNELS;
+int srcRowBytes = desc.Width * CHANNELS;
+if (xBytes + targetRowBytes > srcRowBytes) {
+targetRowBytes = srcRowBytes - xBytes;
+}
+src += (lBmpRowPitch * y); // skip to top row.
+for (int row = y; row < actualHeight && row < y + h; row++) {
+::memcpy(ptr, src + xBytes, targetRowBytes);
+src += lBmpRowPitch;
+ptr += targetRowBytes;
+}
+}

SetEvent(m_event);
}

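The new branch in `ReadPixels` swaps the per-row loop for a single `memcpy` whenever the region starts at the top-left corner and spans the full width. That shortcut matches the loop only when each mapped row is exactly `desc.Width * CHANNELS` bytes long. The row pitch used here (`lBmpRowPitch`, presumably the `RowPitch` of the mapped staging texture) is not guaranteed to equal that value, because Direct3D 11 may pad rows. The standalone sketch below shows both paths with an extra pitch check, assuming 4-byte BGRA pixels; `CopyRegion` and its parameters are hypothetical and not part of wincam.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr int kChannels = 4;  // assumption: 4 bytes per pixel (BGRA)

// Hypothetical helper: copies the region (x, y, w, h) out of a mapped frame that is
// srcWidth x srcHeight pixels with rows srcRowPitch bytes apart, into a tightly
// packed destination buffer. Returns the number of bytes written.
std::size_t CopyRegion(const std::uint8_t* src, int srcWidth, int srcHeight,
                       std::size_t srcRowPitch, int x, int y, int w, int h,
                       std::uint8_t* dst, std::size_t dstSize) {
    const std::size_t rowBytes = static_cast<std::size_t>(srcWidth) * kChannels;
    assert(srcRowPitch >= rowBytes);  // a row may be padded, never shorter
    if (x < 0 || y < 0 || x >= srcWidth || y >= srcHeight || w <= 0 || h <= 0) {
        return 0;  // this sketch expects a non-empty, monitor-local region
    }

    // Fast path: a full-width region at the top-left of an unpadded frame has the
    // same layout in source and destination, so one memcpy is enough.
    if (x == 0 && y == 0 && w == srcWidth && srcRowPitch == rowBytes) {
        const int rows = std::min(h, srcHeight);
        const std::size_t n = std::min(dstSize, rowBytes * rows);
        std::memcpy(dst, src, n);
        return n;
    }

    // General path: copy row by row, clipping the width to the frame.
    const std::size_t xBytes = static_cast<std::size_t>(x) * kChannels;
    std::size_t copyBytes = static_cast<std::size_t>(w) * kChannels;
    if (xBytes + copyBytes > rowBytes) {
        copyBytes = rowBytes - xBytes;
    }
    std::size_t written = 0;
    for (int row = y; row < srcHeight && row < y + h; ++row) {
        if (written + copyBytes > dstSize) {
            break;  // never overrun the destination buffer
        }
        std::memcpy(dst + written, src + row * srcRowPitch + xBytes, copyBytes);
        written += copyBytes;
    }
    return written;
}
```

For a tightly packed frame the fast path yields the same bytes the loop would; for a padded pitch the sketch falls back to the row-by-row copy, so padding bytes never end up in the output.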
3 changes: 0 additions & 3 deletions src/build.cmd
@@ -1,9 +1,6 @@
@echo off

cd ~dp0

msbuild /target:restore /p:Configuration=Release "/p:Platform=x64" ScreenCapture.sln
if ERRORLEVEL 1 goto :eof

msbuild /target:rebuild /p:Configuration=Release "/p:Platform=x64" ScreenCapture.sln

48 changes: 32 additions & 16 deletions wincam.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: wincam
-Version: 0.0.2
+Version: 0.0.3
Summary: A Python high-performance screenshot library for Windows 10+ using Direct3D11CaptureFramePool.
Author: lovettchris
Project-URL: Homepage, https://github.com/lovettchris/wincam
@@ -46,30 +46,37 @@ with DXCamera(x, y, w, h, fps=30) as camera:
frame, timestamp = camera.get_bgr_frame()
```

See [Demo Video](https://youtu.be/og7-3b0bsuo)

## Introduction

-When you need to capture video frames fast to get a nice smooth 30 or 60 fps video
-this library will do it, so long as you are on Windows 10.0.19041.0 or newer.
+When you need to capture video frames in a low latency (< 5 milliseconds into a numpy array)
+to get a nice smooth 30 or 60 fps video this library can do it.

+This is using a new Windows 10 API called
+[Direct3D11CaptureFramePool](https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture.direct3d11captureframepool?view=winrt-26100)
+which requires DirectX 11 and a GPU.

+To get the fastest time possible, this library is implemented in C++ and the C++ library copies each frame directly into
+a buffer provided by the python code. This C++ library is loaded into your python process. Only one instance of DXCamera
+can be used per python process.

-This is using a new Windows 10 API called [Direct3D11CaptureFramePool](https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture.direct3d11captureframepool?view=winrt-26100) which requires DirectX 11 and a GPU.
+## Prerequisites

-To get the fastest time possible, this library is implemented in C++ and the
-C++ library copies each frame directly into a buffer provided by the python code.
-This C++ library is loaded into your python process.
-Only one instance of DXCamera can be used per python process.
+Requires `Python 3.10` and `Windows 10.0.19041.0` or newer on an `x64` platform.

## Installation

```
pip install wincam
```

Note: OpenCV is required for color space conversion.
See [https://pypi.org/project/wincam/](https://pypi.org/project/wincam/).

## Multiple Monitors

-This supports multiple monitors. Windows can define negative X, and Y locations when a monitor is to the left or above
-the primary monitor. The `DXCamera` will find and capture the appropriate monitor from the `x, y` locations you provide
+This supports multiple monitors. Windows can define negative X, and Y locations when a monitor is to the left or above
+the primary monitor. The `DXCamera` will find and capture the appropriate monitor from the `x, y` locations you provide
and it will crop the image to the bounds you provide.

The `DXCamera` does not support regions that span more than one monitor and it will report an error if you try.
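The multi-monitor paragraph above implies two steps: pick the monitor whose bounds fully contain the requested rectangle (coordinates may be negative for monitors left of or above the primary one), then translate the rectangle into that monitor's local coordinates before cropping. A small C++ sketch of those steps follows, assuming the monitor bounds are already known as plain rectangles; the `Rect`, `FindMonitor` and `ToMonitorLocal` names are hypothetical, and how wincam actually enumerates monitors is not part of this commit. A `DXCamera(x, y, w, h)` request would correspond to `Rect{x, y, x + w, y + h}`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical rectangle in virtual-desktop coordinates; right and bottom are
// exclusive. Monitors left of or above the primary one have negative coordinates.
struct Rect {
    int left, top, right, bottom;
};

// Returns the index of the monitor that fully contains r, or -1 when r is not
// fully inside any single monitor (for example, when it straddles two of them).
int FindMonitor(const std::vector<Rect>& monitors, const Rect& r) {
    assert(r.right > r.left && r.bottom > r.top);  // expect a non-empty region
    for (int i = 0; i < static_cast<int>(monitors.size()); ++i) {
        const Rect& m = monitors[i];
        if (r.left >= m.left && r.top >= m.top &&
            r.right <= m.right && r.bottom <= m.bottom) {
            return i;
        }
    }
    return -1;
}

// Translates r into coordinates relative to monitor m's top-left corner, which
// is the frame a per-monitor capture would crop from.
Rect ToMonitorLocal(const Rect& m, const Rect& r) {
    return Rect{r.left - m.left, r.top - m.top, r.right - m.left, r.bottom - m.top};
}
```

Returning -1 for a spanning region mirrors the README's statement that such regions are reported as an error rather than stitched together.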
@@ -79,7 +86,7 @@ The `DXCamera` does not support regions that span more than one monitor and it w
The following example scripts are provided in this repo.

- [examples/mirror.py](examples/mirror.py) - shows the captured frames in real time so you can see how it is performing
-on your machine. Have some fun with infinite frames with frames! Press ESCAPE to close the window.
+on your machine. Have some fun with infinite frames with frames! Press ESCAPE to close the window.

- [examples/video.py](examples/video.py) - records an .mp4 video to disk.

@@ -93,19 +100,28 @@ In each example can specify what to record using:
## Performance

Each call to `camera.get_bgr_frame()` can be as fast as 1 millisecond because the C++ code is asynchronously writing to
-the buffer provided. This way your python code is not blocking waiting for that frame. For this reason it is crucial
+the buffer provided. This way your python code is not blocking waiting for that frame. For this reason it is crucial
that you use `DXCamera` in a `with` block as shown above since this ensures the life time of the python buffer used by
-the C++ code is managed correctly. If you cannot use a `with` block for some reason then you must call the `stop()`
+the C++ code is managed correctly. If you cannot use a `with` block for some reason then you must call the `stop()`
method.

In order to hit a smooth target frame rate while recording video the `DXCamera` takes a target fps as input, which
defaults to 30 frames per second. The calls to `camera.get_bgr_frame()` will self regulate with an accurate sleep
to hit that target as closely as possible so that the frames you collect form a nice smooth video as shown in the
[video.py example](examples/video.py).

-Note that this sleep is more accurate that python `time.sleep()` which on Windows is very inaccurate with a
-tolerance of +/- 15 milliseconds!
+Note, there is no point providing an `fps` target greater than the windows monitor refresh rate. You can find this
+refresh rate on your Display Settings `Advanced settings` tab. If you go higher than this rate you will only get
+duplicate frames since the underlying `Direct3D11CaptureFramePool` is only getting new frames at the refresh rate. This
+is normally 60fps, unless you have a fancy new GPU and monitor. On a remote desktop this refresh rate can be lower,
+like 30 fps.

+Note that this sleep is more accurate that python `time.sleep()` which on Windows is very inaccurate with a
+tolerance of +/- 15 milliseconds. But this more accurate sleep is using a spin wait which uses one core of your CPU.

## Credits

This project was inspired by [dxcam](https://github.com/ra1nty/DXcam) and the
[C++ Win32CaptureSample](https://github.com/robmikh/win32capturesample) and was made possible with lots of help from
Windows team members Robert Mikhayelyan and Shawn Hargreaves.

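The updated Performance section says `camera.get_bgr_frame()` self-regulates to the target fps with a spin-wait sleep, and that a target above the monitor refresh rate only produces duplicate frames. A minimal C++ sketch of that kind of pacing with `std::chrono::steady_clock` might look like the following. `FramePacer`, and the idea of clamping against a refresh rate passed in by the caller, are assumptions for illustration only; wincam's actual sleep implementation is not shown in this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical frame pacer: spaces calls out to a target frame rate by
// busy-waiting on std::chrono::steady_clock rather than calling a coarse OS sleep.
class FramePacer {
public:
    using Clock = std::chrono::steady_clock;

    FramePacer(double targetFps, double refreshRate) {
        assert(targetFps > 0 && refreshRate > 0);
        // Asking for more than the refresh rate would only yield duplicate
        // frames, so clamp the target.
        const double fps = std::min(targetFps, refreshRate);
        period_ = std::chrono::duration_cast<Clock::duration>(
            std::chrono::duration<double>(1.0 / fps));
        next_ = Clock::now() + period_;
    }

    Clock::duration period() const { return period_; }

    // Spins until the next deadline, then schedules the following one.
    // The loop keeps one CPU core busy while it waits.
    void Wait() {
        while (Clock::now() < next_) {
            // busy-wait for accuracy
        }
        next_ += period_;
        // If the caller stalled for more than a frame, resynchronize instead of
        // returning a burst of back-to-back frames to catch up.
        const Clock::time_point now = Clock::now();
        if (next_ < now) {
            next_ = now + period_;
        }
    }

private:
    Clock::duration period_{};
    Clock::time_point next_{};
};
```

Because each deadline advances by a fixed period from the previous deadline rather than from the moment `Wait()` returns, a small overshoot on one frame does not accumulate into drift across frames.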