Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restorer: allocate a stack guard page #842

Open
wants to merge 288 commits into
base: criu-dev
Choose a base branch
from

Conversation

avagin
Copy link
Member

@avagin avagin commented Nov 10, 2019

An accidental overflow could corrupt the area located below a stack,
so let's map an extra page there without read-write permissions.

rst0git and others added 30 commits May 16, 2019 03:26
The function `setup_TCP_server_socket` (defined in img-remote.c) and
`setup_tcp_server` (defined in util.c) have very similar functionality.

Replace setup_TCP_server_socket() with setup_tcp_server() to reduce
code duplication and to enable IPv6 support for the image-cache action
of CRIU.

We set SO_REUSEADDR flag to allow reuse of local addresses.

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Remove setup_TCP_client_socket() and use setup_tcp_server() instead as
both functions have very similar functionality.

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Check only once if (sockfd < 0)

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
When CRIU calls the ip tool on restore, it passes the fd of remote
socket by replacing the STDIN before execvp. The stdin is used by the
ip tool to receive input. However, the ip tool calls ftell(stdin)
which fails with "Illegal seek" since UNIX sockets do not support file
positioning operations. To resolve this issue, read the received
content from the UNIX socket and store it into temporary file, then
replace STDIN with the fd of this tmp file.

 # python test/zdtm.py run -t zdtm/static/env00 --remote -f ns
 === Run 1/1 ================ zdtm/static/env00

 ========================= Run zdtm/static/env00 in ns ==========================
 Start test
 ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
 Adding image cache
 Adding image proxy
 Run criu dump
 Run criu restore
 =[log]=> dump/zdtm/static/env00/31/1/restore.log
 ------------------------ grep Error ------------------------
 RTNETLINK answers: File exists
 (00.229895)      1: do_open_remote_image RDONLY path=route-9.img snapshot_id=dump/zdtm/static/env00/31/1
 (00.230316)      1: 	Running ip route restore
 Failed to restore: ftell: Illegal seek
 (00.232757)      1: Error (criu/util.c:712): exited, status=255
 (00.232777)      1: Error (criu/net.c:1479): IP tool failed on route restore
 (00.232803)      1: Error (criu/net.c:2153): Can't create net_ns
 (00.255091) Error (criu/cr-restore.c:1177): 105 killed by signal 9: Killed
 (00.255307) Error (criu/mount.c:2960): mnt: Can't remove the directory /tmp/.criu.mntns.dTd7ak: No such file or directory
 (00.255339) Error (criu/cr-restore.c:2119): Restoring FAILED.
 ------------------------ ERROR OVER ------------------------
 ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
 ##################################### FAIL #####################################

Fixes checkpoint-restore#311

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
In Py2 `range` returns a list and `xrange` creates a sequence object
that evaluates lazily. In Py3 `range` is equivalent to `xrange` in Py2.

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Pavel Tikhomirov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Pavel Tikhomirov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Most of the typos were found by codespell.

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
This patch does not introduce any functional changes.

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
The function accept_proxy_to_cache() is a wrapper around accept().
Use accept() directly instead.

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Reduce code duplication by introducing a strflags() macro which maps
a flag to corresponding string.

Signed-off-by: Radostin Stoyanov <[email protected]>
This patch makes various private variables and procedures static.

These changes conform to the following code style conventions:

 When declaring pointer data or a function that returns a pointer type,
 the preferred use of ``*`` is adjacent to the data name or function
 name and not adjacent to the type name.

 Statements longer than 80 columns will be broken into sensible chunks,
 unless exceeding 80 columns significantly increases readability and
 does not hide information. Descendants are always substantially
 shorter than the parent and are placed substantially to the right.
 The same applies to function headers with a long argument list.

from https://www.kernel.org/doc/Documentation/process/coding-style.rst

The function declarations {send,recv}_image() from img-remote.h are
removed because they do not have a corresponding implementation.

The following functions are made static because they are not used
outside img-remote.c:
* {send,recv}_image_async()
* {read,write}_remote_header()
* socket_set_non_blocking()

Signed-off-by: Radostin Stoyanov <[email protected]>
There is no need to print an error message, __xalloc() does that.

Signed-off-by: Radostin Stoyanov <[email protected]>
When an interrupt signal (even SIGWINCH when strace is used) is
received while epoll_wait() sleeps, it will return a value of
-1 and set errno to EINTR, which is not an error and should be
ignored.

Signed-off-by: Radostin Stoyanov <[email protected]>
There is no need to pass the values for address and port as arguments
when creating a TCP server. The external `opts` object, which provides
opts.addr and opts.port, is accessible in all components that require
these values.

With this change, a value specified with the `--address` option will
used by image-cache in the same way as with page-server.

Example:
criu image-cache --address 127.0.0.1 --port 1234
criu page-server --address 127.0.0.1 --port 1234

Signed-off-by: Radostin Stoyanov <[email protected]>
The variable name 'remote_sk' is shorter than 'proxy_to_cache_fd' and
it is more similar to 'page_server_sk' (used in criu/page-xfer.c).

Signed-off-by: Radostin Stoyanov <[email protected]>
The name 'local_sk' is shorter than 'local_req_fd', and it is more
similar to the name 'page_server_sk' used in criu/page-xfer.c

Signed-off-by: Radostin Stoyanov <[email protected]>
When saddr.ss_family is AF_INET6 we should cast &saddr to
(struct sockaddr_in6 *).

Signed-off-by: Radostin Stoyanov <[email protected]>
Combine the functionality of socket_set_non_blocking() and
socket_set_blocking() into a new function, and move it in
criu/util.c to enable reusability throughout the code base.

Signed-off-by: Radostin Stoyanov <[email protected]>
Combine the macro constants DUMP_FINISH and RESTORE_FINISH, into
a single one, called FINISH.

In addition, replace the key-word strings used by the above-mentioned
constants, and NULL_SNAPSHOT_ID, with a \0 character that will be used
to indicate a "finish" message.

Signed-off-by: Radostin Stoyanov <[email protected]>
Sort and remove unused/unnecessary include statements in criu/img-*.c

Signed-off-by: Radostin Stoyanov <[email protected]>
The print_data() function was part of the deprecated (and removed)
'show' action, and it was moved in util.c with the following commit:

	a501b48
	The 'show' action has been deprecated since 1.6, let's finally drop it.

	The print_data() routine is kept for yet another (to be deprecated too)
	feature called 'criu exec'.

The criu exec feature was removed with:

	909590a
	Remove criu exec code

	It's now obsoleted by compel library.
	Maybe-TODO: Add compel tool exec action?

Therefore, now we can drop print_data() as well.

Signed-off-by: Radostin Stoyanov <[email protected]>
class ctypes.c_char_p
    Represents the C char * datatype when it points to a zero-
    terminated string. For a general character pointer that may
    also point to binary data, POINTER(c_char) must be used.
    The constructor accepts an integer address, or a bytes object.

https://docs.python.org/3/library/ctypes.html#ctypes.c_char_p

Signed-off-by: Radostin Stoyanov <[email protected]>
There are a few places where spaces have been used instead of tabs for
indentation. This patch converts the spaces to tabs for consistency
with the rest of the code base.

Signed-off-by: Radostin Stoyanov <[email protected]>
nviennot and others added 15 commits October 7, 2019 20:00
The lock status string may be empty. This can happen when the owner of
the lock is invisible from our PID namespace. This unfortunate behavior
is fixed in kernels v4.19 and up (see commit 1cf8e5de40)

Signed-off-by: Nicolas Viennot <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
When performing pre-dump we continuously increase the page-pipe size to
fit the max amount memory pages in the pipe's buffer. However, we never
actually set the pipe's buffer size to max. By doing so, we can reduce
the number of pipe-s necessary for pre-dump and improve the performance
as shown in the example below.

For example, let's consider the following process:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	void main(void)
	{
		int i = 0;
		void *cache = calloc(1, 1024 * 1024 * 1024);
		while(1) {
			printf("%d\n", i++);
			sleep(1);
		}
	}

stats-dump before this change:

	frozen_time: 123538
	memdump_time: 95344
	memwrite_time: 11980078
	pages_scanned: 262721
	pages_written: 262169
	page_pipes: 513
	page_pipe_bufs: 519

stats-dump after this change:

	frozen_time: 83287
	memdump_time: 54587
	memwrite_time: 12547466
	pages_scanned: 262721
	pages_written: 262169
	page_pipes: 257
	page_pipe_bufs: 263

Signed-off-by: Radostin Stoyanov <[email protected]>
The support for per-pid images with locks has been dropped with
commit d040219 ("locks: Drop support for per-pid images with locks")
and CR_FD_FILE_LOCKS_PID is not used.

Signed-off-by: Radostin Stoyanov <[email protected]>
RPC messages are have fairly small size and using space on the stack
might be a better option. This change follows the pattern used with
do_pb_read_one() and pb_write_one().

Signed-off-by: Radostin Stoyanov <[email protected]>
libcriu tests are currently broken. This patch fixes couple of
issues to allow the building and running libcriu tests.

1. lib/c/criu.h got updated to include version.h which is present
at "criu/include", but the command to compile libcriu tests is not
specifying "criu/include" in the path to be searched for header
files. This resulted in compilation error.
This can be fixed by adding "-I ../../../../../criu/criu/include"
however it causes more problems as "criu/include/fcntl.h" would now
hide system defined fcntl.h
Solution is to use "-iquote ../../../../../criu/criu/include"
which applies only to the quote form of include directive.

2. Secondly, libcriu.so major version got updated to 2 but
libcriu/run.sh still assumes verion 1. Instead of just updating the
version in libcriu/run.sh to 2, this patch updates the libcriu/Makefile
to use "CRIU_SO_VERSION_MAJOR" so that future changes to major version
of libcriu won't cause same problem again.

Signed-off-by: Ashutosh Mehra <[email protected]>
Updated scripts/travis/travis-tests to run libcriu test.

Signed-off-by: Ashutosh Mehra <[email protected]>
PATH is pointing to incorrect location for `criu` executable
causing libcriu tests to fail when running in travis.
Also added statements to display log file contents on failure
to help in debugging.

Signed-off-by: Ashutosh Mehra <[email protected]>
I don't see many issues with early-log, so we probably don't
need the warning when it was used. Note that after
commit 74731d9 ("zdtm: make grep_errors also grep warnings")
also warnings are grepped by zdtm.py (and I believe that was
an improvement) which prints some bothering lines:

> =[log]=> dump/zdtm/static/inotify00/38/1/dump.log
> ------------------------ grep Error ------------------------
> (00.000000) Will allow link remaps on FS
> (00.000034) Warn  (criu/log.c:203): The early log isn't empty
> ------------------------ ERROR OVER ------------------------

Instead of decreasing loglevel of the message, improve it by
reporting a real issue.

Cc: Adrian Reber <[email protected]>
Cc: Pavel Tikhomirov <[email protected]>
Cc: Radostin Stoyanov <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signal masks propagate through execve, so we need to clear them before
invoking the action scripts as it may want to handle SIGCHLD, or SIGSEGV.

Signed-off-by: Nicolas Viennot <[email protected]>
With the newly introduced aarch64 at Travis it is possible for the CRIU
test-cases to switch to aarch64.

Travis uses unprivileged LXD containers on aarch64 which blocks many of
the kernel interfaces CRIU needs. So for now this only tests building
CRIU natively on aarch64 instead of using the Docker+QEMU combination.

All tests based on Docker are not working on aarch64 is there currently
seems to be a problem with Docker on aarch64. Maybe because of the
nesting of Docker in LXD.

Signed-off-by: Adrian Reber <[email protected]>
Update test to support both iptables and nft to create conntrack rules.

Signed-off-by: Vitaly Ostrosablin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
An accidental overflow could corrupt the area located below a stack,
so let's map an extra page there without read-write permissions.

Signed-off-by: Andrei Vagin <[email protected]>
thread_args[i].mz = mz + i;
sigframe = (struct rt_sigframe *)&mz[i].rt_sigframe;
/* Allocate a guard page. */
if (mprotect(mem_zones + i * restore_mem_zone_size, page_size(), PROT_NONE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not page_size(), please, but RESTORE_STACK_REDZONE

@0x7f454c46
Copy link
Member

I guess, there are still two issues I see:

  1. bootstrap_len is a continuous block so we might fail to find this large amount for restorer - may be worth to split it on multiple vmas (though, it may be not a problem on 64-bit systems).
  2. Now the redzone is one page, but in the original failure, it was an array that has a couple of kilobytes size, which means 4Kb maybe not enough for structures used on restorer's stack.

Anyway, the patch seems good to me and is a nice start that may catch new issues.
With a minor nit, you can add my Reviewed-by: Dmitry Safonov <[email protected]> or which way is now preferred :)
(I don't have a button that would mark PR as reviewed)

@0x7f454c46
Copy link
Member

@avagin ping? :)

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet