Use 4k page instead of 2M for managed tensor (pytorch#2058)
Summary:
Pull Request resolved: pytorch#2058

This diff changes the page size from 2M to 4k for prefaulting/mapping the pages.

Reviewed By: q10, jasonjk-park, zyan0, jianyuh

Differential Revision: D49924136

fbshipit-source-id: fdee08b9a4da54dce902c98ee3aae62ac0d3ad6c
banitag1 authored and facebook-github-bot committed Oct 4, 2023
1 parent 7b7ad61 commit ea18a68
8 changes: 3 additions & 5 deletions fbgemm_gpu/src/cumem_utils.cu
@@ -224,11 +224,9 @@ Tensor new_host_mapped_tensor(
   // can minimize the cost while holding this global lock.
   void* const ptr = malloc(size_bytes);
 
-  // advise the kernel to allocate large 2M pages
-  madvise(ptr, size_bytes, MADV_HUGEPAGE);
-
-  // pre-fault/map the pages by setting the first byte of the page
-  size_t pageSize = (1 << 21);
+  // Pre-fault/map the pages by setting the first byte of the page
+  // TODO: parallelize the mapping of pages with a threadpool executor
+  const size_t pageSize = (size_t)sysconf(_SC_PAGESIZE);
   uintptr_t alignedPtr = (((uintptr_t)ptr + pageSize - 1) & ~(pageSize - 1));
   for (uintptr_t p = alignedPtr; p < ((uintptr_t)ptr + size_bytes);
        p += pageSize) {
