The project failed when executing pread #1

Open
DanteKun123 opened this issue Aug 12, 2024 · 1 comment
@DanteKun123

Hello, we are running the project on a SmartSSD with Vitis 2021.2 and XRT 2021.2, but it fails when calling pread. The output is as follows:

root@cmmhc-PowerEdge-R750:/home/xuekun/code/GNN/damon24-gnn-in-situ-sampling-main/smartssd/sampling/build# ./test_streaming_sampler
Edge file handler: 4
Chunk offsets size: 29
Target nodes size: 0
Edge chunk size: 134217728
Edge file size: 6907023872
Xrt Device Id: 0x55c01cd2f340
Begin sample next epoch... bo_index: 0
cur_frontier size: 0
Prepair frontiers Duration: 0.002423 ms
Begin sample one layer, n_neighbors: 20, number of chunks: 1
Processing chunk 0, chunk frontier size: 0
ERR: pread failed: error: Bad address

We use the papers100M dataset. The sampler is configured in test_streaming_sample.cpp as follows:

StreamingSampler sampler(
   {0}, "parallel_streaming_sampler.xclbin", "parallel_streaming_sampler",
   "/mnt/nvme2n1/test_data/papers100M/preprocessed2/streaming_edges.bin",
   "/mnt/nvme2n1/test_data/papers100M/preprocessed2/chunks.txt",
   "/mnt/nvme2n1/test_data/papers100M/train_nodes.bin", {20, 15, 10},
   (size_t)128 * 1024 * 1024);

(We could not find sample_target_nodes.xclbin in the project, so we used parallel_streaming_sampler.xclbin instead.)

We then run one epoch:

{
    EasyTimer timer("Sampling the whole epoch. ");
    sampler.newEpochStart();
}

Can you help us identify where the problem is? Thank you very much!

@Souukou
Collaborator

Souukou commented Aug 31, 2024

Hi, thanks for running our program. You can use parallel_streaming_sampler.xclbin directly, but I think the issue might be with the input files. Our program requires you to preprocess the papers100M dataset before running it.

Here’s what you need to do:

  1. Download the papers100M dataset and unzip it.
  2. Run the preprocessing notebook: https://github.com/CASP-Systems-BU/damon24-gnn-in-situ-sampling/blob/main/smartssd/sampling/scripts/preprocess/papers100m.ipynb
  3. Modify the parameters in the following code snippet to match the location of your preprocessed files:
StreamingSampler sampler(
   {0}, "parallel_streaming_sampler.xclbin", "parallel_streaming_sampler",
   "/mnt/nvme2n1/test_data/papers100M/preprocessed2/streaming_edges.bin",
   "/mnt/nvme2n1/test_data/papers100M/preprocessed2/chunks.txt",
   "/mnt/nvme2n1/test_data/papers100M/train_nodes.bin", {20, 15, 10},
   (size_t)128 * 1024 * 1024);

Thanks for bringing this issue to our attention. If you have any further questions, please let me know.
