pandeyshubham25/pagerank


Parallel Page Rank


Directories and files descriptions


src - contains core logic

  1. prank_serial.cpp - serial implementation
  2. prank_parallel.cpp - parallel implementation using OpenMP
  3. prank_parallel_spmv.cpp - sparse matrix-vector multiplication using coordinate (COO) format
  4. prank_parallel_csr.cpp - sparse matrix-vector multiplication using compressed sparse row (CSR) format
  5. adj_to_csr.cpp - utility code to convert graph data to CSR (Compressed Sparse Row) format
  6. adj_to_coo.cpp - utility code to convert graph data to COO (Coordinate) format
  7. verify.cpp - utility code to evaluate correctness of computed page ranks against the baseline values
  8. measure.cpp - utility code to run all implementations in one go
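
To make the difference between the COO and CSR formats concrete, here is a minimal sketch of both layouts, a COO to CSR conversion, and the matrix-vector product for each. The struct and function names (Coo, Csr, coo_to_csr, spmv_coo, spmv_csr) are hypothetical and chosen for illustration; the in-memory layout used by the files in src may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from the repository.
struct Coo {
    std::size_t n = 0;                  // number of rows (= columns)
    std::vector<std::size_t> row, col;  // one (row, col) pair per non-zero
    std::vector<double> val;            // value of each non-zero
};

struct Csr {
    std::size_t n = 0;
    std::vector<std::size_t> row_ptr;   // size n + 1; row i spans [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col;
    std::vector<double> val;
};

// Convert COO to CSR with a counting sort over the row indices.
Csr coo_to_csr(const Coo& a) {
    Csr b;
    b.n = a.n;
    b.row_ptr.assign(a.n + 1, 0);
    for (std::size_t r : a.row) {
        assert(r < a.n);
        ++b.row_ptr[r + 1];             // count non-zeros per row
    }
    for (std::size_t i = 0; i < a.n; ++i) {
        b.row_ptr[i + 1] += b.row_ptr[i];  // prefix sum gives row starts
    }
    b.col.resize(a.row.size());
    b.val.resize(a.row.size());
    // next[i] is the next free slot in row i.
    std::vector<std::size_t> next(b.row_ptr.begin(), b.row_ptr.end() - 1);
    for (std::size_t k = 0; k < a.row.size(); ++k) {
        const std::size_t dst = next[a.row[k]]++;
        b.col[dst] = a.col[k];
        b.val[dst] = a.val[k];
    }
    return b;
}

// y = A * x with A in COO form: one scattered update per non-zero.
std::vector<double> spmv_coo(const Coo& a, const std::vector<double>& x) {
    assert(x.size() == a.n);
    std::vector<double> y(a.n, 0.0);
    for (std::size_t k = 0; k < a.row.size(); ++k) {
        y[a.row[k]] += a.val[k] * x[a.col[k]];
    }
    return y;
}

// y = A * x with A in CSR form: each output row is computed independently.
std::vector<double> spmv_csr(const Csr& a, const std::vector<double>& x) {
    assert(x.size() == a.n);
    std::vector<double> y(a.n, 0.0);
    for (std::size_t i = 0; i < a.n; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.val[k] * x[a.col[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

In the CSR version each row writes only its own y[i], so the outer loop can be split across threads without synchronisation. The COO version scatters into y and would need atomics or a reduction if parallelised over non-zeros.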

data - contains sample files to run Page Rank on
NOTE: Only two datasets are provided here. Any other data source can be placed in this directory for evaluation. To keep processing consistent, every line of every file in this directory contains exactly two space-separated integers, indicating an edge between the first integer and the second.
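
As an illustration of this file format, the sketch below reads such an edge list from any input stream. The names read_edges and node_count are hypothetical, and the node count assumes non-negative, zero-based node ids, which the format description does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, handy for quick in-memory checks
#include <utility>
#include <vector>

using Edge = std::pair<long, long>;

// Read "u v" pairs, one edge per line. Reading stops at end of input
// or at the first token that is not an integer.
std::vector<Edge> read_edges(std::istream& in) {
    std::vector<Edge> edges;
    long u = 0, v = 0;
    while (in >> u >> v) {
        assert(u >= 0 && v >= 0);  // assumption: ids are non-negative
        edges.emplace_back(u, v);
    }
    return edges;
}

// Number of nodes, taken as the largest id seen plus one.
// Assumption: ids are zero-based; returns 0 for an empty edge list.
long node_count(const std::vector<Edge>& edges) {
    long max_id = -1;
    for (const Edge& e : edges) {
        max_id = std::max({max_id, e.first, e.second});
    }
    return max_id + 1;
}
```

The same function works on a file by passing a std::ifstream opened on a path such as ../data/cit-Patents.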


test - contains test data. This directory has custom-generated data for testing purposes. The data format is the same as in the data directory.


plotting - contains Python scripts to generate plots, along with the generated plots
NOTE: Make sure you are in the plotting directory before executing the commands below

  1. iteration_times.py - to run this script, use the following format
    1. python iteration_times.py 1
    2. The command-line argument is an integer from 1 to 3 indicating which data to plot
    3. 1 plots the data by number of nodes, 2 by number of edges, and 3 by average node degree
  2. data_times.py - to run this script, use the following format
    1. python data_times.py
    2. It takes no command-line arguments and simply plots the time taken by the different data formats to load, based on the number of edges in the graph.

data_formatter.py - can be run on the datasets downloadable from the links provided for the datasets in the ppt. It converts the files to the standard format used in the data directory. To use:

  1. Download data
  2. Place individual files (for data only) into the data_raw directory
  3. python3 data_formatter.py

Steps to execute


  1. Make sure you have created a build folder in the base directory (this is where some of the outputs get written).
  2. Make sure you have the dataset in the desired format inside the data directory (two are provided for testing). To test on other data, use data_formatter.py to format it and place it in the data directory (see the instructions in the previous section).
  3. Make sure that you are inside the src directory before executing any of the commands below
  4. Run prank_serial.cpp
    1. clang++ -std=c++17 -fopenmp prank_serial.cpp -o prank_serial
    2. ./prank_serial 100 0.85 ../data/cit-Patents
    3. arguments - (number of iterations, alpha value for page rank algorithm, input file name)
    4. This creates a file named cit-Patents_serial_ranks in the build directory with a node id and its corresponding weight (in the range 0 to 1) on each line.
  5. Run prank_parallel.cpp
    1. clang++ -std=c++17 -fopenmp prank_parallel.cpp -o prank_parallel
    2. ./prank_parallel 100 0.85 ../data/cit-Patents 9
    3. arguments - (number of iterations, alpha value for page rank algorithm, input file name, number of threads)
    4. This creates a file named cit-Patents_parallel_ranks in the build directory with a node id and its corresponding weight (in the range 0 to 1) on each line.
  6. Run prank_parallel_spmv.cpp
    1. Make sure that you have the file in COO format inside the data directory. For this, execute adj_to_coo.cpp as instructed below
      1. clang++ -std=c++17 adj_to_coo.cpp -o adjcoo
      2. ./adjcoo ../data/cit-Patents
    2. This creates a file named cit-Patents_coo in the data directory
    3. clang++ -std=c++17 -fopenmp prank_parallel_spmv.cpp -o prank_parallel_spmv
    4. ./prank_parallel_spmv 100 0.85 ../data/cit-Patents_coo 9
    5. arguments - (number of iterations, alpha value for page rank algorithm, input file name, number of threads)
    6. This creates a file named cit-Patents_coo_parallel_ranks in the data directory with a node id and its corresponding weight (in the range 0 to 1) on each line.
  7. Run prank_parallel_csr.cpp
    1. Make sure that you have the file in CSR format inside the data directory. For this, execute adj_to_csr.cpp as instructed below
      1. clang++ -std=c++17 adj_to_csr.cpp -o adjcsr
      2. ./adjcsr ../data/cit-Patents
    2. This creates a file named cit-Patents_csr in the data directory
    3. clang++ -std=c++17 -fopenmp prank_parallel_csr.cpp -o prank_parallel_csr
    4. ./prank_parallel_csr 100 0.85 ../data/cit-Patents_csr 9
    5. arguments - (number of iterations, alpha value for page rank algorithm, input file name, number of threads)
    6. This creates a file named cit-Patents_csr_parallel_ranks in the data directory with a node id and its corresponding weight (in the range 0 to 1) on each line.
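
For readers who want to see what the iteration count and alpha arguments in the steps above control, here is a minimal sketch of a PageRank power iteration. It is not the code in src. The function name pagerank is hypothetical, alpha is assumed to be the usual damping factor, node ids are assumed to be zero-based, and dangling nodes (no outgoing edges) are assumed to spread their rank uniformly over all nodes. The repository may make different choices on each of these points.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns one rank per node; the ranks sum to 1,
// so each value lies in the range 0 to 1.
std::vector<double> pagerank(
    std::size_t n,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges,
    int iterations, double alpha) {
    assert(n > 0);
    std::vector<std::size_t> out_deg(n, 0);
    std::vector<std::vector<std::size_t>> in_nbrs(n);
    for (const auto& e : edges) {
        assert(e.first < n && e.second < n);
        ++out_deg[e.first];                    // edge e.first -> e.second
        in_nbrs[e.second].push_back(e.first);
    }

    std::vector<double> rank(n, 1.0 / static_cast<double>(n));
    std::vector<double> next(n, 0.0);
    for (int it = 0; it < iterations; ++it) {
        // Rank held by dangling nodes is redistributed uniformly (assumption).
        double dangling = 0.0;
        for (std::size_t u = 0; u < n; ++u) {
            if (out_deg[u] == 0) dangling += rank[u];
        }
        const double base =
            ((1.0 - alpha) + alpha * dangling) / static_cast<double>(n);

        // Pull formulation: each iteration writes only next[v], so the loop
        // is race-free. The pragma is ignored unless built with -fopenmp.
        #pragma omp parallel for
        for (long v = 0; v < static_cast<long>(n); ++v) {
            double sum = 0.0;
            for (std::size_t u : in_nbrs[v]) {
                sum += rank[u] / static_cast<double>(out_deg[u]);
            }
            next[v] = base + alpha * sum;
        }
        rank.swap(next);
    }
    return rank;
}
```

With the arguments used in the commands above, a call would look like pagerank(n, edges, 100, 0.85). On a directed 3-cycle every node keeps rank 1/3, which makes a convenient sanity check.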

Running all


We have also created an executable that runs all methods.

Note: When calling the executable, ensure only the plain edge data is in the data folder. Matrix conversion results will be left in the build folder.

Note 2: If you wish to change any constants, you may do so in src/measure.cpp

To build: make

To execute all (after building): ./eval
