From 0663c47e9824bb7f799af6b154e7f6452989db58 Mon Sep 17 00:00:00 2001 From: Kent Knox Date: Mon, 20 Jul 2015 15:28:58 -0500 Subject: [PATCH 1/3] Adding Apache license --- LICENSE | 176 ++++++++++++++++++ NOTICE | 9 + samples/cmake/FindOpenCL.cmake | 18 +- .../functions/cufunc_common.hpp | 21 +-- src/cmake/FindOpenCL.cmake | 19 +- src/library/kernels/csrmm_adaptive.cl | 34 +--- src/library/kernels/csrmm_general.cl | 34 +--- src/library/kernels/csrmv_adaptive.cl | 32 ---- src/library/kernels/sort_by_key_uint.cl | 91 +++++---- 9 files changed, 236 insertions(+), 198 deletions(-) create mode 100644 LICENSE create mode 100644 NOTICE diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..7a77415 --- /dev/null +++ b/LICENSE @@ -0,0 +1,176 @@ +Apache License +Version 2.0, January 2004 +http://www.apache.org/licenses/ + +TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + +1. Definitions. + +"License" shall mean the terms and conditions for use, reproduction, +and distribution as defined by Sections 1 through 9 of this document. + +"Licensor" shall mean the copyright owner or entity authorized by +the copyright owner that is granting the License. + +"Legal Entity" shall mean the union of the acting entity and all +other entities that control, are controlled by, or are under common +control with that entity. For the purposes of this definition, +"control" means (i) the power, direct or indirect, to cause the +direction or management of such entity, whether by contract or +otherwise, or (ii) ownership of fifty percent (50%) or more of the +outstanding shares, or (iii) beneficial ownership of such entity. + +"You" (or "Your") shall mean an individual or Legal Entity +exercising permissions granted by this License. + +"Source" form shall mean the preferred form for making modifications, +including but not limited to software source code, documentation +source, and configuration files. 
+ +"Object" form shall mean any form resulting from mechanical +transformation or translation of a Source form, including but +not limited to compiled object code, generated documentation, +and conversions to other media types. + +"Work" shall mean the work of authorship, whether in Source or +Object form, made available under the License, as indicated by a +copyright notice that is included in or attached to the work +(an example is provided in the Appendix below). + +"Derivative Works" shall mean any work, whether in Source or Object +form, that is based on (or derived from) the Work and for which the +editorial revisions, annotations, elaborations, or other modifications +represent, as a whole, an original work of authorship. For the purposes +of this License, Derivative Works shall not include works that remain +separable from, or merely link (or bind by name) to the interfaces of, +the Work and Derivative Works thereof. + +"Contribution" shall mean any work of authorship, including +the original version of the Work and any modifications or additions +to that Work or Derivative Works thereof, that is intentionally +submitted to Licensor for inclusion in the Work by the copyright owner +or by an individual or Legal Entity authorized to submit on behalf of +the copyright owner. For the purposes of this definition, "submitted" +means any form of electronic, verbal, or written communication sent +to the Licensor or its representatives, including but not limited to +communication on electronic mailing lists, source code control systems, +and issue tracking systems that are managed by, or on behalf of, the +Licensor for the purpose of discussing and improving the Work, but +excluding communication that is conspicuously marked or otherwise +designated in writing by the copyright owner as "Not a Contribution." 
+ +"Contributor" shall mean Licensor and any individual or Legal Entity +on behalf of whom a Contribution has been received by Licensor and +subsequently incorporated within the Work. + +2. Grant of Copyright License. Subject to the terms and conditions of +this License, each Contributor hereby grants to You a perpetual, +worldwide, non-exclusive, no-charge, royalty-free, irrevocable +copyright license to reproduce, prepare Derivative Works of, +publicly display, publicly perform, sublicense, and distribute the +Work and such Derivative Works in Source or Object form. + +3. Grant of Patent License. Subject to the terms and conditions of +this License, each Contributor hereby grants to You a perpetual, +worldwide, non-exclusive, no-charge, royalty-free, irrevocable +(except as stated in this section) patent license to make, have made, +use, offer to sell, sell, import, and otherwise transfer the Work, +where such license applies only to those patent claims licensable +by such Contributor that are necessarily infringed by their +Contribution(s) alone or by combination of their Contribution(s) +with the Work to which such Contribution(s) was submitted. If You +institute patent litigation against any entity (including a +cross-claim or counterclaim in a lawsuit) alleging that the Work +or a Contribution incorporated within the Work constitutes direct +or contributory patent infringement, then any patent licenses +granted to You under this License for that Work shall terminate +as of the date such litigation is filed. + +4. Redistribution. 
You may reproduce and distribute copies of the +Work or Derivative Works thereof in any medium, with or without +modifications, and in Source or Object form, provided that You +meet the following conditions: + +(a) You must give any other recipients of the Work or +Derivative Works a copy of this License; and + +(b) You must cause any modified files to carry prominent notices +stating that You changed the files; and + +(c) You must retain, in the Source form of any Derivative Works +that You distribute, all copyright, patent, trademark, and +attribution notices from the Source form of the Work, +excluding those notices that do not pertain to any part of +the Derivative Works; and + +(d) If the Work includes a "NOTICE" text file as part of its +distribution, then any Derivative Works that You distribute must +include a readable copy of the attribution notices contained +within such NOTICE file, excluding those notices that do not +pertain to any part of the Derivative Works, in at least one +of the following places: within a NOTICE text file distributed +as part of the Derivative Works; within the Source form or +documentation, if provided along with the Derivative Works; or, +within a display generated by the Derivative Works, if and +wherever such third-party notices normally appear. The contents +of the NOTICE file are for informational purposes only and +do not modify the License. You may add Your own attribution +notices within Derivative Works that You distribute, alongside +or as an addendum to the NOTICE text from the Work, provided +that such additional attribution notices cannot be construed +as modifying the License. 
+ +You may add Your own copyright statement to Your modifications and +may provide additional or different license terms and conditions +for use, reproduction, or distribution of Your modifications, or +for any such Derivative Works as a whole, provided Your use, +reproduction, and distribution of the Work otherwise complies with +the conditions stated in this License. + +5. Submission of Contributions. Unless You explicitly state otherwise, +any Contribution intentionally submitted for inclusion in the Work +by You to the Licensor shall be under the terms and conditions of +this License, without any additional terms or conditions. +Notwithstanding the above, nothing herein shall supersede or modify +the terms of any separate license agreement you may have executed +with Licensor regarding such Contributions. + +6. Trademarks. This License does not grant permission to use the trade +names, trademarks, service marks, or product names of the Licensor, +except as required for reasonable and customary use in describing the +origin of the Work and reproducing the content of the NOTICE file. + +7. Disclaimer of Warranty. Unless required by applicable law or +agreed to in writing, Licensor provides the Work (and each +Contributor provides its Contributions) on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or +implied, including, without limitation, any warranties or conditions +of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A +PARTICULAR PURPOSE. You are solely responsible for determining the +appropriateness of using or redistributing the Work and assume any +risks associated with Your exercise of permissions under this License. + +8. Limitation of Liability. 
In no event and under no legal theory, +whether in tort (including negligence), contract, or otherwise, +unless required by applicable law (such as deliberate and grossly +negligent acts) or agreed to in writing, shall any Contributor be +liable to You for damages, including any direct, indirect, special, +incidental, or consequential damages of any character arising as a +result of this License or out of the use or inability to use the +Work (including but not limited to damages for loss of goodwill, +work stoppage, computer failure or malfunction, or any and all +other commercial damages or losses), even if such Contributor +has been advised of the possibility of such damages. + +9. Accepting Warranty or Additional Liability. While redistributing +the Work or Derivative Works thereof, You may choose to offer, +and charge a fee for, acceptance of support, warranty, indemnity, +or other liability obligations and/or rights consistent with this +License. However, in accepting such obligations, You may act only +on Your own behalf and on Your sole responsibility, not on behalf +of any other Contributor, and only if You agree to indemnify, +defend, and hold each Contributor harmless for any liability +incurred by, or claims asserted against, such Contributor by reason +of your accepting any such warranty or additional liability. + +END OF TERMS AND CONDITIONS diff --git a/NOTICE b/NOTICE new file mode 100644 index 0000000..1ba33bf --- /dev/null +++ b/NOTICE @@ -0,0 +1,9 @@ +clSPARSE + Copyright 2015 Advanced Micro Devices, Inc. + Copyright 2015 Vratis, Ltd. + + This product includes software developed at + Advanced Micro Devices, Inc. (http://www.amd.com). + + This product includes software developed at + Vratis, Ltd. (http://www.vratis.com). 
diff --git a/samples/cmake/FindOpenCL.cmake b/samples/cmake/FindOpenCL.cmake index 5e7fd77..96aee3e 100644 --- a/samples/cmake/FindOpenCL.cmake +++ b/samples/cmake/FindOpenCL.cmake @@ -1,19 +1,3 @@ -# ######################################################################## -# Copyright 2015 Advanced Micro Devices, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ######################################################################## - # Locate an OpenCL implementation. # Currently supports AMD APP SDK (http://developer.amd.com/sdks/AMDAPPSDK/Pages/default.aspx/) @@ -137,7 +121,7 @@ endif( ) # If we asked for OpenCL 1.2, and we found a version installed greater than that, pass the 'use deprecated' flag if( (OpenCL_FIND_VERSION VERSION_LESS "2.0") AND (OpenCL_VERSION VERSION_GREATER OpenCL_FIND_VERSION) ) add_definitions( -DCL_USE_DEPRECATED_OPENCL_2_0_APIS ) - + # If we asked for OpenCL 1.1, and we found a version installed greater than that, pass the 'use deprecated' flag if( (OpenCL_FIND_VERSION VERSION_LESS "1.2") AND (OpenCL_VERSION VERSION_GREATER OpenCL_FIND_VERSION) ) add_definitions( -DCL_USE_DEPRECATED_OPENCL_1_1_APIS ) diff --git a/src/benchmarks/cusparse-bench/functions/cufunc_common.hpp b/src/benchmarks/cusparse-bench/functions/cufunc_common.hpp index 8f01829..93b46d2 100644 --- a/src/benchmarks/cusparse-bench/functions/cufunc_common.hpp +++ b/src/benchmarks/cusparse-bench/functions/cufunc_common.hpp @@ -1,19 +1,3 @@ -/* 
************************************************************************ - * Copyright 2013 Advanced Micro Devices, Inc. - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - * ************************************************************************/ - #ifndef CUBLAS_BENCHMARK_COMMON_HXX__ #define CUBLAS_BENCHMARK_COMMON_HXX__ @@ -240,7 +224,7 @@ class cusparseFunc : timer(_timer) { timer_id = timer.getUniqueID( "cufunc", 0 ); - + /* Setup cublas. */ cuSparseStatus = cusparseCreate( &handle ); @@ -279,7 +263,6 @@ class cusparseFunc cudaError_t err; cusparseHandle_t handle; cusparseStatus_t cuSparseStatus; -}; +}; #endif // ifndef CUBLAS_BENCHMARK_COMMON_HXX__ - diff --git a/src/cmake/FindOpenCL.cmake b/src/cmake/FindOpenCL.cmake index 5e7fd77..cae4387 100644 --- a/src/cmake/FindOpenCL.cmake +++ b/src/cmake/FindOpenCL.cmake @@ -1,20 +1,3 @@ -# ######################################################################## -# Copyright 2015 Advanced Micro Devices, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. -# ######################################################################## - - # Locate an OpenCL implementation. # Currently supports AMD APP SDK (http://developer.amd.com/sdks/AMDAPPSDK/Pages/default.aspx/) # @@ -137,7 +120,7 @@ endif( ) # If we asked for OpenCL 1.2, and we found a version installed greater than that, pass the 'use deprecated' flag if( (OpenCL_FIND_VERSION VERSION_LESS "2.0") AND (OpenCL_VERSION VERSION_GREATER OpenCL_FIND_VERSION) ) add_definitions( -DCL_USE_DEPRECATED_OPENCL_2_0_APIS ) - + # If we asked for OpenCL 1.1, and we found a version installed greater than that, pass the 'use deprecated' flag if( (OpenCL_FIND_VERSION VERSION_LESS "1.2") AND (OpenCL_VERSION VERSION_GREATER OpenCL_FIND_VERSION) ) add_definitions( -DCL_USE_DEPRECATED_OPENCL_1_1_APIS ) diff --git a/src/library/kernels/csrmm_adaptive.cl b/src/library/kernels/csrmm_adaptive.cl index b77a3ad..735b9d6 100644 --- a/src/library/kernels/csrmm_adaptive.cl +++ b/src/library/kernels/csrmm_adaptive.cl @@ -1,35 +1,3 @@ -R"( -/* -Copyright 2015 Advanced Micro Devices, Inc. -All rights reserved. - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - -* Redistributions of source code must retain the above copyright notice, this - list of conditions and the following disclaimer. - -* Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - -* Neither the name of [project] nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. 
- -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -*/ -)" - R"( #define WGSIZE 256 @@ -403,7 +371,7 @@ csrmm_ulong( global const FPTYPE * restrict sparseVals, __local FPTYPE partialSums[ BLOCKSIZE ]; // The current implementation of csrmm is implemented as a batched csrmv for now - // The loop iterates on the number of columns in the output matrix, and we increment + // The loop iterates on the number of columns in the output matrix, and we increment // the global pointers to the dense B and C matrices a column for each iteration. for( ulong curr_col = 0; curr_col < num_cols_C; ++curr_col ) { diff --git a/src/library/kernels/csrmm_general.cl b/src/library/kernels/csrmm_general.cl index 488485b..ed6f3f3 100644 --- a/src/library/kernels/csrmm_general.cl +++ b/src/library/kernels/csrmm_general.cl @@ -1,35 +1,3 @@ -R"( -/* -Copyright 2015 Advanced Micro Devices, Inc. -All rights reserved. - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - -* Redistributions of source code must retain the above copyright notice, this - list of conditions and the following disclaimer. 
- -* Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - -* Neither the name of [project] nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -*/ -)" - R"( #if defined DOUBLE #ifdef cl_khr_fp64 @@ -173,7 +141,7 @@ void csrmv_batched( const INDEX_TYPE num_rows, local VALUE_TYPE sdata[ WG_SIZE + SUBWAVE_SIZE / 2 ]; // The current implementation of csrmm is implemented as a batched csrmv - // The loop iterates on the number of columns in the output matrix, and we increment + // The loop iterates on the number of columns in the output matrix, and we increment // the global pointers to the dense B and C matrices a column for each iteration. 
for( SIZE_TYPE curr_col = 0; curr_col < num_cols_C; ++curr_col ) { diff --git a/src/library/kernels/csrmv_adaptive.cl b/src/library/kernels/csrmv_adaptive.cl index 350ba32..892f5a6 100644 --- a/src/library/kernels/csrmv_adaptive.cl +++ b/src/library/kernels/csrmv_adaptive.cl @@ -1,35 +1,3 @@ -R"( -/* -Copyright 2015 Advanced Micro Devices, Inc. -All rights reserved. - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - -* Redistributions of source code must retain the above copyright notice, this - list of conditions and the following disclaimer. - -* Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - -* Neither the name of [project] nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-*/ -)" - R"( #define WGSIZE 256 diff --git a/src/library/kernels/sort_by_key_uint.cl b/src/library/kernels/sort_by_key_uint.cl index c771244..7f18244 100644 --- a/src/library/kernels/sort_by_key_uint.cl +++ b/src/library/kernels/sort_by_key_uint.cl @@ -1,21 +1,21 @@ R"( -/*************************************************************************** -* © 2012,2014 Advanced Micro Devices, Inc. All rights reserved. -* -* Licensed under the Apache License, Version 2.0 (the "License"); -* you may not use this file except in compliance with the License. -* You may obtain a copy of the License at -* -* http://www.apache.org/licenses/LICENSE-2.0 -* -* Unless required by applicable law or agreed to in writing, software -* distributed under the License is distributed on an "AS IS" BASIS, -* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -* See the License for the specific language governing permissions and -* limitations under the License. -***************************************************************************/ -#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable +/*************************************************************************** +* © 2012,2014-2015 Advanced Micro Devices, Inc. All rights reserved. +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+***************************************************************************/ +#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable #pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable #define WG_SIZE 256 @@ -56,17 +56,17 @@ uint scanLocalMemAndTotal(uint val, __local uint* lmem, uint *totalSum, int excl int l_id = get_local_id(0); int l_size = get_local_size(0); lmem[l_id] = 0; - + l_id += l_size; lmem[l_id] = val; barrier(CLK_LOCAL_MEM_FENCE); - + uint t; for (int i = 1; i < l_size; i *= 2) { - t = lmem[l_id - i]; + t = lmem[l_id - i]; barrier(CLK_LOCAL_MEM_FENCE); - lmem[l_id] += t; + lmem[l_id] += t; barrier(CLK_LOCAL_MEM_FENCE); } *totalSum = lmem[l_size*2 - 1]; @@ -121,7 +121,7 @@ void sort4BitsKeyValueAscending(u32 sortData[4], int sortVal[4], VALUE_TYPE so #endif GROUP_LDS_BARRIER; - + ldsSortData[dstAddr.x] = sortData[0]; ldsSortData[dstAddr.y] = sortData[1]; ldsSortData[dstAddr.z] = sortData[2]; @@ -160,15 +160,15 @@ void sort4BitsKeyValueAscending(u32 sortData[4], int sortVal[4], VALUE_TYPE so } __kernel -void permuteByKeyAscTemplate( __global const u32* restrict gKeys, +void permuteByKeyAscTemplate( __global const u32* restrict gKeys, __global const int* restrict gValues, - __global const VALUE_TYPE * restrict gValues2, - __global const u32* rHistogram, - __global u32* restrict gDstKeys, - __global int* restrict gDstValues, + __global const VALUE_TYPE * restrict gValues2, + __global const u32* rHistogram, + __global u32* restrict gDstKeys, + __global int* restrict gDstValues, __global VALUE_TYPE * restrict gDstValues2, int m_n, - int m_nWGs, + int m_nWGs, int m_startBit, int m_nBlocksPerWG) { @@ -195,14 +195,14 @@ void permuteByKeyAscTemplate( __global const u32* restrict gKeys, if( lIdx < (NUM_BUCKET) ) { #if defined(DESCENDING) - localHistogramToCarry[lIdx] = rHistogram[(NUM_BUCKET - lIdx -1)*nWGs + wgIdx]; + localHistogramToCarry[lIdx] = rHistogram[(NUM_BUCKET 
- lIdx -1)*nWGs + wgIdx]; #else localHistogramToCarry[lIdx] = rHistogram[lIdx*nWGs + wgIdx]; #endif } GROUP_LDS_BARRIER; - + const int blockSize = ELEMENTS_PER_WORK_ITEM*WG_SIZE; @@ -248,7 +248,7 @@ void permuteByKeyAscTemplate( __global const u32* restrict gKeys, for(int i=0; i>startBit) & 0xf; - { + { u32 setIdx = lIdx/16; if( lIdx < NUM_BUCKET ) { @@ -271,9 +271,9 @@ void permuteByKeyAscTemplate( __global const u32* restrict gKeys, AtomInc( SET_HISTOGRAM( setIdx, keys[i] ) ); #endif #endif - + GROUP_LDS_BARRIER; - + uint hIdx = NUM_BUCKET+lIdx; if( lIdx < NUM_BUCKET ) { @@ -345,7 +345,7 @@ void permuteByKeyAscTemplate( __global const u32* restrict gKeys, gDstKeys[ groupOffset + myIdx ] = sortData[ie]; gDstValues[ groupOffset + myIdx ] = sortVal[ie]; gDstValues2[ groupOffset + myIdx ] = sortVal2[ie]; - + } } } @@ -392,7 +392,7 @@ void sort4BitsKeyValueDescending(u32 sortData[4], int sortVal[4], const int sta #endif GROUP_LDS_BARRIER; - + ldsSortData[dstAddr.x] = sortData[0]; ldsSortData[dstAddr.y] = sortData[1]; ldsSortData[dstAddr.z] = sortData[2]; @@ -421,11 +421,11 @@ void sort4BitsKeyValueDescending(u32 sortData[4], int sortVal[4], const int sta } //do we need descending __kernel -void permuteByKeyDescTemplate( __global const u32* restrict gKeys, - __global const int* restrict gValues, - __global const u32* rHistogram, - __global u32* restrict gDstKeys, - __global int* restrict gDstValues, +void permuteByKeyDescTemplate( __global const u32* restrict gKeys, + __global const int* restrict gValues, + __global const u32* rHistogram, + __global u32* restrict gDstKeys, + __global int* restrict gDstValues, int4 cb) { __local u32 ldsSortData[WG_SIZE*ELEMENTS_PER_WORK_ITEM+16]; @@ -450,14 +450,14 @@ void permuteByKeyDescTemplate( __global const u32* restrict gKeys, if( lIdx < (NUM_BUCKET) ) { #if defined(DESCENDING) - localHistogramToCarry[lIdx] = rHistogram[(NUM_BUCKET - lIdx -1)*nWGs + wgIdx]; + localHistogramToCarry[lIdx] = rHistogram[(NUM_BUCKET - lIdx -1)*nWGs + 
wgIdx]; #else localHistogramToCarry[lIdx] = rHistogram[lIdx*nWGs + wgIdx]; #endif } GROUP_LDS_BARRIER; - + const int blockSize = ELEMENTS_PER_WORK_ITEM*WG_SIZE; @@ -499,7 +499,7 @@ void permuteByKeyDescTemplate( __global const u32* restrict gKeys, for(int i=0; i>startBit) & 0xf; - { + { u32 setIdx = lIdx/16; if( lIdx < NUM_BUCKET ) { @@ -522,9 +522,9 @@ void permuteByKeyDescTemplate( __global const u32* restrict gKeys, AtomInc( SET_HISTOGRAM( setIdx, keys[i] ) ); #endif #endif - + GROUP_LDS_BARRIER; - + uint hIdx = NUM_BUCKET+lIdx; if( lIdx < NUM_BUCKET ) { @@ -595,7 +595,7 @@ void permuteByKeyDescTemplate( __global const u32* restrict gKeys, { gDstKeys[ groupOffset + myIdx ] = sortData[ie]; gDstValues[ groupOffset + myIdx ] = sortVal[ie]; - + } } } @@ -615,4 +615,3 @@ void permuteByKeyDescTemplate( __global const u32* restrict gKeys, } } )" - From f50a20ff87f0b4b5f14541b77b441dfbdae14328 Mon Sep 17 00:00:00 2001 From: Kent Knox Date: Tue, 21 Jul 2015 15:03:50 -0500 Subject: [PATCH 2/3] Refactored text in README file; explained semantic versioning --- CONTRIBUTING.md | 6 ++-- README.md | 84 ++++++++++++++++++++++++------------------------- 2 files changed, 44 insertions(+), 46 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 6bf5c02..6973688 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -13,10 +13,8 @@ Firstly, in order to contribute code to this project, a contributor must have a At this point, the repository maintainers will be notified by GitHub that a 'pull request' exists pending against their repository. A code review should be completed within a few days, depending on the scope of submitted code, and the code will either be accepted, rejected or commented on for feedback. ## Code submission guidelines -We want to ensure that the project code base maintains a level of quality over time, such that future contributors find it as easy to jump into the code as hopefully it is today. 
As such, pull requests should -* follow the [code style guidelines]( ) of the project as posted to the project wiki. Unfortunately, there was no unifying code guidelines defined between the BLAS & FFT projects, but code submissions should not mix styles within an individual file. We have since defined and posted a code style guideline for the projects and we expect the code to slowly transition to the new -guidelines over time - * separate check-ins that modify a files style from the ones that add/change/delete code. +Points to remember when generating pull-requests against clSPARSE +* clSPARSE is a project licensed under the [Apache License, Version 2.0]( http://www.apache.org/licenses/LICENSE-2.0 ). If you are not already familiar, please review the license before issuing a pull request. We intend this project to be open to external contributors, and encourage developers to contribute code back that they believe will provide value to the overall community. We interpret an explicit 'pull request' to this repository as an implicit acknowledgement from the contributor that they wish to share code under the terms of the Apache license v2.0. * target the **develop** branch * ensure that the [code properly builds]( https://github.com/kknox/clSPARSE/wiki/Build ) * Proper cloud based build services should verify the code builds in a PR diff --git a/README.md b/README.md index 4ada636..dff9cbf 100644 --- a/README.md +++ b/README.md @@ -1,30 +1,12 @@ -## Project badges -**coming soon** - -Pre-built binaries **not yet** available on our releases page **coming soon** +Pre-built binaries are available on our [releases page](releases) # clSPARSE -an OpenCL© library implementing Sparse linear algebra. This project started +an OpenCL© library implementing sparse linear algebra. This project started as a collaboration between [AMD Inc.](http://www.amd.com/) and -[Vratis Ltd.](http://www.vratis.com/). 
In opening the source to the public, we -invite all interested parties to [contribute](CONTRIBUTING.md) to the source. +[Vratis Ltd.](http://www.vratis.com/). ## Introduction to clSPARSE -The library source compiles cross-platform on the back of an advanced cmake build -system allowing users to build the library, benchmarks, tests and takes care of -dependencies for them. True in spirit with -the other clMath libraries, clSPARSE exports a “C” interface to allow -projects to build wrappers around clSPARSE in any language they need. With -respect to the API, a great deal of thought and effort went into designing the -API’s to make them less ‘cluttered’ compared to the older clMath libraries. -OpenCL state is not explicitly passed through the API, which enables the library -to be forward compatible when users are ready to switch from OpenCL 1.2 to OpenCL -2.0. Lastly, the API’s are designed such that users are in control of where -input and output buffers live, and they maintain absolute control of when data -transfers to/from device memory need to happen, so that there are no performance -surprises. - -At release, clSPARSE provides these fundamental sparse operations for OpenCL: +At this time, clSPARSE provides these fundamental sparse operations for OpenCL: - Sparse Matrix - dense Vector multiply (SpM-dV) - Sparse Matrix - dense Matrix multiply (SpM-dM) - Iterative conjugate gradient solver (CG) @@ -33,11 +15,16 @@ At release, clSPARSE provides these fundamental sparse operations for OpenCL: - COO to CSR conversions (& converse) - Functions to read matrix market files in COO or CSR format -### clSPARSE build information -A [Build primer](https://github.com/kknox/clSPARSE/wiki/Build) is available +True in spirit with the other clMath libraries, clSPARSE exports a “C” interface to allow +projects to build wrappers around clSPARSE in any language they need. 
A great deal +of thought and effort went into designing the API’s to make them less ‘cluttered’ +compared to the older clMath libraries. OpenCL state is not explicitly passed +through the API, which enables the library to be forward compatible when users are +ready to switch from OpenCL 1.2 to OpenCL 2.0 _(OpenCL 2.0 support not completely finished)_ -### clSPARSE library user documentation -**API documentation** not yet available +The API’s are designed such that users are in control of where input and output +buffers live, and they maintain control of when data transfers to/from device +memory happen, so that there are no performance surprises. ### Google Groups Two mailing lists have been created for the clMath projects: @@ -48,12 +35,15 @@ Two mailing lists have been created for the clMath projects: - clmath-developers@googlegroups.com - group whose focus is for developers interested in contributing to the library code itself -### Contributing code -Please refer to and read the [Contributing](CONTRIBUTING.md) document for guidelines on -how to contribute code to this open source project. Code in the -/master branch is considered to be stable and new library releases are made -when commits are merged into /master. Active development and pull-requests should -be made to the /develop branch. +### API semantic versioning +Good software is typically the result of the loop of feedback and iteration; +software interfaces no less so. clSPARSE follows the +[semantic versioning](http://semver.org/) guidelines, and while the major version +number remains '0', the public API should not be considered stable. We release +clSPARSE as beta software (0.y.z) early to the community to elicit feedback and +comment. This comes with the expectation that with feedback, we may incorporate +breaking changes to the API that might require early users to recompile, or rewrite +portions of their code as we iterate on the design. 
## Samples clSPARSE contains a directory of simple [OpenCL samples](./samples) that demonstrate the use @@ -61,17 +51,33 @@ of the API in both C and C++. The [superbuild](http://www.kitware.com/media/htm script for clSPARSE also builds the samples as an external project, to demonstrate how an application would find and link to clSPARSE with cmake. -## Build dependencies +### clSPARSE library documentation +**API documentation** is not yet available, but the samples above give an excellent +starting point to basic library operations. + +### Contributing code +Please refer to and read the [Contributing](CONTRIBUTING.md) document for guidelines on +how to contribute code to this open source project. Code in the +/master branch is considered to be stable and new library releases are made +when commits are merged into /master. Active development and pull-requests should +be made to the **/develop** branch. + +## Build clSPARSE is primarily written with C++ using C++11 core features. It does export a 'C' interface for compatibility with other languages. +### How to build clSPARSE for your platform +A [Build primer](https://github.com/kknox/clSPARSE/wiki/Build) is available on +the wiki, which describes how to use cmake to generate platforms specific build +files + ### Compiling for Windows - Windows® 7/8 - Visual Studio 2013 and above - CMake 2.8.12 (download from [Kitware](http://www.cmake.org/download/)) - Solution (.sln) or - Nmake makefiles -- An OpenCL SDK, such as [APP SDK 3.0](http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/) +- An OpenCL SDK, such as [APP SDK 3.0](http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/) ### Compiling for Linux - GCC 4.8 and above @@ -79,7 +85,7 @@ a 'C' interface for compatibility with other languages. 
- Unix makefiles or - KDevelop or - QT Creator - - An OpenCL SDK, such as [APP SDK 3.0](http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/) +- An OpenCL SDK, such as [APP SDK 3.0](http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/) ### Compiling for Mac OSX - CMake 2.8.12 (install via [brew](http://brew.sh/)) @@ -87,12 +93,6 @@ a 'C' interface for compatibility with other languages. - XCode - An OpenCL SDK (installed via `xcode-select --install`) -### Test infrastructure dependencies +### Bench & Test infrastructure dependencies - Googletest v1.7 - Boost v1.58 - -### Benchmark infrastructure dependencies -- Boost v1.58 - - [API documentation]: http://clmathlibraries.github.io/clSPARSE/ - [binary_release]: https://github.com/clMathLibraries/clSPARSE/releases From 92a179c354058b5e2a3235e72541a2f13530fb8c Mon Sep 17 00:00:00 2001 From: Kent Knox Date: Tue, 21 Jul 2015 16:58:29 -0500 Subject: [PATCH 3/3] Adding benchmark data wrt 14.301 Refactored the csv directory layout --- .../clSPARSE-0.6.0.0/W9100/Csr2Coo.csv | 19 ----------- .../clSPARSE-0.6.0.0/W9100/Csr2Dense.csv | 16 --------- .../clSPARSE-0.6.0.0/W9100/SpM-dV.csv | 16 --------- .../cuSPARSE-0.6.0.0/Tesla_K40/Csr2Coo.csv | 19 ----------- .../cuSPARSE-0.6.0.0/Tesla_K40/Csr2Dense.csv | 17 ---------- .../cuSPARSE-0.6.0.0/Tesla_K40/SpM-dV.csv | 16 --------- .../clSPARSE-0.6.0.0/W9100/Csr2Coo.csv | 17 ++++++++++ .../clSPARSE-0.6.0.0/W9100/Csr2Dense.csv | 15 +++++++++ .../clSPARSE-0.6.0.0/W9100/README.md | 26 +++++++++++++++ .../clSPARSE-0.6.0.0/W9100/SpM-dV.csv | 17 ++++++++++ .../clSPARSE-0.6.0.0/W9100/clsparse-bench.sh | 33 +++++++++++++++++++ .../cuSPARSE-7.0/Tesla_K40/Csr2Coo.csv | 17 ++++++++++ .../cuSPARSE-7.0/Tesla_K40/Csr2Dense.csv | 15 +++++++++ .../cuSPARSE-7.0/Tesla_K40/README.md | 26 +++++++++++++++ .../cuSPARSE-7.0/Tesla_K40/SpM-dV.csv | 17 ++++++++++ .../cuSPARSE-7.0/Tesla_K40/cusparse-bench.sh | 32 
++++++++++++++++++ 16 files changed, 215 insertions(+), 103 deletions(-) delete mode 100644 doc/performance/clSPARSE-0.6.0.0/W9100/Csr2Coo.csv delete mode 100644 doc/performance/clSPARSE-0.6.0.0/W9100/Csr2Dense.csv delete mode 100644 doc/performance/clSPARSE-0.6.0.0/W9100/SpM-dV.csv delete mode 100644 doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/Csr2Coo.csv delete mode 100644 doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/Csr2Dense.csv delete mode 100644 doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/SpM-dV.csv create mode 100644 docs/performance/clSPARSE-0.6.0.0/W9100/Csr2Coo.csv create mode 100644 docs/performance/clSPARSE-0.6.0.0/W9100/Csr2Dense.csv create mode 100644 docs/performance/clSPARSE-0.6.0.0/W9100/README.md create mode 100644 docs/performance/clSPARSE-0.6.0.0/W9100/SpM-dV.csv create mode 100644 docs/performance/clSPARSE-0.6.0.0/W9100/clsparse-bench.sh create mode 100644 docs/performance/cuSPARSE-7.0/Tesla_K40/Csr2Coo.csv create mode 100644 docs/performance/cuSPARSE-7.0/Tesla_K40/Csr2Dense.csv create mode 100644 docs/performance/cuSPARSE-7.0/Tesla_K40/README.md create mode 100644 docs/performance/cuSPARSE-7.0/Tesla_K40/SpM-dV.csv create mode 100644 docs/performance/cuSPARSE-7.0/Tesla_K40/cusparse-bench.sh diff --git a/doc/performance/clSPARSE-0.6.0.0/W9100/Csr2Coo.csv b/doc/performance/clSPARSE-0.6.0.0/W9100/Csr2Coo.csv deleted file mode 100644 index 988095f..0000000 --- a/doc/performance/clSPARSE-0.6.0.0/W9100/Csr2Coo.csv +++ /dev/null @@ -1,19 +0,0 @@ -Csr2Coo,Bell-Garland matrices,, -,,, -,,, -,,GiElements/s, -cant,,5.34468, -consph,,5.79985, -cop20k_A,,4.58041, -mac_econ_fwd500,,3.3711, -mc2depi,,4.21598, -pdb1HYS,,5.66851, -pwtk,,6.80393, -rail4284,,6.15234, -rma10,,4.31987, -scircuit,,2.47316, -shipsec1,,5.0256, -webbase_1M,,3.6331, -,,, -,,, -AMD System:,Ubuntu 14.04.2 LTS,,(timmy-linux.amd.com) diff --git a/doc/performance/clSPARSE-0.6.0.0/W9100/Csr2Dense.csv b/doc/performance/clSPARSE-0.6.0.0/W9100/Csr2Dense.csv deleted file mode 100644 index 
a37270f..0000000 --- a/doc/performance/clSPARSE-0.6.0.0/W9100/Csr2Dense.csv +++ /dev/null @@ -1,16 +0,0 @@ -Csr2Dense,Small-MTX,matrices, -,,, -,,, -,,GiElements/s, -Dubcova1,,0.0278669, -hydr1c_A_11,,0.0211048, -hydr1c_A_72,,0.021233, -hydr1c_A_76,,0.0210692, -Maragal_6,,0.0689761, -Na5,,0.223689, -psse1,,0.0103225, -Reuters911,,0.0461601, -Si10H16,,0.0849898, -tomography,,0.195625, -,,, -AMD System:,Ubuntu 14.04.2 LTS,,(timmy-linux.amd.com) diff --git a/doc/performance/clSPARSE-0.6.0.0/W9100/SpM-dV.csv b/doc/performance/clSPARSE-0.6.0.0/W9100/SpM-dV.csv deleted file mode 100644 index 2ff97e2..0000000 --- a/doc/performance/clSPARSE-0.6.0.0/W9100/SpM-dV.csv +++ /dev/null @@ -1,16 +0,0 @@ -SpM-dV,Bell-Garland matrices,,,,, -,,320 GB/s,,,, -,Linux,1642.5 (VM),,,,1642.5 == 14.50.2 -,,w9100,,,, -cant,,96.0199,,,, -consph,,118.862,,,, -cop20k_A,,86.6051,,,, -mac_econ_fwd500,,63.2257,,,, -mc2depi,,114.147,,,, -pdb1HYS,,104.649,,,, -pwtk,,139.115,,,, -rail4284,,98.4723,,,, -rma10,,76.0268,,,, -scircuit,,46.9494,,,, -shipsec1,,100.119,,,, -webbase_1M,,117.437,,,, diff --git a/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/Csr2Coo.csv b/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/Csr2Coo.csv deleted file mode 100644 index d32712f..0000000 --- a/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/Csr2Coo.csv +++ /dev/null @@ -1,19 +0,0 @@ -Csr2Coo,Bell-Garland matrices,, -,,, -,,, -,,,GiElements/s -cant,,,12.5817 -consph,,,13.2232 -cop20k_A,,,6.12102 -mac_econ_fwd500,,,1.8865 -mc2depi,,,1.25825 -pdb1HYS,,,16.5828 -pwtk,,,12.4101 -rail4284,,,14.6147 -rma10,,,10.8501 -scircuit,,,1.71464 -shipsec1,,,13.0017 -webbase_1M,,,0.989057 -,,, -,,, -Nvidia System:,openSUSE 13.2,,(Durres.amd.com) diff --git a/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/Csr2Dense.csv b/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/Csr2Dense.csv deleted file mode 100644 index 6cb7fd6..0000000 --- a/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/Csr2Dense.csv +++ /dev/null @@ -1,17 +0,0 @@ -Csr2Dense,Small-MTX,matrices, -,,, 
-,,,GiElements/s -,,, -Dubcova1,,,0.0107514 -hydr1c_A_11,,,0.0121738 -hydr1c_A_72,,,0.0121759 -hydr1c_A_76,,,0.0121681 -Maragal_6,,,0.026069 -Na5,,,0.155912 -psse1,,,0.00519411 -Reuters911,,,0.0217721 -Si10H16,,,0.0308366 -tomography,,,0.566176 -,,, -,,, -Nvidia System:,openSUSE 13.2,,(Durres.amd.com) diff --git a/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/SpM-dV.csv b/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/SpM-dV.csv deleted file mode 100644 index edd5b8a..0000000 --- a/doc/performance/cuSPARSE-0.6.0.0/Tesla_K40/SpM-dV.csv +++ /dev/null @@ -1,16 +0,0 @@ -SpM-dV,Bell-Garland matrices,,,,, -,,,280 GB/s,,, -,Linux,,346.47,,,1642.5 == 14.50.2 -,,,k40c,,, -cant,,,96.4569,,, -consph,,,102.113,,, -cop20k_A,,,72.1035,,, -mac_econ_fwd500,,,47.1653,,, -mc2depi,,,74.2639,,, -pdb1HYS,,,102.969,,, -pwtk,,,74.1772,,, -rail4284,,,34.8417,,, -rma10,,,64.8728,,, -scircuit,,,51.2187,,, -shipsec1,,,161.233,,, -webbase_1M,,,32.5601,,, diff --git a/docs/performance/clSPARSE-0.6.0.0/W9100/Csr2Coo.csv b/docs/performance/clSPARSE-0.6.0.0/W9100/Csr2Coo.csv new file mode 100644 index 0000000..201c7cc --- /dev/null +++ b/docs/performance/clSPARSE-0.6.0.0/W9100/Csr2Coo.csv @@ -0,0 +1,17 @@ +csr2coo,, +,, +OpenCL runtime:,1573.4 (VM), +OpenCL Device:,w9100, +,,Gi-Elements/s +cant,,5.34 +consph,,5.80 +cop20k_A,,4.58 +mac_econ_fwd500,,3.37 +mc2depi,,4.22 +pdb1HYS,,5.67 +pwtk,,6.80 +rail4284,,6.15 +rma10,,4.32 +scircuit,,2.47 +shipsec1,,5.03 +webbase_1M,,3.63 diff --git a/docs/performance/clSPARSE-0.6.0.0/W9100/Csr2Dense.csv b/docs/performance/clSPARSE-0.6.0.0/W9100/Csr2Dense.csv new file mode 100644 index 0000000..f666497 --- /dev/null +++ b/docs/performance/clSPARSE-0.6.0.0/W9100/Csr2Dense.csv @@ -0,0 +1,15 @@ +csr2dense,, +,, +OpenCL runtime:,1573.4 (VM), +OpenCL Device:,w9100, +,,Gi-Elements/s +Dubcova1,,0.028 +hydr1c_A_11,,0.021 +hydr1c_A_72,,0.021 +hydr1c_A_76,,0.021 +Maragal_6,,0.069 +Na5,,0.224 +psse1,,0.010 +Reuters911,,0.046 +Si10H16,,0.085 +tomography,,0.196 diff --git 
a/docs/performance/clSPARSE-0.6.0.0/W9100/README.md b/docs/performance/clSPARSE-0.6.0.0/W9100/README.md new file mode 100644 index 0000000..409585e --- /dev/null +++ b/docs/performance/clSPARSE-0.6.0.0/W9100/README.md @@ -0,0 +1,26 @@ +# Benchmarking +## Hardware +w9100 + +## Environment +Ubuntu 14.04 + +clSPARSE v0.6.0.0 + +[Catalyst FirePro](http://support.amd.com/en-us/download/workstation?os=Linux%20x86_64#catalyst-pro) 14.301 + +## Tool +[clsparse-bench](clSPARSE\src\benchmarks\clsparse-bench) + +## Methodology +For each data point, we took 20 samples. Each sample consists of 20 calls +with a wait afterward. We benchmark with respect to the API, utilizing host timers +(not pure kernel time with ). +Outlying samples beyond 1 standard deviation were removed. + +Conversion routines benchmarked as number of Gi-Elements/s converted + +SpM-dV routine calculated as Gi-Bytes/s +```c +( sizeof( cl_int )*( csrMtx.num_nonzeros + csrMtx.num_rows ) + sizeof( T ) * ( csrMtx.num_nonzeros + csrMtx.num_cols + csrMtx.num_rows ) ) / time_in_ns( ); +``` diff --git a/docs/performance/clSPARSE-0.6.0.0/W9100/SpM-dV.csv b/docs/performance/clSPARSE-0.6.0.0/W9100/SpM-dV.csv new file mode 100644 index 0000000..88fae20 --- /dev/null +++ b/docs/performance/clSPARSE-0.6.0.0/W9100/SpM-dV.csv @@ -0,0 +1,17 @@ +SpM-dV,, +,, +OpenCL runtime:,1573.4 (VM), +OpenCL Device:,w9100, +,,Gi-Bytes/s +cant,,99.7 +consph,,118.8 +cop20k_A,,90.1 +mac_econ_fwd500,,65.9 +mc2depi,,113.4 +pdb1HYS,,102.6 +pwtk,,144.9 +rail4284,,100.7 +rma10,,78.4 +scircuit,,56.5 +shipsec1,,100.5 +webbase_1M,,120.3 diff --git a/docs/performance/clSPARSE-0.6.0.0/W9100/clsparse-bench.sh b/docs/performance/clSPARSE-0.6.0.0/W9100/clsparse-bench.sh new file mode 100644 index 0000000..08c80d2 --- /dev/null +++ b/docs/performance/clSPARSE-0.6.0.0/W9100/clsparse-bench.sh @@ -0,0 +1,33 @@ +#!/bin/bash + +#Executable +clsparse_bench=~/code/github/clMathLibraries/bin/clSPARSE/release/clSPARSE-build/staging/clsparse-bench + +#Data 
directories +mtx_cant=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/cant +mtx_consph=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/consph +mtx_cop20k_A=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/cop20k_A +mtx_mac_econ_fwd500=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/mac_econ_fwd500 +mtx_mc2depi=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/mc2depi +mtx_pdb1HYS=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/pdb1HYS +mtx_pwtk=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/pwtk +mtx_rail4284=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/rail4284 +mtx_rma10=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/rma10 +mtx_scircuit=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/scircuit +mtx_shipsec1=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/shipsec1 +mtx_webbase_1M=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/webbase-1M + +clsparse_timing_dir=timings/spm-dv/14.301 +mkdir -p ${clsparse_timing_dir} +${clsparse_bench} -f spmdv -d ${mtx_cant} > ${clsparse_timing_dir}/cant.txt +${clsparse_bench} -f spmdv -d ${mtx_consph} > ${clsparse_timing_dir}/consph.txt +${clsparse_bench} -f spmdv -d ${mtx_cop20k_A} > ${clsparse_timing_dir}/cop20k_A.txt +${clsparse_bench} -f spmdv -d ${mtx_mac_econ_fwd500} > ${clsparse_timing_dir}/mac_econ_fwd500.txt +${clsparse_bench} -f spmdv -d ${mtx_mc2depi} > ${clsparse_timing_dir}/mc2depi.txt +${clsparse_bench} -f spmdv -d ${mtx_pdb1HYS} > ${clsparse_timing_dir}/pdb1HYS.txt +${clsparse_bench} -f spmdv -d ${mtx_pwtk} > ${clsparse_timing_dir}/pwtk.txt +${clsparse_bench} -f spmdv -d ${mtx_rail4284} > ${clsparse_timing_dir}/rail4284.txt +${clsparse_bench} -f spmdv -d ${mtx_rma10} > ${clsparse_timing_dir}/rma10.txt 
+${clsparse_bench} -f spmdv -d ${mtx_scircuit} > ${clsparse_timing_dir}/scircuit.txt +${clsparse_bench} -f spmdv -d ${mtx_shipsec1} > ${clsparse_timing_dir}/shipsec1.txt +${clsparse_bench} -f spmdv -d ${mtx_webbase_1M} > ${clsparse_timing_dir}/webbase_1M.txt diff --git a/docs/performance/cuSPARSE-7.0/Tesla_K40/Csr2Coo.csv b/docs/performance/cuSPARSE-7.0/Tesla_K40/Csr2Coo.csv new file mode 100644 index 0000000..af2b02f --- /dev/null +++ b/docs/performance/cuSPARSE-7.0/Tesla_K40/Csr2Coo.csv @@ -0,0 +1,17 @@ +Csr2Coo,, +,, +CUDA driver:,346.47, +OpenCL Device:,k40c, +,,Gi-Elements/s +cant,,12.58 +consph,,13.22 +cop20k_A,,6.12 +mac_econ_fwd500,,1.89 +mc2depi,,1.26 +pdb1HYS,,16.58 +pwtk,,12.41 +rail4284,,14.61 +rma10,,10.85 +scircuit,,1.71 +shipsec1,,13.00 +webbase_1M,,0.99 diff --git a/docs/performance/cuSPARSE-7.0/Tesla_K40/Csr2Dense.csv b/docs/performance/cuSPARSE-7.0/Tesla_K40/Csr2Dense.csv new file mode 100644 index 0000000..7853e12 --- /dev/null +++ b/docs/performance/cuSPARSE-7.0/Tesla_K40/Csr2Dense.csv @@ -0,0 +1,15 @@ +Csr2Dense,, +,, +CUDA driver:,346.47, +OpenCL Device:,k40c, +,,Gi-Elements/s +Dubcova1,,0.011 +hydr1c_A_11,,0.012 +hydr1c_A_72,,0.012 +hydr1c_A_76,,0.012 +Maragal_6,,0.026 +Na5,,0.156 +psse1,,0.005 +Reuters911,,0.022 +Si10H16,,0.031 +tomography,,0.566 diff --git a/docs/performance/cuSPARSE-7.0/Tesla_K40/README.md b/docs/performance/cuSPARSE-7.0/Tesla_K40/README.md new file mode 100644 index 0000000..27fe3b8 --- /dev/null +++ b/docs/performance/cuSPARSE-7.0/Tesla_K40/README.md @@ -0,0 +1,26 @@ +# Benchmarking +## Hardware +Tesla K40c + +## Environment +OpenSUSE 13.2 + +cuSPARSE v7.0 + +Tesla driver 346.47 + +## Tool +[cusparse-bench](clSPARSE\src\benchmarks\cusparse-bench) + +## Methodology +For each data point, we took 20 samples. Each sample consists of 20 calls +with a wait afterward. We benchmark with respect to the API, utilizing host timers +(not pure kernel time with ). +Outlying samples beyond 1 standard deviation were removed. 
+ +Conversion routines benchmarked as number of Gi-Elements/s converted + +SpM-dV routine calculated as Gi-Bytes/s +```c +( sizeof( cl_int )*( csrMtx.num_nonzeros + csrMtx.num_rows ) + sizeof( T ) * ( csrMtx.num_nonzeros + csrMtx.num_cols + csrMtx.num_rows ) ) / time_in_ns( ); +``` diff --git a/docs/performance/cuSPARSE-7.0/Tesla_K40/SpM-dV.csv b/docs/performance/cuSPARSE-7.0/Tesla_K40/SpM-dV.csv new file mode 100644 index 0000000..f911394 --- /dev/null +++ b/docs/performance/cuSPARSE-7.0/Tesla_K40/SpM-dV.csv @@ -0,0 +1,17 @@ +SpM-dV,, +,, +CUDA driver:,346.47, +OpenCL Device:,k40c, +,,Gi-Bytes/s +cant,,96.5 +consph,,102.1 +cop20k_A,,72.1 +mac_econ_fwd500,,47.2 +mc2depi,,74.3 +pdb1HYS,,103.0 +pwtk,,74.2 +rail4284,,34.8 +rma10,,64.9 +scircuit,,51.2 +shipsec1,,161.2 +webbase_1M,,32.6 diff --git a/docs/performance/cuSPARSE-7.0/Tesla_K40/cusparse-bench.sh b/docs/performance/cuSPARSE-7.0/Tesla_K40/cusparse-bench.sh new file mode 100644 index 0000000..4e2958f --- /dev/null +++ b/docs/performance/cuSPARSE-7.0/Tesla_K40/cusparse-bench.sh @@ -0,0 +1,32 @@ +#!/bin/bash + +#Executable +cusparse_bench=~/code/github/clMathLibraries/bin/clSPARSE/release/clSPARSE-build/staging/cusparse-bench + +#Data directories +mtx_cant=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/cant +mtx_consph=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/consph +mtx_cop20k_A=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/cop20k_A +mtx_mac_econ_fwd500=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/mac_econ_fwd500 +mtx_mc2depi=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/mc2depi +mtx_pdb1HYS=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/pdb1HYS +mtx_pwtk=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/pwtk +mtx_rail4284=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/rail4284 
+mtx_rma10=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/rma10 +mtx_scircuit=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/scircuit +mtx_shipsec1=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/shipsec1 +mtx_webbase_1M=~/code/github/clMathLibraries/bin/deps/release/Externals/MTX/Bell_Garland/webbase-1M + +mkdir -p timings-cusparse-spmdv +${cusparse_bench} -f spmdv -d ${mtx_cant} > timings-cusparse-spmdv/cant.txt +${cusparse_bench} -f spmdv -d ${mtx_consph} > timings-cusparse-spmdv/consph.txt +${cusparse_bench} -f spmdv -d ${mtx_cop20k_A} > timings-cusparse-spmdv/cop20k_A.txt +${cusparse_bench} -f spmdv -d ${mtx_mac_econ_fwd500} > timings-cusparse-spmdv/mac_econ_fwd500.txt +${cusparse_bench} -f spmdv -d ${mtx_mc2depi} > timings-cusparse-spmdv/mc2depi.txt +${cusparse_bench} -f spmdv -d ${mtx_pdb1HYS} > timings-cusparse-spmdv/pdb1HYS.txt +${cusparse_bench} -f spmdv -d ${mtx_pwtk} > timings-cusparse-spmdv/pwtk.txt +${cusparse_bench} -f spmdv -d ${mtx_rail4284} > timings-cusparse-spmdv/rail4284.txt +${cusparse_bench} -f spmdv -d ${mtx_rma10} > timings-cusparse-spmdv/rma10.txt +${cusparse_bench} -f spmdv -d ${mtx_scircuit} > timings-cusparse-spmdv/scircuit.txt +${cusparse_bench} -f spmdv -d ${mtx_shipsec1} > timings-cusparse-spmdv/shipsec1.txt +${cusparse_bench} -f spmdv -d ${mtx_webbase_1M} > timings-cusparse-spmdv/webbase_1M.txt
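
Review note on the benchmarking READMEs added in this patch: the SpM-dV throughput expression and the outlier rule can be sketched in a few lines of Python to make the arithmetic easier to check. This is only a sketch under assumptions and is not taken from clsparse-bench. The function names `spmdv_bytes`, `drop_outliers`, and `bytes_per_ns` are hypothetical. `sizeof( cl_int )` is taken as 4 bytes and `sizeof( T )` as 4 bytes (single precision) unless overridden. "Beyond 1 standard deviation" is read as "further than one population standard deviation from the sample mean", which the README does not spell out.

```python
from statistics import mean, pstdev


def spmdv_bytes(num_rows, num_cols, num_nonzeros, index_size=4, value_size=4):
    """Bytes counted for one CSR SpM-dV call, mirroring the README expression.

    index_size stands in for sizeof(cl_int); value_size stands in for sizeof(T).
    """
    index_bytes = index_size * (num_nonzeros + num_rows)
    value_bytes = value_size * (num_nonzeros + num_cols + num_rows)
    return index_bytes + value_bytes


def drop_outliers(samples):
    """Keep samples within one standard deviation of the mean.

    Assumption: the README's outlier rule is measured from the sample mean.
    """
    centre = mean(samples)
    spread = pstdev(samples)
    return [s for s in samples if abs(s - centre) <= spread]


def bytes_per_ns(num_rows, num_cols, num_nonzeros, sample_times_ns, value_size=4):
    """Bytes moved divided by the mean of the retained sample times (in ns)."""
    kept = drop_outliers(sample_times_ns)
    total = spmdv_bytes(num_rows, num_cols, num_nonzeros, value_size=value_size)
    return total / mean(kept)
```

One thing the sketch makes visible: bytes divided by nanoseconds is numerically 10^9 bytes per second, while the CSV headers say Gi-Bytes/s. Whether the tool rescales by 2^30 is not shown in this patch, so the unit label may be worth confirming.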