Phi2 dirty release
DumaxFr committed Jun 10, 2018
1 parent 5e02fcf commit bb94ece
Showing 14 changed files with 81 additions and 76 deletions.
16 changes: 11 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
@@ -4,16 +4,22 @@ Based on Christian Buchner's & Christian H.'s CUDA project, no more active o

Check the [README.txt](README.txt) for the additions

If you appreciate the work done for this release, please consider a small donation to one of those addresses :

LUX : LWcYLSx37F37gHjbmvhwQPQ8PhyDYibmpr
RVN : RQXpsvSaVrGYo4tyGityWDNBQMFcnqANyj
BTC : 1AtQXFbnzYTsjzy2bzSH6nPGxqZ32NG42T

You may also retribute the previous "forker" :

BTC donation address: 1AJdfCpLWPNoAMDfHF1wD5y8VgKSSTHxPo (tpruvot)

A part of the recent algos were originally written by [djm34](https://github.com/djm34) and [alexis78](https://github.com/alexis78)

This variant was tested and built on Linux (ubuntu server 14.04, 16.04, Fedora 22 to 25)
This variant was tested and built on Windows 10 and Visual Studio 2015 (community edition)
It is also built for Windows 7 to 10 with VStudio 2013, to stay compatible with Windows 7 and Vista.

Note that the x86 releases are generally faster than x64 ones on Windows, but that tend to change with the recent drivers.

The recommended CUDA Toolkit version was the [6.5.19](http://developer.download.nvidia.com/compute/cuda/6_5/rel/installers/cuda_6.5.19_windows_general_64.exe), but some light algos could be faster with the version 7.5 and 8.0 (like lbry, decred and skein).
Note that x86 releases will no longer be provided.

About source code dependencies
------------------------------
@@ -24,7 +30,7 @@ This project requires some libraries to be built :
- Curl (prebuilt for win)
- pthreads (prebuilt for win)

The tree now contains recent prebuilt openssl and curl .lib for both x86 and x64 platforms (windows).
The tree now contains recent prebuilt openssl and curl .lib for x64 platforms (windows).

To rebuild them, you need to clone this repository and its submodules :
git clone https://github.com/peters/curl-for-windows.git compat/curl-for-windows
41 changes: 26 additions & 15 deletions README.txt
@@ -1,11 +1,16 @@

ccminer 2.2.5 (Feb 2018) "x16r algo"
ccminer dumax-0.9.0 (June 10th 2018) "Phi2 algo"
---------------------------------------------------------------

***************************************************************
If you find this tool useful and like to support its continuous
development, then consider a donation.

DumaxFr@github:
LUX : LWcYLSx37F37gHjbmvhwQPQ8PhyDYibmpr
RVN : RQXpsvSaVrGYo4tyGityWDNBQMFcnqANyj
BTC : 1AtQXFbnzYTsjzy2bzSH6nPGxqZ32NG42T

tpruvot@github:
BTC : 1AJdfCpLWPNoAMDfHF1wD5y8VgKSSTHxPo
DCR : DsUCcACGcyP8McNMRXQwbtpDxaVUYLDQDeU
@@ -102,8 +107,8 @@ its command line interface and options.
neoscrypt use to mine FeatherCoin, Trezarcoin, Orbitcoin, etc
nist5 use to mine TalkCoin
penta use to mine Joincoin / Pentablake
phi use to mine LUXCoin
phi2d use to mine LUXCoin after fork
phi use to mine Folm
phi2 use to mine LUXCoin
polytimos use to mine Polytimos
quark use to mine Quarkcoin
qubit use to mine Qubit
@@ -219,13 +224,11 @@ Wildkeccak specific:

>>> Examples <<<

Example for Ravencoin mining on thetechnicalspool.com with all nvidia gpus in your system
ccminer -a x16r -o stratum+tcp://thetechnicalspool.com:3636 -u <<username.worker>> -p <<workerpassword>>

Example for Heavycoin Mining on heavycoinpool.com with a single gpu in your system
ccminer -t 1 -a heavy -o stratum+tcp://stratum01.heavycoinpool.com:5333 -u <<username.worker>> -p <<workerpassword>> -v 8


Example for Heavycoin Mining on hvc.1gh.com with a dual gpu in your system
ccminer -t 2 -a heavy -o stratum+tcp://hvcpool.1gh.com:5333/ -u <<WALLET>> -p x -v 8
Example for Ravencoin mining on thetechnicalspool.com with a single gpu from many in your system (gpu number from 0 to x)
ccminer -a x16r -o stratum+tcp://thetechnicalspool.com:3636 -u <<username.worker>> -p <<workerpassword>> -d 1


Example for Fuguecoin solo-mining with 4 gpu's in your system and a Fuguecoin-wallet running on localhost
@@ -274,15 +277,25 @@ I plan to add a json format later, if requests are formatted in json too..
>>> Additional Notes <<<

This code should be running on nVidia GPUs ranging from compute capability
3.0 up to compute capability 5.2. Support for Compute 2.0 has been dropped
5.2 up to compute capability 6.1 (7.0). Support for Compute 3.0 has been dropped
so we can more efficiently implement new algorithms using the latest hardware
features.


>>> RELEASE HISTORY <<<
Feb. 2017 v2.2.5

June 10th 2018 dumax-0.9.0 (initial fork release)
Moved to visual studio 2015 and CUDA 9.1
Improved x16r
Added x16s (same kernels as x16r)
Added Phi2 algo (unoptimized)
Improved Phi, x17

>>> RELEASE HISTORY by tpruvot <<<
Feb. 2018 v2.2.5
New x16r algo

Jan. 04th 2017 v2.2.4
Jan. 04th 2018 v2.2.4
Improve lyra2v2
Higher keccak default intensity
Drop SM 2.x support by default, for CUDA 9 and more recent
@@ -602,6 +615,4 @@ Source code is included to satisfy GNU GPL V3 requirements.

With kind regards,

Christian Buchner ( [email protected] )
Christian H. ( Chris84 )
Tanguy Pruvot ( tpruvot@github )
Stephane Duma ( DumaxFr@github )
2 changes: 1 addition & 1 deletion algos.h
@@ -110,7 +110,7 @@ static const char *algo_names[] = {
"nist5",
"penta",
"phi",
"phi2d",
"phi2",
"polytimos",
"quark",
"qubit",
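
The rename from `phi2d` to `phi2` changes the string a user passes with `-a`. As a rough illustration only, the sketch below searches a shortened copy of the name table (just the entries visible in this hunk) with a hypothetical `find_algo` helper. It is not ccminer's actual lookup code, and the indices do not match the real `algo_names[]` array.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, shortened copy of the name table: only the entries
// visible in the hunk above, after the rename.
static const char *algo_names_sketch[] = {
    "nist5", "penta", "phi", "phi2", "polytimos", "quark", "qubit",
};

// Hypothetical helper: returns the index of `name` in the sketch table,
// or -1 when the name is not listed.
inline int find_algo(const char *name) {
    const int count = sizeof(algo_names_sketch) / sizeof(algo_names_sketch[0]);
    for (int i = 0; i < count; i++) {
        if (std::strcmp(name, algo_names_sketch[i]) == 0)
            return i;
    }
    return -1;
}
```

Under these assumptions, `find_algo("phi2")` finds an entry while the old spelling `find_algo("phi2d")` returns -1, which is why command lines written for the previous name would need updating.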
2 changes: 1 addition & 1 deletion bench.cpp
@@ -81,7 +81,7 @@ void algo_free_all(int thr_id)
free_nist5(thr_id);
free_pentablake(thr_id);
free_phi(thr_id);
free_phi2d(thr_id);
free_phi2(thr_id);
free_polytimos(thr_id);
free_quark(thr_id);
free_qubit(thr_id);
4 changes: 2 additions & 2 deletions ccminer.cpp
@@ -269,7 +269,7 @@ Options:\n\
nist5 NIST5 (TalkCoin)\n\
penta Pentablake hash (5x Blake 512)\n\
phi Lux, Folm ...\n\
phi2d Dirty Lux Phi2\n\
phi2 Dirty Lux Phi2\n\
polytimos Politimos\n\
quark Quark\n\
qubit Qubit\n\
@@ -2430,7 +2430,7 @@ static void *miner_thread(void *userdata)
rc = scanhash_phi(thr_id, &work, max_nonce, &hashes_done);
break;
case ALGO_PHI2D:
rc = scanhash_phi2d(thr_id, &work, max_nonce, &hashes_done);
rc = scanhash_phi2(thr_id, &work, max_nonce, &hashes_done);
break;
case ALGO_POLYTIMOS:
rc = scanhash_polytimos(thr_id, &work, max_nonce, &hashes_done);
4 changes: 2 additions & 2 deletions ccminer.vcxproj
@@ -395,7 +395,7 @@
<ClCompile Include="neoscrypt\neoscrypt-cpu.c" />
<ClInclude Include="neoscrypt\cuda_vectors.h" />
<ClInclude Include="phi\cuda_phi.h" />
<ClInclude Include="phi\cuda_phi2d.h" />
<ClInclude Include="phi\cuda_phi2.h" />
<ClInclude Include="phi\cuda_phi2_skein512.h" />
<ClInclude Include="x11\cuda_x11_simd512_sm2.cuh" />
<ClInclude Include="x16x\cuda_x16x.h" />
@@ -553,7 +553,7 @@
<CudaCompile Include="phi\cuda_phi2_br_streebog_echo512.cu" />
<CudaCompile Include="phi\cuda_phi2_lyra2.cu" />
<CudaCompile Include="phi\cuda_phi2_skein512.cu" />
<CudaCompile Include="phi\phi2d.cu" />
<CudaCompile Include="phi\phi2.cu" />
<CudaCompile Include="scrypt\blake.cu" />
<CudaCompile Include="scrypt\keccak.cu" />
<CudaCompile Include="scrypt\sha256.cu" />
4 changes: 2 additions & 2 deletions ccminer.vcxproj.filters
@@ -620,8 +620,8 @@
<ClInclude Include="Algo512\cuda_b_hamsi512.h" />
<ClInclude Include="Algo512\cuda_b_cubehash512.h" />
<ClInclude Include="Algo512\cuda_b_jh512.h" />
<ClInclude Include="phi\cuda_phi2d.h" />
<ClInclude Include="phi\cuda_phi2_skein512.h" />
<ClInclude Include="phi\cuda_phi2.h" />
</ItemGroup>
<ItemGroup>
<CudaCompile Include="cuda.cpp">
@@ -1010,10 +1010,10 @@
<CudaCompile Include="Algo512\cuda_b_hamsi512.cu" />
<CudaCompile Include="Algo512\cuda_b_cubehash512.cu" />
<CudaCompile Include="Algo512\cuda_b_jh512.cu" />
<CudaCompile Include="phi\phi2d.cu" />
<CudaCompile Include="phi\cuda_phi2_skein512.cu" />
<CudaCompile Include="phi\cuda_phi2_lyra2.cu" />
<CudaCompile Include="phi\cuda_phi2_br_streebog_echo512.cu" />
<CudaCompile Include="phi\phi2.cu" />
</ItemGroup>
<ItemGroup>
<Image Include="res\ccminer.ico">
4 changes: 2 additions & 2 deletions miner.h
@@ -292,7 +292,7 @@ extern "C" {
extern int scanhash_nist5(int thr_id, struct work *work, uint32_t max_nonce, unsigned long *hashes_done);
extern int scanhash_pentablake(int thr_id, struct work *work, uint32_t max_nonce, unsigned long *hashes_done);
extern int scanhash_phi(int thr_id, struct work* work, uint32_t max_nonce, unsigned long *hashes_done);
extern int scanhash_phi2d(int thr_id, struct work* work, uint32_t max_nonce, unsigned long *hashes_done);
extern int scanhash_phi2(int thr_id, struct work* work, uint32_t max_nonce, unsigned long *hashes_done);
extern int scanhash_polytimos(int thr_id, struct work* work, uint32_t max_nonce, unsigned long *hashes_done);
extern int scanhash_quark(int thr_id, struct work *work, uint32_t max_nonce, unsigned long *hashes_done);
extern int scanhash_qubit(int thr_id, struct work* work, uint32_t max_nonce, unsigned long *hashes_done);
@@ -358,7 +358,7 @@ extern "C" {
extern void free_nist5(int thr_id);
extern void free_pentablake(int thr_id);
extern void free_phi(int thr_id);
extern void free_phi2d(int thr_id);
extern void free_phi2(int thr_id);
extern void free_polytimos(int thr_id);
extern void free_quark(int thr_id);
extern void free_qubit(int thr_id);
File renamed without changes.
45 changes: 19 additions & 26 deletions phi/cuda_phi2_lyra2.cu
@@ -4,12 +4,15 @@
* DumaxFr@github 2018 - Dual Lyra2 for Phi2
*/

#include <stdio.h>

#include "cuda_helper.h"
#include "cuda_vector_uint2x4.h"


#define PHI2LYRA2_TPB64 32
// Limited by shared mem max capacity (TPBx1532 <= 48kb)
// 48kb prefered to fit 2 times in 96kb max shared on sm52 & sm61)
#define PHI2LYRA2_TPB64_MAIN 32
#define PHI2LYRA2_TPB64_LDST 128

#ifdef __INTELLISENSE__
/* just for vstudio code colors */
@@ -147,7 +150,7 @@ static void round_lyra(uint2x4* s)
}

__device__ __forceinline__
static void reduceDuplex(uint2 state[4], uint32_t thread, const uint32_t threads)
static void reduceDuplex(uint2 state[4], const uint32_t thread, const uint32_t threads)
{
uint2 state1[3];

@@ -175,7 +178,7 @@ static void reduceDuplex(uint2 state[4], uint32_t thread, const uint32_t threads
}

__device__ __forceinline__
static void reduceDuplexRowSetup(const int rowIn, const int rowInOut, const int rowOut, uint2 state[4], uint32_t thread, const uint32_t threads)
static void reduceDuplexRowSetup(const int rowIn, const int rowInOut, const int rowOut, uint2 state[4], const uint32_t thread, const uint32_t threads)
{
uint2 state1[3], state2[3];

@@ -226,7 +229,7 @@ static void reduceDuplexRowt(const int rowIn, const int rowInOut, const int rowO
LD4S(state1, rowIn, i, thread, threads);
LD4S(state2, rowInOut, i, thread, threads);

#pragma unroll
#pragma unroll
for (int j = 0; j < 3; j++)
state[j] ^= state1[j] + state2[j];

@@ -255,7 +258,7 @@ static void reduceDuplexRowt(const int rowIn, const int rowInOut, const int rowO

LD4S(state1, rowOut, i, thread, threads);

#pragma unroll
#pragma unroll
for (int j = 0; j < 3; j++)
state1[j] ^= state[j];

@@ -318,19 +321,9 @@ static void reduceDuplexRowt_8(const int rowInOut, uint2* state, const uint32_t
state[j] ^= last[j];
}

//__constant__ uint2x4 blake2b_IV[2] = {
// 0xf3bcc908lu, 0x6a09e667lu,
// 0x84caa73blu, 0xbb67ae85lu,
// 0xfe94f82blu, 0x3c6ef372lu,
// 0x5f1d36f1lu, 0xa54ff53alu,
// 0xade682d1lu, 0x510e527flu,
// 0x2b3e6c1flu, 0x9b05688clu,
// 0xfb41bd6blu, 0x1f83d9ablu,
// 0x137e2179lu, 0x5be0cd19lu
//};

__global__
__launch_bounds__(64, 1)
__launch_bounds__(PHI2LYRA2_TPB64_LDST, 8)
void cuda_phi2_lyra2_gpu_hash_32p1_1(const uint32_t threads, const uint2* const __restrict__ g_hash) {

const uint32_t thread = (blockDim.x * blockIdx.x + threadIdx.x);
@@ -369,7 +362,7 @@ void cuda_phi2_lyra2_gpu_hash_32p1_1(const uint32_t threads, const uint2* const
}

__global__
__launch_bounds__(64, 1)
__launch_bounds__(PHI2LYRA2_TPB64_LDST, 8)
void cuda_phi2_lyra2_gpu_hash_32p2_1(const uint32_t threads, const uint2* const __restrict__ g_hash) {

const uint32_t thread = (blockDim.x * blockIdx.x + threadIdx.x);
@@ -408,7 +401,7 @@ void cuda_phi2_lyra2_gpu_hash_32p2_1(const uint32_t threads, const uint2* const
}

__global__
__launch_bounds__(PHI2LYRA2_TPB64, 1)
__launch_bounds__(PHI2LYRA2_TPB64_MAIN, 1)
void cuda_phi2_lyra2_gpu_hash_32_2(const uint32_t threads) {

const uint32_t thread = blockDim.y * blockIdx.x + threadIdx.y;
@@ -453,7 +446,7 @@ void cuda_phi2_lyra2_gpu_hash_32_2(const uint32_t threads) {
}

__global__
__launch_bounds__(64, 1)
__launch_bounds__(PHI2LYRA2_TPB64_LDST, 8)
void cuda_phi2_lyra2_gpu_hash_32p1_3(const uint32_t threads, uint2 *g_hash) {

const uint32_t thread = blockDim.x * blockIdx.x + threadIdx.x;
@@ -479,7 +472,7 @@ void cuda_phi2_lyra2_gpu_hash_32p1_3(const uint32_t threads, uint2 *g_hash) {
}

__global__
__launch_bounds__(64, 1)
__launch_bounds__(PHI2LYRA2_TPB64_LDST, 8)
void cuda_phi2_lyra2_gpu_hash_32p2_3(const uint32_t threads, uint2 *g_hash) {

const uint32_t thread = blockDim.x * blockIdx.x + threadIdx.x;
@@ -513,20 +506,20 @@ void cuda_phi2_lyra2_cpu_init(uint64_t *d_matrix) {

__host__
void cuda_phi2_lyra2_cpu_hash_32x2(const uint32_t threads, uint32_t *d_hash) {
uint32_t tpb = PHI2LYRA2_TPB64;
uint32_t tpb = PHI2LYRA2_TPB64_MAIN;

dim3 grid1((threads * 4 + tpb - 1) / tpb);
dim3 block1(4, tpb >> 2);

dim3 grid2((threads + 64 - 1) / 64);
dim3 block2(64);
dim3 grid2((threads + PHI2LYRA2_TPB64_LDST - 1) / PHI2LYRA2_TPB64_LDST);
dim3 block2(PHI2LYRA2_TPB64_LDST);

cuda_phi2_lyra2_gpu_hash_32p1_1 <<< grid2, block2 >>> (threads, (uint2*)d_hash);
cuda_phi2_lyra2_gpu_hash_32_2 <<< grid1, block1, 24 * (8 - 0) * sizeof(uint2) * tpb >>> (threads);
cuda_phi2_lyra2_gpu_hash_32_2 <<< grid1, block1, 192 * sizeof(uint2) * tpb >>> (threads);
cuda_phi2_lyra2_gpu_hash_32p1_3 <<< grid2, block2 >>> (threads, (uint2*)d_hash);

cuda_phi2_lyra2_gpu_hash_32p2_1 <<< grid2, block2 >>> (threads, (uint2*)d_hash);
cuda_phi2_lyra2_gpu_hash_32_2 <<< grid1, block1, 24 * (8 - 0) * sizeof(uint2) * tpb >>> (threads);
cuda_phi2_lyra2_gpu_hash_32_2 <<< grid1, block1, 192 * sizeof(uint2) * tpb >>> (threads);
cuda_phi2_lyra2_gpu_hash_32p2_3 <<< grid2, block2 >>> (threads, (uint2*)d_hash);

}
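
The launch geometry in `cuda_phi2_lyra2_cpu_hash_32x2` can be checked with plain host-side arithmetic. The sketch below is not part of the commit: it copies the two `PHI2LYRA2_TPB64_*` constants and the grid and shared-memory expressions from the diff, and assumes that CUDA's `uint2` occupies 8 bytes (modelled here by the hypothetical `uint2_host` struct). The helper names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Constants copied from the diff above.
constexpr uint32_t PHI2LYRA2_TPB64_MAIN = 32;
constexpr uint32_t PHI2LYRA2_TPB64_LDST = 128;

// Assumption: stand-in for CUDA's uint2 (two 32-bit words, 8 bytes).
struct uint2_host { uint32_t x, y; };
static_assert(sizeof(uint2_host) == 8, "uint2 stand-in must be 8 bytes");

// Dynamic shared memory requested for cuda_phi2_lyra2_gpu_hash_32_2:
// 192 * sizeof(uint2) * tpb
constexpr std::size_t main_shared_bytes(uint32_t tpb) {
    return 192 * sizeof(uint2_host) * tpb;
}

// grid1 = (threads * 4 + tpb - 1) / tpb, paired with block1 = (4, tpb >> 2)
constexpr uint32_t main_grid(uint32_t threads, uint32_t tpb) {
    return (threads * 4 + tpb - 1) / tpb;
}

// grid2 = (threads + tpb - 1) / tpb, paired with block2 = (tpb)
constexpr uint32_t ldst_grid(uint32_t threads, uint32_t tpb) {
    return (threads + tpb - 1) / tpb;
}

// 32 threads * 1536 bytes per thread = 49152 bytes = 48 KiB per block.
static_assert(main_shared_bytes(PHI2LYRA2_TPB64_MAIN) == 48 * 1024,
              "main kernel is expected to request exactly 48 KiB");
```

With `threads = 1 << 20`, `main_grid` gives 131072 blocks of 4 x 8 threads and `ldst_grid` gives 8192 blocks of 128 threads. The per-thread figure of 192 * 8 = 1536 bytes is what the launch expression requests, and two 48 KiB blocks add up to the 96 KB of shared memory mentioned in the source comment.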