Master to be -> master (#16)
* clean up types

* clean up types

* rm AbstractDicts

* rm Abstract from Matrix

* rm AbstractDicts and add zeros

* clean up tensors more

* clean up sparse

* clean up virtual

* clean up linear alg ext

* simplify lin alg ext

* comments

* transpose

* correct types

* simplify more

* transpose

* clean up #1

* clean up #2

* clean up #3

* clean up #4

* add canonise.jl

* format code

* clean up #5

* remove trans

* add new func

* clean up

* start cleaning gauges

* clean up

* add comments

* output env variational

* MpoTensor

* structs

* MpoTensor

* fix issues with types

* fix nothing type

* length for MpoTensor to be verified

* clean up

* fix subtyping in tensors

* clean up types

* add N explicitly in some places

* rm length

* MpoTensor

* transpose

* clean up

* ideas

* mv ideas to attic

* make MpoTensor external const

* renaming

* add tsvd example

* zipper

* improve bench

* clean up exports

* add comments

* add basic tests for QMps and Canonise

* clean up stuff

* measure_memory

* add measure_memory

* add format_bytes

* zipper

* sparse zipper

* add eltype for mps / mpo

* fix some issues

* zipper psvd

* zipper psvd

* zipper sparse svd

* tsvd

* change svd_corner_matrix

* correct Adjoint for CornerTensor

* to cuda

* device

* device

* device

* diag

* fix Diagonal

* fix device

* fix array 2

* fix diag 3

* works

* add comments

* toward cuda

* fix kwargs

* cuda contractions

* cuda central

* cleanup cuda

* patch array centraltensor

* patching dense central tensor

* vector memory

* start cleaning

* add GPU flags, clean up

* clean up

* zipper QMps on GPU

* start working on qr

* qr on GPU works

* add basic tests for QMps

* more basic tests

* aux

* change site

* clean up

* clean up CPU <-> GPU transfer

* gauges on gpu

* change types in gauges

* reordering

* clean up

* fix bug

* clean up

* fix

* rm unnecessary permutes

* dense site

* clean up

* clean up

* zipper central

* virtual

* test

* central

* cuproj

* aux_sparse

* add @view in site

* clean up

* add bench for multi gpu

* add examples

* add comments

* add @inbounds in some places

* towards cpu contractions

* to cpu

* benchmarks for memoizations of cusparse

* added comment explaining convention

* SparseCSC

* move stuff to attic

* clean up a bit

* clean types a bit

* added measure_memory for memoization caches

* added handling of sparse matrices

* hotfix in sparse matrix CSR

* clean up

* clean up 2

* fix types

* sitetensor

* SiteTensor Sparse

* VirtualTensor

* poolofprojectors

* memory

* copied from "leg-ordering" branch

* start working on CSR problem

* project_ket_on_bra virtual

* add SparseArrays to test cuSparse

* virtual update_env_left

* add CUDA.:* (commented)

* virtual update_env_right

* virtual update_reduced_env_right

* add bench for CSR

* add bench

* site

* info

* reduced_env_right sitetensor

* measure_spectrum

* input mps is in the correct canonical form

* test measure_spectrum

* virtual

* test measure spectrum

* move_to_CPU

* schmidt

* view virtual

* rand qmpo

* toward allocation

* allocate site

* site working

* speed up virtual (update_env_right)

* virtual alloc

* virtual no cusparse

* virtual no cusparse

* proj

* site

* test

* working site

* cleanup

* new zipper

* new zipper

* stable zipper

* building blocks of new zipper-variational

* fix type

* fix typo

* start new zipper

* new env

* new zipper

* new_zipper clear env

* kill attic

* fix new zipper

* args in zipper

* clean up

* repeat psvd

* sparse

* corner matrix for virtual

* virtualtest

* restart virtual

* env_left_v1

* new virtual

* my_batched_mul

* virtual with 2 step projectors

* WIP

* cases without central tensor

* fix update_reduced_env_right virtual

* central batched_mul

* measure_memory(EnvironmentMixed)

* change fg to cl_h

* PoolOfProjectors in clustered_hamiltonian

* clean up tests

* split long lines

* add docs

* docs

* clean up toml

* clean up

* add the docs

* add flag for RMF in zipper

* add docs

* add depth of sweep in zipper

* Rename aux.jl to utils.jl, because aux.* is a restricted filename on Windows

* Update SpinGlassTensors.jl, changed aux to utils

* moved projectors.jl from SpinGlassNetworks

* remove networks from Project

* moved test

* add projectors to tests

* update julia

* added projectors.jl to runtest

* moved rank_reveal from SpinGlassNetworks

* bugfix

* update to CUDA 4.4.1, TensorOperations 4 and TensorCast 0.4. SEE README

* reformat

* fix ci

* only set self-hosted

* fix nothing comparison (#18)

* Fixes creation of sparse matrices (#19)

* change CUDA sparse to CSR, rename function creating sparse matrices

* up

* update ci runner

* fix runs-on ci

* fix ci

* set proper rev

* add flags

* fix project in TransmuteDims

* fix typo

---------

Co-authored-by: bartekGardas <[email protected]>
Co-authored-by: marekrams <[email protected]>
Co-authored-by: annamariadziubyna <[email protected]>
Co-authored-by: annamariadziubyna <[email protected]>
Co-authored-by: Łukasz Pawela <[email protected]>
Co-authored-by: Łukasz Pawela <[email protected]>
7 people authored Apr 3, 2024
1 parent 7372b46 commit 2d761c2
Showing 64 changed files with 3,840 additions and 1,227 deletions.
16 changes: 6 additions & 10 deletions .github/workflows/CI.yml
@@ -4,25 +4,21 @@ on:
   - pull_request
 jobs:
   test:
-    name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }}
-    runs-on: ${{ matrix.os }}
+    name: Julia ${{ matrix.version }}
+    runs-on: [self-hosted,titan,gpu]
     strategy:
       fail-fast: false
       matrix:
         version:
-          - '1.7'
-          - '1.8'
-        os:
-          - ubuntu-latest
-          - macOS-latest
-        arch:
-          - x64
+          - '1.9'
+          - '1.10'
     steps:
       - uses: actions/checkout@v2
       - uses: julia-actions/setup-julia@v1
         with:
           version: ${{ matrix.version }}
-          arch: ${{ matrix.arch }}
+      - name: Fix TransmuteDims
+        run: julia --project=@. --color=yes -e 'using Pkg; Pkg.add(name="TransmuteDims", rev="strided2")'
       - uses: julia-actions/julia-buildpkg@latest
       - uses: julia-actions/julia-runtest@latest
         env:
31 changes: 20 additions & 11 deletions Project.toml
@@ -1,25 +1,34 @@
 name = "SpinGlassTensors"
 uuid = "7584fc6a-5a23-4eeb-8277-827aab0146ea"
-authors = [
-    "Łukasz Pawela <[email protected]>",
-    "Konrad Jałowiecki <[email protected]>",
-    "Bartłomiej Gardas <[email protected]>"
-]
-version = "0.3.0"
+authors = ["Anna Maria Dziubyna <[email protected]>", "Tomasz Śmierzchalski <[email protected]>", "Bartłomiej Gardas <[email protected]>", "Konrad Jałowiecki <[email protected]>", "Łukasz Pawela <[email protected]>", "Marek M. Rams <[email protected]>"]
+version = "1.0.0"

 [deps]
+CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
 DocStringExtensions = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
-Memoize = "c03570c3-d221-55d1-a50c-7939bbd78826"
+LowRankApprox = "898213cb-b102-5a47-900c-97e73b919f73"
+MKL = "33e6dc65-8f57-5167-99aa-e5a354878fb2"
+Memoization = "6fafb56a-5788-4b4e-91ca-c0cea6611c73"
+NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
 SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
+TSVD = "9449cd9e-2762-5aa3-a617-5413e99d722e"
 TensorCast = "02d47bb6-7ce6-556a-be16-bb1710789e2b"
 TensorOperations = "6aa20fa7-93e2-5fca-9bc0-fbd0db3c71a2"
+TransmuteDims = "24ddb15e-299a-5cc3-8414-dbddc482d9ca"
+cuTENSOR = "011b41b2-24ef-40a8-b3eb-fa098493e9e1"

 [compat]
-DocStringExtensions = "0.8"
-Memoize = "0.4"
+CUDA = "4.4.1"
+DocStringExtensions = "0.9.3"
+LowRankApprox = "0.5.5"
+MKL = "0.4.2"
+Memoization = "0.2.1"
+SparseArrays = "1.9"
+TensorCast = "0.4"
-TensorOperations = "3.0.1"
-julia = "1.7, 1.8"
+TensorOperations = "4"
+cuTENSOR = "1.1.0"
+julia = "1.9, 1.10"

 [extras]
 Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
1 change: 1 addition & 0 deletions README.md
@@ -1,2 +1,3 @@
 [![Coverage Status](https://coveralls.io/repos/github/iitis/SpinGlassTensors.jl/badge.svg?branch=master)](https://coveralls.io/github/iitis/SpinGlassTensors.jl?branch=master)
 # SpinGlassTensors.jl
+This works with CUDA v4.4.1. You need to manually `]add TransmuteDims#strided2`
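
For reference, the manual step can be done from the Julia REPL with the same Pkg call the CI workflow runs (a sketch; assumes the package itself is added afterwards):

using Pkg
Pkg.add(name = "TransmuteDims", rev = "strided2")   # the rev pinned in CI
Pkg.add("SpinGlassTensors")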
29 changes: 29 additions & 0 deletions bench_mm.jl
@@ -0,0 +1,29 @@
using TensorCast, TensorOperations
function time_mm()
    M = rand(100, 100, 100)
    L = rand(100, 100)
    R = rand(100, 100)
    @time begin
        @matmul M1[x, σ, α] := sum(β) L[x, β] * M[β, σ, α]
        @matmul MM[x, σ, y] := sum(α) M1[x, σ, α] * R[α, y]
    end
end

function time_tensor()
    M = rand(100, 100, 100)
    L = rand(100, 100)
    R = rand(100, 100)

    @time begin
        @tensor M̃[x, σ, y] := L[x, β] * M[β, σ, α] * R[α, y] order = (α, β)
        # @cast B[(x, σ), y] |= M̃[x, σ, y]
    end
end

println("matmul")
time_mm()
time_mm()

println("\n tensor")
time_tensor()
time_tensor()
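
`@time` also measures compilation on the first call; a BenchmarkTools variant reports a minimum over many samples (a sketch; `@btime` with `$`-interpolated globals is standard BenchmarkTools usage):

using BenchmarkTools, TensorOperations

M = rand(100, 100, 100)
L = rand(100, 100)
R = rand(100, 100)

@btime @tensor M2[x, σ, y] := $L[x, β] * $M[β, σ, α] * $R[α, y] order = (α, β);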
12 changes: 12 additions & 0 deletions benchmark/args.jl
@@ -0,0 +1,12 @@
using LinearAlgebra

function my_svd(A; kwargs...)
    svd(A; kwargs...)
end


T = Float64
n = 2
A = rand(T, 2, 2)

my_svd(A, full = true)
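
As a quick check that the keyword actually reaches `LinearAlgebra.svd` through `kwargs...` (a sketch; `B` is an illustrative non-square input, not part of the file above):

B = rand(T, 3, 2)
size(my_svd(B).U)               # (3, 2): thin U by default
size(my_svd(B, full = true).U)  # (3, 3): `full = true` was forwarded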
29 changes: 29 additions & 0 deletions benchmark/cuda_matrix_mul.jl
@@ -0,0 +1,29 @@
using CUDA
using LinearAlgebra

CUDA.allowscalar(false)

nnz = 100
Val = CUDA.rand(Float64, nnz)
Ptr = CuArray(1:nnz+1)
Ind = CuArray(rand(1:100, nnz))

A = CUDA.CUSPARSE.CuSparseMatrixCSR(Ptr, Ind, Val, (100, 100))
B = CUDA.rand(Float64, 100, 100)
C = CUDA.CUSPARSE.CuSparseMatrixCSC(Ptr, Ind, Val, (100, 100))

A * B # no scalar indexing
CUDA.@allowscalar B * A # scalar indexing

C * B # no scalar indexing
CUDA.@allowscalar B * C # scalar indexing

A' * B # no scalar indexing
CUDA.@allowscalar B * A' # scalar indexing

transpose(A) * B # no scalar indexing
CUDA.@allowscalar B * transpose(A) # scalar indexing
# the problem arises when we multiply dense × sparse

D = rand(Float64, (100, 100))
CUDA.@allowscalar D * A # scalar indexing
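
A possible workaround for the dense × sparse case (a sketch; `Bt` and `E` are illustrative names, and adjoint equals transpose for these real-valued matrices) is to move the sparse operand to the front:

Bt = permutedims(B, (2, 1))        # dense transpose, computed on the GPU
E = permutedims(A' * Bt, (2, 1))   # (AᵀBᵀ)ᵀ == B * A, with no scalar indexing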
15 changes: 15 additions & 0 deletions benchmark/gpu_slicing.jl
@@ -0,0 +1,15 @@
using CUDA

T = Float64
n = 10000
k = 500

a = CUDA.rand(T, n, n)
p = reverse(collect(1:k))
p_d = CuArray(p)

@time A = a[:, p]
@time @inbounds A = a[:, p]
@time A = a[:, p_d]
@time @inbounds A = a[:, p_d]
nothing
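
Since GPU kernels launch asynchronously, a plain `@time` may return before the copy has finished; `CUDA.@time` synchronizes the device first (a sketch):

CUDA.@time A = a[:, p_d];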
46 changes: 46 additions & 0 deletions benchmark/memoization_cusparse.jl
@@ -0,0 +1,46 @@
using Memoization
using LinearAlgebra
using CUDA
using BenchmarkTools

# Functions from contractions_cuda/sparse.jl which are not exported

@memoize Dict function aux_cusparse(::Type{R}, n::Int64) where {R<:Real}
    println("entering aux_cusparse function")
    CuArray(1:n+1), CUDA.ones(R, n)
end

@memoize Dict function CUDA.CUSPARSE.CuSparseMatrixCSC(
    ::Type{R},
    p::Vector{Int},
) where {R<:Real}
    println("entering cusparse")
    n = length(p)
    cn, co = aux_cusparse(R, n)
    CUDA.CUSPARSE.CuSparseMatrixCSC(cn, CuArray(p), co, (maximum(p), n))
end

function CuSparseMatrixCSC_no_memo(::Type{R}, p::Vector{Int}) where {R<:Real}
    println("entering no memo")
    n = length(p)
    cn, co = aux_cusparse(R, n)
    CUDA.CUSPARSE.CuSparseMatrixCSC(cn, CuArray(p), co, (maximum(p), n))
end

# test of their memoization

p = sort(rand(1:5000, 10000000))
p2 = sort(rand(1:5000, 10000000))
@time A = CuSparseMatrixCSC_no_memo(Float64, p)
@time B = CuSparseMatrixCSC_no_memo(Float64, p)

@time C = CUDA.CUSPARSE.CuSparseMatrixCSC(Float64, p) # compilation time?

@time D = CUDA.CUSPARSE.CuSparseMatrixCSC(Float64, p)
@time E = CUDA.CUSPARSE.CuSparseMatrixCSC(Float64, p2)
@time F = CUDA.CUSPARSE.CuSparseMatrixCSC(Float64, p2)
CUDA.memory_status()
Memoization.empty_all_caches!()
CUDA.memory_status()
# clearing memoization caches does not free GPU memory
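
What does release the device memory is dropping the references and reclaiming (a sketch; reuses the names above):

A = B = C = D = E = F = nothing
GC.gc(true)           # collect the now-unreferenced CuArrays
CUDA.reclaim()        # hand cached blocks back to the CUDA pool
CUDA.memory_status()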
38 changes: 38 additions & 0 deletions benchmark/memoization_test.jl
@@ -0,0 +1,38 @@
using SpinGlassTensors
using Memoization
using CUDA


@memoize Dict function example_cuda_array(::Type{R}, size::Int64) where {R<:Real}
    CUDA.rand(R, (size, size))
end

@memoize Dict function example_array(::Type{R}, size::Int64) where {R<:Real}
    rand(R, size, size)
end

@memoize Dict function aux_cusparse(::Type{R}, n::Int64) where {R<:Real}
    CuArray(1:n+1), CUDA.ones(R, n)
end

@memoize Dict function CUDA.CUSPARSE.CuSparseMatrixCSC(
    ::Type{R},
    p::Vector{Int},
) where {R<:Real}
    n = length(p)
    cn, co = aux_cusparse(R, n)
    CUDA.CUSPARSE.CuSparseMatrixCSC(cn, CuArray(p), co, (maximum(p), n))
end


A = example_cuda_array(Float64, 10000)
B = example_cuda_array(Float64, 1100)
C = example_array(Float64, 1000)
p = rand(1:5000, 100000000)
D = CUDA.CUSPARSE.CuSparseMatrixCSC(Float64, p)
CUDA.memory_status()
println("/n")
measure_memory(Memoization.caches)
25 changes: 25 additions & 0 deletions benchmark/mulit_gpu.jl
@@ -0,0 +1,25 @@
using CUDA

function move_to_CUDA(a::Array{T,N}) where {T,N}
    # allocate unified (host-visible) memory so every GPU can address the buffer
    buf_a = Mem.alloc(Mem.Unified, sizeof(a))
    d_a = unsafe_wrap(CuArray{T,N}, convert(CuPtr{T}, buf_a), size(a))
    finalizer(d_a) do _
        Mem.free(buf_a)
    end
    copyto!(d_a, a)
    d_a
end

T = Float64
n = 100
gpus = Int(length(devices()))

a = rand(T, n, n, gpus)
a_d = move_to_CUDA(a)

for (gpu, dev) in enumerate(devices())
    device!(dev)
    @views a_d[:, :, gpu] .= 2 .* a_d[:, :, gpu]
end

a_d
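
Each device mutates the unified buffer asynchronously, so synchronizing before reading the result back is a safe addition (a sketch):

for dev in devices()
    device!(dev)
    synchronize()
end
Array(a_d) ≈ 2 .* a   # expected to hold once every device has finished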
10 changes: 10 additions & 0 deletions benchmark/mulit_gpu2.jl
@@ -0,0 +1,10 @@
using CUDA

T = Float64
n = 100
gpus = Int(length(devices()))

a = rand(T, n, n, gpus)
a_d = cu(a, unified = true)

a_d
86 changes: 86 additions & 0 deletions benchmark/psvd.jl
@@ -0,0 +1,86 @@
using LinearAlgebra, MKL
using TensorOperations
using TensorCast
using TSVD
using LowRankApprox
using RandomizedLinAlg
using FameSVD

N = 100
cut = 8

mat = rand(100, 100);
U, S, V = svd(mat);
S = exp.(collect(0:N-1) * log(4 / 5));

mat = U * Diagonal(S) * V';
U, S, V = svd(mat);

U, S, V = U[:, 1:cut], S[1:cut], V[:, 1:cut]
mat1 = U * Diagonal(S) * V'
println(S[1:cut])
println(norm(mat - mat1))

Up, Sp, Vp = psvd(mat, rank = 2 * cut)

mat2 = Up[:, 1:cut] * Diagonal(Sp[1:cut]) * Vp[:, 1:cut]'

println(Sp[1:cut])
println(Sp[1:cut] - S[1:cut])
println(norm(mat - mat2))

# Vp = V

C = mat * Vp
println(size(C))
Ut, _ = qr(C)
Ut = Ut[:, 1:cut]
println(size(Ut))
C = Ut' * mat
Vp, _ = qr(C')
Vp = Vp[:, 1:cut]



C = mat * Vp
Uf, Sf, Vf = svd(C);
Uf, Sf, Vf = Uf[:, 1:cut], Sf[1:cut], Vf[:, 1:cut]
mat3 = Uf * Diagonal(Sf) * Vf' * Vp'
println(Sf - S[1:cut])
println(norm(mat - mat3))

nothing


iter = 5
Up, Sp, Vp = [], [], []
for i = 1:iter
    Utemp, Stemp, Vtemp = psvd(mat, rank = 2 * cut)
    push!(Up, Utemp)
    push!(Sp, Stemp)
    push!(Vp, Vtemp)
end

Ups = hcat(Up...)
Vps = hcat(Vp...)
Sps = vcat(Sp...) / iter
println(size(Ups), " ", size(Vps), " ", size(Sps))
println(size(Up[1]), " ", size(Vp[1]), " ", size(Sp[1]))

Uq, Ur = qr(Ups)
Vq, Vr = qr(Vps)

Ut, St, Vt = svd(Ur * Diagonal(Sps) * Vr')

U2 = Uq * Ut[:, 1:cut]
V2 = Vq * Vt[:, 1:cut]
S2 = St[1:cut]
println(St)
println(S2)

mat4 = U2 * Diagonal(S2) * V2'


println(norm(mat1 - mat2))
println(norm(mat1 - mat3))
println(norm(mat1 - mat4))