Try CuPy version of SuperLU. #34
@Kyroba I have opened a Pull Request (#36) in which I have implemented GPU acceleration via the CuPy library. If you are interested, it would be great if you could try out this new version:
    pip install git+https://github.com/loganbvh/py-tdgl.git@gpu
    pip install cupy-cuda11x  # for example, for CUDA v11.2 ~ 11.8
For more details, see https://py-tdgl--36.org.readthedocs.build/en/36/installation.html#gpu-acceleration. |
@loganbvh I was meaning to look into this but got distracted by experiments! I will do this tomorrow, thank you for following up on it! |
Loop_parameter is derived from the following:
Essentially using an array of positions for each loop center. This composite parameter works when not using the CuPy sparse solver. |
Thanks for trying this, @Kyroba! As a workaround for the
    loop_parameter = 0
    # Sum the potentials for the centers
    for center in positions:
        loop_parameter += CurrentLoop(
            current=current,
            radius=radius,
            center=tuple(center),
            length_units=length_units
        )
To be honest, I am not sure why that
It's disappointing that the performance is not much better. I will continue testing to see if I can improve things.
If you can't get snakeviz to work, you can try the
One thing you could try is to set |
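For readers following along, here is a minimal profiling sketch that uses only Python's standard-library `cProfile` and `pstats` modules. It is not necessarily the tool named in the comment above (that part of the comment is missing). The `workload` function is a hypothetical stand-in for whatever call is being profiled, and the output file name is arbitrary.

```python
import cProfile
import io
import os
import pstats
import tempfile


def workload(n=200_000):
    """Hypothetical stand-in for the call being profiled (e.g. a solver run)."""
    return sum(i * i for i in range(n))


# Collect timing data only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Format the ten entries with the largest cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)

# Save the raw profile so it can be inspected later.
profile_path = os.path.join(tempfile.gettempdir(), "workload.prof")
profiler.dump_stats(profile_path)
```

The saved `.prof` file is in the format that snakeviz reads, so the text report and the visual view can come from the same run.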
Also as a general rule, if you want to sum together many field sources, it will be more efficient to write a Python function that does the sum and then wrap that function in a single tdgl.Parameter:
    import numpy as np
    import tdgl
    from tdgl.em import current_loop_vector_potential

    def many_loops_vector_potential(
        x, y, z, *,
        loop_centers,
        loop_currents,
        loop_radii,
        current_units="uA",
        field_units="mT",
        length_units="um",
    ):
        # Broadcast a scalar z to the shape of x.
        if z.ndim == 0:
            z = z * np.ones_like(x)
        positions = np.array([x.squeeze(), y.squeeze(), z.squeeze()]).T
        # Allow a single current or radius to be shared by all loops.
        if isinstance(loop_currents, (int, float)):
            loop_currents = loop_currents * np.ones(len(loop_centers))
        if isinstance(loop_radii, (int, float)):
            loop_radii = loop_radii * np.ones(len(loop_centers))
        assert len(loop_currents) == len(loop_centers)
        assert len(loop_radii) == len(loop_centers)
        # Accumulate the vector potential of every loop at every position.
        A_total = np.zeros((len(x), 3), dtype=float)
        for current, center, radius in zip(loop_currents, loop_centers, loop_radii):
            A_loop = current_loop_vector_potential(
                positions,
                loop_center=center,
                loop_radius=radius,
                current=current,
                current_units=current_units,
                length_units=length_units,
            )
            # Convert to the requested units and keep only the numerical values.
            A_total += A_loop.to(f"{field_units} * {length_units}").magnitude
        return A_total

    # Define constants
    current = 10000
    radius = 0.5
    length_units = "um"
    n = 54  # Adjust this value as needed
    positions = generate_dots(n)
    loop_parameter = tdgl.Parameter(
        many_loops_vector_potential,
        loop_centers=positions,
        loop_currents=current,
        loop_radii=radius,
        length_units=length_units,
    )
    A_applied = uni_parameter + loop_parameter |
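The idea of summing all sources inside one function can be checked on a toy problem. The sketch below does not use tdgl at all: `single_source` is a hypothetical scalar field chosen only for illustration, and `many_sources` is an assumed helper name. It shows that one combined function gives the same values as adding up one callable per source, and it uses `np.broadcast_to` as an alternative way to accept either a scalar or a per-source array.

```python
import numpy as np


def single_source(positions, center, strength):
    """Hypothetical scalar field of one source: strength / (1 + r^2)."""
    r2 = np.sum((positions - center) ** 2, axis=1)
    return strength / (1.0 + r2)


def many_sources(positions, centers, strengths):
    """Sum every source inside one function, so callers evaluate one callable."""
    # Accept either one shared strength or one strength per source.
    strengths = np.broadcast_to(strengths, (len(centers),))
    total = np.zeros(len(positions))
    for center, strength in zip(centers, strengths):
        total += single_source(positions, center, strength)
    return total


rng = np.random.default_rng(0)
positions = rng.uniform(-1, 1, size=(100, 2))
centers = rng.uniform(-1, 1, size=(5, 2))

# Reference: one callable per source, summed afterwards.
per_source = [lambda p, c=c: single_source(p, c, 2.0) for c in centers]
reference = sum(f(positions) for f in per_source)

# Single function that performs the whole sum.
combined = many_sources(positions, centers, 2.0)
print(np.allclose(reference, combined))
```

Whether the single-function form is faster in a given simulation depends on how the per-source callables are composed and evaluated, so it is worth timing both on the real model.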
Thanks @Kyroba. |
(Writing some notes here for myself.) Testing on Google Colab at 0d45f3f using https://github.com/pyutils/line_profiler.
Test cases
Test setup
"Quickstart" model with max_edge_length = xi / 5 = 0.5 / 5 (27502 mesh sites)
    import numpy as np
    import tdgl
    from tdgl.geometry import box, circle

    length_units = "um"
    # Material parameters
    xi = 0.5
    london_lambda = 2
    d = 0.1
    layer = tdgl.Layer(coherence_length=xi, london_lambda=london_lambda, thickness=d, gamma=1)
    # Device geometry
    total_width = 5
    total_length = 3.5 * total_width
    link_width = total_width / 3
    # Outer geometry of the film
    right_notch = (
        tdgl.Polygon(points=box(total_width))
        .rotate(45)
        .translate(dx=(np.sqrt(2) * total_width + link_width) / 2)
    )
    left_notch = right_notch.scale(xfact=-1)
    film = (
        tdgl.Polygon("film", points=box(total_width, total_length))
        .difference(right_notch, left_notch)
        .resample(801)
        .buffer(0)
    )
    # Holes in the film
    round_hole = (
        tdgl.Polygon("round_hole", points=circle(link_width / 2))
        .translate(dy=total_length / 5)
        .resample(201)
    )
    square_hole = (
        tdgl.Polygon("square_hole", points=box(link_width))
        .rotate(45)
        .translate(dy=-total_length / 5)
        .resample(201)
    )
    # Current terminals
    source = (
        tdgl.Polygon("source", points=box(1.1 * total_width, total_length / 100))
        .translate(dy=total_length / 2)
    )
    drain = source.scale(yfact=-1).set_name("drain")
    # Voltage measurement points
    probe_points = [(0, total_length / 2.5), (0, -total_length / 2.5)]
    device = tdgl.Device(
        "weak_link",
        layer=layer,
        film=film,
        holes=[round_hole, square_hole],
        terminals=[source, drain],
        probe_points=probe_points,
        length_units=length_units,
    )
    device.make_mesh(max_edge_length=xi / 5)
    device.mesh_stats_dict() == {
        'num_sites': 27502,
        'num_elements': 53799,
        'min_edge_length': 0.023546873824451527,
        'max_edge_length': 0.09911161817920922,
        'mean_edge_length': 0.05888391564333439,
        'min_area': 0.0003082034156361785,
        'max_area': 0.005147875450603828,
        'mean_area': 0.0027992447651071306,
        'coherence_length': 0.5,
        'length_units': 'um',
    }
options = tdgl.SolverOptions(
    %lprun -f tdgl.TDGLSolver.update -f tdgl.TDGLSolver.solve_for_psi_squared -f tdgl.TDGLSolver.solve_for_observables -s -u 1 tdgl.solve(**kwargs)
Results
1.
|
@Kyroba after some more testing (see above if you are interested), I have found that the GPU can provide a significant speedup (over 30% for the model I tested), but only if the main linear solve portion of the calculation is still done on the CPU. In other words, I expect that when using the GPU (
I still need to update the documentation, etc. over the coming days before merging all of these changes and making a new release. However, if you would like, you are welcome to try it on the
|
Closing this issue because I have merged #36 and released https://pypi.org/project/tdgl/0.6.0/ |
https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.factorized.html
The LU factorization is done on the CPU, but the linear solve is done on the GPU.
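To illustrate the "factorize once, solve many times" pattern that the linked CuPy page describes, here is a CPU-only sketch using `scipy.sparse.linalg.factorized`, which (as far as I know) is the SciPy counterpart of the CuPy function. The tridiagonal matrix and the random right-hand sides are toy assumptions, not the matrices that tdgl builds.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import factorized

n = 50
# Toy tridiagonal system (1D Laplacian-like), stored in CSC format.
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

# The LU factorization is computed once, here.
solve = factorized(A)

# Each call to solve() reuses the stored factors for a new right-hand side.
rng = np.random.default_rng(0)
residuals = []
for _ in range(3):
    b = rng.standard_normal(n)
    x = solve(b)
    residuals.append(np.linalg.norm(A @ x - b))

print(max(residuals))
```

Reusing the factorization pays off when the same matrix is solved against many right-hand sides, since the repeated solves are much cheaper than refactorizing.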