Performance issues on large-ish model
Hi,
I'm working on an open-source structural analysis app called beamOS. I wrote my own analysis engine, but only because I didn't know about this library, which is fantastic. I'm now thinking about switching my app to use Pynite and contributing any features I may need.
The issue is that Pynite is pretty slow on the model that I’ve been testing with. I’ve included the code here so you can run it if you’d like. It usually takes about 5 seconds on my machine compared to a couple hundred ms with OpenSees or the engine that I currently have, with similar results.
My question is: do you have any recommendations for improving performance? I'm not super familiar with Python, so I don't know the best practices. I know there are JIT-compiled versions of Python, and even Python that compiles to C. Have you had success trying any of these with Pynite? Or maybe there are settings in the library that could improve performance?
Thanks
Just checked out beamOS, and that's a cool project! I've experienced a similar slowdown on really large models. I would recommend profiling your code with pyinstrument; it will let you pinpoint where the time is being spent during execution. It's what I used when I was profiling my own models.
For example, you can do something like:
import pyinstrument
profiler = pyinstrument.Profiler(interval=0.0005, async_mode="disabled")
profiler.start()
# your code here
profiler.stop()
profiler.open_in_browser()
Also, check out my response on #189 for how to speed up one of the internal functions. With a model as big as the one you reference, you will almost certainly see a difference.
Hi @connorivy,
In addition to what @JBloss1517 mentioned, PyNite discretizes physical members into frame elements by iterating through every node in the model that lies along each physical member and splitting the member at those points. The implementation has a time complexity of O(m × n), where m is the number of physical members and n is the number of nodes. Try profiling this to confirm.
One way to improve this is by using a k-dimensional tree for the nodes. This would make the search for nodes lying on a member much faster.
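For illustration, here is a rough sketch of that idea, assuming SciPy is available; the helper names build_node_tree and candidate_nodes are made up for this example and are not part of Pynite. The tree would be built once during model preparation, and each member would then query only the nodes near its own span before running the exact on-member check:
# A rough sketch (not Pynite's API) of prequalifying nodes with a k-d tree.
# Assumes SciPy; build_node_tree and candidate_nodes are hypothetical helpers.
import numpy as np
from scipy.spatial import cKDTree

def build_node_tree(nodes):
    # Build the tree once per model from the node coordinates
    names = list(nodes.keys())
    coords = np.array([[nodes[n].X, nodes[n].Y, nodes[n].Z] for n in names])
    return names, cKDTree(coords)

def candidate_nodes(tree, names, i_node, j_node, tol=1e-8):
    # Query the sphere that encloses the member from i_node to j_node;
    # only these candidates need the exact "does it lie on the member" check
    pi = np.array([i_node.X, i_node.Y, i_node.Z])
    pj = np.array([j_node.X, j_node.Y, j_node.Z])
    center = (pi + pj) / 2
    radius = np.linalg.norm(pj - pi) / 2 + tol
    return [names[k] for k in tree.query_ball_point(center, radius)]
query_ball_point returns the indices of every node inside the sphere enclosing the member, so the exact collinearity test only runs on those candidates instead of on every node in the model.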
I also thought of the k-d tree, as @angelmusonda has said, and tried it out on my local machine, but I wasn't seeing as much of a performance boost as I expected. My hunch is that the overhead of adding the nodes to the tree itself ate into the gains, so I used a bounding-box approach inside the _discretize() method instead. In a test model with roughly 1000 nodes and 500 members, my model takes 18.4 s to run, with 84% of that time spent in _discretize(). With the swap of algorithm below, it takes 4.7 s and _discretize() accounts for only 2.7% of the total run time.
# PhysMember.py
# Drop-in replacement for the existing method. Assumes PhysMember.py's existing
# module-level imports (array, isclose, Member3D) are unchanged.
def descritize(self):
    """
    Subdivides the physical member into sub-members at each node along the physical member
    """

    # Clear out any old sub-members
    self.sub_members = {}

    # Start a new list of nodes along the member
    int_nodes = []

    # Create a vector from the i-node to the j-node and precompute values
    Xi, Yi, Zi = self.i_node.X, self.i_node.Y, self.i_node.Z
    Xj, Yj, Zj = self.j_node.X, self.j_node.Y, self.j_node.Z
    vector_ij = array([Xj-Xi, Yj-Yi, Zj-Zi])

    # Calculate the squared length to avoid calling norm() - much faster
    length_ij_squared = vector_ij[0]**2 + vector_ij[1]**2 + vector_ij[2]**2
    length_ij = length_ij_squared**0.5

    # Precompute the unit vector for faster projection calculations,
    # guarding against division by zero
    if length_ij > 1e-10:
        unit_ij = vector_ij / length_ij
    else:
        # Handle the case of a zero-length member
        unit_ij = array([1, 0, 0])

    # Add the i-node and j-node to the list
    int_nodes.append([self.i_node, 0])
    int_nodes.append([self.j_node, length_ij])

    # Tolerance squared (for squared-distance comparisons)
    tol_squared = 1e-10 * 1e-10

    # OPTIMIZATION: quick filtering based on a bounding box
    # Create a bounding box with a small padding around the member
    padding = 1e-5  # Small padding to account for floating-point errors
    x_min = min(Xi, Xj) - padding
    x_max = max(Xi, Xj) + padding
    y_min = min(Yi, Yj) - padding
    y_max = max(Yi, Yj) + padding
    z_min = min(Zi, Zj) - padding
    z_max = max(Zi, Zj) + padding

    # First, quickly filter the nodes that might lie within the bounding box
    potential_nodes = []
    for node in self.model.nodes.values():

        # Skip the i and j nodes
        if node is self.i_node or node is self.j_node:
            continue

        # Quick bounding box check (much faster than vector calculations)
        X, Y, Z = node.X, node.Y, node.Z
        if (x_min <= X <= x_max and
            y_min <= Y <= y_max and
            z_min <= Z <= z_max):
            potential_nodes.append(node)

    # OPTIMIZATION: only process nodes that passed the bounding box check
    for node in potential_nodes:

        # Create a vector from the i-node to the current node
        X, Y, Z = node.X, node.Y, node.Z
        vector_in = array([X-Xi, Y-Yi, Z-Zi])

        # Quick check: squared-distance comparison to avoid norm()
        length_in_squared = vector_in[0]**2 + vector_in[1]**2 + vector_in[2]**2
        if length_in_squared > length_ij_squared + tol_squared:
            continue

        # Calculate the projection along the member axis (unrolled dot product)
        proj = vector_in[0]*unit_ij[0] + vector_in[1]*unit_ij[1] + vector_in[2]*unit_ij[2]

        # Check that the projection falls within the member length
        if proj < -tol_squared or proj > length_ij + tol_squared:
            continue

        # Calculate the perpendicular distance squared using the Pythagorean theorem:
        # dist^2 = |vector_in|^2 - proj^2
        perp_distance_squared = length_in_squared - proj**2

        # The node is on the member if the perpendicular distance is effectively zero
        if perp_distance_squared < tol_squared:
            # Add the node to the list of intermediate nodes
            int_nodes.append([node, proj])

    # OPTIMIZATION: only sort the intermediate nodes by distance from the i-node.
    # The first two entries (i_node and j_node) were already added in order.
    if len(int_nodes) > 2:
        middle_nodes = sorted(int_nodes[2:], key=lambda x: x[1])
        int_nodes = [int_nodes[0], *middle_nodes, int_nodes[1]]

    # Break up the member into sub-members at each intermediate node
    for i in range(len(int_nodes) - 1):

        # Generate the sub-member's name (physical member name + a, b, c, etc.)
        name = self.name + chr(i+97)

        # Find the i and j nodes for the sub-member, and their positions along the
        # physical member's local x-axis
        i_node = int_nodes[i][0]
        j_node = int_nodes[i+1][0]
        xi = int_nodes[i][1]
        xj = int_nodes[i+1][1]

        # Create a new sub-member
        new_sub_member = Member3D(self.model, name, i_node, j_node, self.material.name, self.section.name, self.rotation, self.tension_only, self.comp_only)

        # Flag the sub-member as active
        for combo_name in self.model.load_combos.keys():
            new_sub_member.active[combo_name] = True

        # Apply end releases if applicable
        if i == 0:
            new_sub_member.Releases[0:6] = self.Releases[0:6]
        if i == len(int_nodes) - 2:
            new_sub_member.Releases[6:12] = self.Releases[6:12]

        # Add distributed loads to the sub-member
        for dist_load in self.DistLoads:

            # Find the start and end points of the distributed load in the physical
            # member's local coordinate system
            x1_load = dist_load[3]
            x2_load = dist_load[4]

            # Determine if the distributed load should be applied to this segment
            if x1_load <= xj and x2_load > xi:

                direction = dist_load[0]
                w1 = dist_load[1]
                w2 = dist_load[2]
                case = dist_load[5]

                # Equation describing the load as a function of x. Binding w1 and w2 as
                # default arguments freezes their current values, so the reassignments
                # below don't alter the load function.
                w = lambda x, w1=w1, w2=w2: (w2 - w1)/(x2_load - x1_load)*(x - x1_load) + w1

                # Chop up the distributed load for the sub-member
                if x1_load > xi:
                    x1 = x1_load - xi
                else:
                    x1 = 0
                    w1 = w(xi)
                if x2_load < xj:
                    x2 = x2_load - xi
                else:
                    x2 = xj - xi
                    w2 = w(xj)

                # Add the load to the sub-member
                new_sub_member.DistLoads.append([direction, w1, w2, x1, x2, case])

        # Add point loads to the sub-member
        for pt_load in self.PtLoads:

            direction = pt_load[0]
            P = pt_load[1]
            x = pt_load[2]
            case = pt_load[3]

            # Determine if the point load should be applied to this segment
            if (x >= xi and x < xj) or (isclose(x, xj) and isclose(xj, self.L())):

                x = x - xi

                # Add the load to the sub-member
                new_sub_member.PtLoads.append([direction, P, x, case])

        # Add the new sub-member to the sub-member dictionary for this physical member
        self.sub_members[name] = new_sub_member
I plan on issuing a PR at some point to try to get some of my findings and performance improvements into Pynite, but I need to sit down and actually do it. I think that with some collective effort, Pynite can be highly efficient at solving large real-world problems.
@JBloss1517
Prequalifying the nodes using a bounding box seems like a good idea—it does save some time. However, the time complexity remains the same, since we still need to iterate through the nodes for each member. Python loops are notoriously slow, even when they aren’t doing much.
The k-d tree approach is similar, but the tree would be constructed only once, at the beginning of model preparation. That way the prequalification for each member would be much faster.
These ideas are great, and they illustrate why I made Pynite open-source. I can figure out the physics side pretty well, but I am just a rudimentary programmer myself. I would love to have help with these kinds of issues.
Several years ago I did profile the code quite a bit. I found that assembling the stiffness matrix was a slow process: many of the stiffness terms being placed into the global matrix are zeros. I would imagine there's a way to speed that up that I haven't figured out yet. As it stands, Pynite is fast enough for my needs, so I haven't delved much further into it.
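For what it's worth, a common way to avoid handling all of those zero terms is to assemble the global stiffness matrix in sparse triplet (COO) form and convert it once at the end. The sketch below is a generic illustration of that technique, not Pynite's actual assembly code; assemble_global_K, elements, and dof_map are made-up names for this example:
# A generic sketch of sparse (COO) stiffness assembly - not Pynite's code.
# Each element contributes (row, col, value) triplets; coo_matrix then sums
# duplicate (row, col) pairs, which is the scatter-and-add assembly step.
import numpy as np
from scipy.sparse import coo_matrix

def assemble_global_K(elements, n_dof):
    # elements: iterable of (k_local, dof_map) pairs, where k_local is a dense
    # element stiffness matrix and dof_map lists the matching global DOF indices
    rows, cols, vals = [], [], []
    for k_local, dof_map in elements:
        for a, row in enumerate(dof_map):
            for b, col in enumerate(dof_map):
                value = k_local[a, b]
                if value != 0.0:  # skip zero terms entirely
                    rows.append(row)
                    cols.append(col)
                    vals.append(value)
    # Build the sparse matrix once, then convert to CSR for solving
    return coo_matrix((vals, (rows, cols)), shape=(n_dof, n_dof)).tocsr()
The CSR result also plugs straight into the scipy.sparse.linalg solvers, which tend to be much faster than dense solves on large models.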
Hey @JBloss1517, thanks for the responses. Could you accept my request to connect on LinkedIn? I'd like to ask you more about your usage of Pynite.