Performance issues on large-ish model
Hi,
I'm working on an open-source structural analysis app called beamOS. I wrote my own analysis engine, but only because I didn't know about this library, which is fantastic. I'm now thinking about switching my app to use Pynite and contributing any features I may need.
The issue is that Pynite is pretty slow on the model that I’ve been testing with. I’ve included the code here so you can run it if you’d like. It usually takes about 5 seconds on my machine compared to a couple hundred ms with OpenSees or the engine that I currently have, with similar results.
My question is: do you have any recommendations for improving performance? I'm not super familiar with Python, so I don't know the best practices. I know there are JIT-compiled versions of Python, and even Python that compiles to C. Have you had success trying any of these with Pynite? Or maybe there are settings in the library that could improve performance?
Thanks
Just checked out beamOS, and that's a cool project! I've experienced a similar slowdown on really large models. I would recommend profiling your code with pyinstrument; it will let you pinpoint where the time is being spent during execution. It's what I used when I was profiling my own models.
For example, you can do something like:
import pyinstrument
profiler = pyinstrument.Profiler(interval=0.0005, async_mode="disabled")
profiler.start()
# your code here
profiler.stop()
profiler.open_in_browser()
Also, check out my response on #189 for how to speed up one of the internal functions. With a model as big as the one you reference, you will almost certainly see a difference.
Hi @connorivy,
In addition to what @JBloss1517 mentioned, PyNite discretizes physical members into frame elements by iterating through every node in the model that lies along each physical member and splitting the member at those points. The implementation has a time complexity of O(m × n), where m is the number of physical members and n is the number of nodes. Try profiling this to confirm.
One way to improve this is by using a k-dimensional tree for the nodes. This would make the search for nodes lying on a member much faster.
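For illustration, here is a rough sketch of that idea, assuming SciPy is available; the helper names build_node_tree and candidate_nodes are made up for this example and are not part of Pynite. The tree would be built once during model preparation, and each member would then query only the nodes near its own span before running the exact on-member check:
# A rough sketch (not Pynite's API) of prequalifying nodes with a k-d tree.
# Assumes SciPy; build_node_tree and candidate_nodes are hypothetical helpers.
import numpy as np
from scipy.spatial import cKDTree

def build_node_tree(nodes):
    # Build the tree once per model from the node coordinates
    names = list(nodes.keys())
    coords = np.array([[nodes[n].X, nodes[n].Y, nodes[n].Z] for n in names])
    return names, cKDTree(coords)

def candidate_nodes(tree, names, i_node, j_node, tol=1e-8):
    # Query the sphere that encloses the member from i_node to j_node;
    # only these candidates need the exact "does it lie on the member" check
    pi = np.array([i_node.X, i_node.Y, i_node.Z])
    pj = np.array([j_node.X, j_node.Y, j_node.Z])
    center = (pi + pj) / 2
    radius = np.linalg.norm(pj - pi) / 2 + tol
    return [names[k] for k in tree.query_ball_point(center, radius)]
query_ball_point returns the indices of every node inside the sphere enclosing the member, so the exact collinearity test only runs on those candidates instead of on every node in the model.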
I also thought of the k-d tree, as @angelmusonda has said, and tried it out on my local machine, but I wasn't seeing as much of a performance boost as I expected. My hunch is that the overhead of adding the nodes to the tree itself ate into the gains, so I used a bounding-box approach inside the _discretize() method instead. In a test model with roughly 1000 nodes and 500 members, my model takes 18.4 s to run, with 84% of that time spent in _discretize(). With the swap of algorithm below, it takes 4.7 s and _discretize() accounts for only 2.7% of the total run time.
# PhysMember.py
# Drop-in replacement for the existing method. Assumes PhysMember.py's existing
# module-level imports (array, isclose, Member3D) are unchanged.
def descritize(self):
    """
    Subdivides the physical member into sub-members at each node along the physical member
    """

    # Clear out any old sub-members
    self.sub_members = {}

    # Start a new list of nodes along the member
    int_nodes = []

    # Create a vector from the i-node to the j-node and precompute values
    Xi, Yi, Zi = self.i_node.X, self.i_node.Y, self.i_node.Z
    Xj, Yj, Zj = self.j_node.X, self.j_node.Y, self.j_node.Z
    vector_ij = array([Xj-Xi, Yj-Yi, Zj-Zi])

    # Calculate the squared length to avoid calling norm() - much faster
    length_ij_squared = vector_ij[0]**2 + vector_ij[1]**2 + vector_ij[2]**2
    length_ij = length_ij_squared**0.5

    # Precompute the unit vector for faster projection calculations,
    # guarding against division by zero
    if length_ij > 1e-10:
        unit_ij = vector_ij / length_ij
    else:
        # Handle the case of a zero-length member
        unit_ij = array([1, 0, 0])

    # Add the i-node and j-node to the list
    int_nodes.append([self.i_node, 0])
    int_nodes.append([self.j_node, length_ij])

    # Tolerance squared (for squared-distance comparisons)
    tol_squared = 1e-10 * 1e-10

    # OPTIMIZATION: quick filtering based on a bounding box
    # Create a bounding box with a small padding around the member
    padding = 1e-5  # Small padding to account for floating-point errors
    x_min = min(Xi, Xj) - padding
    x_max = max(Xi, Xj) + padding
    y_min = min(Yi, Yj) - padding
    y_max = max(Yi, Yj) + padding
    z_min = min(Zi, Zj) - padding
    z_max = max(Zi, Zj) + padding

    # First, quickly filter the nodes that might lie within the bounding box
    potential_nodes = []
    for node in self.model.nodes.values():

        # Skip the i and j nodes
        if node is self.i_node or node is self.j_node:
            continue

        # Quick bounding box check (much faster than vector calculations)
        X, Y, Z = node.X, node.Y, node.Z
        if (x_min <= X <= x_max and
            y_min <= Y <= y_max and
            z_min <= Z <= z_max):
            potential_nodes.append(node)

    # OPTIMIZATION: only process nodes that passed the bounding box check
    for node in potential_nodes:

        # Create a vector from the i-node to the current node
        X, Y, Z = node.X, node.Y, node.Z
        vector_in = array([X-Xi, Y-Yi, Z-Zi])

        # Quick check: squared-distance comparison to avoid norm()
        length_in_squared = vector_in[0]**2 + vector_in[1]**2 + vector_in[2]**2
        if length_in_squared > length_ij_squared + tol_squared:
            continue

        # Calculate the projection along the member axis (unrolled dot product)
        proj = vector_in[0]*unit_ij[0] + vector_in[1]*unit_ij[1] + vector_in[2]*unit_ij[2]

        # Check that the projection falls within the member length
        if proj < -tol_squared or proj > length_ij + tol_squared:
            continue

        # Calculate the perpendicular distance squared using the Pythagorean theorem:
        # dist^2 = |vector_in|^2 - proj^2
        perp_distance_squared = length_in_squared - proj**2

        # The node is on the member if the perpendicular distance is effectively zero
        if perp_distance_squared < tol_squared:
            # Add the node to the list of intermediate nodes
            int_nodes.append([node, proj])

    # OPTIMIZATION: only sort the intermediate nodes by distance from the i-node.
    # The first two entries (i_node and j_node) were already added in order.
    if len(int_nodes) > 2:
        middle_nodes = sorted(int_nodes[2:], key=lambda x: x[1])
        int_nodes = [int_nodes[0], *middle_nodes, int_nodes[1]]

    # Break up the member into sub-members at each intermediate node
    for i in range(len(int_nodes) - 1):

        # Generate the sub-member's name (physical member name + a, b, c, etc.)
        name = self.name + chr(i+97)

        # Find the i and j nodes for the sub-member, and their positions along the
        # physical member's local x-axis
        i_node = int_nodes[i][0]
        j_node = int_nodes[i+1][0]
        xi = int_nodes[i][1]
        xj = int_nodes[i+1][1]

        # Create a new sub-member
        new_sub_member = Member3D(self.model, name, i_node, j_node, self.material.name, self.section.name, self.rotation, self.tension_only, self.comp_only)

        # Flag the sub-member as active
        for combo_name in self.model.load_combos.keys():
            new_sub_member.active[combo_name] = True

        # Apply end releases if applicable
        if i == 0:
            new_sub_member.Releases[0:6] = self.Releases[0:6]
        if i == len(int_nodes) - 2:
            new_sub_member.Releases[6:12] = self.Releases[6:12]

        # Add distributed loads to the sub-member
        for dist_load in self.DistLoads:

            # Find the start and end points of the distributed load in the physical
            # member's local coordinate system
            x1_load = dist_load[3]
            x2_load = dist_load[4]

            # Determine if the distributed load should be applied to this segment
            if x1_load <= xj and x2_load > xi:

                direction = dist_load[0]
                w1 = dist_load[1]
                w2 = dist_load[2]
                case = dist_load[5]

                # Equation describing the load as a function of x. Binding w1 and w2 as
                # default arguments freezes their current values, so the reassignments
                # below don't alter the load function.
                w = lambda x, w1=w1, w2=w2: (w2 - w1)/(x2_load - x1_load)*(x - x1_load) + w1

                # Chop up the distributed load for the sub-member
                if x1_load > xi:
                    x1 = x1_load - xi
                else:
                    x1 = 0
                    w1 = w(xi)
                if x2_load < xj:
                    x2 = x2_load - xi
                else:
                    x2 = xj - xi
                    w2 = w(xj)

                # Add the load to the sub-member
                new_sub_member.DistLoads.append([direction, w1, w2, x1, x2, case])

        # Add point loads to the sub-member
        for pt_load in self.PtLoads:

            direction = pt_load[0]
            P = pt_load[1]
            x = pt_load[2]
            case = pt_load[3]

            # Determine if the point load should be applied to this segment
            if (x >= xi and x < xj) or (isclose(x, xj) and isclose(xj, self.L())):

                x = x - xi

                # Add the load to the sub-member
                new_sub_member.PtLoads.append([direction, P, x, case])

        # Add the new sub-member to the sub-member dictionary for this physical member
        self.sub_members[name] = new_sub_member
I plan on issuing a PR at some point to try to get some of my findings and performance improvements into Pynite, but I need to sit down and actually do it. I think that with some collective effort, Pynite can be highly efficient at solving large real-world problems.
@JBloss1517
Prequalifying the nodes using a bounding box seems like a good idea—it does save some time. However, the time complexity remains the same, since we still need to iterate through the nodes for each member. Python loops are notoriously slow, even when they aren’t doing much.
The k-d tree approach is similar, but the tree would be constructed only once, at the beginning of model preparation. That way the prequalification for each member would be much faster.
These ideas are great, and they illustrate why I made Pynite open-source. I can figure out the physics side pretty well, but I am just a rudimentary programmer myself. I would love to have help with these kinds of issues.
Several years ago I did profile the code quite a bit. I found that assembling the stiffness matrix was a slow process: many of the stiffness terms being placed into the global matrix are zeros. I would imagine there's a way to speed that up that I haven't figured out yet. As it stands, Pynite is fast enough for my needs, so I haven't delved much further into it.
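For what it's worth, a common way to avoid handling all of those zero terms is to assemble the global stiffness matrix in sparse triplet (COO) form and convert it once at the end. The sketch below is a generic illustration of that technique, not Pynite's actual assembly code; assemble_global_K, elements, and dof_map are made-up names for this example:
# A generic sketch of sparse (COO) stiffness assembly - not Pynite's code.
# Each element contributes (row, col, value) triplets; coo_matrix then sums
# duplicate (row, col) pairs, which is the scatter-and-add assembly step.
import numpy as np
from scipy.sparse import coo_matrix

def assemble_global_K(elements, n_dof):
    # elements: iterable of (k_local, dof_map) pairs, where k_local is a dense
    # element stiffness matrix and dof_map lists the matching global DOF indices
    rows, cols, vals = [], [], []
    for k_local, dof_map in elements:
        for a, row in enumerate(dof_map):
            for b, col in enumerate(dof_map):
                value = k_local[a, b]
                if value != 0.0:  # skip zero terms entirely
                    rows.append(row)
                    cols.append(col)
                    vals.append(value)
    # Build the sparse matrix once, then convert to CSR for solving
    return coo_matrix((vals, (rows, cols)), shape=(n_dof, n_dof)).tocsr()
The CSR result also plugs straight into the scipy.sparse.linalg solvers, which tend to be much faster than dense solves on large models.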
Hey @JBloss1517, thanks for the responses. Could you accept my request to connect on LinkedIn? I'd like to ask you more about your usage of Pynite.