Unable to update passive swarm variables
UW is producing an index error when running on multiple processors:
Traceback (most recent call last):
File "/Users/benknight/Documents/Research/Modelling/UW-models/UW3-dev/passive_swarm_issue.py", line 64, in <module>
test_ps.data[:,0] = 1.
File "/Users/benknight/Documents/Research/GitHub/underworld3/src/underworld3/swarm.py", line 1513, in __exit__
var._update()
File "/Users/benknight/Documents/Research/GitHub/underworld3/src/underworld3/swarm.py", line 278, in _update
self._rbf_to_meshVar(self._meshVar)
File "/Users/benknight/Documents/Research/GitHub/underworld3/src/underworld3/swarm.py", line 302, in _rbf_to_meshVar
Values = self.rbf_interpolate(new_coords, verbose=verbose, nnn=nnn)
File "/Users/benknight/Documents/Research/GitHub/underworld3/src/underworld3/swarm.py", line 386, in rbf_interpolate
values = kdt.rbf_interpolator_local(new_coords, D, nnn, verbose)
File "src/underworld3/kdtree.pyx", line 231, in underworld3.kdtree.KDTree.rbf_interpolator_local
IndexError: index 4607182418800017421 is out of bounds for axis 0 with size 2
[2]PETSC ERROR: ------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 10 BUS: Bus Error, possibly illegal memory access
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[2]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[2]PETSC ERROR: to get more information on the crash.
[2]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
I've attached a basic script to replicate the issue.
Cheers
passive_swarm_issue.txt
I checked in a sort-of fix (setting the nodal-point value to zero when no particles are found). I'm assuming no-particles-found is the error condition, but I haven't checked whether it is actually a small number of particles that is needed to build the kd-tree consistently.
We did not have a check for this case: a sparse swarm that does not have particles on every process but still tries to build a proxy mesh variable. In general, this is not really a well-defined situation. We should probably have a default value that can be set for nodes where no particles are nearby. Alternatively, refuse to build a proxy.
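To make the options concrete, here is a minimal sketch of the kind of guard I mean (hypothetical function and argument names; a simple inverse-distance weight stands in for the real RBF kernel in kdtree.pyx):

```python
import numpy as np
from scipy.spatial import cKDTree

def proxy_from_swarm(particle_coords, particle_values, node_coords,
                     default_value=0.0, nnn=4):
    """Fill proxy mesh-variable values from the particles on this rank.

    If the rank holds no particles, every local node gets `default_value`
    instead of querying an empty kd-tree (the failure mode in the
    traceback above).
    """
    n_nodes = node_coords.shape[0]
    values = np.atleast_2d(particle_values.T).T   # ensure shape (npart, ncomp)
    n_comp = values.shape[1]

    if particle_coords.shape[0] == 0:
        # Sparse swarm: nothing to interpolate from on this process.
        return np.full((n_nodes, n_comp), default_value)

    # Clamp the neighbour count so a rank with only a few particles
    # still gets a well-defined query.
    nnn = min(nnn, particle_coords.shape[0])

    tree = cKDTree(particle_coords)
    dist, idx = tree.query(node_coords, k=nnn)
    dist = dist.reshape(n_nodes, -1)
    idx = idx.reshape(n_nodes, -1)

    # Inverse-distance weighting as a stand-in for the real RBF kernel.
    weights = 1.0 / (dist + 1.0e-12)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("ij,ijk->ik", weights, values[idx])
```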
I'll leave the issue open for discussion of a proper fix.
I was doing something similar to this in #142. I am closing #142. We can track this one instead.
@gthyagi this is a slightly different issue, as the fixes Louis added won't stop the error you posted from occurring.
@bknight1, I agree. I referred another issue here because these issues fall under a broader category known as "the scenario where a processor contains zero particles." The existing built-in particle functions consistently generate errors when executed in parallel under this circumstance.
For users transitioning from uw2, tasks involving particles may appear straightforward. However, in uw3, all such operations become nontrivial.
I'm not sure what we can do about this. underworld2 would return the indices of particles added to the swarm, so you could tell that a migration had occurred, but you could never be sure that any given process still held particles.
We should think about how this is to be done to minimise the confusion in parallel.
I agree that this becomes complicated quite quickly, and it always seems to be difficult, but I thought this was the case in uw2 as well.
L
@gthyagi I'm not sure I agree; what you were trying to do would still produce an error in UW2 too.
@lmoresi We can number the particles if that is something Underworld needs. I have had at least one other request. It is the same as adding the "cellid" field, but it would be "particleid".
@knepley I think particle numbering would be useful for us
It is already doable. Add an int field and set it to an id when you create the particles.
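A rough petsc4py sketch of that suggestion, for reference (the field name "particleid" and the sizes are made up, and this is not uw3's actual swarm setup):

```python
import numpy as np
from petsc4py import PETSc

swarm = PETSc.DMSwarm().create(comm=PETSc.COMM_WORLD)
swarm.setDimension(2)
swarm.setType(PETSc.DMSwarm.Type.BASIC)

# Register an extra integer field alongside the default swarm fields.
swarm.initializeFieldRegister()
swarm.registerField("particleid", 1, dtype=PETSc.IntType)
swarm.finalizeFieldRegister()

n_local = 100
swarm.setLocalSizes(n_local, 4)

# Tag each new particle with an id at creation time. This numbering is
# only unique per rank -- see the next comment about making it global.
pid = swarm.getField("particleid")
pid[...] = np.arange(n_local, dtype=pid.dtype)
swarm.restoreField("particleid")
```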
We do need the IDs to be unique, though, so we at least need to know the high-water mark for the number of particles through time and always start from there. It causes problems if we re-use particle id values.
Okay, in this case I would make an API for a global counter, something like
CounterGetRange(counter, numIds, &idStart);
which is collective, and reserves [idStart, idStart + numIds) on that process. This turns adding particles into a collective operation, but it seems like it logically already was.
Matt
Agreed - we should do this ourselves while we wait for this to propagate into PETSc itself. @julesghub - feel free to assign this to somebody other than me.
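A minimal mpi4py sketch of the semantics Matt describes (the class and method names here are made up; the point is the collective reservation of [idStart, idStart + numIds) plus a high-water mark so ids are never reused):

```python
from mpi4py import MPI

class ParticleIdCounter:
    """Hands out globally unique, never-reused particle ids.

    get_range() is collective: every rank must call it, even with num_ids = 0.
    """

    def __init__(self, comm=MPI.COMM_WORLD, start=0):
        self.comm = comm
        self.high_water = start                      # first id not yet handed out

    def get_range(self, num_ids):
        """Collectively reserve [id_start, id_start + num_ids) for this rank."""
        local_offset = self.comm.exscan(num_ids)     # exclusive prefix sum
        if local_offset is None:                     # exscan returns None on rank 0
            local_offset = 0
        total = self.comm.allreduce(num_ids, op=MPI.SUM)
        id_start = self.high_water + local_offset
        self.high_water += total                     # never reuse ids
        return id_start

# Usage: all ranks call this together whenever particles are added.
counter = ParticleIdCounter()
n_new = 5                                            # particles added on this rank
id_start = counter.get_range(n_new)
new_ids = list(range(id_start, id_start + n_new))
```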