underworld2

Some issues with coupling with badlands

Open HonghaoXiong opened this issue 10 months ago • 23 comments

Hello Mr. Julian, I have run into a problem when coupling with Badlands in the underworld 2.16-arm64 version. The log always shows many extra steps when the surface is processed with Badlands. It looks like the times in Underworld and Badlands cannot be kept the same. This never occurred when I used the underworld 2.15.1b version. The issue makes the simulation very slow, and it also means I cannot restart Underworld no matter which output step I choose: it reports that the time differs between Underworld and Badlands. I have attached my code and output log. I truly appreciate your help.

bug_underworld.zip

HonghaoXiong avatar Mar 24 '25 06:03 HonghaoXiong

Thanks for the ticket Xiong,

I did make a small change at the end of the badlands execution loop to force a strata output. https://github.com/julesghub/badlands/blob/5cfb741eeb06ba9f80d5cd743f1c86d2c4ec5c06/badlands/badlands/model.py#L844

You can test whether that is the issue by uninstalling badlands and reinstalling it via pip install badlands==2.2.4 in the docker.

Let me know how you go trying that.

julesghub avatar Apr 03 '25 03:04 julesghub

Are you using docker? If so you won't be able to run pip install badlands==2.2.4 from within the docker, but I do know another workaround to try the old version of badlands.

julesghub avatar Apr 03 '25 05:04 julesghub

Yes, Sir, I am using docker, and I installed badlands==2.2.4 successfully. I then tried my code again, and it works. But another warning appeared: IOStream.flush timed out. The simulation is still running and looks OK. Is this warning a problem, Sir?

Image Image

HonghaoXiong avatar Apr 03 '25 06:04 HonghaoXiong

Oh you managed to install it. That's great, I forgot I left gcc tools in the docker you are using.

As for the error I think that's related to running the code in a jupyter session. Instead run the docker via command line mode. If you are not sure how to do that let me know I can help.

julesghub avatar Apr 03 '25 07:04 julesghub

Thank you so much, Sir! Yes, I will run the code from the command line instead of jupyter lab.

HonghaoXiong avatar Apr 03 '25 08:04 HonghaoXiong

Let me know how it behaves. If using badlands 2.2.4 solves the issues we can look into it further.


julesghub avatar Apr 03 '25 10:04 julesghub

Sir, I truly appreciate your continued care and guidance. After I installed badlands 2.2.4, I built a new container based on underworld2.16.0-arm64-ompi that includes the newly installed badlands. I then ran my code on a MacBook Pro M2 and it works. However, on the old Underworld version my code runs for a while and then quits (the problem occurs while processing the coupling with Badlands). I also find that the problem I mentioned, where the log shows many ToNow... steps as in the attached picture, has been solved. I think it may have been caused by the incorrect way I wrote the passive tracer code. I originally wrote the passive tracer as pts1 = np.array([[GEO.nd(140.* u.kilometer),GEO.nd(-20. * u.kilometer)]]). Now I write it as in the Underworld example: pts1 = np.ndarray((1, 2)); pts1[:,0] = GEO.nd(140.* u.kilometer); pts1[:,1] = GEO.nd(-20. * u.kilometer). Here is the log of my simulation on the new Underworld container. Please check it.

Warm regards Honghao

simulation.log.zip

Image

HonghaoXiong avatar Apr 04 '25 01:04 HonghaoXiong

@HonghaoXiong I'm not sure I understand what the issue is. I see from your simulation.log that you have run many time steps. Perhaps after many time steps, the geometry of the model is creating problems for the coupling. My recommendation is to look at the model output to ensure the geometry and velocity solutions look satisfactory over time.

julesghub avatar Apr 04 '25 02:04 julesghub

Sir, my apologies. I mean that I have met three problems. First, no matter which version of Underworld I use, 2.15.1b or 2.16, my code crashes after running for a while on my local computer because I call Badlands, even after I reduced my model resolution. Second, when I add passive tracers, my simulation runs extremely slowly. The logs show that during the surface process it performs an excessive number of iterations, as seen in the screenshot I shared earlier. Third, the strata laytime output, which Haibin may have told you about: when I used 2.16, I had to set the strata laytime in badlands.xml. But these problems were solved after I installed badlands 2.2.4. The simulation log is from the code running on my local computer, where I even use 1 km resolution and couple with Badlands. As you can see in the log, my code runs normally. I think my results will look normal.

HonghaoXiong avatar Apr 04 '25 06:04 HonghaoXiong

Oh, I want to explain my second problem, the excessive number of iterations during the surface process. I'm not sure whether it is caused by my code syntax, but changing it actually works. When I first added a passive tracer, my code was pts1 = np.array([[GEO.nd(140.* u.kilometer),GEO.nd(-20. * u.kilometer)]]), which adds one point to trace deformation. Then, as shown in my screenshot, the simulation is slow during the surface process, which shows many short time iterations. After I used the Underworld example's syntax for adding a passive tracer, pts1 = np.ndarray((1, 2)); pts1[:,0] = GEO.nd(140.* u.kilometer); pts1[:,1] = GEO.nd(-20. * u.kilometer), the simulation no longer shows this behaviour, as you can see in the simulation log.

HonghaoXiong avatar Apr 04 '25 07:04 HonghaoXiong

@HonghaoXiong , I am not clear on the issue. The two pieces of code you show are equivalent and produce identical results.
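For reference, the equivalence can be checked with a minimal sketch. It assumes that GEO.nd(...) returns a plain Python float; the values 0.14 and -0.02 are arbitrary placeholders (not the real non-dimensional values, which depend on the model scaling), chosen so the snippet runs without Underworld installed.

```python
import numpy as np

# Placeholder values standing in for GEO.nd(140. * u.kilometer) and
# GEO.nd(-20. * u.kilometer). These numbers are assumptions for
# illustration only; the real values depend on the model scaling.
x = 0.14
y = -0.02

# Style 1: build the array directly from a nested list.
pts_a = np.array([[x, y]])

# Style 2: allocate an uninitialised (1, 2) array, then fill both columns.
pts_b = np.ndarray((1, 2))
pts_b[:, 0] = x
pts_b[:, 1] = y

# Compare shape, dtype and contents of the two arrays.
same_shape = pts_a.shape == pts_b.shape
same_dtype = pts_a.dtype == pts_b.dtype
same_values = np.array_equal(pts_a, pts_b)
print(same_shape, same_dtype, same_values)
```

Under that assumption all three checks print True. If GEO.nd returned something other than a plain float, the comparison would need to be repeated with the real values inside the model script.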

Are you saying the inclusion of a passive tracers swarm, containing only 1 particle, is causing the code to run significantly slower? My suspicion is something else is causing these issues intermittently.

In previous cases I have experienced slow docker behaviour because not enough memory (RAM) was allocated to the docker environment. Maybe increase the RAM size and try running the model again?

julesghub avatar Apr 06 '25 23:04 julesghub

Sir, I mean that my code runs much slower when I add the passive tracer with pts1 = np.array([[GEO.nd(140.* u.kilometer),GEO.nd(-20. * u.kilometer)]]) than when I add it with pts1 = np.ndarray((1, 2)); pts1[:,0] = GEO.nd(140.* u.kilometer); pts1[:,1] = GEO.nd(-20. * u.kilometer). I tested this on my workstation, so it should not be a RAM issue. The slower code always shows many iterations while processing the surface in the log. The two pictures below show the comparison.

Image Image

HonghaoXiong avatar Apr 07 '25 00:04 HonghaoXiong

Curious, I see no difference between those two declarations of code. So I can't imagine what's causing this issue.

Looking through your script you have previously used pts1 = GEO.circles_grid(radius= 2.0 * u.kilometer, minCoord=[150. * u.kilometer, -6. * u.kilometer], maxCoord=[550 * u.kilometer, -2. * u.kilometer])

Try calling the points something other than pts1 to avoid naming variables the same things. Potentially this could cause issues if references to the original pts1 are still held in memory but overwritten in the input file.

julesghub avatar Apr 07 '25 01:04 julesghub

Sir, the script I submitted to you adds the passive tracers as a grid, and that works fine. My problem with the passive tracer code is that I want to add individual points instead of a grid, and I found a difference between the two code styles. I agree they are equivalent, but I tested two versions of the code where the only difference is the way the points for the passive tracer are added. The result is what I mentioned earlier: there is a significant difference in execution speed. Additionally, the output log shows that the discrepancy lies in the number of iterations during the "processing surface" part.

HonghaoXiong avatar Apr 07 '25 02:04 HonghaoXiong

test.zip This is my code that adds point passive tracers using pts1 = np.ndarray((1, 2)); pts1[:,0] = GEO.nd(140.* u.kilometer); pts1[:,1] = GEO.nd(-20. * u.kilometer)

HonghaoXiong avatar Apr 07 '25 02:04 HonghaoXiong

Hi Honghao,

I believe the issue with the number of iterations is not related to the passive tracers but rather linked to the time step settings in your model. As configured, the time step (dt) is not fixed, and the saving time interval is set to 0.1 Ma. This means that whenever particles are advected by the dt_solver (which is typically smaller than 0.1 Ma), the surface process will require additional iterations to reach the specified saving time intervals (e.g., 0.1 Ma, 0.2 Ma, 0.3 Ma, etc.), as the dt_sp_calculation is small.

My suggestion is to set a small fixed dt to improve stability (using a large dt could lead to instability due to the sticky air method):

 Model.run_for(duration=50. * u.megayears, checkpoint_interval=0.1 * u.megayears, dt=0.01 * u.megayears)
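To make the effect of the step size easier to see, here is a toy sketch. It is not Underworld's or Badlands' actual stepping logic: it only assumes a clock that must land exactly on every checkpoint time, so any step that would overshoot a checkpoint is shortened. The function name count_steps and all values (in integer years) are hypothetical.

```python
def count_steps(duration, checkpoint_interval, dt):
    """Advance a clock from 0 to `duration` (all values in integer years).

    Each step is `dt` long unless that would overshoot the next checkpoint
    time, in which case the step is shortened to land exactly on it.
    Returns (total_steps, shortened_steps).
    """
    time = 0
    total = 0
    shortened = 0
    while time < duration:
        # Time of the next checkpoint strictly after the current time.
        next_checkpoint = (time // checkpoint_interval + 1) * checkpoint_interval
        step = min(dt, next_checkpoint - time, duration - time)
        if step < dt:
            shortened += 1
        time += step
        total += 1
    return total, shortened


# A dt of 10 kyr divides the 100 kyr checkpoint interval evenly.
fixed = count_steps(1_000_000, 100_000, 10_000)
# A dt of 30 kyr does not, so one shortened step is needed per checkpoint.
uneven = count_steps(1_000_000, 100_000, 30_000)
print(fixed, uneven)
```

In this toy model the 10 kyr step gives (100, 0), meaning no shortened steps, while the 30 kyr step gives (40, 10): every checkpoint forces one extra short step. A solver-chosen dt that changes from step to step would behave like the second case, which is one way to picture why a fixed dt that divides the checkpoint interval keeps the log regular.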

Additionally, ensure that the checkpoint interval in the surface process is set to the same small value (maybe 10 ka or even smaller), and verify that the time step in the XML configuration matches.

Model.surfaceProcesses = GEO.surfaceProcesses.Badlands(
    airIndex=[air.index],
    sedimentIndex=sediment.index,
    XML="./badlands_123.xml",
    outputDir="outbdls_1e-4_0p1man_0p4vel_pts",
    surfElevation=0,
    resolution=1. * u.kilometer,
    checkpoint_interval=0.01 * u.megayears,
)

<time>
  <display>10000.</display>
</time>
<strata>
  <laytime>10000.</laytime>
</strata>
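A quick consistency check between the script and the XML can be sketched as follows. The fragment is wrapped in a `<badlands>` root element, which is an assumption about the file layout; the helper names read_intervals and intervals_match are hypothetical and not part of Badlands or UWGeodynamics. Only the `<display>` and `<laytime>` values shown above are taken from the thread.

```python
import math
import xml.etree.ElementTree as ET

# Fragment from the comment above, wrapped in an assumed <badlands> root
# so that it parses as a standalone document.
XML_TEXT = """
<badlands>
  <time>
    <display>10000.</display>
  </time>
  <strata>
    <laytime>10000.</laytime>
  </strata>
</badlands>
"""


def read_intervals(xml_text):
    """Return (display, laytime) in years from a Badlands-style XML string."""
    root = ET.fromstring(xml_text)
    display = float(root.findtext("time/display"))
    laytime = float(root.findtext("strata/laytime"))
    return display, laytime


def intervals_match(xml_text, checkpoint_interval_years):
    """True if display and laytime both equal the checkpoint interval."""
    display, laytime = read_intervals(xml_text)
    return (math.isclose(display, checkpoint_interval_years)
            and math.isclose(laytime, checkpoint_interval_years))


# checkpoint_interval=0.01 * u.megayears, expressed in years.
checkpoint_years = 0.01 * 1.0e6
print(read_intervals(XML_TEXT), intervals_match(XML_TEXT, checkpoint_years))
```

To check a real input file, the string could be replaced by the file's contents, provided the element paths match the ones assumed here.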

Let me know if you have any other questions.

And @julesghub , Could you please help update the Meson build for Badlands to version 2.2.4? This update will enable better compatibility with installations when using NumPy > 1.23.0.

NengLu avatar Apr 07 '25 04:04 NengLu

Thanks @NengLu , Badlands 2.3 contains the meson compatibility fix and the small change in the badlands-underworld coupling.

I put this change in to sync the UWGeo Tut_11 script outputs, as we discussed previously. I'm not sure why this is now causing problems for @HonghaoXiong. I can take this change out of badlands and keep the meson compatibility. What do you think?

julesghub avatar Apr 07 '25 04:04 julesghub

Thank you, Julian. Yeah, I agree; we may use the 2.3 with meson.

NengLu avatar Apr 07 '25 04:04 NengLu

Thanks, @NengLu, Sir. But I still do not know why the code runs at different speeds just because the passive tracer is added in a different way. I tested two scripts where the only difference is the code that adds the passive tracer. I also have another question, Sir. In badlands.xml, the display will be overridden by the linkage, so do I need to set this parameter? Does it make sense to set it? In the previous Underworld version, 2.15.1, I always ignored the strata series of parameters; should I set them? I would also like to ask you these questions, Sir. @julesghub

HonghaoXiong avatar Apr 07 '25 05:04 HonghaoXiong

The linkage will overwrite the display time in the

NengLu avatar Apr 07 '25 06:04 NengLu

The linkage will overwrite the display time in the element, but it will not affect the strata display.

Sir, I mean that I did not set the series parameters in the old version, and the code still worked and the results looked normal. In the 2.16 version, if I don't set the parameters, or if mine do not match, the code quits. So I installed badlands 2.2.4 following Mr. Julian's advice, and I can still ignore the series parameters. Do I need to set these parameters, Sir?

HonghaoXiong avatar Apr 07 '25 06:04 HonghaoXiong

Hi @NengLu and @HonghaoXiong

I made a version of badlands 2.3.1 that should have a workaround for the difference Hong has seen between 2.2.4 and 2.3.

To use the workaround you will need to first install the 2.3.1 code (see below) and set udw_force_final_strata = 1 in the badlands.xml file.

To install this version of the code you will need to pip install "git+https://github.com/julesghub/badlands.git/@udw-force-final-strata#egg=badlands&subdirectory=badlands"

Please try this out and let me know how it goes. If it works I'll PR to the main badlands repo ASAP.

julesghub avatar Apr 30 '25 02:04 julesghub

Hello Mr. Julian, Sir, I sincerely apologize for keeping you waiting. I tried installing badlands 2.3.1 with pip install git+https://github.com/julesghub/badlands.git/@udw-force-final-strata#egg=badlands&subdirectory=badlands but it did not work. The error is shown in the picture below. I have checked "setup.py" and the path is right. I don't know what is wrong.

Image

HonghaoXiong avatar May 08 '25 02:05 HonghaoXiong