machinekit-hal icon indicating copy to clipboard operation
machinekit-hal copied to clipboard

hal_lib does not free or reuse memory from unloaded components until it quits.

Open ArcEye opened this issue 7 years ago • 44 comments

This issue was discovered in https://github.com/machinekit/machinekit/issues/1123#issuecomment-282989556.

Never actually fixed as it seemed to rely upon an extreme use of halcmd to encounter it.

Since found through @l29ah that there is a M109 script which actually promotes the usage which gives rise to the error.

ArcEye avatar Aug 10 '18 14:08 ArcEye

From @l29ah

I regularily hit the halcmd show pin foo hanging problem when i poll temperature on my 3d printer on am3358; 3f1e265 here.

ArcEye avatar Aug 10 '18 14:08 ArcEye

Can you move this to the relevant Issue tracker as per email on the list https://groups.google.com/forum/#!topic/machinekit/I70IfT-wan0

Issue tracker will be https://github.com/machinekit/machinekit-hal/issues

Will also need to explain what exactly you are doing and why you think that particular commit causes it.

There is a known problem with repeatedly polling halcmd pin to get an output instead of doing it in a programmatic way. #1123 (comment)

I think it is may be due to the way memory is ordered on boundaries to enable atomic operations. This would result in orphaned memory and if an operation is repeated a huge number of times, you run out of hal memory.

ArcEye avatar Aug 10 '18 14:08 ArcEye

From @l29ah I am at the issue tracker :) I'm using the included nc_files/M109 to wait for the temperature to settle. As far i understand, this IS a programmatic way. Sometimes it will never return, and poking halcmd suggesting the problem outlined in #1123 (comment) As far as i understood, this issue is considered fixed, so i'm writing to say it is not. The commit hash is the history point where i observe the behaviour; i don't tell that the commit is the case.

ArcEye avatar Aug 10 '18 14:08 ArcEye

You are at CLOSED general Issue tracker that mentions the problem amoungst many others.

The problem is it will remain closed, so to air this issue you need to open a new one.

I'm using the included nc_files/M109 to wait for the temperature to settle. As far i understand, this IS a programmatic way.

There is nothing programmatic about using a bash script to repeatedly call halcmd and then try to parse the output.

I was referring to finding the hal_pin_t struct for the pin name in question and reading its _data_ptr_addr for the value. This doesn't have the side effects of repeatedly loading the whole halcmd component

A comment that github decided to hide for some reason, showed that using halpr_find_pin_by_name() , which does what I described above, ran a huge number of times without issue (10,000 times or equivalent to running M109 for 2.777 hours continuously)

Just run a program testmem via halcmd loadusr -W testmem 10000 that does 10,000 iterations of halpr_find_pin_by_name(), gets the value and prints the result. Ran to the end without any issues.

The issue was never 'solved', it just appeared an extreme use of halcmd which appeared unlikely to be encountered. I was not aware there was a M code script which did exactly the same thing, but then I am not into plastic squirting :stuck_out_tongue_winking_eye:

I will move this into a separate issue, in the new repo and see if I can find time write a user component which takes a pin name and value and outputs TRUE when the value is met or exceeded, or similar.

Will have to look at how this is used though, I imagine the call to M109 is blocking and only returns when the bed is up to temp, thus pausing the GCode.

ArcEye avatar Aug 10 '18 14:08 ArcEye

The larger issue is to look hard at hal memory allocation, to see if there is a way of tracking the unused memory before the boundary used to start allocation and free that too when the component instance is freed ( or suchlike)

ArcEye avatar Aug 10 '18 14:08 ArcEye

Yes it's blocking. What would be a better way to implement simple custom g-code commands addressing the hal parameters? Seems like the BBB is having a hard time calling the scripts as even M104 causes it to pause for a second or two.

l29ah avatar Aug 10 '18 22:08 l29ah

You can try accessing the pin from gcode. That should access a read only copy directly rather than launching halcmd.

I think you need FEATURES=12 in your ini file and then something like this to call

;; need to set e0.temp.set to same as param #1 first

o<m109> sub
    #<_tempReq> = #1
    #<_startTemp> = #<_hal[Therm.temp0]>
    o101 while [#<_hal[Therm.temp0]> LT #<_tempReq>]
        G4 P1
    o101 endwhile
    (DEBUG, "Starting temp was #<_startTemp>")
    (DEBUG, "Temp now is #<_hal[Therm.temp0]>")
o<m109> endsub
m2

This is off the top of my head and untested as I don't have anything that uses those pins, but something similar should work OK.

A starting point would be to issue a setp e0.temp.set <temp> in the HAL Configuration window of Axis and then run the MDI panel and repeatedly enter (DEBUG, #<_hal[Therm.temp0]>) to get a read out of increasing temp.

Once satisfied it works you could implement a sub similar to above and have a user M code which simply sets the required temp and call that first.

ArcEye avatar Aug 11 '18 11:08 ArcEye

Just tested accessing hal pins still works and it does.

Put into the [RS274NGC] section of the ini file FEATURES = 12

Then issue a MDI command to test, example below checks if axisui.run-disable is set

Output is to bottom right of axis. (output is always fp even if pin is a bit)

selection_064

ArcEye avatar Aug 11 '18 11:08 ArcEye

I have now written a C component to replace M109 (albeit presently reading the value of halui.flood.is-on, because I don't have the required pins to test on)

Test file is a short gcode file which moves to X20 Y20, then call M109 P1 and waits for halui.flood.is-on value to change to 1

G40 G90 G21 
g0 X20 Y20 
M109 P1
g0 x0 y0
M2

The print below shows it went through over half a million iterations (final figure when I got bored and stopped it was 628494), reading the pin and outputting its value.

selection_068

So I think we can conclusively say that will solve the problem experienced with repeatedly polling halcmd to do the same.

I will now try to add one that targets the Therm.temp0 pin to the build.

Are you able to test with a source build on your system, or do you just use packages?

ArcEye avatar Aug 11 '18 15:08 ArcEye

Yes, i can.

l29ah avatar Aug 11 '18 15:08 l29ah

Excellent, I have the test component building in the tree now, just need to rewrite to target Therm.temp0.

I will push to my repo and come back to you when done. You will need to clone my repo, checkout the new-M109 branch and build.

I will add a new M code M119 to just set the temperature. Then you can use M119 P<temp> to set the temperature and then M109 P<temp> to wait for it

Probably won't get time to finish until tomorrow, will notify when ready

ArcEye avatar Aug 11 '18 16:08 ArcEye

On Sat, Aug 11, 2018 at 09:12:35AM -0700, ArcEye wrote:

I will add a new M code M119 to just set the temperature.

You should use M104 (or M140, for heated bed) for non-blocking setting of the target temperature, since it's used by everyone else in 3D printing, i think.

-- () ascii ribbon campaign - against html mail /\ http://arc.pasp.de/ - against proprietary attachments

l29ah avatar Aug 11 '18 16:08 l29ah

OK, I'll be guided by you, didn't realise they were already there, makes things easier.

What I don't understand is why all the instructions talk about using M1xx Snn

It should be M1xx Pnn http://www.machinekit.io/docs/gcode/m-code/#sec:M100-to-M199

You can in some circs use a third arg called S, but there can be problems with it getting mixed up with a speed setting AFAIR.

ArcEye avatar Aug 11 '18 16:08 ArcEye

You're right, it is Pnn for setting the target temperature, and it works this way for me. Slic3r has a machinekit g-code output format that generates Pnn codes, but apparently RepRap-style gcode interpreters use Snn, and S0 is reserved for switching the heater off completely regardless the temperature :/

l29ah avatar Aug 11 '18 16:08 l29ah

OK, glad I wasn't going mad, I'll come back to you tomorrow

ArcEye avatar Aug 11 '18 16:08 ArcEye

The component and Submakefile changes all build OK so looks ready to test.

The changes are at https://github.com/ArcEye/machinekit/commit/132643b2acc70eb961a9d9a7c3140bbcd9c56dd7

Clone my machinekit repo, checkout branch new-M109 and then build a RIP.

Be sure to run in a terminal, I have left in the debug prints to stderr, which will show the pin value read and the value you told it to wait for. Hopefully this will give a safety net so you can be sure it has the right values, the temperature is rising and it returns at the right point.

I await your results.

ArcEye avatar Aug 12 '18 09:08 ArcEye

It doesn't set the target temperature like original M109 did and expected to behave, so everyone'll have to pre-process g-code before feeding it to machinekit if M109 is not prepended by M104. OTOH, slic3r does call M104 before M109, at least in simple cases.

l29ah avatar Aug 12 '18 12:08 l29ah

Can you test it works, call M104 first for now. I can add something later to set the temp as well.

ArcEye avatar Aug 12 '18 12:08 ArcEye

There is an issue with parameters to M codes that I need to check on. Machinekit passes a float value to the component, even if the actual value in gcode is essentially bool eg. 1 becomes 1.000000

Need to check the value received and the one read from the pin are converted properly to integers, but I have run out of time for now, return to it tomorrow sometime hopefully

ArcEye avatar Aug 12 '18 17:08 ArcEye

can't open none
can't open none
Counter: 1 Last value: 0 Target = 110
Counter: 2 Last value: 0 Target = 110
Counter: 3 Last value: 0 Target = 110
Counter: 4 Last value: 0 Target = 110
Counter: 5 Last value: 0 Target = 110

I feel like i've misconfigured something, even though i've corrected the source to point to a proper signal name. Too bad i can't run this thing under gdb.

l29ah avatar Aug 12 '18 22:08 l29ah

can't open none can't open none

That output did not come from the component. Looks like the output from a python script that has been passed an empty string.

i've corrected the source to point to a proper signal name

What is that name? I just copied the old script. Also I thought it was a pin not a signal?
Only pins normally have <component-name>.<pin-name> designations.

I need to find some more time to experiment with this, using pins that I do have and choosing one that is going to increase in value despite the gcode progress being blocked by the M109 call.

Probably easiest to write another component that gets switched on by a fake M104 call then increments a counter which can be read by M109.

ArcEye avatar Aug 13 '18 07:08 ArcEye

I have updated the M109 component with a generic writing function which can be tidied later. It now writes whatever value is passed at M109 P<value> to e0.temp.set and reads Therm.temp0 until it matches the value.

I tested by running this component

component counter;

pin in u32 target = 0;
pin out u32 count = 0;
option singleton yes;               
option userspace yes;

author "ArcEye <arceyeATmgwareDOTcoDOTuk>";
license "GPL";
;;

#include <stdio.h>    /* Standard input/output definitions */
#include <stdlib.h> 
#include <stdint.h>   /* Standard types */
#include <unistd.h>   /* UNIX standard function definitions */

void user_mainloop(void)
{
int inc = 0;

    while(1)
	{
	usleep(500000);
        if(target > 0)
	    {
	    if(count >= target)
		exit(0);
	    else
		count = ++inc;
	    }
	}
    exit(0);
}

and changing the pin names in M109 to counter.target and counter.count respectively.

The result was M109 does exactly as expected, in the screenshot below, receives a value of 50, sets counter.target to 50 and then waits until counter.count is 50 and exits.

selection_069

The new M109 is in my repo.

ArcEye avatar Aug 13 '18 10:08 ArcEye

Also I thought it was a pin not a signal?

Yes, it's a signal:

newsig e0.temp.meas  float

I've used configs/ARM/BeagleBone/ as a reference for my hal configuration.

l29ah avatar Aug 13 '18 14:08 l29ah

I looked at configs/ARM/BeagleBone/CRAMPS and got even more confused.

e0.temp.set appears to not have a writer so can be written to, so I have added a function to write to signal instead of pin.

Are you suggesting reading a signal to get the temperature, I think that is what e0.temp.meas carries. You could just read the pin Therm.ch-04.value which is linked to it

However, I don't know why the original M109 script has a completely different pin it reads for the temperature (Therm.temp0) and you were apparently using this script and it worked.

ArcEye avatar Aug 13 '18 15:08 ArcEye

On Mon, Aug 13, 2018 at 08:50:08AM -0700, ArcEye wrote:

You could just read the pin Therm.ch-04.value which is linked to it

Oh, thanks, will try it.

However, I don't know why the original M109 script has a completely different pin it reads for the temperature (Therm.temp0) and you were apparently using this script and it worked.

I adjusted it to e0.temp.meas.

-- () ascii ribbon campaign - against html mail /\ http://arc.pasp.de/ - against proprietary attachments

l29ah avatar Aug 13 '18 15:08 l29ah

Right. I will change the component to write to signal e0.temp.set to set the temperature and read from Therm.ch-04.value to get the temperature.

Let you know when it is in the repo.

ArcEye avatar Aug 13 '18 15:08 ArcEye

Okay, now it reads the temperature, but only once:

Counter: 1 Last value: 161 Target = 220
Counter: 2 Last value: 161 Target = 220
Counter: 3 Last value: 161 Target = 220
Counter: 4 Last value: 161 Target = 220
Counter: 5 Last value: 161 Target = 220

l29ah avatar Aug 13 '18 16:08 l29ah

The new component is in the repo, try that.

The print shows it read the pin 5 times, its just that the value stayed the same.

ArcEye avatar Aug 13 '18 16:08 ArcEye

Counter: 486 Last value: 161 Target = 220

And i observe the temperature really changing by other means.

l29ah avatar Aug 13 '18 16:08 l29ah

OK you need to identify what pin actually matches the temperature. The component is reading what the pin's value is, it just isn't changing.

ArcEye avatar Aug 13 '18 16:08 ArcEye