sml icon indicating copy to clipboard operation
sml copied to clipboard

Size of generated code

Open wimalopaan opened this issue 6 years ago • 13 comments

I have this very simple example (see below) of a fsm. I wrote the same fsm with switch-stmts, and that version is much more smaller in code size. Any hints to reduce the size with sml?


#include <sml.h>

namespace sml = boost::sml;

namespace {
    struct start {};
    struct run {};
    struct off {};
    struct error {};
    
    using namespace sml;
    
    struct machine {
        inline auto operator()() const noexcept {
            return sml::make_transition_table(
                        *"off"_s + event<start> = "start"_s,
                        "start"_s + event<run>  = "running"_s,
                        "running"_s + event<off> = "off"_s,
                        "running"_s + event<error> = "error"_s
                                                     );
        }
    };
} 

volatile uint8_t r;

struct FSM {
    static inline void periodic() {
        uint8_t rr = r;
        if (rr == 1) {
            fsm.process_event(start{});
        }
        else if (rr == 2) {
            fsm.process_event(run{});
        }
        else if (rr == 3) {
            fsm.process_event(off{});
        }
        else if (rr > 100) {
            fsm.process_event(error{});
        }
    }
private:
    inline static sml::sm<machine> fsm;
};

using fsm = FSM;

int main() {
    while(true) {
        fsm::periodic();
    }
}

The version with switch:


volatile uint8_t r;

struct FSM {
    enum class State : uint8_t {Off, Start, Running, Error};
    
    inline static void periodic() {
        uint8_t rr = r;
        switch(mState) {
        case State::Off:
            if (rr == 1) {
                mState = State::Start;
            }
            break;
        case State::Start:
            if (rr == 2) {
                mState = State::Running;
            }
            break;
        case State::Running:
            if (rr == 3) {
                mState = State::Off;
            }
            else if (rr > 100) {
                mState = State::Error;
            }
            break;
        case State::Error:
            break;
        }
    }
private:
    inline static State mState = State::Off;
};

using fsm = FSM;

int main() {
    while(true) {
        fsm::periodic();
    }
}

wimalopaan avatar Oct 30 '19 09:10 wimalopaan

I compiled your examples on Linux with g++-8.3.0, here are the results:

faust@hammer /tmp/sml $ g++ -I sml/include/boost/ test.cpp -std=c++17 -O2 -o test
faust@hammer /tmp/sml $ g++ -I sml/include/boost/ test2.cpp -std=c++17 -O2 -o test2
faust@hammer /tmp/sml $ ls -l
загалом 48
drwxr-xr-x 11 faust faust   440 жов 30 13:49 sml
-rwxr-xr-x  1 faust faust 19360 жов 30 13:52 test
-rwxr-xr-x  1 faust faust 19328 жов 30 13:52 test2
-rw-r--r--  1 faust faust   872 жов 30 13:51 test2.cpp
-rw-r--r--  1 faust faust  1176 жов 30 13:50 test.cpp

After applying strip -s both binaries have identical size.

CLang-9.0.0 is slightly different:

faust@hammer /tmp/sml $ clang++ -I sml/include/boost/ test2.cpp -std=c++17 -O2 -o test2
faust@hammer /tmp/sml $ clang++ -I sml/include/boost/ test.cpp -std=c++17 -O2 -o test
faust@hammer /tmp/sml $ ls -l
загалом 52
drwxr-xr-x 11 faust faust   440 жов 30 13:49 sml
-rwxr-xr-x  1 faust faust 23944 жов 30 13:54 test
-rwxr-xr-x  1 faust faust 19128 жов 30 13:53 test2
-rw-r--r--  1 faust faust   872 жов 30 13:51 test2.cpp
-rw-r--r--  1 faust faust  1176 жов 30 13:50 test.cpp

But after striping both binaries are around 14k.

I don't see much difference in size.

madf avatar Oct 30 '19 11:10 madf

I found a way to reduce the size some more:

https://godbolt.org/z/danKOA

But still not full equivalent.

wimalopaan avatar Oct 30 '19 13:10 wimalopaan

I compiled all three versions also for the AVR plattform to see the assembler differences:

text data bss dec hex filename 322 0 2 324 144 bm60.elf 254 0 2 256 100 bm61.elf 300 0 6 306 132 bm62.elf

bm60 is the first sml-example, bm61 is the switch-code, and bm62 is the second sml example. The question here, why are there now (bm62) 6 bytes in the bss section (1 byte for the volatile, 1 byte for the state of the fsm, but 4 bytes additional for ?).

wimalopaan avatar Oct 30 '19 13:10 wimalopaan

Maybe alignment?

madf avatar Oct 30 '19 14:10 madf

I suspect it has todo with thread-safety für the guard-functions.

If I don't need thread-safety: howto disable?

wimalopaan avatar Oct 30 '19 14:10 wimalopaan

I don't see anything related to thread safety in disassembly.

madf avatar Oct 30 '19 15:10 madf

This is from your modified version:

    b0:   80 91 05 01     lds     r24, 0x0105      ; 0x800105 <r>

This is from switch version:

    90:   80 91 01 01     lds     r24, 0x0101      ; 0x800101 <r>

madf avatar Oct 30 '19 15:10 madf

What do you want to say with that?

wimalopaan avatar Oct 30 '19 15:10 wimalopaan

Okay, I think I know where the 4 bytes come from. Transition table is build by inheriting each individual transition. When they are empty (have no fields) they lay within the same address and the whole table has zero size (though sizeof() will return 1). But when you inherit from non-empty classes the resulting class can't have zero size, because you must be able to address members of your non-empty classes. So they lay with 1 byte shift:

faust@hammer ~ $ cat test.cpp
#include <iostream>

struct Z {};
struct B1 {};
struct B2 {};
struct B3 {};
struct B4 {};
struct D : B1, B2, B3, B4 {};

int main()
{
    std::cout << sizeof(D) << "\n";
    return 0;
}
faust@hammer ~ $ g++ test.cpp -o test
faust@hammer ~ $ ./test
1

vs

faust@hammer ~ $ cat test.cpp
#include <iostream>

struct Z {};
struct B1 {Z z;};
struct B2 {Z z;};
struct B3 {Z z;};
struct B4 {Z z;};
struct D : B1, B2, B3, B4 {};

int main()
{
    std::cout << sizeof(D) << "\n";
    return 0;
}
faust@hammer ~ $ g++ test.cpp -o test
faust@hammer ~ $ ./test
4

Back to your example. In the original code, with 4 events and no guards all transitions are empty, they have no fields. In the example with 1 event and guards each transition has a data field - a zero_wrapper around your guard. And the whole transition table is built by inheriting each of the transition, so it has size 4. Plus 1 byte of the state itself - 5 bytes for the machine. Plus your global r - makes 6 bytes.

madf avatar Oct 30 '19 17:10 madf

Ok, many thanks for that explanation!

Looks like one can't get the minimum code and data size as it is with the switch. In the following characteristics

bm60.elf : sml without guards, different events bm61.elf : switch bm62.elf : sml with guards, same event

$ avr-nm -CS -t d --size-sort bm60.elf
08388864 00000001 B r
08388865 00000001 b FSM::fsm
00000320 00000006 t _GLOBAL__sub_I_r
00000314 00000006 T main
00000326 00000012 T __tablejump2__
00000194 00000016 T __do_clear_bss
00000210 00000022 T __do_global_ctors
00000244 00000070 W FSM::periodic()
$ avr-nm -CS -t d --size-sort bm61.elf
08388864 00000001 B r
08388865 00000001 b (anonymous namespace)::FSM::mState
00000192 00000016 T __do_clear_bss
00000220 00000052 T main
$ avr-nm -CS -t d --size-sort bm62.elf
08388864 00000001 B r
08388865 00000005 b (anonymous namespace)::FSM::fsm
00000302 00000006 t _GLOBAL__sub_I_r
00000308 00000012 T __tablejump2__
00000194 00000016 T __do_clear_bss
00000210 00000022 T __do_global_ctors
00000244 00000058 T main

That's a pity, because in my experience, properly designed abstractions do not yield increasing code or data size.

wimalopaan avatar Oct 31 '19 05:10 wimalopaan

Well, I modified the very first example a bit:

#include <cstdint>
#include <sml.h>

namespace sml = boost::sml;

volatile uint8_t r = 0;

namespace {
    struct start {};
    struct run {};
    struct off {};
    struct error {};
    
    using namespace sml;
    
    struct machine {
        inline auto operator()() const noexcept {
            return sml::make_transition_table(
                        *"off"_s + event<start> = "start"_s,
                        "start"_s + event<run>  = "running"_s,
                        "running"_s + event<off> = "off"_s,
                        "running"_s + event<error> = "error"_s
                                                     );
        }
    };
} 

struct FSM {
    static inline void periodic() {
        toEvent(r);
    }
    static inline void toEvent(uint8_t value) {
        if (value == 1) {
            fsm.process_event(start{});
        }
        else if (value == 2) {
            fsm.process_event(run{});
        }
        else if (value == 3) {
            fsm.process_event(off{});
        }
        else if (value > 100) {
            fsm.process_event(error{});
        }
    }
private:
    inline static constinit sml::sm<machine> fsm{};
};

using fsm = FSM;

int main() {
    while(true) {
        fsm::periodic();
    }
}

And now:

$ avr-nm -CS -t d --size-sort bm60.elf
08398848 00000001 B r
08398849 00000001 b FSM::fsm
00000280 00000006 t _GLOBAL__sub_I_r
00000286 00000012 T __tablejump2__
00000174 00000016 T __do_clear_bss
00000190 00000022 T __do_global_ctors
00000224 00000056 T main

Now, we have only 1 byte for the state and 56 bytes for the code. Looks ok now.

wimalopaan avatar Oct 31 '19 05:10 wimalopaan

You can do even better if you remove static:

faust@hammer /tmp/sml $ avr-nm -CS -t d --size-sort test.elf 
08388864 00000001 B r
00000116 00000016 T __do_clear_bss
00000144 00000062 T main

Slightly bigger main but no additional costs from global initialization. I got 800 bytes overall size versus 836 bytes with static.

madf avatar Oct 31 '19 12:10 madf

Besides:

faust@hammer /tmp/sml $ avr-size test2.elf test4.elf 
   text	   data	    bss	    dec	    hex	filename
    210	      0	      2	    212	     d4	test2.elf
    210	      0	      1	    211	     d3	test4.elf

test2.elf - your switch version. And if you get rid of static in the switch version:

faust@hammer /tmp/sml $ avr-size test2.elf test4.elf 
   text	   data	    bss	    dec	    hex	filename
    194	      0	      1	    195	     c3	test2.elf
    210	      0	      1	    211	     d3	test4.elf

madf avatar Oct 31 '19 12:10 madf