2

I'm trying to write a library for my hobbyish purpose and it's almost ready for a smoke test. The only thing that annoys me is a called-once fuction :

(Note: all uppercase phrases are macro constants)

{
/////////// Interface clock enable
LCD_CTRL_PORT_CLK_EN;
LCD_DATA_PORT_CLK_EN;

//////////// Config control pins
//// pull up
LCD_CTRL_PORT->PUPDR &= ~LCD_CTRL_PORT_32BIT_MASK;
LCD_CTRL_PORT->PUPDR |= ( (0b01 << (LCD_CS_POS*2U))
                        | (0b01 << (LCD_RS_POS*2U))
                        | (0b01 << (LCD_WR_POS*2U))
                        | (0b01 << (LCD_RD_POS*2U))
                        | (0b01 << (LCD_RST_POS*2U)) );
//// set output direction
LCD_CTRL_PORT->MODER &= ~LCD_CTRL_PORT_32BIT_MASK;
LCD_CTRL_PORT->MODER |= ( (0b01 << (LCD_CS_POS*2U))
                        | (0b01 << (LCD_RS_POS*2U))
                        | (0b01 << (LCD_WR_POS*2U))
                        | (0b01 << (LCD_RD_POS*2U))
                        | (0b01 << (LCD_RST_POS*2U)) );
//// Max output speed
LCD_CTRL_PORT->OSPEEDR &= ~LCD_CTRL_PORT_32BIT_MASK;
LCD_CTRL_PORT->OSPEEDR |= ( (0b11 << (LCD_CS_POS*2U))
                            | (0b11 << (LCD_RS_POS*2U))
                            | (0b11 << (LCD_WR_POS*2U))
                            | (0b11 << (LCD_RD_POS*2U))
                            | (0b11 << (LCD_RST_POS*2U)) );
//// all control pin set HIGH (idle)
LCD_CTRL_PORT->BSRR |= (LCD_CS | LCD_RS | LCD_WR | LCD_RD | LCD_RST);
////////

//////////// Config data pins
LCD_DATA_PORT->PUPDR &= ~LCD_CTRL_PORT_32BIT_MASK;
LCD_DATA_PORT->OSPEEDR &= ~LCD_DATA_PORT_32BIT_MASK;
LCD_DATA_PORT->MODER &= ~LCD_CTRL_PORT_32BIT_MASK;

#if LCD_USE8BIT_MODE == 1
LCD_DATA_PORT->PUPDR |= ( 0x5555 << (LCD_DATAPORT_OUTPUT_OFFSET*2U) );
LCD_DATA_PORT->OSPEEDR |= ( 0xFFFF << (LCD_DATAPORT_OUTPUT_OFFSET*2U) );
LCD_DATA_PORT->MODER |= ( 0x5555 << (LCD_DATAPORT_OUTPUT_OFFSET*2U) );
LCD_DATA_PORT->BSRR = ( 0xFF << (LCD_DATAPORT_OUTPUT_OFFSET*2U) );
#else
LCD_DATA_PORT->PUPDR |= 0x55555555;
LCD_DATA_PORT->OSPEEDR |= 0xFFFFFFFF;
LCD_DATA_PORT->MODER |= 0x55555555;
LCD_DATA_PORT->BSRR = 0xFFFF;
#endif

I'm using True Studio and the platform is STM32F3. The function just plainly modifies some registers with constants derived from some macro constants.

But why it does cost almost 1 kB of flash (in .text section, I'm using Truestudio's build analyzer) although it isn't quite long ? Even HAL-GPIO function is 3x smaller.


Example macro:

#define LCD_XXXX_PORT  GPIOX
#define LCD_XXX        GPIO_PIN_X
#define LCD_XXX_POS    POSITION_VAL(LCD_XXX)
Long Pham
  • 1,260
  • 1
  • 11
  • 21
  • Have you compiled with the optimiser turned on? It often makes a huge difference, and I would expect something like the above to collapse to an inline function or the like at -o2. – Dan Mills Aug 14 '18 at 15:59
  • @DanMills I haven't. Thanks for the tip. Let me try. – Long Pham Aug 14 '18 at 16:02
  • 1
    *"it cost almost 1 kB of flash"* I don't think the code shown is that big. Check in the final result where the 1K comes from. The linker might pull in a big blob of library code. – Oldfart Aug 14 '18 at 16:02
  • @Oldfart True Studio's build analyzer shows that the function takes ~ 1 kB (968 bytes, to be precise) in .text section. – Long Pham Aug 14 '18 at 16:07
  • @DanMills 8 bytes saved for -O2, -O3 and -Ofast. Zero for -Osize. – Long Pham Aug 14 '18 at 16:17
  • Have a look at the generated machine code and analyze what it does. On some micros some things need to be aligned to certain addresses – PlasmaHH Aug 14 '18 at 16:22
  • How odd. What types are you using for things (bit shifts on signed types have some oddities C), and what assembler is this generating? – Dan Mills Aug 14 '18 at 16:26
  • You could try to get the `LCD_CTRL_PORT` pointer into a register. – Oldfart Aug 14 '18 at 16:26
  • 2
    What are LCD_CTRL_PORT_CLK_EN and LCD_DATA_PORT_CLK_EN doing? You should really post the assembly you get, at least the first few dozen lines. – Adam Haun Aug 14 '18 at 16:47
  • 1
    Most of this should be collapsing to constants at compile time. You need to examine your output to see where the size is really being taken, likely it isn't here. Also don't forget there's some fixed overhead like the vector table. – Chris Stratton Aug 14 '18 at 17:57
  • I would simply ask ST why their toolchain is crap. This code here should be some 20-30 bytes. – Lundin Aug 15 '18 at 11:01
  • @PlasmaHH Yes, after I learn asm :) – Long Pham Aug 15 '18 at 17:55
  • @DanMills uint32_t for struct member, not sure for macros. I have read that macros aren't type-safe but the GPIO is initilized as I want. – Long Pham Aug 15 '18 at 17:58
  • @AdamHaun They are used to enable GPIO clock. Right now, I'm just using HAL function to do it. They only cost around 60 bytes. – Long Pham Aug 15 '18 at 18:03
  • @Oldfart `LCD_CTRL_PORT` points to a struct which member' addresses are aligned with real register addresses. – Long Pham Aug 15 '18 at 18:10
  • 1
    I was referring to using the keyword 'register' in the variable definition to ask the compiler to assign and keep the pointer in a processor local register. Like: `register int *LCD_CTRL_PORT;` (But then replace `int` with the real type of the LCD control port typedef) – Oldfart Aug 15 '18 at 20:05

3 Answers3

2

I think I have found two reasons:

1. The compiler isn't smart enough to recognize some compile time constants

The problem lies in these macros: #define LCD_XXX_POS POSITION_VAL(LCD_XXX)

I want to create an automatic mask for various GPIO configuration register. In order to do so, I must at least know the position of each pin. So I use POSITION_VAL(), a macro function provided in ST's CMSIS header :

#define POSITION_VAL(VAL)     (__CLZ(__RBIT(VAL))) 

Basically, the macro function takes an parameter (in this case, single-GPIO-pin mask), flips it with rbit instruction, calculates leading zeros with clz instruction and returns leading zeros. By doing so, I can get the position of LCD_XXX pin quite easily.

The compiler seems to replace the macro function with it body everytime, instead of replacing them with constants. I have tested by defining LCD_XXX_POS as constants, the function size reduced to ~380 bytes.

2. Something was wrong with project setting

About the macro POSITION_VAL(VAL), the compiler may not (do not ?) care if it's fed with constants. So, as suggested by this article, I created a new header :

#ifdef POSITION_VAL
#undef POSITION_VAL
#endif

#ifdef __GNUC__
# define POSITION_VAL(VAL) (__builtin_ctz(VAL))
#else
# define POSITION_VAL(VAL) (__CLZ(__RBIT(VAL)))
#endif

GNU C has __builtin_ctz which accomplishes the same thing but can deduce the value at compile time if appropriate.

But Truestudio highlighted #ifdef __GNUC__ branch as inactive although I was sure that I was using GCC. I tried with a new project template, #ifdef __GNUC__ branch wasn't marked inactive and the function only cost ~180 bytes.

Clearly, there was something wrong.....

Long Pham
  • 1,260
  • 1
  • 11
  • 21
  • 1
    A compiler would have to be pretty clever if it should know at _compile time_ how to replace a set of custom inline assembly. It would require a complete simulator. That is a pretty silly and "surprising" macro which should have been written as an inlinable function. – pipe Aug 16 '18 at 07:41
2

But why it does cost almost 1 kB of flash although it isn't quite long ? Even HAL-GPIO function is 3x smaller.

Let's first see what you did ask for.

9 |= operations with a literal*
6 &= ~operation with a literal*
1 = operation with literal*

*assuming the compiler optimizes arithmetic on constants on -O1

The |= takes 4 instructions (2 LDR, ORR and STR), and two constants. (18 bytes)
The &=~ is similar to |=, since ~const will be optimized.
An = is three instructions (LDR,MOV and STR), and two constants. (10 bytes)

Since registers are volatile, these assignments will no be optimized further. This means the code above is about 280 bytes. Maybe less, on if more than -O1 by keeping some pointers around in r1.

However, an empty ARM project is about 600 bytes as well.
That is just the vector table (~80 words), and the stack and memory zero initialization.

1kB, for "just gpio" is not big.

Jeroen3
  • 21,976
  • 36
  • 73
1

It could be because you are using STM32F3 peripheral library. If you only need GPIO access you can include stm32f3xx.h directly. Remove USE_STDPERIPH_DRIVER from build config.

Maple
  • 11,755
  • 2
  • 20
  • 56