Accessing data past 64k boundary on atmega1280

Question

I need to be able to store and refer to constant data arrays which must be placed after the 64k boundary in the atmega1280. How do I create the correct data structure, and then access it for the following example:

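#include <avr/io.h>
#include <avr/pgmspace.h>

/* NOT_A_PORT comes from the Arduino core headers. */
const uint16_t PROGMEM port_to_mode_PGM[] = {
    NOT_A_PORT,
    (uint16_t) &DDRA,
    (uint16_t) &DDRB,
    (uint16_t) &DDRC,
    (uint16_t) &DDRD,
    (uint16_t) &DDRE,
    (uint16_t) &DDRF,
    (uint16_t) &DDRG,
    (uint16_t) &DDRH,
    NOT_A_PORT,
    (uint16_t) &DDRJ,
    (uint16_t) &DDRK,
    (uint16_t) &DDRL,
};

uint16_t getdata(uint8_t index)
{
   /* Near read: compiles to LPM, which only reaches the low 64k of flash */
   return pgm_read_word(&port_to_mode_PGM[index]);
}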

Currently AVRGCC places the data structure above 64k (the whole program is above 64k, because a linker option moves the .text section), but then uses the LPM instruction to access it.

The LPM instruction can only access the first 64k of program memory.
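To make the limitation concrete, here is a host-side C sketch (not AVR code) of how a flash byte address is split. LPM addresses flash through the 16-bit Z register alone, while ELPM prepends the RAMPZ register as bits 16..23. The names flash_ptr, split_flash_address and the example address 0x10234 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A flash byte address split the way the AVR core sees it (sketch). */
typedef struct {
    uint8_t  rampz; /* bits 16..23: consulted only by ELPM */
    uint16_t z;     /* bits 0..15: the only part LPM ever uses */
} flash_ptr;

static flash_ptr split_flash_address(uint32_t byte_addr)
{
    flash_ptr p;
    p.rampz = (uint8_t)(byte_addr >> 16);
    p.z     = (uint16_t)(byte_addr & 0xFFFFu);
    return p;
}

/* Byte LPM would fetch: Z alone, so anything above 0xFFFF wraps into the low 64 KiB. */
static uint32_t lpm_effective_address(uint32_t byte_addr)
{
    return split_flash_address(byte_addr).z;
}

/* Byte ELPM would fetch: RAMPZ:Z, i.e. the full address. */
static uint32_t elpm_effective_address(uint32_t byte_addr)
{
    flash_ptr p = split_flash_address(byte_addr);
    return ((uint32_t)p.rampz << 16) | p.z;
}
```

For a table linked at 0x10234, LPM would fetch from 0x0234 in the lower bank instead, which is the misread described above.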

It looks like the following should work:

uint16_t getdata(uint8_t index)
{
   return pgm_read_word_far( port_to_mode_PGM + index);
}

But the disassembly and simulation show that the pointer port_to_mode_PGM is only a 16-bit value, even though the array lives beyond the 64k boundary.

score 2 · Accepted Answer · answered Feb 16 '12 at 12:34

Accessing constants beyond 64K is tricky, as you've already found out, because pointers are only 16 bits. If you want to access constants in program memory beyond 64K, you have to calculate the address yourself using 32-bit arithmetic (e.g. with uint32_t types) and pass it to pgm_read_word_far().
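The address arithmetic can be checked on a host. This is a sketch that assumes the table's true address is TEXT_OFFSET plus its 16-bit (truncated) address, i.e. TEXT_OFFSET is 0 or 0x10000 and the table sits within the 64 KiB above it. far_element_address and the sample addresses are hypothetical. Because the arithmetic is done on plain integers rather than pointers, the index must be scaled by the element size by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rebuild the 32-bit flash address of element `index` of a table whose
 * 16-bit (truncated) address is `near_base`. On the AVR the result would
 * be handed to pgm_read_word_far().
 */
static uint32_t far_element_address(uint32_t text_offset, uint16_t near_base,
                                    uint8_t index, size_t elem_size)
{
    /* Integer arithmetic does not scale by element size the way C pointer
       arithmetic does, so multiply explicitly. */
    return text_offset + near_base + (uint32_t)index * (uint32_t)elem_size;
}
```

With a table at truncated address 0x00C4 and TEXT_OFFSET 0x10000, element 3 of a uint16_t array is at 0x100CA. Adding the raw index instead would give 0x100C7, and a word read there would straddle elements 1 and 2.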

Since you need special compiler instructions to access constants located in program space anyway (e.g. PROGMEM), I suggest routing all accesses to these constants through a function designed to read your data and return the results in native types that the compiler can more easily work with (i.e. non-PROGMEM types). That way the code that fetches the constants isn't peppered through your program but exists in only one place. Use static inline functions and compiler optimization to make this efficient.
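One way to keep that accessor in a single place and still exercise it off-target is sketched below, assuming avr-gcc defines __AVR__ for the target build. flash_word_at, table_word and fake_flash are hypothetical names, and the sample words (0x21, 0x24) are illustrative. On the host a RAM array stands in for flash, which also makes the accessor easy to mock.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
/* Target build: real far read (ELPM via avr-libc). */
static inline uint16_t flash_word_at(uint32_t byte_addr)
{
    return pgm_read_word_far(byte_addr);
}
#else
/* Host/test build: a RAM array stands in for flash (illustrative data). */
static const uint8_t fake_flash[] = { 0x00, 0x00, 0x21, 0x00, 0x24, 0x00 };
static inline uint16_t flash_word_at(uint32_t byte_addr)
{
    /* Little-endian, like the AVR. */
    return (uint16_t)(fake_flash[byte_addr] | (fake_flash[byte_addr + 1] << 8));
}
#endif

/* The one place that knows the table lives in flash and how it is laid out. */
static inline uint16_t table_word(uint32_t table_addr, uint8_t index)
{
    return flash_word_at(table_addr + (uint32_t)index * (uint32_t)sizeof(uint16_t));
}
```

The rest of the program calls table_word() (or a wrapper such as getdata()) and only ever sees a plain uint16_t, so changing how flash is read touches one function.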

You have to link the binaries destined for upper and lower flash differently anyway, so there is almost no extra work in also compiling each binary with a different TEXT_OFFSET #define, which specifies the address at which that binary will be linked. The ELPM instructions work in both builds, whether the binary ends up in upper or lower flash.

For example:

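static inline uint16_t getdata(uint8_t index)
{
    /* &port_to_mode_PGM[index] is already scaled by the element size; its
       16-bit value is the element's address with TEXT_OFFSET dropped. */
    return pgm_read_word_far((uint32_t)TEXT_OFFSET + (uint16_t)&port_to_mode_PGM[index]);
}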

where TEXT_OFFSET is passed in at compile time and matches the value passed to your linker.

If you look at the assembly listing produced for source like this, you'll notice a lot of instructions doing the 32-bit arithmetic. If efficiency is important, you can roll your own inline assembly to do the access in the required manner. The following code shows a custom inline assembly snippet adapted from the macros in <avr/pgmspace.h>. (This snippet covers only the __ELPM_word_enhanced__ variant, which suits your micro.)

#define __custom_ELPM_word_enhanced__(offset, addr)    \
(__extension__({                        \
uint32_t __offset = (uint32_t)(offset);\
uint16_t __addr16 = (uint16_t)(addr);\
uint16_t __result;                  \
__asm__                             \
(                                   \
    "out %3, %C2"   "\n\t"          \
    "movw r30, %1"  "\n\t"          \
    "elpm %A0, Z+"  "\n\t"          \
    "elpm %B0, Z"   "\n\t"          \
    : "=r" (__result)               \
    : "r" (__addr16),               \
      "r" (__offset),               \
      "I" (_SFR_IO_ADDR(RAMPZ))     \
    : "r30", "r31"                  \
);                                  \
__result;                           \
}))

#define custom_pgm_read_word_far(offset, address_long)  __custom_ELPM_word_enhanced__((uint32_t)(offset), (uint16_t)(address_long))
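The inline assembly is easier to check against a plain-C model. The sketch below simulates, on a host, what the four instructions do against a fake 128 KiB flash array; model_elpm_word and fake_flash are hypothetical names, and the model covers the instructions only, not avr-gcc's register allocation.

```c
#include <assert.h>
#include <stdint.h>

/* fake_flash stands in for the ATmega1280's 128 KiB of program memory. */
#define FAKE_FLASH_SIZE 0x20000UL
static uint8_t fake_flash[FAKE_FLASH_SIZE];

static uint16_t model_elpm_word(uint32_t offset, uint16_t addr16)
{
    /* out RAMPZ, %C2 : only byte 2 of the offset is used */
    uint8_t rampz = (uint8_t)(offset >> 16);
    /* movw r30, %1 : Z = addr16, so the effective pointer is RAMPZ:Z */
    uint32_t rampz_z = ((uint32_t)rampz << 16) | addr16;
    assert(rampz_z + 1 < FAKE_FLASH_SIZE);   /* stay inside the model */

    uint8_t lo = fake_flash[rampz_z];        /* elpm %A0, Z+ : low byte first */
    rampz_z++;                               /* Z+ post-increments RAMPZ:Z as a whole */
    uint8_t hi = fake_flash[rampz_z];        /* elpm %B0, Z  : then the high byte */

    return (uint16_t)(lo | (hi << 8));
}
```

Note that the low 16 bits of the offset never reach the hardware: an offset of 0x1F800 reads from the same bank as 0x10000.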

It would be used similarly as before, but this time the offset is passed in as a separate parameter:

static inline uint16_t getdata(uint8_t index)
{
    /* Index via &array[i] so the address is scaled by sizeof(uint16_t). */
    return custom_pgm_read_word_far(TEXT_OFFSET, (uint16_t)&port_to_mode_PGM[index]);
}

This will produce more efficient code because no 32-bit arithmetic is required. It can be made more efficient still by passing only the upper word or byte of TEXT_OFFSET instead of a 32-bit value. Since all of your constant data accesses can go through a function like this, knowledge of TEXT_OFFSET is confined to a single place in your code. Having your data access go through such a function rather than using the variable directly also has the advantage that you can mock your getdata() method for testing.
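Passing just the upper byte is only safe if TEXT_OFFSET is a multiple of 64 KiB, because the low 16 bits are silently dropped. A compile-time guard for that is sketched below. The default value and the names TEXT_BANK and bank_address are hypothetical; in a real build TEXT_OFFSET would come from the command line (e.g. -DTEXT_OFFSET=0x10000UL) alongside the matching linker option (e.g. -Wl,--section-start=.text=0x10000).

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder default; normally supplied with -DTEXT_OFFSET=... */
#ifndef TEXT_OFFSET
#define TEXT_OFFSET 0x10000UL
#endif

/* A byte-only offset is correct only when the low 16 bits are zero;
   a start address such as 0xF800 would be read from the wrong place. */
_Static_assert((TEXT_OFFSET & 0xFFFFUL) == 0,
               "TEXT_OFFSET must be a multiple of 64 KiB to pass only its bank byte");

/* The value a byte-only variant would load into RAMPZ. */
#define TEXT_BANK ((uint8_t)(TEXT_OFFSET >> 16))

/* Full address implied by a bank byte and a 16-bit near address. */
static inline uint32_t bank_address(uint8_t bank, uint16_t near_addr)
{
    return ((uint32_t)bank << 16) | near_addr;
}
```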

NOTE: These code samples haven't been completely tested.

Hey, this is great, thanks. I'll still use my workaround for this release, but I suspect in the long term I'll have to implement this as this is the better overall option. — Adam Davis, Feb 16 '12 at 16:13

score 1 · Answer 2 · edited Apr 13 '17 at 12:32

I could write the C program to access the far memory specifically, but as mentioned in this related question I'm trying to write the app so it can run at either 0x00000 or 0x10000, and I'd like to avoid having to pepper the code with defines and different code paths based on where it is in memory.

I'm still hoping there's a better solution, but it looks like far pointers just don't work seamlessly in AVRGCC right now. It appears that AVRGCC expects the programmer to decide when and where to use far access and far storage, rather than handling those details itself. This makes sense: where the code is going isn't known until link time, so the compiler can't choose between LPM and ELPM at code generation time.

As a workaround I'm moving the start of the secondary program to 0xF800, so the first 4K is below the 64k boundary. AVRGCC does put all the constants in the beginning of the program (interrupt vectors, constants, code, in that order) so as long as my constants take up less than 4k of space, I will be fine using the regular array access within AVRGCC.
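The space figure is easy to sanity-check. Assuming 0xF800 is a byte address (the form GNU ld's --section-start takes), the room below the boundary is 0x10000 - 0xF800 = 0x800 bytes, i.e. 2 KiB; a start of 0xF000 would give 4 KiB. A host-side sketch of the check follows; lpm_headroom, constants_fit_below_boundary and the sizes used with them are hypothetical, and real sizes would come from the linker map file or avr-size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BANK_BOUNDARY 0x10000UL   /* first byte address LPM cannot reach */

/* Bytes between a start (byte) address and the 64 KiB boundary. */
static uint32_t lpm_headroom(uint32_t start_byte_addr)
{
    return start_byte_addr < BANK_BOUNDARY ? BANK_BOUNDARY - start_byte_addr : 0;
}

/* Vectors come first, then constants, so both must end below the boundary. */
static bool constants_fit_below_boundary(uint32_t start_byte_addr,
                                         uint32_t vectors_bytes,
                                         uint32_t constants_bytes)
{
    return vectors_bytes + constants_bytes <= lpm_headroom(start_byte_addr);
}
```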

It isn't an even 50/50 split of code space, so one program will necessarily be smaller than the other, but I believe that tradeoff is acceptable for now. If it turns out we have to support more memory, we'll have to use a more complex process.
