I'm going to bring you down to pretty bare bare-metal. You can complicate this from there, but you should have a much higher chance of success starting here.
This code blinks the LED on port pin PC13.
This uses GNU tools. As written, you can use pretty much any gcc for ARM, from say gcc 3.x.x or gcc 4.x.x to the present gcc 9.x.x, in either the Linux or the non-Linux variants (arm-none-eabi-, arm-linux-gnueabi-, or other similar variations). That works because the code doesn't rely on libraries: it uses the compiler as a compiler and the linker as a linker, and doesn't use the things that vary between toolchains. You can find pre-builts for Windows, Linux, and macOS, or build your own from sources.
flash.s
.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20001000  @ initial stack pointer
.word reset                 @ reset vector
.word hang                  @ NMI
.word hang                  @ hard fault

.thumb_func
reset:
    bl notmain
    b hang

.thumb_func
hang:   b .

.thumb_func
.globl PUT32
PUT32:                      @ PUT32(address,data)
    str r1,[r0]
    bx lr

.thumb_func
.globl GET32
GET32:                      @ data=GET32(address)
    ldr r0,[r0]
    bx lr

.thumb_func
.globl dummy
dummy:                      @ does nothing, only here to be called
    bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
#define GPIOCBASE 0x40011000
#define RCCBASE 0x40021000
int notmain ( void )
{
    unsigned int ra;
    unsigned int rx;

    ra=GET32(RCCBASE+0x18);     //RCC_APB2ENR
    ra|=1<<4;                   //enable port c
    PUT32(RCCBASE+0x18,ra);
    //config
    ra=GET32(GPIOCBASE+0x04);   //configuration register for pins 8-15
    ra&=~(3<<20);               //PC13 mode bits, clear
    ra|= 1<<20;                 //PC13 mode, output
    ra&=~(3<<22);               //PC13 cnf bits, clear
    ra|= 0<<22;                 //PC13 cnf, general purpose push-pull
    PUT32(GPIOCBASE+0x04,ra);
    for(rx=0;;rx++)
    {
        PUT32(GPIOCBASE+0x10,1<<(13+0));    //BSRR, set PC13
        for(ra=0;ra<200000;ra++) dummy(ra);
        PUT32(GPIOCBASE+0x10,1<<(13+16));   //BSRR, reset PC13
        for(ra=0;ra<200000;ra++) dummy(ra);
    }
    return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Then build
arm-linux-gnueabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-linux-gnueabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m0 -mthumb -c notmain.c -o notmain.o
arm-linux-gnueabi-ld -nostdlib -nostartfiles -T flash.ld flash.o notmain.o -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list
arm-linux-gnueabi-objcopy -O binary notmain.elf notmain.bin
or
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m0 -mthumb -c notmain.c -o notmain.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o notmain.o -o notmain.elf
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy -O binary notmain.elf notmain.bin
You don't always need -nostdlib, -nostartfiles, and -ffreestanding, but sometimes they come in handy.
Bare-metal leaves the programmer with a lot of freedom and room for personal preference, so some folks may comment very negatively on this answer. Some may not. I recommend you learn at this level first, and then the API level and the levels in between.
ARM makes cores, not chips, so you will want the documentation for the ARM core. In this case the ST manual will tell you this is a Cortex-M3, the ARM technical reference manual for that core will tell you it is based on the ARMv7-M architecture, and that is covered in the ARMv7-M architectural reference manual from ARM (infocenter.arm.com). Then you want the ST datasheet for the part and the reference manual for the family (RM0008, it looks like, for this one). At this time, when you click on reference manuals, you are pointed at interesting manuals you neither need nor want; scroll up to find the reference manual. That's the one you want and where you spend almost all your time.
The Cortex-Ms use a vector table at ARM's address 0x00000000 to boot. The first entry is a freebie to load the stack pointer: whatever is there goes into the stack pointer, and you can certainly change the stack pointer in the bootstrap if you prefer. Then there is the reset vector and some others. The vectors are the address of the handler ORed with 1. It's a Thumb instruction set vs ARM instruction set thing; research this yourself.
In this example the bootstrap simply jumps to the C entry point notmain. Some compilers in the past have added junk when linked with main(); to avoid that, just don't call it main(). This trivial bootstrap means you can't expect
unsigned int x = 5;
unsigned int y;
to have 5 and 0 respectively when you start your C code. For now that is easy to do without. You can complicate the linker script and bootstrap to add support for .data and to zero .bss per your preferences. No need for that in this example.
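If you do want that support later, the bootstrap side could look something like this minimal sketch. The function name and all five pointers are assumptions: in a real build they would come from symbols you add to the linker script, and flash.ld above defines none of them (nor a .data section).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the example above. It would be
 * called from the bootstrap before notmain().
 *   data_load            where the initial values of .data sit in flash
 *   data_start/data_end  where .data lives in ram
 *   bss_start/bss_end    where .bss lives in ram
 */
void init_data_bss ( const uint32_t *data_load,
                     uint32_t *data_start, uint32_t *data_end,
                     uint32_t *bss_start, uint32_t *bss_end )
{
    /* copy the initial values from flash to ram */
    while(data_start<data_end) *data_start++ = *data_load++;
    /* zero everything in .bss */
    while(bss_start<bss_end) *bss_start++ = 0;
}
```

With something like this in place, x and y above would hold 5 and 0 by the time the C code runs.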
The datasheet shows that PC13 and pretty much all of the GPIO pins default to GPIO mode, so there is no need to set an alternate function; just go do GPIO things.
Early in the reference manual is a list of base addresses for the various modules in this part/family.
The reference manual shows that RCC_APB2ENR bit 4 controls the clock enable to the GPIOC block, so do a read-modify-write to set that bit and enable the clocks.
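The point of the read-modify-write is that every other enable bit in the register keeps its value. A small sketch of just the modify step, written as a pure function so you can check it on the host; the function and macro names are made up for illustration.

```c
#include <stdint.h>

#define APB2ENR_GPIOC_EN ((uint32_t)1<<4)   /* bit 4, port c clock enable */

/* Given the value read from RCC_APB2ENR, return the value to write back. */
uint32_t apb2enr_enable_gpioc ( uint32_t apb2enr )
{
    return(apb2enr|APB2ENR_GPIOC_EN);       /* other bits untouched */
}
```

On the part this would be used as PUT32(RCCBASE+0x18,apb2enr_enable_gpioc(GET32(RCCBASE+0x18))); which is the same thing notmain() does inline.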
Now you can talk to the GPIO peripheral. Most of the STM32 family uses a different GPIO controller; some of these older/early STM32s have this one-register deal vs a two-register deal. Either way works. For the CNF bits of this pin we want general purpose push-pull, and for MODE any one of the output settings, as speed doesn't matter here.
The pin is configured.
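The same configuration step can be written as a generic helper, shown here as a sketch. The function name is hypothetical. It assumes the layout used in notmain(): four bits per pin, MODE in the low two and CNF in the high two, with pins 8 to 15 in the register at offset 0x04 (and, by assumption from the reference manual, pins 0 to 7 in the one at offset 0x00).

```c
#include <stdint.h>

/*
 * Return the configuration register value with one pin's field replaced.
 * cr is the value read from the register that holds this pin.
 */
uint32_t gpio_cr_field ( uint32_t cr, unsigned int pin,
                         unsigned int mode, unsigned int cnf )
{
    unsigned int shift = (pin&7)*4;             /* 4 bits per pin */
    uint32_t field = ((cnf&3)<<2)|(mode&3);     /* CNF above MODE */

    cr &= ~((uint32_t)0xF<<shift);              /* clear the old field */
    cr |= field<<shift;                         /* insert the new one */
    return(cr);
}
```

gpio_cr_field(ra,13,1,0) gives the same result as the four PC13 lines in notmain(): shifts of 20 and 22, MODE 1, CNF 0.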
The BSRR register is pretty cool: you write ones to one or more bits in the register to change the state of those pins, and the logic basically does the read-modify-write for you on the control bit that determines whether the pin is high or low.
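To make that concrete, here is a sketch of the values notmain() writes, plus a host-side model of what the peripheral does with them. The function names are made up, and the model is only my reading of the reference manual (low half sets, high half resets, set wins if both are written), not something you would run on the part.

```c
#include <stdint.h>

/* Value to write to BSRR to drive one pin high. */
uint32_t bsrr_set ( unsigned int pin )
{
    return((uint32_t)1<<(pin&15));
}

/* Value to write to BSRR to drive one pin low. */
uint32_t bsrr_reset ( unsigned int pin )
{
    return((uint32_t)1<<((pin&15)+16));
}

/* Model only: the output state after a write to BSRR. */
uint32_t odr_after_bsrr ( uint32_t odr, uint32_t bsrr )
{
    odr &= ~(bsrr>>16);     /* apply the reset half */
    odr |= bsrr&0xFFFF;     /* apply the set half last, so set wins */
    return(odr&0xFFFF);
}
```

Pins with a zero in both halves are left alone, which is why no read is needed on the software side.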
The loop calling the dummy() function simply burns time. Because dummy() lives in a separate object, the compiler can't optimize this loop away. If you didn't make the call this way, the loop could be treated as dead code and removed; then you have no delay and no blink, and the LED might glow at best. There are other ways to do this, but this one allows for an optimized loop counter and is lower risk than other solutions.
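For comparison, one of those other ways is a volatile loop counter. This is a sketch of the alternative, not what the example uses, and the function name is made up. The compiler has to keep the loop because every access to the counter is a volatile access, but that also means the counter lives in memory rather than in a register.

```c
/*
 * Alternative delay, assumed here for illustration only.
 * Returns the final count so the caller can see it ran to completion.
 */
unsigned int delay_volatile ( unsigned int count )
{
    volatile unsigned int n;

    for(n=0;n<count;n++) { }    /* each test and increment touches memory */
    return(n);
}
```

The count still has to be hand tuned, and it will not be the same number that works for the dummy() version.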
You have to hand tune the value it counts to in order to "see" the LED blink. You can then double it, quadruple it, etc., rebuild, and reprogram, and get a warm fuzzy feeling that you are changing the firmware on the part.
Look at the disassembly to confirm it will have a chance of booting and not just fail or, worse, brick the part.
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000017 stmdaeq r0, {r0, r1, r2, r4}
800000c: 08000017 stmdaeq r0, {r0, r1, r2, r4}
08000010 <reset>:
8000010: f000 f808 bl 8000024 <notmain>
8000014: e7ff b.n 8000016 <hang>
08000016 <hang>:
8000016: e7fe b.n 8000016 <hang>
The vector table is at the right address (yes, not 0x00000000, hang on) and the vectors are the handler addresses ORed with one (.thumb_func makes that happen by declaring the next label as a function address).
The application flash in most STM32 parts is at address 0x08000000 (some have other/faster flashes but also support this address space). Depending on the boot pins and how the part is booting, it will, as far as we are concerned, mirror what is at 0x08000000 to 0x00000000. So when the ARM reads 0x00000004 it sees 0x08000011, strips the lsbit, and jumps to 0x08000010, our reset handler, which also moves the processor into the right address space. On some of the STM32 parts with larger flashes, the full address space is only at the application address, not at zero, so you want to build for the application space on that part. Some chip vendors don't do this and some have their own other address spaces; you gotta read the docs. Some of the mbed-compatible Nucleo boards won't load the flash if you have built for zero or an incorrect address; I guess it looks at the vector table before programming the part. (With a Nucleo board, which I highly recommend, you simply copy the .bin file over to the virtual drive the board makes when you plug it in, and the debug MCU programs the target MCU for you. Super simple.)
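The arithmetic in that boot sequence is small enough to check by hand, or with a sketch like the one below. The function names are hypothetical, and flash_alias() assumes the boot mode where flash is mirrored at zero.

```c
#include <stdint.h>

#define FLASHBASE 0x08000000u   /* application flash, as in flash.ld */

/* What ends up in the vector table for a thumb handler. */
uint32_t vector_for_handler ( uint32_t handler )
{
    return(handler|1);
}

/* Where execution starts for a given vector table entry. */
uint32_t handler_for_vector ( uint32_t vector )
{
    return(vector&~(uint32_t)1);
}

/* The flash address behind a read from the mirror at zero. */
uint32_t flash_alias ( uint32_t addr )
{
    return(FLASHBASE+addr);
}
```

Run against the disassembly above: reset at 0x08000010 gives the 0x08000011 stored at 0x08000004, and hang at 0x08000016 gives the 0x08000017 in the two entries after it.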
Getting this binary onto your part is something you will have to figure out. There is a factory bootloader that uses one of the UARTs and other interfaces. The UART one is not hard to write a program for and program the part that way, or you can find a tool someone else wrote, or maybe fake the GUI into loading the binary even if you don't use the GUI to build it. I'll let you sort this out; it's all part of working at this level.
Lots of folks don't want to do the read-modify-write. There are times when that is fine, but in general do it. I have bricked parts with shortcuts like that such that I couldn't recover them. Fortunately, before the STM32Gxxxx parts, the boot pins (BOOT0, sometimes BOOT1) provide a safe backdoor to unbrick the part, assuming you always provide some form of access to them on your boards. The newer parts are moving away from this (the STM32Gs give you one shot on a new/erased app flash part, and after that it's locked, so you want that first program to change the non-volatile bits to allow the bootloader). I suspect that, as with Atmel's ARM-based parts, the factory bootloader is not long for this world in the STM32 parts. We'll see.
Start with Nucleo boards if you like, but eventually invest in one or more USB-to-UART solutions as well as a debug front end. The wider/bigger (but still as cheap as $10) Nucleo boards have a debug end: remove the jumpers and you can use that debugger to program ST and other branded Cortex-M parts with free tools like OpenOCD. There are other SWD solutions. Eventually your toolbox should have jumper wires with mixtures of male/female combinations on the ends, plus USB UART and USB SWD solutions.
CMSIS is like the unified syntax: a mostly failed attempt by ARM to control something they shouldn't be interfering with. It tries to make the header files and some other parts of the software solution look common across the vendors that use their cores. It has taken some badly written vendor libraries/headers and simply made them that much worse. But the idea is that, as you have started to do, you get some header file, it gives you some maybe scary, maybe tolerable way of accessing registers, and then you need more code.
Chip vendors at this level pretty much have to provide libraries and tools, as customers expect that. This is a really old part, so the tools of its day are probably not even available. Vendors need to keep things fresh as they add new parts, get you to feel you need the latest tool, move you away from the old tool, and so on. If you avoid their stuff you don't have these problems, but you then have to be able to read the docs and program the parts yourself. Generally easy.
Professionally you should be able to work at all levels, so you should be aware of and practice the various solutions (though sometimes that is best done by buying the right board/part for the current version of the tool/library rather than trying to shim it into something older or not completely supported).
At this lower/direct level you can then examine the libraries (often scary when you get in there, but not always) and figure out why the library is broken or why you can or can't do this, etc. And that's the goal here: get early success so you don't give up, then go and look at the libraries, whose startup routines often initialize way more stuff, which you need for complicated applications but not for simple ones like this. They should then have GPIO pin routines to configure a pin or a whole port as an output, an input, some alternate function, etc.
This is an output, blink-the-LED example. If you want an input, examine the reference manual and see if the reset value is an input. It usually is, or sometimes it's an analog input and you want to change it to a floating input or pull-up/pull-down.
Then you sample the input data register, mask off the individual pin(s) you are interested in, and act on that, being careful not to allow the compiler to optimize out future accesses (one of the many reasons I use an abstraction that doesn't get optimized out; there are other reasons).
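The mask step is just a shift and an AND. A minimal sketch, with a made-up function name; the 0x08 offset used in the note below is my reading of the reference manual for the input data register on this family and does not appear in the example, so check it against your copy.

```c
#include <stdint.h>

/* Return 1 if the pin reads high in a sampled input data register, else 0. */
unsigned int pin_is_high ( uint32_t idr, unsigned int pin )
{
    return((idr>>(pin&15))&1);
}
```

On the part you would feed it a fresh sample each time, for example pin_is_high(GET32(GPIOCBASE+0x08),13). Since GET32 is a real call into flash.s, the compiler cannot cache or drop the read.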
Most MCU vendors have clock enables for the various peripherals/blocks; not always, but a lot of the time. Often the block has its clocks gated but is not necessarily in reset (the part powers up not consuming a lot of power, and you turn on the things you want to use rather than scramble to turn everything else off). With some vendors the address space is in the datasheet and the register offsets and bits are in some form of reference manual, not necessarily by that name. Some do it all in the datasheet. Some have two docs both called the datasheet, but one is a smaller file than the other, and you want the big file.
This example is built for the Cortex-M0, which uses ARMv6-M instructions. Those are pretty close to the original ARMv4T Thumb instruction set, and to date the ARMv6-M instructions work on all Cortex-Ms, but the ARMv7-M ones don't. If you borrow ARMv7-M code from someone else or from yourself, you can run into faults because the build contains instructions that your core doesn't support; it is quite frustrating to fail to boot immediately. You can use the larger ARMv7-M instruction set to add performance later (not necessarily smaller code; it can go either way on size, and performance isn't guaranteed, the switch can be slower too).
Good luck.
If you want to dabble in the vendor's library and toolchain solutions, I highly recommend you pick a part/dev board that is currently supported and whose examples build nice and clean, before buying the board. Then use that board to learn the libraries/tool. Have one set of problems; don't amplify/multiply them by fighting the tool while also trying to learn the part. Ideally learn the part or the library/tool, not both, and not all three at the same time. The STM32F103 is not the part you want to start with if you want to use the tools. It is the part you want to temporarily play with, as you can get boards from Asia ready to use for a couple of bucks, but you need five to twenty dollars in other stuff to make that board work, plus experience. Get a $10 Nucleo, maybe something from Atmel and/or the microbit or some others, a TI LaunchPad, etc., and dabble in the different tools and libraries. See where CMSIS overlaps across the vendors and where it doesn't. And realistically, get the tools and play with them without or before buying any parts. If the tools don't just work, or don't have some blink-the-LED example that builds clean and easy, move on to the next one. Then buy a board/part.