I know this question is a bit old, but I recently had to research the same thing myself while implementing AES128 on a PIC16 and an 8051.
I've used something like this: http://cs.ucsb.edu/~koc/cs178/projects/JT/aes.c
and my RAM usage is a couple of hundred bytes, with a binary size of less than 3 KB of ROM.
My best advice is to read up on the Wikipedia page http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
and understand the different modes, for instance how AES in OFB mode uses plain single-block encryption (the operation ECB mode applies to every block) as its basic building block.
Also, the XOR'ing in OFB mode makes it a symmetric operation: encrypt and decrypt are the same function, and the inverse cipher is never needed, which also saves space.
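To make the OFB point concrete, here is a minimal sketch of how I'd structure it, not the code from the link above. The names `ofb_xcrypt`, `block_encrypt_fn` and `toy_block_encrypt` are made up for illustration, and `toy_block_encrypt` is only a placeholder standing in for a real AES128 single-block encrypt (it is not AES and not secure).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical interface: encrypt one 16-byte block in place under `key`.
   Only the forward (encrypt) direction is ever needed for OFB. */
typedef void (*block_encrypt_fn)(uint8_t block[BLOCK_SIZE], const uint8_t *key);

/* OFB: the keystream is IV -> E(IV) -> E(E(IV)) -> ...
   The data is XORed with the keystream, so calling this once encrypts
   and calling it again with the same key and IV decrypts. */
void ofb_xcrypt(block_encrypt_fn encrypt, const uint8_t *key,
                const uint8_t iv[BLOCK_SIZE], uint8_t *data, size_t len)
{
    uint8_t stream[BLOCK_SIZE];
    size_t i;

    memcpy(stream, iv, BLOCK_SIZE);
    for (i = 0; i < len; i++) {
        if (i % BLOCK_SIZE == 0) {
            encrypt(stream, key);      /* produce the next keystream block */
        }
        data[i] ^= stream[i % BLOCK_SIZE];
    }
}

/* PLACEHOLDER ONLY: not AES, not secure. It exists so the sketch
   compiles on its own; swap in a real AES128 block encrypt here. */
void toy_block_encrypt(uint8_t block[BLOCK_SIZE], const uint8_t *key)
{
    size_t i;

    for (i = 0; i < BLOCK_SIZE; i++) {
        block[i] = (uint8_t)((block[i] ^ key[i]) + 0x3bu + i);
    }
}
```

Calling `ofb_xcrypt(toy_block_encrypt, key, iv, buf, len)` twice with the same key and IV should give back the original buffer. Since nothing ever calls a block decrypt, the inverse cipher can be left out of the build entirely.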
Once I understood how AES really works, I could implement it in C, implement only what I absolutely needed, and then test it against the NIST specification** (do this! A lot of the code found online is flawed).
I was able to fit AES128 on an 8051 alongside some other RF firmware by doing this customization and optimization. The RAM usage (for the whole system) went down from ~2.5 KB to just below 2 KB, meaning we did not have to upgrade to an 8051 with 4 KB SRAM, but could keep using the cheaper 2 KB SRAM version.
** Test Vectors are in Appendix F in: http://csrc.nist.gov/publications/nistpubs/800-38a/addendum-to-nist_sp800-38A.pdf
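As a rough idea of what such a check can look like, here is a small known-answer test sketch. The key, plaintext and ciphertext are the first ECB-AES128 example block from NIST SP 800-38A (Appendix F.1.1). The function names and the `aes128_encrypt_fn` signature are my own assumptions, so adapt them to whatever your implementation exposes. `identity_cipher` is a deliberately wrong stand-in, included only to show the check rejecting a broken cipher.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical signature of the single-block encrypt routine under test. */
typedef void (*aes128_encrypt_fn)(const uint8_t key[16],
                                  const uint8_t in[16],
                                  uint8_t out[16]);

/* ECB-AES128.Encrypt, block #1, NIST SP 800-38A Appendix F.1.1. */
static const uint8_t kat_key[16] = {
    0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6,
    0xab, 0xf7, 0x15, 0x88, 0x09, 0xcf, 0x4f, 0x3c
};
static const uint8_t kat_plain[16] = {
    0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96,
    0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a
};
static const uint8_t kat_cipher[16] = {
    0x3a, 0xd7, 0x7b, 0xb4, 0x0d, 0x7a, 0x36, 0x60,
    0xa8, 0x9e, 0xca, 0xf3, 0x24, 0x66, 0xef, 0x97
};

/* Returns 1 if `encrypt` reproduces the NIST ciphertext, 0 otherwise. */
int aes128_kat_passes(aes128_encrypt_fn encrypt)
{
    uint8_t out[16];

    memset(out, 0, sizeof out);
    encrypt(kat_key, kat_plain, out);
    return memcmp(out, kat_cipher, sizeof out) == 0;
}

/* Deliberately wrong stand-in: copies the input unchanged.
   A real AES128 block encrypt goes here instead. */
void identity_cipher(const uint8_t key[16], const uint8_t in[16],
                     uint8_t out[16])
{
    (void)key;
    memcpy(out, in, 16);
}
```

With a correct AES128 implementation plugged in, `aes128_kat_passes(...)` should return 1. With the stand-in above it returns 0, which is the kind of failure you want to catch before the code goes anywhere near a device.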
EDIT:
Finally got the code on GitHub: https://github.com/kokke/tiny-AES-c
I've optimized a bit for size. GCC size output when compiled for ARM:
$ arm-none-eabi-gcc -O2 -c aes.c -o aes.o
$ size aes.o
   text    data     bss     dec     hex filename
   1024       0     204    1228     4cc aes.o
So the resource usage is now 1 KB of code and 204 bytes of RAM.
I don't remember how to build for the PIC, but if the 8-bit Atmel AVR ATmega16 is anything like the PIC, the resource usage is:
$ avr-gcc -Wall -Wextra -mmcu=atmega16 -O2 -c aes.c -o aes.o
$ avr-size aes.o
   text    data     bss     dec     hex filename
   1553       0     198    1751     6d7 aes.o
So about 1.5 KB of code and 198 bytes of RAM.