The instruction set is the interface specification -- given a word of the value x, the ARM CPU will do y.
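To make the "word x means action y" idea concrete, here is a minimal sketch in Python that decodes two fields of a classic 32-bit ARM data-processing instruction. The helper name `decode_data_processing` is made up for illustration. It only handles the plain register-operand form: it ignores shifts, immediate operands and the S bit, and does not check that the word really is a data-processing instruction.

```python
# Condition mnemonics, selected by bits 31..28 of the instruction word
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]
# Data-processing opcodes, selected by bits 24..21
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]


def decode_data_processing(word):
    """Hypothetical helper: render a register-operand instruction as text."""
    cond = CONDITIONS[(word >> 28) & 0xF]
    opcode = OPCODES[(word >> 21) & 0xF]
    rn = (word >> 16) & 0xF   # first source register
    rd = (word >> 12) & 0xF   # destination register
    rm = word & 0xF           # second source register (shift bits ignored)
    suffix = "" if cond == "AL" else cond  # "always" is written without suffix
    return f"{opcode}{suffix} r{rd}, r{rn}, r{rm}"


print(decode_data_processing(0xE0810002))  # ADD r0, r1, r2
print(decode_data_processing(0x00810002))  # ADDEQ r0, r1, r2
```

The two words differ only in the top four bits, which is the condition field that makes almost every classic ARM instruction conditional.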
The architecture is an implementation of this interface. There can be different implementations, but the one by ARM Ltd. was developed alongside the interface specification and is most likely one of the best-optimized ones.
The instruction set has been designed to allow the architecture to be realized with minimal effort, by keeping instructions simple enough that a short pipeline (3 stages for ARM7, 5 stages for ARM9) with (almost) no feedback to earlier stages can be used to implement it, which gives a good performance/power trade-off.
The x86 instruction set, in contrast, is basically legacy-defined, from an originally microcoded architecture with fairly good code density. Older x86-compatible architectures were fairly slow, because execution times vary between instructions, which makes it difficult to parallelize instruction processing; newer implementations preprocess the code into an instruction stream that can be pipelined better, but this technique has its limits.
The Itanium instruction set, on the other hand, has a lot of similarities to ARM -- every instruction can be prefixed with a condition, and individual instructions cannot perform complex operations like combinations of memory accesses and arithmetic. It has been optimized further towards high performance, though, which made it really difficult to write correct assembler programs: values fetched from memory or computation results are not yet available to the instructions that immediately follow, because the earlier instruction has not fully completed when the later instruction looks up its operands.
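The kind of hazard described here can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions, not a model of Itanium or of any real pipeline: every result becomes visible a fixed number of instruction slots after issue, registers start at 0, and the names `run`, `HAZARD` and `FIXED` are made up for the example.

```python
def run(program, latency=2):
    """Toy machine: a result becomes visible `latency` slots after issue.

    Each instruction is (dest, func, source_registers); None is a no-op.
    """
    regs = {}
    pending = []  # (slot at which the value becomes visible, dest, value)
    for slot, instr in enumerate(program):
        # Commit every result whose delay has elapsed by this slot
        waiting = []
        for ready, dest, value in pending:
            if ready <= slot:
                regs[dest] = value
            else:
                waiting.append((ready, dest, value))
        pending = waiting
        if instr is None:
            continue
        dest, func, sources = instr
        # Operands are read now; unwritten registers read as 0
        operands = [regs.get(name, 0) for name in sources]
        pending.append((slot + latency, dest, func(*operands)))
    # Drain results still in flight at the end of the program
    for _, dest, value in sorted(pending, key=lambda entry: entry[0]):
        regs[dest] = value
    return regs


load_five = ("r1", lambda: 5, [])
add_one = ("r2", lambda a: a + 1, ["r1"])

HAZARD = [load_five, add_one]        # add_one reads r1 too early
FIXED = [load_five, None, add_one]   # one no-op lets the load complete

print(run(HAZARD)["r2"])  # 1, computed from the stale r1
print(run(FIXED)["r2"])   # 6
```

On a machine without hardware interlocks, inserting the no-op (or some independent instruction) is the programmer's or compiler's job, which is what makes hand-written assembler error-prone.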
The ARM instruction set is a good trade-off -- a working implementation can be realized with rather few transistors and can be clocked at speeds in the GHz range, but the constraints that pipelining places on well-formed programs do not yet show up excessively, so it is still possible to write assembler by hand.