I'm not sure how well this applies to modern CPUs and operating systems, but here is some interesting research from Liedtke, from 1995 and 1997, showing the overhead of system calls:
For measuring the system-call overhead, getpid, the shortest Linux system call, was examined. To measure its cost under ideal circumstances, it was repeatedly invoked in a tight loop. Table 2 shows the consumed cycles and the time per invocation derived from the cycle numbers. The numbers were obtained using the cycle counter register of the Pentium processor.
Linux: 223 cycles = 1.68 µs (133 MHz Pentium)
The Performance of µ-Kernel-Based Systems, Liedtke et al., 1997, http://os.inf.tu-dresden.de/pubs/sosp97/.
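Out of curiosity, here's a minimal sketch of the same tight-loop technique on a modern x86-64 Linux machine. This is my own illustration, not code from the paper, and on today's out-of-order CPUs the TSC reading is only approximate without serializing instructions, but it gives the same order-of-magnitude picture:

    /* Sketch: time getpid in a tight loop via the TSC, as in the paper.
     * Assumes x86-64 Linux with GCC/Clang; counts are approximate because
     * nothing here serializes the out-of-order pipeline. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <x86intrin.h>              /* __rdtsc() */

    int main(void)
    {
        enum { N = 1000000 };

        /* Warm up so the loop measures steady-state cost, not cold misses. */
        for (int i = 0; i < 1000; i++)
            syscall(SYS_getpid);

        unsigned long long start = __rdtsc();
        for (int i = 0; i < N; i++)
            syscall(SYS_getpid);        /* raw syscall, bypassing any libc shortcut */
        unsigned long long elapsed = __rdtsc() - start;

        printf("%.1f cycles per getpid\n", (double)elapsed / N);
        return 0;
    }

Compile with gcc -O2; dividing the resulting cycle count by your clock rate gives a time per invocation comparable to the table's 1.68 µs figure.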
It is widely believed that switching between kernel and user mode, between address spaces and between threads is inherently expensive. Some measurements seem to support this belief.
Ousterhout measured the costs for executing the "null" kernel call getpid. Since the real getpid operation consists only of a few loads and stores, this method measures the basic costs of a kernel call. Normalized to a hypothetical machine with a 10 MIPS rating... he showed that most machines need 20-30 µs per getpid; one even required 63 µs. Corroborating these results, we measured 18 µs per Mach µ-kernel call get_self_thread. In fact, the measured kernel-call costs are high.
For analyzing the measured costs, our argument is based on a 486 (50 MHz) processor. We take an x86 processor because kernel/user mode switches are extremely expensive on these processors. In contrast to the worst-case processor, we use a best-case measurement for the discussion: 18 µs for Mach on a 486/50.
The measured cost per kernel call is 18 × 50 = 900 cycles. The bare machine instruction for entering kernel mode costs 71 cycles, followed by an additional 36 cycles for returning to user mode. These two instructions switch between the user and kernel stacks and push/pop the flags register and instruction pointer. 107 cycles (about 2 µs) is therefore a lower bound on a kernel/user mode switch. The remaining 800 or more cycles are pure kernel overhead. By this term, we denote all cycles which are solely due to the construction of the kernel, no matter whether they are spent executing instructions (800 cycles ≈ 500 instructions) or in cache and TLB misses (800 cycles ≈ 270 primary cache misses ≈ 90 TLB misses). We have to conclude that the measured kernels do a lot of work when entering and exiting the kernel. Note that this work by definition has no net effect.
On µ-Kernel Construction, Liedtke, 1995, http://os.ibds.kit.edu/downloads/publ_1995_liedtke_ukernel-construction.pdf.
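To make the arithmetic in that passage explicit, here is the same accounting restated as a trivial calculation. Every figure comes from the quote above; nothing here is my own measurement:

    /* Liedtke's cost accounting for one Mach kernel call on a 486/50,
     * restated as a calculation. All numbers are from the quoted text. */
    #include <stdio.h>

    int main(void)
    {
        const double mhz       = 50.0;  /* 486 clock rate                      */
        const double call_us   = 18.0;  /* measured time per kernel call       */
        const int    enter_cyc = 71;    /* instruction entering kernel mode    */
        const int    exit_cyc  = 36;    /* instruction returning to user mode  */

        double total    = call_us * mhz;        /* 18 µs x 50 MHz = 900 cycles    */
        int    hw_floor = enter_cyc + exit_cyc; /* 107-cycle hardware lower bound */
        double overhead = total - hw_floor;     /* ~800 cycles of pure kernel
                                                   overhead                       */

        printf("total %.0f cycles, hardware floor %d, kernel overhead %.0f\n",
               total, hw_floor, overhead);
        return 0;
    }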