workqueue: Use atomic_try_cmpxchg_relaxed() in tryinc_node_nr_active()

Use the try_cmpxchg() family of locking primitives instead of the
open-coded cmpxchg(*ptr, old, new) == old pattern.

The x86 CMPXCHG instruction returns success in the ZF flag, so this
change saves a compare after CMPXCHG (and the related move instruction
in front of CMPXCHG).

Also, try_cmpxchg() implicitly assigns the old *ptr value to "old" when
CMPXCHG fails, so there is no need to re-read the value in the loop.
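
As a rough illustration of the two loop shapes, here is a minimal userspace
sketch using C11 atomics rather than the kernel primitives. The struct
node_nr_active type and the tryinc_old()/tryinc_new() names are hypothetical
stand-ins, not the workqueue code itself; atomic_compare_exchange_strong_explicit()
is used only because, like try_cmpxchg(), it writes the observed value back
into "old" on failure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in: only the two fields the loop touches. */
struct node_nr_active {
	int max;        /* upper limit for nr */
	atomic_int nr;  /* current count */
};

/*
 * cmpxchg()-style loop: the primitive hands back the previous value,
 * the caller compares it against "old" and re-reads nr on every retry.
 */
static bool tryinc_old(struct node_nr_active *nna)
{
	int max = nna->max;

	while (true) {
		int old, prev;

		old = atomic_load_explicit(&nna->nr, memory_order_relaxed);
		if (old >= max)
			return false;

		prev = old;
		atomic_compare_exchange_strong_explicit(&nna->nr, &prev, old + 1,
							memory_order_relaxed,
							memory_order_relaxed);
		if (prev == old)	/* explicit compare after the exchange */
			return true;
	}
}

/*
 * try_cmpxchg()-style loop: the boolean result drives the loop, and a
 * failed exchange has already refreshed "old", so nr is loaded only once.
 */
static bool tryinc_new(struct node_nr_active *nna)
{
	int max = nna->max;
	int old = atomic_load_explicit(&nna->nr, memory_order_relaxed);

	do {
		if (old >= max)
			return false;
	} while (!atomic_compare_exchange_strong_explicit(&nna->nr, &old, old + 1,
							  memory_order_relaxed,
							  memory_order_relaxed));

	return true;
}
```

Both variants are meant to behave identically; the second one simply lets
the compiler branch on the result of the exchange directly.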

The generated assembly improves from:

     3f7:	44 8b 0a             	mov    (%rdx),%r9d
     3fa:	eb 12                	jmp    40e <...>
     3fc:	8d 79 01             	lea    0x1(%rcx),%edi
     3ff:	89 c8                	mov    %ecx,%eax
     401:	f0 0f b1 7a 04       	lock cmpxchg %edi,0x4(%rdx)
     406:	39 c1                	cmp    %eax,%ecx
     408:	0f 84 83 00 00 00    	je     491 <...>
     40e:	8b 4a 04             	mov    0x4(%rdx),%ecx
     411:	41 39 c9             	cmp    %ecx,%r9d
     414:	7f e6                	jg     3fc <...>

to:

    256b:	45 8b 08             	mov    (%r8),%r9d
    256e:	41 8b 40 04          	mov    0x4(%r8),%eax
    2572:	41 39 c1             	cmp    %eax,%r9d
    2575:	7e 10                	jle    2587 <...>
    2577:	8d 78 01             	lea    0x1(%rax),%edi
    257a:	f0 41 0f b1 78 04    	lock cmpxchg %edi,0x4(%r8)
    2580:	75 f0                	jne    2572 <...>

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>