)]}'
{
  "commit": "6ffb9017e9329168b3b4216d15def8e78e1b1fac",
  "tree": "0e72d4e7d107eadefbb6c4f2d1763e1d3108e983",
  "parents": [
    "ae0a457f5d33c336f3c4259a258f8b537531a04b",
    "60ba5b3ed7278a5700c8d57c3f5486b6066f745c"
  ],
  "author": {
    "name": "Alexei Starovoitov",
    "email": "ast@kernel.org",
    "time": "Tue Mar 18 10:28:24 2025 -0700"
  },
  "committer": {
    "name": "Alexei Starovoitov",
    "email": "ast@kernel.org",
    "time": "Wed Mar 19 08:03:06 2025 -0700"
  },
  "message": "Merge branch \u0027resilient-queued-spin-lock\u0027\n\nKumar Kartikeya Dwivedi says:\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nResilient Queued Spin Lock\n\nChangelog:\n----------\nv3 -\u003e v4\nv4: https://lore.kernel.org/bpf/20250303152305.3195648-1-memxor@gmail.com\n\n * Fix bisectability problem by reordering locktorture commit before\n   Makefile commit.\n * Add EXPORT_SYMBOL_GPL to all used symbols and variables by consumers.\n * Skip BPF selftest when nrprocs \u003c 2.\n * Fix kdoc to describe return value for res_spin_lock, slowpath.\n * Move kernel/locking/rqspinlock.{c,h} to kernel/bpf/rqspinlock.{c,h}.\n\nv2 -\u003e v3\nv2: https://lore.kernel.org/bpf/20250206105435.2159977-1-memxor@gmail.com\n\n * Add ifdef\u0027s to fallback to Ankur\u0027s patch when it gets in, until then\n   copy-paste the implementation.\n * Change the meaning of RES_DEF_TIMEOUT from two critical section\n   lengths to one for clarity, and use RES_DEF_TIMEOUT * 2 where needed.\n * Use NSEC_PER_SEC as timeout for TAS fallback.\n * Add Closes: tags for known syzbot reports.\n * Change timeout for TAS fallback to 1 second.\n * Fix more kernel test robot errors.\n * More comments about smp_wmb in release_held_lock_entry interaction.\n * Change RES_NR_HELD to 31.\n * Address comments from Peter, Eduard, Alexei.\n\nv1 -\u003e v2\nv1: https://lore.kernel.org/bpf/20250107140004.2732830-1-memxor@gmail.com\n\n * Address nits from Waiman and Peter\n * Fix arm64 WFE bug pointed out by Peter.\n * Fix incorrect memory ordering in release_held_lock_entry, and\n   document subtleties. 
Explain why release is sufficient in unlock\n   but not in release_held_lock_entry.\n * Remove dependence on CONFIG_QUEUED_SPINLOCKS and introduce a\n   test-and-set fallback when queued spinlock support is missing on an\n   architecture.\n * Enforce FIFO ordering for BPF program spin unlocks.\n * Address comments from Eduard on verifier plumbing.\n * Add comments as suggested by Waiman.\n * Refactor paravirt TAS lock to use the implemented TAS fallback.\n * Use rqspinlock_t as the type throughout so that it can be replaced\n   with a non-qspinlock type in case of fallback.\n * Testing and benchmarking on arm64, added numbers to the cover letter.\n * Fix kernel test robot errors.\n * Fix a BPF selftest bug leading to spurious failures on arm64.\n\nIntroduction\n------------\n\nThis patch set introduces Resilient Queued Spin Lock (or rqspinlock with\nres_spin_lock() and res_spin_unlock() APIs).\n\nThis is a qspinlock variant which recovers the kernel from a stalled\nstate when the lock acquisition path cannot make forward progress. This\ncan occur when a lock acquisition attempt enters a deadlock situation\n(e.g. AA, or ABBA), or more generally, when the owner of the lock (which\nwe’re trying to acquire) isn’t making forward progress.\n\nThe cover letter provides an overview of the motivation, design, and\nalternative approaches. We then provide evaluation numbers showcasing\nthat while rqspinlock incurs overhead, the performance of rqspinlock\napproaches that of the normal qspinlock used by the kernel.\n\nThe evaluations for rqspinlock were performed by replacing the default\nqspinlock implementation with it and booting the kernel to run the\nexperiments. Support for locktorture is also included with numbers in\nthis series.\n\nThe cover letter\u0027s design section provides an overview of the\nalgorithmic approach. 
A technical document describing the implementation\nin more detail is available here:\nhttps://github.com/kkdwivedi/rqspinlock/blob/main/rqspinlock.pdf\n\nWe have a WIP TLA+ proof for liveness and mutual exclusion of rqspinlock\nbuilt on top of the qspinlock TLA+ proof from Catalin Marinas [3]. We\nwill share more details and the links in the near future.\n\nMotivation\n----------\n\nIn regular kernel code, usage of locks is assumed to be correct, so as\nto avoid deadlocks and stalls by construction. However, the same is not\ntrue for BPF programs. Users write normal C code and the in-kernel eBPF\nruntime ensures the safety of the kernel by rejecting unsafe programs.\nUsers can upload programs that use locks in an improper fashion, and may\ncause deadlocks when these programs run inside the kernel. The verifier\nis responsible for preventing such programs from being loaded into the\nkernel.\n\nUntil now, the eBPF verifier ensured deadlock safety by only permitting\none lock acquisition at a time, and by preventing any functions from\nbeing called from within the critical section. Additionally, only a few\nrestricted program types are allowed to use spin locks. As the usage of\neBPF grows (e.g. with sched_ext) beyond its conventional application in\nnetworking, tracing, and security, the limitations on locking are\nbecoming a bottleneck for users.\n\nThe rqspinlock implementation allows us to permit more flexible locking\npatterns in BPF programs, without limiting them to the subset that can\nbe proven safe statically (which is fairly small, and requires complex\nstatic analysis), while ensuring that the kernel will recover in case we\nencounter a locking violation at runtime. 
We make a tradeoff here by\naccepting programs that may potentially have deadlocks, and recover the\nkernel quickly at runtime to ensure availability.\n\nAdditionally, eBPF programs attached to different parts of the kernel\ncan introduce new control flow into the kernel, which increases the\nlikelihood of deadlocks in code not written to handle reentrancy. There\nhave been multiple syzbot reports surfacing deadlocks in internal kernel\ncode due to the diverse ways in which eBPF programs can be attached to\ndifferent parts of the kernel.  By switching the BPF subsystem’s lock\nusage to rqspinlock, all of these issues can be mitigated at runtime.\n\nThis spin lock implementation allows BPF maps to become safer and remove\nmechanisms that have fallen short in assuring safety when nesting\nprograms in arbitrary ways in the same context or across different\ncontexts. The red diffs due to patches 16-18 demonstrate this\nsimplification.\n\n\u003e  kernel/bpf/hashtab.c         | 102 ++++++++++++++++++++++++++++++++--------------------------...\n\u003e  kernel/bpf/lpm_trie.c        |  25 ++++++++++++++-----------\n\u003e  kernel/bpf/percpu_freelist.c | 113 +++++++++++++++++++++++++---------------------------------...\n\u003e  kernel/bpf/percpu_freelist.h |   4 ++--\n\u003e  4 files changed, 73 insertions(+), 171 deletions(-)\n\nDesign\n------\n\nDeadlocks mostly manifest as stalls in the waiting loops of the\nqspinlock slow path. Thus, using stalls as a signal for deadlocks avoids\nintroducing cost to the normal fast path, and ensures bounded\ntermination of the waiting loop. Our recovery algorithm is focused on\nterminating the waiting loops of the qspinlock algorithm when it gets\nstuck, and implementing bespoke recovery procedures for each class of\nwaiter to restore the lock to a usable state. 
Deadlock detection is the\nmain mechanism used to provide faster recovery, with the timeout\nmechanism acting as a final line of defense.\n\nDeadlock Detection\n~~~~~~~~~~~~~~~~~~\nWe handle two cases of deadlocks: AA deadlocks (attempts to acquire the\nsame lock again), and ABBA deadlocks (attempts to acquire two locks in\nthe opposite order from two distinct threads). Variants of ABBA\ndeadlocks may be encountered with more than two locks being held in the\nincorrect order. These are not diagnosed explicitly, as they reduce to\nABBA deadlocks.\n\nDeadlock detection is triggered immediately when beginning the waiting\nloop of a lock slow path.\n\nWhile timeouts ensure that any waiting loops in the locking slow path\nterminate and return to the caller, the wait can be excessively long in\nsome situations. While the default timeout is short (0.5s), a stall for\nthis duration inside the kernel can set off alerts for latency-critical\nservices with strict SLOs.  Ideally, the kernel should recover from an\nundesired state of the lock as soon as possible.\n\nA multi-step strategy is used to recover the kernel from waiting loops\nin the locking algorithm which may fail to terminate in a bounded amount\nof time.\n\n * Each CPU maintains a table of held locks. Entries are inserted and\n   removed upon entry into lock, and exit from unlock, respectively.\n * Deadlock detection for AA deadlocks is thus simple: we have an AA\n   deadlock if we find a held lock entry for the lock we’re attempting\n   to acquire on the same CPU.\n * During deadlock detection for ABBA, we search through the tables of\n   all other CPUs to find situations where we are holding a lock the\n   remote CPU is attempting to acquire, and they are holding a lock we\n   are attempting to acquire. 
Upon encountering such a condition, we\n   report an ABBA deadlock.\n * We divide the duration between the point of entry into the waiting\n   loop and the timeout point into intervals of 1 ms, and perform\n   deadlock detection until timeout happens. Upon entry into the slow\n   path, and then completion of each 1 ms interval, we perform detection\n   of both AA and ABBA deadlocks. In the event that deadlock detection\n   yields a positive result, the recovery happens sooner than the\n   timeout.  Otherwise, it happens as a last resort upon completion of\n   the timeout.\n\nTimeouts\n~~~~~~~~\nTimeouts act as a final line of defense against stalls for waiting loops.\nThe ‘ktime_get_mono_fast_ns’ function is used to poll for the current\ntime, and it is compared to the timestamp indicating the end time in the\nwaiter loop. Each waiting loop is instrumented to check an extra\ncondition using a macro. Internally, the macro implementation amortizes\nthe checking of the timeout to avoid sampling the clock in every\niteration.  Specifically, the timeout checks are invoked every 64k\niterations.\n\nRecovery\n~~~~~~~~\nThere is extensive literature in academia on designing locks that\nsupport timeouts [0][1], as timeouts can be used as a proxy for\ndetecting the presence of deadlocks and recovering from them, without\nmaintaining explicit metadata to construct a waits-for relationship\nbetween two threads at runtime.\n\nIn the case of rqspinlock, the key simplification in our algorithm comes\nfrom the fact that upon a timeout, waiters always leave the queue in\nFIFO order.  As such, the timeout is only enforced by the head of the\nwait queue, while other waiters rely on the head to signal them when a\ntimeout has occurred and when they need to exit. 
We don’t have to\nimplement complex algorithms and do not need extra synchronization for\nwaiters in the middle of the queue timing out before their predecessor\nor successor, unlike previous approaches [0][1].\n\nThere are three forms of waiters in the original queued spin lock\nalgorithm.  The first is the waiter which acquires the pending bit and\nspins on the lock word without forming a wait queue. The second is the\nhead waiter that is the first waiter heading the wait queue. The third\nform comprises all the non-head waiters queued behind the head, waiting\nto be signalled through their MCS node to take over the responsibility\nof the head.\n\nIn rqspinlock\u0027s recovery algorithm, we are concerned with the second and\nthird kinds. First, we augment the waiting loop of the head of the wait\nqueue with a timeout. When this timeout happens, all waiters that are\npart of the wait queue will abort their lock acquisition attempts. This\nhappens in three steps.\n\n * First, the head breaks out of its loop waiting for pending and locked\n   bits to turn to 0, and non-head waiters break out of their MCS node\n   spin (more on that later).\n * Next, all waiters (head or non-head) check whether they are also the\n   tail waiter; if so, they attempt to zero out the tail word and allow\n   a new queue to be built up for this lock. If they succeed, they have\n   no one to signal next in the queue to stop spinning.\n * Otherwise, they signal the MCS node of the next waiter to break out\n   of its spin and try resetting the tail word back to 0. This goes on\n   until the tail waiter is found. In case of races, the new tail will\n   be responsible for performing the same task, as the old tail will\n   then fail to reset the tail word and wait for its next pointer to be\n   updated before it signals the new tail to do the same.\n\nTimeout Bound\n~~~~~~~~~~~~~\nThe timeout is applied by two types of waiters: the pending bit waiter\nand the wait queue head waiter. 
As such, for the pending waiter, only\nthe lock owner is ahead of it, and for the wait queue head waiter, only\nthe lock owner and the pending waiter take precedence in executing their\ncritical sections.\n\nWe define the timeout value to span at most 1 critical section length,\nand then use the appropriate value (default, or default x 2) depending\non whether we are the pending waiter or head of wait queue.\n\nTherefore, the waiting loop can span at most 2 critical section\nlengths, and thus, it is unaffected by the amount of contention or the\nnumber of CPUs on the host. Non-head waiters simply wait for the wait\nqueue head to signal them on a timeout.\n\nIn Meta\u0027s production, we have noticed uncore PMU reads and SMIs\nconsuming tens of msecs. While these events are rare, a 0.25 second\ntimeout should absorb such tail events and not raise false alarms for\ntimeouts. We will continue monitoring this in production and adjust the\ntimeout if necessary in the future.\n\nMore details of the recovery algorithm are described in patch 9 and a\ndetailed description is available at [2].\n\nAlternatives\n------------\n\nLockdep: We do not rely on the lockdep facility for reporting violations\nprimarily for two reasons:\n\n* Overhead: The lockdep infrastructure can add significant overhead to\n  the lock acquisition path, and is not recommended for use in\n  production for this reason. While the report is more useful and\n  exhaustive, the overhead can be prohibitive, especially as BPF\n  programs run in hot paths of the kernel.  Moreover, it also increases\n  the size of the lock word to store extra metadata, which is not\n  feasible for BPF spin locks that are 4 bytes in size today (similar to\n  qspinlock).\n\n* Debug Tool: Lockdep is intended to be used as a debugging facility,\n  providing extra context to the user about the locking violations\n  occurring during runtime. 
It is always turned off on all production\n  kernels, therefore isn’t available most of the time.\n\nWe require a mechanism for detecting common variants of deadlocks that\nis always available in production kernels and never turned off. At the\nsame time, it must not introduce overhead in terms of time (for the slow\npath) and memory (for the lock word size).\n\nEvaluation\n----------\n\nWe run benchmarks that stress locking scalability and perform comparison\nagainst the baseline (qspinlock). For the rqspinlock case, we replace\nthe default qspinlock with it in the kernel, such that all spin locks in\nthe kernel use the rqspinlock slow path. As such, benchmarks that stress\nkernel spin locks end up exercising rqspinlock.\n\nEvaluation setup\n~~~~~~~~~~~~~~~~\n\nWe set the CPU governor to performance for all experiments.\n\nNote: Numbers for arm64 have been obtained without the no-WFE fallback\nin this series, to perform a fair comparison with the WFE using\nqspinlock baseline.\n\nx86_64:\n\nIntel Xeon Platinum 8468 (Sapphire Rapids)\n96 cores (48 x 2 sockets)\n2 threads per core, 0-95, siblings from 96-191\n2 NUMA nodes (every 48 cores), 2 LLCs (every 48 cores), 1 LLC per NUMA node\nHyperthreading enabled\n\narm64:\n\nAmpere Max Neoverse-N1 256-Core Processor\n256 cores (128 cores x 2 sockets)\n1 thread per core\n2 NUMA nodes (every 128 cores), 1 L2 per core (256 instances), no shared L3\nNo hyperthreading available\n\nThe locktorture experiment is run for 30 seconds.\nAverage of 25 runs is used for will-it-scale, after an initial warm up.\n\nMore information on the locks contended in the will-it-scale experiments\nis available in the evaluation section of the CNA paper, in table 1 [4].\n\nLegend:\n QL - qspinlock (avg. throughput)\n RQL - rqspinlock (avg. 
throughput)\n\nResults\n~~~~~~~\n\nlocktorture - x86_64\n\nThreads QL\t\tRQL\t\tSpeedup\n-----------------------------------------------\n1\t46910437\t45057327\t0.96\n2\t29871063\t25085034\t0.84\n4\t13876024\t19242776\t1.39\n8\t14638499\t13346847\t0.91\n16\t14380506\t14104716\t0.98\n24\t17278144\t15293077\t0.89\n32\t19494283\t17826675\t0.91\n40\t27760955\t21002910\t0.76\n48\t28638897\t26432549\t0.92\n56\t29336194\t26512029\t0.9\n64\t30040731\t27421403\t0.91\n72\t29523599\t27010618\t0.91\n80\t28846738\t27885141\t0.97\n88\t29277418\t25963753\t0.89\n96\t28472339\t27423865\t0.96\n104\t28093317\t26634895\t0.95\n112\t29914000\t27872339\t0.93\n120\t29199580\t26682695\t0.91\n128\t27755880\t27314662\t0.98\n136\t30349095\t27092211\t0.89\n144\t29193933\t27805445\t0.95\n152\t28956663\t26071497\t0.9\n160\t28950009\t28183864\t0.97\n168\t29383520\t28135091\t0.96\n176\t28475883\t27549601\t0.97\n184\t31958138\t28602434\t0.89\n192\t31342633\t33394385\t1.07\n\nwill-it-scale open1_threads - x86_64\n\nThreads QL      \tQL stddev       stddev% RQL     \tRQL stddev      stddev% 
Speedup\n-----------------------------------------------------------------------------------------------\n1\t1396323.92\t7373.12\t\t0.53\t1366616.8\t4152.08\t\t0.3\t0.98\n2\t1844403.8\t3165.26\t\t0.17\t1700301.96\t2396.58\t\t0.14\t0.92\n4\t2370590.6\t24545.54\t1.04\t1655872.32\t47938.71\t2.9\t0.7\n8\t2185227.04\t9537.9\t\t0.44\t1691205.16\t9783.25\t\t0.58\t0.77\n16\t2110672.36\t10972.99\t0.52\t1781696.24\t15021.43\t0.84\t0.84\n24\t1655042.72\t18037.23\t1.09\t2165125.4\t5422.54\t\t0.25\t1.31\n32\t1738928.24\t7166.64\t\t0.41\t1829468.24\t9081.59\t\t0.5\t1.05\n40\t1854430.52\t6148.24\t\t0.33\t1731062.28\t3311.95\t\t0.19\t0.93\n48\t1766529.96\t5063.86\t\t0.29\t1749375.28\t2311.27\t\t0.13\t0.99\n56\t1303016.28\t6168.4\t\t0.47\t1452656\t\t7695.29\t\t0.53\t1.11\n64\t1169557.96\t4353.67\t\t0.37\t1287370.56\t8477.2\t\t0.66\t1.1\n72\t1036023.4\t7116.53\t\t0.69\t1135513.92\t9542.55\t\t0.84\t1.1\n80\t1097913.64\t11356\t\t1.03\t1176864.8\t6771.41\t\t0.58\t1.07\n88\t1123907.36\t12843.13\t1.14\t1072416.48\t7412.25\t\t0.69\t0.95\n96\t1166981.52\t9402.71\t\t0.81\t1129678.76\t9499.14\t\t0.84\t0.97\n104\t1108954.04\t8171.46\t\t0.74\t1032044.44\t7840.17\t\t0.76\t0.93\n112\t1000777.76\t8445.7\t\t0.84\t1078498.8\t6551.47\t\t0.61\t1.08\n120\t1029448.4\t6992.29\t\t0.68\t1093743\t\t8378.94\t\t0.77\t1.06\n128\t1106670.36\t10102.15\t0.91\t1241438.68\t23212.66\t1.87\t1.12\n136\t1183776.88\t6394.79\t\t0.54\t1116799.64\t18111.38\t1.62\t0.94\n144\t1201122\t\t25917.69\t2.16\t1301779.96\t15792.6\t\t1.21\t1.08\n152\t1099737.08\t13567.82\t1.23\t1053647.2\t12704.29\t1.21\t0.96\n160\t1031186.32\t9048.07\t\t0.88\t1069961.4\t8293.18\t\t0.78\t1.04\n168\t1068817\t\t16486.06\t1.54\t1096495.36\t14021.93\t1.28\t1.03\n176\t966633.96\t9623.27\t\t1\t1081129.84\t9474.81\t\t0.88\t1.12\n184\t1004419.04\t12111.11\t1.21\t1037771.24\t12001.66\t1.16\t1.03\n192\t1088858.08\t16522.93\t1.52\t1027943.12\t14238.57\t1.39\t0.94\n\nwill-it-scale open2_threads - x86_64\n\nThreads QL      \tQL stddev       stddev% RQL     \tRQL 
stddev      stddev% Speedup\n-----------------------------------------------------------------------------------------------\n1\t1337797.76\t4649.19\t\t0.35\t1332609.4\t3813.14\t\t0.29\t1\n2\t1598300.2\t1059.93\t\t0.07\t1771891.36\t5667.12\t\t0.32\t1.11\n4\t1736573.76\t13025.33\t0.75\t1396901.2\t2682.46\t\t0.19\t0.8\n8\t1794367.84\t4879.6\t\t0.27\t1917478.56\t3751.98\t\t0.2\t1.07\n16\t1990998.44\t8332.78\t\t0.42\t1864165.56\t9648.59\t\t0.52\t0.94\n24\t1868148.56\t4248.23\t\t0.23\t1710136.68\t2760.58\t\t0.16\t0.92\n32\t1955180\t\t6719\t\t0.34\t1936149.88\t1980.87\t\t0.1\t0.99\n40\t1769646.4\t4686.54\t\t0.26\t1729653.68\t4551.22\t\t0.26\t0.98\n48\t1724861.16\t4056.66\t\t0.24\t1764900\t\t971.11\t\t0.06\t1.02\n56\t1318568\t\t7758.86\t\t0.59\t1385660.84\t7039.8\t\t0.51\t1.05\n64\t1143290.28\t5351.43\t\t0.47\t1316686.6\t5597.69\t\t0.43\t1.15\n72\t1196762.68\t10655.67\t0.89\t1230173.24\t9858.2\t\t0.8\t1.03\n80\t1126308.24\t6901.55\t\t0.61\t1085391.16\t7444.34\t\t0.69\t0.96\n88\t1035672.96\t5452.95\t\t0.53\t1035541.52\t8095.33\t\t0.78\t1\n96\t1030203.36\t6735.71\t\t0.65\t1020113.48\t8683.13\t\t0.85\t0.99\n104\t1039432.88\t6583.59\t\t0.63\t1083902.48\t5775.72\t\t0.53\t1.04\n112\t1113609.04\t4380.62\t\t0.39\t1072010.36\t8983.14\t\t0.84\t0.96\n120\t1109420.96\t7183.5\t\t0.65\t1079424.12\t10929.97\t1.01\t0.97\n128\t1095400.04\t4274.6\t\t0.39\t1095475.2\t12042.02\t1.1\t1\n136\t1071605.4\t11103.73\t1.04\t1114757.2\t10516.55\t0.94\t1.04\n144\t1104147.2\t9714.75\t\t0.88\t1044954.16\t7544.2\t\t0.72\t0.95\n152\t1164280.24\t13386.15\t1.15\t1101213.92\t11568.49\t1.05\t0.95\n160\t1084892.04\t7941.25\t\t0.73\t1152273.76\t9593.38\t\t0.83\t1.06\n168\t983654.76\t11772.85\t1.2\t1111772.28\t9806.83\t\t0.88\t1.13\n176\t1087544.24\t11262.35\t1.04\t1077507.76\t9442.02\t\t0.88\t0.99\n184\t1101682.4\t24701.68\t2.24\t1095223.2\t16707.29\t1.53\t0.99\n192\t983712.08\t13453.59\t1.37\t1051244.2\t15662.05\t1.49\t1.07\n\nwill-it-scale lock1_threads - x86_64\n\nThreads QL      \tQL stddev       stddev% 
RQL     \tRQL stddev      stddev% Speedup\n-----------------------------------------------------------------------------------------------\n1\t4307484.96\t3959.31\t\t0.09\t4252908.56\t10375.78\t0.24\t0.99\n2\t7701844.32\t4169.88\t\t0.05\t7219233.52\t6437.11\t\t0.09\t0.94\n4\t14781878.72\t22854.85\t0.15\t15260565.12\t37305.71\t0.24\t1.03\n8\t12949698.64\t99270.42\t0.77\t9954660.4\t142805.68\t1.43\t0.77\n16\t12947690.64\t72977.27\t0.56\t10865245.12\t49520.31\t0.46\t0.84\n24\t11142990.64\t33200.39\t0.3\t11444391.68\t37884.46\t0.33\t1.03\n32\t9652335.84\t22369.48\t0.23\t9344086.72\t21639.22\t0.23\t0.97\n40\t9185931.12\t5508.96\t\t0.06\t8881506.32\t5072.33\t\t0.06\t0.97\n48\t9084385.36\t10871.05\t0.12\t8863579.12\t4583.37\t\t0.05\t0.98\n56\t6595540.96\t33100.59\t0.5\t6640389.76\t46619.96\t0.7\t1.01\n64\t5946726.24\t47160.5\t\t0.79\t6572155.84\t91973.73\t1.4\t1.11\n72\t6744894.72\t43166.65\t0.64\t5991363.36\t80637.56\t1.35\t0.89\n80\t6234502.16\t118983.16\t1.91\t5157894.32\t73592.72\t1.43\t0.83\n88\t5053879.6\t199713.75\t3.95\t4479758.08\t36202.27\t0.81\t0.89\n96\t5184302.64\t99199.89\t1.91\t5249210.16\t122348.69\t2.33\t1.01\n104\t4612391.92\t40803.05\t0.88\t4850209.6\t26813.28\t0.55\t1.05\n112\t4809209.68\t24070.68\t0.5\t4869477.84\t27489.04\t0.56\t1.01\n120\t5130746.4\t34265.5\t\t0.67\t4620047.12\t44229.54\t0.96\t0.9\n128\t5376465.28\t95028.05\t1.77\t4781179.6\t43700.93\t0.91\t0.89\n136\t5453742.4\t86718.87\t1.59\t5412457.12\t40339.68\t0.75\t0.99\n144\t5805040.72\t84669.31\t1.46\t5595382.48\t68701.65\t1.23\t0.96\n152\t5842897.36\t31120.33\t0.53\t5787587.12\t43521.68\t0.75\t0.99\n160\t5837665.12\t14179.44\t0.24\t5118808.72\t45193.23\t0.88\t0.88\n168\t5660332.72\t27467.09\t0.49\t5104959.04\t40891.75\t0.8\t0.9\n176\t5180312.24\t28656.39\t0.55\t4718407.6\t58734.13\t1.24\t0.91\n184\t4706824.16\t50469.31\t1.07\t4692962.64\t92266.85\t1.97\t1\n192\t5126054.56\t51082.02\t1\t4680866.8\t58743.51\t1.25\t0.91\n\nwill-it-scale lock2_threads - x86_64\n\nThreads QL      \tQL stddev    
   stddev% RQL     \tRQL stddev      stddev% Speedup\n-----------------------------------------------------------------------------------------------\n1\t4316091.2\t4933.28\t\t0.11\t4293104\t\t30369.71\t0.71\t0.99\n2\t3500046.4\t19852.62\t0.57\t4507627.76\t23667.66\t0.53\t1.29\n4\t3639098.96\t26370.65\t0.72\t3673166.32\t30822.71\t0.84\t1.01\n8\t3714548.56\t49953.44\t1.34\t4055818.56\t71630.41\t1.77\t1.09\n16\t4188724.64\t105414.49\t2.52\t4316077.12\t68956.15\t1.6\t1.03\n24\t3737908.32\t47391.46\t1.27\t3762254.56\t55345.7\t\t1.47\t1.01\n32\t3820952.8\t45207.66\t1.18\t3710368.96\t52651.92\t1.42\t0.97\n40\t3791280.8\t28630.55\t0.76\t3661933.52\t37671.27\t1.03\t0.97\n48\t3765721.84\t59553.83\t1.58\t3604738.64\t50861.36\t1.41\t0.96\n56\t3175505.76\t64336.17\t2.03\t2771022.48\t66586.99\t2.4\t0.87\n64\t2620294.48\t71651.34\t2.73\t2650171.68\t44810.83\t1.69\t1.01\n72\t2861893.6\t86542.61\t3.02\t2537437.2\t84571.75\t3.33\t0.89\n80\t2976297.2\t83566.43\t2.81\t2645132.8\t85992.34\t3.25\t0.89\n88\t2547724.8\t102014.36\t4\t2336852.16\t80570.25\t3.45\t0.92\n96\t2945310.32\t82673.25\t2.81\t2513316.96\t45741.81\t1.82\t0.85\n104\t3028818.64\t90643.36\t2.99\t2581787.52\t52967.48\t2.05\t0.85\n112\t2546264.16\t102605.82\t4.03\t2118812.64\t62043.19\t2.93\t0.83\n120\t2917334.64\t112220.01\t3.85\t2720418.64\t64035.96\t2.35\t0.93\n128\t2906621.84\t69428.1\t\t2.39\t2795310.32\t56736.87\t2.03\t0.96\n136\t2841833.76\t105541.11\t3.71\t3063404.48\t62288.94\t2.03\t1.08\n144\t3032822.32\t134796.56\t4.44\t3169985.6\t149707.83\t4.72\t1.05\n152\t2557694.96\t62218.15\t2.43\t2469887.6\t68343.78\t2.77\t0.97\n160\t2810214.72\t61468.79\t2.19\t2323768.48\t54226.71\t2.33\t0.83\n168\t2651146.48\t76573.27\t2.89\t2385936.64\t52433.98\t2.2\t0.9\n176\t2720616.32\t89026.19\t3.27\t2941400.08\t59296.64\t2.02\t1.08\n184\t2696086\t\t88541.24\t3.28\t2598225.2\t76365.7\t\t2.94\t0.96\n192\t2908194.48\t87023.91\t2.99\t2377677.68\t53299.82\t2.24\t0.82\n\nlocktorture - arm64\n\nThreads 
QL\t\tRQL\t\tSpeedup\n-----------------------------------------------\n1\t43320464\t44718174\t1.03\n2\t21056971\t29255448\t1.39\n4\t16040120\t11563981\t0.72\n8\t12786398\t12838909\t1\n16\t13646408\t13436730\t0.98\n24\t13597928\t13669457\t1.01\n32\t16456220\t14600324\t0.89\n40\t16667726\t13883101\t0.83\n48\t14347691\t14608641\t1.02\n56\t15624580\t15180758\t0.97\n64\t18105114\t16009137\t0.88\n72\t16606438\t14772256\t0.89\n80\t16550202\t14124056\t0.85\n88\t16716082\t15930618\t0.95\n96\t16489242\t16817657\t1.02\n104\t17915808\t17165324\t0.96\n112\t17217482\t21343282\t1.24\n120\t20449845\t20576123\t1.01\n128\t18700902\t20286275\t1.08\n136\t17913378\t21142921\t1.18\n144\t18225673\t18971921\t1.04\n152\t18374206\t19229854\t1.05\n160\t23136514\t20129504\t0.87\n168\t21096269\t17167777\t0.81\n176\t21376794\t21594914\t1.01\n184\t23542989\t20638298\t0.88\n192\t22793754\t20655980\t0.91\n200\t20933027\t19628316\t0.94\n208\t23105684\t25572720\t1.11\n216\t24158081\t23173848\t0.96\n224\t23388984\t22485353\t0.96\n232\t21916401\t23899343\t1.09\n240\t22292129\t22831784\t1.02\n248\t25812762\t22636787\t0.88\n256\t24294738\t26127113\t1.08\n\nwill-it-scale open1_threads - arm64\n\nThreads QL      \tQL stddev       stddev% RQL     \tRQL stddev      stddev% 
Speedup\n-----------------------------------------------------------------------------------------------\n1\t844452.32\t801\t\t0.09\t804936.92\t900.25\t\t0.11\t0.95\n2\t1309419.08\t9495.78\t\t0.73\t1265080.24\t3171.13\t\t0.25\t0.97\n4\t2113074.24\t5363.19\t\t0.25\t2041158.28\t7883.65\t\t0.39\t0.97\n8\t1916650.96\t15749.86\t0.82\t2039850.04\t7562.87\t\t0.37\t1.06\n16\t1835540.72\t12940.45\t0.7\t1937398.56\t11461.15\t0.59\t1.06\n24\t1876760.48\t12581.67\t0.67\t1966659.16\t10012.69\t0.51\t1.05\n32\t1834525.6\t5571.08\t\t0.3\t1929180.4\t6221.96\t\t0.32\t1.05\n40\t1851592.76\t7848.18\t\t0.42\t1937504.44\t5991.55\t\t0.31\t1.05\n48\t1845067\t\t4118.68\t\t0.22\t1773331.56\t6068.23\t\t0.34\t0.96\n56\t1742709.36\t6874.03\t\t0.39\t1716184.92\t6713.16\t\t0.39\t0.98\n64\t1685339.72\t6688.91\t\t0.4\t1676046.16\t5844.06\t\t0.35\t0.99\n72\t1694838.84\t2433.41\t\t0.14\t1821189.6\t2906.89\t\t0.16\t1.07\n80\t1738778.68\t2916.74\t\t0.17\t1729212.6\t3714.41\t\t0.21\t0.99\n88\t1753131.76\t2734.34\t\t0.16\t1713294.32\t4652.82\t\t0.27\t0.98\n96\t1694112.52\t4449.69\t\t0.26\t1714438.36\t5621.66\t\t0.33\t1.01\n104\t1780279.76\t2420.52\t\t0.14\t1767679.12\t3067.66\t\t0.17\t0.99\n112\t1700284.72\t9796.23\t\t0.58\t1796674.6\t4066.06\t\t0.23\t1.06\n120\t1760466.72\t3978.65\t\t0.23\t1704706.08\t4080.04\t\t0.24\t0.97\n128\t1634067.96\t5187.94\t\t0.32\t1764115.48\t3545.02\t\t0.2\t1.08\n136\t1170303.84\t7602.29\t\t0.65\t1227188.04\t8090.84\t\t0.66\t1.05\n144\t953186.16\t7859.02\t\t0.82\t964822.08\t10536.61\t1.09\t1.01\n152\t818893.96\t7238.86\t\t0.88\t853412.44\t5932.25\t\t0.7\t1.04\n160\t707460.48\t3868.26\t\t0.55\t746985.68\t10363.03\t1.39\t1.06\n168\t658380.56\t4938.77\t\t0.75\t672101.12\t5442.95\t\t0.81\t1.02\n176\t614692.04\t3137.74\t\t0.51\t615143.36\t6197.19\t\t1.01\t1\n184\t574808.88\t4741.61\t\t0.82\t592395.08\t8840.92\t\t1.49\t1.03\n192\t548142.92\t6116.31\t\t1.12\t571299.68\t8388.56\t\t1.47\t1.04\n200\t511621.96\t2182.33\t\t0.43\t532144.88\t5467.04\t\t1.03\t1.04\n208\t506583.32\t6834.39\t
\t1.35\t521427.08\t10318.65\t1.98\t1.03\n216\t480438.04\t3608.96\t\t0.75\t510697.76\t8086.47\t\t1.58\t1.06\n224\t470644.96\t3451.35\t\t0.73\t467433.92\t5008.59\t\t1.07\t0.99\n232\t466973.72\t6599.97\t\t1.41\t444345.92\t2144.96\t\t0.48\t0.95\n240\t442927.68\t2351.56\t\t0.53\t440503.56\t4289.01\t\t0.97\t0.99\n248\t432991.16\t5829.92\t\t1.35\t445462.6\t5944.03\t\t1.33\t1.03\n256\t409455.44\t1430.5\t\t0.35\t422219.4\t4007.04\t\t0.95\t1.03\n\nwill-it-scale open2_threads - arm64\n\nThreads QL      \tQL stddev       stddev% RQL     \tRQL stddev      stddev% Speedup\n-----------------------------------------------------------------------------------------------\n1\t818645.4\t1097.02\t\t0.13\t774110.24\t1562.45\t\t0.2\t0.95\n2\t1281013.04\t2188.78\t\t0.17\t1238346.24\t2149.97\t\t0.17\t0.97\n4\t2058514.16\t13105.36\t0.64\t1985375\t\t3204.48\t\t0.16\t0.96\n8\t1920414.8\t16154.63\t0.84\t1911667.92\t8882.98\t\t0.46\t1\n16\t1943729.68\t8714.38\t\t0.45\t1978946.72\t7465.65\t\t0.38\t1.02\n24\t1915846.88\t7749.9\t\t0.4\t1914442.72\t9841.71\t\t0.51\t1\n32\t1964695.92\t8854.83\t\t0.45\t1914650.28\t9357.82\t\t0.49\t0.97\n40\t1845071.12\t5103.26\t\t0.28\t1891685.44\t4278.34\t\t0.23\t1.03\n48\t1838897.6\t5123.61\t\t0.28\t1843498.2\t5391.94\t\t0.29\t1\n56\t1823768.32\t3214.14\t\t0.18\t1736477.48\t5675.49\t\t0.33\t0.95\n64\t1627162.36\t3528.1\t\t0.22\t1685727.16\t6102.63\t\t0.36\t1.04\n72\t1725320.16\t4709.83\t\t0.27\t1710174.4\t6707.54\t\t0.39\t0.99\n80\t1692288.44\t9110.89\t\t0.54\t1773676.24\t4327.94\t\t0.24\t1.05\n88\t1725496.64\t4249.71\t\t0.25\t1695173.84\t5097.14\t\t0.3\t0.98\n96\t1766093.08\t2280.09\t\t0.13\t1732782.64\t3606.1\t\t0.21\t0.98\n104\t1647753\t\t2926.83\t\t0.18\t1710876.4\t4416.04\t\t0.26\t1.04\n112\t1763785.52\t3838.26\t\t0.22\t1803813.76\t1859.2\t\t0.1\t1.02\n120\t1684095.16\t2385.31\t\t0.14\t1766903.08\t3258.34\t\t0.18\t1.05\n128\t1733528.56\t2800.62\t\t0.16\t1677446.32\t3201.14\t\t0.19\t0.97\n136\t1179187.84\t6804.86\t\t0.58\t1241839.52\t10698.51\t0.86\t1.05\n144\t9
69456.36\t6421.85\t\t0.66\t1018441.96\t8732.19\t\t0.86\t1.05\n152\t839295.64\t10422.66\t1.24\t817531.92\t6778.37\t\t0.83\t0.97\n160\t743010.72\t6957.98\t\t0.94\t749291.16\t9388.47\t\t1.25\t1.01\n168\t666049.88\t13159.73\t1.98\t689408.08\t10192.66\t1.48\t1.04\n176\t609185.56\t5685.18\t\t0.93\t653744.24\t10847.35\t1.66\t1.07\n184\t602232.08\t12089.72\t2.01\t597718.6\t13856.45\t2.32\t0.99\n192\t563919.32\t9870.46\t\t1.75\t560080.4\t8388.47\t\t1.5\t0.99\n200\t522396.28\t4155.61\t\t0.8\t539168.64\t10456.64\t1.94\t1.03\n208\t520328.28\t9353.14\t\t1.8\t510011.4\t6061.19\t\t1.19\t0.98\n216\t479797.72\t5824.58\t\t1.21\t486955.32\t4547.05\t\t0.93\t1.01\n224\t467943.8\t4484.86\t\t0.96\t473252.76\t5608.58\t\t1.19\t1.01\n232\t456914.24\t3129.5\t\t0.68\t457463.2\t7474.83\t\t1.63\t1\n240\t450535\t\t5149.78\t\t1.14\t437653.56\t4604.92\t\t1.05\t0.97\n248\t435475.2\t2350.87\t\t0.54\t435589.24\t6176.01\t\t1.42\t1\n256\t416737.88\t2592.76\t\t0.62\t424178.28\t3932.2\t\t0.93\t1.02\n\nwill-it-scale lock1_threads - arm64\n\nThreads QL      \tQL stddev       stddev% RQL     \tRQL stddev      stddev% 
Speedup\n-----------------------------------------------------------------------------------------------\n1\t2512077.52\t3026.1\t\t0.12\t2085365.92\t1612.44\t\t0.08\t0.83\n2\t4840180.4\t3646.31\t\t0.08\t4326922.24\t3802.17\t\t0.09\t0.89\n4\t9358779.44\t6673.07\t\t0.07\t8467588.56\t5577.05\t\t0.07\t0.9\n8\t9374436.88\t18826.26\t0.2\t8635110.16\t4217.66\t\t0.05\t0.92\n16\t9527184.08\t14111.94\t0.15\t8561174.16\t3258.6\t\t0.04\t0.9\n24\t8873099.76\t17242.32\t0.19\t9286778.72\t4124.51\t\t0.04\t1.05\n32\t8457640.4\t10790.92\t0.13\t8700401.52\t5110\t\t0.06\t1.03\n40\t8478771.76\t13250.8\t\t0.16\t8746198.16\t7606.42\t\t0.09\t1.03\n48\t8329097.76\t7958.92\t\t0.1\t8774265.36\t6082.08\t\t0.07\t1.05\n56\t8330143.04\t11586.93\t0.14\t8472426.48\t7402.13\t\t0.09\t1.02\n64\t8334684.08\t10478.03\t0.13\t7979193.52\t8436.63\t\t0.11\t0.96\n72\t7941815.52\t16031.38\t0.2\t8016885.52\t12640.56\t0.16\t1.01\n80\t8042221.68\t10219.93\t0.13\t8072222.88\t12479.54\t0.15\t1\n88\t8190336.8\t10751.38\t0.13\t8432977.6\t11865.67\t0.14\t1.03\n96\t8235010.08\t7267.8\t\t0.09\t8022101.28\t11910.63\t0.15\t0.97\n104\t8154434.08\t7770.8\t\t0.1\t7987812\t\t7647.42\t\t0.1\t0.98\n112\t7738464.56\t11067.72\t0.14\t7968483.92\t20632.93\t0.26\t1.03\n120\t8228919.36\t10395.79\t0.13\t8304329.28\t11913.76\t0.14\t1.01\n128\t7798646.64\t8877.8\t\t0.11\t8197938.4\t7527.81\t\t0.09\t1.05\n136\t5567293.68\t66259.82\t1.19\t5642017.12\t126584.59\t2.24\t1.01\n144\t4425655.52\t55729.96\t1.26\t4519874.64\t82996.01\t1.84\t1.02\n152\t3871300.8\t77793.78\t2.01\t3850025.04\t80167.3\t\t2.08\t0.99\n160\t3558041.68\t55108.3\t\t1.55\t3495924.96\t83626.42\t2.39\t0.98\n168\t3302042.72\t45011.89\t1.36\t3298002.8\t59393.64\t1.8\t1\n176\t3066165.2\t34896.54\t1.14\t3063027.44\t58219.26\t1.9\t1\n184\t2817899.6\t43585.27\t1.55\t2859393.84\t45258.03\t1.58\t1.01\n192\t2690403.76\t42236.77\t1.57\t2630652.24\t35953.13\t1.37\t0.98\n200\t2563141.44\t28145.43\t1.1\t2539964.32\t38556.52\t1.52\t0.99\n208\t2502968.8\t27687.81\t1.11\t2477757.28\t28240.
81\t1.14\t0.99\n216\t2474917.76\t24128.71\t0.97\t2483161.44\t32198.37\t1.3\t1\n224\t2386874.72\t32954.66\t1.38\t2398068.48\t37667.29\t1.57\t1\n232\t2379248.24\t27413.4\t\t1.15\t2327601.68\t24565.28\t1.06\t0.98\n240\t2302146.64\t19914.19\t0.87\t2236074.64\t20968.17\t0.94\t0.97\n248\t2241798.32\t21542.52\t0.96\t2173312.24\t26498.36\t1.22\t0.97\n256\t2198765.12\t20832.66\t0.95\t2136159.52\t25027.96\t1.17\t0.97\n\nwill-it-scale lock2_threads - arm64\n\nThreads QL      \tQL stddev       stddev% RQL     \tRQL stddev      stddev% Speedup\n-----------------------------------------------------------------------------------------------\n1\t2499414.32\t1932.27\t\t0.08\t2075704.8\t24589.71\t1.18\t0.83\n2\t3887820\t\t34198.36\t0.88\t4057432.64\t11896.04\t0.29\t1.04\n4\t3445307.6\t7958.3\t\t0.23\t3869960.4\t3788.5\t\t0.1\t1.12\n8\t4310597.2\t14405.9\t\t0.33\t3931319.76\t5845.33\t\t0.15\t0.91\n16\t3995159.84\t22621.85\t0.57\t3953339.68\t15668.9\t\t0.4\t0.99\n24\t4048456.88\t22956.51\t0.57\t3887812.64\t30584.77\t0.79\t0.96\n32\t3974808.64\t20465.87\t0.51\t3718778.08\t27407.24\t0.74\t0.94\n40\t3941154.88\t15136.68\t0.38\t3551464.24\t33378.67\t0.94\t0.9\n48\t3725436.32\t17090.67\t0.46\t3714356.08\t19035.26\t0.51\t1\n56\t3558449.44\t10123.46\t0.28\t3449656.08\t36476.87\t1.06\t0.97\n64\t3514616.08\t16470.99\t0.47\t3493197.04\t25639.82\t0.73\t0.99\n72\t3461700.88\t16780.97\t0.48\t3376565.04\t16930.19\t0.5\t0.98\n80\t3797008.64\t17599.05\t0.46\t3505856.16\t34320.34\t0.98\t0.92\n88\t3737459.44\t10774.93\t0.29\t3631757.68\t24231.29\t0.67\t0.97\n96\t3612816.16\t21865.86\t0.61\t3545354.56\t16391.15\t0.46\t0.98\n104\t3765167.36\t17763.8\t\t0.47\t3466467.12\t22235.45\t0.64\t0.92\n112\t3713386\t\t15455.21\t0.42\t3402210\t\t18349.66\t0.54\t0.92\n120\t3699986.08\t15153.08\t0.41\t3580303.92\t19823.01\t0.55\t0.97\n128\t3648694.56\t11891.62\t0.33\t3426445.28\t22993.32\t0.67\t0.94\n136\t800046.88\t6039.73\t\t0.75\t784412.16\t9062.03\t\t1.16\t0.98\n144\t769483.36\t5231.74\t\t0.68\t714132.8\t8953.57\t\
t1.25\t0.93\n152\t821081.52\t4249.12\t\t0.52\t743694.64\t8155.18\t\t1.1\t0.91\n160\t789040.16\t9187.4\t\t1.16\t834865.44\t6159.29\t\t0.74\t1.06\n168\t867742.4\t8967.66\t\t1.03\t734905.36\t15582.75\t2.12\t0.85\n176\t838650.32\t7949.72\t\t0.95\t846939.68\t8959.8\t\t1.06\t1.01\n184\t854984.48\t19475.51\t2.28\t794549.92\t11924.54\t1.5\t0.93\n192\t846262.32\t13795.86\t1.63\t899915.12\t8639.82\t\t0.96\t1.06\n200\t942602.16\t12665.42\t1.34\t900385.76\t8592.23\t\t0.95\t0.96\n208\t954183.68\t12853.22\t1.35\t1166186.96\t13045.03\t1.12\t1.22\n216\t929319.76\t10157.79\t1.09\t926773.76\t10577.01\t1.14\t1\n224\t967896.56\t9819.6\t\t1.01\t951144.32\t12343.83\t1.3\t0.98\n232\t990621.12\t7771.97\t\t0.78\t916361.2\t17878.44\t1.95\t0.93\n240\t995285.04\t20104.22\t2.02\t972119.6\t12856.42\t1.32\t0.98\n248\t1029436\t\t20404.97\t1.98\t965301.28\t11102.95\t1.15\t0.94\n256\t1038724.8\t19201.03\t1.85\t1029942.08\t12563.07\t1.22\t0.99\n\nWritten By\n----------\nAlexei Starovoitov \u003cast@kernel.org\u003e\nKumar Kartikeya Dwivedi \u003cmemxor@gmail.com\u003e\n\n  [0]: https://www.cs.rochester.edu/research/synchronization/pseudocode/timeout.html\n  [1]: https://dl.acm.org/doi/10.1145/571825.571830\n  [2]: https://github.com/kkdwivedi/rqspinlock/blob/main/rqspinlock.pdf\n  [3]: https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/plain/qspinlock.tla\n  [4]: https://arxiv.org/pdf/1810.05600\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\nLink: https://patch.msgid.link/20250316040541.108729-1-memxor@gmail.com\nSigned-off-by: Alexei Starovoitov \u003cast@kernel.org\u003e\n",
  "tree_diff": []
}
