sunzixun
<linux Assembly> atom_inc


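One caveat first: from a RISC point of view, atomic exchange apparently does not make the CPU's instruction flow any faster or more efficient. If anything, it wastes processing capacity. The following is a quoted excerpt.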

=================================================================== 

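However, as hardware has advanced rapidly, the cost of acquiring this kind of lock has grown many times over relative to CPU speed. The reason is simple: the gap between CPU speed and memory access speed keeps widening, and this kind of lock relies on atomic instructions that must access memory atomically, so the cost of acquiring the lock is tied to memory access speed. In addition, on most non-x86 architectures acquiring a lock uses a memory barrier, which stalls or flushes the processor pipeline, so its cost relative to CPU speed keeps growing. The data in Table 1 confirms this.

Table 1 shows the cost of basic operations on a 700MHz Pentium III machine, which can execute two integer instructions per clock cycle. On a 1.8GHz Pentium 4 machine, the atomic increment instruction is 75 nanoseconds (ns) slower than on the 700MHz Pentium III, even though the CPU is more than twice as fast.

The other problem with this locking mechanism is scalability. On multiprocessor systems scalability is critical, because without it the machine's performance cannot be exploited at all. Figure 1 shows how the various Linux locks scale.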

===================================================================

 

 

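For what it's worth, I still use pthread_spin_lock quite a lot these days. I have also read the glibc implementation, and it is indeed the kind of lock described above.

Even so, I stick to avoiding locks wherever I can: rework lock granularity according to how likely the contended events are (and firmly avoid pthread rwlocks and the like), or, as the kernel does, degrade the lock to RCU.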

 

 

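3 Common instructions

3.1 Data exchange instructions

Why single these out? Because they can implement atomic operations.

Take an ordinary exchange or addition, for example: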

movl %eax,%ecx

movl %ebx,%eax

movl %ecx,%ebx

 

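The CPU can be interrupted by a bus signal at any point in this sequence. You could mask interrupts, but that is inefficient. Hence the following instructions:

XCHG :   exchanges two registers, or a register and a memory location.

              With a memory operand the processor asserts LOCK automatically, which keeps other processors on an SMP system from accessing that memory during the exchange.

BSWAP: reverses the byte order of a 32-bit register (little-endian <=> big-endian)

XADD :   exchanges two values and stores their sum in the destination operand

CMPXCHG : compares a value with an external value and conditionally exchanges it with another value (important)

 It compares the destination operand with the value in the EAX register:

1 Equal : the source operand is loaded into the destination operand

2 Not equal : the destination operand is loaded into EAX

cmpxchg source, dest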
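
As a rough illustration of what these three instructions do, the sketch below expresses each one with GCC's `__atomic` builtins (GCC 4.7 and later), which on x86 typically compile down to `xchg`, `lock xadd` and `lock cmpxchg`. The function names `model_xchg`, `model_xadd` and `model_cmpxchg` are made up for this example and do not come from any library.

```c
#include <stdbool.h>

/* XCHG: store new_val into *mem and return what was there before. */
static int model_xchg(int *mem, int new_val)
{
    return __atomic_exchange_n(mem, new_val, __ATOMIC_SEQ_CST);
}

/* XADD: add delta to *mem and return the value *mem held before the add. */
static int model_xadd(int *mem, int delta)
{
    return __atomic_fetch_add(mem, delta, __ATOMIC_SEQ_CST);
}

/*
 * CMPXCHG: *expected plays the role of EAX.
 * If *mem == *expected, *mem becomes desired and true is returned (ZF = 1).
 * Otherwise *expected is overwritten with the current *mem and
 * false is returned (ZF = 0).
 */
static bool model_cmpxchg(int *mem, int *expected, int desired)
{
    return __atomic_compare_exchange_n(mem, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Note how a failed `model_cmpxchg` hands back the current memory value through `expected`, just as the instruction reloads EAX, so a retry loop does not need a separate read.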

 

For example, suppose you want to do the following: move 111 to the last slot of the array.

 

 my @arr = qw/ 111 32 9 22/;
 my $index = 0;
 foreach (@arr) {
  if ($arr[$index] >= $arr[$index+1]) {   # numeric compare; "ge" would compare as strings
    ($arr[$index], $arr[$index+1]) = ($arr[$index+1], $arr[$index]);
  $index++;
  .....
}}

 

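movl (%esi), %eax      # esi holds the address of the first element of arr
cmpl %eax, 4(%esi)     # corresponds to line 4 of the Perl snippet above
jg leav                # arr[1] > arr[0]: already in order, skip the swap
xchg %eax, 4(%esi)     # arr[1] = old arr[0], eax = old arr[1], in one step
movl %eax, (%esi)      # arr[0] = old arr[1]
leav: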
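
The same compare-and-swap-neighbours step, repeated across the whole array, can be sketched in C. This is only an illustration: `bubble_pass` is a name invented here, and `__atomic_exchange_n` is used purely to mirror the shape of the `xchg` line. Only that single exchange is atomic; the two-step swap as a whole is not.

```c
#include <stddef.h>

/* One left-to-right pass: afterwards the largest element sits in the last slot. */
static void bubble_pass(int *arr, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (arr[i] >= arr[i + 1]) {
            /* Same shape as "xchg %eax, 4(%esi)": write arr[i] into arr[i+1]
               and receive the old arr[i+1] in one step. */
            int old_next = __atomic_exchange_n(&arr[i + 1], arr[i],
                                               __ATOMIC_SEQ_CST);
            /* Same as "movl %eax, (%esi)". */
            arr[i] = old_next;
        }
    }
}
```

Called on the array from the Perl snippet, one pass carries 111 from the first slot to the last.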

 

 

 

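In fact, the NPTL thread library uses exactly this instruction to implement atomic exchange. Let's see how it is written and learn from it.

Macro: atomic_exchange_acq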

#define atomic_exchange_acq(mem, newvalue) \
  ({ __typeof (*mem) result;						      \
     if (sizeof (*mem) == 1)						      \
       __asm __volatile ("xchgb %b0, %1"				      \
			 : "=q" (result), "=m" (*mem)			      \
			 : "0" (newvalue), "m" (*mem));			      \
     else if (sizeof (*mem) == 2)					      \
       __asm __volatile ("xchgw %w0, %1"				      \
			 : "=r" (result), "=m" (*mem)			      \
			 : "0" (newvalue), "m" (*mem));			      \
     else if (sizeof (*mem) == 4)					      \
       __asm __volatile ("xchgl %0, %1"					      \
			 : "=r" (result), "=m" (*mem)			      \
			 : "0" (newvalue), "m" (*mem));			      \
     else								      \
       __asm __volatile ("xchgq %q0, %1"				      \
			 : "=r" (result), "=m" (*mem)			      \
			 : "0" ((long) (newvalue)), "m" (*mem));	      \
     result; })

 Simple enough: check the operand size, then issue the matching xchg variant. As for how the call chain gets here, trace it yourself in vim.
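
To show what an acquire-flavoured exchange is typically used for, here is a minimal test-and-set spinlock sketch. It is not glibc's pthread_spin_lock implementation; the type `demo_spinlock` and the `demo_spin_*` functions are hypothetical, and GCC's `__atomic_exchange_n` builtin stands in for the macro above.

```c
/* 0 = free, 1 = held */
typedef struct { int locked; } demo_spinlock;

/* Try once: returns 1 if the lock was taken, 0 if someone else holds it. */
static int demo_spin_trylock(demo_spinlock *l)
{
    /* Swap in 1; we own the lock only if the previous value was 0. */
    return __atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Busy-wait until the lock is taken. */
static void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* spin */
    }
}

/* Release: a plain store with release ordering is enough. */
static void demo_spin_unlock(demo_spinlock *l)
{
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}
```

Every failed attempt still performs a locked write to the shared word, which is the memory-bound cost the quoted excerpt describes.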

 

Along the way I also came across a more powerful instruction:

 

# define __arch_compare_and_exchange_val_32_acq(mem, newval, oldval) \
  ({ __typeof (*mem) ret;            \
     __asm __volatile (LOCK_PREFIX "cmpxchgl %2, %1"         \
         : "=a" (ret), "=m" (*mem)         \
         : "r" (newval), "m" (*mem), "0" (oldval));       \
     ret; })
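
A compare-and-exchange primitive like this is usually wrapped in a retry loop. The sketch below builds an atomic increment that way, using GCC's `__sync_val_compare_and_swap` builtin in place of the glibc-internal macro; `cas_increment` is a name made up for this example.

```c
/* Atomically add 1 to *counter using only compare-and-swap.
   Returns the new value. */
static int cas_increment(int *counter)
{
    int seen;
    int observed = __atomic_load_n(counter, __ATOMIC_RELAXED);

    do {
        seen = observed;
        /* Install seen + 1 only if *counter still equals seen;
           the builtin returns the value that was actually in memory. */
        observed = __sync_val_compare_and_swap(counter, seen, seen + 1);
    } while (observed != seen);   /* another thread got in first: retry */

    return seen + 1;
}
```

For a plain increment `lock xadd` does the job in one instruction; the loop form earns its keep when the new value is an arbitrary function of the old one.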

 

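 Similarly, when compiling with gcc -O3, the compiler may emit a single instruction, cmovl   %edx, %eax,

  in place of the conditional jump and assignment that follow a compare.

  Of course, before using such a high optimization level you should understand what each optimization sub-option does. If you don't, I personally think it is best to use __volatile__ liberally; otherwise you may end up with an infinite loop.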
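
The infinite-loop risk can be sketched as follows. When a loop polls a plain `int`, an optimizing compiler is allowed to read it once and reuse the value, so the loop may never notice a change made elsewhere. Marking the object `volatile` forces a fresh load on each iteration. `wait_for_flag` is a hypothetical helper, and `volatile` alone provides neither atomicity nor ordering between threads.

```c
/* Poll *flag up to max_spins times.
   Returns 1 if the flag was seen non-zero, 0 if the budget ran out. */
static int wait_for_flag(volatile int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (*flag)      /* volatile: re-read from memory on every iteration */
            return 1;
    }
    return 0;
}
```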

 

 

Next, let's look at the atomic operations in nginx:

#if (NGX_SMP)
#define NGX_SMP_LOCK  "lock;"
#else
#define NGX_SMP_LOCK
#endif


/*
 * "cmpxchgl  r, [m]":
 *
 *     if (eax == [m]) {
 *         zf = 1;
 *         [m] = r;
 *     } else {
 *         zf = 0;
 *         eax = [m];
 *     }
 *
 *
 * The "r" means the general register.
 * The "=a" and "a" are the %eax register.
 * Although we can return result in any register, we use "a" because it is
 * used in cmpxchgl anyway.  The result is actually in %al but not in %eax,
 * however, as the code is inlined gcc can test %al as well as %eax,
 * and icc adds "movzbl %al, %eax" by itself.
 *
 * The "cc" means that flags were changed.
 */

static ngx_inline ngx_atomic_uint_t
ngx_atomic_cmp_set(ngx_atomic_t *lock, ngx_atomic_uint_t old,
    ngx_atomic_uint_t set)
{
    u_char  res;

    __asm__ volatile (

         NGX_SMP_LOCK
    "    cmpxchgl  %3, %1;   "
    "    sete      %0;       "

    : "=a" (res) : "m" (*lock), "a" (old), "r" (set) : "cc", "memory");

    return res;
}


/*
 * "xaddl  r, [m]":
 *
 *     temp = [m];
 *     [m] += r;
 *     r = temp;
 *
 *
 * The "+r" means the general register.
 * The "cc" means that flags were changed.
 */


#if !(( __GNUC__ == 2 && __GNUC_MINOR__ <= 7 ) || ( __INTEL_COMPILER >= 800 ))

/*
 * icc 8.1 and 9.0 compile broken code with -march=pentium4 option:
 * ngx_atomic_fetch_add() always return the input "add" value,
 * so we use the gcc 2.7 version.
 *
 * icc 8.1 and 9.0 with -march=pentiumpro option or icc 7.1 compile
 * correct code.
 */

static ngx_inline ngx_atomic_int_t
ngx_atomic_fetch_add(ngx_atomic_t *value, ngx_atomic_int_t add)
{
    __asm__ volatile (

         NGX_SMP_LOCK
    "    xaddl  %0, %1;   "

    : "+r" (add) : "m" (*value) : "cc", "memory");

    return add;
}


#else

/*
 * gcc 2.7 does not support "+r", so we have to use the fixed
 * %eax ("=a" and "a") and this adds two superfluous instructions in the end
 * of code, something like this: "mov %eax, %edx / mov %edx, %eax".
 */

static ngx_inline ngx_atomic_int_t
ngx_atomic_fetch_add(ngx_atomic_t *value, ngx_atomic_int_t add)
{
    ngx_atomic_uint_t  old;

    __asm__ volatile (

         NGX_SMP_LOCK
    "    xaddl  %2, %1;   "

    : "=a" (old) : "m" (*value), "a" (add) : "cc", "memory");

    return old;
}

#endif


/*
 * on x86 the write operations go in a program order, so we need only
 * to disable the gcc reorder optimizations
 */

#define ngx_memory_barrier()    __asm__ volatile ("" ::: "memory")

/* old "as" does not support "pause" opcode */
#define ngx_cpu_pause()         __asm__ (".byte 0xf3, 0x90")

 

If you would rather not embed inline assembly in your code, you can use the builtins that gcc 4 provides.

Quoted from http://gcc.gnu.org/onlinedocs/gcc-4.5.2/gcc/Atomic-Builtins.html#Atomic-Builtins :
type __sync_fetch_and_add (type *ptr, type value, ...)
type __sync_fetch_and_sub (type *ptr, type value, ...)
type __sync_fetch_and_or (type *ptr, type value, ...)
type __sync_fetch_and_and (type *ptr, type value, ...)
type __sync_fetch_and_xor (type *ptr, type value, ...)
type __sync_fetch_and_nand (type *ptr, type value, ...)
 
These builtins perform the operation suggested by the name, and returns the value that had previously been in memory. That is,
          { tmp = *ptr; *ptr op= value; return tmp; }
          { tmp = *ptr; *ptr = ~(tmp & value); return tmp; }   // nand
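
A short sketch of the "returns the previous value" behaviour, using two of the builtins listed above. The helpers `take_ticket` and `set_flags` are hypothetical names chosen for illustration.

```c
/* Hand out consecutive ticket numbers starting from *next. */
static unsigned take_ticket(unsigned *next)
{
    /* Returns the value *next held before the add. */
    return __sync_fetch_and_add(next, 1);
}

/* Set the bits in mask on a shared flag word; returns the old flags. */
static unsigned set_flags(unsigned *flags, unsigned mask)
{
    return __sync_fetch_and_or(flags, mask);
}
```

Because each call returns the old value atomically, two threads calling `take_ticket` on the same counter can never receive the same number, and no explicit lock is needed.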
     