Low Latency Messaging Over Gigabit Ethernet
Keith Fenech
CSAW '04, 24 September 2004

Why Cluster Computing?
- Ideal for computationally intensive applications.
- Multi-threaded processes allow jobs to be processed in parallel over multiple CPUs.
- High bandwidth allows interconnected nodes to achieve supercomputer performance.
- Networks of Workstations (NOWs) [1]
  - Easily available (commodity platforms)
  - Relatively cheap
  - Nodes may be used independently or as a cluster
- Better utilization of idle computing resources.

High Performance Networking
- Commodity networks dominated by IP over Ethernet
- Performance is directly affected by:
  - Hardware: bus & network bandwidths
  - Latency: delay incurred in communicating a message from source to destination
  - Overhead: length of time that a processor is engaged in tx/rx of each message
- Fine-grain threads communicate frequently using small messages.
- High-performance (HP) communication architecture features:
  - transparency to the application layer
  - allow high throughput for bandwidth-intensive applications
  - low latencies for frequently communicating threads
- Minimise protocol processing overhead on host machine
- Gigabit performance not achievable at application layers. Why?

Conventional NICs & Protocols
Receiver node
- Ethernet controller receives frame
- Check CRC for frame
- Filter MAC destination address
- NIC generates HW interrupt to notify host
- PCI transfer to host memory
- CPU suspends current task & launches interrupt handler to service high-priority interrupt
- Check network layer (IP) header & verify checksum
- Parse routing tables & store valid IP datagrams in IP buffer
- Reassemble fragmented datagrams in host memory
- Call transport layer (TCP/UDP) functions
- Deliver packet to application layer

Problems With Conventional Protocols & Architectures
- NIC generates a CPU interrupt for each frame
- Servicing interrupts involves an expensive vertical switch to kernel space
- Software interrupts to pass IP datagrams to upper layers
- Servicing incoming packets results in high host CPU load
- Risk of Receiver Livelock scenarios (as in Denial of Service attacks)
- PCI bus startup overheads for each message
- Layered protocols imply expensive memory-to-memory buffer copies

Available Techniques
- Bypass kernel for critical data paths
- Buffer & protocol processing moved to user-space
- User-level hardware access
- Zero-copy techniques
- Scatter/Gather techniques
- Larger MTUs (Jumbo frames)
- Larger DMA transfers avoid PCI startup overheads
- Interrupt coalescing
- Message descriptors & polling replace interrupts

Current Solutions
- Enabled by programmable NICs
- Virtual Interface Architecture (VIA) [2]
- U-Net [3] (ATM)
- Myrinet GM [4] and Illinois FM [5] (Myrinet)
- QsNet [6] (Quadrics)
- EMP [7] (Ethernet)

Our Proposal
- NOWs running over Gigabit Ethernet
- Use Tigon2 programmable NIC features (onboard CPU, memory, DMA)
- Design a reliable lightweight communication protocol for GE
- Reliable network (ordered & lossless packet delivery)
- Low-overhead
- Low-latency
- Offload protocol processing from host CPU onto NIC CPU
- Interrupt-free architecture (message descriptor queues + polling)
- OS bypass: user applications & NIC hardware communicate through pinned-down shared memory
- Zero copy
- Dynamic MTUs & DMA sizes: reduce PCI startup overheads
- Tackle 2 application scenarios
  - Small messages: latency is critical
  - Large bandwidth: throughput is critical

Conclusion
- Provide a high performance communication API
  - Replace PVM [9] & MPI [8] protocols
  - Fine-grained thread communication
  - High bandwidth applications
- Remove network communication bottleneck in user-level thread messaging.
- Interface with SMASH [10] user-level thread scheduler
- Multi-threaded applications can run seamlessly over a cluster of SMPs.
- Achieve higher throughput with minimal usage of host CPU resources.

References
[1] D. Culler, A. Arpaci-Dusseau, R. Arpaci-Dusseau, B. Chun, S. Lumetta, A. Mainwaring, R. Martin, C. Yoshikawa, and F. Wong. Parallel Computing on the Berkeley NOW. In Ninth Joint Symposium on Parallel Processing, 1997.
[2] Compaq, Intel, and Microsoft. Virtual Interface Architecture Specification, draft revision 1.0 edition, December 1997.
[3] T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: a user-level network interface for parallel and distributed computing. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, pages 40-53. ACM Press, 1995.
[4] Myricom Inc. Myrinet GM - the low-level message-passing system for Myrinet networks.
[5] Scott Pakin, Mario Lauria, and Andrew Chien. High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet. 1995.
[6] Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, and Eitan Frachtenberg. Quadrics Network (QsNet): High-Performance Clustering Technology. In Hot Interconnects 9, Stanford University, Palo Alto, CA, August 2001.
[7] Piyush Shivam, Pete Wyckoff, and Dhabaleswar Panda. EMP: Zero-copy OS-bypass NIC-driven Gigabit Ethernet Message Passing. 2001.
[8] Message Passing Interface Forum. MPI2: A Message Passing Interface standard. International Journal of High Performance Computing Applications, 12(1-2):1-299, 1998.
[9] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, B. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Network Parallel Computing. MIT Press, Cambridge, Mass., 1994.
[10] Kurt Debattista. High Performance Thread Scheduling on Shared Memory Multiprocessors. Master's thesis, University of Malta, 2001.

Thank you!
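The interrupt-free architecture named on the Our Proposal slide (message descriptor queues plus polling over shared memory) could be sketched roughly as follows. This is only an illustration in Python, not the protocol from the talk: `Descriptor`, `DescriptorRing`, `post`, `poll` and `demo` are hypothetical names, and a `bytearray` stands in for the pinned-down shared memory region.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descriptor:
    """Hypothetical message descriptor: where a payload sits in the shared buffer."""
    offset: int
    length: int


class DescriptorRing:
    """Fixed-size single-producer/single-consumer ring of descriptors."""

    def __init__(self, slots: int) -> None:
        self._slots: List[Optional[Descriptor]] = [None] * slots
        self._head = 0  # next slot the producer (NIC side) fills
        self._tail = 0  # next slot the consumer (host side) reads

    def post(self, desc: Descriptor) -> bool:
        """Producer side: publish a descriptor; return False if the ring is full."""
        nxt = (self._head + 1) % len(self._slots)
        if nxt == self._tail:  # one slot is kept empty to tell full from empty
            return False
        self._slots[self._head] = desc
        self._head = nxt
        return True

    def poll(self) -> Optional[Descriptor]:
        """Consumer side: return the next descriptor, or None if nothing is pending."""
        if self._tail == self._head:
            return None
        desc = self._slots[self._tail]
        self._tail = (self._tail + 1) % len(self._slots)
        return desc


def demo() -> List[bytes]:
    shared = bytearray(64)          # stands in for pinned-down shared memory
    ring = DescriptorRing(slots=4)  # holds at most 3 pending descriptors
    offset = 0
    for payload in (b"ping", b"pong"):
        # "NIC" side: place the payload, then publish a descriptor for it
        shared[offset:offset + len(payload)] = payload
        ring.post(Descriptor(offset, len(payload)))
        offset += len(payload)
    received = []
    # "Host" side: poll the ring instead of waiting for an interrupt
    while (desc := ring.poll()) is not None:
        view = memoryview(shared)[desc.offset:desc.offset + desc.length]
        received.append(bytes(view))  # copied here only to return plain bytes
    return received
```

The consumer learns about new messages by checking the ring, so no interrupt handler or kernel transition appears on the data path in this sketch; a real NIC firmware and host library would also need memory barriers and flow control, which are left out here.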
Speaker notes

Steps involved in sending messages over a NIC using conventional Ethernet NICs & generic protocols such as TCP/IP.

Ethernet: common & affordable.

Clustering allows companies to leverage investment.

1 Gbps: 80,000 interrupts per second for 1.5K frames; 15,000 interrupts per second for 9K frames.
Receiver Livelock: hardware interrupts are given higher priority than the software interrupts used by higher-layer protocols. Under heavy load, packets are therefore continuously discarded because the buffers are full, whilst the upper-layer protocols have no processing resources available to consume the incoming buffers.

U-Net: OS-bypass; user-level protocol processing; message descriptors.
EMP: 880 Mb/s, 23 us; first to offload host processing for GE; uses Alteon for protocol processing on the NIC.
VIA (Compaq/Intel/Microsoft, 1997): standard architecture for the interface between a high performance NIC & the computer system. The aim is to reduce latency for critical message passing, i.e. improving the performance of distributed applications.

Bigphysarea technique.
The advantage of a user-level thread scheduler is that multi-threaded applications can be scheduled to run without any vertical context switching, thus not involving the kernel. Context switching times are very fast.
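The per-frame interrupt rates quoted in the notes for a 1 Gbps link can be checked with a small calculation. The sketch below assumes "1.5K" and "9K" mean 1500-byte and 9000-byte frames and ignores preamble, inter-frame gap and header overhead, so it gives an upper bound; the function names and the coalescing factor are illustrative and not taken from the talk.

```python
LINK_BPS = 1_000_000_000  # 1 Gbps, as in the notes


def frames_per_second(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Upper bound on frames (and thus per-frame interrupts) per second.

    Ignores preamble, inter-frame gap and header overhead, so it slightly
    overestimates the rate seen on a real link.
    """
    return link_bps / (frame_bytes * 8)


def coalesced_interrupts(frame_bytes: int, frames_per_interrupt: int) -> float:
    """Interrupt rate if the NIC raises one interrupt per batch of frames."""
    return frames_per_second(frame_bytes) / frames_per_interrupt


if __name__ == "__main__":
    for size in (1500, 9000):
        print(f"{size} B frames: {frames_per_second(size):,.0f} interrupts/s")
```

Under these assumptions the bound is about 83,000 interrupts per second for 1500-byte frames and about 14,000 for 9000-byte frames, in line with the rounded figures in the notes. Batching ten frames per interrupt, as interrupt coalescing would, divides the rate by ten.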