From ea5df50659d56d87b056207a45a17786d3bed697 Mon Sep 17 00:00:00 2001 From: "Joshua E. Jodesty" Date: Thu, 3 Oct 2019 12:29:34 -0400 Subject: [PATCH] readme pt. 3: new build --- distroduce/README.md | 66 +++++++++++------- .../{launch_cluster.sh => launch_cluster.txt} | 0 distroduce/dist/distroduce.zip | Bin 31821 -> 33300 bytes distroduce/emr/__init__.py | 39 +++++++++++ distroduce/emr/launch.py | 44 ++---------- 5 files changed, 83 insertions(+), 66 deletions(-) rename distroduce/configuration/{launch_cluster.sh => launch_cluster.txt} (100%) create mode 100644 distroduce/emr/__init__.py diff --git a/distroduce/README.md b/distroduce/README.md index a057d01..5d5b3ab 100644 --- a/distroduce/README.md +++ b/distroduce/README.md @@ -9,20 +9,24 @@ by Joshua E. Jodesty ``` - -##Description: +## What?: *Description* ***Distributed Produce*** (**[distroduce](distroduce)**) is a message simulation and throughput benchmarking framework / cadCAD execution mode that leverages Apache Spark and Kafka Producer for optimizing Kafka cluster configurations and -debugging real-time data transformations (i.e. Kafka Streams, Spark (Structured) Streaming). cadCAD's user-defined event -simulation framework is used to simulate messages sent to Kafka clusters. +debugging real-time data transformations. *distroduce* leverages cadCAD's user-defined event simulation template and +framework to simulate messages sent to Kafka clusters. This enables rapid and iterative design, debugging, and message +publish benchmarking of Kafka clusters and real-time data processing using Kafka Streams and Spark (Structured) +Streaming. -##Use Cases: -* **Education:** I wanted to provide a tool for Insight Data Engineering Fellows implementing real-time data processing -that enables rapid and iterative design, debugging, and message publish benchmarking of Kafka clusters and real-time -data processing using Kafka Streams and Spark (Structured) Streaming. -* **IoT:** Competes with *AWS IoT Device Simulator* and *Azure IoT Solution Acceleration: Device Simulation*. Unlike -these products, *Distributed Produce* enables a user-defined state updates and agent actions, as well as message publish -benchmarking +##How?: *A Tail of Two Clusters* +***Distributed Produce*** is a Spark Application used as a cadCAD Execution Mode that distributes Kafka Producers, +message simulation, and message publishing to worker nodes of an EMR cluster. Messages published from these workers are +sent to Kafka topics on a Kafka cluster from a Spark bootstrapped EMR cluster. + +##Why?: *Use Case* +* **IoT Event / Device Simulation:** Competes with *AWS IoT Device Simulator* and *Azure IoT Solution Acceleration: +Device Simulation*. Unlike these products, *Distributed Produce* enables a user-defined state updates and agent actions, +as well as message publish benchmarking +* **Development Environment for Real-Time Data Processing / Routing:** ##Get Started: @@ -40,29 +44,32 @@ sh distroduce/configuration/launch_local_kafka.sh python3 distroduce/local_messaging_sim.py ``` -### 1. Write Simulation: +### 1. Write cadCAD Simulation: * [Documentation](https://github.com/BlockScience/cadCAD/tree/master/documentation) * [Tutorials](https://github.com/BlockScience/cadCAD/tree/master/tutorials) +* **Terminology:** + * ***Initial Conditions*** - State Variables and their initial values (Start event of Simulation) + * ***[Policy Functions:](https://github.com/BlockScience/cadCAD/tree/master/documentation#Policy-Functions)*** - + computes one or more signals to be passed to State Update Functions + * ***[State Update Functions]((https://github.com/BlockScience/cadCAD/tree/master/documentation#state-update-functions)):*** - + updates state variables change over time + * ***[Partial State Update Block](https://github.com/BlockScience/cadCAD/tree/master/documentation#State-Variables) (PSUB):*** - + a set of State Update Functions and Policy Functions that update state records + ![](https://i.imgur.com/9rlX9TG.png) + + **Note:** State Update and Policy Functions now have the additional / undocumented parameter `kafkaConfig` -**a.** **Define Policy Functions:** Two users interacting on separate chat clients and entering / exiting chat rooms -* [Example](distroduce/action_policies.py) -* [Documentation](https://github.com/BlockScience/cadCAD/tree/master/documentation#Policy-Functions) +**a.** **Define Policy Functions:** +* [Example:](distroduce/action_policies.py) Two users interacting on separate chat clients and entering / exiting chat -**b.** **Define State Update Functions:** Used for logging and maintaining state of user actions defined by policies -* [Example](distroduce/state_updates.py) -* [Documentation](https://github.com/BlockScience/cadCAD/tree/master/documentation#state-update-functions) +**b.** **Define State Update Functions:** +* [Example:](distroduce/state_updates.py) Used for logging and maintaining state of user actions defined by policies -**c.** **Define Initial Conditions & Partial State Update Block** -* **Initial Conditions** - State Variables and their initial values - * [Example](distroduce/messaging_sim.py) - * [Documentation](https://github.com/BlockScience/cadCAD/tree/master/documentation#State-Variables) - -* **Partial State Update Block (PSUB)** - a set of State Update Functions and Policy Functions that update state records - * [Example](distroduce/simulation.py) - * [Documentation](https://github.com/BlockScience/cadCAD/tree/master/documentation#Partial-State-Update-Blocks) - ![](https://i.imgur.com/9rlX9TG.png) +**c.** **Define Initial Conditions & Partial State Update Block:** +* **Initial Conditions:** [Example](distroduce/messaging_sim.py) +* **Partial State Update Block (PSUB):** [Example](distroduce/simulation.py) **d.** **Create Simulation Executor:** Used for running a simulation * [Local](distroduce/local_messaging_sim.py) @@ -71,6 +78,11 @@ python3 distroduce/local_messaging_sim.py ### 2. [Configure EMR Cluster](distroduce/configuration/cluster.py) ### 3. Launch EMR Cluster: +**Option A:** Preconfigured Launch +```bash +python3 distroduce/emr/launch.py +``` +**Option B:** Custom Launch - [Example](distroduce/emr/launch.py) ```python from distroduce.emr.launch import launch_cluster from distroduce.configuration.cluster import ec2_attributes, bootstrap_actions, instance_groups, configurations diff --git a/distroduce/configuration/launch_cluster.sh b/distroduce/configuration/launch_cluster.txt similarity index 100% rename from distroduce/configuration/launch_cluster.sh rename to distroduce/configuration/launch_cluster.txt diff --git a/distroduce/dist/distroduce.zip b/distroduce/dist/distroduce.zip index d571a211732305367ade96bd37eb07c0e99543ec..f6523407a8ed8493965e11d0d7967f6d1b2f40e4 100644 GIT binary patch delta 2947 zcmaKuc{r47AIHaxY)xY@83q}~l6{X@?+b4w1k=fB*`v0ip3^QElOOzUy$Fq92jxJT`yNG4!uG@o$$6fblxe0jf8z z0MidL^%k=G`;_!O0Vyaf1DiwTviyV5(*d&lxq(`d*ncDjX(4L$_lkS~O|WkB{j#o9Vx@QfocsiVlE zkN93hM%C1GPyanU%=9&QA2zl4$P77iK^IW95oekGiH6Z-i@d1(v@H5TonQj4*kA|@ zLV|uUA>ju>Ow8*{AP_Tx9<5401$(RrP1r0Dbv)xO)ArI{R6WF`R3cq)XIHG&h@AWE z_3F+x?;X!6OOn_5YZ!wEA{SjL0-h@tQ|fX=-z?#~cWX(u{9=|U&H>4k-34t@dnxgD zJx|6PF*nxc?Fxi7m(h{Yu2SVk9CshP@(o1Jw}(2gkVM<7RGy@oz$s0ro+w;s#MEqe z?#_zP%>3B!9@EknFyo#9eo=gYjok@G^)l?fGQ)&$x)#%!%023=AkY(TApe3efpPyh z{zq27_yvqS<=TCTRZ2pn>E%p{&t&sN9u*vEqBYf4i(_GdU~m2E(gb7qhKAab;?vD4 z)acC2KW~O5Vp1AbSGso;Kd)Z)DymM~l2S)@j+=jNgMv3I7otl-LtI z7V}DFO@n`g?Mjk*!I5w9m%*)_t!z(~8a<*(95Y+1x|-l)37e?__{%Lm4FSv-I=1Ap z`-O_EzR`XgA9!7*w5#CUy3-U$u-6Wk^1En5m?u|1Mnl_8+zYMpJD&;qr@qWLosjB& zyrgwkMr&I`pt+|#5Y|P!Z0#Tu%3Ud)uX9}wZVU4d2&RqgKICJkf8g(x%N1)5tnGyh_uQsV9n8vEz6cf;soa=$w@)^U(=&J@ z?Qw}C!0I_hH2eH~xyp2vCXOgCduG&^!=13%cXWC_L-f<}AVa_4MC&Wflz5+P*v#3U z_wmZ@B@+0wjBPJFgsd9AFD~qnZ5QY~kkHj=*E=;0 zruk5(%6p2!Vm2n5jH1Z$8&Kku)QNTb&&taD7#*c{)AD@D}kXGi^}9`TKZh!A{- z;A8#M+;K)Sj{YPcMczL238xCEzH6JHyWDD6XFzSJXWBg<^2vVFv6z~?QUlxY)-Z?yfT*6qk&vdLf6b!R05D|)BB4bw5% zduXBl%Zi?@#3F2Ky^`$mQgxzmhy=$L@zw#}N2Dm>S%SKBRE^lD559{R{0KW9O zcr4O~YO>qM^MYcjm&#Tx6KxPbu-;}JSy@&WKpJJOD}Q&{`~HX{joEE9)qd{^WxLY* zH@rt++{dfV8#VpYp7k4LGbu!~j?^;Wb7%W}bXhrHDCO1;mBJ6vwB(-hm#K3c_he3R zF);Ga_a@7>k~&;!CdW5ukpz|V$Wr$*-DuNBnYs?8grR*U`0lr?OrJ7zo0 z<((lQ>*i86@FBi_J$Wqf?SI_m%2M?NDFqhY+`i^) zQe379DCySpz4zCT$G;f;L~Q8QoQ{;Ji$xvnL0H~&5y8fAKO~er>j-Z%`I_`W&I#|M`ctcmdivvntH5Vgw_Hb<*E1l|NcCuS&D@^}m038H< zeQbwsT<9?Lg^Dkx@%@Se1}?W-;vsx2^+ay(3#;i*a?QEH8!S%u7nV;2=xG=?7d$z# zRko{7S%l^d;{PO!P05ZwmriWdk_#AI{Dsef9F;Php^iEeoBSrxZQWeetiznA#4Ll@ zb}shT%v$(_RGh$w0^c*pTnKa%IgiL}{8V>hg%qLb(V$<0T#-l=j@U>)Ap2^8@HW;D zv+94((@{UdWUsDQ#-z`D&3|HJWo$(`*&)!$@37%rT_rApvZR#rxz>sWZ9a_49Q&Ef zy3V$@{ybHb9zF^ADwK|;ygRF2ny7aJwv!UaghfeY0+F5A*O*)9y@kNT&0< z=$0{`&x1_8YG09B>{M#e6}z6N%IbPn6`#xADP5Dg6)y z@KHw~zu1BRElEDW&rSe%5+y3~v#9`q5FkE=y#>(?5~wt;!~p~pUqBu>0D|mqZGUyZ zE({Kr`H4H~3Q$4?0Bt?agZuTN^Yr=vZFoW8a1>k?^%HF94*EYu0+u6DhyyH)J!sS4 Md43Rx%aif-7buk{-~a#s delta 2317 zcmY+Gc|6qJ7stnpWrm5dWS54v{a{>vdoE-p_sCbI-Z=k8?6(!B4l?;1;G3c5cv4tFq64 z%W$spXF1XY0ZR`-03iqkXPnEhM03&vcZ;>z1`zQ77{0ww=2u7sJJ_15s06w*07xMb zpatH@ro;^06ARu20^$e_$VE>I(*{xcuZ|Eb*p2iE;=vxAEF0thsjvT=LIGZ3dltaF zBrL(UOJ>qVdjB=a@(D;5ciO=|h-Y*D!C2t{bY%LUd zK466+?AWVD^|P^JWRlUcFo=Q|g;|2}g+P|JDNGTyRVd{2cM21JJOct*WGS~!>Vd&9 zec-JghSPtZ^5<1|?wt!0U>fTy!61J26lSp5AQ%!bPhpzjX4xU~^?HCB9>v+`wEF}v zOUAuuh3^jngFqb|Kq6lD7m*;yOme*j2JCHYz-RcFwKm)AyjAx4I8{u_11u*9&NSDy?rN*;6G}d+K?R=crxl5zFE0^PnRQR)JQP2 zh+M^S;aPp3kvGmXX`?RJN}sraCfopmy86Yh-3~*hg>OloJ}jIJ0u8E_pjXh|t)X;j zxDzqbAGRFA2LxhQVF%>pu|zPJA{j${XH7!%U;mj~JaDI6)_lUaox91kAj!lQYs>+Y z#|eDx__+B)%BmnSbPoDRZTh$eTw8m*AqqLw91-+UKdAJ+x0k3G!*}YqCWCwP)7n)n zjF;Ec@q4qv=ZQ^3_PnmGEYrn?&7i9<(OVN&d9IU5B+?;oH~dm&joT^Fcyiejt)i?H zK+{1q|5$b>ff_q=`HZQ~n)6^Qu7kw=Bs$W{USUMw_Mu$8bR&cfua8S0b809hrsOum zT)sD@a^!DSdS>v%3yatz8HJLC(c1@^@hDRRHp93Wy(w(W6(z}0nMo7sXnB;)Pr0+l z=trJEz)JMbV=HtA*dmT9h?5sdD$j?p)e;@?1Yth^dtM`+6Q0EfWim6l(EWaw!GAV$VX0I)$;iEb#nvI)0iq}YKYYOzB)eaimbEx1}F?2t~)dnis$JG_a zukzCQptgmQ-Hi@PhOK(3=y?JR-HuJ>kHzTdtJ!V9A;G-CbpP{2viEwmp54HSYM%JT zX2zxaekAPqLYu_6%y>{JGFfk$VBzZ6E$}P?gc`a@3#}6%m0v_Pyqh)yX zC}B->Dzu`d=8T(ZvgR)pd0daVB+U;OG${_AvISO(u9A5_`c!q4lE`N%@r>x_f#xxTiWO_BA6Re%#wl*}M z14ua)pQFGiGJjw56>XC)s?i9)W{x;p`jjC%KTJkdPVqvr@?X$}FzMHSC8~USR}u(K zz=W(^f5^2G01wD(uAHgwGdLB|tvA{Ux%jwRwUNFl#;|H9*FXC9%*gdv)=)UrE3axv z@wATzN;urt#e3xObd$Mtueyv}{81;?_=Y$UXvwomN=pnmqdr@ETnIB5J|#x#|-KLe!Kd8JNfS zMZrd7yS{~{4TwL7wO#&fX<;v!)z&%PTqh@yStr`F5S%y?wW%`kr91twG3I)tbd;ar zaNHm{f_8QM4O|f+wp~_>Q=kUH7|PivZwCHtzVPCkB=_z8-fM%K?U8)fwvhvuVhz;w zUN^U|AZh6;1rEa2T^v&pIg)LioUydS9Gk@!?nBFDh4cko*#KWz$|<+dsB@NT&^`6G zg;DN58*xofr^RK|)hEPE$8Nb72206_tW~`uTLLsG9q;2gi~Q=_Rjn?6ys4a zp~AQ2tuv+}OjSq|d}m``LR4d^=i6bjUypcmq@%Wat@U)+Omc8FSW#Cg6npavs%)Ea3W%5lAbA9{u_=Tcs7Z5#Qi%XZA*$^Rk%qGK? zm`MLjt;Xz4eeYdomAS~NQmpG)7`Ng=9_foEvssw3f%ifF)D-$S5D1X