semi-final _

Joshua E. Jodesty 2019-10-03 18:50:34 -04:00
parent c7c68a1abe
commit 8503ec1d6b
3 changed files with 15 additions and 10 deletions

View File

@@ -53,7 +53,8 @@ spark-submit --py-files distroduce/dist/distroduce.zip distroduce/local_messagi
 * [cadCAD Documentation](https://github.com/BlockScience/cadCAD/tree/master/documentation)
 * [cadCAD Tutorials](https://github.com/BlockScience/cadCAD/tree/master/tutorials)
 * **Terminology:**
-* ***[Initial Conditions](https://github.com/BlockScience/cadCAD/tree/master/documentation#state-variables)*** - State Variables and their initial values (Start event of Simulation)
+* ***[Initial Conditions](https://github.com/BlockScience/cadCAD/tree/master/documentation#state-variables)*** -
+State Variables and their initial values (Start event of Simulation)
 ```python
 initial_conditions = {
 'state_variable_1': 0,

View File

@@ -10,8 +10,8 @@ by Joshua E. Jodesty
 ```
 ## What?: *Description*
-***Distributed Produce*** (**[distroduce](distroduce)**) is a message simulation and throughput benchmarking framework /
-[cadCAD](https://cadcad.org) execution mode that leverages [Apache Spark](https://spark.apache.org/) and
+***Distributed Produce*** (**[distroduce](distroduce)**) is a distributed message simulation and throughput benchmarking
+framework / [cadCAD](https://cadcad.org) execution mode that leverages [Apache Spark](https://spark.apache.org/) and
 [Apache Kafka Producer](https://kafka.apache.org/documentation/#producerapi) for optimizing Kafka cluster configurations
 and debugging real-time data transformations. *distroduce* leverages cadCAD's user-defined event simulation template and
 framework to simulate messages sent to Kafka clusters. This enables rapid and iterative design, debugging, and message
@@ -20,8 +20,8 @@ Streaming.
 ##How?: *A Tail of Two Clusters*
 ***Distributed Produce*** is a Spark Application used as a cadCAD Execution Mode that distributes Kafka Producers,
-message simulation, and message publishing to worker nodes of an EMR cluster. Messages published from these workers are
-sent to Kafka topics on a Kafka cluster from a Spark bootstrapped EMR cluster.
+message simulation, and message publishing to worker nodes of an [AWS EMR](https://aws.amazon.com/emr/) cluster.
+Messages published from these workers are sent to Kafka topics on a Kafka cluster from a Spark bootstrapped EMR cluster.
 ##Why?: *Use Case*
 * **IoT Event / Device Simulation:** Competes with *AWS IoT Device Simulator* and *Azure IoT Solution Acceleration:
@@ -49,9 +49,9 @@ spark-submit --py-files distroduce/dist/distroduce.zip distroduce/local_messagi
 ### 1. Write cadCAD Simulation:
 * **Simulation Description:**
 To demonstration of *Distributed Produce*, I implemented a simulation of two users interacting over a messaging service.
-* **Resources**
-* [cadCAD Documentation](https://github.com/BlockScience/cadCAD/tree/master/documentation)
-* [cadCAD Tutorials](https://github.com/BlockScience/cadCAD/tree/master/tutorials)
+* **cadCAD Resources:**
+* [Documentation](https://github.com/BlockScience/cadCAD/tree/master/documentation)
+* [Tutorials](https://github.com/BlockScience/cadCAD/tree/master/tutorials)
 * **Terminology:**
 * ***[Initial Conditions](https://github.com/BlockScience/cadCAD/tree/master/documentation#state-variables)*** - State Variables and their initial values (Start event of Simulation)
 ```python
@@ -133,9 +133,13 @@ cluster_name = 'distibuted_produce'
 launch_cluster(cluster_name, region, ec2_attributes, bootstrap_actions, instance_groups, configurations)
 ```
-### 4. Execute Benchmark(s):
+### 4. Execute Benchmark(s) on EMR:
 * **Step 1:** ssh unto master node
-* **Step 2:** Spark Submit
+```bash
+zip -rq distroduce/dist/distroduce.zip distroduce/
+```
+* **Step 2:** ssh unto master node
+* **Step 3:** Spark Submit
 ```
 spark-submit --master yarn --py-files distroduce.zip messaging_sim.py `hostname | xargs`
 ```

Binary file not shown.