{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CIC Subpopulation Construction\n",
"\n",
"* Scale the agent model up to subpopulations, so that flows run from subpopulation to subpopulation. Agents are bundled by section, for example food/water providers. All flows between agents inside a bundle become part of that bundle's self-loop flow.\n",
"\n",
"* Create subpopulations with a graph-zoom-type operation, such as k-means clustering, that bundles nodes together. Create 10 subpopulation nodes.\n",
"\n",
"## Graph Model of Current Spend Activity\n",
"\n",
"We model the CIC transaction data as a weighted directed graph $G(N,E)$, where the nodes $N$ are the source and target agents and the edges $E$ are the transactions between them. The weight of an edge $(i,j) \\in E$ is the number of tokens sent, i.e. the actual CIC flow from agent $i$ to agent $j$.\n",
"\n",
"The observed data shows the actual payments between network actors transacting in CIC. It does not show shilling payments between actors, actors' utility, or demand; we only know the actual CIC spends between agents. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# import libraries\n",
"import networkx as nx\n",
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Dump as of 5-15-2020\n",
"\n",
"xDai blockchain data from January through May 11, 2020, from https://www.grassrootseconomics.org/research"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# import the data\n",
"transactions = pd.read_csv('data/sarafu_xDAI_tx_all_pub_all_time_12May2020.csv')\n",
"users = pd.read_csv('data/sarafu_xDAI_users_all_pub_all_time_12May2020.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>timeset</th>\n",
" <th>transfer_subtype</th>\n",
" <th>source</th>\n",
" <th>s_gender</th>\n",
" <th>s_location</th>\n",
" <th>s_business_type</th>\n",
" <th>target</th>\n",
" <th>t_gender</th>\n",
" <th>t_location</th>\n",
" <th>t_business_type</th>\n",
" <th>tx_token</th>\n",
" <th>weight</th>\n",
" <th>type</th>\n",
" <th>token_name</th>\n",
" <th>token_address</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>2020-01-25 19:13:17.731529</td>\n",
" <td>DISBURSEMENT</td>\n",
" <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>System</td>\n",
" <td>0x245fc81fe385450Dc0f4787668e47c903C00b0A1</td>\n",
" <td>female</td>\n",
" <td>GE Office</td>\n",
" <td>Savings Group</td>\n",
" <td>NaN</td>\n",
" <td>18000.000000</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2020-01-25 19:13:19.056070</td>\n",
" <td>DISBURSEMENT</td>\n",
" <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>System</td>\n",
" <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>9047.660892</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>2020-01-25 19:13:20.288346</td>\n",
" <td>DISBURSEMENT</td>\n",
" <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>System</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>25378.726002</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>2020-01-25 19:13:21.478850</td>\n",
" <td>DISBURSEMENT</td>\n",
" <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>System</td>\n",
" <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
" <td>male</td>\n",
" <td>G.E</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>4495.932576</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>2020-01-26 07:48:43.042684</td>\n",
" <td>DISBURSEMENT</td>\n",
" <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>System</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" <td>male</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>400.000000</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id timeset transfer_subtype \\\n",
"0 1 2020-01-25 19:13:17.731529 DISBURSEMENT \n",
"1 2 2020-01-25 19:13:19.056070 DISBURSEMENT \n",
"2 3 2020-01-25 19:13:20.288346 DISBURSEMENT \n",
"3 4 2020-01-25 19:13:21.478850 DISBURSEMENT \n",
"4 5 2020-01-26 07:48:43.042684 DISBURSEMENT \n",
"\n",
" source s_gender s_location \\\n",
"0 0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F NaN None \n",
"1 0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F NaN None \n",
"2 0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F NaN None \n",
"3 0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F NaN None \n",
"4 0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F NaN None \n",
"\n",
" s_business_type target t_gender \\\n",
"0 System 0x245fc81fe385450Dc0f4787668e47c903C00b0A1 female \n",
"1 System 0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2 male \n",
"2 System 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"3 System 0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435 male \n",
"4 System 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 male \n",
"\n",
" t_location t_business_type tx_token weight type token_name \\\n",
"0 GE Office Savings Group NaN 18000.000000 directed Sarafu \n",
"1 GE Nairobi Farming/Labour NaN 9047.660892 directed Sarafu \n",
"2 GE Nairobi Farming/Labour NaN 25378.726002 directed Sarafu \n",
"3 G.E Farming/Labour NaN 4495.932576 directed Sarafu \n",
"4 Home Farming/Labour NaN 400.000000 directed Sarafu \n",
"\n",
" token_address \n",
"0 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"1 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"2 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"3 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"4 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transactions.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'STANDARD': 0.5085207861177824,\n",
" 'DISBURSEMENT': 0.35574873997902784,\n",
" 'RECLAMATION': 0.13483070053783444,\n",
" 'AGENT_OUT': 0.0008997733653553429}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transactions.transfer_subtype.value_counts(normalize=True).to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Based on the data dictionary provided by Grassroots Economics (GE), the transfer subtype codes are:\n",
"\n",
"* DISBURSEMENT = a transfer from GE\n",
"* RECLAMATION = a transfer back to GE\n",
"* STANDARD = a trade between users\n",
"* AGENT_OUT = a group account cashing out\n",
"\n",
"\n",
"For the purposes of our analysis, we subset to STANDARD transactions denominated in Sarafu. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# keep only user-to-user trades (STANDARD) made in the Sarafu token\n",
"transactions_subset = transactions[transactions['transfer_subtype'] == 'STANDARD']\n",
"transactions_subset = transactions_subset[transactions_subset['token_name'] == 'Sarafu']"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>start</th>\n",
" <th>label</th>\n",
" <th>gender</th>\n",
" <th>location</th>\n",
" <th>held_roles</th>\n",
" <th>business_type</th>\n",
" <th>bal</th>\n",
" <th>xDAI_blockchain_address</th>\n",
" <th>confidence</th>\n",
" <th>...</th>\n",
" <th>otxns_in</th>\n",
" <th>otxns_out</th>\n",
" <th>ounique_in</th>\n",
" <th>ounique_out</th>\n",
" <th>svol_in</th>\n",
" <th>svol_out</th>\n",
" <th>stxns_in</th>\n",
" <th>stxns_out</th>\n",
" <th>sunique_in</th>\n",
" <th>sunique_out</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>2020-01-25 19:10:50.218686</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>ADMIN</td>\n",
" <td>System</td>\n",
" <td>8.916761e+06</td>\n",
" <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>19917</td>\n",
" <td>52610</td>\n",
" <td>9</td>\n",
" <td>19862</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2018-10-23 09:09:58</td>\n",
" <td>2</td>\n",
" <td>female</td>\n",
" <td>GE Office</td>\n",
" <td>TOKEN_AGENT</td>\n",
" <td>Savings Group</td>\n",
" <td>1.800000e+05</td>\n",
" <td>0x245fc81fe385450Dc0f4787668e47c903C00b0A1</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>134</td>\n",
" <td>16</td>\n",
" <td>68</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>2018-10-21 14:20:57</td>\n",
" <td>3</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>BENEFICIARY</td>\n",
" <td>Farming/Labour</td>\n",
" <td>5.666089e+01</td>\n",
" <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>9007.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>2018-10-21 15:38:30</td>\n",
" <td>4</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>BENEFICIARY</td>\n",
" <td>Farming/Labour</td>\n",
" <td>1.173773e+04</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>0.100000</td>\n",
" <td>...</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>20619.0</td>\n",
" <td>50449.0</td>\n",
" <td>20</td>\n",
" <td>15</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>2018-10-23 14:10:27</td>\n",
" <td>5</td>\n",
" <td>male</td>\n",
" <td>G.E</td>\n",
" <td>BENEFICIARY</td>\n",
" <td>Farming/Labour</td>\n",
" <td>7.297263e+03</td>\n",
" <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
" <td>0.405063</td>\n",
" <td>...</td>\n",
" <td>15</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>127393.3</td>\n",
" <td>168905.0</td>\n",
" <td>158</td>\n",
" <td>208</td>\n",
" <td>84</td>\n",
" <td>65</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 22 columns</p>\n",
"</div>"
],
"text/plain": [
" id start label gender location held_roles \\\n",
"0 1 2020-01-25 19:10:50.218686 1 NaN None ADMIN \n",
"1 2 2018-10-23 09:09:58 2 female GE Office TOKEN_AGENT \n",
"2 3 2018-10-21 14:20:57 3 male GE Nairobi BENEFICIARY \n",
"3 4 2018-10-21 15:38:30 4 male GE Nairobi BENEFICIARY \n",
"4 5 2018-10-23 14:10:27 5 male G.E BENEFICIARY \n",
"\n",
" business_type bal xDAI_blockchain_address \\\n",
"0 System 8.916761e+06 0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F \n",
"1 Savings Group 1.800000e+05 0x245fc81fe385450Dc0f4787668e47c903C00b0A1 \n",
"2 Farming/Labour 5.666089e+01 0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2 \n",
"3 Farming/Labour 1.173773e+04 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 \n",
"4 Farming/Labour 7.297263e+03 0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435 \n",
"\n",
" confidence ... otxns_in otxns_out ounique_in ounique_out svol_in \\\n",
"0 0.000000 ... 19917 52610 9 19862 0.0 \n",
"1 0.000000 ... 134 16 68 0 0.0 \n",
"2 0.000000 ... 2 2 1 0 0.0 \n",
"3 0.100000 ... 6 1 1 0 20619.0 \n",
"4 0.405063 ... 15 1 1 0 127393.3 \n",
"\n",
" svol_out stxns_in stxns_out sunique_in sunique_out \n",
"0 0.0 0 0 0 0 \n",
"1 0.0 0 0 0 0 \n",
"2 9007.0 0 1 0 1 \n",
"3 50449.0 20 15 11 5 \n",
"4 168905.0 158 208 84 65 \n",
"\n",
"[5 rows x 22 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"users.head()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Farming/Labour': 0.43367860016090104,\n",
" 'Food/Water': 0.22863032984714401,\n",
" 'Shop': 0.1406878519710378,\n",
" 'Fuel/Energy': 0.06365647626709574,\n",
" 'None': 0.0621983105390185,\n",
" 'Transport': 0.04379525341914722,\n",
" 'Education': 0.014380530973451327,\n",
" 'Savings Group': 0.006335478680611424,\n",
" 'Health': 0.00331858407079646,\n",
" 'Environment': 0.001910699919549477,\n",
" 'System': 0.0012067578439259854,\n",
" 'Staff': 0.00010056315366049879,\n",
" 'Chama': 5.0281576830249393e-05,\n",
" 'Game': 5.0281576830249393e-05}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"users['business_type'].value_counts(normalize=True).to_dict()"
]
},
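{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Combine user and transaction tables\n",
"\n",
"Left-join the user table's balance (`bal`) onto the transaction table twice: once keyed on the `source` address and once on the `target` address. The joined columns are stored with `s_` and `t_` prefixes respectively."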
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"user_subset = users[['bal','xDAI_blockchain_address']]\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>timeset</th>\n",
" <th>transfer_subtype</th>\n",
" <th>source</th>\n",
" <th>s_gender</th>\n",
" <th>s_location</th>\n",
" <th>s_business_type</th>\n",
" <th>target</th>\n",
" <th>t_gender</th>\n",
" <th>t_location</th>\n",
" <th>t_business_type</th>\n",
" <th>tx_token</th>\n",
" <th>weight</th>\n",
" <th>type</th>\n",
" <th>token_name</th>\n",
" <th>token_address</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>72647</th>\n",
" <td>170140</td>\n",
" <td>2020-04-30 10:43:45.170528</td>\n",
" <td>STANDARD</td>\n",
" <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>9007.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72648</th>\n",
" <td>10</td>\n",
" <td>2020-01-26 08:26:22.521902</td>\n",
" <td>STANDARD</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" <td>male</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>100.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72649</th>\n",
" <td>11</td>\n",
" <td>2020-01-26 08:27:26.757372</td>\n",
" <td>STANDARD</td>\n",
" <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
" <td>male</td>\n",
" <td>G.E</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>2.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72650</th>\n",
" <td>13</td>\n",
" <td>2020-01-26 08:32:05.154096</td>\n",
" <td>STANDARD</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" <td>male</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>23.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72651</th>\n",
" <td>15</td>\n",
" <td>2020-01-26 08:38:42.186525</td>\n",
" <td>STANDARD</td>\n",
" <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
" <td>male</td>\n",
" <td>Test</td>\n",
" <td>Health</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>12.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147810</th>\n",
" <td>208035</td>\n",
" <td>2020-05-11 08:52:34.504171</td>\n",
" <td>STANDARD</td>\n",
" <td>0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe</td>\n",
" <td>Unknown gender</td>\n",
" <td>kilibole</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec</td>\n",
" <td>Unknown gender</td>\n",
" <td>Kilibole</td>\n",
" <td>Food/Water</td>\n",
" <td>NaN</td>\n",
" <td>20.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147811</th>\n",
" <td>208021</td>\n",
" <td>2020-05-11 08:49:20.768559</td>\n",
" <td>STANDARD</td>\n",
" <td>0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc</td>\n",
" <td>Unknown gender</td>\n",
" <td>Kikomani</td>\n",
" <td>Food/Water</td>\n",
" <td>0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985</td>\n",
" <td>Unknown gender</td>\n",
" <td>Bofu</td>\n",
" <td>Shop</td>\n",
" <td>NaN</td>\n",
" <td>350.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147812</th>\n",
" <td>208459</td>\n",
" <td>2020-05-11 10:11:18.699013</td>\n",
" <td>STANDARD</td>\n",
" <td>0x2e44845BE57687bFdcdd26044bB7CdD575781336</td>\n",
" <td>male</td>\n",
" <td>Miyani</td>\n",
" <td>Shop</td>\n",
" <td>0xfCF20a412eB6DD345237C7BEeBab53B424b98297</td>\n",
" <td>male</td>\n",
" <td>Miyani</td>\n",
" <td>Shop</td>\n",
" <td>NaN</td>\n",
" <td>400.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147813</th>\n",
" <td>208395</td>\n",
" <td>2020-05-11 10:01:04.805823</td>\n",
" <td>STANDARD</td>\n",
" <td>0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A</td>\n",
" <td>male</td>\n",
" <td>Kilifi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF</td>\n",
" <td>Unknown gender</td>\n",
" <td>KIlifi</td>\n",
" <td>Education</td>\n",
" <td>NaN</td>\n",
" <td>20.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147814</th>\n",
" <td>208396</td>\n",
" <td>2020-05-11 10:01:10.449068</td>\n",
" <td>STANDARD</td>\n",
" <td>0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF</td>\n",
" <td>Unknown gender</td>\n",
" <td>KIlifi</td>\n",
" <td>Education</td>\n",
" <td>0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A</td>\n",
" <td>male</td>\n",
" <td>Kilifi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>20.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>75167 rows × 16 columns</p>\n",
"</div>"
],
"text/plain": [
" id timeset transfer_subtype \\\n",
"72647 170140 2020-04-30 10:43:45.170528 STANDARD \n",
"72648 10 2020-01-26 08:26:22.521902 STANDARD \n",
"72649 11 2020-01-26 08:27:26.757372 STANDARD \n",
"72650 13 2020-01-26 08:32:05.154096 STANDARD \n",
"72651 15 2020-01-26 08:38:42.186525 STANDARD \n",
"... ... ... ... \n",
"147810 208035 2020-05-11 08:52:34.504171 STANDARD \n",
"147811 208021 2020-05-11 08:49:20.768559 STANDARD \n",
"147812 208459 2020-05-11 10:11:18.699013 STANDARD \n",
"147813 208395 2020-05-11 10:01:04.805823 STANDARD \n",
"147814 208396 2020-05-11 10:01:10.449068 STANDARD \n",
"\n",
" source s_gender \\\n",
"72647 0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2 male \n",
"72648 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"72649 0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435 male \n",
"72650 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"72651 0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3 male \n",
"... ... ... \n",
"147810 0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe Unknown gender \n",
"147811 0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc Unknown gender \n",
"147812 0x2e44845BE57687bFdcdd26044bB7CdD575781336 male \n",
"147813 0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A male \n",
"147814 0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF Unknown gender \n",
"\n",
" s_location s_business_type \\\n",
"72647 GE Nairobi Farming/Labour \n",
"72648 GE Nairobi Farming/Labour \n",
"72649 G.E Farming/Labour \n",
"72650 GE Nairobi Farming/Labour \n",
"72651 Test Health \n",
"... ... ... \n",
"147810 kilibole Farming/Labour \n",
"147811 Kikomani Food/Water \n",
"147812 Miyani Shop \n",
"147813 Kilifi Farming/Labour \n",
"147814 KIlifi Education \n",
"\n",
" target t_gender \\\n",
"72647 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"72648 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 male \n",
"72649 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"72650 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 male \n",
"72651 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"... ... ... \n",
"147810 0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec Unknown gender \n",
"147811 0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985 Unknown gender \n",
"147812 0xfCF20a412eB6DD345237C7BEeBab53B424b98297 male \n",
"147813 0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF Unknown gender \n",
"147814 0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A male \n",
"\n",
" t_location t_business_type tx_token weight type token_name \\\n",
"72647 GE Nairobi Farming/Labour NaN 9007.0 directed Sarafu \n",
"72648 Home Farming/Labour NaN 100.0 directed Sarafu \n",
"72649 GE Nairobi Farming/Labour NaN 2.0 directed Sarafu \n",
"72650 Home Farming/Labour NaN 23.0 directed Sarafu \n",
"72651 GE Nairobi Farming/Labour NaN 12.0 directed Sarafu \n",
"... ... ... ... ... ... ... \n",
"147810 Kilibole Food/Water NaN 20.0 directed Sarafu \n",
"147811 Bofu Shop NaN 350.0 directed Sarafu \n",
"147812 Miyani Shop NaN 400.0 directed Sarafu \n",
"147813 KIlifi Education NaN 20.0 directed Sarafu \n",
"147814 Kilifi Farming/Labour NaN 20.0 directed Sarafu \n",
"\n",
" token_address \n",
"72647 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"72648 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"72649 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"72650 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"72651 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"... ... \n",
"147810 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"147811 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"147812 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"147813 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"147814 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 \n",
"\n",
"[75167 rows x 16 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transactions_subset"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# attach the sender's balance and address, prefixed with s_\n",
"transactions_subset_v1 = transactions_subset.merge(user_subset, how='left', left_on='source', right_on='xDAI_blockchain_address')\n",
"transactions_subset_v1 = transactions_subset_v1.rename(columns={'bal': 's_bal', 'xDAI_blockchain_address': 's_xDAI_blockchain_address'})"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# attach the receiver's balance and address, prefixed with t_\n",
"transactions_subset_v2 = transactions_subset_v1.merge(user_subset, how='left', left_on='target', right_on='xDAI_blockchain_address')\n",
"transactions_subset_v2 = transactions_subset_v2.rename(columns={'bal': 't_bal', 'xDAI_blockchain_address': 't_xDAI_blockchain_address'})"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>timeset</th>\n",
" <th>transfer_subtype</th>\n",
" <th>source</th>\n",
" <th>s_gender</th>\n",
" <th>s_location</th>\n",
" <th>s_business_type</th>\n",
" <th>target</th>\n",
" <th>t_gender</th>\n",
" <th>t_location</th>\n",
" <th>t_business_type</th>\n",
" <th>tx_token</th>\n",
" <th>weight</th>\n",
" <th>type</th>\n",
" <th>token_name</th>\n",
" <th>token_address</th>\n",
" <th>s_bal</th>\n",
" <th>s_xDAI_blockchain_address</th>\n",
" <th>t_bal</th>\n",
" <th>t_xDAI_blockchain_address</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>170140</td>\n",
" <td>2020-04-30 10:43:45.170528</td>\n",
" <td>STANDARD</td>\n",
" <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>9007.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" <td>56.660892</td>\n",
" <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
" <td>11737.726002</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10</td>\n",
" <td>2020-01-26 08:26:22.521902</td>\n",
" <td>STANDARD</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" <td>male</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>100.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" <td>11737.726002</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>902.500000</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>11</td>\n",
" <td>2020-01-26 08:27:26.757372</td>\n",
" <td>STANDARD</td>\n",
" <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
" <td>male</td>\n",
" <td>G.E</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>2.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" <td>7297.262576</td>\n",
" <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
" <td>11737.726002</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>13</td>\n",
" <td>2020-01-26 08:32:05.154096</td>\n",
" <td>STANDARD</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" <td>male</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>23.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" <td>11737.726002</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>902.500000</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15</td>\n",
" <td>2020-01-26 08:38:42.186525</td>\n",
" <td>STANDARD</td>\n",
" <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
" <td>male</td>\n",
" <td>Test</td>\n",
" <td>Health</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>male</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>NaN</td>\n",
" <td>12.0</td>\n",
" <td>directed</td>\n",
" <td>Sarafu</td>\n",
" <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
" <td>448.000000</td>\n",
" <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
" <td>11737.726002</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id timeset transfer_subtype \\\n",
"0 170140 2020-04-30 10:43:45.170528 STANDARD \n",
"1 10 2020-01-26 08:26:22.521902 STANDARD \n",
"2 11 2020-01-26 08:27:26.757372 STANDARD \n",
"3 13 2020-01-26 08:32:05.154096 STANDARD \n",
"4 15 2020-01-26 08:38:42.186525 STANDARD \n",
"\n",
" source s_gender s_location \\\n",
"0 0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2 male GE Nairobi \n",
"1 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male GE Nairobi \n",
"2 0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435 male G.E \n",
"3 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male GE Nairobi \n",
"4 0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3 male Test \n",
"\n",
" s_business_type target t_gender \\\n",
"0 Farming/Labour 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"1 Farming/Labour 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 male \n",
"2 Farming/Labour 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"3 Farming/Labour 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 male \n",
"4 Health 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 male \n",
"\n",
" t_location t_business_type tx_token weight type token_name \\\n",
"0 GE Nairobi Farming/Labour NaN 9007.0 directed Sarafu \n",
"1 Home Farming/Labour NaN 100.0 directed Sarafu \n",
"2 GE Nairobi Farming/Labour NaN 2.0 directed Sarafu \n",
"3 Home Farming/Labour NaN 23.0 directed Sarafu \n",
"4 GE Nairobi Farming/Labour NaN 12.0 directed Sarafu \n",
"\n",
" token_address s_bal \\\n",
"0 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 56.660892 \n",
"1 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 11737.726002 \n",
"2 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 7297.262576 \n",
"3 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 11737.726002 \n",
"4 0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4 448.000000 \n",
"\n",
" s_xDAI_blockchain_address t_bal \\\n",
"0 0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2 11737.726002 \n",
"1 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 902.500000 \n",
"2 0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435 11737.726002 \n",
"3 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 902.500000 \n",
"4 0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3 11737.726002 \n",
"\n",
" t_xDAI_blockchain_address \n",
"0 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 \n",
"1 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 \n",
"2 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 \n",
"3 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 \n",
"4 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transactions_subset_v2.head()"
]
},
{
"cell_type": "code",
"execution_count": 265,
"metadata": {},
"outputs": [],
"source": [
"# subset the data into the needed columns for clustering\n",
"combined = transactions_subset_v2[['source','s_location','s_business_type','target','t_location',\n",
" 't_business_type','weight','s_bal','t_bal']]"
]
},
{
"cell_type": "code",
"execution_count": 266,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>source</th>\n",
" <th>s_location</th>\n",
" <th>s_business_type</th>\n",
" <th>target</th>\n",
" <th>t_location</th>\n",
" <th>t_business_type</th>\n",
" <th>weight</th>\n",
" <th>s_bal</th>\n",
" <th>t_bal</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>9007.0</td>\n",
" <td>56.660892</td>\n",
" <td>11737.726002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>100.0</td>\n",
" <td>11737.726002</td>\n",
" <td>902.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
" <td>G.E</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>2.0</td>\n",
" <td>7297.262576</td>\n",
" <td>11737.726002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>23.0</td>\n",
" <td>11737.726002</td>\n",
" <td>902.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
" <td>Test</td>\n",
" <td>Health</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>12.0</td>\n",
" <td>448.000000</td>\n",
" <td>11737.726002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75162</th>\n",
" <td>0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe</td>\n",
" <td>kilibole</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec</td>\n",
" <td>Kilibole</td>\n",
" <td>Food/Water</td>\n",
" <td>20.0</td>\n",
" <td>0.000000</td>\n",
" <td>5.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75163</th>\n",
" <td>0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc</td>\n",
" <td>Kikomani</td>\n",
" <td>Food/Water</td>\n",
" <td>0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985</td>\n",
" <td>Bofu</td>\n",
" <td>Shop</td>\n",
" <td>350.0</td>\n",
" <td>0.000000</td>\n",
" <td>800.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75164</th>\n",
" <td>0x2e44845BE57687bFdcdd26044bB7CdD575781336</td>\n",
" <td>Miyani</td>\n",
" <td>Shop</td>\n",
" <td>0xfCF20a412eB6DD345237C7BEeBab53B424b98297</td>\n",
" <td>Miyani</td>\n",
" <td>Shop</td>\n",
" <td>400.0</td>\n",
" <td>0.000000</td>\n",
" <td>800.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75165</th>\n",
" <td>0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A</td>\n",
" <td>Kilifi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF</td>\n",
" <td>KIlifi</td>\n",
" <td>Education</td>\n",
" <td>20.0</td>\n",
" <td>400.000000</td>\n",
" <td>500.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75166</th>\n",
" <td>0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF</td>\n",
" <td>KIlifi</td>\n",
" <td>Education</td>\n",
" <td>0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A</td>\n",
" <td>Kilifi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>20.0</td>\n",
" <td>500.000000</td>\n",
" <td>400.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>75167 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" source s_location s_business_type \\\n",
"0 0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2 GE Nairobi Farming/Labour \n",
"1 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 GE Nairobi Farming/Labour \n",
"2 0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435 G.E Farming/Labour \n",
"3 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 GE Nairobi Farming/Labour \n",
"4 0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3 Test Health \n",
"... ... ... ... \n",
"75162 0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe kilibole Farming/Labour \n",
"75163 0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc Kikomani Food/Water \n",
"75164 0x2e44845BE57687bFdcdd26044bB7CdD575781336 Miyani Shop \n",
"75165 0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A Kilifi Farming/Labour \n",
"75166 0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF KIlifi Education \n",
"\n",
" target t_location t_business_type \\\n",
"0 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 GE Nairobi Farming/Labour \n",
"1 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 Home Farming/Labour \n",
"2 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 GE Nairobi Farming/Labour \n",
"3 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 Home Farming/Labour \n",
"4 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 GE Nairobi Farming/Labour \n",
"... ... ... ... \n",
"75162 0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec Kilibole Food/Water \n",
"75163 0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985 Bofu Shop \n",
"75164 0xfCF20a412eB6DD345237C7BEeBab53B424b98297 Miyani Shop \n",
"75165 0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF KIlifi Education \n",
"75166 0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A Kilifi Farming/Labour \n",
"\n",
" weight s_bal t_bal \n",
"0 9007.0 56.660892 11737.726002 \n",
"1 100.0 11737.726002 902.500000 \n",
"2 2.0 7297.262576 11737.726002 \n",
"3 23.0 11737.726002 902.500000 \n",
"4 12.0 448.000000 11737.726002 \n",
"... ... ... ... \n",
"75162 20.0 0.000000 5.000000 \n",
"75163 350.0 0.000000 800.000000 \n",
"75164 400.0 0.000000 800.000000 \n",
"75165 20.0 400.000000 500.000000 \n",
"75166 20.0 500.000000 400.000000 \n",
"\n",
"[75167 rows x 9 columns]"
]
},
"execution_count": 266,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined"
]
},
{
"cell_type": "code",
"execution_count": 267,
"metadata": {},
"outputs": [],
"source": [
"source = combined.source.values\n",
"target = combined.target.values\n",
"# remove the source and target variables for clustering\n",
"del combined['source']\n",
"del combined['target']"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# create dummy variables of the categorical variables \n",
"updated = pd.get_dummies(combined)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute 10 clusters based off of the following features:\n",
"* s_location\n",
"* s_business_type\n",
"* t_location\n",
"* t_business_type\n",
"* weight, which is tokens exchange\n",
"* s_bal\n",
"* t_bal"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"kmeans = KMeans(n_clusters=10, random_state=1,n_jobs=-1).fit(updated.values)"
]
},
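{
"cell_type": "markdown",
"metadata": {},
"source": [
"The encoding and clustering steps described above can be sketched on a toy frame. This is a minimal illustration under stated assumptions, not the notebook's actual run: the `toy` frame, its values, and the choice of two clusters are hypothetical, and `n_init=10` is set explicitly only so the sketch behaves the same across scikit-learn versions.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from sklearn.cluster import KMeans\n",
"\n",
"# Toy transactions (hypothetical values, not taken from the Sarafu data)\n",
"toy = pd.DataFrame({\n",
"    's_business_type': ['Shop', 'Food/Water', 'Shop', 'Food/Water'],\n",
"    't_business_type': ['Food/Water', 'Shop', 'Food/Water', 'Shop'],\n",
"    'weight': [10.0, 12.0, 5000.0, 5200.0],\n",
"})\n",
"\n",
"# One-hot encode the categorical columns; numeric columns pass through unchanged\n",
"encoded = pd.get_dummies(toy)\n",
"\n",
"# Fit k-means on the encoded rows, cast to float so dummy columns are numeric\n",
"toy_kmeans = KMeans(n_clusters=2, random_state=1, n_init=10).fit(encoded.astype(float).values)\n",
"\n",
"# One label per row, i.e. per transaction\n",
"toy['cluster'] = toy_kmeans.labels_\n",
"print(toy)\n",
"```\n",
"\n",
"Because the numeric columns are not rescaled, columns with large magnitudes (here `weight`) dominate the Euclidean distance that k-means minimizes, so the toy rows split by transaction size rather than by business type."
]
},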
{
"cell_type": "code",
"execution_count": 268,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/aclarkdata/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" \n"
]
}
],
"source": [
"# add the clusters back to the combined dataframe\n",
"combined['cluster'] = kmeans.labels_"
]
},
{
"cell_type": "code",
"execution_count": 269,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/aclarkdata/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" \n",
"/home/aclarkdata/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" This is separate from the ipykernel package so we can avoid doing imports until\n"
]
}
],
"source": [
"# add back the source and target variables\n",
"combined['source'] = source\n",
"combined['target'] = target"
]
},
{
"cell_type": "code",
"execution_count": 270,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>s_location</th>\n",
" <th>s_business_type</th>\n",
" <th>t_location</th>\n",
" <th>t_business_type</th>\n",
" <th>weight</th>\n",
" <th>s_bal</th>\n",
" <th>t_bal</th>\n",
" <th>cluster</th>\n",
" <th>source</th>\n",
" <th>target</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>9007.0</td>\n",
" <td>56.660892</td>\n",
" <td>11737.726002</td>\n",
" <td>4</td>\n",
" <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>100.0</td>\n",
" <td>11737.726002</td>\n",
" <td>902.500000</td>\n",
" <td>6</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>G.E</td>\n",
" <td>Farming/Labour</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>2.0</td>\n",
" <td>7297.262576</td>\n",
" <td>11737.726002</td>\n",
" <td>4</td>\n",
" <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>Home</td>\n",
" <td>Farming/Labour</td>\n",
" <td>23.0</td>\n",
" <td>11737.726002</td>\n",
" <td>902.500000</td>\n",
" <td>6</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Test</td>\n",
" <td>Health</td>\n",
" <td>GE Nairobi</td>\n",
" <td>Farming/Labour</td>\n",
" <td>12.0</td>\n",
" <td>448.000000</td>\n",
" <td>11737.726002</td>\n",
" <td>4</td>\n",
" <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
" <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" s_location s_business_type t_location t_business_type weight \\\n",
"0 GE Nairobi Farming/Labour GE Nairobi Farming/Labour 9007.0 \n",
"1 GE Nairobi Farming/Labour Home Farming/Labour 100.0 \n",
"2 G.E Farming/Labour GE Nairobi Farming/Labour 2.0 \n",
"3 GE Nairobi Farming/Labour Home Farming/Labour 23.0 \n",
"4 Test Health GE Nairobi Farming/Labour 12.0 \n",
"\n",
" s_bal t_bal cluster \\\n",
"0 56.660892 11737.726002 4 \n",
"1 11737.726002 902.500000 6 \n",
"2 7297.262576 11737.726002 4 \n",
"3 11737.726002 902.500000 6 \n",
"4 448.000000 11737.726002 4 \n",
"\n",
" source \\\n",
"0 0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2 \n",
"1 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 \n",
"2 0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435 \n",
"3 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 \n",
"4 0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3 \n",
"\n",
" target \n",
"0 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 \n",
"1 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 \n",
"2 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 \n",
"3 0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72 \n",
"4 0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31 "
]
},
"execution_count": 270,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined.head()"
]
},
{
"cell_type": "code",
"execution_count": 271,
"metadata": {},
"outputs": [],
"source": [
"# export the clusters to csv\n",
"combined.to_csv('clusters.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Descriptive statistics \n",
"\n",
"Calculate relevant statistics, such as median, mean, etc for creating probability distributions in the subpopulation model."
]
},
{
"cell_type": "code",
"execution_count": 272,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>weight</th>\n",
" <th>s_bal</th>\n",
" <th>t_bal</th>\n",
" </tr>\n",
" <tr>\n",
" <th>cluster</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>291.207791</td>\n",
" <td>781.072620</td>\n",
" <td>822.216334</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>606.382822</td>\n",
" <td>1392.005606</td>\n",
" <td>251651.998315</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>514.121339</td>\n",
" <td>1929.337442</td>\n",
" <td>63415.727008</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1589.169014</td>\n",
" <td>67756.340856</td>\n",
" <td>8186.550913</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>527.759612</td>\n",
" <td>1084.332676</td>\n",
" <td>7787.095537</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3909.855263</td>\n",
" <td>251651.998315</td>\n",
" <td>14977.323705</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1291.446186</td>\n",
" <td>21011.816026</td>\n",
" <td>2106.185386</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>601.117140</td>\n",
" <td>1166.988406</td>\n",
" <td>37913.602381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>457.416897</td>\n",
" <td>1038.146815</td>\n",
" <td>19108.202892</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1783.015873</td>\n",
" <td>4128.615253</td>\n",
" <td>104810.011282</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" weight s_bal t_bal\n",
"cluster \n",
"0 291.207791 781.072620 822.216334\n",
"1 606.382822 1392.005606 251651.998315\n",
"2 514.121339 1929.337442 63415.727008\n",
"3 1589.169014 67756.340856 8186.550913\n",
"4 527.759612 1084.332676 7787.095537\n",
"5 3909.855263 251651.998315 14977.323705\n",
"6 1291.446186 21011.816026 2106.185386\n",
"7 601.117140 1166.988406 37913.602381\n",
"8 457.416897 1038.146815 19108.202892\n",
"9 1783.015873 4128.615253 104810.011282"
]
},
"execution_count": 272,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined.groupby('cluster').mean()"
]
},
{
"cell_type": "code",
"execution_count": 276,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"mean\n",
"291.21\n",
"std\n",
"731.89\n",
"median\n",
"310.0\n",
"\n",
"1\n",
"mean\n",
"606.38\n",
"std\n",
"1529.2\n",
"median\n",
"174.5\n",
"\n",
"2\n",
"mean\n",
"514.12\n",
"std\n",
"1882.62\n",
"median\n",
"206.38\n",
"\n",
"3\n",
"mean\n",
"1589.17\n",
"std\n",
"5646.55\n",
"median\n",
"64767.51\n",
"\n",
"4\n",
"mean\n",
"527.76\n",
"std\n",
"1337.06\n",
"median\n",
"229.17\n",
"\n",
"5\n",
"mean\n",
"3909.86\n",
"std\n",
"8736.56\n",
"median\n",
"251652.0\n",
"\n",
"6\n",
"mean\n",
"1291.45\n",
"std\n",
"3381.92\n",
"median\n",
"18304.36\n",
"\n",
"7\n",
"mean\n",
"601.12\n",
"std\n",
"1548.68\n",
"median\n",
"211.7\n",
"\n",
"8\n",
"mean\n",
"457.42\n",
"std\n",
"1496.74\n",
"median\n",
"250.5\n",
"\n",
"9\n",
"mean\n",
"1783.02\n",
"std\n",
"7596.17\n",
"median\n",
"310.0\n",
"\n"
]
}
],
"source": [
"# compute median, Q1,Q3, mean, and sigma\n",
"clustersMedianSourceBalance = []\n",
"clusters1stQSourceBalance = []\n",
"clusters3rdQSourceBalance = []\n",
"clustersMu = []\n",
"clustersSigma = []\n",
"for i in range(0,len(combined.cluster.unique())):\n",
" temp = combined[combined['cluster']==i]\n",
" print(i)\n",
" print('mean')\n",
" print(round(temp.weight.mean(),2))\n",
" clustersMu.append(round(temp.weight.mean(),2))\n",
" print('std')\n",
" print(round(temp.weight.std(),2))\n",
" clustersSigma.append(round(temp.weight.std(),2))\n",
" print('median')\n",
" print(round(temp.s_bal.median(),2))\n",
" clustersMedianSourceBalance.append(round(temp.weight.median(),2))\n",
" clusters1stQSourceBalance.append(round(temp.s_bal.quantile(0.25),2))\n",
" clusters3rdQSourceBalance.append(round(temp.s_bal.quantile(0.75),2))\n",
" print()\n",
" \n"
]
},
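{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-cluster statistics computed in the loop above can also be expressed as a single grouped aggregation. The following is a sketch on a hypothetical toy frame, using pandas named aggregation; the `toy` values, the helper functions `q1` and `q3`, and the output column names such as `mu` and `median_s_bal` are assumptions made for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy stand-in for the clustered frame (hypothetical values)\n",
"toy = pd.DataFrame({\n",
"    'cluster': [0, 0, 0, 1, 1],\n",
"    'weight': [10.0, 20.0, 30.0, 100.0, 300.0],\n",
"    's_bal': [100.0, 200.0, 400.0, 1000.0, 3000.0],\n",
"})\n",
"\n",
"def q1(series):\n",
"    # first quartile\n",
"    return series.quantile(0.25)\n",
"\n",
"def q3(series):\n",
"    # third quartile\n",
"    return series.quantile(0.75)\n",
"\n",
"# One row per cluster, one column per statistic\n",
"stats = toy.groupby('cluster').agg(\n",
"    mu=('weight', 'mean'),\n",
"    sigma=('weight', 'std'),\n",
"    median_s_bal=('s_bal', 'median'),\n",
"    q1_s_bal=('s_bal', q1),\n",
"    q3_s_bal=('s_bal', q3),\n",
").round(2)\n",
"\n",
"print(stats)\n",
"```\n",
"\n",
"Each column of `stats` can be turned into a plain list with `stats['mu'].tolist()` if list-valued parameters are preferred."
]
},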
{
"cell_type": "code",
"execution_count": 224,
"metadata": {},
"outputs": [],
"source": [
"# Create initilization file (copy from here) \n",
"\n",
"clusters = ['0','1','2','3','4','5','6','7','8','9']\n",
"\n",
"clustersMedianSourceBalance = [310.0,174.5,206.38,64767.51,229.17,251652.0,18304.36,211.7,250.5,310.0]\n",
"\n",
"clusters1stQSourceBalance = [112.53,119.22,100.46,64767.51,100.0,251652.0,14050.3,109.42,102.46,150.72]\n",
"\n",
"clusters3rdQSourceBalance = [800.24,540.43,582.48,64767.51,924.5,251652.0,24857.5,670.44,968.88,1458.79]\n",
"\n",
"clustersMu = [291.21,606.38,514.12,1589.17,527.76,3909.86,1291.45,601.12,457.42,1783.02]\n",
"\n",
"clustersSigma = [731.89,1529.2,1882.62,5646.55,1337.06,8736.56,3381.92,1548.68,1496.74,7596.17]\n",
"\n",
"# nested dictionary\n",
"UtilityTypesOrdered = { '0': dict(zip(list(combined[combined['cluster']==0].t_business_type.value_counts(normalize=True).to_dict().keys()),values)), \n",
" '1': dict(zip(list(combined[combined['cluster']==1].t_business_type.value_counts(normalize=True).to_dict().keys()),values)),\n",
" '2': dict(zip(list(combined[combined['cluster']==2].t_business_type.value_counts(normalize=True).to_dict().keys()),values)),\n",
" '3': dict(zip(list(combined[combined['cluster']==3].t_business_type.value_counts(normalize=True).to_dict().keys()),values)),\n",
" '4': dict(zip(list(combined[combined['cluster']==4].t_business_type.value_counts(normalize=True).to_dict().keys()),values)),\n",
" '5': dict(zip(list(combined[combined['cluster']==5].t_business_type.value_counts(normalize=True).to_dict().keys()),values)),\n",
" '6': dict(zip(list(combined[combined['cluster']==6].t_business_type.value_counts(normalize=True).to_dict().keys()),values)),\n",
" '7': dict(zip(list(combined[combined['cluster']==7].t_business_type.value_counts(normalize=True).to_dict().keys()),values)),\n",
" '8': dict(zip(list(combined[combined['cluster']==8].t_business_type.value_counts(normalize=True).to_dict().keys()),values)),\n",
" '9': dict(zip(list(combined[combined['cluster']==9].t_business_type.value_counts(normalize=True).to_dict().keys()),values))\n",
" 'external': {'Food/Water':1,\n",
" 'Fuel/Energy':2,\n",
" 'Health':3,\n",
" 'Education':4,\n",
" 'Savings Group':5,\n",
" 'Shop':6}}\n",
"\n",
" \n",
" \n",
"# nested dictionary \n",
"utilityTypesProbability = { '0': combined[combined['cluster']==0].t_business_type.value_counts(normalize=True).to_dict(), \n",
" '1': combined[combined['cluster']==1].t_business_type.value_counts(normalize=True).to_dict(),\n",
" '2': combined[combined['cluster']==2].t_business_type.value_counts(normalize=True).to_dict(),\n",
" '3': combined[combined['cluster']==3].t_business_type.value_counts(normalize=True).to_dict(),\n",
" '4': combined[combined['cluster']==4].t_business_type.value_counts(normalize=True).to_dict(),\n",
" '5': combined[combined['cluster']==5].t_business_type.value_counts(normalize=True).to_dict(),\n",
" '6': combined[combined['cluster']==6].t_business_type.value_counts(normalize=True).to_dict(),\n",
" '7': combined[combined['cluster']==7].t_business_type.value_counts(normalize=True).to_dict(),\n",
" '8': combined[combined['cluster']==8].t_business_type.value_counts(normalize=True).to_dict(),\n",
" '9': combined[combined['cluster']==9].t_business_type.value_counts(normalize=True).to_dict(),\n",
" 'external': {'Food/Water':0.6,\n",
" 'Fuel/Energy':0.10,\n",
" 'Health':0.03,\n",
" 'Education':0.015,\n",
" 'Savings Group':0.065,\n",
" 'Shop':0.19}}"
]
},
{
"cell_type": "code",
"execution_count": 256,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'0': {'Food/Water': 1,\n",
" 'Farming/Labour': 2,\n",
" 'Shop': 3,\n",
" 'Fuel/Energy': 4,\n",
" 'None': 5,\n",
" 'Transport': 6,\n",
" 'Savings Group': 7,\n",
" 'Education': 8,\n",
" 'Health': 9,\n",
" 'Environment': 10,\n",
" 'Staff': 11,\n",
" 'System': 12,\n",
" 'Chama': 13,\n",
" 'Game': 14},\n",
" '1': {'Food/Water': 1},\n",
" '2': {'Savings Group': 1, 'Farming/Labour': 2, 'Food/Water': 3},\n",
" '3': {'Farming/Labour': 1,\n",
" 'Food/Water': 2,\n",
" 'Shop': 3,\n",
" 'Savings Group': 4,\n",
" 'Fuel/Energy': 5,\n",
" 'None': 6,\n",
" 'Transport': 7,\n",
" 'Education': 8},\n",
" '4': {'Food/Water': 1,\n",
" 'Savings Group': 2,\n",
" 'Farming/Labour': 3,\n",
" 'Shop': 4,\n",
" 'Fuel/Energy': 5,\n",
" 'Health': 6,\n",
" 'None': 7,\n",
" 'Transport': 8,\n",
" 'Education': 9},\n",
" '5': {'Farming/Labour': 1,\n",
" 'Food/Water': 2,\n",
" 'Savings Group': 3,\n",
" 'Shop': 4,\n",
" 'Fuel/Energy': 5,\n",
" 'Transport': 6},\n",
" '6': {'Food/Water': 1,\n",
" 'Farming/Labour': 2,\n",
" 'Shop': 3,\n",
" 'Fuel/Energy': 4,\n",
" 'Savings Group': 5,\n",
" 'Education': 6,\n",
" 'Transport': 7,\n",
" 'None': 8,\n",
" 'Health': 9,\n",
" 'Staff': 10},\n",
" '7': {'Savings Group': 1, 'Food/Water': 2, 'Farming/Labour': 3, 'Shop': 4},\n",
" '8': {'Savings Group': 1,\n",
" 'Food/Water': 2,\n",
" 'Health': 3,\n",
" 'Education': 4,\n",
" 'Farming/Labour': 5,\n",
" 'Shop': 6},\n",
" '9': {'Savings Group': 1}}"
]
},
"execution_count": 256,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"UtilityTypesOrdered"
]
},
{
"cell_type": "code",
"execution_count": 257,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'0': {'Food/Water': 0.3376267211378423,\n",
" 'Farming/Labour': 0.3294560447874111,\n",
" 'Shop': 0.19210546224844907,\n",
" 'Fuel/Energy': 0.041685580269329704,\n",
" 'None': 0.03374186715085489,\n",
" 'Transport': 0.028086699954607355,\n",
" 'Savings Group': 0.01813814495385081,\n",
" 'Education': 0.012445150552277198,\n",
" 'Health': 0.00450143743380239,\n",
" 'Environment': 0.0012672113784233622,\n",
" 'Staff': 0.0006808896958692692,\n",
" 'System': 0.0002080496292933878,\n",
" 'Chama': 3.782720532607051e-05,\n",
" 'Game': 1.8913602663035257e-05},\n",
" '1': {'Food/Water': 1.0},\n",
" '2': {'Savings Group': 0.6427282569469506,\n",
" 'Farming/Labour': 0.25045110068567306,\n",
" 'Food/Water': 0.1068206423673764},\n",
" '3': {'Farming/Labour': 0.25480153649167736,\n",
" 'Food/Water': 0.1882202304737516,\n",
" 'Shop': 0.18437900128040974,\n",
" 'Savings Group': 0.16645326504481434,\n",
" 'Fuel/Energy': 0.12419974391805377,\n",
" 'None': 0.07554417413572344,\n",
" 'Transport': 0.0038412291933418692,\n",
" 'Education': 0.002560819462227913},\n",
" '4': {'Food/Water': 0.3145801420414984,\n",
" 'Savings Group': 0.2651441303439632,\n",
" 'Farming/Labour': 0.16724690154574573,\n",
" 'Shop': 0.1522072134800167,\n",
" 'Fuel/Energy': 0.06057652137585295,\n",
" 'Health': 0.03328227266397438,\n",
" 'None': 0.0036206656454532793,\n",
" 'Transport': 0.0020888455646845844,\n",
" 'Education': 0.0012533073388107507},\n",
" '5': {'Farming/Labour': 0.35526315789473684,\n",
" 'Food/Water': 0.35526315789473684,\n",
" 'Savings Group': 0.18421052631578946,\n",
" 'Shop': 0.05263157894736842,\n",
" 'Fuel/Energy': 0.02631578947368421,\n",
" 'Transport': 0.02631578947368421},\n",
" '6': {'Food/Water': 0.3230154767848228,\n",
" 'Farming/Labour': 0.2860708936595107,\n",
" 'Shop': 0.18871692461308037,\n",
" 'Fuel/Energy': 0.0983524712930604,\n",
" 'Savings Group': 0.040439340988517224,\n",
" 'Education': 0.032950574138791815,\n",
" 'Transport': 0.013979031452820768,\n",
" 'None': 0.012980529206190713,\n",
" 'Health': 0.0024962556165751375,\n",
" 'Staff': 0.000998502246630055},\n",
" '7': {'Savings Group': 0.7747841105354059,\n",
" 'Food/Water': 0.15751295336787566,\n",
" 'Farming/Labour': 0.04870466321243523,\n",
" 'Shop': 0.018998272884283247},\n",
" '8': {'Savings Group': 0.6999073215940685,\n",
" 'Food/Water': 0.19258572752548656,\n",
" 'Health': 0.03670064874884152,\n",
" 'Education': 0.03132530120481928,\n",
" 'Farming/Labour': 0.02057460611677479,\n",
" 'Shop': 0.018906394810009268},\n",
" '9': {'Savings Group': 1.0}}"
]
},
"execution_count": 257,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"utilityTypesProbability"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}