# CIC Subpopulation Construction

* Scale the agent model up to subpopulations, so that flows run from subpopulation to subpopulation. Agents within a section (e.g. food/water providers) are grouped, and all flows inside a bundle become part of that bundle's self-loop flow.

* Create subpopulations with a graph-zoom-type operation: cluster with k-means and bundle the clustered nodes together, producing 10 nodes.
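The bundling step in these notes can be sketched in plain Python, assuming each agent has already been assigned exactly one subpopulation label (the `bundle_flows` function, the `cluster_of` mapping and agents `a` to `d` are hypothetical). Summing every agent-level flow into the pair of subpopulations it connects means that a flow between two agents of the same subpopulation lands on that subpopulation's self-loop. Note that the k-means step below labels transaction rows rather than agents, so mapping each agent to a single subpopulation would need an extra rule.

```python
from collections import defaultdict

def bundle_flows(edges, cluster_of):
    """Collapse agent-to-agent flows into subpopulation-to-subpopulation flows.

    edges: iterable of (source, target, tokens) tuples at the agent level.
    cluster_of: dict mapping each agent to its subpopulation label.
    Flows whose endpoints share a subpopulation become that subpopulation's self-loop.
    """
    flows = defaultdict(float)
    for source, target, tokens in edges:
        flows[(cluster_of[source], cluster_of[target])] += tokens
    return dict(flows)

# toy example: hypothetical agents a-d in two subpopulations
edges = [("a", "b", 100.0), ("b", "c", 20.0), ("c", "d", 5.0), ("d", "a", 1.0)]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
print(bundle_flows(edges, cluster_of))
# {(0, 0): 100.0, (0, 1): 20.0, (1, 1): 5.0, (1, 0): 1.0}
```

With 10 clusters this yields at most 10 self-loops plus 90 directed inter-cluster edges, whatever the number of agents.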

## Graph Model of Current Spend Activity

We model the CIC transaction data as a weighted directed graph $G(N,E)$, where the nodes $N$ are the source and target agents and the edges $E$ are the payments between them. The number of tokens transferred is the edge weight, representing the actual CIC flow from agent $i$ to agent $j$ for each edge $(i,j) \in E$.
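A minimal sketch of this graph model in plain Python, assuming repeated payments between the same pair of agents are summed into a single edge weight (the function names and the toy agents are hypothetical). Out-strength and in-strength then give the total CIC each agent spent and received.

```python
from collections import defaultdict

def build_flow_graph(transactions):
    """Build G(N, E) from (source, target, tokens) rows; repeated i -> j payments are summed."""
    nodes = set()
    edge_weight = defaultdict(float)
    for source, target, tokens in transactions:
        nodes.update((source, target))
        edge_weight[(source, target)] += tokens
    return nodes, dict(edge_weight)

def strengths(edge_weight):
    """Total tokens sent (out-strength) and received (in-strength) by each agent."""
    out_strength, in_strength = defaultdict(float), defaultdict(float)
    for (i, j), w in edge_weight.items():
        out_strength[i] += w
        in_strength[j] += w
    return dict(out_strength), dict(in_strength)

# hypothetical agents
rows = [("alice", "bob", 100.0), ("bob", "carol", 23.0), ("alice", "bob", 12.0)]
N, E = build_flow_graph(rows)
print(E)             # {('alice', 'bob'): 112.0, ('bob', 'carol'): 23.0}
print(strengths(E))  # ({'alice': 112.0, 'bob': 23.0}, {'bob': 112.0, 'carol': 23.0})
```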

The observed data shows the actual CIC payments between network actors. It does not show shilling payments between actors, actors' utility, or demand; we only observe actual CIC spends between agents.

In [1]:
# import libraries
import networkx as nx
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

import matplotlib.pyplot as plt
%matplotlib inline

## Data Dump as of 5-15-2020
xDai blockchain data from January to May 11, 2020
https://www.grassrootseconomics.org/research

In [2]:
# import the data
transactions = pd.read_csv('data/sarafu_xDAI_tx_all_pub_all_time_12May2020.csv')
users = pd.read_csv('data/sarafu_xDAI_users_all_pub_all_time_12May2020.csv')

In [3]:
transactions.head()

Unnamed: 0,id,timeset,transfer_subtype,source,s_gender,s_location,s_business_type,target,t_gender,t_location,t_business_type,tx_token,weight,type,token_name,token_address
0,1,2020-01-25 19:13:17.731529,DISBURSEMENT,0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F,,,System,0x245fc81fe385450Dc0f4787668e47c903C00b0A1,female,GE Office,Savings Group,,18000.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
1,2,2020-01-25 19:13:19.056070,DISBURSEMENT,0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F,,,System,0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2,male,GE Nairobi,Farming/Labour,,9047.660892,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
2,3,2020-01-25 19:13:20.288346,DISBURSEMENT,0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F,,,System,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,,25378.726002,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
3,4,2020-01-25 19:13:21.478850,DISBURSEMENT,0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F,,,System,0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435,male,G.E,Farming/Labour,,4495.932576,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
4,5,2020-01-26 07:48:43.042684,DISBURSEMENT,0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F,,,System,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72,male,Home,Farming/Labour,,400.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4


In [5]:
transactions.transfer_subtype.value_counts(normalize=True).to_dict()

{'STANDARD': 0.5085207861177824,
 'DISBURSEMENT': 0.35574873997902784,
 'RECLAMATION': 0.13483070053783444,
 'AGENT_OUT': 0.0008997733653553429}

Based on the data dictionary provided by Grassroots Economics, the transfer subtype codes are:

* DISBURSEMENT = a transfer from Grassroots Economics (GE) to a user
* RECLAMATION = a transfer back to GE
* STANDARD = a trade between users
* AGENT_OUT = a group account cashing out


For our analysis, we subset to STANDARD transactions in the Sarafu token.

In [6]:
transactions_subset = transactions[transactions['transfer_subtype'] == 'STANDARD']
transactions_subset = transactions_subset[transactions_subset['token_name']=='Sarafu']

In [10]:
users.head()

Unnamed: 0,id,start,label,gender,location,held_roles,business_type,bal,xDAI_blockchain_address,confidence,...,otxns_in,otxns_out,ounique_in,ounique_out,svol_in,svol_out,stxns_in,stxns_out,sunique_in,sunique_out
0,1,2020-01-25 19:10:50.218686,1,,,ADMIN,System,8916761.0,0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F,0.0,...,19917,52610,9,19862,0.0,0.0,0,0,0,0
1,2,2018-10-23 09:09:58,2,female,GE Office,TOKEN_AGENT,Savings Group,180000.0,0x245fc81fe385450Dc0f4787668e47c903C00b0A1,0.0,...,134,16,68,0,0.0,0.0,0,0,0,0
2,3,2018-10-21 14:20:57,3,male,GE Nairobi,BENEFICIARY,Farming/Labour,56.66089,0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2,0.0,...,2,2,1,0,0.0,9007.0,0,1,0,1
3,4,2018-10-21 15:38:30,4,male,GE Nairobi,BENEFICIARY,Farming/Labour,11737.73,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,0.1,...,6,1,1,0,20619.0,50449.0,20,15,11,5
4,5,2018-10-23 14:10:27,5,male,G.E,BENEFICIARY,Farming/Labour,7297.263,0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435,0.405063,...,15,1,1,0,127393.3,168905.0,158,208,84,65


In [12]:
users['business_type'].value_counts(normalize=True).to_dict()

{'Farming/Labour': 0.43367860016090104,
 'Food/Water': 0.22863032984714401,
 'Shop': 0.1406878519710378,
 'Fuel/Energy': 0.06365647626709574,
 'None': 0.0621983105390185,
 'Transport': 0.04379525341914722,
 'Education': 0.014380530973451327,
 'Savings Group': 0.006335478680611424,
 'Health': 0.00331858407079646,
 'Environment': 0.001910699919549477,
 'System': 0.0012067578439259854,
 'Staff': 0.00010056315366049879,
 'Chama': 5.0281576830249393e-05,
 'Game': 5.0281576830249393e-05}

## Combine user and transaction tables

Combine user and transaction tables on both the source and target features.
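The two-sided join amounts to looking up each transaction's source address and target address in the users table. A plain-Python sketch with hypothetical addresses and balances, assuming one balance per address:

```python
def attach_balances(transactions, balance_by_address):
    """Left-join each transaction to the balances of its source and target addresses.

    Addresses missing from balance_by_address get None, much as a pandas left merge yields NaN.
    """
    joined = []
    for tx in transactions:
        row = dict(tx)  # copy so the input rows are not modified
        row["s_bal"] = balance_by_address.get(tx["source"])
        row["t_bal"] = balance_by_address.get(tx["target"])
        joined.append(row)
    return joined

# hypothetical addresses and balances
balances = {"0xA": 50.0, "0xB": 1200.0}
txs = [{"source": "0xA", "target": "0xB", "weight": 900.0},
       {"source": "0xB", "target": "0xC", "weight": 100.0}]
for row in attach_balances(txs, balances):
    print(row)
```

If `users` could contain duplicate addresses, a pandas merge would silently duplicate transaction rows; passing `validate='many_to_one'` to `DataFrame.merge` raises an error in that case.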

In [13]:
user_subset = users[['bal','xDAI_blockchain_address']]


In [14]:
transactions_subset

Unnamed: 0,id,timeset,transfer_subtype,source,s_gender,s_location,s_business_type,target,t_gender,t_location,t_business_type,tx_token,weight,type,token_name,token_address
72647,170140,2020-04-30 10:43:45.170528,STANDARD,0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2,male,GE Nairobi,Farming/Labour,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,,9007.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
72648,10,2020-01-26 08:26:22.521902,STANDARD,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72,male,Home,Farming/Labour,,100.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
72649,11,2020-01-26 08:27:26.757372,STANDARD,0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435,male,G.E,Farming/Labour,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,,2.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
72650,13,2020-01-26 08:32:05.154096,STANDARD,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72,male,Home,Farming/Labour,,23.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
72651,15,2020-01-26 08:38:42.186525,STANDARD,0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3,male,Test,Health,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,,12.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
147810,208035,2020-05-11 08:52:34.504171,STANDARD,0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe,Unknown gender,kilibole,Farming/Labour,0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec,Unknown gender,Kilibole,Food/Water,,20.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
147811,208021,2020-05-11 08:49:20.768559,STANDARD,0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc,Unknown gender,Kikomani,Food/Water,0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985,Unknown gender,Bofu,Shop,,350.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
147812,208459,2020-05-11 10:11:18.699013,STANDARD,0x2e44845BE57687bFdcdd26044bB7CdD575781336,male,Miyani,Shop,0xfCF20a412eB6DD345237C7BEeBab53B424b98297,male,Miyani,Shop,,400.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4
147813,208395,2020-05-11 10:01:04.805823,STANDARD,0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A,male,Kilifi,Farming/Labour,0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF,Unknown gender,KIlifi,Education,,20.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4


In [15]:
# attach the source agent's balance and address, renamed with an s_ prefix
transactions_subset_v1 = transactions_subset.merge(
    user_subset.rename(columns={'bal': 's_bal', 'xDAI_blockchain_address': 's_xDAI_blockchain_address'}),
    how='left', left_on='source', right_on='s_xDAI_blockchain_address')

In [17]:
# attach the target agent's balance and address, renamed with a t_ prefix
transactions_subset_v2 = transactions_subset_v1.merge(
    user_subset.rename(columns={'bal': 't_bal', 'xDAI_blockchain_address': 't_xDAI_blockchain_address'}),
    how='left', left_on='target', right_on='t_xDAI_blockchain_address')

In [18]:
transactions_subset_v2.head()

Unnamed: 0,id,timeset,transfer_subtype,source,s_gender,s_location,s_business_type,target,t_gender,t_location,t_business_type,tx_token,weight,type,token_name,token_address,s_bal,s_xDAI_blockchain_address,t_bal,t_xDAI_blockchain_address
0,170140,2020-04-30 10:43:45.170528,STANDARD,0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2,male,GE Nairobi,Farming/Labour,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,,9007.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4,56.660892,0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2,11737.726002,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31
1,10,2020-01-26 08:26:22.521902,STANDARD,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72,male,Home,Farming/Labour,,100.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4,11737.726002,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,902.5,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72
2,11,2020-01-26 08:27:26.757372,STANDARD,0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435,male,G.E,Farming/Labour,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,,2.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4,7297.262576,0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435,11737.726002,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31
3,13,2020-01-26 08:32:05.154096,STANDARD,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72,male,Home,Farming/Labour,,23.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4,11737.726002,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,902.5,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72
4,15,2020-01-26 08:38:42.186525,STANDARD,0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3,male,Test,Health,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,male,GE Nairobi,Farming/Labour,,12.0,directed,Sarafu,0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4,448.0,0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3,11737.726002,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31


In [265]:
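# subset the data into the needed columns for clustering
# (.copy() makes combined an independent frame, avoiding SettingWithCopyWarning when columns are added later)
combined = transactions_subset_v2[['source','s_location','s_business_type','target','t_location',
            't_business_type','weight','s_bal','t_bal']].copy()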

In [266]:
combined

Unnamed: 0,source,s_location,s_business_type,target,t_location,t_business_type,weight,s_bal,t_bal
0,0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2,GE Nairobi,Farming/Labour,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,GE Nairobi,Farming/Labour,9007.0,56.660892,11737.726002
1,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,GE Nairobi,Farming/Labour,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72,Home,Farming/Labour,100.0,11737.726002,902.500000
2,0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435,G.E,Farming/Labour,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,GE Nairobi,Farming/Labour,2.0,7297.262576,11737.726002
3,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,GE Nairobi,Farming/Labour,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72,Home,Farming/Labour,23.0,11737.726002,902.500000
4,0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3,Test,Health,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,GE Nairobi,Farming/Labour,12.0,448.000000,11737.726002
...,...,...,...,...,...,...,...,...,...
75162,0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe,kilibole,Farming/Labour,0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec,Kilibole,Food/Water,20.0,0.000000,5.000000
75163,0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc,Kikomani,Food/Water,0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985,Bofu,Shop,350.0,0.000000,800.000000
75164,0x2e44845BE57687bFdcdd26044bB7CdD575781336,Miyani,Shop,0xfCF20a412eB6DD345237C7BEeBab53B424b98297,Miyani,Shop,400.0,0.000000,800.000000
75165,0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A,Kilifi,Farming/Labour,0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF,KIlifi,Education,20.0,400.000000,500.000000


In [267]:
source = combined.source.values
target = combined.target.values
# remove the source and target variables for clustering
del combined['source']
del combined['target']

In [23]:
# create dummy variables of the categorical variables 
updated = pd.get_dummies(combined)

Compute 10 clusters of transactions based on the following features:
* s_location
* s_business_type
* t_location
* t_business_type
* weight (the number of tokens exchanged)
* s_bal
* t_bal
In [24]:
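# n_jobs was deprecated in scikit-learn 0.23 and later removed; n_init=10 keeps the pre-1.4 default
kmeans = KMeans(n_clusters=10, random_state=1, n_init=10).fit(updated.values)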
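For reference, each k-means run alternates two steps (Lloyd's algorithm): assign every point to its nearest centroid, then move each centroid to the mean of its points. A simplified plain-Python sketch follows; unlike scikit-learn's `KMeans`, it initializes from the first `k` points instead of k-means++ and does a single run rather than `n_init` restarts, so it is illustrative only.

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]  # simplistic init; scikit-learn defaults to k-means++
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step: index of the closest centroid for each point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: mean of each cluster's members (keep the old centroid if a cluster is empty)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append([sum(x) / len(members) for x in zip(*members)] if members else centroids[c])
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
```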

In [268]:
# add the clusters back to the combined dataframe
combined['cluster'] = kmeans.labels_

In [269]:
# add back the source and target variables
combined['source'] = source
combined['target'] = target

In [270]:
combined.head()

Unnamed: 0,s_location,s_business_type,t_location,t_business_type,weight,s_bal,t_bal,cluster,source,target
0,GE Nairobi,Farming/Labour,GE Nairobi,Farming/Labour,9007.0,56.660892,11737.726002,4,0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31
1,GE Nairobi,Farming/Labour,Home,Farming/Labour,100.0,11737.726002,902.5,6,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72
2,G.E,Farming/Labour,GE Nairobi,Farming/Labour,2.0,7297.262576,11737.726002,4,0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31
3,GE Nairobi,Farming/Labour,Home,Farming/Labour,23.0,11737.726002,902.5,6,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31,0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72
4,Test,Health,GE Nairobi,Farming/Labour,12.0,448.0,11737.726002,4,0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3,0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31


In [271]:
# export the clusters to csv
combined.to_csv('clusters.csv')

## Descriptive statistics 

Calculate summary statistics (mean, standard deviation, median, and quartiles) for building the probability distributions in the subpopulation model.
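The per-cluster statistics in the loop below follow pandas' defaults: `std` is the sample standard deviation (ddof=1) and `quantile` interpolates linearly. A standard-library sketch that reproduces those conventions for one cluster's values (`summarize` and the sample values are made up for illustration):

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Mean, sample std (ddof=1), median and quartiles, rounded to 2 decimals.

    Needs at least two values (stdev and quantiles both require them).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # 'inclusive' matches pandas' linear interpolation
    return {
        "mean": round(mean(values), 2),
        "std": round(stdev(values), 2),
        "median": round(median(values), 2),
        "q1": round(q1, 2),
        "q3": round(q3, 2),
    }

print(summarize([100.0, 200.0, 300.0, 400.0, 1000.0]))
# {'mean': 400.0, 'std': 353.55, 'median': 300.0, 'q1': 200.0, 'q3': 400.0}
```

The gap between the mean (400) and the median (300) in this toy sample shows how a few large values pull the mean upward, the same pattern visible in the cluster weights below.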

In [272]:
# numeric_only=True skips the text columns (required in pandas >= 2.0)
combined.groupby('cluster').mean(numeric_only=True)

cluster,weight,s_bal,t_bal
0,291.207791,781.07262,822.216334
1,606.382822,1392.005606,251651.998315
2,514.121339,1929.337442,63415.727008
3,1589.169014,67756.340856,8186.550913
4,527.759612,1084.332676,7787.095537
5,3909.855263,251651.998315,14977.323705
6,1291.446186,21011.816026,2106.185386
7,601.11714,1166.988406,37913.602381
8,457.416897,1038.146815,19108.202892
9,1783.015873,4128.615253,104810.011282


In [276]:
# per cluster: mean and std of the transaction weight; median, Q1 and Q3 of the source balance
clustersMedianSourceBalance = []
clusters1stQSourceBalance = []
clusters3rdQSourceBalance = []
clustersMu = []
clustersSigma = []
for i in sorted(combined.cluster.unique()):
    temp = combined[combined['cluster'] == i]
    mu = round(temp.weight.mean(), 2)
    sigma = round(temp.weight.std(), 2)
    median = round(temp.s_bal.median(), 2)
    clustersMu.append(mu)
    clustersSigma.append(sigma)
    clustersMedianSourceBalance.append(median)
    clusters1stQSourceBalance.append(round(temp.s_bal.quantile(0.25), 2))
    clusters3rdQSourceBalance.append(round(temp.s_bal.quantile(0.75), 2))
    print(i)
    print('mean')
    print(mu)
    print('std')
    print(sigma)
    print('median')
    print(median)
    print()


0
mean
291.21
std
731.89
median
310.0

1
mean
606.38
std
1529.2
median
174.5

2
mean
514.12
std
1882.62
median
206.38

3
mean
1589.17
std
5646.55
median
64767.51

4
mean
527.76
std
1337.06
median
229.17

5
mean
3909.86
std
8736.56
median
251652.0

6
mean
1291.45
std
3381.92
median
18304.36

7
mean
601.12
std
1548.68
median
211.7

8
mean
457.42
std
1496.74
median
250.5

9
mean
1783.02
std
7596.17
median
310.0



In [224]:
# Create initialization file (copy from here)

clusters = ['0','1','2','3','4','5','6','7','8','9']

clustersMedianSourceBalance = [310.0,174.5,206.38,64767.51,229.17,251652.0,18304.36,211.7,250.5,310.0]

clusters1stQSourceBalance = [112.53,119.22,100.46,64767.51,100.0,251652.0,14050.3,109.42,102.46,150.72]

clusters3rdQSourceBalance = [800.24,540.43,582.48,64767.51,924.5,251652.0,24857.5,670.44,968.88,1458.79]

clustersMu = [291.21,606.38,514.12,1589.17,527.76,3909.86,1291.45,601.12,457.42,1783.02]

clustersSigma = [731.89,1529.2,1882.62,5646.55,1337.06,8736.56,3381.92,1548.68,1496.74,7596.17]

def targetTypeProbabilities(cluster):
    # share of each target business type in the cluster's transactions, most frequent first
    return combined[combined['cluster'] == cluster].t_business_type.value_counts(normalize=True).to_dict()

# nested dictionary: cluster -> {target business type: rank}, rank 1 = most frequent
UtilityTypesOrdered = {str(c): {btype: rank for rank, btype in enumerate(targetTypeProbabilities(c), start=1)}
                       for c in range(10)}
UtilityTypesOrdered['external'] = {'Food/Water':1,
                                   'Fuel/Energy':2,
                                   'Health':3,
                                   'Education':4,
                                   'Savings Group':5,
                                   'Shop':6}

# nested dictionary: cluster -> {target business type: probability}
utilityTypesProbability = {str(c): targetTypeProbabilities(c) for c in range(10)}
utilityTypesProbability['external'] = {'Food/Water':0.6,
                                       'Fuel/Energy':0.10,
                                       'Health':0.03,
                                       'Education':0.015,
                                       'Savings Group':0.065,
                                       'Shop':0.19}
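One way a subpopulation model might consume `utilityTypesProbability` is to draw each agent's target business type from its cluster's distribution. This is an assumption, since the model code is not part of this notebook; the `sample_target_type` helper is hypothetical. A sketch with Python's `random.choices`, using the `external` probabilities defined above:

```python
import random

def sample_target_type(type_probabilities, rng):
    """Draw one target business type from a {business type: probability} dict."""
    types = list(type_probabilities)
    weights = list(type_probabilities.values())
    return rng.choices(types, weights=weights, k=1)[0]

# the 'external' entry of utilityTypesProbability
external = {'Food/Water': 0.6, 'Fuel/Energy': 0.10, 'Health': 0.03,
            'Education': 0.015, 'Savings Group': 0.065, 'Shop': 0.19}
rng = random.Random(1)  # seeded so simulation runs are reproducible
draws = [sample_target_type(external, rng) for _ in range(10_000)]
print(draws.count('Food/Water') / len(draws))  # close to 0.6
```

A cluster whose distribution is a single type, such as cluster '1' ({'Food/Water': 1.0}), always yields that type.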

In [256]:
UtilityTypesOrdered

{'0': {'Food/Water': 1,
  'Farming/Labour': 2,
  'Shop': 3,
  'Fuel/Energy': 4,
  'None': 5,
  'Transport': 6,
  'Savings Group': 7,
  'Education': 8,
  'Health': 9,
  'Environment': 10,
  'Staff': 11,
  'System': 12,
  'Chama': 13,
  'Game': 14},
 '1': {'Food/Water': 1},
 '2': {'Savings Group': 1, 'Farming/Labour': 2, 'Food/Water': 3},
 '3': {'Farming/Labour': 1,
  'Food/Water': 2,
  'Shop': 3,
  'Savings Group': 4,
  'Fuel/Energy': 5,
  'None': 6,
  'Transport': 7,
  'Education': 8},
 '4': {'Food/Water': 1,
  'Savings Group': 2,
  'Farming/Labour': 3,
  'Shop': 4,
  'Fuel/Energy': 5,
  'Health': 6,
  'None': 7,
  'Transport': 8,
  'Education': 9},
 '5': {'Farming/Labour': 1,
  'Food/Water': 2,
  'Savings Group': 3,
  'Shop': 4,
  'Fuel/Energy': 5,
  'Transport': 6},
 '6': {'Food/Water': 1,
  'Farming/Labour': 2,
  'Shop': 3,
  'Fuel/Energy': 4,
  'Savings Group': 5,
  'Education': 6,
  'Transport': 7,
  'None': 8,
  'Health': 9,
  'Staff': 10},
 '7': {'Savings Group': 1, 'Food/Water'

In [257]:
utilityTypesProbability

{'0': {'Food/Water': 0.3376267211378423,
  'Farming/Labour': 0.3294560447874111,
  'Shop': 0.19210546224844907,
  'Fuel/Energy': 0.041685580269329704,
  'None': 0.03374186715085489,
  'Transport': 0.028086699954607355,
  'Savings Group': 0.01813814495385081,
  'Education': 0.012445150552277198,
  'Health': 0.00450143743380239,
  'Environment': 0.0012672113784233622,
  'Staff': 0.0006808896958692692,
  'System': 0.0002080496292933878,
  'Chama': 3.782720532607051e-05,
  'Game': 1.8913602663035257e-05},
 '1': {'Food/Water': 1.0},
 '2': {'Savings Group': 0.6427282569469506,
  'Farming/Labour': 0.25045110068567306,
  'Food/Water': 0.1068206423673764},
 '3': {'Farming/Labour': 0.25480153649167736,
  'Food/Water': 0.1882202304737516,
  'Shop': 0.18437900128040974,
  'Savings Group': 0.16645326504481434,
  'Fuel/Energy': 0.12419974391805377,
  'None': 0.07554417413572344,
  'Transport': 0.0038412291933418692,
  'Education': 0.002560819462227913},
 '4': {'Food/Water': 0.3145801420414984,
  'Sa