Posts Tagged ‘data’

Python: My Fake Data Generator P-2

December 15, 2019

Learning : Python: Functions, Procedures and documentation
Subject: About fake data P-2: (Fake ID)

Before we start, I'd like to mention that in our last function, fcolor(), we wrote some comments at the top of the function between triple double quotes ("""). If we load the function and call help(fcolor), passing the function name without parentheses, that text is shown on the Python console as help, as in the screenshot.
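As a quick illustration of the docstring point above, here is a minimal sketch. The function greet() is a made-up example and not part of the fake data functions; it only shows that help() reads the text between the triple quotes.

```python
def greet(name):
    """Return a short greeting for the given name.

    Argument: str: name: who to greet.
    return:   str: the greeting text.
    """
    return 'Hello, ' + name


# Pass the function object itself (no parentheses) to help().
help(greet)

# The same text is also available as a plain string.
print(greet.__doc__)
```

Calling help(greet()) instead would run the function first and show help for the returned string, not for greet itself.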

In this post we will write a function to generate a fake ID number. IDs can follow several styles. Sometimes we just want a random number without any meaning: just X random digits. Most of the time we need the number to be meaningful and based on certain rules. For example, a bank may use some digits to indicate the branch, a sports club may include the date, and so on.

Here we will write a function called key_generator(). The function takes two arguments (dig, s): dig is the number of digits in the key and s is the style. If s = 'd' (or 'D'), the first 6 digits of the key are the date as ddmmyy, followed by the random digits. If s is anything else, or is not passed, the key is the default style (just x random digits). Let's see the code.

First the summary or say information about the function:

def key_generator(dig, s = 'n'):
    """
    ### Date: 8/12/2019, By: Ali Radwani ###
    This function will generate an x-digit key randomly.
    If the argument s = 'd' or 'D' then the key has two parts: the first (6) digits
    are the date as ddmmyy, then x-digit random numbers.

    If the argument s is anything other than ['d','D'], or no argument is passed,
    then the key is random numbers without any meaning.

    The numbers will randomly be selected in range of (10 to 99).

    import: random, datetime

    Argument: int: dig: The number of digits for the key.
              str: s  : The key style (with date or just random numbers)

    return: int: the_key
    """

Now, if the user passes s='d', the first part of the key will be the current date. To do this we get today's date from Python's datetime module and split it into dd, mm, yy. Here is the key_generator() function.

import random, datetime

def key_generator(dig, s = 'n'):
    """
    ### Date: 8/12/2019, By: Ali Radwani ###
    This function will generate an x-digit key randomly.
    If the argument s = 'd' or 'D' then the key has two parts: the first (6) digits
    are the date as ddmmyy, then x-digit random numbers.

    If the argument s is anything other than ['d','D'], or no argument is passed,
    then the key is random numbers without any meaning.

    The numbers will randomly be selected in range of (10 to 99).

    import: random, datetime

    Argument: int: dig: The number of digits for the key.
              str: s  : The key style (with date or just random numbers)

    return: int: the_key
    """
    the_key = ''   # start empty so both branches can append to it
    if s in ['d','D'] :
        # Reconstructed line: the original call is cut off in this copy.
        # str() of today's date gives 'YYYY-MM-DD', which matches the slices below.
        d = str(datetime.date.today())
        dd = d[8:10]
        mm = d[5:7]
        yy = d[2:4]
        the_key = dd + mm + yy
        for x in range (dig):
            the_key = the_key + str( random.randint(10,99))
        return int(the_key[:(dig + 6)])
    else :
        for x in range (dig):
            the_key = the_key + str( random.randint(10,99))

        return int(the_key[:dig])
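One thing to keep in mind with key_generator(): because it returns an int, a date key for a day such as 08 loses its leading zero. The sketch below is a hypothetical variant, not the function from this post. The name key_generator_str and the extra today argument are assumptions added for illustration; it returns the key as a string and builds the date part with strftime().

```python
import random
from datetime import date


def key_generator_str(dig, s='n', today=None):
    """Hypothetical variant of key_generator() that returns the key as a string."""
    # Pick `dig` random digits; unlike the original, any digit 0-9 may come first.
    digits = ''.join(random.choice('0123456789') for _ in range(dig))
    if s in ('d', 'D'):
        today = today or date.today()
        # '%d%m%y' formats the date as ddmmyy and keeps leading zeros.
        return today.strftime('%d%m%y') + digits
    return digits


# Fixed date so the date part is predictable: 8 December 2019 -> '081219'
print(key_generator_str(4, 'd', date(2019, 12, 8)))
print(key_generator_str(5))
```

Keeping the key as a string is a design choice: an ID is a label, not a quantity, so nothing is lost by not converting it to a number.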

In the next Fake Data post we will try to write a function to generate the date. It will be published next Sunday.

:: Fake Function List ::

Function Name    Description
Color            To return a random color code in RGB or Hex.
Date             To return a random date.
Mobile           To return a mobile number.
Country          To return a random country name.
City             To return a random City name.
ID               To return X random dig as ID.
Time             To return random time.


Follow me on Twitter..

By: Ali Radwani

python: Fake Data-set

December 9, 2019

Learning : Python to generate fake data-set
Subject: About Fake data library

Most of the time when we are working on a project, we need to test our procedures and functions with some data. In most cases we just need dummy data such as dates, names, addresses and so on.

Last week, I was reading on the net and I found an article about generating fake data using a library in PHP (PHP is a computer programming language), so I started to look for one in Python. The answer is YES: there is a library that we can import called ‘Faker’. I started to work with it and explore it. This post is about the Faker data-set library.

The library is called ‘Faker’ and we need to install it in our Python environment; I used pip install Faker to install it. According to its documentation we can use properties such as name, city, date, job and others. So if we want to generate a fake name we write this:

# Using lib:Faker to generate a fake name
from faker import Faker

fake = Faker()
print(fake.name())

[Output]: Victoria Campbell

Here is a screenshot from the Jupyter notebook.

To generate more than one name we can use for loop as:

# Using lib:Faker to generate (X) fake names

for x in range (10) :
    print(fake.name())

[Output]:
Jared Hawkins
Michael Reid
Ricky Brown
Mary Tyler
Kristy Dudley
Karen Cain
Jennifer Underwood
Desiree Jensen
Carla Rivera
Brandon Cooper

Other properties that we can use are: address, company, job, country, date_time and many others, and with all of these we can create a data-set full of fake data.

So if we want to create a fake data-set that contains Name, Date-of-birth, Company, Job and Country as one person's data, we will use it like this:

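# Using lib:Faker to generate (X) person fake data
# Data-set contain: Name, Date-of-birth, Company, Job, Country
# The loop body is missing from this copy of the post; the calls below are
# reconstructed from the output shown.
p_count = 1
for x in range (p_count):
    print('Name:', fake.name())
    print('DOB:', fake.date())
    print('Company:', fake.company())
    print('Job:', fake.job())
    print('country:', fake.country())

[Output]:
Name: Crystal Mcconnell
DOB: 2002-09-30
Company: Bailey LLC
Job: Insurance underwriter
country: Pakistan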

Now if we want to store the person data in a dictionary variable and use it later, we can do it as follows:

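# Using lib:Faker to generate (X) person fake data and store it in a dictionary
# The lines that fill and print the dictionary are missing from this copy of
# the post; the versions below are a reconstruction.
people_d ={}
p_count = 5
for x in range (p_count):
    ID = x
    people_d[ID] = {'name': fake.name(), 'dob': fake.date(),
                    'company': fake.company(), 'job': fake.job(),
                    'country': fake.country()}

# To print-out the people_d data-set.
for x in people_d :
    print(x, people_d[x])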

In case we want a more complicated ID, we can use a random (8-digit) integer, or combine two fake numbers such as fake.zipcode() and fake.postcode(). This makes a duplicate ID unlikely, although only an explicit check can rule it out.
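To illustrate the random 8-digit ID idea, here is a minimal sketch that uses only the standard library. The function name unique_ids is an assumption for this example. It uses random.sample(), which draws values without repetition, so the IDs cannot collide.

```python
import random


def unique_ids(count, digits=8):
    """Return `count` distinct random integers, each with exactly `digits` digits."""
    low = 10 ** (digits - 1)    # smallest number with that many digits
    high = 10 ** digits - 1     # largest number with that many digits
    # sample() picks without replacement, so every value is different.
    return random.sample(range(low, high + 1), count)


ids = unique_ids(5)
print(ids)
```

These values could then be used as the dictionary keys in place of the simple counter.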

Using the Faker library helps a lot, and it has many attributes and properties that can be inserted into a data-set. For more information, see the Faker library documentation.


By: Ali Radwani

Python: Machine Learning – Part 1

November 27, 2019

Learning :Python and Machine Learning
Subject: Requirements, Sample and Implementation

Machine Learning: I will not go through the definitions and uses of ML; I think there are a lot of other posts that are more informative than whatever I would write. In this post I will write about my experience and my learning curve as I learn to implement an ML model and test it with my own data.

The Story: Two or three days ago I started to read and watch videos about Machine Learning. I found the scikit-learn (“sklearn”) site, and from there I created my first ML model to test the Iris data-set. Then I wrote a function to generate data (my own random data) and tested it with the sklearn ML model.

Let’s start ..


1. Library to Import: To work with sklearn models and the other functions that we will use, we need to import the following libraries:

import os # I will use it to clear the terminal.

import random # I will use it to generate my data-set.

import numpy as np

import bunch # To create data-set as object

from sklearn import datasets

from sklearn import svm

from sklearn import tree

from sklearn.model_selection import train_test_split as tts

2. Data-set: In my learning steps I used one of the sklearn data-sets, named “Iris”, which stores information about a flower called ‘Iris’. To use the sklearn ML model on other data-sets, I created several functions to generate random data that can be passed into the model; I will cover this part later in another post.
First we will see what the Iris dataset is. This information is copied from the sklearn site.

::Iris dataset description ::
Dataset type: Classification
Contains: 3 classes, 50 samples per class (150 samples in total)
Dimensionality: 4
Features: real, positive

The data is Dictionary-like object, the interesting attributes are:
‘data’: the data to learn.
‘target’: the classification labels.
‘target_names’: the meaning of the labels.
‘feature_names’: the meaning of the features.
‘DESCR’: the full description of the dataset.
‘filename’: the physical location of iris csv.

Note: This part helped me to write my data-set generating function. That is why we import the Bunch library: it lets us add lists to a data-set so that it appears as an object data-set, and the same code we use for the Iris data-set will work with our own data-set. In another post I will load the data from a csv file and show how to create such a file.

Start Writing the code parts: After I wrote the code and tuned it, I created several functions that can be called with other data-sets, without hard-coding any names from the Iris data-set. This way we can load other data-sets easily.
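To make the "object data-set" idea concrete, here is a minimal sketch that does not use the bunch library itself. SimpleBunch and make_random_dataset are hypothetical names for this illustration; the class is a plain dictionary whose keys can also be read as attributes, which is the behaviour the Iris code relies on (the_data.data, the_data.target).

```python
import random


class SimpleBunch(dict):
    """Dictionary whose keys can also be read as attributes (ds.data is ds['data'])."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def make_random_dataset(samples=10, features=4, classes=3):
    """Build a small random data-set with the same attribute names as Iris."""
    data = [[round(random.uniform(0.1, 8.0), 1) for _ in range(features)]
            for _ in range(samples)]
    target = [random.randrange(classes) for _ in range(samples)]
    return SimpleBunch(
        data=data,
        target=target,
        target_names=['class_%d' % c for c in range(classes)],
        feature_names=['feature_%d' % f for f in range(features)],
    )


my_data = make_random_dataset()
print(my_data.data[0], my_data.target[0])
```

Because the attribute names match, code written against the Iris object can read my_data.data and my_data.target in the same way.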

The Code

 # import libraries 

import numpy as np
from sklearn import datasets
#from sklearn import svm
from sklearn import tree
from sklearn.model_selection import train_test_split as tts
import random, bunch

Next step we will load the iris dataset into a variable called “the_data”

 # loading the iris dataset. 

the_data = datasets.load_iris() 

From the above section “Iris dataset description” we found that the data is stored in data and the classification labels are stored in target, so now we will store the data and the target in two other variables.

 # load the data into all_data, and target in all_labels. 
all_data = the_data.data
all_labels = the_data.target

We will create an object called ‘clf’ and will use the Decision Tree Classifier from sklearn.

 #  create Decision Tree Classifier 

clf = tree.DecisionTreeClassifier()

In Machine Learning programs we need some data for training and another set of data for testing, before we deploy our code on real data. sklearn provides a function to split a given data-set into two parts, train and test. To do the split I created a function that takes the data and the labels and returns the following: train_data, test_data, train_labels, test_labels.

 #  Function to split a data-set into training and testing data. 

def get_test_train_data(data,labels):

  # test_size = 0.1 keeps 10% of the samples for testing.
  train_data, test_data, train_labels, test_labels = tts(data,labels,test_size = 0.1)
  return train_data, test_data, train_labels, test_labels
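To show what a 90/10 split like the one above amounts to, here is a rough sketch in plain Python. It is an illustration only, not how sklearn implements train_test_split, and simple_split is a made-up name: shuffle the row positions, hold out a share of them for testing, and keep data and labels aligned.

```python
import random


def simple_split(data, labels, test_size=0.1):
    """Shuffle the rows, then hold out a test_size share of them for testing."""
    indices = list(range(len(data)))
    random.shuffle(indices)
    n_test = max(1, round(len(data) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Use the same index lists for data and labels so rows stay paired.
    train_data = [data[i] for i in train_idx]
    test_data = [data[i] for i in test_idx]
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_data, test_data, train_labels, test_labels


# 150 dummy rows, like the size of the Iris data-set.
rows = [[i, i * 2] for i in range(150)]
marks = [i % 3 for i in range(150)]
tr_d, te_d, tr_l, te_l = simple_split(rows, marks)
print(len(tr_d), len(te_d))  # 135 15
```

With 150 samples, 15 are held out for testing and 135 are used for training.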

After splitting the data we have four lists, or say data-sets. We pass train_data and train_labels to the train_me() function. I created this function so that we can pass it the training data and labels and it will call the fit function (clf.fit) from sklearn. After this step the ML model is trained and ready to test sample data. But first let's see the train_me() function.

 #  Function train_me() will pass the train_data to sklearn Model. 

def train_me(train_data1,train_labels1):
  # Train the classifier on the training data and labels.
  clf.fit(train_data1,train_labels1)
  print('\n The Model been trained. ')

As we just said, we now have a trained model that is ready for testing. To test the data-set we use the clf.predict function in sklearn, which returns a list of the labels the model predicts. To check whether the predictions are correct, and to get the percentage of correct answers, we count the predicted labels that match the actual test labels. Here is the code for get_prediction():

 #  get_prediction() to predict the data labels. 

def get_prediction(new_data_set,test_labels2,accu):

  print('\n This is the prediction labels of the data.\n')

  # calling prediction function clf.predict
  prediction = clf.predict(new_data_set)
  print('\n prediction labels are : ',prediction,len(prediction))
  # print the Accuracy
  if accu == 't' :
    cot = 0
    for i in range (len(prediction)) :
      print(prediction[i] , new_data_set[i],test_labels2[i])
      if prediction[i] == test_labels2[i]:
        cot = cot + 1
    print('\ncount :',cot)
    print('\n The Accuracy:',(cot/len(prediction))*100,'%')

The accuracy value determines whether we can use the model in real life or should try another model. In a real-data scenario we pass a value other than 't' for accu (for example 'f'), because there are no known labels to cross-check the predictions against; we can only check some results manually.
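The accuracy figure is just the share of predictions that match the known labels. As a small worked example with made-up labels (accuracy_percent is a hypothetical helper, separate from get_prediction()):

```python
def accuracy_percent(predicted, actual):
    """Percentage of positions where the predicted label equals the actual label."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(predicted) * 100


# Three of the four predictions match the actual labels.
print(accuracy_percent([0, 1, 2, 2], [0, 1, 1, 2]))  # 75.0
```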

End of part 1: by now we have all the functions that we can use with our data-set. In the following images of the code and the run-time screen we can see that the accuracy level is very high, so we can move on to our own data-set, and this will be in the coming post.

Result screen shot after running the Iris dataset showing high accuracy level.


Python plotting

November 21, 2019

Learning : Plotting Data using python and numpy
Subject: Plotting Data

The best way to show data is to present it as a graph or chart. There are several chart types; each presents the data in a different way and is used for a different purpose. Plotting with Python is a good way to show your data, and in the coming posts we will cover the very basic aspects of plotting. As a sample of what we are talking about, say we have hospital data for newborn children (male m, female f, in the years 2000 to 2003).


There are several libraries in Python that help us plot data; Matplotlib, Plotly and Seaborn are just some examples. In this post we will use Matplotlib. To use Matplotlib we need to install it, so if it is not installed in your Python environment you need to do so.

pip install matplotlib

Then we need to import it in our code using :

import matplotlib.pyplot as plt

To show the data we need some variables for our first example. The case is that we have data from a hospital: the number of newborn children (male m, female f) in the years 2000 to 2003. We will store the data in lists: data_yesrs = [2000,2001,2002,2003] for the years, data_m = [2,2.5,3,5] for male births and data_f = [3,3.8,4,4.5] for female births. The chart has two axes: the vertical Y axis, titled y_data_title = 'In Thousands', and the horizontal X axis, titled x_data_title = ' Years'. To project all this information on a chart we use this code:

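import matplotlib.pyplot as plt
data_yesrs = [2000,2001,2002,2003] # years on X axis 
data_m = [2,2.5,3,5]   # y data males born
data_f = [3,3.8,4,4.5]  # y data female born
y_data_title ='In Thousands'
x_data_title =' Years'

plt.title('New Born babies')
plt.xlabel(x_data_title)   # label the horizontal axis
plt.ylabel(y_data_title)   # label the vertical axis

# One call with two (x, y, format) groups: red solid line, blue dashed line.
plt.plot(data_yesrs,data_m,'r-', data_yesrs,data_f,'b--')
plt.show()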

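Another way to plot the data is to use one line of code for each data set. The lines are missing from this copy of the post; with the variables above they would look like this:

plt.plot(data_yesrs,data_m,'r-')
plt.plot(data_yesrs,data_f,'b--')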

We can see that the male data is a red line and the female data is blue dashes. We can use the line styles listed below to present the data:

'-' or 'solid' is solid line
'--' or 'dashed' is dashed line
'-.' or 'dashdot' is dash-dotted line
':' or 'dotted' is dotted line
'None' or ' ' or '' is draw nothing

And also we can use colors such as :
r: red, g: green,
b: blue, y: yellow .
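To show how a short format string such as 'r-' or 'b--' combines a color letter with a line style, here is a small sketch in plain Python. describe_format is a hypothetical helper written for this post, not part of Matplotlib, and it only handles the simple "color letter first, then line style" form used in the examples above.

```python
# The color letters and line-style symbols listed above.
COLORS = {'r': 'red', 'g': 'green', 'b': 'blue', 'y': 'yellow'}
LINE_STYLES = {'-': 'solid', '--': 'dashed', '-.': 'dashdot', ':': 'dotted'}


def describe_format(fmt):
    """Split a short format string such as 'b--' into (color name, line style name)."""
    color, style = fmt[0], fmt[1:]   # first character is the color, the rest is the style
    return COLORS[color], LINE_STYLES[style]


print(describe_format('r-'))   # ('red', 'solid')
print(describe_format('b--'))  # ('blue', 'dashed')
```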

If we want to add the chart key (legend), we first import matplotlib.patches as mpatches and then add one line of code. The line itself is missing from this copy of the post; a call such as plt.legend(['Male','Female']) fits the description.

The keys ['Male','Female'] MUST be in the same sequence as in the main plot code line:
plt.plot(data_yesrs,data_m,'r-', data_yesrs,data_f,'b--')
