Archive
Python: Machine Learning – Part 3
Learning: Python and Machine Learning Part 3
Subject: Implementing and saving an ML-Model
After creating a data-set, using it to train an ML model, and making sure that the model works fine and gives high-accuracy predictions (Click here to read: Python and Machine Learning Part 2), we may need to keep this trained model and re-use it on actual data. In many real-life ML applications, training the model may take a long time with huge training data, as in image-recognition or voice-recognition models, so we need to keep the model trained even after we exit the application. To do this in sklearn we will follow the "Model persistence" documentation page and use joblib serialization.
First we need to import joblib, and also import os so we can print out the file name and its path. We will use two functions from joblib (dump and load); in save_trained_model we will use dump. Here is the code.
# Function to save a trained ML-Model
import joblib, os   # To import joblib and os

def save_trained_model(model_name):
    print('\n You select to save the trained ML model.')
    ml_name = input(' Enter a file name: ')
    joblib.dump(model_name, ml_name)
    print('\n --> ML Model been saved.\n')
    print(' File Name is :', ml_name)                    # To print out the file name
    print(' File Path is :', os.path.abspath(ml_name))   # To print out the file path

# In the main application code:
print('\n\n Do you want to save the ML trained Model? (Y,N): ')
if input('') in ['y', 'Y']:
    save_trained_model(ML_trained_model)
Now, after we save our trained ML-Model, we want to load it and use it in our ML program without re-training our machine. I will use the function new_test_data() from Part 2 and pass the trained model to it. To do this, first we need to load the trained ML-Model. So let's do it.
# Function to load a trained ML-Model
def load_ML_Model(ML_filename):
    the_trained_model = joblib.load(ML_filename)
    return the_trained_model

# We call the function in the main application code.
ML_model = load_ML_Model(ML_t_model_filename)
And now we will call our new_test_data() function and pass ML_model to it to see the predictions.
# Function to test the trained ML-Model on a new data set
def new_test_data(ML_model):
    print('\n\n====================================================')
    print('--------- START PREDICTION for New Data Set ---------')
    print('\n In this function a new data set will be generated, ')
    print(' and a trained ML-Model for "mouse on the coordinate plane" ')
    print(' will be loaded from the disk. So we will not train the Model.')

    new_data_size = 1000
    new_data_range = 100
    print('\n\n The new data range is {}, and the new data size is {}.'.format(new_data_range, new_data_size))

    # Generate the new data.
    new_test_data1 = []
    for x in range(new_data_size):
        new_test_data1.append([round(random.uniform(-new_data_range, new_data_range), 2),
                               round(random.uniform(-new_data_range, new_data_range), 2)])

    print('\n This is the prediction for the New Data set..\n')

    # Do prediction using ML_model.
    prediction = ML_model.predict(new_test_data1)

    # Check the prediction accuracy.
    cot = 0
    for i in range(len(prediction)):
        if prediction[i] == 'Up_r':
            if new_test_data1[i][0] > 0 and new_test_data1[i][1] > 0:
                cot = cot + 1
        elif prediction[i] == 'Up_l':
            if new_test_data1[i][0] < 0 and new_test_data1[i][1] > 0:
                cot = cot + 1
        elif prediction[i] == 'D_r':
            if new_test_data1[i][0] > 0 and new_test_data1[i][1] < 0:
                cot = cot + 1
        elif prediction[i] == 'D_l':
            if new_test_data1[i][0] < 0 and new_test_data1[i][1] < 0:
                cot = cot + 1

    print('\n We count {} correct predictions out of {} Instances.'.format(cot, new_data_size))
    print('\n The Accuracy is:', round((cot / len(prediction)) * 100, 3), '%')
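For completeness, here is a minimal sketch of the whole flow in the main application code; the file name below is a hypothetical example, not from the original post.

# Usage sketch (the file name is an illustrative assumption):
ML_t_model_filename = 'mouse_model.sav'   # hypothetical name of the saved model file
ML_model = load_ML_Model(ML_t_model_filename)
new_test_data(ML_model)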
Python: Machine Learning – Part 2
Learning: Python and Machine Learning Part 2
Subject: Requirements, Sample and Implementation
Machine Learning Implementation: In the previous post (Click to Read: Python and Machine Learning Part 1) we started to learn about Machine Learning (ML) and we used an sklearn model with the Iris data-set. In this post we will generate our own data-set, pass it to the ML model, and find out if the results satisfy our needs.
First of all, let's talk about the data we want to collect. Since we are doing tests, I will select very easy data so that in the accuracy-checking part we can verify whether our ML model selects the right labels. I will write a function that generates pairs of numbers, positive and negative, to represent the mouse location on the coordinate plane, and the labels will be:
Up_r = Up Right, Up_l = Up Left,
D_r = Down Right, D_l = Down Left
So we have four (4) classes with 20 instances in each, that's 80 instances in total.
The data will be passed into the get_test_train_data() function, which will return train data, test data, and their labels; then we will train the model using the train_me() function; after that we will run the model on the test data to see if it succeeds in predicting the correct labels.
In this post I will cover the function that generates the data and converts it into an object data-set, so we can use it with the sklearn model without changing our code from Part 1. I will use the same data-set names as in the sklearn Iris data-set.
We will also write some information, a summary of the data we have and its classes. So let's see this part first..
## Data Set Characteristics ##
Creator: Ali Radwani 26/11/2019
Summary: This function will generate a dataset for Machine Learning,
    for test and learning purposes. Numeric x, y represent the position
    of the mouse on the coordinate plane.
    Up_r = Up Right, Up_l = Up Left, D_r = Down Right, D_l = Down Left
Number of Instances: 80 (20 in each of four (4) classes)
Number of Attributes: 2 numeric (x, y) predictive attributes and the class.
Attribute Information:
    x (Position)
    y (Position)
    class: Up_r, Up_l, D_r, D_l
Once we create the data-set object we can append this information as its description; adding descriptions to your data and applications is a good habit to learn and to have.
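The original post shows this step only in screenshots, so here is a minimal sketch of how the lists built by the functions below could be packed into a Bunch object that mimics the sklearn Iris attributes; the variable names are illustrative assumptions.

# Sketch (assumed wiring, not the original code) of building the data-set object.
import bunch

data_description = '## Data Set Characteristics ...'   # the summary text shown above

my_dataset = bunch.Bunch()
my_dataset.data = data_set           # list of [x, y] pairs (built below)
my_dataset.target = target_set       # list of labels (built below)
my_dataset.target_names = ['Up_r', 'Up_l', 'D_r', 'D_l']
my_dataset.DESCR = data_description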
What is our data-set: From the summary above we can see that we need a function that randomly generates two float numbers ranging from (-N) to (+N), where N is our data range. We assume these two numbers are the x, y position of the mouse on the coordinate plane, so depending on each pair (whether negative or positive) we add the corresponding class name; at the end we will have a list of three values: x, y, label. Let's see the code.
# Function to generate the data-set
def data_set_generator():
    d_size = 400    # Data-set size
    d_range = 200   # Data-set range
    data_list = []
    nd1 = []

    # FOR loop to generate the random float numbers.
    for x in range(d_size):
        nd1 = [round(random.uniform(-d_range, d_range), 2),
               round(random.uniform(-d_range, d_range), 2)]

        # Here we append the x, y pairs with their labels.
        if nd1[0] > 0 and nd1[1] > 0:
            data_list.append([nd1[0], nd1[1], 'Up_r'])
        if nd1[0] < 0 and nd1[1] > 0:
            data_list.append([nd1[0], nd1[1], 'Up_l'])
        if nd1[0] > 0 and nd1[1] < 0:
            data_list.append([nd1[0], nd1[1], 'D_r'])
        if nd1[0] < 0 and nd1[1] < 0:
            data_list.append([nd1[0], nd1[1], 'D_l'])

    # We shuffle the data-set several times to mix the data.
    for x in range(5):
        random.shuffle(data_list)

    return data_list   # Return the data-set
While writing the Machine Learning code that uses the Iris data-set, the data itself, the labels, and the other parts were accessed as attributes of the main data-set object. So here we need to create several sets from our data and then append them all together. First I will split the data into two sets, one for the data and one for the targets (labels).
# Function to prepare the data-set
def dataset_prepare(the_dataset):
    '''
    Input: dataset
    The function will split the dataset into 2 sets, one for
    data (data_set) and one for labels (target_set).
    '''
    target_set = []
    data_set = []
    for x in range(len(the_dataset)):
        data_set.append([the_dataset[x][0], the_dataset[x][1]])
        target_set.append([the_dataset[x][2]])

    return data_set, target_set
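Putting these two functions together with the helpers from Part 1: the main-program wiring appears only in the post's screenshots, so the following is just a sketch of how the calls might be connected.

# Assumed main flow (illustrative, not the original main code):
the_dataset = data_set_generator()                    # build the raw data-set
data_set, target_set = dataset_prepare(the_dataset)   # split data and labels
train_data, test_data, train_labels, test_labels = get_test_train_data(data_set, target_set)
train_me(train_data, train_labels)                    # from Part 1
get_prediction(test_data, test_labels, 't')           # from Part 1, with accuracy check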
With the above two functions we can now train our model and test its prediction accuracy. To make sure that our ML model can also predict on a completely new data-set, I created another function that generates another set of data; I wrote it to be confident that YES, the model is working. So let's see the code.
# Function to create a new data-set and test the model on it
def new_test_data():
    print('\n\n====================================================')
    print('--------- START PREDICTION for new data set ---------')
    print('\n This is a new data set, not the test one.. so there are')
    print(' no labels to compare with; to get the accuracy we')
    print(' will use the IF conditions.')

    new_data_size = 5000     # Data-set size
    new_data_range = 300     # Data-set range
    print(' The new data range is {}, and the new data size is {}.'.format(new_data_range, new_data_size))

    # To generate the new data set.
    new_test_data1 = []
    for x in range(new_data_size):
        new_test_data1.append([round(random.uniform(-new_data_range, new_data_range), 2),
                               round(random.uniform(-new_data_range, new_data_range), 2)])

    print('\n\n This is the prediction for the New Data set..\n')
    prediction = clf.predict(new_test_data1)

    # Here we start counting the correct predictions.
    cot = 0
    for i in range(len(prediction)):
        if prediction[i] == 'Up_r':
            if new_test_data1[i][0] > 0 and new_test_data1[i][1] > 0:
                cot = cot + 1
        elif prediction[i] == 'Up_l':
            if new_test_data1[i][0] < 0 and new_test_data1[i][1] > 0:
                cot = cot + 1
        elif prediction[i] == 'D_r':
            if new_test_data1[i][0] > 0 and new_test_data1[i][1] < 0:
                cot = cot + 1
        elif prediction[i] == 'D_l':
            if new_test_data1[i][0] < 0 and new_test_data1[i][1] < 0:
                cot = cot + 1

    print('\n We count {} correct predictions out of {} Instances.'.format(cot, new_data_size))
    print('\n The Accuracy is:', round((cot / len(prediction)) * 100, 3), '%')
Wrapping up: In this post we wrote a function to generate a data-set and split it into two parts, one for training and one for testing. Then we tested the model with a fresh new data-set generated by another function. Here is a screenshot of the final result.
Python: Machine Learning – Part 1
Learning: Python and Machine Learning
Subject: Requirements, Sample and Implementation
Machine Learning: I will not go through the definitions and uses of ML; there are a lot of other posts that may be more informative than whatever I could write. In this post I will write about my experience and learning curve in implementing an ML model and testing my own data.
The Story: Two or three days ago I started to read and watch videos about Machine Learning. I found the "sklearn" site, and from there I created my first ML model to test the Iris data-set; then I wrote a function to generate my own random data and test it with the sklearn ML model.
Let’s start ..
Requirements:
1. Libraries to Import: To work with sklearn models and the other functions that we will use, we need to import the following libraries:
import os # I will use it to clear the terminal.
import random # I will use it to generate my data-set.
import numpy as np
import bunch # To create data-set as object
from sklearn import datasets
from sklearn import svm
from sklearn import tree
from sklearn.model_selection import train_test_split as tts
2. Data-set: In my learning steps I used one of the sklearn data-sets, named "Iris"; it stores information about a flower called 'Iris'. To use the sklearn ML Model on other data-sets, I created several functions to generate random data that can be passed into the ML model; I will cover this part in another post.
First we will see what the Iris dataset is; this information is copied from the sklearn site.
:: Iris dataset description ::
Dataset type: Classification
Contains: 3 classes, 50 samples per class (total of 150 samples)
Dimensionality: 4
Features: real, positive
The data is a Dictionary-like object; the interesting attributes are:
‘data’: the data to learn.
‘target’: the classification labels.
‘target_names’: the meaning of the labels.
‘feature_names’: the meaning of the features.
‘DESCR’: the full description of the dataset.
‘filename’: the physical location of iris csv.
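As a quick illustration (this snippet is mine, not from the sklearn page), these attributes can be inspected directly:

# Load the Iris data-set and peek at its attributes.
from sklearn import datasets

iris = datasets.load_iris()
print(iris.target_names)    # the meaning of the labels
print(iris.feature_names)   # the meaning of the features
print(iris.data[:3])        # the first three samples
print(iris.target[:3])      # their classification labels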
Note: This part helped me write my data-set generating function; that's why we import the bunch library, to add our lists to a data-set object so it will appear as an object data-set and the same code we use for the Iris data-set will work fine with ours. In another post I will cover loading the data from a csv file and how to create such a file..
Start writing the code parts: After I wrote the code and tuned it, I refactored it into several functions that can be called with other data-sets, without hard-coding any of the Iris data-set names. This way we can load other data-sets in an easy way.
The Code
# Import libraries
import numpy as np
from sklearn import datasets
#from sklearn import svm
from sklearn import tree
from sklearn.model_selection import train_test_split as tts
import random, bunch
In the next step we will load the iris dataset into a variable called "the_data".
# Loading the iris dataset.
the_data = datasets.load_iris()
From the "Iris dataset description" section above we found that the data is stored in data and the classification labels in target, so now we will store both in two other variables.
# Load the data into all_data, and the target into all_labels.
all_data = the_data.data
all_labels = the_data.target
We will create an object called ‘clf’ and will use the Decision Tree Classifier from sklearn.
# Create a Decision Tree Classifier.
clf = tree.DecisionTreeClassifier()
In Machine Learning programs we need some data for training and another set for testing, before we pass the original data or deploy our code on real data. sklearn provides a function to split a given data-set into two parts, test and train. To do this I created a function that we call with the data and label sets; it will return the following: train_data, test_data, train_labels, test_labels.
# Function to split a data-set into training and testing data.
def get_test_train_data(data, labels):
    train_data, test_data, train_labels, test_labels = tts(data, labels, test_size=0.1)
    return train_data, test_data, train_labels, test_labels
After splitting the data we will have four lists, or say data-sets. We will pass train_data and train_labels to the train_me() function; I created this function so we can pass in the training data and labels and it will call clf.fit from sklearn. By finishing this part we have trained our ML Model and it is ready to test sample data. But first let's see the train_me() function.
# Function train_me() will pass the training data to the sklearn Model.
def train_me(train_data1, train_labels1):
    clf.fit(train_data1, train_labels1)
    print('\n The Model been trained. ')
As we just said, we now have a trained Model ready for testing. To test the data set we will use the clf.predict function in sklearn; this should return the list of labels the ML Model predicts to be right. To check whether the Model's predictions are correct, and to get the percentage of correct answers, we count and compare the predicted labels with the actual labels of the test data. Here is the code for get_prediction().
# get_prediction() to predict the data labels.
def get_prediction(new_data_set, test_labels2, accu):
    print('\n This is the prediction labels of the data.\n')

    # Calling the prediction function clf.predict.
    prediction = clf.predict(new_data_set)
    print('\n prediction labels are : ', prediction, len(prediction))

    # Print the Accuracy.
    if accu == 't':
        cot = 0
        for i in range(len(prediction)):
            print(prediction[i], new_data_set[i], test_labels2[i])
            if [prediction[i]] == test_labels2[i]:
                cot = cot + 1
        print('\ncount :', cot)
        print('\n The Accuracy:', (cot / len(prediction)) * 100, '%')
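To tie the steps together, here is a minimal sketch of how the functions above might be called in sequence; the original post shows this wiring only in the screenshots below.

# Assumed end-to-end flow (illustrative):
train_data, test_data, train_labels, test_labels = get_test_train_data(all_data, all_labels)
train_me(train_data, train_labels)
get_prediction(test_data, test_labels, 't')   # 't' turns on the accuracy check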
The accuracy value determines whether we can use the model in real life or should try another model. In a real-data scenario we need to pass a 'False' flag for accu, because we can't cross-check the predicted results against any labels; we can only check some results manually.
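For example, a sketch of such a call (real_data here is a hypothetical unlabeled data-set, not from the post):

# No labels available, so pass anything other than 't' to skip the accuracy check.
get_prediction(real_data, [], 'f')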
End of Part 1: By now we have all the functions we can use with our own data-set. In the code and run-time screenshots we can see a very high accuracy level, so we can move on to our own data-set; that will be the subject of the coming post.
Result screenshot after running the Iris dataset, showing a high accuracy level.