Posts Tagged ‘pandas’

Python: Pandas Lesson 5

July 29, 2019 13 comments

Learning : Pandas Lesson 5
Subject: Columns rename and missing data.

We are still on the same track, looking at commands that help us manage and format the dataframe. In most cases we will get a data file from the net, or from a source that does not treat formatting or standardization as its concern, or we may find a lot of missing data in the file. In this post we will go through some lines of code that put the file in better shape.

Columns: The first thing we will look at is the header of the data table. So first, check your data columns with this code:

print('\n\n Current columns of the data.\n', df3.columns)

Now we have a list of the columns in our data file, and we can change any of them to give a clearer meaning or for any other purpose. I found that using the rename method and passing the new column names as a dictionary is better, because we can rename in any order and do not have to rename them all.

df3.rename(columns={'animal':'Animal-Kind','id':'ID','cage_no':'InCage'}, inplace=True)
print('\n\n Table with the renamed columns.\n', df3)

I forgot to add .sample(6), which would have printed only a sample of the data, but anyway the new header is there, and because we used inplace=True the new header stays with us in df3.
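Here is a minimal sketch of the same rename on a tiny, made-up table, just to show that the dictionary can be in any order and can skip columns. The rows are invented for illustration; only the column names come from the post.

```python
import pandas as pd

# Hypothetical stand-in for the zoo data; the values are made up.
df3 = pd.DataFrame({
    'animal': ['lion', 'zebra'],
    'id': [1001, 1002],
    'cage_no': [3, 7],
    'years': [5, 9],
})

# The dictionary is not in column order, and 'years' is not mentioned at all,
# so 'years' keeps its name.
df3.rename(columns={'cage_no': 'InCage', 'animal': 'Animal-Kind', 'id': 'ID'}, inplace=True)
print(list(df3.columns))
```

Columns that are not keys in the dictionary are simply left as they are.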

Missing Data: This is the biggest challenge in any data file. Sometimes the application used to fill in the form, or the person entering the data, does not handle missing data in a standard way, so you may find an empty field, or ‘NA’, or dummy numbers like (0000) or (-0), or dashes (—). How to handle such cases really depends on the customer you are working for and what they want written in each empty field; for now we are just talking about filling with a standard key.

In the coming code we tell pandas: wherever you find NaN, replace it with ‘NA’.
print('\n\n Replace NaN with NA.\n', new_df)
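As a sketch of how new_df could be produced, the fillna method accepts a single value and puts it in every NaN cell. The frame name df2 and the rows below are my assumptions for illustration; they are not taken from the original file.

```python
import numpy as np
import pandas as pd

# Made-up rows; df2 is an assumed name for the frame that holds the zoo data.
df2 = pd.DataFrame({
    'animal': ['lion', 'zebra', 'tiger'],
    'supervisor': ['mark', np.nan, 'peter'],
})

# fillna returns a new DataFrame; df2 itself is not modified.
new_df = df2.fillna('NA')
print('\n\n Replace NaN with NA.\n', new_df)
```

Because fillna returns a copy here, the original frame still holds its NaN values if we need them later.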

Note: I will go to our data_file_zoo.csv and add more NaN values to some fields so that our coming case is meaningful.

Our data_file_zoo.csv has 6 columns. animal and id are primary keys and can’t be empty, so they MUST be filled. For each of the other columns, if the data is NaN we will replace it with the following (MD: Missing Data, NA: Not Available, and – for numbers):
water_need : MD
supervisor : NA
InCage : –
years : –
Note that we MUST use the same column names as in the df we are working with.
Here is the code:

Replacing Missing Data
new_df = df2.fillna({'water_need':'MD', 'supervisor':'NA', 'InCage':'-', 'years':'-'})

(I marked the replaced fields.)
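To try the per-column filling without the real file, here is a small sketch on invented rows. One assumption of mine: I cast the frame to object first, because text markers such as 'MD' or '-' do not fit in a numeric column, and the cast keeps the behaviour the same across pandas versions. That cast is not in the original code.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for data_file_zoo.csv (after cage_no was renamed to InCage).
df2 = pd.DataFrame({
    'animal': ['lion', 'zebra', 'tiger'],
    'id': [1001, 1002, 1003],
    'water_need': [500.0, np.nan, 300.0],
    'supervisor': ['mark', 'peter', np.nan],
    'InCage': [3.0, 7.0, np.nan],
    'years': [np.nan, 9.0, 4.0],
})

# Each key is a column name and each value is the filler for that column only.
# astype(object) is an addition of this sketch so numbers and text can share a column.
new_df = df2.astype(object).fillna(
    {'water_need': 'MD', 'supervisor': 'NA', 'InCage': '-', 'years': '-'}
)
print(new_df)
```

A key that does not match a column name exactly is just ignored, which is why the names must be the same as in the df.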

Let’s say we notice that some data in the water_need column is not logical; for example, we know it can’t reach 600, so we want to replace any number bigger than or equal to 600 in that column with ‘err’. Code here..

Code to change some values based on a condition.
df2.loc[df2['Water'] >= 600, ['Water']] = 'err'
print('\n\n Change values of 600 or more to err.\n', df2)
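A small sketch of the conditional replacement on invented numbers, assuming the goal from the text (values of 600 or more become 'err') and a column named 'Water' as in the code. The cast to object is my own addition: newer pandas versions complain when text is written into a numeric column, so I build the condition first and then cast.

```python
import pandas as pd

# Made-up rows; only the column name 'Water' follows the code above.
df2 = pd.DataFrame({
    'animal': ['lion', 'zebra', 'tiger', 'elephant'],
    'Water': [500, 600, 300, 750],
})

# Build the condition while the column is still numeric.
too_much = df2['Water'] >= 600

# Cast to object so the column can hold the text marker next to the numbers
# (this cast is an assumption of the sketch, not part of the original code).
df2['Water'] = df2['Water'].astype(object)
df2.loc[too_much, 'Water'] = 'err'
print(df2)
```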

:: Pandas Lessons Post ::

Lesson 1 Lesson 2 Lesson 3
Lesson 4 Lesson 5

Follow me on Twitter..

Python: Pandas Lesson 3

July 22, 2019 15 comments

Learning : Pandas Lesson 3
Subject: dataframe (sort, where and filters)

In my last post, Pandas Lesson 2, we showed some commands that output part of our dataframe (df), for example the information we have about lions or other animals in the Zoo file, or which animals fall under a particular supervisor. I also tried to add a print statement above each output table to describe the table content.

In this lesson, or let’s say in this post, I will share another bunch of commands dealing with one table of data. We will keep using our Zoo data file. So first I will call the dataframe df.

Import pandas and call df

import pandas as pd

file_name = 'data_file_zoo.csv'
df = pd.read_csv(file_name, delimiter=',')

print('\n Data from Zoo file..', df)

So, suppose we want to sort the data by supervisor name.

df.sort_values('supervisor', inplace=True)
print('\n\n Sorted data with Supervisor Name\n', df)

The first thing to note is that we have two groups of supervisors named ‘peter’, one with a small ‘p’ and another with a capital ‘P’. Another thing to see is that some ‘lions’ have NaN under supervisor, which means there is no data in those fields. I will not change this now; let’s do it in another lesson.

So, let’s sort the data now by animal type.

df.sort_values('animal', inplace=True)
print('\n\n Sort with animal type.\n', df)
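To see the two sorts side by side without touching the real file, here is a sketch on a few invented rows. It uses sort_values without inplace, so each call returns a new sorted frame and df keeps its original order. It also shows why ‘Peter’ and ‘peter’ land in different places: capital letters sort before small letters.

```python
import pandas as pd

# Made-up rows; the mixed-case supervisor names imitate the situation described above.
df = pd.DataFrame({
    'animal': ['zebra', 'lion', 'tiger', 'lion'],
    'supervisor': ['peter', 'mark', 'Peter', 'anna'],
})

# Without inplace=True, sort_values returns a sorted copy.
by_supervisor = df.sort_values('supervisor')
by_animal = df.sort_values('animal')

print('\n Sorted by supervisor\n', by_supervisor)
print('\n Sorted by animal\n', by_animal)
```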

If we want to print all animal data under mark’s supervision, the other rows will be shown as NaN.
mark_supervision = df['supervisor'] == 'mark'
df.where(mark_supervision, inplace=True)
print('\n\n Any rows other than those with mark as supervisor will be NaN\n', df)

If we want to add another filter to the dataframe above, to show animals under mark’s supervision only if the animal’s age is more than 7:
age_bigger_7 = df['years'] > 7
df.where(mark_supervision & age_bigger_7, inplace=True)
print('\n\n Only rows under mark supervision if animal age > 7 \n', df)
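The sketch below, on invented rows, compares what where gives us with plain boolean indexing (the df[condition] form from Lesson 1). With where the table keeps its shape and the rows that fail the condition turn into NaN; with boolean indexing only the matching rows come back.

```python
import pandas as pd

# Made-up rows; only the column names follow the post.
df = pd.DataFrame({
    'animal': ['lion', 'zebra', 'tiger', 'elephant'],
    'supervisor': ['mark', 'peter', 'mark', 'mark'],
    'years': [5, 9, 8, 12],
})

mark_supervision = df['supervisor'] == 'mark'
age_bigger_7 = df['years'] > 7

# where: same number of rows, non-matching rows are filled with NaN.
masked = df.where(mark_supervision & age_bigger_7)

# Boolean indexing: only the rows where both conditions are True.
matching = df[mark_supervision & age_bigger_7]

print(masked)
print(matching)
```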


Python: Pandas Lesson

July 21, 2019 13 comments

Learning : DataFrame and some commands
Subject: Pandas printing selected rows

The first thing we will do today is add another column to our CSV file, data_file_zoo.csv: we will add ‘years’, which is how old each animal in the zoo is.

File_Name: data_file_zoo.csv

As we just updated our file, we need to load it into memory by calling the df (dataframe); this will happen once we run our code.
Here is a screenshot of the new data using print(df)

Let’s say we want to know how many animals are under 6 years. Here we will use df.loc to locate what we are looking for.

age_less_6 = df.loc[(df.years < 6)]
# To print we may use this:
print(' we have {} animals less than 6 years'.format(len(age_less_6)))
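Here is a small sketch of the same count on invented ages. It selects the rows with df.loc as above, and it also shows a second way I find handy: summing the True values of the condition directly.

```python
import pandas as pd

# Made-up rows; only the column names come from the post.
df = pd.DataFrame({
    'animal': ['lion', 'zebra', 'tiger', 'elephant'],
    'years': [5, 9, 3, 12],
})

# Option 1: select the rows, then count them.
age_less_6 = df.loc[df['years'] < 6]
print('we have {} animals less than 6 years'.format(len(age_less_6)))

# Option 2: the condition is a column of True/False, and True counts as 1.
count_less_6 = int((df['years'] < 6).sum())
print(count_less_6)
```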

Now, we want to print only the lion rows:
lion_rows = df.loc[(df.animal == 'lion')]

Here are only the rows with animal name ‘elephant’:

Now let’s print only the rows with lion and elephant:
lion_and_elephant = df.loc[(df.animal == 'lion') | (df.animal == 'elephant')]

What if we want all the data except the rows with lion or elephant?
all_exclude_lion_elephant = df.loc[(df.animal != 'lion') & (df.animal != 'elephant')]
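A sketch on invented rows to check that the two selections really split the table between them. I also add isin, which is not used in this post but gives the same rows as the | version when the list of animals gets longer.

```python
import pandas as pd

# Made-up rows; only the column names come from the post.
df = pd.DataFrame({
    'animal': ['lion', 'zebra', 'elephant', 'tiger', 'lion'],
    'id': [1001, 1002, 1003, 1004, 1005],
})

# Rows that are lion OR elephant.
lion_and_elephant = df.loc[(df.animal == 'lion') | (df.animal == 'elephant')]

# Rows that are NOT lion AND NOT elephant.
all_exclude_lion_elephant = df.loc[(df.animal != 'lion') & (df.animal != 'elephant')]

# Same selection as the OR version, written with isin (my addition).
same_selection = df.loc[df.animal.isin(['lion', 'elephant'])]

print(lion_and_elephant)
print(all_exclude_lion_elephant)
```

Every row ends up in exactly one of the two results, so their lengths add up to the length of df.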


Python: Pandas Lessons

July 17, 2019 13 comments

Learning : DataFrame and some commands

These are my first hours with Pandas, and until now things are going smoothly. I am using pythonanywhere on my PC, and jupyterlab on my Galaxy Tab S4.

In this post and the coming ones under the name Pandas Lesson, I will write down some commands and whatever I think I may need.

So, the first thing we need is a CSV file with data to play with. I searched for something simple and found one with zoo data! I added two new columns to it, so let’s see it.

File_Name: data_file_zoo.csv

I added the ‘supervisor’ and ‘cage_no’ columns to the original file so we will have more room to manipulate.

First Command: The first thing we need is to call the pandas library using import, and set the file name and dataframe.

import pandas as pd
file_name = 'data_file_zoo.csv'
df = pd.read_csv(file_name, delimiter=',')

We will use this part as the initialization in all our code.

Other Commands: Here are other commands that work with the dataframe df.

print(df)  # prints all the data from the file
print(df.head())  # prints the first 5 rows
print(df.tail())  # prints the last 5 rows
print(df.sample(3))  # prints 3 random rows from the dataframe
print(df.columns)  # prints the columns in the file
print(df[['id','animal','cage_no']])  # prints only the data from the columns you want
print(df[['id','animal','cage_no']].sample(3))  # prints 3 random rows of only the 'id', 'animal', 'cage_no' columns
print(df[df.animal=='lion'])  # gets all the rows with animal name = lion (case sensitive)
print(df.head()[['animal','id']])  # prints the first five rows of only animal and id
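If you want to try these commands without the real file, here is a sketch that reads a few invented lines of CSV text from memory instead of from disk. The column names follow the post, the values are made up, and reading from io.StringIO is my own shortcut.

```python
import io

import pandas as pd

# Made-up CSV text standing in for data_file_zoo.csv.
csv_text = """animal,id,water_need,supervisor,cage_no
elephant,1001,500,peter,5
tiger,1002,300,mark,2
lion,1003,600,mark,3
zebra,1004,200,anna,7
lion,1005,420,peter,3
kangaroo,1006,410,anna,8
lion,1007,390,mark,3
"""

# read_csv accepts a file-like object as well as a file name.
df = pd.read_csv(io.StringIO(csv_text), delimiter=',')

first_rows = df.head()                                # first 5 rows
last_rows = df.tail()                                 # last 5 rows
picked = df[['id', 'animal', 'cage_no']].sample(3)    # 3 random rows, 3 columns
lions = df[df.animal == 'lion']                       # only the lion rows

print(df.columns)
print(first_rows)
print(lions)
```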

Wrap-up: This is step one. Pandas has a lot to read about and to learn. I started this initiative just for myself, and I chose the hard way to do it. This is not important to my current job, and it is nothing that anybody will ask me about, but I want to learn, and I think I will go further in these self-taught learning sessions..

Update on: 29/7/2019
