Pandas is a software library written in Python for data manipulation and analysis.It offers data structures and operations for manipulationg numerical tables and time series
pip install pandas
import pandas as pd
import os
os.getcwd()
'C:\\Users\\kevin\\project_kevin\\datapandas\\solution\\full_pandas_course'
# move our position into the same folder where the data is stored
path = str(r'C:\Users\kevin\Downloads\DataPandas\archive (1)\Harry_Potter_Movies')
path = path.replace("\\","/")
os.chdir(path)
os.getcwd()
'C:\\Users\\kevin\\Downloads\\DataPandas\\archive (1)\\Harry_Potter_Movies'
## df is a dataframe
df = pd.read_csv('Characters.csv',delimiter=',')
df
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
161 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN |
162 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN |
163 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN |
164 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN |
165 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN |
166 rows × 8 columns
# read the first 10 rows
df.head(10)
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
5 | 6 | Severus Snape | Human | Male | Slytherin | Doe | NaN | NaN |
6 | 7 | Minerva McGonagall | Human | Female | Gryffindor | Cat | Fir | Dragon Heartstring |
7 | 8 | Horace Slughorn | Human | Male | Slytherin | NaN | Cedar | Dragon Heartstring |
8 | 9 | Voldemort | Human | Male | Slytherin | NaN | Yew | Phoenix Feather |
9 | 10 | Neville Longbottom | Human | Male | Gryffindor | NaN | Cherry | Unicorn Hair |
# read the last 10 rows
df.tail(10)
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
156 | 157 | Death Eater | NaN | NaN | NaN | NaN | NaN | NaN |
157 | 158 | Man in a painting | NaN | NaN | NaN | NaN | NaN | NaN |
158 | 159 | Photographer | NaN | NaN | NaN | NaN | NaN | NaN |
159 | 160 | Pixie | NaN | NaN | NaN | NaN | NaN | NaN |
160 | 161 | Student | NaN | NaN | NaN | NaN | NaN | NaN |
161 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN |
162 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN |
163 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN |
164 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN |
165 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN |
# Wich is the most commond gender in Harry Pottere movies?
df.groupby('Gender')['Gender'].count()
Gender Female 42 Human 1 Male 82 Name: Gender, dtype: int64
# df.groupby('Gender')['Gender'].count().plot()
# df.groupby('Gender')['Gender'].count().plot(kind='bar')
df.groupby('Gender')['Gender'].count().plot(kind='pie')
<AxesSubplot:ylabel='Gender'>
df.loc[0,'Character Name']
'Harry Potter'
df.loc[:,:]
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
161 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN |
162 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN |
163 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN |
164 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN |
165 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN |
166 rows × 8 columns
df.loc[10:20,:]
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
10 | 11 | Remus Lupin | Werewolf | Male | Gryffindor | Wolf | Cypress | Unicorn Hair |
11 | 12 | Draco Malfoy | Human | Male | Slytherin | NaN | Hawthorn | Unicorn Hair |
12 | 13 | Alastor Moody | Human | Male | NaN | NaN | NaN | NaN |
13 | 14 | Fred Weasley | Human | Male | Gryffindor | NaN | NaN | NaN |
14 | 15 | Dolores Umbridge | Human | Female | Slytherin | Cat | Birch | Dragon Heartstring |
15 | 16 | Arthur Weasley | Human | Male | Gryffindor | Weasel | NaN | NaN |
16 | 17 | Cornelius Fudge | Human | Male | NaN | NaN | NaN | NaN |
17 | 18 | Sirius Black | Human | Male | Gryffindor | NaN | NaN | NaN |
18 | 19 | George Weasley | Human | Male | Gryffindor | NaN | NaN | NaN |
19 | 20 | Ginny Weasley | Human | Female | Gryffindor | Horse | NaN | NaN |
20 | 21 | Vernon Dursley | Human | Male | NaN | NaN | NaN | NaN |
df.loc[10:20,['Character Name','Gender']]
Character Name | Gender | |
---|---|---|
10 | Remus Lupin | Male |
11 | Draco Malfoy | Male |
12 | Alastor Moody | Male |
13 | Fred Weasley | Male |
14 | Dolores Umbridge | Female |
15 | Arthur Weasley | Male |
16 | Cornelius Fudge | Male |
17 | Sirius Black | Male |
18 | George Weasley | Male |
19 | Ginny Weasley | Female |
20 | Vernon Dursley | Male |
df.iloc[0:20,0:10]
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
5 | 6 | Severus Snape | Human | Male | Slytherin | Doe | NaN | NaN |
6 | 7 | Minerva McGonagall | Human | Female | Gryffindor | Cat | Fir | Dragon Heartstring |
7 | 8 | Horace Slughorn | Human | Male | Slytherin | NaN | Cedar | Dragon Heartstring |
8 | 9 | Voldemort | Human | Male | Slytherin | NaN | Yew | Phoenix Feather |
9 | 10 | Neville Longbottom | Human | Male | Gryffindor | NaN | Cherry | Unicorn Hair |
10 | 11 | Remus Lupin | Werewolf | Male | Gryffindor | Wolf | Cypress | Unicorn Hair |
11 | 12 | Draco Malfoy | Human | Male | Slytherin | NaN | Hawthorn | Unicorn Hair |
12 | 13 | Alastor Moody | Human | Male | NaN | NaN | NaN | NaN |
13 | 14 | Fred Weasley | Human | Male | Gryffindor | NaN | NaN | NaN |
14 | 15 | Dolores Umbridge | Human | Female | Slytherin | Cat | Birch | Dragon Heartstring |
15 | 16 | Arthur Weasley | Human | Male | Gryffindor | Weasel | NaN | NaN |
16 | 17 | Cornelius Fudge | Human | Male | NaN | NaN | NaN | NaN |
17 | 18 | Sirius Black | Human | Male | Gryffindor | NaN | NaN | NaN |
18 | 19 | George Weasley | Human | Male | Gryffindor | NaN | NaN | NaN |
19 | 20 | Ginny Weasley | Human | Female | Gryffindor | Horse | NaN | NaN |
#### How many Characters have as first letter the H letter?
df.loc[:,'Character Name']
0 Harry Potter 1 Ron Weasley 2 Hermione Granger 3 Albus Dumbledore 4 Rubeus Hagrid ... 161 Waiter 162 Boy 2 163 Crowd 164 Gryffindors 165 Professors Name: Character Name, Length: 166, dtype: object
df.loc[:,'Character Name'].str.contains('H.*',regex=True)
0 True 1 False 2 True 3 False 4 True ... 161 False 162 False 163 False 164 False 165 False Name: Character Name, Length: 166, dtype: bool
conditionArrayOnRows = df.loc[:,'Character Name'].str.contains('H.*',regex=True)
df.loc[conditionArrayOnRows,:]
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
7 | 8 | Horace Slughorn | Human | Male | Slytherin | NaN | Cedar | Dragon Heartstring |
46 | 47 | Nearly Headless Nick | Ghost | Human | Gryffindor | NaN | NaN | NaN |
49 | 50 | Rolanda Hooch | Human | Female | NaN | NaN | NaN | NaN |
53 | 54 | Helena Ravenclaw | Ghost | Female | Ravenclaw | NaN | NaN | NaN |
127 | 128 | Sorting Hat | NaN | NaN | NaN | NaN | NaN | NaN |
147 | 148 | Howler | NaN | NaN | NaN | NaN | NaN | NaN |
conditionArrayOnRows = df.loc[:,'Character Name'].str.contains('H.*',regex=True) & df.loc[:,'Gender'].str.contains('Male',regex=True)
df.loc[conditionArrayOnRows,:]
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
7 | 8 | Horace Slughorn | Human | Male | Slytherin | NaN | Cedar | Dragon Heartstring |
df = pd.read_excel('Characters.xlsx')
df
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
161 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN |
162 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN |
163 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN |
164 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN |
165 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN |
166 rows × 8 columns
df.columns
Index(['Character ID', 'Character Name', 'Species', 'Gender', 'House', 'Patronus', 'Wand (Wood)', 'Wand (Core)'], dtype='object')
df.index
RangeIndex(start=0, stop=166, step=1)
df.loc[0,'Character Name']
'Harry Potter'
I will explain you in a specific video on that
df = pd.read_json('Characters.json')
df
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | ||
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | Oak | ||
... | ... | ... | ... | ... | ... | ... | ... | ... |
161 | 162 | Waiter | ||||||
162 | 163 | Boy 2 | ||||||
163 | 164 | Crowd | ||||||
164 | 165 | Gryffindors | ||||||
165 | 166 | Professors |
166 rows × 8 columns
#### index
df.index
#### columns
df.columns
### only one column
df.iloc[0:10,1]
df.loc[0:10,'Character Name']
df['Character Name']
## unique
df['Gender'].unique()
array(['Male', 'Female', 'Human', ''], dtype=object)
type(df)
pandas.core.frame.DataFrame
type(df['Character Name'])
pandas.core.series.Series
print(type(df.loc[0,'Character Name']))
print(df.loc[0,'Character Name'])
<class 'str'> Harry Potter
sheet_name = "HarryPotterCharacters"
### the google sheet is open to everyone. If you wanna access use this link
# https://docs.google.com/spreadsheets/d/12-BkWsFz1N0Ql0cQJEED4O1PLMSAk7HAyYOxpVyXtaE/edit#gid=932548738
# Comment if you wanna see how to access to a private google sheet
sheet_id = "12-BkWsFz1N0Ql0cQJEED4O1PLMSAk7HAyYOxpVyXtaE"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
df = pd.read_csv(url)
df
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
161 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN |
162 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN |
163 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN |
164 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN |
165 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN |
166 rows × 8 columns
# Data inside the dataset
data = [['tom riddle',100000],['Albus the magician',100],['Merlin',10]]
# Create the dataframe
df = pd.DataFrame(data,columns=['Name','Power'])
#print dataframe
df
Name | Power | |
---|---|---|
0 | tom riddle | 100000 |
1 | Albus the magician | 100 |
2 | Merlin | 10 |
# Create a dataframe from 2 Series
series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series(['A', 'B', 'C', 'D', 'E'])
# Create a DataFrame from the two Series
df = pd.DataFrame({'Column1': series1, 'Column2': series2})
df
Column1 | Column2 | |
---|---|---|
0 | 1 | A |
1 | 2 | B |
2 | 3 | C |
3 | 4 | D |
4 | 5 | E |
# Data inside the dataset
data = [['rectangle_1',10,10],['rectangle_2',1,2],['rectangle_3',5,2]]
# Create the dataframe
df = pd.DataFrame(data,columns=['shape','height','width'])
#print dataframe
df
shape | height | width | |
---|---|---|---|
0 | rectangle_1 | 10 | 10 |
1 | rectangle_2 | 1 | 2 |
2 | rectangle_3 | 5 | 2 |
df.loc[:,'area'] = df.loc[:,'height']*df['width']
df
shape | height | width | area | |
---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 |
1 | rectangle_2 | 1 | 2 | 2 |
2 | rectangle_3 | 5 | 2 | 10 |
df.loc[:,'volume'] = df.loc[:,'height']*df['width']
df
shape | height | width | area | volume | |
---|---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 | 100 |
1 | rectangle_2 | 1 | 2 | 2 | 2 |
2 | rectangle_3 | 5 | 2 | 10 | 10 |
df.drop(['volume'],axis=1)
shape | height | width | area | |
---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 |
1 | rectangle_2 | 1 | 2 | 2 |
2 | rectangle_3 | 5 | 2 | 10 |
df
shape | height | width | area | volume | |
---|---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 | 100 |
1 | rectangle_2 | 1 | 2 | 2 | 2 |
2 | rectangle_3 | 5 | 2 | 10 | 10 |
df = df.drop(['volume'],axis=1)
df
shape | height | width | area | |
---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 |
1 | rectangle_2 | 1 | 2 | 2 |
2 | rectangle_3 | 5 | 2 | 10 |
df['perimeterT'] = 2*( df.loc[:,'height']+df['width'])
df
shape | height | width | area | perimeterT | |
---|---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 | 40 |
1 | rectangle_2 | 1 | 2 | 2 | 6 |
2 | rectangle_3 | 5 | 2 | 10 | 14 |
df.rename(columns = {'perimeterT':'perimeter'})
shape | height | width | area | perimeter | |
---|---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 | 40 |
1 | rectangle_2 | 1 | 2 | 2 | 6 |
2 | rectangle_3 | 5 | 2 | 10 | 14 |
df
shape | height | width | area | perimeterT | |
---|---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 | 40 |
1 | rectangle_2 | 1 | 2 | 2 | 6 |
2 | rectangle_3 | 5 | 2 | 10 | 14 |
df = df.rename(columns = {'perimeterT':'perimeter'})
df
shape | height | width | area | perimeter | |
---|---|---|---|---|---|
0 | rectangle_1 | 10 | 10 | 100 | 40 |
1 | rectangle_2 | 1 | 2 | 2 | 6 |
2 | rectangle_3 | 5 | 2 | 10 | 14 |
df.perimeter
0 40 1 6 2 14 Name: perimeter, dtype: int64
df.width
0 10 1 2 2 2 Name: width, dtype: int64
df.loc[:,'shape']
0 rectangle_1 1 rectangle_2 2 rectangle_3 Name: shape, dtype: object
df.index
RangeIndex(start=0, stop=3, step=1)
df = df.set_index('shape')
df.index
Index(['rectangle_1', 'rectangle_2', 'rectangle_3'], dtype='object', name='shape')
df
height | width | area | perimeter | |
---|---|---|---|---|
shape | ||||
rectangle_1 | 10 | 10 | 100 | 40 |
rectangle_2 | 1 | 2 | 2 | 6 |
rectangle_3 | 5 | 2 | 10 | 14 |
# Data inside the dataset
data = [['rectangle_1',10,10]
,['rectangle_2',1,2]
,['rectangle_3',5,2]
,['rectangle_3',5.01,2]
,['rectangle_3',5.0003,2]
]
# Create the dataframe
df = pd.DataFrame(data,columns=['shape','height','width'])
#print dataframe
df
shape | height | width | |
---|---|---|---|
0 | rectangle_1 | 10.0000 | 10 |
1 | rectangle_2 | 1.0000 | 2 |
2 | rectangle_3 | 5.0000 | 2 |
3 | rectangle_3 | 5.0100 | 2 |
4 | rectangle_3 | 5.0003 | 2 |
df = df.set_index('shape',verify_integrity=True)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-100-942061329115> in <module> ----> 1 df = df.set_index('shape',verify_integrity=True) ~\anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity) 4777 if verify_integrity and not index.is_unique: 4778 duplicates = index[index.duplicated()].unique() -> 4779 raise ValueError(f"Index has duplicate keys: {duplicates}") 4780 4781 # use set to handle duplicate column names gracefully in case of drop ValueError: Index has duplicate keys: Index(['rectangle_3'], dtype='object', name='shape')
df
height | width | |
---|---|---|
shape | ||
rectangle_1 | 10.0000 | 10 |
rectangle_2 | 1.0000 | 2 |
rectangle_3 | 5.0000 | 2 |
rectangle_3 | 5.0100 | 2 |
rectangle_3 | 5.0003 | 2 |
### create a dataset
# Data inside the dataset
data = [['World War II starts','01/09/1939']
,['World War II ends','02/09/1945']
,['Landing on the Moon','20/07/1969']
,['Twin Towers Fall','11/09/2001']
]
# Create the dataframe
df = pd.DataFrame(data,columns=['event','date_event'])
#print dataframe
df
event | date_event | |
---|---|---|
0 | World War II starts | 01/09/1939 |
1 | World War II ends | 02/09/1945 |
2 | Landing on the Moon | 20/07/1969 |
3 | Twin Towers Fall | 11/09/2001 |
type(df.loc[0,'date_event'])
str
from datetime import datetime
datetime_object = datetime.strptime('02/04/2023','%d/%m/%Y')
datetime_object
datetime.datetime(2023, 4, 2, 0, 0)
datetime_object = datetime.strptime('02-04-2023','%d-%m-%Y')
datetime_object = datetime.strptime('02-04-23','%d-%m-%y')
datetime_object
datetime.datetime(2023, 4, 2, 0, 0)
df['date_event'] = df['date_event'].apply(lambda e : datetime.strptime(e,'%d/%m/%Y'))
df['date_event']
0 1939-09-01 1 1945-09-02 2 1969-07-20 3 2001-09-11 Name: date_event, dtype: datetime64[ns]
type(df.loc[0,'date_event'])
pandas._libs.tslibs.timestamps.Timestamp
datetime_object = datetime.strptime('02-04-2023','%d-%m-%Y')
df['distance_from_now'] = datetime_object-df['date_event']
df['distance_from_now']
0 30529 days 1 28336 days 2 19614 days 3 7873 days Name: distance_from_now, dtype: timedelta64[ns]
df['distance_from_now_integer'] =df['distance_from_now'].dt.days
df['distance_from_now_year'] = df['distance_from_now_integer']/365
df['distance_from_now_year']
0 83.641096 1 77.632877 2 53.736986 3 21.569863 Name: distance_from_now_year, dtype: float64
df.sort_values(by=['distance_from_now_year'])
event | date_event | distance_from_now | distance_from_now_integer | distance_from_now_year | |
---|---|---|---|---|---|
3 | Twin Towers Fall | 2001-09-11 | 7873 days | 7873 | 21.569863 |
2 | Landing on the Moon | 1969-07-20 | 19614 days | 19614 | 53.736986 |
1 | World War II ends | 1945-09-02 | 28336 days | 28336 | 77.632877 |
0 | World War II starts | 1939-09-01 | 30529 days | 30529 | 83.641096 |
# Data inside the dataset
data = [['rectangle_1',10,10],
['rectangle_1',10,10]
,['rectangle_2',1,2]
,['rectangle_3',5,2]
,['rectangle_3',5.01,2]
,['rectangle_3',5.0003,2]
]
# Create the dataframe
df = pd.DataFrame(data,columns=['shape','height','width'])
#print dataframe
df
shape | height | width | |
---|---|---|---|
0 | rectangle_1 | 10.0000 | 10 |
1 | rectangle_1 | 10.0000 | 10 |
2 | rectangle_2 | 1.0000 | 2 |
3 | rectangle_3 | 5.0000 | 2 |
4 | rectangle_3 | 5.0100 | 2 |
5 | rectangle_3 | 5.0003 | 2 |
conditionOnRowsDuplicate = ~df.duplicated()
conditionOnRowsDuplicate
0 True 1 False 2 True 3 True 4 True 5 True dtype: bool
df.loc[conditionOnRowsDuplicate,:]
shape | height | width | |
---|---|---|---|
0 | rectangle_1 | 10.0000 | 10 |
2 | rectangle_2 | 1.0000 | 2 |
3 | rectangle_3 | 5.0000 | 2 |
4 | rectangle_3 | 5.0100 | 2 |
5 | rectangle_3 | 5.0003 | 2 |
df.duplicated(subset=['shape'])
0 False 1 True 2 False 3 False 4 True 5 True dtype: bool
df.groupby('shape').mean()
height | width | |
---|---|---|
shape | ||
rectangle_1 | 10.000000 | 10 |
rectangle_2 | 1.000000 | 2 |
rectangle_3 | 5.003433 | 2 |
df.groupby('shape').median()
height | width | |
---|---|---|
shape | ||
rectangle_1 | 10.0000 | 10 |
rectangle_2 | 1.0000 | 2 |
rectangle_3 | 5.0003 | 2 |
df.groupby('shape').median().plot(kind='bar',title='Rectangle plot')
<AxesSubplot:title={'center':'Rectangle plot'}, xlabel='shape'>
df.describe()
height | width | |
---|---|---|
count | 6.000000 | 6.000000 |
mean | 6.001717 | 4.666667 |
std | 3.463509 | 4.131182 |
min | 1.000000 | 2.000000 |
25% | 5.000075 | 2.000000 |
50% | 5.005150 | 2.000000 |
75% | 8.752500 | 8.000000 |
max | 10.000000 | 10.000000 |
## df is a dataframe
df = pd.read_csv('Characters.csv')
df
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
161 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN |
162 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN |
163 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN |
164 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN |
165 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN |
166 rows × 8 columns
df.dropna() ### I can drop the rows presenting at least one null value
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
6 | 7 | Minerva McGonagall | Human | Female | Gryffindor | Cat | Fir | Dragon Heartstring |
10 | 11 | Remus Lupin | Werewolf | Male | Gryffindor | Wolf | Cypress | Unicorn Hair |
14 | 15 | Dolores Umbridge | Human | Female | Slytherin | Cat | Birch | Dragon Heartstring |
df.dropna(subset=['Gender']) ### I can drop the rows presenting at least one null value in a specific column
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
120 | 121 | Charity Burbage | Human | Female | NaN | NaN | NaN | NaN |
121 | 122 | Dilys Derwent | Human | Female | NaN | NaN | NaN | NaN |
122 | 123 | Eldred Worple | Human | Male | NaN | NaN | NaN | NaN |
123 | 124 | Marcus Belby | Human | Male | Ravenclaw | NaN | NaN | NaN |
124 | 125 | Mykew Gregorovitch | Human | Male | NaN | NaN | NaN | NaN |
125 rows × 8 columns
We have a dataset like this About the cancer at lungs
age | ifIsASmoker | praticeSports | nationality | blood group | predictiveIfHasACancerOnLung
30000 have not blood group
## df is a dataframe
df = pd.read_csv('Characters.csv',encoding='latin1')
df
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather |
1 | 2 | Ron Weasley | Human | Male | Gryffindor | Jack Russell Terrier | NaN | NaN |
2 | 3 | Hermione Granger | Human | Female | Gryffindor | Otter | Vine | Dragon Heartstring |
3 | 4 | Albus Dumbledore | Human | Male | Gryffindor | Phoenix | Elder | Thestral Tail Hair |
4 | 5 | Rubeus Hagrid | Half-Human/Half-Giant | Male | Gryffindor | NaN | Oak | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
161 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN |
162 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN |
163 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN |
164 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN |
165 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN |
166 rows × 8 columns
## df is a dataframe
df_dialogue = pd.read_csv('Dialogue.csv',encoding='latin1')
df_dialogue
Dialogue ID | Chapter ID | Place ID | Character ID | Dialogue | |
---|---|---|---|---|---|
0 | 1 | 1 | 8 | 4 | I should have known that you would be here...P... |
1 | 2 | 1 | 8 | 7 | Good evening, Professor Dumbledore. Are the ru... |
2 | 3 | 1 | 8 | 4 | I'm afraid so, Professor. The good, and the bad. |
3 | 4 | 1 | 8 | 7 | And the boy? |
4 | 5 | 1 | 8 | 4 | Hagrid is bringing him. |
... | ... | ... | ... | ... | ... |
7439 | 7440 | 234 | 71 | 1 | Then Slytherin House will have gained a wonder... |
7440 | 7441 | 234 | 71 | 84 | Really? |
7441 | 7442 | 234 | 71 | 1 | Really. |
7442 | 7443 | 234 | 71 | 1 | Ready? |
7443 | 7444 | 234 | 71 | 84 | Ready. |
7444 rows × 5 columns
conditionOnRows = df_dialogue.loc[:,'Dialogue'].str.contains('Dad! Look! Harry\'s got a letter!')
conditionOnRows
0 False 1 False 2 False 3 False 4 False ... 7439 False 7440 False 7441 False 7442 False 7443 False Name: Dialogue, Length: 7444, dtype: bool
df_dialogue.loc[conditionOnRows,:]
Dialogue ID | Chapter ID | Place ID | Character ID | Dialogue | |
---|---|---|---|---|---|
63 | 64 | 3 | 8 | 31 | Dad! Look! Harry's got a letter! |
Character 31 is Dudley Dursley
df_merge = df.merge(df_dialogue, left_on='Character ID',right_on='Character ID')
df_merge
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | Dialogue ID | Chapter ID | Place ID | Dialogue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 22 | 2 | 8 | Yes, Aunt Petunia. |
1 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 25 | 2 | 8 | Yes, Uncle Vernon. |
2 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 38 | 2 | 8 | He's asleep! |
3 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 40 | 2 | 8 | Sorry about him. He doesn't understand what it... |
4 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 44 | 2 | 8 | Anytime. |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7439 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN | 1087 | 41 | 71 | Oy! What do you two think you're doing? |
7440 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN | 357 | 15 | 52 | Oh, that was wicked, Harry. |
7441 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN | 528 | 20 | 51 | Go go Gryffindor! Go go Gryffindor! Go go Gryf... |
7442 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN | 508 | 19 | 51 | Yay! |
7443 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN | 7048 | 219 | 66 | Protego Maxima... Finato Duri... Repello Inimi... |
7444 rows × 12 columns
conditionOnRows = df_merge.loc[:,'Dialogue'].str.contains('Dad! Look! Harry\'s got a letter!')
conditionOnRows
0 False 1 False 2 False 3 False 4 False ... 7439 False 7440 False 7441 False 7442 False 7443 False Name: Dialogue, Length: 7444, dtype: bool
df_merge.loc[conditionOnRows,:]
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | Dialogue ID | Chapter ID | Place ID | Dialogue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
6455 | 31 | Dudley Dursley | Human | Male | NaN | NaN | NaN | NaN | 64 | 3 | 8 | Dad! Look! Harry's got a letter! |
df_merge
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | Dialogue ID | Chapter ID | Place ID | Dialogue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 22 | 2 | 8 | Yes, Aunt Petunia. |
1 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 25 | 2 | 8 | Yes, Uncle Vernon. |
2 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 38 | 2 | 8 | He's asleep! |
3 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 40 | 2 | 8 | Sorry about him. He doesn't understand what it... |
4 | 1 | Harry Potter | Human | Male | Gryffindor | Stag | Holly | Phoenix Feather | 44 | 2 | 8 | Anytime. |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7439 | 162 | Waiter | NaN | NaN | NaN | NaN | NaN | NaN | 1087 | 41 | 71 | Oy! What do you two think you're doing? |
7440 | 163 | Boy 2 | NaN | NaN | NaN | NaN | NaN | NaN | 357 | 15 | 52 | Oh, that was wicked, Harry. |
7441 | 164 | Crowd | NaN | NaN | NaN | NaN | NaN | NaN | 528 | 20 | 51 | Go go Gryffindor! Go go Gryffindor! Go go Gryf... |
7442 | 165 | Gryffindors | NaN | NaN | NaN | NaN | NaN | NaN | 508 | 19 | 51 | Yay! |
7443 | 166 | Professors | NaN | NaN | NaN | NaN | NaN | NaN | 7048 | 219 | 66 | Protego Maxima... Finato Duri... Repello Inimi... |
7444 rows × 12 columns
df_merge.sort_values(by=['House', 'Harry Potter'])
df_merge.sort_values(by=['House', 'Character Name'])
Character ID | Character Name | Species | Gender | House | Patronus | Wand (Wood) | Wand (Core) | Dialogue ID | Chapter ID | Place ID | Dialogue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
7143 | 81 | Fleur Delacour | Human (Quarter-Veela) | Female | Beauxbatons Academy of Magic | NaN | NaN | NaN | 3302 | 121 | 37 | You saved her, even though she wasn't yours to... |
7144 | 81 | Fleur Delacour | Human (Quarter-Veela) | Female | Beauxbatons Academy of Magic | NaN | NaN | NaN | 3303 | 121 | 37 | Thank you! And you... . You helped! |
7145 | 81 | Fleur Delacour | Human (Quarter-Veela) | Female | Beauxbatons Academy of Magic | NaN | NaN | NaN | 3485 | 130 | 58 | Au revoir Ron. |
7146 | 81 | Fleur Delacour | Human (Quarter-Veela) | Female | Beauxbatons Academy of Magic | NaN | NaN | NaN | 5797 | 186 | 8 | You are still beautiful to me, William. |
7147 | 81 | Fleur Delacour | Human (Quarter-Veela) | Female | Beauxbatons Academy of Magic | NaN | NaN | NaN | 5827 | 186 | 8 | Bill, look away -- I'm 'ideous. |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6559 | 34 | Xenophilius Lovegood | Human | Male | NaN | NaN | NaN | NaN | 6627 | 204 | 12 | Did, didn't I? How silly of me. |
6560 | 34 | Xenophilius Lovegood | Human | Male | NaN | NaN | NaN | NaN | 6629 | 204 | 12 | No, you mustn't -- |
6561 | 34 | Xenophilius Lovegood | Human | Male | NaN | NaN | NaN | NaN | 6631 | 204 | 12 | You're my only hope. They were angry, you see... |
6562 | 34 | Xenophilius Lovegood | Human | Male | NaN | NaN | NaN | NaN | 6633 | 204 | 12 | Him. Surely you call him You-Know- Who. But ... |
6563 | 34 | Xenophilius Lovegood | Human | Male | NaN | NaN | NaN | NaN | 6635 | 204 | 12 | Stop! I've got him -- |
7444 rows × 12 columns
df_merge.to_csv('Character_Dialogue.csv',sep=';')
df_merge.to_excel('Character_Dialogue.xlsx')
df_merge.to_json('Character_Dialogue.json')
df_merge= pd.read_csv('Character_Dialogue.csv',delimiter=';')
DataPandas here: https://www.webdatapandas.com/