Advanced Data Processing with pandas

I. Statistical analysis with pandas

1. Numeric statistics in pandas (summary statistics for the unit price column, amounts, in detail)

import pandas as pd

# Load the data
detail = pd.read_excel("./meal_order_detail.xlsx")
print("detail:\n", detail)
print("Column index of detail:\n", detail.columns)
print("Shape of detail:\n", detail.shape)
print("Data types of detail:\n", detail.dtypes)

print("Maximum of amounts:\n", detail.loc[:, 'amounts'].max())
print("Minimum of amounts:\n", detail.loc[:, 'amounts'].min())
print("Mean of amounts:\n", detail.loc[:, 'amounts'].mean())
print("Median of amounts:\n", detail.loc[:, 'amounts'].median())
print("Variance of amounts:\n", detail.loc[:, 'amounts'].var())
print("describe() of amounts:\n", detail.loc[:, 'amounts'].describe())

# Summary statistics for two columns at once
print("describe() of counts and amounts:\n", detail.loc[:, ['counts', 'amounts']].describe())
print("describe() of counts:\n", detail.loc[:, 'counts'].describe())

# Series.ptp() was removed in pandas 1.0, so compute the range as max - min
print("Range of amounts:\n", detail.loc[:, 'amounts'].max() - detail.loc[:, 'amounts'].min())
print("Standard deviation of amounts:\n", detail.loc[:, 'amounts'].std())
print("Mode of amounts:\n", detail.loc[:, 'amounts'].mode())  # returns a Series holding the mode(s)
print("Mode of counts:\n", detail.loc[:, 'counts'].mode())  # returns a Series holding the mode(s)
print("Number of non-null values in amounts:\n", detail.loc[:, 'amounts'].count())
# idxmax/idxmin return the index label, whereas np.argmax()/np.argmin() return the integer position
print("Index label of the maximum of amounts:\n", detail.loc[:, 'amounts'].idxmax())
print("Index label of the minimum of amounts:\n", detail.loc[:, 'amounts'].idxmin())
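The statistics above can be tried without the Excel file. The following is a minimal sketch on a small made-up frame (detail_demo and its values are assumptions, not the real meal_order_detail.xlsx data); it also shows agg collecting several statistics in one call.

```python
import pandas as pd

# Hypothetical stand-in for meal_order_detail.xlsx (values are made up)
detail_demo = pd.DataFrame({
    "counts": [1, 2, 1, 3, 1],
    "amounts": [10.0, 25.0, 10.0, 40.0, 15.0],
})

amounts = detail_demo["amounts"]

# Range computed as max - min (Series.ptp() is not available in pandas >= 1.0)
value_range = amounts.max() - amounts.min()

# Several statistics at once; the result is a Series indexed by statistic name
summary = amounts.agg(["min", "max", "mean", "median"])

# mode() returns a Series because several values can tie
modes = amounts.mode().tolist()

# idxmax() returns the index label of the largest value
max_label = amounts.idxmax()

print(value_range)
print(summary)
print(modes, max_label)
```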

2. Statistics for non-numeric data in pandas

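(1) Converting the data type of a DataFrame column: another type to object

# Assign the column directly; in recent pandas versions, assigning through
# detail.loc[:, 'amounts'] may write the values in place and keep the old dtype
detail['amounts'] = detail['amounts'].astype('object')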

(2) Categorical data

detail['amounts'] = detail['amounts'].astype('category')
print("describe() of the categorical column:\n", detail.loc[:, 'amounts'].describe())

(3) Worked examples

#### Find the most popular dish in detail and how many portions of it were sold
## Case 1: plain rice counts as a dish
detail['dishes_name'] = detail['dishes_name'].astype('category')
print("describe() of dishes_name:\n", detail.loc[:, 'dishes_name'].describe())

## Case 2: plain rice does not count as a dish, so drop it and describe again
# drop takes labels (the row labels), axis=0 and inplace=True
# Locate the plain rice rows ('白饭/大碗' is "plain rice, large bowl")
bool_id = detail.loc[:, 'dishes_name'] == '白饭/大碗'
# Get the row labels
index = detail.loc[bool_id, :].index
# Drop those rows
detail.drop(labels=index, axis=0, inplace=True)
# Convert the type again
detail['dishes_name'] = detail['dishes_name'].astype('category')
# Describe again
print("describe() of dishes_name:\n", detail.loc[:, 'dishes_name'].describe())

## Find the order in detail that contains the most dishes
# Convert order_id to categorical data, then call describe
detail['order_id'] = detail['order_id'].astype("category")
# Describe
print("describe() of order_id:\n", detail.loc[:, 'order_id'].describe())
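describe() on a categorical column reports only the single most frequent value. As a sketch of the same question answered with value_counts, which shows the full ranking, here is a small made-up example (orders_demo and the dish names are assumptions, not the real data).

```python
import pandas as pd

# Hypothetical order lines; the real data lives in meal_order_detail.xlsx
orders_demo = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 2, 3],
    "dishes_name": ["rice", "soup", "rice", "noodles", "soup", "rice"],
})

# Frequency of every dish, most frequent first
dish_counts = orders_demo["dishes_name"].value_counts()
top_dish = dish_counts.idxmax()
top_count = int(dish_counts.max())

# Exclude rice with a boolean mask instead of drop(), then describe as categorical
without_rice = orders_demo[orders_demo["dishes_name"] != "rice"]
desc = without_rice["dishes_name"].astype("category").describe()

print(dish_counts)
print(desc)
```

Filtering with a mask leaves the original frame untouched, which can be convenient when both variants of the question need answering.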

II. Time data in pandas

datetime64[ns]: the point-in-time type in NumPy

Timestamp: the default point-in-time type in pandas, a wrapper around datetime64[ns]

DatetimeIndex: the default time series structure in pandas

1. pd.to_datetime converts a single point in time into the pandas Timestamp type

res = pd.to_datetime("2016/01/01")
print("res:\n", res)
print("Type of res:\n", type(res))

2. Converting a time series: pd.to_datetime or pd.DatetimeIndex converts a sequence of times into the pandas DatetimeIndex structure

res = pd.to_datetime(['2016-01-01', '2016-01-01', '2016-01-01', '2011-01-01'])
res1 = pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-02-05', '2011-09-01'])
print("res:\n", res)
print("Type of res:\n", type(res))
print("res1:\n", res1)
print("Type of res1:\n", type(res1))
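Real data often contains values that are not valid dates. As a hedged sketch (the raw list below is made up), pd.to_datetime accepts an explicit format and errors="coerce", which turns unparseable entries into NaT instead of raising.

```python
import pandas as pd

# Made-up input: two valid dates and one invalid entry
raw = ["2016-01-01", "2016-02-05", "not a date"]

# format pins down the expected layout; errors="coerce" maps failures to NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(type(parsed))
print(parsed.isna().tolist())
```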

3. Extracting attributes from a time series column

import pandas as pd

# Load the data
detail = pd.read_excel("./meal_order_detail.xlsx")
# print("detail:\n", detail)
print("Column index of detail:\n", detail.columns)
print("Shape of detail:\n", detail.shape)
# print("Data types of detail:\n", detail.dtypes)
print("*" * 80)

# Get the place_order_time column
print(detail.loc[:, 'place_order_time'])

# Convert it to the pandas datetime type (assign the column directly so the dtype changes)
detail['place_order_time'] = pd.to_datetime(detail['place_order_time'])
# print(detail.dtypes)
print("*" * 80)

# Get attributes of the time series; a list comprehension reads them from each Timestamp
year = [i.year for i in detail.loc[:, 'place_order_time']]
print("Year:\n", year)
month = [i.month for i in detail.loc[:, 'place_order_time']]
print("Month:\n", month)
day = [i.day for i in detail.loc[:, 'place_order_time']]
print("Day:\n", day)
quarter = [i.quarter for i in detail.loc[:, 'place_order_time']]
print("Quarter:\n", quarter)

# weekday is a method, so it must be called; without the parentheses the list
# holds bound method objects. Monday is 0.
weekday = [i.weekday() for i in detail.loc[:, 'place_order_time']]
print("Day of week (number):\n", weekday)

# weekday_name was removed in pandas 1.0; day_name() replaces it
weekday_name = [i.day_name() for i in detail.loc[:, 'place_order_time']]
print("Day of week (name):\n", weekday_name)

is_leap_year = [i.is_leap_year for i in detail.loc[:, 'place_order_time']]
print("Leap year:\n", is_leap_year)
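The list comprehensions above loop over every Timestamp in Python. For a datetime column, the .dt accessor exposes the same attributes in vectorized form. A minimal sketch on made-up timestamps (times is an assumption, not the real place_order_time column):

```python
import pandas as pd

# Made-up timestamps standing in for the place_order_time column
times = pd.Series(pd.to_datetime([
    "2016-08-01 11:05:00",
    "2016-08-06 18:30:00",
    "2016-02-29 09:00:00",
]))

years = times.dt.year.tolist()
quarters = times.dt.quarter.tolist()
weekdays = times.dt.weekday.tolist()        # Monday is 0
day_names = times.dt.day_name().tolist()    # method, hence the parentheses
leap = times.dt.is_leap_year.tolist()

print(years, quarters)
print(weekdays, day_names)
print(leap)
```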

4. Adding and subtracting time

import pandas as pd

res = pd.to_datetime("2016-01-01")
print("res:\n", res)
print("Type of res:\n", type(res))
print("One day later:\n", res + pd.Timedelta(days=1))
print("One hour later:\n", res + pd.Timedelta(hours=1))

# Shift a whole datetime column by one day and store it as a new column
detail['place_over_time'] = detail['place_order_time'] + pd.Timedelta(days=1)
print(detail)

## Difference between two points in time (the result is a Timedelta)
res = pd.to_datetime('2019-10-9') - pd.to_datetime('1996-11-07')
print(res)
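The difference of two Timestamps is a Timedelta, whose parts can be read as numbers. A small sketch reusing the two dates from the text, plus a made-up column (order_times is an assumption) to show that the same arithmetic works element-wise:

```python
import pandas as pd

start = pd.to_datetime("1996-11-07")
end = pd.to_datetime("2019-10-09")

# Timestamp - Timestamp gives a Timedelta; .days is its whole-day part
delta = end - start
days = delta.days

# The same arithmetic applies to a whole datetime column
order_times = pd.Series(pd.to_datetime(["2016-08-01 11:05:00", "2016-08-31 23:30:00"]))
shifted = order_times + pd.Timedelta(hours=1)

# Column - column gives Timedelta values; total_seconds() converts them to floats
elapsed_seconds = (shifted - order_times).dt.total_seconds().tolist()

print(days)
print(shifted)
print(elapsed_seconds)
```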

5. Getting the earliest and latest points in time that a pandas Timestamp can represent

print(pd.Timestamp.min)

print(pd.Timestamp.max)
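These bounds come from the Timestamp representation rather than from the machine: a nanosecond count stored in a 64-bit integer covers roughly 292 years on either side of 1970. A short sketch that reads the bounds:

```python
import pandas as pd

lower = pd.Timestamp.min
upper = pd.Timestamp.max

# Years covered by the nanosecond-resolution Timestamp range
span_years = upper.year - lower.year

print(lower.year, upper.year, span_years)
```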

III. Grouping and aggregation

import pandas as pd

import numpy as np

# Load the data
users = pd.read_excel("./users.xlsx")
print("users:\n", users)
print("Column index of users:\n", users.columns)
print("Data types of users:\n", users.dtypes)

# Group by class and compute the average age of the students in each class
# groupby performs the grouping
# by: the column(s) to group on, either one column name or a list of names
# res = users.groupby(by='ORGANIZE_NAME')['age'].mean()
# Group by one column and aggregate several columns
# res = users.groupby(by='ORGANIZE_NAME')[['age','USER_ID']].mean()
# Group by several columns
res = users.groupby(by=['ORGANIZE_NAME', 'poo', 'sex'])['age'].mean()
print(res)

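# Using agg
# Compute the mean of age and the maximum of USER_ID in one call
# Function names are passed as strings; NumPy functions such as np.mean are
# also accepted, but recent pandas versions emit a FutureWarning for them
print(users.agg({'age': 'mean', 'USER_ID': 'max'}))
# Compute both the sum and the mean of age and of USER_ID
print(users[['age', 'USER_ID']].agg(['sum', 'mean']))
# Compute a different number of statistics for age and for USER_ID
print(users.agg({'age': 'min', 'USER_ID': ['mean', 'sum']}))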

def hh(x):
    return x + 1

# Apply a custom function
# res = users['age'].apply(hh)
# res = users[['age','USER_ID']].apply(lambda x:x+1)
res = users['age'].transform(lambda x: x + 1)
# transform handles one column at a time, so it cannot combine values across columns
print(res)
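The pieces above can be combined on a small made-up frame. This is a sketch only: users_demo and its values are assumptions, and the real users.xlsx has more columns. It shows named aggregation per group, a group-wise transform that keeps the original row count, and a row-wise apply for the cross-column case that transform cannot handle.

```python
import pandas as pd

# Hypothetical stand-in for users.xlsx
users_demo = pd.DataFrame({
    "ORGANIZE_NAME": ["A", "A", "B", "B"],
    "age": [20, 30, 40, 50],
    "USER_ID": [1, 2, 3, 4],
})

# One statistic per group
mean_age = users_demo.groupby("ORGANIZE_NAME")["age"].mean()

# Named aggregation: a different statistic for each column, with readable result names
summary = users_demo.groupby("ORGANIZE_NAME").agg(
    mean_age=("age", "mean"),
    max_user_id=("USER_ID", "max"),
)

# Group-wise transform returns one value per original row
group_mean = users_demo.groupby("ORGANIZE_NAME")["age"].transform("mean")

# Cross-column calculation: apply along axis=1 sees a whole row at a time
age_plus_id = users_demo.apply(lambda row: row["age"] + row["USER_ID"], axis=1)

print(summary)
print(group_mean.tolist())
print(age_plus_id.tolist())
```

For simple arithmetic, users_demo["age"] + users_demo["USER_ID"] gives the same result as the row-wise apply and is usually faster.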

IV. Pivot tables and cross tabulations

import pandas as pd

# Load the data
detail = pd.read_excel("./meal_order_detail.xlsx")
print("detail:\n", detail)
print("Column names of detail:\n", detail.columns)
print("Data types of detail:\n", detail.dtypes)

# Get the day attribute of each point in time
# The column must first be a pandas datetime type
detail['place_order_time'] = pd.to_datetime(detail['place_order_time'])
# Read the day attribute with a list comprehension
detail.loc[:, 'day'] = [i.day for i in detail.loc[:, 'place_order_time']]

# A pivot table is an extended form of grouping and aggregation
# Creating a pivot table:
# data: the DataFrame
# values: the column(s) the final statistic is computed on, i.e. the data of interest
# index: column(s) used to group the rows
# columns: column(s) used to group the columns
# aggfunc: the statistic to compute on values
# res = pd.pivot_table(data=detail[['amounts','order_id','counts','dishes_name','day']],values='amounts',columns=['day','counts'],index=['order_id','dishes_name'],aggfunc='mean',margins=True)
# print(res)
# res.to_excel("./hh.xlsx")

# A cross tabulation is a reduced form of the pivot table
# With only index and columns, it counts how often each pair of values occurs together
# res = pd.crosstab(index=detail['counts'],columns=detail['amounts'])
# values and aggfunc must be passed together
res = pd.crosstab(index=detail['order_id'],columns=detail['counts'],values=detail['amounts'],aggfunc='mean')
print(res)
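To make the relationship between pivot tables, grouping and cross tabulations concrete, here is a minimal sketch on made-up data (sales and its values are assumptions). The pivot table and the groupby followed by unstack produce the same cell values, and crosstab without values just counts co-occurrences.

```python
import pandas as pd

# Hypothetical order lines
sales = pd.DataFrame({
    "order_id": [1, 1, 2, 2],
    "counts": [1, 2, 1, 1],
    "amounts": [10.0, 20.0, 30.0, 50.0],
})

# Pivot table: rows grouped by order_id, columns grouped by counts, mean of amounts
pivot = pd.pivot_table(sales, values="amounts", index="order_id",
                       columns="counts", aggfunc="mean")

# The same numbers from groupby followed by unstack
grouped = sales.groupby(["order_id", "counts"])["amounts"].mean().unstack("counts")

# crosstab with only index and columns counts how often each pair occurs
freq = pd.crosstab(index=sales["order_id"], columns=sales["counts"])

print(pivot)
print(grouped)
print(freq)
```

Combinations that never occur, such as order 2 with counts 2 here, show up as NaN in the pivot table and as 0 in the frequency table.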

V. Case studies

1. Daily revenue

import pandas as pd

# detail contains time data
# Load the data
detail = pd.read_excel("./meal_order_detail.xlsx")
print("detail:\n", detail)
print("Column names of detail:\n", detail.columns)
print("Data types of detail:\n", detail.dtypes)

# Compute the revenue of each order line and add it to detail
detail.loc[:, 'pay'] = detail.loc[:, 'counts'] * detail.loc[:, 'amounts']
# print(detail)

# Get the day attribute of each point in time
# The column must first be a pandas datetime type
detail['place_order_time'] = pd.to_datetime(detail['place_order_time'])
# Read the day attribute with a list comprehension
detail.loc[:, 'day'] = [i.day for i in detail.loc[:, 'place_order_time']]
# print(detail)

# Group by day and sum pay
res = detail.groupby(by='day')['pay'].sum()
print(res)
# print(type(res))

# Turn the resulting Series into a one-column DataFrame
df = pd.DataFrame(res.values, columns=['money'], index=res.index)
print(df)
print(type(df))
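The same steps can be checked on a tiny made-up table. In this sketch detail_demo and the column name pay_total are assumptions; it uses the .dt accessor for the day and Series.to_frame to build the one-column DataFrame.

```python
import pandas as pd

# Hypothetical order lines: two on day 1, one on day 2
detail_demo = pd.DataFrame({
    "place_order_time": pd.to_datetime([
        "2016-08-01 11:05:00",
        "2016-08-01 12:00:00",
        "2016-08-02 18:30:00",
    ]),
    "counts": [1, 2, 1],
    "amounts": [10.0, 5.0, 30.0],
})

# Revenue per line, then the day of the month for each line
detail_demo["pay"] = detail_demo["counts"] * detail_demo["amounts"]
detail_demo["day"] = detail_demo["place_order_time"].dt.day

# Sum per day; to_frame keeps the day index and names the single column
daily = detail_demo.groupby("day")["pay"].sum().to_frame(name="pay_total")

print(daily)
```

Grouping on the day of the month alone merges the same day number from different months, so data spanning several months would need the full date (for example .dt.date) as the key.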

2. Supermarket chain

import pandas as pd

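# Load the data
# The 'ansi' codec name is only available on Windows; on other systems pass the
# file's actual encoding (for Chinese text this is often 'gbk')
order = pd.read_csv("./order.csv", encoding='ansi')
print("order:\n", order)
print("Column index of order:\n", order.columns)

# 1. Which product categories sell best?
# Remove rows with sales volume <= 0 (keep rows with sales volume > 0)
# '销量' is the sales volume column
bool_id = order.loc[:, '销量'] > 0
# Valid rows after removing the anomalies; copy() makes later column
# assignments act on an independent frame
data = order.loc[bool_id, :].copy()
print(data.shape)
print("*" * 80)

# Alternative: drop the anomalous rows
# bool_id = order.loc[:,'销量'] <= 0
# index = order.loc[bool_id,:].index
# data = order.drop(labels=index,axis=0,inplace=False)

# Group by category and sum the sales volume
# Sorting a DataFrame or Series by value:
# for a Series, sort_values() sorts by the Series values directly
# for a DataFrame, name the column to sort on with by=<column name>
# the default is ascending order (ascending=True)
# ascending=False sorts in descending order
# '类别ID' is the category ID column
# res = data.groupby(by='类别ID')['销量'].sum().sort_values(ascending=False)
# print(res)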

# 2. Which products sell best?
# With grouping and aggregation ('商品ID' is the product ID column)
# res = data.groupby(by='商品ID')['销量'].sum().sort_values(ascending=False).head(10)
# print(res)

# With a pivot table
# res = pd.pivot_table(data=data.loc[:, ['商品ID', '销量']], index='商品ID', values='销量',
#                      aggfunc='sum').sort_values(by='销量', ascending=False).head(10)
# print(res)

# 3. Compute each store's share of total sales revenue
# Hint: the orders have no sales revenue field, so add one first. Then group by
# store number and compute each store's share.
# # Compute the sales revenue ('单价' is unit price, '销售额' is sales revenue)
# data.loc[:,'销售额'] = data.loc[:,'单价'] * data.loc[:,'销量']
# # Group by store number ('门店编号') and sum the sales revenue
# res = data.groupby(by='门店编号')['销售额'].sum()
# # print(res)
# # Compute the total sales revenue
# all_ = res.sum()
# # print(all_)
# per_ = res / all_
# print("Share of sales revenue per store:\n",per_.apply(lambda x:format(x,".2%")))

# Formatting examples
# a = 100.105
# print("%.2f"%a)
# print("{}%".format(2.0))

# Anonymous function: lambda x: x + 5 is equivalent to the named function add
# print(lambda x:x+5)
# def add(x):
#     return x+5

# 4. Which time period is the supermarket's peak for customer traffic?
# Hint: we need the customer traffic for each time period. The order table holds
# both date and time, so extract the hour; the number of distinct order IDs
# stands in for customer traffic.
# Deduplicate the orders first ('订单ID' is the order ID column)
# subset: the column to deduplicate on; pass a list for several columns
data = data.drop_duplicates(subset='订单ID')
# print(data.shape)

# Group by hour and count the order IDs
# Convert the transaction time ('成交时间') to the pandas datetime type
data['成交时间'] = pd.to_datetime(data['成交时间'])
# Extract the hour and add it to data
data['hour'] = [i.hour for i in data['成交时间']]
# print(data)

# Group by hour and count the order IDs, busiest hour first
res = data.groupby(by='hour')['订单ID'].count().sort_values(ascending=False)
print(res)
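The revenue share and the hourly traffic can be sketched end to end on made-up data. Everything in this example is an assumption: order_demo and its English column names stand in for the Chinese columns of order.csv. It also uses nunique per hour in place of dropping duplicate order IDs first, which gives the same count as long as all lines of an order fall in the same hour.

```python
import pandas as pd

# Hypothetical order lines; the last row has a negative quantity and is filtered out
order_demo = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3],
    "store": ["S1", "S1", "S2", "S1", "S1"],
    "price": [2.0, 3.0, 5.0, 4.0, 6.0],
    "quantity": [1, 2, 2, 1, -1],
    "time": pd.to_datetime([
        "2017-01-01 09:10:00", "2017-01-01 09:10:00", "2017-01-01 09:40:00",
        "2017-01-01 18:05:00", "2017-01-01 18:05:00",
    ]),
})

# Keep valid rows; copy() so new columns are added to an independent frame
valid = order_demo[order_demo["quantity"] > 0].copy()

# Revenue share per store, formatted as percentages
valid["sales"] = valid["price"] * valid["quantity"]
share = valid.groupby("store")["sales"].sum() / valid["sales"].sum()
share_text = share.map("{:.2%}".format)

# Customer traffic: number of distinct orders per hour, busiest first
valid["hour"] = valid["time"].dt.hour
traffic = valid.groupby("hour")["order_id"].nunique().sort_values(ascending=False)

print(share_text)
print(traffic)
```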
