A few months ago I noticed a blog post listing the most commonly used functions/modules for a few of the most popular Python libraries, as determined by the number of instances on GitHub. I’ve created visualizations of these results and written examples for the top 10 from each library. A few are included here, but the full set of examples can be found in the IPython notebook file.

MOST POPULAR PANDAS, PANDAS.DATAFRAME, NUMPY, AND SCIPY FUNCTIONS ON GITHUB

I pulled the statistics from the original post (linked to above) using requests and BeautifulSoup for Python. The bar plots were made with matplotlib and seaborn, with the functions ordered by the number of unique repositories containing instances. For example, we see that pd.Timestamp is used in fewer projects than a number of others, despite having a very high number of total instances on GitHub.
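To make the ordering concrete, here is a minimal sketch of ranking by unique repositories versus ranking by total instances. The counts are made-up placeholders chosen only to mimic the pd.Timestamp effect described above; they are not the statistics from the original post, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical counts for illustration only; NOT the real GitHub statistics.
stats = pd.DataFrame({
    "function": ["pd.DataFrame", "pd.Timestamp", "pd.read_csv"],
    "instances": [9000, 12000, 7000],
    "repositories": [3000, 400, 2500],
})

# Order by how many distinct repositories use each function (as in the bar plots).
by_repos = stats.sort_values("repositories", ascending=False).reset_index(drop=True)

# Order by raw instance count for comparison.
by_instances = stats.sort_values("instances", ascending=False).reset_index(drop=True)

print(by_repos)
print(by_instances)
```

With numbers like these, a function can top the instance ranking while sitting at the bottom of the repository ranking.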

PANDAS

popular_pandas_functions

1) DataFrame: Creates a DataFrame object.

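# Imports assumed by all of the examples below
import numpy as np
import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt

df = pd.DataFrame(data={'y': [1, 2, 3],
                        'score': [93.5, 89.4, 90.3],
                        'name': ['Dirac', 'Pauli', 'Bohr'],
                        'birthday': ['1902-08-08', '1900-04-25', '1885-10-07']})
print(type(df))
print(df.dtypes)
df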

out_1
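In the example above the birthday column holds plain strings, so it is not stored as a date type. A possible follow-up, sketched here with `pd.to_datetime` (the `birth_year` column name is just an illustration), is to parse the strings into timestamps:

```python
import pandas as pd

df = pd.DataFrame(data={'y': [1, 2, 3],
                        'score': [93.5, 89.4, 90.3],
                        'name': ['Dirac', 'Pauli', 'Bohr'],
                        'birthday': ['1902-08-08', '1900-04-25', '1885-10-07']})

# The date strings are stored with a generic (non-datetime) dtype.
print(df['birthday'].dtype)

# Parse them into real timestamps so date arithmetic becomes possible.
df['birthday'] = pd.to_datetime(df['birthday'])
print(df.dtypes)

# Example: extract the birth year from the parsed column.
df['birth_year'] = df['birthday'].dt.year
print(df[['name', 'birth_year']])
```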

 

6) Merge: Combine dataframes.

df_new=pd.DataFrame(data=list(zip(['Dirac','Pauli','Bohr','Einstein'],
                                    [True,False,True,True])),
                      columns=['name','friendly'])
 
df_merge=pd.merge(left=df, right=df_new, on='name', how='outer')
df_merge

out_2
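The `how` argument controls which rows survive the merge. As a small sketch (using a trimmed-down copy of the frames above), an outer join keeps every name from either side, while an inner join keeps only names found in both; `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Dirac', 'Pauli', 'Bohr'],
                   'score': [93.5, 89.4, 90.3]})
df_new = pd.DataFrame({'name': ['Dirac', 'Pauli', 'Bohr', 'Einstein'],
                       'friendly': [True, False, True, True]})

# Outer join: keep every name from either frame; missing values become NaN.
outer = pd.merge(left=df, right=df_new, on='name', how='outer', indicator=True)

# Inner join: keep only names present in both frames.
inner = pd.merge(left=df, right=df_new, on='name', how='inner')

print(outer)
print(inner)
```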

 

popular_pandas_df_functions

NUMPY

popular_numpy_functions

 

3) arange: Create an array of evenly spaced values between two limits.

np.arange(start=1.5, stop=8.5, step=0.7, dtype=float)

out_3
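One detail worth knowing is that `arange` leaves out the stop value, whereas `np.linspace` takes a number of points and includes the endpoint by default. A quick sketch with values chosen to be exactly representable as floats:

```python
import numpy as np

# arange works with a step size and excludes the stop value.
a = np.arange(start=0.0, stop=1.0, step=0.25)

# linspace works with a point count and includes the endpoint by default.
b = np.linspace(0.0, 1.0, num=5)

print(a)
print(b)
```

When the step is not a whole number, `linspace` is often the safer choice, because the length of an `arange` result can be affected by floating-point rounding.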

 

8) mean: Get the mean of all values in a list/array, or along rows or columns.

vals=np.array([1,2,3,4]*3).reshape((3,4))
print(vals)
print('')
print('mean entire array =', np.mean(vals))
print('mean along columns =', np.mean(vals, axis=0))
print('mean along rows =', np.mean(vals, axis=1))

out_4
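The `axis` argument names the axis that gets collapsed, which is easy to mix up. The sketch below uses the same array as above to show the resulting shapes, and adds `keepdims=True`, which keeps the reduced axis so the result can be subtracted from the original array.

```python
import numpy as np

vals = np.array([1, 2, 3, 4] * 3).reshape((3, 4))

col_means = np.mean(vals, axis=0)   # collapses the 3 rows    -> shape (4,)
row_means = np.mean(vals, axis=1)   # collapses the 4 columns -> shape (3,)

# keepdims=True keeps the reduced axis, so the result broadcasts against vals.
centered = vals - np.mean(vals, axis=1, keepdims=True)

print(col_means, row_means)
print(centered)
```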

 

SCIPY

popular_scipy_functions

 

1) stats: A module containing various statistical functions and distributions (continuous and discrete).

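# Normal distribution:
import scipy.stats  # makes sp.stats available

# Plot the Gaussian probability density
x = np.linspace(-5, 15, 50)
plt.plot(x, sp.stats.norm.pdf(x=x, loc=5, scale=2))

# Plot a histogram of random samples
np.random.seed(3)
# density=True replaces the normed argument, which newer matplotlib no longer accepts
plt.hist(sp.stats.norm.rvs(loc=5, scale=2, size=200),
         bins=50, density=True, color='red', alpha=0.5)
plt.show()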

out_5
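Beyond `pdf` and `rvs`, the distribution objects in `stats` offer fitting and cumulative probabilities. The sketch below is one possible extension of the example above: it uses a larger sample size than the plot (5000, an arbitrary choice) so that the fitted parameters land close to the true ones.

```python
from scipy import stats

# Draw samples from a normal distribution with known parameters.
samples = stats.norm.rvs(loc=5, scale=2, size=5000, random_state=3)

# Maximum-likelihood estimates of loc (mean) and scale (standard deviation).
loc_hat, scale_hat = stats.norm.fit(samples)
print('estimated loc =', loc_hat, 'estimated scale =', scale_hat)

# cdf gives P(X <= x); ppf is its inverse (the quantile function).
p = stats.norm.cdf(7, loc=5, scale=2)
x_back = stats.norm.ppf(p, loc=5, scale=2)
print('P(X <= 7) =', p, '-> quantile', x_back)
```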

 

5) linalg: Among other things, this module contains linear algebra functions including inverse (linalg.inv), determinant (linalg.det), and matrix/vector norm (linalg.norm), along with eigenvalue tools such as linalg.eig.

matrix=np.array([[4.3,8.9],[2.2,3.4]])
print(matrix)
print('')
 
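# Find the norm (Frobenius norm by default for a matrix)
norm = sp.linalg.norm(matrix)
print('norm =', norm)
# Alternate method: square root of the sum of squared entries
# (np.isclose avoids an exact floating-point equality test)
print(np.isclose(norm, np.square([v for row in matrix for v in row]).sum()**0.5))
print('')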
 
# Get eigenvalues and eigenvectors
eigvals, eigvecs=sp.linalg.eig(matrix)
print('eigenvalues =', eigvals)
print('eigenvectors = ', eigvecs)

out_6
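A quick way to build trust in these results is to check them against their defining properties. The sketch below reuses the same matrix and verifies the eigen-decomposition, the determinant, and the inverse; the `*_ok` names are only for illustration.

```python
import numpy as np
import scipy.linalg

matrix = np.array([[4.3, 8.9], [2.2, 3.4]])

# Each column of eigvecs is an eigenvector: matrix @ v equals lambda * v.
eigvals, eigvecs = scipy.linalg.eig(matrix)
eig_ok = np.allclose(matrix @ eigvecs, eigvecs * eigvals)

# The determinant equals the product of the eigenvalues.
det = scipy.linalg.det(matrix)
det_ok = np.isclose(det, np.prod(eigvals).real)

# Multiplying by the inverse recovers the identity matrix.
inv = scipy.linalg.inv(matrix)
inv_ok = np.allclose(matrix @ inv, np.eye(2))

print(eig_ok, det_ok, inv_ok)
```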

 

6) interpolate: A module containing splines and other interpolation tools.

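# Spline fit for scattered points
import scipy.interpolate  # makes sp.interpolate available

x = np.linspace(0, 10, 10)     # sample locations
xs = np.linspace(0, 11, 50)    # finer grid, extending slightly past the data
y = np.array([0.5, 1.8, 1.3, 3.5, 3.4,
              5.2, 3.5, 1.0, -2.3, -6.3])
spline = sp.interpolate.UnivariateSpline(x, y)
plt.scatter(x, y)
plt.plot(xs, spline(xs))
plt.show()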

out_7
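`UnivariateSpline` smooths by default, so the fitted curve does not have to pass through every point. As a sketch of how the smoothing factor `s` changes this, setting `s=0` switches smoothing off and gives an interpolating spline:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 10, 10)
y = np.array([0.5, 1.8, 1.3, 3.5, 3.4,
              5.2, 3.5, 1.0, -2.3, -6.3])

# Default smoothing: the curve may miss individual points.
smooth = UnivariateSpline(x, y)

# s=0 turns smoothing off, so the spline passes through every data point.
exact = UnivariateSpline(x, y, s=0)

print('max deviation (default s):', np.max(np.abs(smooth(x) - y)))
print('max deviation (s=0):      ', np.max(np.abs(exact(x) - y)))
```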

 

8) signal: This module must be imported directly. It contains tools for signal processing.

# Smooth a noisy signal with a lowpass filter
import scipy.signal
import scipy.special
np.random.seed(0)

# Create noisy data
x = np.linspace(0, 6*np.pi, 100)
# Order-0 spherical Bessel function. spherical_jn replaces sph_jn, which was
# removed from SciPy; sph_jn(n=3, z=xi)[0][0] picked out the same order-0 value.
y = [sp.special.spherical_jn(0, xi) for xi in x]
y = [yi + (np.random.random() - 0.5)*0.7 for yi in y]
# y = np.sin(x)

# Get parameters for an order 3 lowpass Butterworth filter
b, a = sp.signal.butter(3, 0.08)

# Initialize filter state
zi = sp.signal.lfilter_zi(b, a)

# Apply filter
y_smooth, _ = sp.signal.lfilter(b, a, y, zi=zi*y[0])

plt.plot(x, y, c='blue', alpha=0.6)
plt.plot(x, y_smooth, c='red', alpha=0.6)
plt.title('Noisy spherical Bessel function signal processing')
plt.savefig('noisy_signal_fit.png', bbox_inches='tight')
plt.show()

out_8
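A one-pass filter such as `lfilter` delays the signal, so the smoothed curve is shifted relative to the data. One common alternative, sketched here and not part of the original example, is `signal.filtfilt`, which runs the filter forwards and backwards so the phase shifts cancel. The sketch swaps in a plain sine wave as the clean signal so the error can be measured directly; `rms_error` is a helper defined only for this illustration.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)

# Clean sine wave plus uniform noise (a simpler stand-in signal).
x = np.linspace(0, 6 * np.pi, 100)
clean = np.sin(x)
noisy = clean + (rng.random(x.size) - 0.5) * 0.7

# Order-3 lowpass Butterworth filter with the same cutoff as above.
b, a = signal.butter(3, 0.08)

# One-pass (causal) filtering: smooth, but delayed.
zi = signal.lfilter_zi(b, a)
y_lfilter, _ = signal.lfilter(b, a, noisy, zi=zi * noisy[0])

# Forward-backward filtering: the two passes cancel the phase shift.
y_filtfilt = signal.filtfilt(b, a, noisy)


def rms_error(estimate):
    """Root-mean-square difference between an estimate and the clean signal."""
    return np.sqrt(np.mean((estimate - clean) ** 2))


print('lfilter  RMS error:', rms_error(y_lfilter))
print('filtfilt RMS error:', rms_error(y_filtfilt))
```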

 

10) misc: A module containing “utilities that don’t have another home”. Based on Google search results, people often use `misc.imread` and `misc.imsave` to open and save pictures (both functions were removed in later SciPy releases).

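# Get a raccoon face
import scipy.misc  # note: newer SciPy releases provide face() in scipy.datasets instead

# Get the raccoon (color and grayscale versions)
pics = sp.misc.face(), sp.misc.face(gray=True)

# Look at it
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for pic, ax in zip(pics, axes):
    ax.imshow(pic)
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()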

out_9
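For opening and saving pictures, one option that avoids `scipy.misc` altogether is matplotlib's own `plt.imsave` and `plt.imread`. This is a swapped-in technique rather than the functions named above, and the tiny synthetic image and the file name `example.png` are placeholders for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# A small synthetic RGB image (hypothetical stand-in for a real photo).
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[:, :3, 0] = 255   # left half red
img[:, 3:, 2] = 255   # right half blue

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.png")
    plt.imsave(path, img)      # write the array to a PNG file
    loaded = plt.imread(path)  # read it back; PNGs load as floats in [0, 1]

print(loaded.shape, loaded.dtype)
```

Note that the array read back from a PNG is scaled to the range 0 to 1 and may carry an extra alpha channel, so it is not byte-for-byte the array that was saved.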

 

 

 

Thanks for reading. 

