Data Science: Why You Should Learn To Work With Numpy Before Pandas

Introduction

The importance of learning Numpy as a beginner in data science cannot be overemphasized. Numpy is often ignored because of the heavy calculations it involves and because it lacks the enticing visual displays that Pandas, Matplotlib and other Data Science tools offer.

Although Numpy is highly underrated by beginners, it is the foundation of most other toolkits used in Data Science projects. Numpy is the silent giant carrying all our Data Science projects. Just as the human body cannot survive properly without its skeletal frame, almost all Data Science toolkits cannot function without the help of Numpy. Let's take a look at the reasons why, shall we?

A Brief Introduction to Numpy

Numpy is short for Numerical Python. It is a Python library that helps you perform a wide range of numerical operations on arrays. You might wonder what arrays are.

An array is a data structure that holds multiple values of the same data type, stored under a single variable name and accessible by index. You can think of a Numpy array as a special kind of list that contains only items of the same data type.

Creating Numpy Arrays

# First we import numpy as np
import numpy as np

# Creating an array from a list
MyList = ['Books', 'Glasses', 'Keyboards', 'Mouse']
MyArray = np.array(MyList)
print(MyList)
print()
print(MyArray)

The result of this code:

['Books', 'Glasses', 'Keyboards', 'Mouse']

['Books' 'Glasses' 'Keyboards' 'Mouse']
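
To make the "same data type" rule and index access a little more concrete, here is a small sketch. The variable names items and mixed are made up for illustration and are not part of the example above.

```python
import numpy as np

# A list of strings becomes an array with one shared string data type
items = np.array(['Books', 'Glasses', 'Keyboards', 'Mouse'])
print(items.dtype.kind)  # 'U' stands for Unicode string

# Items are accessed by index, just like in a list
print(items[0])    # first item
print(items[-1])   # last item

# When integers and floats are mixed, Numpy converts
# every value to a single common type (float here)
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)
print(mixed)
```

Notice that the integers 1 and 2 are stored as floats in the second array, because every item in a Numpy array must share one data type.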

Dimensions, Shaping and Reshaping in Numpy

import numpy as np
#Here is my array!!
arr = np.array([[1,2,3,4],[5,6,7,8],[3,4,5,9]])

#checking for shape
print(arr.shape)

#checking for dimension
print(arr.ndim)

The result of this code:

# Shape of the array: 3 rows by 4 columns
(3, 4)

# Number of dimensions of the array
2

Note: Try increasing the number of square brackets at the beginning and end of the array above from two to three, and so on. I bet you will be thrilled at the result.
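
The example above checks the shape and dimension but does not actually reshape anything, so here is a minimal sketch of reshaping the same array. The names reshaped, flat and arr3d are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [3, 4, 5, 9]])

# Reshape the 3 by 4 array into a 4 by 3 array.
# The total number of items (12) must stay the same.
reshaped = arr.reshape(4, 3)
print(reshaped.shape)   # (4, 3)

# Passing -1 lets Numpy work out the length for us;
# here it flattens the array into one dimension
flat = arr.reshape(-1)
print(flat.shape)       # (12,)

# The same data written with three pairs of square brackets
arr3d = np.array([[[1, 2, 3, 4], [5, 6, 7, 8], [3, 4, 5, 9]]])
print(arr3d.shape)      # (1, 3, 4)
print(arr3d.ndim)       # 3
```

Reshaping only works when the new shape holds exactly the same number of items, so asking for arr.reshape(5, 3) would raise an error.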

A Brief Introduction to Pandas

Pandas is built on top of Numpy and is widely used for data manipulation. Its underlying code uses the Numpy library extensively. Pandas enables you to clean messy data efficiently. It helps you gain a deeper understanding of your data and answer more relevant questions with it. It provides two main data objects: Series and DataFrames.

Working with Series in Pandas

A Series is a one-dimensional array capable of holding any data type. It is much like a column in a table.

import pandas as pd
import numpy as np

# Creating a Pandas Series from a list
# This is done by passing a one-dimensional list to pd.Series
a = [2,23,4,5,66,7]
MyPdSeries = pd.Series(a)

# Note that Series must be written with a capital first letter (Series)
print(MyPdSeries)

The result of this code:

0     2
1    23
2     4
3     5
4    66
5     7
dtype: int64
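
Since a Series is described as a one-dimensional array, it may help to see the Numpy array sitting underneath it. This is a small sketch using the same list as above; the names series and values are only for illustration.

```python
import numpy as np
import pandas as pd

a = [2, 23, 4, 5, 66, 7]
series = pd.Series(a)

# to_numpy() hands back the values of the Series as a Numpy array
values = series.to_numpy()
print(type(values))
print(values.shape)   # (6,)
print(values.ndim)    # 1

# Numpy functions can be applied to a Series directly
print(np.sum(series))  # 107
```

The shape and ndim attributes are the same ones used on Numpy arrays earlier, which is one reason knowing Numpy first makes a Series easier to understand.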

Working with DataFrames in Pandas

A DataFrame in Pandas is a two-dimensional data structure, like a table, with rows and columns.

# Creating a simple Pandas DataFrame
import pandas as pd

Info = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

# Passing Info into the DataFrame object
Passer = pd.DataFrame(Info)
print(Passer)

The result of this code:

   calories  duration
0       420        50
1       380        40
2       390        45
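
A DataFrame can be viewed the same way: as a two-dimensional Numpy array with labels attached. The sketch below reuses the calories and duration data; the names df, matrix and calories_per_minute are made up for this illustration.

```python
import numpy as np
import pandas as pd

info = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(info)

# The whole DataFrame as a 2-dimensional Numpy array
matrix = df.to_numpy()
print(matrix.shape)   # (3, 2) -> 3 rows, 2 columns
print(matrix.ndim)    # 2

# Each column is a 1-dimensional array, so array arithmetic
# works item by item without writing a loop
calories_per_minute = df["calories"].to_numpy() / df["duration"].to_numpy()
print(calories_per_minute)
```

The division here is applied to every row at once, which is the kind of array manipulation that Pandas borrows from Numpy.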

Importance of Numpy in Pandas and Other Data Science Libraries

  • It forms the core of Pandas, Scipy, Matplotlib, Scikit-learn, Scikit-image, and other Data Science packages.
  • A basic understanding of array manipulation is essential for representing data of all kinds in Data Science.
  • Understanding Pandas data objects like Series and DataFrames requires a solid knowledge of Numpy arrays.
  • Working with Scikit-learn, one of the most important Machine Learning toolkits, typically means passing n-dimensional arrays (ndarrays), or data that gets converted to them, as input.
  • Numpy is often much faster than Pandas at heavy mathematical work such as solving linear algebra problems, running gradient descent, matrix multiplication and vectorized operations on data.
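
To give a feel for the kind of mathematical work mentioned in the last point, here is a minimal sketch of vectorization, matrix multiplication and solving a small linear system with Numpy. The arrays A, B and b are made-up values chosen only for illustration, and the sketch does not measure speed.

```python
import numpy as np

# Vectorization: square every item without writing a loop
numbers = np.array([1, 2, 3, 4])
squares = numbers ** 2
print(squares)        # [ 1  4  9 16]

# Matrix multiplication with the @ operator
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
product = A @ B
print(product)        # rows are [19 22] and [43 50]

# Linear algebra: solve A x = b for x
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print(x)              # x is approximately [1, 2]
```

Each of these runs as a single array operation instead of a Python loop, which is where much of Numpy's speed comes from.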

Conclusion

In this article I explained the important role Numpy plays in Pandas. Although it is important to learn Numpy before Pandas, I recommend that you follow a project-based approach to learning both. Learn just the minimum amount of theory for each topic in Numpy and Pandas, then start a project or practice exercise with it. Later on you can go deeper into those topics and build a better project. If you keep repeating this cycle, you will be able to expand your learning at your own pace and retain information better.