We’ve collated a collection of cheat sheets for you to get to grips with the main libraries used in data science.
They are grouped into the fields for which each library is designed: Basics, Databases, Data Manipulation, Data Visualization, Analysis, Machine Learning, Deep Learning and Natural Language Processing (NLP).
Basics
If you're just starting out in the world of data science, it's important to understand how at least two of the basic libraries work: Python and NumPy. These two libraries are used throughout the entire development process. The third library, Scipy, is a mathematical tool that can handle more complex calculations than NumPy.
Python basics
NumPy basics
Level: Beginner - Intermediate
Area: Basics
Description: NumPy is the mathematical Python library par excellence (its name is taken from Numerical Python). It allows us to work more efficiently with vectors and matrices.
Source: DataCamp
Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf
SciPy
Level: Advanced
Area: Basics
Description: The SciPy library has been developed to work with NumPy and is designed for more complex numerical calculations, more closely related to scientific computing.
Source: DataCamp
Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_SciPy_Cheat_Sheet_Linear_Algebra.pdf
Database
Data can be stored in sets or, sometimes, in relational or non-relational databases that are imported into the working platform.
SQL
Level: Beginner - Intermediate
Area: Relational databases
Description: relational databases use a structure of separate tables that store data more efficiently and create relations between them using keys. SQL is the best language for querying data stored in these tables, thanks to its versatility.
Source: sqltutorial
Cheat sheet: https://www.sqltutorial.org/sql-cheat-sheet/
MongoDB
Level: Beginner - Intermediate
Area: Non-relational databases
Description: non-relational databases are increasingly popular, especially due to the rise in big data companies and apps, as they make it possible to overcome the barriers of data structures posed by relational databases. MongoDB is the leader in distributed databases.
Source: codecentric
Cheat sheet: https://blog.codecentric.de/files/2012/12/MongoDB-CheatSheet-v1_0.pdf
Data Manipulation
Before getting started with data analytics, it's essential to organise the data set's information so that it's easier to perform the necessary analytical operations. This process is known as data manipulation.
Pandas
Level: Beginner - Intermediate
Area: Data manipulation
Description: Pandas is the library per excellence for processing data in DataFrames, in other words, it allows us to read records, manipulate data, group them and organise them in a way that facilitates our analysis. This cheat sheet shows you some essential steps to help you use the library.
Source: DataCamp
Cheat sheet: http://datacamp-community-prod.s3.amazonaws.com/dbed353d-2757-4617-8206-8767ab379ab3
Data Wrangling
Level: Beginner - Intermediate
Area: Data manipulation
Description: Prior to conducting an analysis, it's important to clean the DataFrame and organise our data, since we sometimes find duplicate, void or invalid records. The process of cleaning the DataFrame so we can use it for our analysis is known as Data Cleaning or Data Wrangling.
Source: pandas
Cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
Data Visualization
Data visualization is the graphic representation of data and is particularly important for conducting analyses or portraying analysis results, which can help us discover trends, outliers and patterns in the data.
Matplotlib
Level: Beginner
Area: Data visualization
Description: matplotlib is the first library that's been developed for map plotting and projections in Python. It offers a huge range of options for drawing graphs and personalising them, from the most simple to the most complicated of visualizations.
Source: DataCamp
Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf
Seaborn
Level: Intermediate
Area: Data Visualization
Description: The Seaborn library is more advanced than matplotlib and was developed to facilitate the statistical analysis of data directly onto graphs.
Source: DataCamp
Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Seaborn_Cheat_Sheet.pdf
Folium
Level: Intermediate
Area: Data visualization
Description: Within the field of visualization, maps are a very useful form of representation that allows us to depict geospacial positioning and distances. Folium is a library that allows us to generate maps and easily depict data from a data set, rendering a representation such as a mapbox or OpenStreetMap and adding layers of visual data like cluster points or a heatmap.
Source: AndrewChallis
Machine Learning
Machine learning algorithms allow us to make predictions based on available data. These are known either as regression or classification algorithms, depending on the type of data in question. These processes can be supervised or non-supervised, depending on whether the machine learning model is trained using labelled data, or not, which is known as 'ground truth'.
Scikit-Learn
Level: Advanced
Area: Machine learning
Description: Scikit-Learn is a library developed on top of SciPy and designed for data modelling: clustering, feature manipulation, outlier detection, model selection and validation. It is known for being robust and easy to integrate with other Python libraries.
Source: DataCamp
Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf
Deep Learning
Within the field of machine learning, there is a more specific field known as deep learning, which uses artificial neural networks to make predictions.
Keras
Level: Advanced
Area: Deep leaning
Description: The Keras library is written in Python and is capable of running on top of CNTK, TensorFlow and Theano, making it possible to generate and evaluate neural network models.
Source: DataCamp
Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf
Tensorflow
Level: Advanced
Area: Deep learning
Description: This is a second-generation deep learning library developed by Google. It allows users to create models using an API with an inferior or superior abstraction layer, outlining mathematical operations or neural networks, depending on the user's preference.
Source: Altoros
Cheat sheet: https://cdn-images-1.medium.com/max/2000/1*dtOZSuYDonyyBvEULpJALw.png
PyTorch
Level: Advanced
Area: Deep learning
Description: PyTorch is a deep learning library developed by Facebook. It is one of the newest libraries on the market and offers an interface for working with tensors at a more affordable price than TensorFlow or Keras, for example.
Source: PyTorch
Cheat sheet: https://pytorch.org/tutorials/beginner/ptcheat.html
Natural Language Processing (NLP)
Within the field of data science, language analysis is an area that's increasingly gaining ground, with algorithms that have been developed to help us analyse text.
NLTK
Level: Beginner - Intermediate
Area: NLP
Description: NLTK is one of the first libraries developed for natural language analysis and allows users to carry out processes such as tokenization, stemming (lemma analysis), character or word count, in order to read and understand the text under analysis.
Source: Cheatography
Cheat sheet: https://cheatography.com/murenei/cheat-sheets/natural-language-processing-with-python-and-nltk/
spaCy
Level: Advanced
Area: NLP
Description: spaCy is a natural language processing library that analyses texts at difference levels: NER (name, entity, recognition), parser (syntactic analysis) or similarity, from a model trained in one language. It also allows us to create models from scratch with our own examples that recognises the entities we define.
Source: DataCamp
Cheat sheet: http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06
These cheat sheets contain each library's most useful functions and working methods to help you in your day-to-day development tasks. Happy Coding!