Need to turn your programming skills into effective data science skills? Principles of Data Science is designed to help you connect the dots between mathematics, programming, and business analysis. With this book, you’ll feel confident about asking, and answering, complex and sophisticated questions of your data, moving from abstract, raw statistics to actionable ideas.
With a unique approach that bridges the gap between mathematics and computer science, this book takes you through the entire data science pipeline. Beginning with cleaning and preparing data and with effective data mining strategies and techniques, you’ll move on to build a comprehensive picture of how every piece of the data science puzzle fits together. Learn the fundamentals of computational mathematics and statistics, as well as some of the pseudocode used today by data scientists and analysts. You’ll get to grips with machine learning, discover the statistical models that help you take control of and navigate even the densest datasets, and find out how to create powerful visualizations that communicate what your data means.
What you will learn
- Get to know the five most important steps of data science
- Use your data intelligently and learn how to handle it with care
- Bridge the gap between mathematics and programming
- Learn about probability, calculus, and how to use statistical models to control and clean your data and drive actionable results
- Build and evaluate baseline machine learning models
- Explore the most effective metrics to determine the success of your machine learning models
- Create data visualizations that communicate actionable insights
- Read and apply machine learning concepts to your problems and make actual predictions
Organized data: This refers to data that is sorted into a row/column structure, where every row represents a single observation and the columns represent the characteristics of that observation.
Unorganized data: This is data in free form, usually text or raw audio/signals, that must be parsed further to become organized. Whenever you open Excel (or any other spreadsheet program), you are looking at a blank row/column structure waiting for organized data; these programs handle unorganized data poorly. For the most part, we will deal with organized data, as it is the easiest to glean insight from, but we will not shy away from looking at raw text and methods of processing unorganized forms of data.
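To make the difference concrete, here is a minimal sketch of parsing unorganized text into organized rows, where each row is one observation and each key is one characteristic. The sample sentences, the column names (`name`, `age`, `city`), and the regular-expression patterns are all hypothetical assumptions chosen for illustration; real free-form text usually needs far more robust parsing.

```python
import re

# Hypothetical free-form text: every line mentions a name, an age, and a city,
# but not in a fixed column layout.
raw_text = """Alice is 34 and lives in Boston.
Bob, age 27, lives in Denver.
Carol is 45 and lives in Austin."""

# Assumed patterns: a capitalized name at the start of the line, the first
# integer as the age, and the capitalized word after "lives in" as the city.
NAME = re.compile(r"([A-Z][a-z]+)")
AGE = re.compile(r"(\d+)")
CITY = re.compile(r"lives in ([A-Z][a-z]+)")


def to_rows(text):
    """Turn each free-form line into one row (observation) with fixed columns."""
    rows = []
    for line in text.splitlines():
        name = NAME.match(line)  # match() anchors at the start of the line
        age = AGE.search(line)
        city = CITY.search(line)
        if name and age and city:  # skip lines these simple patterns cannot parse
            rows.append({"name": name.group(1),
                         "age": int(age.group(1)),
                         "city": city.group(1)})
    return rows


rows = to_rows(raw_text)
for row in rows:
    print(row["name"], row["age"], row["city"])
```

Once the text is in this row/column shape, it could be loaded into a spreadsheet or a data frame like any other organized data.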
Data science is the art and science of acquiring knowledge through data.
There are three basic classifications of data: structured vs unstructured, quantitative vs qualitative, and the four levels of data.
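As one illustration of the quantitative vs qualitative split, the sketch below uses a simple heuristic: a column counts as quantitative if every non-missing value can be read as a number. The helper `is_quantitative` and the sample columns are hypothetical, and the heuristic is an assumption rather than a rule from the text.

```python
def is_quantitative(values):
    """Heuristic: True if every non-missing value parses as a number."""
    present = [v for v in values if v not in (None, "")]  # ignore missing entries
    if not present:
        return False
    for v in present:
        try:
            float(v)
        except (TypeError, ValueError):
            return False
    return True


# Hypothetical columns, stored as strings the way they might arrive from a file.
columns = {
    "age": ["34", "27", "45"],
    "city": ["Boston", "Denver", "Austin"],
    "income": ["52000.5", "", "61000"],
}

for name, values in columns.items():
    kind = "quantitative" if is_quantitative(values) else "qualitative"
    print(f"{name}: {kind}")
```

The heuristic has a known blind spot. Numeric-looking codes such as ZIP codes or ID numbers parse as numbers but are really qualitative, because arithmetic on them is meaningless. A human judgment about what the values mean still has the final say.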
The five essential steps to perform data science are as follows:
1. Asking an interesting question
2. Obtaining the data
3. Exploring the data
4. Modeling the data
5. Communicating and visualizing the results
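The five steps above can be sketched end to end on a toy problem. Everything here is a hypothetical assumption for illustration: the question, the made-up study-hours and score values, and the choice of an ordinary least-squares line as the model. Step 5 is reduced to a printed sentence, where a real project would likely add a chart.

```python
from statistics import mean

# Step 1: ask an interesting question (hypothetical example).
question = "Do longer study sessions go with higher test scores?"

# Step 2: obtain the data (made-up values, for illustration only).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 61.0, 67.0, 72.0]

# Step 3: explore the data with simple summary statistics.
print("mean hours:", mean(hours), "mean score:", mean(scores))


# Step 4: model the data with an ordinary least-squares line: score = a + b * hours.
def fit_line(x, y):
    mx, my = mean(x), mean(y)
    # Slope = covariance-style sum divided by the sum of squared x deviations.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx  # the fitted line passes through the point of means
    return a, b


a, b = fit_line(hours, scores)

# Step 5: communicate the result in plain language.
print(question)
print(f"Each extra hour is associated with about {b:.1f} more points (intercept {a:.1f}).")
```

With these toy numbers the fitted slope is 4.9 points per hour and the intercept is 47.3. Note that "associated with" is deliberate: a line fitted to five observations describes a pattern, not a cause.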