Although several programming languages are used in data science (R, Julia, Scala, Java, MATLAB...), Python has established itself as the main option thanks to its simplicity, the availability of libraries, and the support of its developer community, among other factors. In this course we will review the main characteristics of Python (its syntax, data types, control structures, and built-in functions) and some of the libraries most used in data analysis: NumPy and its multidimensional array, and pandas and the main data structures it offers, the Series and the DataFrame. We will learn to display data graphically with the matplotlib library, which is especially useful both in the exploratory stage that precedes the analysis and in communicating its results, and we will also review the seaborn library, which builds on matplotlib to offer a higher-level, more convenient interface. Finally, we will review the organization and most important functions of the Scikit-Learn library by applying them to simple examples of supervised and unsupervised learning.
Contents
- Introduction to Python
- Data types
- Control structures
- Built-in Python libraries
- The NumPy library
- The multidimensional array
- The pandas library
- The Series
- The DataFrame
- Data visualization with matplotlib
- The seaborn library
- The Scikit-Learn library
- Supervised algorithms
- Unsupervised algorithms
- Data transformation functions
- Other functions