Installing Python¶
This walkthrough is part of the ECLR page.
Python is a multifaceted programming language that sees widespread use across diverse domains, including data analysis, statistics, machine learning, and general-purpose programming. Its intuitive syntax and extensive library ecosystem make it an attractive choice for novice and experienced users. This module aims to introduce the fundamental principles of Python, emphasizing its practical applications in Econometrics, Data Science and beyond.
But Python’s influence doesn’t stop at data science: it’s a driving force behind many websites and apps you use daily. Platforms like YouTube, Spotify, Dropbox, and Instagram all rely on Python, and Instagram in particular is built on the Python web development framework Django. Whether you're streaming your favourite songs, sharing photos, or collaborating on data files, Python is working behind the scenes to make these experiences seamless. This connection between Python and the digital platforms we rely on highlights its versatility: it’s a tool for advanced analytics and a foundation for building the modern web.
In this section, we’ll start with the basics: setting up Python, understanding its syntax, and writing simple scripts. From there, you’ll learn how to manage data, perform calculations, and create reusable code through functions. By the end of this section, you’ll have the foundation to explore advanced topics and unlock Python’s full potential—whether in data science, econometrics, or creating the next big app or website.
This walkthrough helps you install and set up Python on your computer. In particular, you will learn:
- to set up Python using the Jupyter Notebook interface
- to write your first Python code
- to import libraries such as `numpy` and `pandas` to start working with arrays and tabular data, and to install additional libraries directly from Jupyter Notebook using `!pip install`
- to recognise and fix common Python errors, such as `SyntaxError` and `NameError`
- to work with data types
Installation on your computer¶
You can work with Python fully in the cloud. That may be the right approach if you only need to use Python very occasionally. In a later section we will explain how you can do that. We recommend, however, that you install everything you need to work with Python on your computer. That way you can also work when you are offline.
In order to be in a position to work with Python on your computer we recommend that you go through the following steps.
- Install Python
- Install Anaconda
- Install Visual Studio Code (VS Code)
Downloading Python Software¶
This Data Lab assumes that you have Python installed on your computer. You should make sure that you have downloaded the latest version of Python from https://www.python.org/. Go to Downloads and select the latest version of Python. Once the file has downloaded, you will find it in your downloads folder. Double-click it to open the Python installer.
Anaconda: Scientific Stack for Python¶
To run Python code, a Python interpreter is required: this is a tool that translates and executes Python instructions. In this guide, we will use the interpreter that ships with the Anaconda distribution. Therefore, the next natural step is to install Anaconda, a widely used distribution of the Python programming language. Anaconda is a preferred choice among researchers and professionals because it simplifies managing Python environments, especially for data science, machine learning, and scientific computing tasks. It provides an integrated platform that streamlines package management and deployment, addressing the complexities of handling dependencies. Notably, Anaconda comes with hundreds of pre-installed libraries that are essential for these domains, such as:
- `NumPy` - for numerical computing.
- `pandas` - for data manipulation and analysis.
- `matplotlib` and `seaborn` - for data visualisation.
- `scikit-learn` - for machine learning tasks.
- `SciPy` - for scientific and technical computing.
For beginners, Anaconda is especially advantageous as it simplifies the process of installing and managing libraries with complex dependencies. Moreover, it includes `conda`, a robust package and environment manager that supports the installation of binary packages and the creation of isolated environments.
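To see which of the libraries listed above are already available in your environment, a quick check along the following lines can help. This is only a minimal sketch: the helper name `check_libraries` and the list of names are illustrative assumptions, and note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the libraries mentioned above.
# Note: scikit-learn is imported as "sklearn".
LIBRARIES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "scipy"]


def check_libraries(names):
    """Return a dict mapping each import name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


availability = check_libraries(LIBRARIES)
for name, installed in availability.items():
    status = "available" if installed else "not installed"
    print(f"{name}: {status}")
```

Any library reported as "not installed" can be added later with `conda` or `pip`.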
A YouTube video walking you through the Anaconda installation is available from Arthur Turrell.
Visual Studio Code¶
Visual Studio Code is an integrated development environment (IDE). It makes it easier to work on your code and, importantly, to interact with your data. There are many different IDEs available. The one we have most experience with is Visual Studio Code and therefore we recommend that one. It works with all major operating systems and it is free software.
A YouTube video walking you through the installation of Visual Studio Code is available from Arthur Turrell. It will also show you how to run Python code in interactive mode and in Jupyter notebooks (see below).
Other popular IDEs are Spyder and PyCharm.
Working online¶
If you do not wish to install Python, Anaconda and VS Code on your computer (or if something does not work out with your installation) you can also work on the cloud. As this is not the recommended mode of working we will not go into any detail here. But here is a set of services (all free for reasonably small usage) you could use.
Python Integrated Development Environments¶
Tool | Type | Key Features | Best For |
---|---|---|---|
Jupyter Notebook | Interactive Web-Based Environment | Code execution, Markdown documentation, data visualization, inline outputs | Teaching, data analysis, and walkthroughs |
JupyterLab | Advanced Web-Based Environment | Multi-document interface, side-by-side notebooks, consoles, and file viewers | Multitasking with enhanced flexibility |
GitHub Codespaces | Web-based IDE | Requires a GitHub account. After accessing Codespaces, select the "Jupyter Notebook" quick-start template and click "Use this template". Your Codespace will load in an online version of Visual Studio Code with Python pre-installed. You can check the Python version by running `python --version` in the terminal, typically located in the bottom panel. | Working with GitHub repositories |
Jupyter Notebooks¶
You can write Python scripts (files with the extension .py) that can be executed by Python. Here, however, we recommend that you write your code in a Jupyter Notebook. Jupyter Notebooks are particularly useful as they help you write Python code, analyse data, and document your thought process, all in one file! This makes Jupyter Notebook an invaluable tool for both beginners and professionals in Python. More specifically, Jupyter Notebook:
- Simplifies the learning process by allowing you to see the code and its output side by side.
- Supports step-by-step workflows, making it ideal for better understanding data analysis, econometrics, and machine learning procedures.
- Combines code execution, documentation, and visualisation in one interactive document.
Jupyter notebooks can be written online but also in the IDEs mentioned above and indeed the recommended VS Code.
Prepare your Workspace¶
Before you start, you should create a space (i.e. a folder) on your computer where you plan to store all your Python files and data. Ensure you know this folder's exact location and its full path. This will help you navigate to it easily from within Jupyter Notebook.
For instance, if you create a folder named `PyWork` on your C drive, then the path to your folder will be `C:/PyWork`.
For this computer lab, we are using the datafile STATS19_GM_AccData.csv. You should download this data file into the folder you just created and want to work from. Keeping the data file in your working folder ensures that Python can easily access it without complex file paths. It is also a good idea to have the Data Dictionary ready. This will help you understand the structure and variables in the data.
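Once the file is downloaded, you may want to confirm that it sits where you expect. The sketch below assumes the `C:/PyWork` folder from the example above; the helper name `data_file_status` is hypothetical, and you should substitute your own folder path.

```python
from pathlib import Path


def data_file_status(folder, filename):
    """Return the full path to the file and whether it exists."""
    path = Path(folder) / filename
    return path, path.is_file()


# Assumed location: the PyWork folder from the example above.
path, found = data_file_status("C:/PyWork", "STATS19_GM_AccData.csv")
print(path, "found" if found else "not found")
```

If the file is reported as "not found", check the folder path and the spelling of the file name.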
Exploring Jupyter Notebook and searching¶
Jupyter Notebook is an intuitive and powerful tool that allows you to write and execute Python code in a user-friendly environment. Once you have installed Anaconda, open the `Anaconda Navigator` for the first time. You should see a window that looks something like this:
In the Anaconda Navigator interface, locate the Jupyter Notebook option. Then click the Launch button to open Jupyter Notebook. This will open your default web browser and load the Jupyter Notebook interface. Now you are ready to create a new Python notebook! In the Jupyter Notebook interface, look for the “New” button in the top-right corner of the page. Then click on New and select Python 3 from the dropdown menu (see image below). This will create a new notebook that is ready for Python coding.
Once you have opened a "new" Jupyter Notebook, the interface appears as follows:
In the new notebook, you will see a blank cell where you can type your Python code (see image below). You are now ready to write and execute your first Python code!
As mentioned, Jupyter Notebook's layout lets you see your code and results in one cohesive interface, with the output appearing immediately after running the cell. Let's say that you want to calculate 7+5. Type 7+5 into the first cell, next to the `[ ]:` prompt, and press Run or Shift+Enter on your keyboard. You should then see the correct result pop up just beneath the cell. The workflow is interactive because you can edit the code in a cell and re-run it to update the output instantly.
#addition
7+5
12
That was easy. But what about calculating $\sqrt{7569}$, $e^5$ and $\ln(5)$?
To make these calculations we need to import the `math` library. The `math` library is built into Python and doesn't require installation. The functions used (such as `math.sqrt()`, `math.exp()`, and `math.log()`) are part of Python’s standard library, so you can use them directly without needing to install any additional packages.
import math
# Square root function
math.sqrt(7569)
87.0
# Calculating the exponential function: e raised to the power of 5
math.exp(5)
148.4131591025766
# Calculating the natural logarithm of 5
math.log(5)
1.6094379124341003
However, if you want to use more advanced mathematical operations, or handle more complex data types, you might want to install additional libraries like `NumPy` for numerical computing.
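A few more calls from the `math` library can serve as a sanity check on the results above. The sketch below uses only the standard library; the variable names are illustrative.

```python
import math

# exp and log are inverses, so log(exp(5)) recovers 5 (up to rounding).
round_trip = math.log(math.exp(5))

# math.log accepts an optional second argument giving the base.
log_base_10 = math.log(1000, 10)  # close to 3, subject to floating-point rounding
log_base_2 = math.log2(8)         # base-2 logarithm

# Squaring the square root recovers the original number.
squared_back = math.sqrt(7569) ** 2

print(round_trip, log_base_10, log_base_2, squared_back)
```

Because floating-point arithmetic is approximate, it is safer to compare such results with `math.isclose()` than with `==`.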
Functions¶
In programming, functions are reusable blocks of code that perform a specific task. They help in organizing code, making it more readable and maintainable. In Python, functions are defined using the `def` keyword followed by the function name, parentheses `()` and a colon.
def function_name():
    # Code block
    pass
Here, `def` is the keyword that defines a function in Python and `function_name` is the name you assign to your function, following Python's naming rules (e.g., no spaces, can't start with a number). To indicate that we are dealing with a function we must include parentheses `()`. The `pass` statement is a placeholder that does nothing and is used when you want to define an empty function.
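As a first concrete case, here is a minimal sketch of a function that takes inputs and returns a result. The function name `total_revenue` and its parameters are chosen purely for illustration.

```python
def total_revenue(price, quantity):
    """Return total revenue, i.e. price multiplied by quantity."""
    return price * quantity


# Call the function with positional and with keyword arguments.
print(total_revenue(75, 50))                 # 3750
print(total_revenue(price=2.5, quantity=4))  # 10.0
```

The names inside the parentheses are the function's parameters, and `return` hands the result back to the caller.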
Let's assume that we are dealing with the hypothetical demand function `P = 100 - 0.5Q`, and we want to calculate the Marginal Revenue (MR) using the price elasticity of demand (E): $$MR = P\left(1 + \frac{1}{E}\right)$$
where:
- $MR$ : Marginal Revenue
- $P$ : Price of the good
- $E$ : Price elasticity of demand
At this point, it is important to mention that, in Python, when we assign a number to a variable, a numeric object is created. Python 3 provides three built-in numerical types (Python 2 had a fourth, `long`):
- int: Represents whole numbers (e.g., 1, 42, -7).
- long: Python 2 only; it represented larger whole numbers. In Python 3, `int` includes long numbers.
- float: Represents decimal or fractional numbers (e.g., 3.14, -0.001).
- complex: Represents numbers with a real part and an imaginary part (e.g., 3 + 4j).
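The built-in `type()` function shows which numeric type Python has assigned to a value. The short sketch below uses illustrative variable names; note that Python writes the imaginary unit as `j`.

```python
whole = 42        # int
big = 2 ** 100    # still an int: Python 3 integers have no fixed size limit
decimal = 3.14    # float
z = 3 + 4j        # complex (Python writes the imaginary unit as j)

for value in (whole, big, decimal, z):
    print(value, type(value).__name__)

print(z.real, z.imag)  # 3.0 4.0
print(abs(z))          # 5.0
```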
In our example, our parameters of interest are quantity and price elasticity of demand, which are assigned the values 50 and -2, respectively. What numeric types do `Q` and `E` represent here? Both values are passed as `int`. Although it is fine to use `int` for quantities if you expect whole numbers, it is good to know that when a calculation produces a decimal (e.g., division), Python automatically returns a `float`. For instance:
```python
price = 100 - 0.5 * quantity  # Here, quantity is an int (50)
```
Since `0.5` is a float, the result of `0.5 * quantity` is a float. In other words, Python automatically promotes `int` values to `float` when they are combined with a `float` in an arithmetic operation, and true division (`/`) always returns a `float`. Using `float` explicitly ensures consistent behaviour and avoids surprises when handling mixed types.
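The promotion from `int` to `float` can be checked directly with `type()`. The sketch below reuses the quantity of 50 from the example.

```python
quantity = 50                 # int
price = 100 - 0.5 * quantity  # int combined with float gives float

print(type(quantity).__name__)      # int
print(price, type(price).__name__)  # 75.0 float

# True division always returns a float, even for two ints.
print(7 / 2, type(7 / 2).__name__)  # 3.5 float
print(8 / 2, type(8 / 2).__name__)  # 4.0 float

# Floor division of two ints stays an int.
print(7 // 2, type(7 // 2).__name__)  # 3 int

# float() converts explicitly.
print(float(quantity))  # 50.0
```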
You could use `int` if you are writing this function only for cases where quantity and elasticity are whole numbers. However, if you want to future-proof your code and make it more broadly applicable, using `float` is the better choice.
Therefore, we have:
def marginal_revenue(quantity, price_elasticity):
    """
    Calculate marginal revenue using price elasticity of demand.
    MR = P * (1 + 1/E)

    Parameters:
        quantity (float): Quantity of goods sold.
        price_elasticity (float): Price elasticity of demand.

    Returns:
        float: Marginal revenue.
    """
    if price_elasticity >= -1:
        return "Price elasticity must be less than -1 for this formula to apply."
    price = 100 - 0.5 * quantity  # Hypothetical linear demand curve: P = 100 - 0.5Q
    return price * (1 + 1 / price_elasticity)

# Example
mr = marginal_revenue(quantity=50, price_elasticity=-2)
print(f"Marginal Revenue: {mr}")
Marginal Revenue: 37.5
The line `def marginal_revenue(quantity, price_elasticity)` defines the Python function under investigation, called `marginal_revenue`. The `quantity` variable represents the number of goods sold, which, as we know, influences the price based on the demand curve, whereas `price_elasticity` represents the price elasticity of demand, which influences the relationship between price and revenue.
The `if` statement lets us make decisions in our programs. Here, the condition checks whether the price elasticity of demand (E) is greater than or equal to -1. If the condition is true, the function immediately stops execution and returns a message explaining why this input is invalid, as shown below:
def marginal_revenue(quantity, price_elasticity):
    if price_elasticity >= -1:
        return "Price elasticity must be less than -1 for this formula to apply."
    price = 100 - 0.5 * quantity  # Hypothetical linear demand curve: P = 100 - 0.5Q
    return price * (1 + 1 / price_elasticity)

# Example
mr = marginal_revenue(quantity=50, price_elasticity=-0.9)
print(f"Marginal Revenue: {mr}")
Marginal Revenue: Price elasticity must be less than -1 for this formula to apply.
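The function checks whether E ≥ -1 because the formula is only meaningful here for elastic demand (E < -1). For elasticities between -1 and 0, marginal revenue would be zero or negative, so total revenue (TR) wouldn't increase with additional units sold! The check also guards against E = 0, which would cause a division by zero.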
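Returning a message string works for a first example, but it means the function sometimes returns a number and sometimes text. A common alternative, sketched below under the assumption that you would rather stop with an explicit error, is to raise a `ValueError`. The name `marginal_revenue_strict` is hypothetical and not part of the original walkthrough.

```python
def marginal_revenue_strict(quantity, price_elasticity):
    """Variant of marginal_revenue that raises an error for invalid input."""
    if price_elasticity >= -1:
        raise ValueError("Price elasticity must be less than -1 for this formula to apply.")
    price = 100 - 0.5 * quantity  # Hypothetical linear demand curve: P = 100 - 0.5Q
    return price * (1 + 1 / price_elasticity)


print(marginal_revenue_strict(50, -2))  # 37.5

# Invalid input now raises an error that the caller can handle explicitly.
try:
    marginal_revenue_strict(50, -0.9)
except ValueError as error:
    print("Invalid input:", error)
```

With this design the return value is always a number, so later calculations cannot silently receive a text message.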
From this example, we see that $MR$ is equal to 37.5 and $P$ is equal to $P = 100 - 0.5Q = 100 - 0.5 \times 50 = 75$. Why is $MR < P$? Because the demand curve slopes downward, selling one more unit requires lowering the price on every unit sold, so the revenue gained from the extra unit is less than its price. With a price elasticity of demand of -2 (E = -2), demand is elastic, so marginal revenue stays positive, at half the price.
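The numbers can be verified by hand using the formula from the function's docstring, $MR = P(1 + 1/E)$, with $Q = 50$ and $E = -2$:

$$P = 100 - 0.5 \times 50 = 75, \qquad MR = P\left(1 + \frac{1}{E}\right) = 75\left(1 + \frac{1}{-2}\right) = 75 \times 0.5 = 37.5$$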
As this example shows, functions are essential in Python programming for creating reusable and organized code. They allow you to break down complex problems into smaller, manageable parts. By defining functions, you can avoid repetition and make your code more modular and easier to maintain.
Creating Variables¶
In Python, variables are used to store data in a computer’s memory with a label for easy access. Variables are like containers that hold different types of information, such as numbers, text, or more complex data structures. For instance, we can store the price of a product, someone’s name, email, age and so on. How do we do this in Python?
To declare a variable, we start by typing a name for that variable. Let’s type the name “price” and assign it a value:
price = 625.55
Here, the number 625.55 is stored somewhere in our computer’s memory and we are attaching the name “price” as a label to that memory location. So, now we can read the value at this memory location and print it in the output cell.
print(price)
625.55
This makes it easy to work with stored data in our program!
Note that Python does not require you to explicitly declare the data type of a variable; Python infers it for you. This means that Python is a dynamically typed language, just like R, and unlike statically typed languages such as C, C++, or Java.
Variables can store different types of data, such as:
product_name = "Laptop" # A string
quantity = 5 # An integer
price_per_unit = 399.99 # A float
Here, `product_name` stores a string (`"Laptop"`). A string represents a sequence of characters, in other words, text data. Remember that in Python, as in many other programming languages, textual data must always be surrounded by quotes, either single (`' '`) or double (`" "`). Moreover, `quantity` stores an integer (`5`), a whole number, and `price_per_unit` stores a float (`399.99`), representing a decimal number.
Once we run this code we get the following output:
# Variables can store different types of data:
product_name = "Laptop" # A string
quantity = 5 # An integer
price_per_unit = 399.99 # A float
# Display the values:
print("Product:", product_name)
print("Quantity:", quantity)
print("Price per unit:", price_per_unit)
Product: Laptop
Quantity: 5
Price per unit: 399.99
As we can see, executing this code prints each variable's value with a descriptive label.
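Building on the variables above, the sketch below shows how `type()` reports the inferred data type and how variables can be combined in a calculation. The formatted output line is just one illustrative way of presenting the result.

```python
product_name = "Laptop"
quantity = 5
price_per_unit = 399.99

# type() reports the data type Python inferred for each variable.
print(type(product_name).__name__)    # str
print(type(quantity).__name__)        # int
print(type(price_per_unit).__name__)  # float

# Variables can be combined in expressions and formatted with an f-string.
total_cost = price_per_unit * quantity
print(f"{quantity} x {product_name}: {total_cost:.2f}")  # 5 x Laptop: 1999.95
```

The `:.2f` inside the f-string rounds the displayed value to two decimal places without changing the stored number.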
Case Sensitivity¶
To explain the fact that Python is case sensitive, let's focus on the example below:
price = 100
Price = 200
print("price:", price) # 100
print("Price:", Price) # 200
price: 100
Price: 200
Notice here that Python treats `price` and `Price` as two different variables. This means that variables with names that differ only in letter casing are treated as distinct. If Python instead treated `price` and `Price` as the same variable, it could lead to confusion. For instance:
price = 100
Price = 200
print(price + Price) # Adds 100 and 200
300
If Python were not case-sensitive, it would be unclear whether price and Price refer to the same variable or different ones. By being case-sensitive, Python eliminates such ambiguities!
What if you want to assign values to multiple variables in a single line?¶
Let's say that we have the following:
x, y, z = 1, 2, 3 # Assigns x = 1, y = 2, z = 3
print("x:", x, "y:", y, "z:", z)
x: 1 y: 2 z: 3
In this example, we have assigned the values `1`, `2`, and `3` to `x`, `y`, and `z` respectively in a single line, and once we run the `print()` function we get a display of these values.
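The same single-line assignment syntax supports a few related patterns, sketched below with illustrative variable names: swapping two values, giving several variables the same value, and the error you get when the counts do not match.

```python
x, y, z = 1, 2, 3

# Swap two variables without a temporary variable.
x, y = y, x
print(x, y)  # 2 1

# Assign the same value to several variables at once.
a = b = c = 0
print(a, b, c)  # 0 0 0

# The number of names must match the number of values.
try:
    p, q = 1, 2, 3
except ValueError as error:
    print("ValueError:", error)
```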
Suppose now that we accidentally use the wrong variable name in this sequence of code:
total_cost = price_per_unit * quantitty
You will see Python throwing an error message at you: `NameError: name 'quantitty' is not defined`. What do you think this means? In the first instance, this tells you that something went wrong with that command. Can you see what the problem is? Python raises a `NameError` because no variable called `quantitty` has been defined: the name is misspelled. This is an example of a runtime error caused by a typo. How can we fix this issue?
We can use the correct variable name as follows:
```python
total_cost = price_per_unit * quantity  # Use the correct variable name
print("Total Cost:", total_cost)
```
This fixes the error by using the correct variable name, `quantity`.
An important lesson concerning error messages
- Errors are a normal part of coding, especially when learning or trying new things. Don’t be discouraged!
- Python will stop running the program if it encounters an error, but it won’t harm your computer or the Python environment. So, feel free to experiment!
- Many errors in Python come from simple mistakes like typos, incorrect capitalization (since Python is case-sensitive), or forgetting a colon (:) in a function or loop.
- It is important to read the error message carefully. Python error messages are often very descriptive and will help you identify the issue. Look at:
  - The error type: It tells you what went wrong (e.g., `SyntaxError`, `NameError`, `TypeError`).
- Stay curious and persistent! Debugging is an essential skill in programming. Each error you fix makes you a better coder.
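To get a feel for the three error types named above, the following sketch triggers each one on purpose and reports its type instead of stopping. The helper `describe_error` is a hypothetical name introduced for this illustration, and `compile()` is used only so that the syntax error occurs while the code is running.

```python
def describe_error(action):
    """Run a zero-argument callable and report the type of any error raised."""
    try:
        action()
    except Exception as error:
        return type(error).__name__
    return "no error"


# NameError: the name was never defined (here, a deliberate typo).
print(describe_error(lambda: quantitty))

# TypeError: an operation was applied to incompatible types.
print(describe_error(lambda: "5" + 5))

# SyntaxError: the code breaks Python's grammar (here, a missing colon).
# compile() lets us trigger it at run time without stopping the notebook.
print(describe_error(lambda: compile("def f()\n    pass", "<example>", "exec")))
```

In everyday work you would normally let the error appear and read the message; catching errors like this is mainly useful for demonstrations and for code that must keep running.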
Prepare your script file and libraries¶
For all critical work, you’ll want to save your progress so you don’t find yourself rewriting everything you did yesterday… or just before your cat decided your keyboard was a great place for a nap! To ensure that, save all your work in a Jupyter Notebook file with the extension `.ipynb`. This file allows you to save your commands (code), comments, and outputs in one document. The process that you need to follow consists of the steps outlined here:
- Open the Anaconda Navigator and launch the Jupyter Notebook.
- Create a new notebook by going to File → New Notebook → Python 3. Then, a new notebook will open.
- Remember to click the save button (disk icon), or use Ctrl+S (Windows) or Cmd+S (Mac). Do not forget to name the notebook (e.g., `python_work.ipynb`).
Now, let's create a Jupyter Notebook file following the abovementioned steps. Create a comment block at the top of your notebook to document your work. It may say something like
Note: Adding comments to your Python code is essential to avoid staring at it tomorrow, wondering, “What was I even thinking here?” In Python, anything after the # symbol on a line is ignored by the computer but treasured by you (or anyone else reading your code). Comments are not for Python; they’re for your sanity!
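A comment block of the kind described above might look like the following sketch. The purpose line, author and date are placeholders to be replaced with your own details.

```python
# ------------------------------------------------------------
# File:    python_work.ipynb
# Purpose: First steps in Python (variables, functions, imports)
# Author:  Your Name   (placeholder)
# Date:    YYYY-MM-DD  (placeholder)
# ------------------------------------------------------------

# A comment can also sit at the end of a line of code.
price = 625.55  # price of the product
print(price)
```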
Unlike R, Python does not have a `setwd()` function. Instead, you can use the `os` module to check and change your working directory. Setting the working directory ensures that Python knows where to look for files and where to save outputs!
import os # Import the os library
print("Current Working Directory:", os.getcwd())
Current Working Directory: C:\Users\Admin
The `os` module is part of Python’s standard library, so you don’t need to install anything extra. It allows you to interact with the operating system (e.g., checking the current directory, changing directories, or listing files). The `os.getcwd()` function returns the current working directory, that is, the folder Python currently uses as its default location for reading or saving files. The output will differ for each user. In this example, we have `Current Working Directory: C:\Users\Admin`.
If you want to change the working directory, use the `os.chdir()` function and specify the full path of your desired folder in the parentheses `()`. The `os.chdir()` function changes the current working directory to the specified folder. Note that on Windows all backward slashes (`\`) should be replaced by forward slashes (`/`) to avoid escape character issues.
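To see why backslashes cause trouble, compare the three ways of writing a path below. The folder and file names are hypothetical and chosen so that the problem is visible; the raw-string form is an alternative not covered above.

```python
# In an ordinary string, a backslash starts an escape sequence:
# backslash-n is a single newline character, backslash-t a single tab.
risky = "C:\new_folder\table.csv"

# Forward slashes avoid the problem.
safe = "C:/new_folder/table.csv"

# A raw string (prefix r) is an alternative: backslashes are kept literally.
raw = r"C:\new_folder\table.csv"

print(len(risky), len(safe), len(raw))  # 21 23 23
print("\n" in risky)                    # True: the path silently contains a newline
```

The "risky" path is two characters shorter because each escape sequence collapsed into a single character, so it no longer points to the intended file.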
```python
os.chdir("C:/PythonWork")  # Replace with your desired path
print("New Working Directory:", os.getcwd())
```
Now, if you are unsure of your Jupyter Notebook file's location, you can use the `%pwd` magic command to print the current directory and `%cd` to change the directory temporarily:
```python
%cd C:/PythonWork
```
Python libraries are like R packages; they provide additional functions that extend Python’s capabilities. Let's load the libraries that we want to use:
# Import necessary libraries
import pandas as pd # For data manipulation
import matplotlib.pyplot as plt # For data visualization
import numpy as np # For numerical computations
print("Libraries loaded successfully!")
Libraries loaded successfully!
Libraries in Python are collections of pre-written code that provide additional functionality and tools for your programs. Just like in R, many libraries in Python are not included in the base Python installation and need to be installed separately. Once installed, you still need to import them into your notebook to use their functionality.
Simply typing these commands into a cell does not execute them. If you want Python to execute the commands in a cell of your Jupyter Notebook, select the cell and either click Run or press Shift+Enter.
If one or more of these libraries are not installed, Python will raise a `ModuleNotFoundError` (a subclass of `ImportError`). For example, if `seaborn` was not installed you would receive an error message like this:
```python
import seaborn as sns
```
ModuleNotFoundError: No module named 'seaborn'
In such cases, you need to install the library first using `pip`, Python's package installer, and then import it into your notebook. For example:
!pip install pandas
!pip install matplotlib
Note: In Jupyter Notebook, the `!` prefix runs shell commands directly from the notebook. Use it to install libraries without leaving the notebook environment.
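If you would like the notebook to keep running when a library is missing, one possible pattern is to attempt the import and handle the failure. This is a sketch only: the helper name `try_import` and the non-existent library name are hypothetical.

```python
import importlib


def try_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError as error:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here.
        print(f"{type(error).__name__}: {error}")
        return None


math_module = try_import("math")                       # part of the standard library
missing = try_import("a_library_that_does_not_exist")  # hypothetical name

print(math_module is not None, missing is None)      # True True
print(issubclass(ModuleNotFoundError, ImportError))  # True
```

When `try_import` returns `None`, you know the library still needs to be installed with `!pip install` before it can be used.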