Bootcamp
Search…
2.7: Internet 101

Introduction

Map of the global undersea fiberoptic cables that carry internet traffic across the world.
The Internet consists of the following 3 components.
  1. 1.
    Computers.
  2. 2.
    Physical infrastructure (e.g. cables, modems, routers) that connects these computers.
  3. 3.
    Software on the computers that enables communication across this physical infrastructure.
For Rocket's Bootcamp we'll focus on how to send and receive data across the Internet through Node.js. We will not delve too deep into the physical infrastructure or the networking software protocols used to connect computers. If you would like to read more about those topics, consider the Further Reading section at the bottom of this page.
The majority of computer networking happens at the operating system level, i.e. underneath our programs, which translates data from our apps to and from "packets" that can be sent across the Internet. Before we learn to send data across the Internet in JS (which is relatively simple), we'll learn some core concepts about how this data transfer happens.

What You Should Know

The software and protocols that connect together all the computers on the internet is very complicated. In this section we'll be going over some of the basic mechanisms that we'll need as we write actual JavaScript code, like TCP port. Most of the concepts we won't ever be writing code for and are just a demonstration of different networking properties, such as DNS, LANs and WANs and ping and traceroute. These inner workings of the internet are key concepts to absorb but are not directly related to application code that we will be doing in any projects. Hopefully our work with the Chrome Networking Tools and ping and traceroute will demonstrate the mechanics of how the internet works in a more concrete way.

Networking Protocols

The Internet started as a small computer network where engineers designed "protocols" for computers to "talk" to each other. These protocols answered questions like the following.
  1. 1.
    What signals would the computers send?
  2. 2.
    How would these signals handle slow or unreliable connections?
  3. 3.
    What would happen when the receiving machine was turned off?
Networking protocols are rules for how computers talk to each other, implemented by the operating system so that any program can interface with the network. Files on our file systems work in a similar way, in that each operating system has rules for how it stores and manages files on the hard drive. When those rules are broken, files can become "corrupted" and un-openable. Similarly, when internet protocols are not followed, "packets" of data can become corrupted and un-readable.

TCP/IP

There are multiple layers of protocols running on every computer to make apps like Chrome work. The lowest-level protocol we'll describe is TCP/IP, which stands for Transmission Control Protocol / Internet Protocol. This is the software that takes arbitrary data, splits it up into smaller, more manageable "packets", sends those packets across the network, and reconstructs them at the destination computer.
During Rocket's Bootcamp we won't interface with TCP/IP directly, but we will want to understand some key concepts that make TCP/IP work.

IP Addresses

IP addresses allow any computer to reach any other computer on the network.
TCP/IP defines addresses for each computer on the network such that each computer can be reached by any other computer. These addresses are known as IP addresses. IP addresses typically look like this: 126.234.123.5, 4 sets of numbers from 0-255 separated by full stops. These are known as IPv4 addresses.
There is a newer version of IP addresses called IPv6 with longer addresses and larger numbers. IPv6 addresses are less common today but will eventually become standard because IPv4 will not have enough addresses for all the computers on the Internet. IPv6 addresses look like this: 2001:db8:0:0:0:ff00:42:8329.
In Rocket's Bootcamp and when building apps, we will mostly only work with IP addresses directly when configuring cloud infrastructure. In other situations when we access data across the Internet, we will use domain names like www.rocketacademy.co as more user-friendly proxies for IP addresses. We will learn how domain names translate to IP addresses and vice versa with DNS.

Ports

On a given computer, many programs can be sending and receiving data across the Internet simultaneously, such as WhatsApp, Spotify, Windows Updates, Mac Updates, etc. When each computer only has 1 IP address, how does the operating system determine which data is for which application?
TCP/IP uses ports to determine which application receives which data. Here is a comprehensive list. Occasionally multiple applications will use the same port, in which case we may need to close the others for any to work. This is rare for popular applications.
Ports are typically displayed as a number after IP addresses followed by a colon (:). We can use the following command to see all processes listening on specific ports on our computers.
lsof -i -P -n | grep LISTEN
Apps communicate with addresses that include IP addresses and port numbers.
When building apps, the servers that we host on the Internet to manage data will use ports to "listen" for requests from clients (e.g. browsers or apps). These requests will be like "commands" that tell our servers how to handle data. The port number in a destination address helps the server computer decide which server application to send the request to. For example, in the diagram above where there are 2 Node.js server applications running on the computer, the port numbers 3004 and 80 help the server computer decide which Node app to send the request to.

DNS

​DNS (Domain Name System) is a TCP/IP protocol that translates IP addresses to human-readable domain names and vice versa. DNS exists because most humans would find it easier to remember domain names like google.com and facebook.com than their respective IP addresses.
The DNS system keeps a database of every domain name's IP address. When we enter google.com into our browser, the browser looks up the associated IP address so it can make the request to the correct computer.
The lookup happens in stages. For example, for www.google.com, the browser would following the following steps.
  1. 1.
    Find the IP address of the top-level domain (also known as TLD), i.e. .com. TLD IP addresses are typically stored in our browsers because they do not change often.
  2. 2.
    At the server representing .com, the DNS system would then find the address of google.com.
  3. 3.
    At the server representing google.com, the DNS system would then try to look up www.google.com.
  4. 4.
    Once the entire domain name has been "resolved", the browser sends a request to the final IP address for the relevant website data.
When we buy a domain name, e.g. mysite.com, and wish to host our web app there, we need to link our web app server's IP address with that domain name so that the global DNS system knows at which IP address to find our web app. Many website builders enable clickable configuration for this, but under the hood they are all creating "DNS Records" that map domain names to IP addresses in a global DNS system. We can buy domains from domain marketplaces like the one here.
Simplified diagram of DNS lookup for google.com before sending a request to google.com.
Browsers typically save or "cache" IP addresses to frequently-visited websites to reduce DNS lookup time. Browsers also try to cache the contents of those sites for faster website load times. These cache entries are invalidated at regular intervals to ensure we always get the latest data.

Example

The following command on the command line performs DNS lookup for the specified domain name (also known as host name).
host google.com
Notice there is more than 1 IP address for google.com. Google does this because google.com is a popular website that needs multiple computers to respond to requests. If 1 fails or is overloaded, the browser can try another.

HTTP

HTTP (Hypertext Transfer Protocol) is a protocol built on top of TCP/IP to transfer data from 1 computer to another over the Internet. HTTP was invented to send HTML (Hypertext Markup Language); Today we use HTTP for all kinds of data.

Requests and Responses

The original concept of HTTP was that a person would want to view an HTML page on their browser, and that person would "request" that page by sending an HTTP Request for the relevant HTML file at the relevant IP address. The server at that IP address would receive the request and send back the HTML file as part of an HTTP Response. The browser would then parse the HTML file render it. HTTP is now used for all kinds of data, but the underlying request and response mechanic is the same.
In Rocket's Bootcamp and most of the applications we build in general, we will be using HTTP requests and responses to retrieve data across the Internet. This could be requests for HTML files triggered by typing rocketacademy.co in our browser, or requests for user data triggered by a login button in our app.

Other Protocols

HTTP is the most popular but not the only protocol used on top of TCP/IP. The following are some others.
  1. 1.
    ​DHCP configures our computers when they connect to routers.
  2. 2.
    ​SMTP is a default protocol for email.
There have been many protocols that are now obsolete. QOTD was a protocol to get a quote of the day.

URLs

URLs (Uniform Resource Locators) are a standard way to reference resources on the internet. The following image is a URL with each of its components highlighted. We may not see all of these components in all URLs because many are implied. For example, if we type rocketacademy.co in our browsers, the https:// protocol and index.html path are implied.
Full URL with all its components
The path component of a URL can be a path to a file on the server, e.g. index.html, or a designated address of a product, e.g. facebook.com/timeline. We typically use URL query parameters (the right-most component) to customise the result from the server at the given path.

LAN vs. WAN

Our computers typically connect to the Internet via a router provided by our ISP (Internet Service Provider). All computers connected to the same router are typically connected via LAN (Local Area Network). The router forwards traffic from the LAN to the WAN (Wide Area Network), i.e. the broader Internet. This helps prevents unwanted traffic from accessing our computers, and aggregate Internet connections for simplicity of the Internet.
Our routers forward requests to the ISP, which is directly connected to the internet.

LAN and WAN IP Addresses

Computers typically have a LAN IP address, and routers typically have a WAN IP address. The LAN IP address is used by routers to send responses from the Internet to the correct computers in the LAN. The WAN IP address is shared by computers connected to our router, and is how computers on the Internet identify us.
ifconfig (interface configuration) is a command line program that reveals the LAN IP address of our computer. If on Windows and running Ubuntu, we will need to install ifconfig with sudo apt install net-tools before running it. en0 is typically the interface with our current LAN IP address.
Akira's computer has LAN IP address 192.168.0.195. Not accessible on Internet because LAN.
We can visit https://whatismyipaddress.com/ to verify what our current WAN IP address is. When we visit websites, those sites have access to the source of our requests, i.e. our WAN address, but not our LAN address.
Notice LAN and WAN IP addresses are different. This is to protect our privacy and to streamline the Internet.

Other Common Networking Tools

Ping

​Ping is a networking utility used to test if a computer is reachable on the network. It does not use a port because it is not trying to reach an application inside the computer, only that computer's networking software. Different domains may take a different amounts of time to respond, depending on how far away the servers are.
ping google.com

Traceroute

​Traceroute reveals the servers our TCP/IP packets travel through on the way to their destination.
traceroute google.com
There are several Visual Traceroute utilities online to see these paths on a map. They may originate in Europe because the visual traceroute servers are in Europe.

Exercise

Try ping and traceroute on the following domains. How long do each of the requests take and where do they travel through?
  1. 1.
    ntu.edu.sg
  2. 2.
    genting.com
  3. 3.
    tigerbeer.com.sg
  4. 4.
    stanford.edu
  5. 5.
    nus.edu.sg
  6. 6.
    in-n-out.com
  7. 7.
    musee-orsay.fr
  8. 8.
    rwandatel.rw

Further Reading

  1. 1.
    See Stanford's Intro to Networking course CS144 for further reading on these topics. Here is a playlist of past CS144 videos that past Coding Bootcamp students have found helpful.
  2. 2.
    Past students have found this video helpful in introducing ports and how they are used.
  3. 3.
    Past students have found this video helpful as a crash course to the Internet.
Copy link
On this page
Introduction
What You Should Know
Networking Protocols
TCP/IP
IP Addresses
Ports
DNS
Example
HTTP
Requests and Responses
Other Protocols
URLs
LAN vs. WAN
LAN and WAN IP Addresses
Other Common Networking Tools
Ping
Traceroute
Exercise
Further Reading