☕ Docker Explained

A quick intro to the basics of Docker. Why do we need Containers? What is a Hypervisor, Kubernetes and more! We also have a solution to yesterday's interview problem on building a spell check system.

Nov 12, 2020

Hey Everyone!

Hope you’re all having a fantastic day!

We’re going to do a Tech Dive today (along with our Previous Solution).

Tech Dive - Docker

What is a Hypervisor? Why is it useful?

In the old days, you’d typically run just a single application on a single server. Running multiple applications on one server could be dangerous since the apps might interfere with each other. One app could modify the file system in a way that the other didn’t expect, causing the other application to crash. Or, one app could depend on a specific version of MySQL, while the other app needed a different version. In a worst-case scenario, one app might be written by a malicious user who’s intentionally trying to get the other app to fail.

This one-app-per-server setup becomes especially expensive if you’re trying to run a cloud computing business. A new customer usually won’t need the full power of a server, so most of that computational power sits idle.

Hypervisors solve this issue! A hypervisor is software that creates and runs virtual machines. A virtual machine is an emulation of a computer system, and it runs its own operating system on top of the host machine (the host machine runs the hypervisor, which manages the virtual machines). Each virtual machine is sandboxed, so what you do inside it generally can’t affect the host computer or any other virtual machines running on that host. You basically get your own OS.

Now, a cloud computing business can just run a hypervisor on their powerful servers, and the hypervisor will spin up a new virtual machine whenever a user wants to run an application on that server.

You might also run a hypervisor on your home computer if you want to quickly test out a new operating system. You can install VirtualBox on your Windows computer and run a copy of Ubuntu inside one of VirtualBox’s Virtual Machines on top of Windows.

What are Containers?

Going back to our cloud computing business, do we really need to spin up an entire virtual machine every time a user wants to use our server? Spinning up virtual machines is expensive, and virtual machines come with a bunch of protections that aren’t always needed. A virtual machine is often gigabytes in size and can take minutes to spin up! Users are typically just packaging and running a single application on each virtual machine, so it’s a massive waste to create and bundle an entirely new OS.

What if we could use something more lightweight?

This is where containers come in. A container is just a package that holds an application’s code along with all the dependencies it needs. Each container shares the host OS kernel instead of bundling its own OS. This allows containers to be spun up much faster, and each container takes up much less space!

What is Docker?

Now, when you have containers running on a computer, you’ll need some software to manage them. You’ll need a program to spin up new containers, manage their resource sharing, and shut them down. This program is called a container engine. If you’re into analogies… Container Engine is to Container as Hypervisor is to Virtual Machine.

One extremely popular container engine is the Docker Engine. The Docker Engine is open source and is maintained by Docker, Inc!

The Docker Engine uses a Docker image as a blueprint for how to build the container. A Docker image is a read-only template, and it’s much more lightweight than a full virtual machine. You can download Docker images from Docker Hub to run Redis, Node, MongoDB and a bunch of other software.

In order to build a Docker image, you need a Dockerfile. A Dockerfile is just a text file with a series of instructions that tell Docker how to build the image.
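To make that a bit more concrete, here’s a small Python sketch that writes out a minimal Dockerfile. The instructions themselves (FROM, WORKDIR, COPY, RUN, CMD) are standard Dockerfile instructions, but the base image tag, the file names app.py and requirements.txt, and the write_dockerfile helper are all assumptions made up for this example.

```python
import tempfile
from pathlib import Path

# A minimal Dockerfile for a hypothetical Python app.
# Assumptions: the app lives in app.py and lists its dependencies
# in requirements.txt; python:3-slim is used as the base image.
DOCKERFILE = """\
FROM python:3-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]
"""


def write_dockerfile(directory: str) -> Path:
    """Write the Dockerfile text into the given directory and return its path."""
    path = Path(directory) / "Dockerfile"
    path.write_text(DOCKERFILE)
    return path


if __name__ == "__main__":
    # Write into a throwaway directory just to show the result.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_dockerfile(tmp).read_text())
```

With a file like this in your project folder, running docker build -t my-app . would build an image, and docker run my-app would start a container from it (my-app is just a placeholder name).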

What is Kubernetes?

Over the last several years, there’s been a trend away from monolithic architectures toward microservices. Rather than having your entire application run as a single program, you break the individual components (database, authentication, payment, logging, etc.) into separate services and expose each one with a REST API.

Each service typically runs in its own container, and you’ll run several “copies” of each service for scalability (horizontal scaling).

Orchestrating this workflow and making sure the right containers are running at the right time is a hard job! You can end up with a workflow of thousands of containers!

This is where Kubernetes comes in. Kubernetes was originally made at Google and is a platform for “automating deployment, scaling and operations of application containers across clusters of hosts”.

It can be used in conjunction with Docker, where Docker Engine is used to individually boot up and manage the containers.
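If you’re curious what “making sure the right containers are running” might look like, here’s a toy Python sketch of the idea: you declare how many copies of each service you want, and a loop compares that with what’s actually running and starts or stops containers to close the gap. This is only an illustration, not Kubernetes’ actual API; ToyOrchestrator, the service names, and the container IDs are all hypothetical.

```python
import itertools


class ToyOrchestrator:
    """Keeps the number of running containers equal to the desired count."""

    def __init__(self, desired: dict):
        self.desired = dict(desired)                 # service -> wanted replicas
        self.running = {name: [] for name in desired}  # service -> container ids
        self._ids = itertools.count(1)               # source of unique id suffixes

    def reconcile(self) -> list:
        """Start or stop containers until reality matches the desired state."""
        actions = []
        for service, want in self.desired.items():
            have = self.running.setdefault(service, [])
            while len(have) < want:                  # too few copies: start more
                container_id = f"{service}-{next(self._ids)}"
                have.append(container_id)
                actions.append(f"start {container_id}")
            while len(have) > want:                  # too many copies: stop extras
                actions.append(f"stop {have.pop()}")
        return actions


if __name__ == "__main__":
    orchestrator = ToyOrchestrator({"auth": 2, "payments": 1})
    print(orchestrator.reconcile())              # initial rollout
    orchestrator.running["auth"].remove("auth-1")  # simulate a crashed container
    print(orchestrator.reconcile())              # a replacement gets started
```

A real orchestrator also has to decide which machine each container lands on, check container health, and route traffic, which is a big part of why this job is hard.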

Previous Solution

As a reminder, here’s the previous question:

How would you build a spelling correction system?

Possible Follow On Questions

How would you check if a word is misspelled?
How would you find possible suggestions?
How would you rank the suggestions for the user?

Solution

The core idea behind most spell-check systems is the Levenshtein distance: the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into another. A misspelled word is usually only one or two edits away from the word the user intended. Therefore, if we keep a hash table of all the words in our dictionary and look for words that have a Levenshtein distance of 2 or less from the text, we can find the intended word. If the text is already in our dictionary, then it’s not misspelled.
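Here’s a minimal Python sketch of that idea. It computes the Levenshtein distance with the standard dynamic-programming approach and then scans a dictionary for words within 2 edits. The WORDS set is a tiny made-up dictionary, and the function names are just for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b."""
    # previous[j] = distance between the prefix of a handled so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_misspelled(word: str, dictionary: set) -> bool:
    """A word counts as misspelled if it is not in the dictionary."""
    return word.lower() not in dictionary


def suggestions(word: str, dictionary: set, max_distance: int = 2) -> list:
    """Dictionary words within max_distance edits, closest first."""
    word = word.lower()
    if word in dictionary:
        return []  # already spelled correctly
    scored = [(levenshtein(word, w), w) for w in dictionary]
    return [w for dist, w in sorted(scored) if dist <= max_distance]


# Hypothetical toy dictionary; a real one would hold many thousands of words.
WORDS = {"mean", "man", "main", "moon", "container", "docker"}

if __name__ == "__main__":
    print(is_misspelled("mran", WORDS))
    print(suggestions("mran", WORDS))
```

Scanning every dictionary word like this is the simplest version. With a large dictionary, you’d more likely generate every string within one or two edits of the input and look each one up in the hash table.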

There may be too many dictionary words within a Levenshtein distance of 2 from our text to list them all for the user. Therefore, it’s important that we rank our suggestions and implement a cut-off for the number of suggestions that we list. There are several ways of ranking our suggestions:

History of refinements - users often provide a great amount of data about the most likely misspellings by first entering a misspelled word and then correcting it. This data can be collected and then used to implement rankings.
Typing errors model - spelling mistakes are usually a result of typing errors (typos). Therefore, these errors can be modeled based on the layout of a keyboard (for example, mran -> mean, since r sits right next to e).
Phonetic modeling - spelling errors also happen when the user knows how the word sounds but doesn’t know the exact spelling. Therefore, we can map the text to phonemes and then find all the words that map to the same phonetic sequence.
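As a rough sketch of how the first two signals could be combined, the Python below ranks candidates by how often users corrected the same typo to each word, and breaks ties in favor of candidates that differ by a single neighboring key. The correction log, the simplified QWERTY grid (it ignores the stagger between rows), and all the function names are assumptions for illustration. Phonetic modeling is left out to keep things short.

```python
from collections import Counter

# Hypothetical log of (typed, corrected_to) pairs collected from users.
CORRECTION_LOG = [
    ("mran", "mean"), ("mran", "mean"), ("mran", "man"),
    ("teh", "the"),
]

# Simplified QWERTY layout: each letter gets a (row, column) position.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POSITIONS = {
    ch: (r, c) for r, row in enumerate(QWERTY_ROWS) for c, ch in enumerate(row)
}


def are_adjacent(a: str, b: str) -> bool:
    """True if two different letters sit next to each other on the grid."""
    if a == b or a not in POSITIONS or b not in POSITIONS:
        return False
    (r1, c1), (r2, c2) = POSITIONS[a], POSITIONS[b]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1


def looks_like_key_slip(typed: str, candidate: str) -> bool:
    """True if the words differ in exactly one position by a neighboring key."""
    if len(typed) != len(candidate):
        return False
    diffs = [(t, c) for t, c in zip(typed, candidate) if t != c]
    return len(diffs) == 1 and are_adjacent(*diffs[0])


def rank(typed: str, candidates: list, correction_log: list, limit: int = 3) -> list:
    """Order candidates by past corrections, then key slips, then alphabetically."""
    history = Counter(fixed for seen, fixed in correction_log if seen == typed)

    def sort_key(candidate: str):
        # Negative count puts frequent corrections first; False sorts before True.
        return (-history[candidate],
                not looks_like_key_slip(typed, candidate),
                candidate)

    return sorted(candidates, key=sort_key)[:limit]  # limit is the cut-off


if __name__ == "__main__":
    print(rank("mran", ["man", "mean", "main", "moon"], CORRECTION_LOG))
```

The limit parameter plays the role of the cut-off mentioned above, so the user only sees the top few suggestions.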

I didn’t want to make the email too long, so no Interview Question today! We’ll have another one tomorrow!

If you enjoyed this, it would be greatly appreciated if you could share Quastor Daily with a friend!

Best,

Arpan
