What's in a URL?

URLs are everywhere. We use them to access websites, send emails, download files, and more. But what exactly is a URL, and how does it work? In this article, we will explore the anatomy of a URL, how to encode and decode URLs, the difference between relative and absolute URLs, how URL routing works, and some best practices for URL design.

What is a URL?

A URL (Uniform Resource Locator) is a string of characters that identifies a resource on the Internet and tells a client how to reach it. A resource can be anything that can be accessed online, such as a web page, an image, a video, or a document.

At its simplest, a URL has two parts: a scheme and a scheme-specific part. The scheme names the protocol or method used to access the resource, such as http, https, ftp, or mailto. The scheme-specific part gives the location of the resource; for web URLs, that is a host followed by a path on that host. For example:

https://www.example.com/index.html
--^--   ---------^----------------
scheme     host and path

Anatomy of a URL

A URL can have several components that provide additional information about the resource or how to access it. These components are separated by special characters called delimiters. The basic structure of a URL is:

scheme://[user:password@]host[:port]/path[?query][#fragment]

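The scheme is followed by a colon :. The two slashes // that come next introduce the host part; schemes without a host, such as mailto, omit them.
The user and password are optional and are used for authentication purposes. They are separated by a colon : and followed by an at sign @. Putting a password in a URL is discouraged today, because URLs are routinely logged, bookmarked, and shared.
The host is the domain name (possibly with subdomains, such as www) or the IP address of the server where the resource is located.
The port is optional and specifies the port number on the server where the resource can be accessed. It is preceded by a colon :. When it is omitted, the default port for the scheme is used, such as 80 for http and 443 for https.
The path is the sequence of directories and files that leads to the resource on the server. Its segments are separated by slashes /.
The query is optional and contains additional parameters or data that are sent to the server along with the request for the resource. It is preceded by a question mark ?, and its parameters are usually written as name=value pairs joined by ampersands &.
The fragment is optional and identifies a specific part or section of the resource. It is preceded by a hash sign #. In a web request, the fragment is handled by the browser and is not sent to the server.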

Here is an example of a URL with all of these components except the user and password:

https://www.example.com:8080/blog/post.php?id=123&lang=en#comments

- scheme: https
- user: none
- password: none
- host: www.example.com
- port: 8080
- path: /blog/post.php
- query: id=123&lang=en
- fragment: comments
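
The same breakdown can be reproduced in code. The following is a minimal sketch, assuming Python and its standard urllib.parse module (neither is prescribed by this article); the components dictionary is just an illustrative name.

```python
from urllib.parse import urlsplit

url = "https://www.example.com:8080/blog/post.php?id=123&lang=en#comments"
parts = urlsplit(url)

# Collect the components under the same names used in the list above.
components = {
    "scheme": parts.scheme,
    "user": parts.username,      # None when the URL has no user
    "password": parts.password,  # None when the URL has no password
    "host": parts.hostname,
    "port": parts.port,          # an int, or None when no port is given
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

Note that urlsplit returns the path with its leading slash and the query and fragment without their ? and # delimiters, which matches the list above.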

URL Encoding

Sometimes, we need to include special characters or spaces in our URLs that are not allowed or have a different meaning in the URL syntax. For example, we might want to search for “C# programming” on Google, but we cannot put a raw # sign in our query because it is the delimiter that starts the fragment. To solve this problem, we can use URL encoding, also called percent encoding. This is a process of converting these characters into a format that can be safely transmitted in a URL. The format consists of a percent sign % followed by two hexadecimal digits that give the byte value of the character, which for ASCII characters is simply the ASCII code. For example, the # sign is encoded as %23 and a space is encoded as %20. In query strings, a space is also commonly written as a plus sign +, a convention that comes from HTML form submissions. So, our search query would look like this:

https://www.google.com/search?q=C%23+programming
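
As a rough illustration, here is how that query could be encoded and decoded with Python's standard urllib.parse module. The choice of Python is an assumption of this sketch, and the variable names are illustrative only.

```python
from urllib.parse import parse_qs, quote, quote_plus, urlencode

term = "C# programming"

# quote() percent-encodes the space as %20.
encoded_percent = quote(term, safe="")
print(encoded_percent)  # C%23%20programming

# quote_plus() follows the form convention, where a space becomes +.
encoded_plus = quote_plus(term)
print(encoded_plus)  # C%23+programming

# urlencode() builds a complete query string from name/value pairs.
query = urlencode({"q": term})
search_url = "https://www.google.com/search?" + query
print(search_url)

# parse_qs() reverses the process and returns decoded values.
decoded = parse_qs(query)
print(decoded)  # {'q': ['C# programming']}
```

Both spellings of the space decode to the same value inside a query string, so the choice mostly depends on what the receiving server expects.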

URL encoding also handles non-ASCII characters, such as Chinese or Arabic characters. The character is first converted to its UTF-8 bytes, and each byte is then percent-encoded. For example, the Chinese word for “hello”, 你好, is six bytes in UTF-8 and is encoded as %E4%BD%A0%E5%A5%BD.
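
The two steps, UTF-8 first and percent encoding second, can be made visible with a short sketch. It assumes Python's standard urllib.parse module, whose quote function uses UTF-8 by default; the manual version is included only to show what happens underneath.

```python
from urllib.parse import quote, unquote

word = "你好"

# Step 1: convert the text to its UTF-8 bytes.
utf8_bytes = word.encode("utf-8")
print(" ".join(f"{b:02X}" for b in utf8_bytes))  # E4 BD A0 E5 A5 BD

# Step 2: write each byte as % followed by two hexadecimal digits.
manual = "".join(f"%{b:02X}" for b in utf8_bytes)

# quote() performs both steps in one call.
encoded = quote(word)
print(encoded)  # %E4%BD%A0%E5%A5%BD

# unquote() decodes the bytes back into the original text.
decoded = unquote(encoded)
print(decoded)  # 你好
```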

Relative vs Absolute URLs

An absolute URL is a complete URL that can be understood on its own. It always includes the scheme and the host, together with whatever port, path, query, and fragment the resource needs. For example:

https://www.example.com/blog/post.php?id=123&lang=en#comments

A relative URL is an incomplete URL that omits some or all of these components and relies on the context or base URL to resolve them. For example:

/blog/post.php?id=123&lang=en#comments

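This relative URL starts with a slash, so it keeps the scheme and host of the base URL and replaces everything after them. Resolved against a base URL on https://www.example.com, such as the address of the page that contains the link, it becomes: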

https://www.example.com/blog/post.php?id=123&lang=en#comments

This is useful when we want to link to a resource on the same website or server. For example, we can use a relative URL to link to a different page on the same website:

<a href="/about">About</a>
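
Browsers perform this resolution automatically for every link. As a sketch of what happens, Python's standard urljoin function can resolve a relative URL against a base. The base URL below, including its /blog/index.html path, is a hypothetical page chosen for this example.

```python
from urllib.parse import urljoin

# Hypothetical page on which the links appear.
base = "https://www.example.com/blog/index.html"

# A leading slash replaces the whole path of the base URL.
resolved_post = urljoin(base, "/blog/post.php?id=123&lang=en#comments")
resolved_about = urljoin(base, "/about")

# Without a leading slash, the URL is resolved against the base's directory.
resolved_sibling = urljoin(base, "post.php")

# ".." moves up one directory before resolving.
resolved_parent = urljoin(base, "../about")

for resolved in (resolved_post, resolved_about, resolved_sibling, resolved_parent):
    print(resolved)
```

The last two cases show why the base URL matters: the same relative URL resolves differently depending on the page it appears on.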

URL Routing

URL routing is the process of determining which resource to serve based on the URL of a request. Different web servers and frameworks have different ways of implementing URL routing, but the basic idea is to match the URL against predefined patterns or rules that specify which resource to serve. For example, a simple set of rules could be:

If the URL is /home, serve the home page
If the URL is /blog/post?title=whats-in-a-url, serve the blog post with the title “What’s in a URL?”
If the URL is /about, serve the about page
If the URL does not match any of these rules, serve a 404 page (not found)

URL routing can also involve extracting parameters from the URL and passing them to the resource. For example, in the URL /blog/post?title=whats-in-a-url, the parameter title has the value whats-in-a-url. This value can be used by the blog post resource to display the appropriate content.
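
The rules above can be sketched as a lookup table that maps paths to handler functions. This is a minimal illustration in Python rather than the API of any particular framework; the handler names, the ROUTES table, and the (status, body) return value are all assumptions made for the example.

```python
from urllib.parse import parse_qs, urlsplit

def home_page(params):
    return 200, "Home page"

def about_page(params):
    return 200, "About page"

def blog_post(params):
    # parse_qs gives a list per name, because a name may appear more than once.
    title = params.get("title", [""])[0]
    return 200, f"Blog post: {title}"

# Hypothetical routing table: one fixed path per handler.
ROUTES = {
    "/home": home_page,
    "/blog/post": blog_post,
    "/about": about_page,
}

def route(url):
    """Return a (status, body) pair for the given request URL."""
    parts = urlsplit(url)
    handler = ROUTES.get(parts.path)
    if handler is None:
        return 404, "Not found"
    # Pass the decoded query parameters on to the handler.
    return handler(parse_qs(parts.query))

print(route("/home"))
print(route("/blog/post?title=whats-in-a-url"))
print(route("/contact"))
```

In this sketch the path alone selects the handler, and the query parameters are handed over as data, which is how the title value reaches the blog post resource.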

URL routing can be static or dynamic. Static routing means that each rule matches one fixed, literal URL, as in the rules above. Dynamic routing means that a rule contains one or more variable parts, so a single rule matches a whole family of URLs and the variable values are extracted from the URL itself. For example, a dynamic routing rule could be:

If the URL is /blog/post/<slug>, serve the blog post with the slug <slug>
If no blog post with that slug exists, serve a 404 page

In this case, <slug> is a variable that can take any value. For example, /blog/post/whats-in-a-url and /blog/post/how-to-design-urls are both valid URLs that match this rule.
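
One common way to implement such a rule is a regular expression with a named group. The sketch below is written in Python and is not tied to any framework; the POSTS dictionary stands in for a real content store, and the second post title is invented for the example.

```python
import re

# Hypothetical content store keyed by slug.
POSTS = {
    "whats-in-a-url": "What's in a URL?",
    "how-to-design-urls": "How to Design URLs",
}

# <slug> becomes a named group that accepts lowercase letters, digits, and hyphens.
POST_ROUTE = re.compile(r"^/blog/post/(?P<slug>[a-z0-9-]+)$")

def route(path):
    """Return a (status, body) pair for the given request path."""
    match = POST_ROUTE.match(path)
    if match is None:
        return 404, "Not found"
    title = POSTS.get(match.group("slug"))
    if title is None:
        # The URL matched the pattern, but no post has that slug.
        return 404, "Not found"
    return 200, title

print(route("/blog/post/whats-in-a-url"))
print(route("/blog/post/no-such-post"))
```

The two 404 cases are distinct: the first means the URL did not match the rule at all, and the second means it matched but the slug was unknown.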

Best Practices for URL Design

URL design is an important aspect of web development that affects user experience, search engine optimization (SEO), and maintainability. Here are some best practices for creating URLs that are clear, consistent, and meaningful:

Use descriptive words instead of numbers or codes. For example, /blog/post/whats-in-a-url is better than /blog/post/12345.
Use hyphens (-) instead of underscores (_) or spaces (%20) to separate words. For example, /blog/post/whats-in-a-url is better than /blog/post/whats_in_a_url or /blog/post/whats%20in%20a%20url.
Use lowercase letters instead of uppercase letters. For example, /blog/post/whats-in-a-url is better than /blog/post/Whats-In-A-URL.
Use short and simple URLs instead of long and complex ones. For example, /blog/post/whats-in-a-url is better than /blog/category/web-development/subcategory/url-design/article/whats-in-a-url.
Avoid using unnecessary words or characters in URLs. For example, /blog/post/whats-in-a-url is better than /blog/posts/view-post.php?title=whats-in-a-url.
Use keywords that are relevant to your content and target audience in URLs. For example, /blog/post/whats-in-a-url is better than /blog/post/random-stuff.
Be consistent with your URL structure and conventions across your website.
Avoid changing URLs once they are published. If you need to change a URL, set up a permanent (301) redirect from the old URL to the new one so that existing links and bookmarks keep working.
Use canonical URLs. A canonical URL is the preferred version of a page that you want search engines to index and display. You should use canonical tags or redirects to avoid duplicate content issues caused by variations in your URLs. For example, if multiple URLs point to the same page, such as https://www.example.com/blog/whats-in-a-url and https://example.com/blog/whats-in-a-url, you should choose one of them as the canonical URL and redirect or link to it from the others.
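
Several of these practices (descriptive words, hyphens, lowercase letters) can be applied automatically when a URL is generated from a page title. The slugify function below is a hypothetical helper written for this article, not a standard library function, and it makes the simplifying assumption that characters outside ASCII can be dropped after accents are stripped.

```python
import re
import unicodedata

def slugify(title):
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    # Split accented characters into base letter plus accent, then drop non-ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove apostrophes so "What's" becomes "whats" rather than "what-s".
    ascii_text = ascii_text.replace("'", "")
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")

print("/blog/post/" + slugify("What's in a URL?"))  # /blog/post/whats-in-a-url
```

A site with titles in non-Latin scripts would need a different approach, such as transliteration or percent-encoded UTF-8 paths.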

Summary

In this article, we learned what URLs are and how they identify and locate resources on the web. We looked at the components of a URL, how URL encoding lets us include special characters, and how relative URLs are resolved against a base URL. We also covered URL routing and some best practices for URL design. I hope you found this article useful. See you next time!