Common data structures and when to use each

Michael Choi
5 min read · May 6, 2020
Photo by Joanna Kosinska on Unsplash

Learning data structures can be difficult for someone new to coding, but it helps a new coder understand the power of recursion and object-oriented programming in driving more efficiency out of a system.

For example, imagine that you installed a new operating system and it took 10 minutes to boot. If a developer you know could make the operating system start in 30 seconds instead, wouldn’t you be stoked?

Similarly, if you installed a desktop application and it took 5 minutes to complete a complex computation, and in the next version update the same operation took only 2 seconds, wouldn’t you be excited?

Efficient algorithms help make happy customers. For web applications, understanding how to build data structures is not as important as it would be if you were building a new operating system, database, or programming language from scratch. However, some of the principles, such as looking at the time complexity of your software and knowing what Big O notation your functions operate in, are still important.

For this article, let me address different common data structures and when each could be used.

Array

An array is a common data structure that’s used very frequently. Searching for a particular value in an array is O(N). If you want to insert a new value in the middle of the array, you’ll also need to shift the later values over. Therefore, insertion or deletion in an array takes an additional O(N). This means that if you search for a specific value in the array and then insert another value right after it, or delete that value, the total cost is O(N) + O(N) = O(2N), which simplifies to O(N) since constant factors are dropped in Big O notation.
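As a quick sketch of the point above (the function name here is my own, not from the article), a Python list behaves like an array: the search loop visits up to N elements, and insert() shifts every element after the insertion point.

```python
# Illustrative sketch: searching a Python list is O(N),
# and list.insert() shifts the tail of the list, another O(N).
def search_then_insert(arr, target, new_value):
    for i, value in enumerate(arr):       # O(N) linear search
        if value == target:
            arr.insert(i + 1, new_value)  # O(N) shift of later elements
            return True
    return False

nums = [3, 1, 4, 1, 5]
search_then_insert(nums, 4, 9)
# nums is now [3, 1, 4, 9, 1, 5]
```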

Linked List

Linked list (whether it’s a singly linked list or doubly linked list) is great for adding information sequentially. If you need to build a queue system or a stack system, a linked list is the way to go. It’s also quite simple to implement.

For a linked list, searching for a particular value is O(N), as you would need to start from the first item and traverse through each one. Once you find the specific node to delete, or the position where you want to insert a new node, that operation itself is only O(1). Therefore, if you were searching for a specific value in order to delete it, you would perform both the search and the deletion, and the total would be O(N) + O(1) = O(N).

Note that the insertion/deletion step itself is a lot faster in a linked list than in an array, since no shifting is required once you hold a reference to the node!
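The search-then-delete pattern above can be sketched as a minimal singly linked list (the class and function names here are illustrative, not from the article): the traversal is O(N), but the unlink itself is a single pointer change.

```python
class Node:
    """A singly linked list node (illustrative sketch)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def delete_value(head, target):
    """O(N) search, then an O(1) pointer change to unlink the node."""
    if head is None:
        return None
    if head.value == target:       # deleting the head is O(1)
        return head.next
    prev, curr = head, head.next
    while curr is not None:        # O(N) traversal
        if curr.value == target:
            prev.next = curr.next  # O(1) unlink
            break
        prev, curr = curr, curr.next
    return head

head = Node(3, Node(1, Node(4)))
head = delete_value(head, 1)
# the list is now 3 -> 4
```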

Binary Search Tree

A binary search tree is great for organizing lots of numbers and quickly searching for a specific value, as well as adding/removing values. Searching for a specific value in a balanced binary search tree is O(log N). Note that this is significantly faster than a linked list or an array, which were O(N) for the search operation.

Inserting a new value into a binary search tree is faster than into an array but slower than into a linked list. Insertion or deletion in a balanced binary search tree is O(log N).
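A minimal sketch of why these operations are O(log N) on a balanced tree (the names here are my own): each comparison discards one of the two subtrees, so you walk down only one branch per level.

```python
class BSTNode:
    """A binary search tree node (illustrative sketch)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Walk down one branch per level: O(log N) when the tree is balanced."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Each comparison halves the remaining candidates on a balanced tree."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for k in [8, 3, 10, 1, 6]:
    root = insert(root, k)
search(root, 6)   # True
search(root, 7)   # False
```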

Hash Table

A hash table is great for organizing lots of information. Depending on how many items are stored in each bucket and whether each bucket stores a linked list or a BST or another data structure, the big O notation for its search, insertion, or deletion can change.

For example, if the hash table had enough buckets that each bucket stored only one value, then search, insertion, and deletion would all be O(1).

If the hash table stored a linked list in each bucket, then a search would still be O(N) in Big O terms, although roughly z times faster in practice than if everything were stored in one giant linked list (where z is the number of buckets in the hash table). Insertion and deletion would still be O(1), as each bucket holds a linked list.

A hash table could also have an array stored in each bucket or a BST for that matter.
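To make the bucket idea concrete, here is a toy chaining hash table in Python (the class name and bucket count are my own choices; for simplicity each bucket holds a Python list rather than a hand-rolled linked list): the hash function picks a bucket in O(1), and only that one bucket's chain is scanned.

```python
class ChainedHashTable:
    """Toy hash table with one chain (a Python list) per bucket."""
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Map the key to one of the z buckets in O(1).
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key exists: overwrite it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # O(1) append to the chain

    def search(self, key):
        # Scans only one chain: about N/z items on average.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.insert("apple", 3)
table.insert("pear", 5)
table.search("apple")   # 3
```

If every key landed in the same bucket (a worst-case hash function), the single chain would hold all N items and search would degrade to O(N), which is exactly the worst case discussed below.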

Summary Chart

If we were only to look at the average time complexity for search, insertion, and deletion, these data structures would compare as follows:

  Data structure          Search     Insertion  Deletion
  Array                   O(N)       O(N)       O(N)
  Linked list (SLL/DLL)   O(N)       O(1)       O(1)
  Binary search tree      O(log N)   O(log N)   O(log N)
  Hash table              O(1)       O(1)       O(1)

If we were to look at the average time complexity alongside the worst case (shown as average / worst), it would look as follows:

  Data structure          Search            Insertion         Deletion
  Array                   O(N) / O(N)       O(N) / O(N)       O(N) / O(N)
  Linked list (SLL/DLL)   O(N) / O(N)       O(1) / O(1)       O(1) / O(1)
  Binary search tree      O(log N) / O(N)   O(log N) / O(N)   O(log N) / O(N)
  Hash table              O(1) / O(N)       O(1) / O(N)       O(1) / O(N)

Note that the O notation for the worst case doesn’t change much, except for:

  1. Hash table — if the hash function were completely inefficient and put all the values into a single bucket, then searching for a particular value would take O(N). It would be highly unlikely to have such a bad hash function, but after all, we’re looking at the worst-case scenario.
  2. BST — if the binary search tree were completely lopsided and degenerated into a linear chain, then the height of the tree would be N. When the height of the tree equals the number of values stored in the BST, then search, insertion, and deletion all happen in O(N).

What are the main takeaways?

  1. SLL/DLL/Stacks/Queues are great for insertion and deletion, performing both in O(1)!
  2. A BST is great for search (it is, after all, named a binary “search” tree) and performs it in O(log N).
  3. Arrays are O(N) for search/insertion/deletion.
  4. When there is lots of data, a hash table is great for speeding up search. Depending on the number of buckets, it can speed up lookups by a factor of hundreds or thousands (again, by the number of buckets used). A hash table does use more disk/memory storage, so there is a bit of a trade-off.
  5. Inside a hash table’s buckets you can store arrays, SLL/DLL/Stacks/Queues, or BSTs. Depending on which data structure you store, you inherit that structure’s Big O behavior within the hash table.

Thanks for reading!
