I was working on a requirement where the requirement was to cache urls coming continuously from a
source.And our cache has limited size.We can't afford to store all the
urls.So here we decided to store 5 lakh most recently used urls in
our cache.And those urls are not in use since a while(least recently
used urls) will be evicted and new urls coming from the source
will be added in the cache.If the new url is already present in cache we
will mark the url as most recently used.
For more clarity on caches,please refer the blog post caches. Now let's decide what data structure we will use.We can simply think of storing the urls in an LinkedList of size 5 lakh.But there are some limitations of using LinkedList.
But the complexity for this is of order n i.e. O(n),which is not desired.
But the complexity for this is of order n ie. O(n), which is not desired.
From this we conclude that LinkedList is not the correct choice for this.
To solve the first problem we use a doublyLinkedList,where the least recently used elements will be available in the tail of the list and can be accessed in O(1) time.And most recently used elements will be in the head of the list.
To Solve the second problem we use a hash map,so that we can check whether an url is available in cache or not in O(1) time.
So to create a LRU cache, we have to take the help of two data structure namely a DoublyLinkedList and a HashMap.
Please see the implementation below.It is straight forward.
The method we applied here uses a DoublyLinkedList and HashMap.Double linked list is to maintain insertion order and finding the tail of the cache in O(1) time.And HashMap to check if an url is already exist in cache in O(1) time.
But Java has a lesser known data structure know as LinkedHashMap ,which provides both the features of doublly linked list and hashmap.
But here to remember that by default the LinkedHashMap order is the insertion order, not access order.But there is a constructor of it to provide the access order for the same.We should use the below constructor LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) to solve the purpose.
//Stright from the java doc
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
link http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html
Let's see the implementation
For more clarity on caches,please refer the blog post caches. Now let's decide what data structure we will use.We can simply think of storing the urls in an LinkedList of size 5 lakh.But there are some limitations of using LinkedList.
Limitation 1:
What will happen if we want to find the least recently used urls.It is our frequent requirement as we want to evict the least recently used url from the cache to keep it free for other most recently used urls.But the complexity for this is of order n i.e. O(n),which is not desired.
Limitation 2:
But what will happen if a url comes from the source and we want to check whether the url already exist in the cache or not?For this we have to traverse the whole list.But the complexity for this is of order n ie. O(n), which is not desired.
From this we conclude that LinkedList is not the correct choice for this.
To solve the first problem we use a doublyLinkedList,where the least recently used elements will be available in the tail of the list and can be accessed in O(1) time.And most recently used elements will be in the head of the list.
To Solve the second problem we use a hash map,so that we can check whether an url is available in cache or not in O(1) time.
So to create a LRU cache, we have to take the help of two data structure namely a DoublyLinkedList and a HashMap.
Please see the implementation below.It is straight forward.
package com.brainatjava.lru;
import java.util.HashMap;
import java.util.Map;
public class LRUCache {
private DoublyLinkedList urlList;
private Map urleMap;
public LRUCache(int cacheSize) {
urlList = new DoublyLinkedList(4);
urleMap = new HashMap();
}
public void accessURL(String url ) {
Node pageNode = null;
if(urleMap.containsKey(url)) {
// If url is present in the cache, move the page to the head of list
pageNode = urleMap.get(url);
urlList.takeURLToHead(pageNode);
} else {
// If the page is not present in the urlcache, add the page to the urlcache
if(urlList.getCurrSize() == urlList.getSize()) {
// If cache is full, we will remove the tail from the cache
// and remove it from urlmap.
urleMap.remove(urlList.getTail().getURL());
}
pageNode = urlList.addPageToList(url);
urleMap.put(url, pageNode);
}
}
public void printCacheState() {
urlList.printList();
System.out.println();
}
public static void main(String[] args) {
int cacheSize = 4;
LRUCache cache = new LRUCache(cacheSize);
cache.accessURL("http://a");
cache.printCacheState();
cache.accessURL("http://b");
cache.printCacheState();
cache.accessURL("http://a");
cache.printCacheState();
cache.accessURL("http://a");
cache.printCacheState();
cache.accessURL("http://d");
cache.printCacheState();
cache.accessURL("http://c");
cache.printCacheState();
cache.accessURL("http://g");
cache.printCacheState();
cache.accessURL("http://h");
cache.printCacheState();
cache.accessURL("http://c");
cache.printCacheState();
}
}
class DoublyLinkedList {
private final int size;
private int currSize;
private Node head;
private Node tail;
public DoublyLinkedList(int size) {
this.size = size;
currSize = 0;
}
public Node getTail() {
return tail;
}
public void printList() {
if(head == null) {
return;
}
Node tmp = head;
while(tmp != null) {
System.out.print(tmp);
tmp = tmp.getNext();
}
}
public Node addPageToList(String url) {
Node pageNode = new Node(url);
if(head == null) {
head = pageNode;
tail = pageNode;
currSize = 1;
return pageNode;
} else if(currSize < size) {
currSize++;
} else {
tail = tail.getPrev();
tail.setNext(null);
}
pageNode.setNext(head);
head.setPrev(pageNode);
head = pageNode;
return pageNode;
}
public void takeURLToHead(Node node) {
if(node == null || node == head) {
return;
}
if(node == tail) {
tail = tail.getPrev();
tail.setNext(null);
}
Node prev = node.getPrev();
Node next = node.getNext();
prev.setNext(next);
if(next != null) {
next.setPrev(prev);
}
node.setPrev(null);
node.setNext(head);
head.setPrev(node);
head = node;
}
public int getCurrSize() {
return currSize;
}
public void setCurrSize(int currSize) {
this.currSize = currSize;
}
public Node getHead() {
return head;
}
public void setHead(Node head) {
this.head = head;
}
public int getSize() {
return size;
}
}
class Node {
private String url;
private Node prev;
private Node next;
public Node(String url) {
this.url = url;
}
public String getURL() {
return url;
}
public void setURL(String url) {
this.url = url;
}
public Node getPrev() {
return prev;
}
public void setPrev(Node prev) {
this.prev = prev;
}
public Node getNext() {
return next;
}
public void setNext(Node next) {
this.next = next;
}
public String toString() {
return url + " ";
}
}
But Java has a lesser known data structure know as LinkedHashMap ,which provides both the features of doublly linked list and hashmap.
But here to remember that by default the LinkedHashMap order is the insertion order, not access order.But there is a constructor of it to provide the access order for the same.We should use the below constructor LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) to solve the purpose.
//Stright from the java doc
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
link http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html
Let's see the implementation
package com.brainatjava.lru;
import java.util.LinkedHashMap;
public class LRUCache extends LinkedHashMap {
private int size;
private LRUCache(int size){
super(size, 0.75f, true);
this.size =size;
}
@Override
protected boolean removeEldestEntry(java.util.Map.Entry eldest) {
// TODO Auto-generated method stub
return size() > size;
}
@Override
public V get(Object key) {
// TODO Auto-generated method stub
return super.get(key);
}
public static LRUCache newInstance(int size) {
return new LRUCache(size);
}
}
No comments:
Post a Comment