Showing posts with label Redis. Show all posts
Showing posts with label Redis. Show all posts

Sunday, October 23, 2016

Bloom Filters By Example

In this post we will discuss about bloom filter and its use case.Let's create a scenario like this first.
Assume there is cycle stand in our college.And the stand has 1000 slots for parking the cycles.And usually  one slot can have 4 cycles.So definitely that stand has capacity to have 4000 cycles.And it is very well known  that Mr. Akash keeps his cycle in slot no 1 every day.

So if we want to know if Akash is present in college today,we just check slot no 1 and if there is any cycle available there , we say yes Akash is present in college.But it is not hundred percent correct.As we said above each slot can have four cycles,it may be possible that cycle present in slot no 1 may not belongs to Akash.

So here a case arrises, which is false positive.But if no cycle is there in slot no 1, we say that definitely Akash is absent today.So there is no chance of false negative.That is we never say that Akash is absent today in case of his presence in college.

Bloom filter is a simple hash based  filter works  on the same principle.It allows to store elements and help us to quickly  identify many (not all) elements those are not present.Sometimes we can say that Akash is not in the college(if no cycle is there in slot 1).


Use Case:

Suppose we are going to create an anti virus software, which will maintain the list of malicious site and a list of know viruses.A naive approach is to maintain data structure to hold the details of all the malicious programmes.A problem with this approach is that it may consume a considerable amount of memory. If you know of a million malicious programmes, and programmes  need  an average of 10 bytes to store, then you need 10 megabytes of storage. That’s definitely an overhead . Is there any efficient way?Yes there  is.

Implementation Details:


We will do two things with bloom filter
1.insert an element in the filter.
2.Test an element if it is a member of bloom filter.

Bloom filter is a probabilistic data structure,that is not deterministic.We will came to know the reason in a while.

Let's take a bit array of size m.Initialize each position to zero.The idea here is that we choose k hash functions whose max value will be within the range 0 to m.Here k is constant such that k
To test for an element (whether it is in the set), feed it to each of the k hash functions to get k array positions. If any of the bits at these positions is 0, the element is definitely not in the set. if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, then either the element is in the set, or the bits have been set by chance to 1 during the insertion of other elements, which results  false positive.

Deletion of elements are not allowed due to obvious reasons.Suppose we want to delete an element then defintely we need to set any of the  k bit positions generated by k hash functions for that element to 0.But there will be a chance that bit which we are going to set to 0 may be the result of some other elements.


To add an element in the filter simply pass that element to the k hash functions.Now we will get k different values between 0 to m-1 i.e we ll get k different array positions.Mark these positions to 1.Note that here we are not putting the exact hash value in array we are simply marking that position to 1.
As false negatives are not allowed in Bloom filter, we can't delete the elements from it.


Applications of Bloom Filter:


1.Cassandra uses bloom filters to save IO when performing a key lookup: each SSTable has a bloom filter associated with it that Cassandra checks before doing any disk seeks.
2.The Google Chrome web browser use a Bloom filter to identify malicious URLs. Any URL was first checked against a local Bloom filter, and only if the Bloom filter returned a positive result was a full check of the URL performed (and the user warned, if that too returned a positive result).


Although the Bloom Filter is a data structure ,it is called a filter because because it often used as a first pass to filter out elements of a data set that dont match a certain criteria.

Please refer wikipedia  for more applications and detailed explanations.

Saturday, April 23, 2016

Redis Pub Sub with Jedis

Within these days we were working on a project which heavily relies on Redis for its data sync process.To give  more clarity here we are briefing  the scenario.

There are two parts on the system.One is a producer which is making some cache in Redis and another is a consumer which is subscribed to the same Redis.The subscription is through a channel.The consumer is subscribed to the channel.When there is some change in the channel ,it is published to the consumer and the consumer  updates itself accordingly.Here we used Jedis as Redis  client.



In Redis, we can subscribe to multiple channels and when someone publishes messages on those channels, Redis notifies us  with published messages. Jedis provides this functionality with JedisPubSub abstract class. To handle pub / sub events, we need to extend JedisPubSub class and implement the abstract methods.



package com.brainatjava.test;
public class xyzListener extends JedisPubSub {   @Override
    public void onMessage(String channel, String message) {
    System.out.println(message);
}

Now we wrote the code for registering listener in a different class.

public class TestProgram {
private void registerListeners() {
        if(!xyzListener.isSubscribed()){
                Runnable task=()->{       
                try{
                    Jedis jdeisConnection=    ((Jedis) redisTemplate.getConnectionFactory().getConnection().getNativeConnection());
                    jedisConnectionExceptionFlag=false;
                    jdeisConnection.subscribe(lineItemDeliveryListener, "channelName");
                      
                }
                catch(JedisConnectionException jce){
                    logger.error("got jedis connection excpetion "+jce.getMessage(),jce);
                    jedisConnectionExceptionFlag=true;
                }
                catch(RedisConnectionFailureException rce){
                    logger.error("got jedis RedisConnectionFailureException excpetion "+rce.getMessage(),rce);
                    jedisConnectionExceptionFlag=true;
                }
                catch(Exception e){
                logger.error("error in registerListeners "+xyzListener.isSubscribed(),e);
            }};
           
            Thread xyzUpdater = new Thread(task);
            xyzUpdater.setName("xyzUpdater");
            xyzUpdater.start();
        }
    }

}
Here if we notice the above code ,we found that  first we are checking is xyzListener is subscribed to the required channel ,if not we are doing it.But Observe that we are doing it in a hacky way.That is we are getting the native connection first and then we are subscribing to the channel.And the subscribe is a blocking call.It act in wait and watch mode.

So when there is a change in the channel the listener listens it and update its cache accordingly.But There is always should have a fail safe mechanism in place.We have also done that.If somehow Jedis pubsub is not working,then we have a mechanisim in place to do it manually.

The mechanism is like we have a cron scheduler running in every 15 minutes and    checking the cache timestamp and if the cache timestamp of the latest cache and the timestamp of the consumer varies, then we assume there is issue with pubsub and we will update the consumer cache manually.

With this design everything was fine and  the 15 minutes cron was their without any use.After the smooth working of some days we got an alert that the 15 minute cron is running and manually updating the cache.So it has no impact on our service as the cache is getting updated manually with the help of cron scheduler.

But why this happened?After investigetting sometimes we found before some  days the redis was restarted.And this created the whole issue.It is behaving like it is subscribed.Now you know the solution.