Everything Code

Java Bytecode Assembly, interesting concepts in the art of Computer Science/Software Development, and maybe, from time to time, other pesky corporate development issues.

Tuesday, October 18, 2011

Chrome and Java applet plugin on Ubuntu

To enable the Java applet plugin, run:

sudo ln -sf  /usr/lib/jvm/java-6-sun/jre/lib/amd64/libnpjp2.so     /usr/lib/firefox-addons/plugins/libnpjp2.so
sudo ln -sf  /usr/lib/jvm/java-6-sun/jre/lib/amd64/libnpjp2.so      /usr/lib/mozilla/plugins/libnpjp2.so
mkdir ~/.mozilla/plugins
cd ~/.mozilla/plugins/
ln -s /usr/lib/firefox-addons/plugins/libnpjp2.so

Monday, January 17, 2011

Collection of noSQL presentations and terms

Some great noSQL resources

A book club for noSQL papers, http://nosqlsummer.org/papers, with difficulty ratings and local meetings everyone can attend.

noSQL concepts that you really do want to understand: Bloom Filter, Vector Clocks, Gossip Protocol, Dynamo, Paxos, MapReduce, CAP, Eventual Consistency, Column Storage, Consistent Hashing, Hinted Handoff, Read Repair, Write-Ahead Log. Also make sure you know what Hadoop, GFS, Cassandra and MongoDB are.
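
As a quick taste of the list above, a Bloom filter is small enough to sketch in a page of Java. The class below is just an unoptimized illustration (SimpleBloomFilter is an arbitrary name, not taken from any of the projects mentioned): k hashed positions per key over a bit array, which gives possible false positives but never false negatives.

import java.util.BitSet;

// Minimal Bloom filter sketch: k hash positions per key over a fixed-size bit array.
// False positives are possible, false negatives are not.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th index from two hash values (double hashing); the "| 1"
    // avoids the degenerate case where the second hash is zero.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.abs((h1 + i * h2) % size);
    }

    public void add(String key) {
        for (int i = 0; i < hashCount; i++)
            bits.set(index(key, i));
    }

    // "false" means definitely not present; "true" means probably present.
    public boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++)
            if (!bits.get(index(key, i)))
                return false;
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        filter.add("cassandra");
        System.out.println(filter.mightContain("cassandra")); // true
        System.out.println(filter.mightContain("mongo"));     // almost certainly false
    }
}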

Lectures:

Wednesday, January 5, 2011

Recursive delete utility for version enabled S3 bucket

Deleting an S3 bucket that contains anything more than the most trivial collection of keys (>1000) is a never-ending saga. Deleting an S3 bucket that has versioning enabled is, with today's tools, simply not possible.

We had 2 buckets that were taking up storage space: the first had ~300,000,000 keys, the second ~2500 but with versioning enabled and keys that contain "\n" (new line), "//" (double slash) and " " (space) in their names. Trying to use tools such as JetS3t, boto, Bucket Explorer, the AWS S3 console or S3Fox on either of these buckets failed in the most pathetic/exotic of ways.

I quickly understood that this requires a custom implementation.

Doing some googling I found this excellent post, http://mark.koli.ch/2010/09/recursively-deleting-large-amazon-s3-buckets.html, providing pseudo-code for a multi-threaded delete operation against S3. Good, the right post at the right time! Based on Mark's code I've implemented BucketDestroy, whose sole purpose is to do everything it takes to delete an S3 bucket; a seek & destroy S3 bucket commando, if you like.

I've enhanced Mark's code with the ability to delete not just bucket keys but also key versions; this is required because Amazon will not let you delete an S3 bucket that still holds old versions of a key.

The code is provided below; I've also hosted it on Google Code, where you can find Maven-based setup instructions and the latest version.

Enjoy!


import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.DeleteVersionRequest;
import com.amazonaws.services.s3.model.S3VersionSummary;
import com.amazonaws.services.s3.model.VersionListing;


/**
 * Efficient delete of a version-enabled bucket.
 * 
 * This code deletes all keys from the given bucket, including all previous versions of each key.
 * The work is done in a multi-threaded ExecutorService environment.
 * 
 * Launching:
 * 
 * $java -Daws.key=AK7895IH1234X2GW12IQ -Daws.secret=1234567890123456789012345678901234456789 BucketDestroy "bucketName"

 * @author Maxim Veksler <maxim@vekslers.org>
 * @author Mark S. Kolich http://mark.koli.ch/2010/09/recursively-deleting-large-amazon-s3-buckets.html
 * 
 * @version 0.1 - First pseudo code version, Written by Mark S. Kolich. 
 * @version 0.2 - Maxim Veksler - Fix paging. Implement deletion of versioned keys.
 * 
 * For the latest version see http://code.google.com/p/bucket-destroy/
 */
public class BucketDestroy {
    private static final String AWS_ACCESS_KEY_PROPERTY = "aws.key";
    private static final String AWS_SECRET_PROPERTY = "aws.secret";
    private static AmazonS3Client s3;
    
    public static void main(String[] args) {

        final String accessKey = System.getProperty(AWS_ACCESS_KEY_PROPERTY);
        final String secret = System.getProperty(AWS_SECRET_PROPERTY);
        if (accessKey == null || secret == null) {
            System.err.println("You're missing the -Daws.key and -Daws.secret required VM properties.");
            System.exit(100);
        }

        if (args.length < 1) {
            System.err.println("Missing required program argument: bucketName.");
            System.exit(100);
        }
        
        // S3 Initialization
        s3 = new AmazonS3Client(new BasicAWSCredentials(accessKey, secret));


        // Instantiate BucketDestroy and call destroyBucket() - Bucket deleting beast!
        new BucketDestroy().destroyBucket(args[0]);
    }

    /**
     * Do everything it takes to delete an Amazon S3 bucket.
     * 
     * Currently this means deleting all versions of all bucket keys.
     * 
     * TODO: 
     *  - Support distributed operation by doing shuffle on first level of bucketName/../
     *  - Seek a method to obtain MFA to delete MFA enabled bucket
     *  - Implement starvation prevention for deleting threads (do not wait until next s3.listVersions finishes)
     */
    public void destroyBucket(final String bucketName) {
        // Set up a new thread pool to delete 20 objects at a time.
        ExecutorService _pool = Executors.newFixedThreadPool(20);

        // Counter, just for status reporting
        final AtomicInteger ATOMIC_INTEGER = new AtomicInteger();
    
        // List all key versions in the bucket
        VersionListing versionListing = s3.listVersions(bucketName, "");
        List<S3VersionSummary> versionSummaries = versionListing.getVersionSummaries();
        while(versionSummaries != null && versionSummaries.size() > 0) {
            final CountDownLatch latch = new CountDownLatch(versionSummaries.size());
            for(final S3VersionSummary objectSummary : versionSummaries) {
                _pool.execute(new Runnable() {
                    @Override
                    public void run() {
                        String keyName = null;
                        String versionId = null;
                        try {
                            keyName = objectSummary.getKey();
                            versionId = objectSummary.getVersionId();
                            
                            String info = ">>>>   INFO: " + ATOMIC_INTEGER.incrementAndGet() + " deleting: (" + bucketName + "/" + keyName + "@" + versionId + ")";
                            System.out.println(info);
                            
                            s3.deleteVersion(new DeleteVersionRequest(bucketName, keyName, versionId));
                        } catch (Exception e) {
                            String err = ">>>> FAILED delete: (" + bucketName + "/" + keyName + "@" + versionId + ")";
                            System.err.println(err);
                        } finally {
                            latch.countDown();
                        }
                    }
                });
            }
            
            // After submitting the current batch of delete tasks we block until the latch reaches zero;
            // this keeps us from over-populating the ExecutorService task queue. 
            try {
                latch.await();
            } catch (Exception exception) {
            }

            // Page to the next batch of S3 key versions...
            versionListing = s3.listNextBatchOfVersions(versionListing);
            versionSummaries = versionListing.getVersionSummaries();
            
        }
        
        _pool.shutdown();

        try {
            // This code fails with HTTP 400 from aws API 
//            SetBucketVersioningConfigurationRequest setBucketVersioningConfigurationRequest = new SetBucketVersioningConfigurationRequest(bucketName, new BucketVersioningConfiguration().withStatus(BucketVersioningConfiguration.OFF));
//            s3.setBucketVersioningConfiguration(setBucketVersioningConfigurationRequest);
            // So we do this:
            s3.deleteBucket(bucketName);
            System.out.println(">>>>   INFO: bucket " + bucketName + " deleted!");
        } catch (Exception e) { 
            System.err.println("Failed to ultimately delete bucket: " + bucketName);
            e.printStackTrace();
        }

    }
    
}

Monday, August 9, 2010

Accessing hadoop web interface on AWS

Hadoop exposes its status interface via HTTP, using DNS names.
Amazon uses internal DNS names inside the AWS cloud.

Therefore, should you try to access the Hadoop status interface from outside the cloud (say, from your office PC), you will fail.

This guide suggests a solution (using a clever workaround) to this problem.

Problem Description

Hadoop builds its status HTML pages & links based on the current machine's hostname. This hostname is the machine's internal DNS name, which is resolvable from within the AWS cloud using the AWS internal DNS server:
cat /etc/resolv.conf 
nameserver 172.16.0.23
domain ec2.internal
search ec2.internal 
The problem arises when you try to access the status page from your development box (office, home, mobile, telepathic...): your machine will try to resolve ip-10-202-30-119.ec2.internal using your ISP's DNS server, which will obviously fail.
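
You can reproduce this with a couple of lines of throwaway Java (just an illustrative sketch, using the example hostname above):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    public static void main(String[] args) {
        String host = "ip-10-202-30-119.ec2.internal";
        try {
            // Inside the AWS cloud this resolves to the private 10.x address;
            // from an office/home machine the lookup fails.
            InetAddress address = InetAddress.getByName(host);
            System.out.println(host + " -> " + address.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println("Cannot resolve " + host + " - you are outside the cloud.");
        }
    }
}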

Solution

The solution to the above problem is a combination of several cool tricks, at the base of which stands a SOCKS5 proxy. Let's get started:

  1. Install Firefox 3 http://www.mozilla.com/firefox/
  2. FoxyProxy Standard is a proxy management extension for Firefox; install it from https://addons.mozilla.org/en-US/firefox/addon/2464/
  3. Configure FoxyProxy as follows:



    1. Right-click on the FoxyProxy icon and choose Options

    2. Choose "Add New Proxy"

      General
      Proxy Name = AWS Internal

      Proxy Details
      Select * Manual proxy configuration
      Host or IP Address = localhost Port = 6666
      [v] SOCKS proxy? (x) SOCKS v5

      URL Patterns
      Create 2 patterns: *ec2.internal*, *domu*.internal*
      * Tip: You might want to consider a 3rd pattern if you are, for example, from Israel... *pandora.com*
    3. Make sure you have selected "Use proxies based on their pre-defined patterns and priorities" from the FoxyProxy options menu.
  4. This step differs for Windows & Linux users
    Windows
    //TODO// Configure PuTTY with dynamic tunneling.

    Linux



    nohup ssh -CqTnN -D 6666 ubuntu@ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com

OK... now try to access http://ip-10-202-30-119.ec2.internal:50070/dfshealth.jsp - you should get a nice GUI that lets you browse your HDFS file system.

How it works


Using the SSH protocol, you are creating a SOCKS proxy tunneled over SSH. On one hand, your SSH client accepts SOCKS5 proxy requests on port 6666 (localhost) and forwards them to the host inside the AWS cloud. On the other, Firefox is configured (via FoxyProxy) to send any request whose URL matches the internal AWS host-name patterns through that proxy. So what happens is the following:

Firefox: "Hmmm, I need to do HTTP GET for ip-10-202-30-119.ec2.internal" - Who should I be talking to? OH! I know, foxyproxy tells me I should delegate this to my friends, the socks5 proxy at localhost:6666, cool.
Firefox: Hey localhost:6666 (socks5 proxy), do me a favor, get me the content for ip-10-202-30-119.ec2.internal:50070
localhost:6666: Hmmm, ok.
localhost:6666 -> ssh -> ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com: HTTP GET http://ip-10-202-30-119.ec2.internal:50070
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com: Hey, I know him. Let me do a quick HTTP GET on his ass
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com: HTTP GET 10.202.30.119:50070
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com -> ssh -> localhost:6666 -> Firefox

This completes the cycle. At this point (I've skipped a few steps for clarity) you, the user, can see the content of the HTTP GET response.

Mission accomplished.
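
By the way, the same trick works outside the browser too. Below is a rough sketch (not part of any tool mentioned here) that uses the standard java.net Proxy API to route a request through the local SOCKS5 proxy; it assumes the ssh -D 6666 tunnel from step 4 is already up, reuses the example hostname from above, and leaves resolving the internal name to the remote end of the tunnel.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class SocksFetch {
    public static void main(String[] args) throws Exception {
        // Route the connection through the local SOCKS5 proxy opened by "ssh -D 6666 ...".
        Proxy socks = new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("localhost", 6666));
        URL url = new URL("http://ip-10-202-30-119.ec2.internal:50070/dfshealth.jsp");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection(socks);

        // Dump the returned HTML to stdout.
        BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
    }
}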

Note that other DNS addresses (not from the AWS EC2 cloud) are not served over the SOCKS proxy; for example, if you go to http://google.com the browser hits it directly - which is exactly what you want to happen, because traffic that goes via the proxy has its costs (bandwidth, latency).


Hope you enjoyed this, comments (as always) are welcome.
Maxim.

Saturday, July 31, 2010

Ubuntu perfect setup for Java Developers

Tuesday, July 13, 2010

MySQL General Query Log Filtering using grep and regular expressions

MySQL has a very useful feature called the General Query Log, which allows you, among other things, to debug your application code by viewing DB DMLs submitted live from the application straight to the DB back-end (low level, the best!).

To configure this little joy you will need to play a bit with your MySQL server settings.

Edit your my.cnf file (under Debian that is /etc/mysql/my.cnf) and, in the
[mysqld]
section, add
log=/var/log/mysql/query.log
This is explained in greater detail in this post.


Now, the thing is that the query log, as the name suggests, logs everything that has an SQL smell on it, and I do mean EVERYTHING (besides CRUD operations you will see commits/rollbacks, SHOW TABLES, USE DB-XYZ and co.) - which is mostly junk when the only thing you really need is to trace your Hibernate-based console save to get the back-end integration going... To summarize in a catchphrase: you want to filter out the log spam when debugging your application (vs. your replication ;).

To be more efficient and able to look at the logs in real time, I've created a quick and dirty regex. The following chain of piped greps will leave you with only INSERT, UPDATE and DELETE statements, which is exactly what I was after (your mileage may vary).

One would use something like this:
tailf /var/log/mysql/query.log | grep -E '[[:space:]]+[[:digit:]]+[[:space:]]Query' | grep -ivE 'Query([[:space:]])+(/\*.*\*/)?(select|set|show|commit|rollback|use)'

The command works by constantly (tailf) pushing events from the MySQL query.log through the two greps, which do some smart-ass filtering to leave you with just the interesting part. If you want to see other commands (SELECT for example), remove them from the second grep's exclusion list. If you want to watch just a single table, add a third grep to the chain and filter by table name.
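
If you prefer to do the filtering in code rather than in a grep chain, below is a rough Java equivalent of the same two-stage filter - a quick sketch whose patterns simply mirror the greps above (QueryLogFilter is an arbitrary name). Pipe the log into it, e.g. tailf /var/log/mysql/query.log | java QueryLogFilter

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.regex.Pattern;

public class QueryLogFilter {
    public static void main(String[] args) throws Exception {
        // Keep only "Query" lines, mirroring the first grep.
        Pattern queryLine = Pattern.compile("\\s+\\d+\\s+Query");
        // Drop SELECT/SET/SHOW/COMMIT/ROLLBACK/USE, mirroring the second (inverted) grep.
        Pattern noise = Pattern.compile(
                "Query\\s+(/\\*.*\\*/)?\\s*(select|set|show|commit|rollback|use)",
                Pattern.CASE_INSENSITIVE);

        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            if (queryLine.matcher(line).find() && !noise.matcher(line).find()) {
                System.out.println(line); // INSERT / UPDATE / DELETE statements remain
            }
        }
    }
}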

Comments are welcome.

Enjoy,
Maxim.

Saturday, June 19, 2010

Eclipse 3.6 Helios - Final version available for download

I don't know what's so special about this build from the eclipse.org foundation, but for some reason the excitement is high.

Well friends, without further ado, I give you the path to the final release download location 4 days before the official release schedule.

Check it out from here http://mirrors.xmission.com/eclipse/technology/epp/downloads/release/helios/R/

Enjoy.

About Me

Tel Aviv, Israel
I work at one of Israel's hi-tech companies. I do five-nines system development with some system administration. This blog will focus on the technical side of things, with some philosophical strings attached.