Java Bytecode Assembly, Interesting concepts in the art of Computer Science/Software Development, maybe from time to time other pesky corp. related development issues.

Monday, August 9, 2010

Accessing the Hadoop web interface on AWS

Hadoop exposes its status interface over HTTP, using DNS names.
Amazon uses internal DNS names in their AWS cloud.

Therefore, should you try to access the Hadoop status interface from outside the cloud (like your office PC...), you will fail.

This guide suggests a solution (using a clever workaround) to this problem.

Problem Description

Hadoop builds its status HTML pages & links based on the current machine's hostname. This hostname is the machine's internal DNS name, which is resolvable from within the AWS cloud using the AWS internal DNS server:
cat /etc/resolv.conf 
nameserver 172.16.0.23
domain ec2.internal
search ec2.internal 
The problem arises when you try to access the status page from your development box (office, home, mobile, telepathic...). There, your machine will try to resolve ip-10-202-30-119.ec2.internal using your ISP's DNS server, which will obviously fail.
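To see the failure for yourself, here is a minimal Java sketch (the class and method names are my own, purely illustrative) that asks the local resolver for a host name. Run from outside AWS, the internal name from above is expected to come back as unresolvable.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ResolveCheck {
    /** Returns true if the local resolver can turn the host name into an address. */
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the internal name used in this post; pass another name to try it.
        String host = args.length > 0 ? args[0] : "ip-10-202-30-119.ec2.internal";
        System.out.println(host + " resolvable: " + canResolve(host));
    }
}
```

Run the same class on a machine inside the cloud and the answer should flip, since that machine asks the AWS internal DNS server.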

Solution

The solution to the above problem is to combine several cool tricks, at the base of which stands a SOCKS5 proxy. Let's get started:

  1. Install Firefox 3 http://www.mozilla.com/firefox/
  2. FoxyProxy Standard is a proxy management add-on for Firefox; install it from https://addons.mozilla.org/en-US/firefox/addon/2464/
  3. Configure FoxyProxy as follows:

    1. Right-click the FoxyProxy icon and choose Options.

    2. Choose "Add New Proxy"

      General
      Proxy Name = AWS Internal

      Proxy Details
      Select * Manual proxy configuration
      Host or IP Address = localhost Port = 6666
      [v] SOCKS proxy? (x) SOCKS v5

      URL Patterns
      Create 2 patterns: *ec2.internal*, *domu*.internal*
      * Tip: You might want to consider a 3rd pattern if you are, for example, in Israel... *pandora.com*
    3. Make sure you have selected "Use proxies based on their pre-defined patterns and priorities" from the FoxyProxy options menu.
  4. This step differs for Windows & Linux users
    Windows
    //TODO// Configure PuTTY with dynamic tunneling.

    Linux



    nohup ssh -CqTnN -D 6666 ubuntu@ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com

OK... now try to access http://ip-10-202-30-119.ec2.internal:50070/dfshealth.jsp and you should get a nice GUI that lets you browse your HDFS file system.

How it works


You are creating a SOCKS proxy tunneled over SSH. On one hand, your ssh client accepts SOCKS5 proxy requests on port 6666 (localhost) and forwards them to the host inside the AWS cloud. On the other hand, Firefox is set (using FoxyProxy) to send requests matching the *pattern* of internal host names inside the AWS cloud through that proxy. So what happens is the following:

Firefox: "Hmmm, I need to do an HTTP GET for ip-10-202-30-119.ec2.internal. Who should I be talking to? Oh! I know, FoxyProxy tells me I should delegate this to my friend, the SOCKS5 proxy at localhost:6666. Cool."
Firefox: Hey localhost:6666 (socks5 proxy), do me a favor, get me the content for ip-10-202-30-119.ec2.internal:50070
localhost:6666: Hmmm, ok.
localhost:6666 -> ssh -> ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com: HTTP GET http://ip-10-202-30-119.ec2.internal:50070
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com: Hey, I know him. let me do a quick http get on his ass
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com: HTTP GET 10.202.30.119:50070
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com -> ssh -> localhost:6666 -> Firefox

This completes the cycle. At this point (I skipped a few steps for clarity) you, the user, can see the content of the HTTP GET response.

Mission accomplished.
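For the curious, the same round trip can be reproduced without a browser. Below is a minimal Java sketch, assuming the ssh tunnel from the Linux step is listening on localhost:6666 and a Java 9+ runtime; the class and method names are made up for illustration. It hands the internal host name to the SOCKS proxy unresolved, so (as far as I can tell from Java's SOCKS5 support) the name lookup happens on the far side of the tunnel, just like in the dialogue above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class SocksFetchSketch {
    /** Describes the local end of the "ssh -D" tunnel as a SOCKS proxy. */
    static Proxy socksProxy(String host, int port) {
        return new Proxy(Proxy.Type.SOCKS, new InetSocketAddress(host, port));
    }

    /** Builds a bare-bones HTTP/1.0 GET request for the given host and path. */
    static String buildGet(String host, int port, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + ":" + port + "\r\n"
             + "\r\n";
    }

    /** Sends the GET through the proxy and returns the raw response (headers and body). */
    static String fetch(Proxy proxy, String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(proxy)) {
            // Unresolved on purpose: the name is passed to the proxy, not looked up locally.
            socket.connect(InetSocketAddress.createUnresolved(host, port), 10_000);
            OutputStream out = socket.getOutputStream();
            out.write(buildGet(host, port, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Proxy tunnel = socksProxy("localhost", 6666);
        System.out.println(fetch(tunnel, "ip-10-202-30-119.ec2.internal", 50070, "/dfshealth.jsp"));
    }
}
```

HTTP/1.0 is used only to keep the sketch short: the server closes the connection after the response, so reading until end of stream is enough.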

Note that other DNS addresses (not from the AWS EC2 cloud) are not served over the SOCKS proxy. For example, if you go to http://google.com the browser connects directly, which is exactly what you want to happen, because traffic that goes via the proxy has its costs (bandwidth, latency).
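The proxy-or-direct decision is just wildcard matching on the URL. Here is a rough Java sketch of the idea, not FoxyProxy's actual code: the class and method names are hypothetical, case-insensitive matching is my assumption, and the domU host name in the usage note is a made-up example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ProxyPatternSketch {
    /** Turns a wildcard pattern ('*' = any run of characters) into a regex. */
    static Pattern wildcardToRegex(String wildcard) {
        // Quote the literal pieces and glue them back together with ".*".
        String regex = Arrays.stream(wildcard.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    // The two patterns configured in the FoxyProxy step.
    static final List<Pattern> AWS_PATTERNS = List.of(
            wildcardToRegex("*ec2.internal*"),
            wildcardToRegex("*domu*.internal*"));

    /** True if the URL matches any pattern and should go through the tunnel. */
    static boolean useProxy(String url) {
        return AWS_PATTERNS.stream().anyMatch(p -> p.matcher(url).matches());
    }

    public static void main(String[] args) {
        System.out.println(useProxy("http://ip-10-202-30-119.ec2.internal:50070/dfshealth.jsp"));
        System.out.println(useProxy("http://google.com/"));
    }
}
```

With these patterns the Hadoop URL goes through the tunnel and google.com does not; a name such as http://domU-12-31-39-00-00-01.compute-1.internal:50030/ would be caught by the second pattern.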


Hope you enjoyed this, comments (as always) are welcome.
Maxim.

Saturday, July 31, 2010

Ubuntu perfect setup for Java Developers

Tuesday, July 13, 2010

MySQL General Query Log Filtering using grep and regular expressions

MySQL has a very useful feature called the General Query Log which, among other things, lets you debug your application code by viewing the DML statements your application submits, live, straight at the DB back-end (low level, the best!).

To configure this little joy you will need to play a bit with your mysql server settings.

Edit your my.cnf file (on Debian that is /etc/mysql/my.cnf) and in the section of
[mysqld]
add
log=/var/log/mysql/query.log
This is explained in greater detail in this post.


Now the thing is, the query log, as the name suggests, logs everything that has an SQL smell on it, and I do mean EVERYTHING (besides CRUD operations you will see commits/rollbacks, show tables, use DB-XYZ and co.). That is mostly junk when the only thing you really need is to trace your Hibernate-based Console save to get the back-end integration going... To summarize in a catchphrase: you want to filter out the log spam when debugging your application (vs. your replication ;).

To be more efficient and be able to look at the logs in real time, I've created a quick and dirty regex. The following chain of piped greps will leave you with only INSERT, UPDATE and DELETE statements, which is exactly what I was after (your mileage might vary).

One would use something like this:
tailf /var/log/mysql/query.log | grep --line-buffered -E '[[:space:]]+[[:digit:]]+[[:space:]]Query' | grep -ivE 'Query([[:space:]])+(/\*.*\*/)?(select|set|show|commit|rollback|use)'

The command works by continuously (tailf) pushing events from the MySQL query.log through the two greps, which do some smart filtering to leave you with the interesting part. The --line-buffered flag (GNU grep) makes the first grep flush each line as it arrives instead of holding it back in the pipe buffer. If you need other commands (SELECT for example), remove them from the second grep's list. If you want to see just a single table, add a third grep to the chain and filter by table name.
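If you would rather do the filtering inside a Java tool, the same two-step logic ports directly to java.util.regex. This is a minimal sketch: the class name is my own and the sample lines only imitate the general query log layout.

```java
import java.util.regex.Pattern;

final class QueryLogFilterSketch {
    // First grep: keep only lines that carry a "<connection id> Query" marker.
    private static final Pattern QUERY_LINE = Pattern.compile("\\s+\\d+\\sQuery");

    // Second grep (-i -v): drop statements that start with one of these keywords.
    private static final Pattern NOISE = Pattern.compile(
            "Query\\s+(/\\*.*\\*/)?(select|set|show|commit|rollback|use)",
            Pattern.CASE_INSENSITIVE);

    /** True for the lines the grep chain would let through. */
    static boolean isInteresting(String line) {
        return QUERY_LINE.matcher(line).find() && !NOISE.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] sample = {
            "\t\t   42 Connect\troot@localhost on test",
            "\t\t   42 Query\tselect 1",
            "\t\t   42 Query\tINSERT INTO users VALUES (1)",
            "\t\t   42 Query\tCOMMIT",
        };
        for (String line : sample) {
            if (isInteresting(line)) {
                System.out.println(line);
            }
        }
    }
}
```

Of the four sample lines only the INSERT survives, which is the same result the grep chain gives.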

Comments are welcome.

Enjoy,
Maxim.

Saturday, June 19, 2010

Eclipse 3.6 Helios - Final version available for download

I don't know what's so special about this build from the Eclipse Foundation, but for some reason the excitement is high.

Well friends, without further ado I give you the download location of the final release, 4 days ahead of the official release schedule.

Check it out here: http://mirrors.xmission.com/eclipse/technology/epp/downloads/release/helios/R/

Enjoy.

Thursday, April 1, 2010

In the works: an automatic User-Agent detection algorithm

I'm thinking about writing a tool that will extract information from User-Agent strings.

In the meantime, here is a fairly comprehensive list of resources I've been able to find around the net that can assist me in this task.


Tons of links for known User-Agent strings

DB:
http://www.useragentstring.com/
http://www.user-agents.org/index.shtml?moz
http://www.icehousedesigns.com/useragents/spiderlist.php
http://www.botsvsbrowsers.com/
http://useragentstring.com/pages/useragentstring.php

Auto detection tools:
http://nerds.palmdrive.net/useragent/code.html
http://stackoverflow.com/questions/1005153/auto-detect-mobile-browser-via-user-agent
http://mdbf.codeplex.com/
http://detectmobilebrowser.com/
http://stackoverflow.com/questions/2508214/regexp-that-matches-user-agents-of-end-user-browsers-but-not-crawlers-with-90
http://stackoverflow.com/questions/927552/parsing-http-user-agent-string
http://www.djangosnippets.org/snippets/267/
http://browsers.garykeith.com/downloads.asp
http://user-agent-string.info/
http://pypi.python.org/pypi/httpagentparser
http://www.handsetdetection.com/
http://www.tnl.net/ua/OS/Cross-Platform
http://api.jquery.com/jQuery.browser/#jQuery.browser.version2
http://www.quirksmode.org/js/detect.html

INFO:
http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
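As a starting point for such a tool, here is a very rough Java sketch that pulls the Name/Version product tokens out of a User-Agent string. Everything in it is an assumption on my side: the class name, the regex, and the sample string. It ignores the parenthesised comment part entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UserAgentSketch {
    // A product token looks like Name/Version, e.g. Firefox/3.6.8.
    private static final Pattern PRODUCT =
            Pattern.compile("([A-Za-z][A-Za-z0-9.-]*)/([0-9][0-9A-Za-z.]*)");

    /** Returns product name -> version, in the order they appear in the string. */
    static Map<String, String> productTokens(String userAgent) {
        Map<String, String> tokens = new LinkedHashMap<>();
        Matcher m = PRODUCT.matcher(userAgent);
        while (m.find()) {
            tokens.put(m.group(1), m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) "
                  + "Gecko/20100723 Ubuntu/10.04 (lucid) Firefox/3.6.8";
        System.out.println(productTokens(ua));
    }
}
```

A real detector needs much more than this (the comment section carries the OS and device hints), which is where the resources above come in.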


If you know any other resources that can help please post in comments.
I will update this post as soon as I have some news on the subject.

Thank you,
Maxim.

Wednesday, January 20, 2010

Resetting cached password of Subclipse

I was asked to document a FAQ about Subclipse and cached passwords (my turn to play with the IT hat!). Here is the email (kept here for future reference).



Hello Friends,

The Eclipse Subversion plugin (Subclipse) caches the username & password of the last user who did a successful commit with your Eclipse installation.
This can be problematic if that user is not you: until further notice, all your svn commits will go in under a different username.

To resolve this issue, delete the following 2 folders:
rm -rf /opt/eclipse3.3/configuration/org.eclipse.core.runtime/.keyring
rm -rf ~/.subversion/
Then restart eclipse.

An alternative solution would be to kindly ask the busy IT department to change the svn password of the user under which your commits are being made. On the first commit attempt, Eclipse should then prompt you for a username & password, which will allow you to enter your own.

Reference: http://nlp.cs.byu.edu/~rah67/wordpress/?p=4

Maxim.

Sunday, January 17, 2010

Running Average calculation implementation in Java

Quick blog-and-store note:

Here is code to calculate the average of a series of length N without the overflow problem that a naive average calculation (sum everything, then divide) could face.




/**
* Copyright (c) 2010, Maxim Veksler 
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without modification, 
* are permitted provided that the following conditions are met: 
* 
* Redistributions of source code must retain the above copyright notice, this list
* of conditions and the following disclaimer.
* 
* Redistributions in binary form must reproduce the above copyright notice, 
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution. Neither the name
* of the http://bytecoded.blogspot.com/ nor the names of its contributors
* may be used to endorse or promote products derived from this software 
* without specific prior written permission.
* 
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/

/**
* Calculate the average of an arbitrary-length series of doubles. The running
* average never builds the full sum, so the result won't overflow.
*
* Algorithm: CELLCOUNT is the number of cells processed so far, including the
* current one.
* For each cell:
*  avg = (((CELLCOUNT-1)/CELLCOUNT) * avg) + ((1/CELLCOUNT) * cell)
*
* @param series vararg list of parameters (array of doubles).
* @return Average calculated from the supplied series, or 0 if it is empty.
*/
public static double movingAverage(double... series) {
  if (series.length < 1) {
    return 0;
  }

  double avg = series[0];
  for (int i = 1; i < series.length; i++) {
    // i cells are already averaged; series[i] is cell number i+1.
    avg = (avg * ((double) i / (i + 1))) + (series[i] * (1.0 / (i + 1)));
  }

  return avg;
}
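To see the overflow problem in action, here is a small self-contained comparison. The naiveAverage helper and the class name are mine, added only for contrast; movingAverage is repeated so the sketch compiles on its own.

```java
final class RunningAverageDemo {
    /** Naive approach: sum everything, then divide. The sum itself can overflow. */
    static double naiveAverage(double... series) {
        double sum = 0;
        for (double value : series) {
            sum += value;
        }
        return series.length == 0 ? 0 : sum / series.length;
    }

    /** The running-average update from this post. */
    static double movingAverage(double... series) {
        if (series.length < 1) {
            return 0;
        }
        double avg = series[0];
        for (int i = 1; i < series.length; i++) {
            avg = (avg * ((double) i / (i + 1))) + (series[i] * (1.0 / (i + 1)));
        }
        return avg;
    }

    public static void main(String[] args) {
        double big = Double.MAX_VALUE;
        System.out.println(naiveAverage(big, big));   // Infinity
        System.out.println(movingAverage(big, big));  // 1.7976931348623157E308
    }
}
```

The naive version overflows the moment the sum passes Double.MAX_VALUE, while the running version only ever holds values no larger than the biggest element. The price is a little rounding error on every step, so for ordinary inputs expect the result to match the naive average only up to floating-point precision.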

About Me

Tel Aviv, Israel
I work in one of Israel's Hi-Tech companies. I do 5 nines system development with some system administration. This blog will focus on the technical side of things, with some philosophical strings attached.