Monday, May 27, 2013

Scala Highlights

I went to a Scala meetup which was about introducing features of Scala to Java developers. The speaker was Dr. Venkat Subramaniam and his talk was very informative, easy to follow, and engaging. He showed examples of how to do common tasks in Java and then showed how it is better with Scala.

It was recorded so I'll link to the video when it gets posted. Some of my notes:

Scala is hybrid functional: you have the option of functional programming or Java-like imperative programming. Functional programming is beautiful because it lets us focus on telling the computer what to do instead of exactly how to do it.

Scala is more statically typed than Java, yet less verbose about it: the Scala compiler does type inference, so you don't need to specify the type explicitly (although there are circumstances when the compiler cannot infer the type). This inference happens entirely at compile time.

The concept of ceremony: ceremony is basically all the extra code you need to write in order to do what you want. This includes getters, setters, and other boilerplate. Java has high ceremony. Imagine trying to explain "hello world" in Java to a newbie: explaining each of the parts of "public static void main(String args[])" would take quite a while. Scala automatically creates classes, main methods, and getters/setters so you can write less code. Whenever you need to use the IDE to generate code, it is usually a language smell.

The Java book "Effective Java" recommends explicitly declaring everything final if the reference is never going to change. It is very easy to find all the places where you put final but very hard to find all the places where you forgot to put final. Scala addresses this with the val keyword: every declaration is either a val or a var, so you have to think about immutability when declaring. Also, method parameters are always immutable in Scala. Because Effective Java played such a huge part in improving the quality of Java code, there are many examples of Scala taking concepts from that book and implementing them directly as part of the language.
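
To make the "final everywhere" discipline concrete, here is a minimal Java sketch of what it asks of you by hand. The class and method names (FinalDemo, addNote, total) are made up for illustration and are not from the talk. Note that final only fixes the reference, not the contents of the object it points to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; names are illustrative only.
class FinalDemo {
    // A final field is assigned exactly once and can never be rebound.
    private final List<String> notes = new ArrayList<>();

    // Java parameters are reassignable by default; final opts out.
    void addNote(final String note) {
        // note = "other";   // would not compile: note is final
        notes.add(note);     // allowed: final fixes the reference, not the list contents
    }

    int noteCount() {
        return notes.size();
    }

    static int total(final int[] values) {
        int sum = 0;         // intentionally not final: it is reassigned in the loop
        for (final int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        final FinalDemo demo = new FinalDemo();
        demo.addNote("every one of these finals had to be typed by hand");
        System.out.println(demo.noteCount());
        System.out.println(total(new int[] {1, 2, 3}));
    }
}
```

Every final above is opt-in, which is exactly why a forgotten one is hard to spot.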

Scala IDE/Eclipse has a cool REPL called Scala worksheet that helps quickly evaluate and test code.

Sunday, May 26, 2013

Piping to diff

The usual use of the Linux diff program is to diff two files:
diff file1.txt file2.txt

But what if I wanted to diff one file with output generated by a program instead of with another file? Well, I could send the output of the program to a file and then diff the two files, but there is a much more efficient way:

./program | diff file1.txt -

This will diff the output of program with file1.txt. The - tells diff to read that input from standard input.

Now what if I don't have any files but just want to diff the outputs of two programs?

The solution is process substitution (a bash feature):
diff <(./command1) <(./command2)

Much nicer than creating intermediate files.

Friday, May 24, 2013

Design Pattern: Facade

Let's say that I want to make a PB&J sandwich. I have bread, peanut butter and jelly. Now I can take two slices of bread, put peanut butter on one slice and jelly on the other. Then I can put the two slices together. Great.

Now let's look at how we might do this in code. Here we have some PBJ sandwich related classes.
 public class PeanutButter implements Spreadable {...}
 public class Jelly implements Spreadable {...}
 public class Bread {...}
 public class Knife {...}
 public class Jar {...}

So a client could make a sandwich like so:
 public static void main(String[] args) {
     Bread slice1 = new Bread();
     Bread slice2 = new Bread();
     PeanutButter pb = new PeanutButter();
     Jelly jelly = new Jelly();
     ...
 }

But that's a little too detailed for me. I don't care about all the complexities. As a client, all I know is that I am hungry and want a sandwich. So let's provide a facade:
 public class PBJSandwich {
   public void make() {...}
 }

 public static void main(String[] args) {
    PBJSandwich sandwich = new PBJSandwich();
    sandwich.make();
 }

The make method is a facade because it hides underlying complexities and provides a simplified interface for the client.
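
For a version that actually compiles and runs, here is a minimal sketch of the same idea. The bodies of the classes are my own guesses, since the snippets above leave them out: the name(), spread(), and describe() methods are assumptions added for illustration, Knife and Jar are omitted to keep it short, and everything is nested in one hypothetical FacadeDemo class so it fits in a single file.

```java
// Hypothetical single-file sketch; class bodies are assumptions, not the original code.
class FacadeDemo {
    interface Spreadable {
        String name();
    }

    static class PeanutButter implements Spreadable {
        public String name() { return "peanut butter"; }
    }

    static class Jelly implements Spreadable {
        public String name() { return "jelly"; }
    }

    static class Bread {
        private Spreadable topping;

        void spread(Spreadable s) { topping = s; }

        String describe() { return "bread with " + topping.name(); }
    }

    // The facade: one simple entry point that hides the steps above.
    static class PBJSandwich {
        private Bread slice1;
        private Bread slice2;

        void make() {
            slice1 = new Bread();
            slice2 = new Bread();
            slice1.spread(new PeanutButter());
            slice2.spread(new Jelly());
        }

        String describe() {
            if (slice1 == null || slice2 == null) {
                throw new IllegalStateException("call make() first");
            }
            return slice1.describe() + " + " + slice2.describe();
        }
    }

    public static void main(String[] args) {
        // The client never touches Bread, PeanutButter, or Jelly directly.
        PBJSandwich sandwich = new PBJSandwich();
        sandwich.make();
        System.out.println(sandwich.describe());
    }
}
```

The client code shrinks to two calls, and the sandwich-building steps can change later without the client noticing.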

Thursday, May 23, 2013

HTTP Preflighting

I was aware of the same-origin policy, but am new to the concept of preflighting. Basically, when making AJAX requests across domains (meaning to domains that are different from the one the JavaScript was served from), the browser may do a preflight check first to make sure that it is okay to make the original request. This happens for requests that are not "simple", for example ones that send custom headers.

What this means is that first it does an HTTP OPTIONS request. Along with the regular headers, it will include these special ones referring to the original request:

Access-Control-Request-Method: POST
Access-Control-Request-Headers: X-PINGOTHER

The server can then look at these headers and respond with whether the request is allowed:

Access-Control-Allow-Origin: http://foo.example
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Allow-Headers: X-PINGOTHER
Access-Control-Max-Age: 1728000
This response says to allow the POST with the header X-PINGOTHER only from the http://foo.example origin. The Max-Age value lets the browser cache this preflight result for 1728000 seconds (20 days).

What I noticed is that if the original request sends any extra non-standard headers that are not allowed by the server's preflight response, the POST will not be sent.
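
The check can be modeled as a small function. This is only a rough sketch of the decision, not how any browser or server actually implements it: the class and method names (PreflightDemo, preflightAllows) are made up, and the allowed values are hard-coded from the example response above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the preflight decision; names and structure are assumptions.
class PreflightDemo {
    // Policy copied from the example Access-Control-Allow-* response headers.
    static final String ALLOWED_ORIGIN = "http://foo.example";
    static final Set<String> ALLOWED_METHODS =
            new HashSet<>(Arrays.asList("POST", "GET", "OPTIONS"));
    // Header names are case-insensitive, so they are stored in lower case.
    static final Set<String> ALLOWED_HEADERS =
            new HashSet<>(Arrays.asList("x-pingother"));

    // requestHeaders is the comma-separated Access-Control-Request-Headers value (may be null).
    static boolean preflightAllows(String origin, String requestMethod, String requestHeaders) {
        if (!ALLOWED_ORIGIN.equals(origin)) {
            return false;
        }
        if (!ALLOWED_METHODS.contains(requestMethod)) {
            return false;
        }
        if (requestHeaders == null || requestHeaders.trim().isEmpty()) {
            return true;
        }
        for (String header : requestHeaders.split(",")) {
            // A single header that is not allowed blocks the whole request.
            if (!ALLOWED_HEADERS.contains(header.trim().toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(preflightAllows("http://foo.example", "POST", "X-PINGOTHER"));
        System.out.println(preflightAllows("http://foo.example", "POST", "X-PINGOTHER, X-EXTRA"));
    }
}
```

The second call models the situation I ran into: one extra header that is not on the allowed list is enough to stop the POST.
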
More detailed information can be found here:

Friday, May 3, 2013

What is Data Science?

I'm taking the Introduction to Data Science course on Coursera, so I'll post a few tidbits on some things I learn over the next several weeks.

First topic, what is Data Science?

Well, the term is quite fuzzy so it might depend on who you ask, but here's Drew Conway's Data Science Diagram, which is commonly referred to when describing data science. It shows data science as the overlap of three areas: hacking skills, math and statistics knowledge, and substantive expertise.
Since a lot of data is electronic nowadays, you need to be able to somewhat speak the language. You do not need to be a CS major or programmer, but more specific skills for working with data are important, from using the command line to put a text file in the right format to programming in R.

The substantive expertise part of it means being able to explore, discover, form hypotheses, and test them. Basically, ask the right questions and find the right answers.

Conway points out the danger zone because this is the part where people "know enough to be dangerous". Without a grounding in statistics, one might misinterpret data (when doing data science). Thinking about it, I think the danger zone might be called "Computer Science".

The difference between data science and business intelligence is that in business intelligence, a data warehouse is often created to do specific analysis and answer particular questions, which takes a lot of effort up front to build. This is usually more specific than data science, and BI is not as adaptable when requirements change. In short, BI is about building a particular tool to answer particular questions, whereas data science is more general. It was also noted that a lot of the time the BI engineers do not consume or do analysis on the system they build.