Tuesday, October 24, 2017

Good Read


Interviewed at five top companies in Silicon Valley in five days, and luckily got five job offers

https://medium.com/@XiaohanZeng/i-interviewed-at-five-top-companies-in-silicon-valley-in-five-days-and-luckily-got-five-job-offers-25178cf74e0f

Thursday, October 19, 2017

Core Java BrushUp : Generics

Always keep two things in mind
  1. Compile-time type safety
    • ArrayList<Integer> stockList = new ArrayList<Integer>();
    • stockList.add("coins"); //compiler error, String not allowed in a list of Integer
  2. Achieved by type erasure
What is Type Erasure?
Below code:

List<String> list = new ArrayList<String>();
list.add("Hi");
String x = list.get(0);

is compiled into

List list = new ArrayList();
list.add("Hi");
String x = (String) list.get(0);
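
A quick way to see erasure at work (a minimal sketch): because the type argument is gone at runtime, a List<String> and a List<Integer> share the same runtime class.

List<String> strings = new ArrayList<String>();
List<Integer> numbers = new ArrayList<Integer>();
System.out.println(strings.getClass() == numbers.getClass()); // true - both are just ArrayList after erasure
// if (strings instanceof List<String>) { }  // compiler error - generic type info does not exist at runtime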


Also Note
  • generics work with autoboxing: adding an int to a List<Integer> boxes it to an Integer automatically
  • type parameters cannot be primitive types: List<int> is illegal, use List<Integer> instead
  • you can mix old (raw) and new (generic) types, but the compiler can then no longer guarantee type safety (see below)
  • you can limit type parameters by using bounds, e.g. <T extends Number> (see the sketch below)
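
A minimal sketch of a bounded type parameter (the class name NumberHolder is made up for illustration):

// T is restricted to Number and its subclasses
class NumberHolder<T extends Number> {
    private final T value;
    NumberHolder(T value) { this.value = value; }
    double asDouble() { return value.doubleValue(); } // safe: T is guaranteed to be a Number
}

NumberHolder<Integer> ok = new NumberHolder<Integer>(42);
// NumberHolder<String> bad = new NumberHolder<String>("x"); // compiler error - String is not a Number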
Using old and new together
 

ArrayList<String> myList2 = new ArrayList<String>(); // valid
// BELOW is mixing of OLD (raw) + NEW (generic) types; such code compiles,
// but there is no guarantee of type safety
ArrayList myList3 = new ArrayList<String>(); // valid, but myList3 is raw - anything can be added to it
ArrayList<String> myList4 = new ArrayList(); // valid, but unchecked-conversion warning

What's the difference between E, T and ?
  • E and T are type parameters: named placeholders used when declaring a generic class or method (by convention E for element, T for type).
  • ? is a wildcard, used when providing a type argument, e.g. List<?> foo = ... means that foo refers to a list of some type, but we don't know which (see the sketch below).
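
A minimal sketch contrasting the two (the class name Box is made up for illustration):

// T is a declared type parameter - the caller picks the concrete type
class Box<T> {
    private T item;
    void put(T item) { this.item = item; }
    T get() { return item; }
}

// ? is a wildcard - "a list of something, we don't know what"
void printAll(List<?> foo) {
    for (Object o : foo) System.out.println(o); // reads are fine, every element is at least an Object
    // foo.add("x"); // compiler error - cannot add to a List<?> (except null)
}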


void point_1() {
  Set<Object> setOfObjects = new HashSet<String>(); // compiler error - incompatible types
  Holder<int> numbers = new Holder<int>(10);        // compiler error - unexpected type; required: reference, found: int
}

void point_2() {
  List<Parent> parent_List = new ArrayList<Parent>(3);
  List<Child> child_List = new ArrayList<Child>(3);

  parent_List.add(new Parent("father 1"));
  parent_List.add(new Parent("father 2"));
  parent_List.add(new Child("child 1")); // upcasting is fine - a Child is a Parent

  child_List.add(new Child("lil 1"));
  child_List.add(new Child("lil 2"));
  child_List.add((Child) new Parent("Father of 1")); // compiles with the cast, but throws
                                                     // ClassCastException at runtime - the object is not a Child

  // List<Parent> myList = new ArrayList<Child>();
  // ** ERROR ** .. incompatible types ...
  // polymorphism applies here ONLY to the base types (List vs ArrayList),
  // NOT to the generic type parameters - generics are invariant
  Parent[] myArray = new Child[3]; // but this works - arrays are covariant
  // WHY it works for arrays[] and not for generics: the compiler & JVM treat them differently -
  // array stores are checked at runtime (ArrayStoreException), while generic types are
  // checked at compile time and then erased
  point_8(child_List); // point_8 is defined elsewhere in these notes
}
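
Since a List<Parent> cannot hold an ArrayList<Child>, the usual workaround is a bounded wildcard; a minimal sketch, building on child_List above:

List<? extends Parent> anyFamily = child_List;  // compiles - covariant read-only view of the child list
Parent p = anyFamily.get(0);                    // reads are safe: every element is at least a Parent
// anyFamily.add(new Parent("x"));              // compiler error - cannot add through ? extends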



Sunday, July 30, 2017

Sqoop2 Server

Sqoop CLI limitations
  • The user running the Sqoop CLI needs the connectors installed on the same machine
  • Users need to know the database credentials (the admin may not want to share them with everyone)
  • Sqoop jobs are stored locally and cannot be shared with other team members
  • People can unknowingly create load on the RDBMS
Hence a new version is under development ... called Sqoop2 server.
  • sqoop2-shell only submits jobs to the Sqoop2 server
  • the Sqoop2 server does all the work
  • communication with the Sqoop2 server is via REST + JSON
  • the client doesn't need access to Hadoop or the RDBMS (only the server does)
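
A minimal sketch of the client/server split, assuming Sqoop 1.99.x and a hypothetical server host sqoop2.example.com:

sqoop2-shell
# point the shell at the remote Sqoop2 server - nothing runs locally
sqoop:000> set server --host sqoop2.example.com --port 12000 --webapp sqoop
# verify the shell can reach the server over REST
sqoop:000> show version --all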


Sqoop : Export


Example

sqoop export --connect jdbc:mysql://localhost/db --username root 
--table employee --export-dir /emp/emp_data  

Note
  • The table must already exist in the db
  • Input files are read and parsed according to user-provided options (delimiters, columns, etc.)
  • The default is to generate INSERT statements; use update mode to generate UPDATE statements instead.
  • If an INSERT violates a PK constraint, the export job fails - but since earlier transactions may already be committed, the table can be left with partial results.
  • Inserts are performed by multiple parallel Sqoop tasks; each task uses a separate connection & transaction
  • Each task commits its transaction every 100 statements (with multi-row inserts this is roughly every 10,000 rows), so transactions already committed by other tasks stay committed even if one task fails.
  • So export is not an atomic operation; some commits will be visible before others. See the staging-table sketch below for a way around this.
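
Sqoop's answer to the non-atomicity above is --staging-table: rows are first exported into a staging table and moved to the target table in a single transaction at the end (not available in all modes, e.g. not with update mode or --direct). A sketch, with the staging table name made up:

sqoop export --connect jdbc:mysql://localhost/db --username root 
--table employee --staging-table employee_stage --clear-staging-table 
--export-dir /emp/emp_data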
Example

sqoop export --connect jdbc:mysql://localhost/DBNAME --username root 
--password root --export-dir /input/abc --table test 
--fields-terminated-by "," --columns "id,name,age"  

can use --update-key as well

sqoop export --connect jdbc:mysql://localhost/DBNAME --username root
--password root --export-dir /input/abc --table test --fields-terminated-by "," 
--columns "id,name,age" --update-key id


Sqoop : Incremental Imports

Incremental import is a technique that imports only the newly added rows in a table. You must add the 'incremental', 'check-column', and 'last-value' options to perform an incremental import.

Syntax
--incremental <mode> : either 'append' or 'lastmodified'.
--check-column <column name> : the column examined to decide which rows are new.
--last-value <last check column value> : the maximum value of the check column seen in the previous import.

If you run the sqoop import in 'lastmodified' mode, the assumption is that the table has a date/timestamp column that is updated whenever a row is inserted or modified.

Example

Consider a table with 3 records which you already imported to hdfs using sqoop

+------+------------+----------+------+------------+
| sid  | city       | state    | rank | rDate      |
+------+------------+----------+------+------------+
|  101 | Chicago    | Illinois |    1 | 2014-01-25 |
|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
+------+------------+----------+------+------------+

sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P

Now you have additional records in the table but no updates on existing records
 
+------+------------+----------+------+------------+
| sid  | city       | state    | rank | rDate      |
+------+------------+----------+------+------------+
|  101 | Chicago    | Illinois |    1 | 2014-01-25 |
|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
|  103 | Charlotte  | NC       |    9 | 2013-04-22 |
|  103 | Greenville | SC       |    9 | 2013-05-12 |
|  103 | Atlanta    | GA       |   11 | 2013-08-21 |
+------+------------+----------+------+------------+
Here you should use --incremental append with --check-column, which specifies the column to be examined when determining which rows to import.
 
sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root
 -P --check-column rank --incremental append --last-value 7

The above command imports all new rows whose check-column value is greater than the last value. Now consider a second case, where existing rows are also updated:

+------+------------+----------+------+------------+
| sid  | city       | state    | rank | rDate      |
+------+------------+----------+------+------------+
|  101 | Chicago    | Illinois |    1 | 2015-01-01 |
|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
|  103 | Charlotte  | NC       |    9 | 2013-04-22 |
|  103 | Greenville | SC       |    9 | 2013-05-12 |
|  103 | Atlanta    | GA       |   11 | 2013-08-21 |
|  104 | Dallas     | Texas    |    4 | 2015-02-02 |
|  105 | Phoenix    | Arizona  |   17 | 2015-02-24 |
+------+------------+----------+------+------------+

Here we use --incremental lastmodified, which fetches all rows added or modified since the given date.
 
sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P 
--check-column rDate --incremental lastmodified --last-value 2014-01-25 --target-dir yloc/loc
 
Nice Example
https://shalishvj.wordpress.com/2014/08/12/sqoop-incremental-imports-using-last-modified-mode/
 
Example # 2


sqoop import --connect jdbc:mysql://localhost/retail_db --username retail_dba
--password cloudera --table products --target-dir /lav/sqoop/retail_db1 
--check-column product_id --incremental lastmodified --last-value 10

Tried running in lastmodified mode with a column that is not a timestamp.
Got a runtime exception:


17/07/30 04:10:48 ERROR manager.SqlManager: Column type is neither timestamp nor date!
17/07/30 04:10:48 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Column type is neither timestamp nor date!
java.lang.RuntimeException: Column type is neither timestamp nor date!
 at org.apache.sqoop.manager.ConnManager.datetimeToQueryString(ConnManager.java:787)
 at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:332)
 at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:498)
 at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
 at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
 at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Sqoop Jobs
Can use 'sqoop job --create' to create saved jobs for incremental imports, so that we can re-run them later without re-specifying the parameters.

sqoop job --create job_name_1 -- import <import-args>

Running this command will not actually import/export any data; it just saves the job definition.
To see a list of sqoop jobs
sqoop job --list 

To see additional details of a job
sqoop job --show job_name_1

To execute a job
sqoop job --exec job_name_1

Sqoop remembers the last highest value of the check column and automatically imports only rows beyond it on the next run.
It stores this last highest value in the job's metadata.
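
A sketch of a saved incremental job, reusing the retail_db example from above (the job name incr_products is made up):

sqoop job --create incr_products -- import --connect jdbc:mysql://localhost/retail_db 
--username retail_dba --password cloudera --table products 
--target-dir /lav/sqoop/retail_db1 --check-column product_id --incremental append --last-value 0

# each run imports only rows with product_id greater than the stored last-value,
# then updates the stored last-value automatically
sqoop job --exec incr_products

Note that by default Sqoop prompts for the password on each --exec unless the metastore is configured to store it (sqoop.metastore.client.record.password).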


Where does it store the last incremented value?

In the Sqoop metastore, which by default is an embedded HSQLDB database under ~/.sqoop in the home directory of the user who created the job.

Note it is on the local file system, not on HDFS.
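
A quick way to verify (file names may vary by Sqoop version):

ls ~/.sqoop
# metastore.db.properties  metastore.db.script
# the saved jobs and their last-value state live in these HSQLDB files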