My First Amazon Skill…

I know this is irrelevant to my Oracle BLOB, but hey, this is my blog so I can blog anything I want 🙂

Anyway, I got hooked on this new gadget, the Amazon Echo, over Thanksgiving, and we are loving it.

So I thought, why not create my own Skill to play the Telugu radio stations I always listen to on my phone!

That led me to explore, and I finally got it working.

You can get my code from GitHub.

Please let me know your feedback, and remember that this is my first Skill and my first upload to GitHub, so be generous with your comments 🙂

Convert xlsx to csv in Linux…

You can use OpenOffice for Linux to open an xlsx file and then save it as csv, but it takes a bit of time to load the file in OpenOffice and then do the conversion.

Also, what if there are multiple files that you want to convert?

What you need is “ssconvert”, which converts the file from the command line and does a pretty good job.

Download the two RPMs below and install them using rpm -Uvh:

http://rpm.pbone.net/index.php3/stat/4/idpl/28272017/dir/fedora_21/com/goffice-0.10.18-1.fc21.x86_64.rpm.html

http://rpm.pbone.net/index.php3/stat/4/idpl/28271945/dir/fedora_21/com/gnumeric-1.12.18-1.fc21.x86_64.rpm.html

The gnumeric RPM contains ssconvert. Once installed, you can simply use it as:
[santhosh@localhost myspark]$ ssconvert 'Member accounts 11042016.xls' memberac.csv
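
If you have a whole folder of spreadsheets to convert, you can also drive ssconvert from a small Python loop. This is just a minimal sketch, assuming ssconvert is on your PATH and the source files end in .xlsx:

import glob
import os
import subprocess

# convert every .xlsx in the current directory to a .csv with the same base name
for src in glob.glob("*.xlsx"):
    dst = os.path.splitext(src)[0] + ".csv"
    subprocess.check_call(["ssconvert", src, dst])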

Spark Aggregate function with examples…

The aggregate function takes three parameters (arguments):
1st parameter: the seed (initial) value, e.g. (0, 0) or (0, 1) in the examples below
2nd parameter: the per-partition (sequential) reduce function, seqOp
3rd parameter: the combine reduce function, combOp

The aggregate function allows the user to apply two different reduce functions to the RDD at the same time.
The first reduce function is applied within each partition to reduce the data within each partition into a single result.
The second reduce function is used to combine the results from all the partitions from the first reduce function.

The ability to have two separate reduce functions, one for intra-partition reducing and one for across-partition reducing, adds a lot of flexibility. For example, the first reduce function can be the max function and the second one can be the sum function. The user also specifies an initial value. Here are some important facts.

  • The initial value (the 1st parameter, the seed) is applied at both levels of reduce: at the intra-partition reduction and again at the across-partition reduction.
  • Both reduce functions have to be commutative and associative.
  • Do not assume any execution order for either partition computations or combining partitions.

Let's start with simple examples. Note: changing the seed values passed to aggregate can change the results dramatically, because the 2nd reduce function (combOp) combines the seqOp (sequential operation) results from the individual partitions.
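
Here is a minimal pyspark sketch of this first example (run from the pyspark shell, where sc is already available). The collection name collData and the seqOp/combOp lambdas follow the description below; the RDD is created with 4 partitions so the numbers in the walkthroughs line up, since a non-zero seed gets folded in once per partition and once more in the combine step:

>>> collData = sc.parallelize([1, 2, 3, 4, 5], 4)    # 4 partitions
>>> seqOp = lambda x, y: (x[0] + y, x[1] + y)        # runs inside each partition
>>> combOp = lambda x, y: (x[0] + y[0], x[1] + y[1]) # merges the per-partition results
>>> collData.aggregate((0, 0), seqOp, combOp)
(15, 15)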

From the above:
seqOp ==> the first reduce function, which runs within each partition across all the nodes
combOp ==> the second reduce function, which combines the results from seqOp

seqOp and combOp are passed to the “aggregate” function together with a 2-element seed for the “collData” collection, as collData.aggregate( (0, 0), seqOp, combOp ).
The first seed element 0 is the starting value for “x[0] + y”, and the second seed element is the starting value for “x[1] + y”.
combOp simply sums up the seqOp results from all the nodes.

So, for every element of the collData list (1, 2, 3, 4, 5), the accumulator values x[0] and x[1] are plugged into the equations. In this example we pass 0 for both seed values, so it just adds 0 along the way and gives 15 for both parts of the result, which is (0+1) + (0+2) + (0+3) + (0+4) + (0+5).

Let's change the aggregate seed to (0, 1) and see the behavior.
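A sketch of that call, reusing collData, seqOp and combOp from above and changing only the seed:

>>> collData.aggregate((0, 1), seqOp, combOp)
(15, 20)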

As you can see, the result changed to (15, 20). Why?
Because we changed the second seed value to 1, which results in 20, as shown below:
x[0] + y will be iterated as:
(0 + 1) + (0 + 2) + (0 + 3) + (0 + 4) + (0 + 5)
= 1 + 2 + 3 + 4 + 5
= 15
x[1] + y will be iterated as:
(1 + 1) + (1 + 2) + (1 + 3) + (1 + 4) + (1 + 5)
= 2 + 3 + 4 + 5 + 6
= 20

Let's change the seed to (2, 3) and see the results:
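Again, only the seed changes:

>>> collData.aggregate((2, 3), seqOp, combOp)
(25, 30)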

Here is how the iterations work out to get the result (25, 30):
x[0] + y will be iterated as:
(2 + 1) + (2 + 2) + (2 + 3) + (2 + 4) + (2 + 5)
= 3 + 4 + 5 + 6 + 7
= 25
x[1] + y will be iterated as:
(3 + 1) + (3 + 2) + (3 + 3) + (3 + 4) + (3 + 5)
= 4 + 5 + 6 + 7 + 8
= 30

Here is another example, with both the sum and the product of the elements:
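A sketch of this variant (the names seqOp2 and combOp2 are just mine; note that the combine function now has to multiply its second element, otherwise the per-partition products would be added instead of multiplied):

>>> seqOp2 = lambda x, y: (x[0] + y, x[1] * y)        # sum and product inside each partition
>>> combOp2 = lambda x, y: (x[0] + y[0], x[1] * y[1]) # add the sums, multiply the products
>>> collData.aggregate((0, 1), seqOp2, combOp2)
(15, 120)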

From the above, notice that the second element of the seqOp lambda is now a multiplication (*) instead of a summation (+).
What we are doing in this example is computing the sum of all the elements and the product of all the elements in the collection at the same time.
Here is how the iteration goes with the arguments passed to the “aggregate” function:
The seqOp lambda (x[0] + y, x[1] * y) is iterated through all the elements in collData with the seed (0, 1):

-------------------- x[0] + y --------------------, -------------------- x[1] * y --------------------
( (0 + 1) + (0 + 2) + (0 + 3) + (0 + 4) + (0 + 5) , (1 * 1) * (1 * 2) * (1 * 3) * (1 * 4) * (1 * 5) )
= (( 1 + 2 + 3 + 4 + 5), (1 * 2 * 3 * 4 * 5))
= (15, 120)

Let's change the seed values passed to aggregate and see the result:
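The same call, just with the new seed:

>>> collData.aggregate((2, 3), seqOp2, combOp2)
(25, 29160)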

This time we pass the seed (2, 3) to aggregate.
Here is how the iteration goes with the seed (2, 3):
The seqOp lambda (x[0] + y, x[1] * y) is iterated through all the elements in collData with the seed (2, 3):

-------------------- x[0] + y --------------------, -------------------- x[1] * y --------------------
( (2 + 1) + (2 + 2) + (2 + 3) + (2 + 4) + (2 + 5) , (3 * 1) * (3 * 2) * (3 * 3) * (3 * 4) * (3 * 5) )
= (( 3 + 4 + 5 + 6 + 7), (3 * 6 * 9 * 12 * 15))
= (25, 29160)

CDH 5.8.2 Login Page error 500

You may get this error when there is any kind of interruption to the Cloudera Manager server process. In my case, I get it every time I bring my PC back up after hibernate or sleep.

Just restart the Cloudera server, wait for it to come up, and then try logging in again and you should be good:
[root@localhost conf]# service cloudera-scm-server restart
Restarting cloudera-scm-server (via systemctl):            [  OK  ]
[root@localhost conf]# 

HTTP ERROR 500

Problem accessing /cmf/login. Reason:

    Error creating bean with name 'newServiceHandlerRegistry' defined in class path resource [com/cloudera/server/cmf/config/components/BeanConfiguration.class]: Instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public com.cloudera.cmf.service.ServiceHandlerRegistry com.cloudera.server.cmf.config.components.BeanConfiguration.newServiceHandlerRegistry()] threw exception; nested exception is java.lang.IllegalStateException: BeanFactory not initialized or already closed - call 'refresh' before accessing beans via the ApplicationContext

Caused by:

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'newServiceHandlerRegistry' defined in class path resource [com/cloudera/server/cmf/config/components/BeanConfiguration.class]: Instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public com.cloudera.cmf.service.ServiceHandlerRegistry com.cloudera.server.cmf.config.components.BeanConfiguration.newServiceHandlerRegistry()] threw exception; nested exception is java.lang.IllegalStateException: BeanFactory not initialized or already closed - call 'refresh' before accessing beans via the ApplicationContext
at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:581)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:983)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:879)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:485)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:293)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:290)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:192)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:848)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:790)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:707)
at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:795)
at org.springframework.beans.factory.support.ConstructorResolver.resolvePreparedArguments(ConstructorResolver.java:765)
at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:131)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowir

How to disable "INFO" messages in the pyspark (Python Spark) console?

You may notice a bunch of INFO messages popping up on the console, as shown below, when you start the Spark console using pyspark.
How do you disable these messages and show only the errors?

[santhosh@localhost Downloads]$ pyspark
Python 2.7.5 (default, Sep 14 2016, 08:35:31)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
16/10/06 11:25:20 INFO spark.SparkContext: Running Spark version 1.6.0
16/10/06 11:25:21 INFO spark.SecurityManager: Changing view acls to: santhosh
16/10/06 11:25:21 INFO spark.SecurityManager: Changing modify acls to: santhosh

.
.
.

16/10/06 11:25:30 INFO storage.BlockManagerMaster: Registered BlockManager
16/10/06 11:25:30 INFO scheduler.EventLoggingListener: Logging events to hdfs://localhost:8020/user/spark/applicationHistory/application_1475694807177_0010
16/10/06 11:25:31 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Python version 2.7.5 (default, Sep 14 2016 08:35:31)
SparkContext available as sc, HiveContext available as sqlContext.
>>>

Solution:
Edit the file /etc/spark/conf/log4j.properties
and change the root logger value from INFO to ERROR:
root.logger=INFO,console --> root.logger=ERROR,console
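
Alternatively, if you only want to quiet a single session without touching the config file, you can lower the log level from inside the pyspark shell itself (sc is the SparkContext the shell already created):

>>> sc.setLogLevel("ERROR")   # show only errors from this point on; "WARN" or "INFO" brings messages back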

Installing Google Chrome on Oracle Linux…

These steps are for Oracle Linux 64 bit:

Create a file google.repo under /etc/yum.repos.d/ folder with the following content.

[google-chrome]
name=google-chrome - 64-bit
baseurl=http://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub

Save the file and execute the below command to install Chrome:

# yum install google-chrome-stable

Accept the prompts, and once the install completes you will see Chrome under Applications/Internet.

Disable SELinux

Edit the file /etc/selinux/config, change the value of SELINUX to disabled, and reboot.

Verify the state with the ‘getenforce’ command.
