Parsing qliksense healthcheck api results

Abstract

When stress testing or troubleshooting a Qlik Sense node, it is useful to collect the responses of the healthcheck API and extract some useful info from them (which and how many applications were loaded in memory, …).

Collecting data

I usually use the command line tool qsense for querying the Qlik Sense repository:

    while [ 1 ]
    do
      qsense healthcheck qlikhost1.redaelli.org ~/certificates/qlik/client.pem >> healthcheck.jl
      sleep 60
    done

Each line of the file healthcheck....
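As a rough illustration of the kind of extraction described in this excerpt, here is a minimal Python sketch that counts how often each application shows up as loaded in memory. It assumes that every line of healthcheck.jl holds one JSON response and that the in-memory applications are listed under `apps.in_memory_docs`. Both the function name `summarize_healthchecks` and these field names are assumptions made for the example, not something the excerpt confirms.

```python
import json
from collections import Counter


def summarize_healthchecks(lines):
    """Count in how many healthcheck samples each app was in memory.

    Assumes one JSON object per line, with the in-memory apps listed
    under apps.in_memory_docs (an assumed field name).
    """
    in_memory = Counter()
    samples = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # ignore JSON values that are not objects
        samples += 1
        for doc in record.get("apps", {}).get("in_memory_docs", []):
            in_memory[doc] += 1
    return samples, in_memory


if __name__ == "__main__":
    # healthcheck.jl is the file produced by the collection loop
    with open("healthcheck.jl") as handle:
        total, counts = summarize_healthchecks(handle)
    print(f"{total} samples")
    for doc, hits in counts.most_common():
        print(f"{doc}: in memory in {hits} samples")
```

Skipping lines that fail to parse keeps the summary usable when a request failed and left an error message in the file instead of a JSON response.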

September 15, 2021 · 3 min · 445 words · Matteo Redaelli

Analyzing huge sensor data in near realtime with Apache Spark Streaming

For this demo I downloaded and installed Apache Spark 1.5.1.

Suppose you have a stream of data from several (industrial) machines like

MACHINE,TIMESTAMP,SIGNAL1,SIGNAL2,SIGNAL3,...
1,2015-01-01 11:00:01,1.0,1.1,1.2,1.3,..
2,2015-01-01 11:00:01,2.2,2.1,2.6,2.8,.
3,2015-01-01 11:00:01,1.1,1.2,1.3,1.3,.
1,2015-01-01 11:00:02,1.0,1.1,1.2,1.4,.
1,2015-01-01 11:00:02,1.3,1.2,3.2,3.3,..
...

Below is a system, written in Python, that reads data from a stream (use the command "nc -lk 9999" to send data to the stream) and every 10 seconds collects alerts from signals: at least 4 suspicious values of a specific signal of the same machine

```
from pyspark import SparkContext
from pyspark....
```
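The alert rule described in this excerpt can be sketched in plain Python for a single 10-second batch, without Spark. This is not the post's Spark Streaming code. The excerpt does not say what makes a value suspicious, so the sketch assumes a fixed threshold. The names `batch_alerts`, `THRESHOLD` and `MIN_HITS` are hypothetical.

```python
from collections import Counter

THRESHOLD = 3.0  # assumption: values above this count as suspicious
MIN_HITS = 4     # from the excerpt: at least 4 suspicious values


def batch_alerts(lines, threshold=THRESHOLD, min_hits=MIN_HITS):
    """Return the (machine, signal) pairs that raise an alert in one batch.

    Each line is expected to look like MACHINE,TIMESTAMP,SIGNAL1,SIGNAL2,...
    """
    hits = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue  # not enough columns for machine, timestamp, signal
        machine = fields[0]
        # Signals start at the third column and are numbered from 1
        for index, raw in enumerate(fields[2:], start=1):
            try:
                value = float(raw)
            except ValueError:
                continue  # skip non-numeric columns such as a header row
            if value > threshold:
                hits[(machine, f"SIGNAL{index}")] += 1
    return sorted(key for key, count in hits.items() if count >= min_hits)
```

In a streaming job the same counting would run once per batch interval, with the grouping key being the (machine, signal) pair.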

November 25, 2015 · 2 min · 279 words · Matteo Redaelli

TwitterPopularTags.scala example of Apache Spark Streaming in a standalone project

This is an easy tutorial on using Apache Spark Streaming with the Scala language, taking the official TwitterPopularTags.scala example and putting it in a standalone sbt project. In a few minutes you will be able to receive streams of tweets and manipulate them in realtime with Apache Spark Streaming.

Install Apache Spark (I used 1.5.1)
Install sbt
git clone https://github.com/matteoredaelli/TwitterPopularTags
cd TwitterPopularTags
cp twitter4j.properties.sample twitter4j.properties
edit twitter4j.properties
sbt package
spark-submit --master local --packages "org....

October 22, 2015 · 1 min · 74 words · Matteo Redaelli

Apache Spark news from a Spark Summit 2015

GOAL: unified engine across data sources, workloads and environments. Highlights: dataframes (1.3), SparkR (1.4), … See all videos and slides at http://spark-summit.org

April 21, 2015 · 1 min · 22 words · Matteo Redaelli

A case study of adopting Bigdata technologies in your company

Bigdata projects can be very expensive and can easily fail: I suggest starting with a small, useful but not critical project, ideally one about unstructured data collection and batch processing. That way you have time to get practice with the new technologies, and the Apache Hadoop system can tolerate non-critical downtime. At home I have the following system running on a small Raspberry PI: for sure it is not fast ;-) At work I introduced Hadoop just a few months ago for collecting web data and generating daily reports....

March 13, 2015 · 1 min · 93 words · Matteo Redaelli