{"activeVersionTag":"latest","latestAvailableVersionTag":"latest","collection":{"info":{"_postman_id":"08c0bb83-20cc-41ec-81be-2a7e2304f0e8","name":"NGSI-LD Real-time Processing (Spark)","description":"This tutorial is an introduction to the [FIWARE Cosmos Orion Spark Connector](http://fiware-cosmos-spark.rtfd.io), which\nenables easier Big Data analysis over context, integrated with one of the most popular BigData platforms:\n[Apache Spark](https://spark.apache.org/). Apache Spark is a framework and distributed processing engine for stateful\ncomputations over unbounded and bounded data streams. Spark has been designed to run in all common cluster environments,\nperform computations at in-memory speed and at any scale.\n\nThe `docker-compose` file for this tutorial can be found on GitHub: \n\n![GitHub](https://fiware.github.io/tutorials.Big-Data-Flink/icon/GitHub-Mark-32px.png) [FIWARE-LD 305: Big Data Analysis (Flink) ](https://github.com/FIWARE/tutorials.Big-Data-Flink)\n\n# Real-time Processing and Big Data Analysis\n\n> \"You have to find what sparks a light in you so that you in your own way can illuminate the world.\"\n>\n> — Oprah Winfrey\n\nSmart solutions based on FIWARE are architecturally designed around microservices. They are therefore are designed to\nscale-up from simple applications (such as the Supermarket tutorial) through to city-wide installations base on a large\narray of IoT sensors and other context data providers.\n\nThe massive amount of data involved eventually becomes too much for a single machine to analyse, process and store, and\ntherefore the work must be delegated to additional distributed services. These distributed systems form the basis of\nso-called **Big Data Analysis**. The distribution of tasks allows developers to be able to extract insights from huge\ndata sets which would be too complex to be dealt with using traditional methods. and uncover hidden patterns and\ncorrelations.\n\nAs we have seen, context data is core to any Smart Solution, and the Context Broker is able to monitor changes of state\nand raise [subscription events](https://github.com/Fiware/tutorials.Subscriptions) as the context changes. For smaller\ninstallations, each subscription event can be processed one-by-one by a single receiving endpoint, however as the system\ngrows, another technique will be required to avoid overwhelming the listener, potentially blocking resources and missing\nupdates.\n\n**Apache Spark** is an open-source distributed general-purpose cluster-computing framework. It provides an interface for\nprogramming entire clusters with implicit data parallelism and fault tolerance. The **Cosmos Spark** connector allows\ndevelopers write custom business logic to listen for context data subscription events and then process the flow of the\ncontext data. Spark is able to delegate these actions to other workers where they will be acted upon either in\nsequentially or in parallel as required. The data flow processing itself can be arbitrarily complex.\n\nObviously, in reality, our existing Supermarket scenario is far too small to require the use of a Big Data solution, but\nwill serve as a basis for demonstrating the type of real-time processing which may be required in a larger solution\nwhich is processing a continuous stream of context-data events.\n\n# Architecture\n\nThis application builds on the components and dummy IoT devices created in\n[previous tutorials](https://github.com/FIWARE/tutorials.IoT-Agent/). It will make use of three FIWARE components - the\n[Orion Context Broker](https://fiware-orion.readthedocs.io/en/latest/), the\n[IoT Agent for Ultralight 2.0](https://fiware-iotagent-ul.readthedocs.io/en/latest/), and the\n[Cosmos Orion Spark Connector](https://fiware-cosmos-spark.readthedocs.io/en/latest/) for connecting Orion to an\n[Apache Spark cluster](https://spark.apache.org/docs/latest/cluster-overview.html). The Spark cluster itself will\nconsist of a single **Cluster Manager** _master_ to coordinate execution and some **Worker Nodes** _worker_ to execute\nthe tasks.\n\nBoth the Orion Context Broker and the IoT Agent rely on open source [MongoDB](https://www.mongodb.com/) technology to\nkeep persistence of the information they hold. We will also be using the dummy IoT devices created in the\n[previous tutorial](https://github.com/FIWARE/tutorials.IoT-Agent/).\n\nTherefore the overall architecture will consist of the following elements:\n\n-   Two **FIWARE Generic Enablers** as independent microservices:\n    -   The [Orion Context Broker](https://fiware-orion.readthedocs.io/en/latest/) which will receive requests using\n        [NGSI-LD](https://forge.etsi.org/swagger/ui/?url=https://forge.etsi.org/gitlab/NGSI-LD/NGSI-LD/raw/master/spec/updated/full_api.json)\n    -   The FIWARE [IoT Agent for UltraLight 2.0](https://fiware-iotagent-ul.readthedocs.io/en/latest/) which will\n        receive southbound requests using\n        [NGSI-LD](https://forge.etsi.org/swagger/ui/?url=https://forge.etsi.org/gitlab/NGSI-LD/NGSI-LD/raw/master/spec/updated/full_api.json)\n        and convert them to\n        [UltraLight 2.0](https://fiware-iotagent-ul.readthedocs.io/en/latest/usermanual/index.html#user-programmers-manual)\n        commands for the devices\n-   An [Apache Spark cluster](https://spark.apache.org/docs/latest/cluster-overview.html) consisting of a single\n    **ClusterManager** and **Worker Nodes**\n    -   The FIWARE [Cosmos Orion Spark Connector](https://fiware-cosmos-spark.readthedocs.io/en/latest/) will be\n        deployed as part of the dataflow which will subscribe to context changes and make operations on them in\n        real-time\n-   One [MongoDB](https://www.mongodb.com/) **database** :\n    -   Used by the **Orion Context Broker** to hold context data information such as data entities, subscriptions and\n        registrations\n    -   Used by the **IoT Agent** to hold device information such as device URLs and Keys\n-   The **Tutorial Application** does the following:\n    -   Offers static `@context` files defining the context entities within the system.\n    -   Acts as set of dummy [agricultural IoT devices](https://github.com/FIWARE/tutorials.IoT-Sensors/tree/NGSI-LD)\n        using the\n        [UltraLight 2.0](https://fiware-iotagent-ul.readthedocs.io/en/latest/usermanual/index.html#user-programmers-manual)\n\nThe overall architecture can be seen below:\n\n![](https://fiware.github.io/tutorials.Big-Data-Spark/img/Tutorial%20FIWARE%20Spark.png)\n\n## Spark Cluster Configuration\n\n```yaml\nspark-master:\n    image: bde2020/spark-master:2.4.5-hadoop2.7\n    container_name: spark-master\n    expose:\n        - \"8080\"\n        - \"9001\"\n    ports:\n        - \"8080:8080\"\n        - \"7077:7077\"\n        - \"9001:9001\"\n    environment:\n        - INIT_DAEMON_STEP=setup_spark\n        - \"constraint:node==spark-master\"\n```\n\n```yaml\nspark-worker-1:\n    image: bde2020/spark-worker:2.4.5-hadoop2.7\n    container_name: spark-worker-1\n    depends_on:\n        - spark-master\n    ports:\n        - \"8081:8081\"\n    environment:\n        - \"SPARK_MASTER=spark://spark-master:7077\"\n        - \"constraint:node==spark-master\"\n```\n\nThe `spark-master` container is listening on three ports:\n\n-   Port `8080` is exposed so we can see the web frontend of the Apache Spark-Master Dashboard.\n-   Port `7070` is used for internal communications.\n\nThe `spark-worker-1` container is listening on one port:\n\n-   Port `9001` is exposed so that the installation can receive context data subscriptions.\n-   Ports `8081` is exposed so we can see the web frontend of the Apache Spark-Worker-1 Dashboard.\n\n# Start Up\n\nBefore you start, you should ensure that you have obtained or built the necessary Docker images locally. Please clone\nthe repository and create the necessary images by running the commands shown below. Note that you might need to run some\nof the commands as a privileged user:\n\n```console\ngit clone https://github.com/FIWARE/tutorials.Big-Data-Flink.git\ncd tutorials.Big-Data-Flink\ngit checkout NGSI-LD\n./services create\n```\n\nThis command will also import seed data from the previous tutorials and provision the dummy IoT sensors on startup.\n\nTo start the system, run the following command:\n\n```console\n./services start\n```\n\n> :information_source: **Note:** If you want to clean up and start over again you can do so with the following command:\n>\n> ```console\n> ./services stop\n> ```\n\n# Real-time Processing Operations\n\nAccording to the [Apache Spark documentation](https://spark.apache.org/documentation.html), Spark Streaming is an\nextension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data\nstreams. Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using\ncomplex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be\npushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph\nprocessing algorithms on data streams.\n\n![](https://spark.apache.org/docs/latest/img/streaming-arch.png)\n\nInternally, it works as follows. Spark Streaming receives live input data streams and divides the data into batches,\nwhich are then processed by the Spark engine to generate the final stream of results in batches.\n\n![](https://spark.apache.org/docs/latest/img/streaming-flow.png)\n\nThis means that to create a streaming data flow we must supply the following:\n\n-   A mechanism for reading Context data as a **Source Operator**\n-   Business logic to define the transform operations\n-   A mechanism for pushing Context data back to the context broker as a **Sink Operator**\n\nThe **Cosmos Spark** connector - `orion.spark.connector-1.2.2.jar` offers both **Source** and **Sink** operators. It\ntherefore only remains to write the necessary Scala code to connect the streaming dataflow pipeline operations together.\nThe processing code can be complied into a JAR file which can be uploaded to the spark cluster. Two examples will be\ndetailed below, all the source code for this tutorial can be found within the\n[cosmos-examples](https://github.com/ging/fiware-cosmos-orion-spark-connector-tutorial/tree/master/cosmos-examples)\ndirectory.\n\nFurther Spark processing examples can be found on\n[Spark Connector Examples](https://fiware-cosmos-spark-examples.readthedocs.io/).\n\n### Compiling a JAR file for Spark\n\nAn existing `pom.xml` file has been created which holds the necessary prerequisites to build the examples JAR file\n\nIn order to use the Orion Spark Connector we first need to manually install the connector JAR as an artifact using\nMaven:\n\n```console\ncd cosmos-examples\ncurl -LO https://github.com/ging/fiware-cosmos-orion-spark-connector/releases/download/FIWARE_7.9.1/orion.spark.connector-1.2.2.jar\nmvn install:install-file \\\n  -Dfile=./orion.spark.connector-1.2.2.jar \\\n  -DgroupId=org.fiware.cosmos \\\n  -DartifactId=orion.spark.connector \\\n  -Dversion=1.2.2 \\\n  -Dpackaging=jar\n```\n\nThereafter the source code can be compiled by running the `mvn package` command within the same directory\n(`cosmos-examples`):\n\n```console\nmvn package\n```\n\nA new JAR file called `cosmos-examples-1.2.2.jar` will be created within the `cosmos-examples/target` directory.\n\n### Generating a stream of Context Data\n\nFor the purpose of this tutorial, we must be monitoring a system in which the context is periodically being updated. The\ndummy IoT Sensors can be used to do this. Open the device monitor page at `http://localhost:3000/device/monitor` and\nstart a tractor moving. This can be done by selecting an appropriate the command from\nthe drop down list and pressing the `send` button. The stream of measurements coming from the devices can then be seen\non the same page:\n\n![](https://fiware.github.io/tutorials.Big-Data-Spark/img/farm-devices.gif)\n\n## Logger - Reading Context Data Streams\n\nThe first example makes use of the `OrionReceiver` operator in order to receive notifications from the Orion Context\nBroker. Specifically, the example counts the number notifications that each type of device sends in one minute. You can\nfind the source code of the example in\n[org/fiware/cosmos/tutorial/Logger.scala](https://github.com/ging/fiware-cosmos-orion-spark-connector-tutorial/blob/master/cosmos-examples/src/main/scala/org/fiware/cosmos/tutorial/Logger.scala)\n\n### Logger - Installing the JAR\n\nRestart the containers if necessary, then access the worker container:\n\n```console\ndocker exec -it spark-worker-1 bin/bash\n```\n\nAnd run the following command to run the generated JAR package in the Spark cluster:\n\n```console\n/spark/bin/spark-submit \\\n--class  org.fiware.cosmos.tutorial.LoggerLD \\\n--master  spark://spark-master:7077 \\\n--deploy-mode client /home/cosmos-examples/target/cosmos-examples-1.2.2.jar \\\n--conf \"spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console\"\n```","schema":"https://schema.getpostman.com/json/collection/v2.0.0/collection.json","isPublicCollection":false,"owner":"513743","team":157450,"collectionId":"08c0bb83-20cc-41ec-81be-2a7e2304f0e8","publishedId":"TWDUqJNb","public":true,"publicUrl":"https://documenter-api.postman.tech/view/513743/TWDUqJNb","privateUrl":"https://go.postman.co/documentation/513743-08c0bb83-20cc-41ec-81be-2a7e2304f0e8","customColor":{"top-bar":"FFFFFF","right-sidebar":"303030","highlight":"233c68"},"documentationLayout":"classic-double-column","customisation":null,"version":"8.10.0","publishDate":"2021-02-17T12:37:57.000Z","activeVersionTag":"latest","documentationTheme":"light","metaTags":{},"logos":{}},"statusCode":200},"environments":[],"user":{"authenticated":false,"permissions":{"publish":false}},"run":{"button":{"js":"https://run.pstmn.io/button.js","css":"https://run.pstmn.io/button.css"}},"web":"https://www.getpostman.com/","team":{"logo":"https://res.cloudinary.com/postman/image/upload/t_team_logo_pubdoc/v1/team/d7085d490b9144732c65203aa6e3b68b31884d1c33a86b8a00d15da75147ae33","favicon":""},"isEnvFetchError":false,"languages":"[{\"key\":\"csharp\",\"label\":\"C#\",\"variant\":\"HttpClient\"},{\"key\":\"csharp\",\"label\":\"C#\",\"variant\":\"RestSharp\"},{\"key\":\"curl\",\"label\":\"cURL\",\"variant\":\"cURL\"},{\"key\":\"dart\",\"label\":\"Dart\",\"variant\":\"http\"},{\"key\":\"go\",\"label\":\"Go\",\"variant\":\"Native\"},{\"key\":\"http\",\"label\":\"HTTP\",\"variant\":\"HTTP\"},{\"key\":\"java\",\"label\":\"Java\",\"variant\":\"OkHttp\"},{\"key\":\"java\",\"label\":\"Java\",\"variant\":\"Unirest\"},{\"key\":\"javascript\",\"label\":\"JavaScript\",\"variant\":\"Fetch\"},{\"key\":\"javascript\",\"label\":\"JavaScript\",\"variant\":\"jQuery\"},{\"key\":\"javascript\",\"label\":\"JavaScript\",\"variant\":\"XHR\"},{\"key\":\"c\",\"label\":\"C\",\"variant\":\"libcurl\"},{\"key\":\"nodejs\",\"label\":\"NodeJs\",\"variant\":\"Axios\"},{\"key\":\"nodejs\",\"label\":\"NodeJs\",\"variant\":\"Native\"},{\"key\":\"nodejs\",\"label\":\"NodeJs\",\"variant\":\"Request\"},{\"key\":\"nodejs\",\"label\":\"NodeJs\",\"variant\":\"Unirest\"},{\"key\":\"objective-c\",\"label\":\"Objective-C\",\"variant\":\"NSURLSession\"},{\"key\":\"ocaml\",\"label\":\"OCaml\",\"variant\":\"Cohttp\"},{\"key\":\"php\",\"label\":\"PHP\",\"variant\":\"cURL\"},{\"key\":\"php\",\"label\":\"PHP\",\"variant\":\"Guzzle\"},{\"key\":\"php\",\"label\":\"PHP\",\"variant\":\"HTTP_Request2\"},{\"key\":\"php\",\"label\":\"PHP\",\"variant\":\"pecl_http\"},{\"key\":\"powershell\",\"label\":\"PowerShell\",\"variant\":\"RestMethod\"},{\"key\":\"python\",\"label\":\"Python\",\"variant\":\"http.client\"},{\"key\":\"python\",\"label\":\"Python\",\"variant\":\"Requests\"},{\"key\":\"r\",\"label\":\"R\",\"variant\":\"httr\"},{\"key\":\"r\",\"label\":\"R\",\"variant\":\"RCurl\"},{\"key\":\"ruby\",\"label\":\"Ruby\",\"variant\":\"Net::HTTP\"},{\"key\":\"shell\",\"label\":\"Shell\",\"variant\":\"Httpie\"},{\"key\":\"shell\",\"label\":\"Shell\",\"variant\":\"wget\"},{\"key\":\"swift\",\"label\":\"Swift\",\"variant\":\"URLSession\"}]","languageSettings":[{"key":"csharp","label":"C#","variant":"HttpClient"},{"key":"csharp","label":"C#","variant":"RestSharp"},{"key":"curl","label":"cURL","variant":"cURL"},{"key":"dart","label":"Dart","variant":"http"},{"key":"go","label":"Go","variant":"Native"},{"key":"http","label":"HTTP","variant":"HTTP"},{"key":"java","label":"Java","variant":"OkHttp"},{"key":"java","label":"Java","variant":"Unirest"},{"key":"javascript","label":"JavaScript","variant":"Fetch"},{"key":"javascript","label":"JavaScript","variant":"jQuery"},{"key":"javascript","label":"JavaScript","variant":"XHR"},{"key":"c","label":"C","variant":"libcurl"},{"key":"nodejs","label":"NodeJs","variant":"Axios"},{"key":"nodejs","label":"NodeJs","variant":"Native"},{"key":"nodejs","label":"NodeJs","variant":"Request"},{"key":"nodejs","label":"NodeJs","variant":"Unirest"},{"key":"objective-c","label":"Objective-C","variant":"NSURLSession"},{"key":"ocaml","label":"OCaml","variant":"Cohttp"},{"key":"php","label":"PHP","variant":"cURL"},{"key":"php","label":"PHP","variant":"Guzzle"},{"key":"php","label":"PHP","variant":"HTTP_Request2"},{"key":"php","label":"PHP","variant":"pecl_http"},{"key":"powershell","label":"PowerShell","variant":"RestMethod"},{"key":"python","label":"Python","variant":"http.client"},{"key":"python","label":"Python","variant":"Requests"},{"key":"r","label":"R","variant":"httr"},{"key":"r","label":"R","variant":"RCurl"},{"key":"ruby","label":"Ruby","variant":"Net::HTTP"},{"key":"shell","label":"Shell","variant":"Httpie"},{"key":"shell","label":"Shell","variant":"wget"},{"key":"swift","label":"Swift","variant":"URLSession"}],"languageOptions":[{"label":"C# - HttpClient","value":"csharp - HttpClient - C#"},{"label":"C# - RestSharp","value":"csharp - RestSharp - C#"},{"label":"cURL - cURL","value":"curl - cURL - cURL"},{"label":"Dart - http","value":"dart - http - Dart"},{"label":"Go - Native","value":"go - Native - Go"},{"label":"HTTP - HTTP","value":"http - HTTP - HTTP"},{"label":"Java - OkHttp","value":"java - OkHttp - Java"},{"label":"Java - Unirest","value":"java - Unirest - Java"},{"label":"JavaScript - Fetch","value":"javascript - Fetch - JavaScript"},{"label":"JavaScript - jQuery","value":"javascript - jQuery - JavaScript"},{"label":"JavaScript - XHR","value":"javascript - XHR - JavaScript"},{"label":"C - libcurl","value":"c - libcurl - C"},{"label":"NodeJs - Axios","value":"nodejs - Axios - NodeJs"},{"label":"NodeJs - Native","value":"nodejs - Native - NodeJs"},{"label":"NodeJs - Request","value":"nodejs - Request - NodeJs"},{"label":"NodeJs - Unirest","value":"nodejs - Unirest - NodeJs"},{"label":"Objective-C - NSURLSession","value":"objective-c - NSURLSession - Objective-C"},{"label":"OCaml - Cohttp","value":"ocaml - Cohttp - OCaml"},{"label":"PHP - cURL","value":"php - cURL - PHP"},{"label":"PHP - Guzzle","value":"php - Guzzle - PHP"},{"label":"PHP - HTTP_Request2","value":"php - HTTP_Request2 - PHP"},{"label":"PHP - pecl_http","value":"php - pecl_http - PHP"},{"label":"PowerShell - RestMethod","value":"powershell - RestMethod - PowerShell"},{"label":"Python - http.client","value":"python - http.client - Python"},{"label":"Python - Requests","value":"python - Requests - Python"},{"label":"R - httr","value":"r - httr - R"},{"label":"R - RCurl","value":"r - RCurl - R"},{"label":"Ruby - Net::HTTP","value":"ruby - Net::HTTP - Ruby"},{"label":"Shell - Httpie","value":"shell - Httpie - Shell"},{"label":"Shell - wget","value":"shell - wget - Shell"},{"label":"Swift - URLSession","value":"swift - URLSession - Swift"}],"layoutOptions":[{"value":"classic-single-column","label":"Single Column"},{"value":"classic-double-column","label":"Double Column"}],"versionOptions":[],"environmentOptions":[{"value":"0","label":"No Environment"}],"canonicalUrl":"https://documenter.gw.postman.com/view/metadata/TWDUqJNb"}