USE CASE – Log Analysis of Web Application using Spark (Java)
Link to download Intellij Idea –
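1. Create a Maven project in IntelliJ IDEA.
File 🡪 New 🡪 Project
Select Maven, then click Next.
Fill in the GroupId and ArtifactId as shown in the image, then click Next and Finish. The jar name used in step 7 (Log_Analysis-1.0-SNAPSHOT.jar) implies an ArtifactId of Log_Analysis and a version of 1.0-SNAPSHOT.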
2. Add spark-core_2.10 and spark-sql_2.10 as dependencies in pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency> <!-- Spark SQL -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.0</version>
</dependency>
3. Create a Java class under src 🡪 main 🡪 java and save it as LogAnalysis.
3.1 This application analyses an Apache web server access log and reports summary information about it. The analysis covers the following points:
Access Log Analysis using Spark SQL
   1. Load data from access.log and print the schema using Spark SQL.
   2. Print the average, minimum, and maximum content size, and the count of requests.
   3. Find which response codes were returned to users.
   4. Find any IP address that has accessed the server more than 10 times.
   5. Find peak traffic timings: the top 10 timeframes with the highest traffic, used to identify the application's peak hours.
   6. Find the top endpoints accessed by users.
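The analyses listed above run through Spark SQL in the application itself, whose source is not reproduced in this walkthrough. Purely as an illustration of the underlying logic, here is a plain-Java sketch (no Spark dependency) that parses access-log lines with a regular expression and computes points 2, 3, 4, and 6. The class name AccessLogSketch, its field names, and the regular expression are assumptions made for this example, and the pattern assumes the log follows the Common Log Format; it is not the author's LogAnalysis code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative only: plain Java, no Spark. All names here are hypothetical.
final class AccessLogSketch {
    // Assumed Common Log Format:
    // ip identd user [timestamp] "method endpoint protocol" status bytes
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-)");

    static final class LogEntry {
        final String ip;
        final String timestamp;
        final String endpoint;
        final int responseCode;
        final long contentSize;

        LogEntry(String ip, String timestamp, String endpoint, int responseCode, long contentSize) {
            this.ip = ip;
            this.timestamp = timestamp;
            this.endpoint = endpoint;
            this.responseCode = responseCode;
            this.contentSize = contentSize;
        }
    }

    // Parses every line that matches the pattern; non-matching lines are skipped.
    static List<LogEntry> parseAll(List<String> lines) {
        List<LogEntry> entries = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue;
            }
            // A "-" in the size field means no body was sent; treated as 0 here.
            long size = "-".equals(m.group(9)) ? 0L : Long.parseLong(m.group(9));
            entries.add(new LogEntry(
                m.group(1), m.group(4), m.group(6), Integer.parseInt(m.group(8)), size));
        }
        return entries;
    }

    // Point 2: count, min, max, and average of content size.
    static LongSummaryStatistics contentSizeStats(List<LogEntry> entries) {
        return entries.stream().mapToLong(e -> e.contentSize).summaryStatistics();
    }

    // Point 3: number of requests per response code.
    static Map<Integer, Long> countByResponseCode(List<LogEntry> entries) {
        return entries.stream().collect(
            Collectors.groupingBy(e -> e.responseCode, TreeMap::new, Collectors.counting()));
    }

    // Point 4: IP addresses with more than `threshold` requests.
    static Map<String, Long> ipsAbove(List<LogEntry> entries, long threshold) {
        Map<String, Long> counts = entries.stream().collect(
            Collectors.groupingBy(e -> e.ip, TreeMap::new, Collectors.counting()));
        counts.values().removeIf(c -> c <= threshold);
        return counts;
    }

    // Point 6: the n most requested endpoints, highest count first.
    static List<Map.Entry<String, Long>> topEndpoints(List<LogEntry> entries, int n) {
        Map<String, Long> counts = entries.stream().collect(
            Collectors.groupingBy(e -> e.endpoint, TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(n)
            .collect(Collectors.toList());
    }
}
```

Calling AccessLogSketch.ipsAbove(entries, 10) corresponds to point 4 in the list. In the Spark version, each of these summaries would more likely be expressed as a SQL query over the parsed rows than as a Java stream.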
4. Build the project using Maven. In the Maven tab, click clean first and then install.
This produces the project jar; by default Maven writes it to the project's target directory.
5. Upload the jar to the server using WinSCP.
6. Transfer the log file (access.log) from your local machine to the server, then copy it to an HDFS location.
- hdfs dfs -copyFromLocal access.log /user/jai/log_analysis/
- hdfs dfs -ls /user/jai/log_analysis/
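7. Finally, run the command below to submit the Spark job. Note that a master of local[*] runs Spark in local mode on that machine, using all available cores; to run the job on a cluster, set the master option to the cluster manager's master URL instead.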
spark-submit \
--class LogAnalysis \
--master local[*] \
Log_Analysis-1.0-SNAPSHOT.jar \
/user/jai/log_analysis/access.log
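spark-submit passes everything after the jar name to the application's main method as program arguments, so the HDFS path above arrives as the first argument. How LogAnalysis reads it is not shown in this walkthrough; a minimal sketch, assuming the input path is taken from args[0], with the hypothetical names LogAnalysisArgs and inputPath:

```java
// Illustrative only: hypothetical helper, not part of the original LogAnalysis class.
final class LogAnalysisArgs {
    // Returns the input path given after the jar on the spark-submit command line.
    static String inputPath(String[] args) {
        if (args == null || args.length < 1 || args[0].trim().isEmpty()) {
            throw new IllegalArgumentException("Usage: LogAnalysis <path-to-access.log>");
        }
        return args[0].trim();
    }
}
```

Failing fast with a usage message keeps a forgotten path argument from surfacing later as a less obvious error inside the job.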