Explaining MapReduce Through Basic Java Codes (With SqliteDb)

Add sqlite library

mkdir -p lib
cd lib
curl -L -O https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.36.0.3/sqlite-jdbc-3.36.0.3.jar

Create Java file:

touch src/main/java

Add codes into the file:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

public class WordCountSQLite {
    private static final String DB_URL = "jdbc:sqlite:wordcount.db";

    public static void main(String[] args) {
        // Check if the correct number of arguments is provided
        if (args.length < 1) {
            System.err.println("Usage: java WordCountSQLite <input_file>");
            System.exit(1);
        }

        String inputFilePath = args[0];  // Input file path

        // Step 1: Create SQLite database and table
        try (Connection conn = DriverManager.getConnection(DB_URL)) {
            String createTableSQL = "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, count INTEGER)";
            conn.createStatement().execute(createTableSQL);
        } catch (SQLException e) {
            e.printStackTrace();
        }

        // Step 2: Map phase
        Map<String, Integer> wordCountMap = new HashMap<>();

        // Read the input file and count words
        try (BufferedReader br = new BufferedReader(new FileReader(inputFilePath))) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Step 3: Reduce phase - store results in SQLite database
        try (Connection conn = DriverManager.getConnection(DB_URL)) {
            String insertSQL = "INSERT INTO words (word, count) VALUES (?, ?) ON CONFLICT(word) DO UPDATE SET count = count + excluded.count";
            PreparedStatement pstmt = conn.prepareStatement(insertSQL);

            for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
                pstmt.setString(1, entry.getKey());
                pstmt.setInt(2, entry.getValue());
                pstmt.executeUpdate();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }

        // Step 4: Output results from the database
        try (Connection conn = DriverManager.getConnection(DB_URL);
             ResultSet rs = conn.createStatement().executeQuery("SELECT word, count FROM words")) {
            while (rs.next()) {
                System.out.println(rs.getString("word") + ": " + rs.getInt("count"));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

Compile:

javac -cp "lib/sqlite-jdbc-3.47.0.0.jar" -d build src/main/java/WordCountSQLite.java

Run:

java -cp "lib/sqlite-jdbc-3.36.0.3.jar:build" WordCountSQLite input/input.csv
  • java: This is the Java Runtime Environment (JRE) command used to run Java applications.

  • -cp: This stands for "classpath." It is an option that allows you to specify where to find user-defined classes and packages in Java. The classpath can include directories, JAR files, and ZIP files.

  • "lib/sqlite-jdbc-3.36.0.3.jar:build": This is the classpath specified for the application. It includes two components:

    • lib/sqlite-jdbc-3.36.0.3.jar: This is the path to the SQLite JDBC driver JAR file. This driver is necessary for your Java application to connect to an SQLite database.

    • build: This is the directory where the compiled class files (like WordCountSQLite.class) are located. By including this in the classpath, you enable the Java runtime to find the compiled classes needed to run your program.

    • Note: The separator between the two paths (:) is used in Unix-like operating systems (Linux, macOS). On Windows, you would use a semicolon (;) instead.

  • WordCountSQLite: This is the name of the main class that contains the main method to be executed. The class name is specified without the .class extension. If WordCountSQLite is in a package, you would need to use the fully qualified name (e.g., com.example.WordCountSQLite).

  • input/input.csv: This is an argument passed to the main method of the WordCountSQLite class. In this case, it represents the path to the input CSV file that your program will read and process. The program expects this argument to be provided when it is run.

Outcome: