Java Tencent Interview Question: Counting Word Occurrences


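In Java, HashMap is a data structure that maps keys to values. Both keys and values can be objects of any type (primitives must be boxed, for example int as Integer). HashMap is backed by a hash table, so insertion and lookup take constant time on average.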
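To make "backed by a hash table" concrete: the table turns each key's hashCode() into a bucket index, so a lookup only inspects one bucket instead of scanning every entry. The sketch below mirrors the two steps OpenJDK's HashMap uses internally (spreading the hash, then masking it with the table length). This is an internal implementation detail, not HashMap's public API, and the class and method names are hypothetical.

```java
// A sketch of how a hash table picks a bucket for a key.
// spread() mirrors OpenJDK's internal HashMap.hash(); it is not a public API
// and may differ between JDK versions.
class BucketIndexDemo {

    // Mix the high 16 bits of hashCode() into the low 16 bits, so that small
    // tables (which only use the low bits) are still affected by them.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // For a power-of-two table length n, (n - 1) & hash equals
    // Math.floorMod(hash, n); this is why HashMap keeps power-of-two capacities.
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        int tableLength = 16; // HashMap's default initial capacity
        for (String key : new String[] {"apple", "banana", "orange"}) {
            System.out.println(key + " -> bucket " + bucketIndex(key, tableLength));
        }
    }
}
```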

Here are some basic HashMap operations:

Creating a HashMap

    HashMap<String, Integer> hashMap = new HashMap<>();

This creates an empty HashMap with String keys and Integer values.

Adding key-value pairs

    hashMap.put("apple", 1); 
    hashMap.put("banana", 2);  
    hashMap.put("orange", 3);

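This adds three key-value pairs to the HashMap. A HashMap does not keep entries in insertion order, so iterating over it may return them in a different order than they were added.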
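Two details of put() are easy to miss: calling it with an existing key replaces the value and returns the old one (null for a new key), and when insertion order matters, LinkedHashMap can be used instead. A minimal sketch; the class PutDemo and the helper addFruits are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PutDemo {
    // Adds the same three entries as the text above and returns the map
    static Map<String, Integer> addFruits(Map<String, Integer> map) {
        map.put("apple", 1);
        map.put("banana", 2);
        map.put("orange", 3);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> hashMap = addFruits(new HashMap<>());

        // put on an existing key replaces the value and returns the old one
        Integer previous = hashMap.put("apple", 10);
        System.out.println(previous + " " + hashMap.get("apple")); // 1 10

        // put on a new key returns null
        System.out.println(hashMap.put("grape", 4)); // null

        // LinkedHashMap iterates in insertion order; HashMap makes no such promise
        System.out.println(addFruits(new LinkedHashMap<>())); // {apple=1, banana=2, orange=3}
    }
}
```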

Reading a value

    Integer value = hashMap.get("banana");
    System.out.println(value); // prints 2

This looks up the value mapped to the key "banana" and prints it, 2.
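get() returns null when the key is absent, which matters for counting code: auto-unboxing that null into an int throws NullPointerException. getOrDefault (Java 8+) avoids this, and containsKey distinguishes a missing key from a key explicitly mapped to null. A small sketch; GetDemo and its helper countOf are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    // Returns the count for a word, treating a missing key as 0 instead of null
    static int countOf(Map<String, Integer> map, String key) {
        return map.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> hashMap = new HashMap<>();
        hashMap.put("banana", 2);

        System.out.println(hashMap.get("banana"));        // 2
        System.out.println(hashMap.get("grape"));         // null: no such key
        System.out.println(hashMap.containsKey("grape")); // false
        System.out.println(countOf(hashMap, "grape"));    // 0

        try {
            // Auto-unboxing the null returned for a missing key throws NullPointerException
            int n = hashMap.get("grape");
            System.out.println(n);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: missing key unboxed to int");
        }
    }
}
```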

Removing a key-value pair

    hashMap.remove("orange");

This removes the entry for the key "orange" from the HashMap.
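remove() also returns something useful: the value that was mapped to the key, or null if there was none. Java 8 added a two-argument remove(key, value) that deletes the entry only if the key currently maps to that value. A short sketch (RemoveDemo is a hypothetical name):

```java
import java.util.HashMap;
import java.util.Map;

class RemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> hashMap = new HashMap<>();
        hashMap.put("apple", 1);
        hashMap.put("orange", 3);

        Integer removed = hashMap.remove("orange");   // 3: the value that was mapped
        Integer again = hashMap.remove("orange");     // null: the key is already gone
        boolean matched = hashMap.remove("apple", 2); // false: value does not match, entry kept
        System.out.println(removed + " " + again + " " + matched + " " + hashMap); // 3 null false {apple=1}
    }
}
```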

Iterating over a HashMap

    for (String key : hashMap.keySet()) {
        Integer value = hashMap.get(key);
        System.out.println(key + " = " + value);
    }

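This walks through every entry in the HashMap and prints its key and value. keySet() returns the set of all keys, and each value is then fetched with get(key). The iteration order of a HashMap is unspecified.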
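Because the loop above performs a second get() for every key, iterating entrySet() is usually preferred when both key and value are needed. A minimal sketch (IterateDemo and its helper total are hypothetical names) that also shows the Java 8 forEach form and removeIf, which deletes entries safely during iteration:

```java
import java.util.HashMap;
import java.util.Map;

class IterateDemo {
    // Sums all values using entrySet(), which yields key and value together
    static int total(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> hashMap = new HashMap<>();
        hashMap.put("apple", 1);
        hashMap.put("banana", 2);
        hashMap.put("orange", 3);

        // entrySet(): no second lookup per key, unlike keySet() + get(key)
        for (Map.Entry<String, Integer> entry : hashMap.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }

        // Java 8+ alternative: forEach with a (key, value) lambda
        hashMap.forEach((key, value) -> System.out.println(key + " = " + value));

        // Calling hashMap.remove() inside the loops above would typically throw
        // ConcurrentModificationException; removeIf on the entry set is safe
        hashMap.entrySet().removeIf(entry -> entry.getValue() < 2);
        System.out.println(hashMap.containsKey("apple") + " " + total(hashMap)); // false 5
    }
}
```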

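Interview question: counting how often each word appears

Below is a simple example that uses a HashMap to record how many times each word appears in a piece of text: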

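    import java.util.*;

    class WordCount {
        public static void main(String[] args) {
            String article = "This is a sample article. It contains some words that will be counted.Test is ready?";
            HashMap<String, Integer> wordCount = countWords(article);
            // HashMap.toString() prints the entries in no particular order
            System.out.println(wordCount);
        }

        public static HashMap<String, Integer> countWords(String article) {
            HashMap<String, Integer> wordCount = new HashMap<>();

            // Lower-case the text, then split on every run of non-letter characters,
            // so punctuation acts as a separator ("counted.Test" -> "counted", "test").
            // Locale.ROOT keeps the result independent of the default locale.
            String[] words = article.toLowerCase(Locale.ROOT).split("[^a-z]+");

            for (String word : words) {
                // split() yields a leading empty string if the text starts with a non-letter
                if (!word.isEmpty()) {
                    if (wordCount.containsKey(word)) {
                        // Seen before: increment the existing count
                        int count = wordCount.get(word);
                        wordCount.put(word, count + 1);
                    } else {
                        // First occurrence
                        wordCount.put(word, 1);
                    }
                }
            }

            return wordCount;
        }
    }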
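The containsKey/get/put sequence above can be collapsed into a single call with Map.merge (Java 8+), which stores 1 for a new word and otherwise combines the old count with 1 using Integer::sum. A sketch using the same tokenizing rule (lower-case, split on runs of non-letters); the class name WordCountMerge is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class WordCountMerge {
    // Lower-case the text and split it on runs of non-letters, then count each word
    static Map<String, Integer> countWords(String article) {
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : article.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) {
                // merge: put 1 if the word is new, otherwise add 1 to its count
                wordCount.merge(word, 1, Integer::sum);
            }
        }
        return wordCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("This is a test. This test is short!");
        System.out.println(counts.get("this") + " " + counts.get("test") + " " + counts.get("short")); // 2 2 1
    }
}
```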

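Harder version: counting word occurrences in an article of 300 million characters

Approach: chunked processing. The file is split on line boundaries into chunk files, each chunk is counted in its own thread, and the per-chunk counts are merged. No external sort is actually performed: the text itself is never held in memory all at once, but the merged counts live in a single in-memory HashMap, so this assumes the number of distinct words fits in the heap.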

    import java.io.*;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.*;
    import java.util.concurrent.*;
    import java.util.stream.Collectors;

    public class WordCount {

        public static void main(String[] args) throws IOException, ExecutionException, InterruptedException {
            // Input file path
            String filePath = "path/to/large/file.txt";

            // Chunk size, counted in characters (about 10 million per chunk), not bytes
            long blockSize = 10_000_000;

            // Thread pool that counts chunks in parallel
            ExecutorService executor = Executors.newFixedThreadPool(4);

            // Split the file into chunk files; lines are never cut, so no word spans two chunks
            List<String> chunks = splitFile(filePath, blockSize);

            // Final word -> count map, merged from the per-chunk results
            Map<String, Integer> wordCount = new HashMap<>();

            try {
                // Count each chunk in its own task; each task uses a private HashMap, so no locking is needed
                List<Future<Map<String, Integer>>> futures = new ArrayList<>();
                for (String chunk : chunks) {
                    futures.add(executor.submit(() -> {
                        Map<String, Integer> map = new HashMap<>();
                        try (BufferedReader br = Files.newBufferedReader(Paths.get(chunk), StandardCharsets.UTF_8)) {
                            String line;
                            while ((line = br.readLine()) != null) {
                                // Same rule as the first example: lower-case, split on non-letters
                                for (String word : line.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                                    if (!word.isEmpty()) {
                                        map.merge(word, 1, Integer::sum);
                                    }
                                }
                            }
                        }
                        return map;
                    }));
                }

                // Wait for each task and merge its counts into the final map
                for (Future<Map<String, Integer>> future : futures) {
                    for (Map.Entry<String, Integer> entry : future.get().entrySet()) {
                        wordCount.merge(entry.getKey(), entry.getValue(), Integer::sum);
                    }
                }
            } finally {
                // Always shut the pool down and delete the temporary chunk files
                executor.shutdown();
                for (String chunk : chunks) {
                    Files.deleteIfExists(Paths.get(chunk));
                }
            }

            // Print the words sorted by count, highest first
            List<Map.Entry<String, Integer>> sorted = wordCount.entrySet().stream()
                    .sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
                    .collect(Collectors.toList());
            for (Map.Entry<String, Integer> entry : sorted) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }

        // Copies the input into chunk files "<name>.1", "<name>.2", ... next to the original,
        // starting a new chunk after roughly blockSize characters, and returns their paths
        private static List<String> splitFile(String filePath, long blockSize) throws IOException {
            List<String> chunks = new ArrayList<>();
            File file = new File(filePath);
            // getAbsoluteFile() avoids a null parent when filePath has no directory part
            File dir = file.getAbsoluteFile().getParentFile();
            String baseName = file.getName();
            try (BufferedReader br = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
                int count = 1;
                File chunkFile = new File(dir, baseName + "." + count);
                BufferedWriter bw = Files.newBufferedWriter(chunkFile.toPath(), StandardCharsets.UTF_8);
                try {
                    String line;
                    long size = 0;
                    while ((line = br.readLine()) != null) {
                        bw.write(line);
                        bw.newLine();
                        size += line.length() + 1; // +1 approximates the line separator
                        if (size >= blockSize) {
                            // Chunk is full: close it, record it, and open the next one
                            bw.close();
                            chunks.add(chunkFile.getPath());
                            count++;
                            size = 0;
                            chunkFile = new File(dir, baseName + "." + count);
                            bw = Files.newBufferedWriter(chunkFile.toPath(), StandardCharsets.UTF_8);
                        }
                    }
                } finally {
                    // close() also flushes; calling it on an already-closed writer has no effect
                    bw.close();
                }
                chunks.add(chunkFile.getPath());
            }
            return chunks;
        }
    }
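For comparison, the same chunk, count and merge idea can be expressed with Java 8+ parallel streams instead of explicit chunk files and a hand-managed thread pool. This is a swapped-in technique, not the article's method: Files.lines reads the file lazily, the stream framework splits the work across the common ForkJoinPool, and Collectors.groupingByConcurrent accumulates counts into one ConcurrentMap. The class name, the default path and the top-20 cutoff are assumptions for illustration. How well Files.lines splits for parallel work depends on the JDK version, and malformed input surfaces as an UncheckedIOException.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelStreamWordCount {
    // Same tokenizing rule as the earlier examples: split lower-cased text on non-letters
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-z]+");

    // Streams the file line by line, so the text is never fully in memory;
    // only the map of distinct words and their counts is.
    static ConcurrentMap<String, Long> countWords(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.parallel()
                    .flatMap(line -> Arrays.stream(NON_LETTERS.split(line.toLowerCase(Locale.ROOT))))
                    .filter(word -> !word.isEmpty())
                    // groupingByConcurrent lets parallel threads update one shared ConcurrentMap
                    .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "path/to/large/file.txt");
        Map<String, Long> counts = countWords(file);
        // Print the 20 most frequent words (20 is an arbitrary choice for this sketch)
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(20)
                .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```

The trade-off: this version avoids writing and deleting temporary files, but gives less direct control over chunk size and thread count than the explicit ExecutorService version.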