Converting a List to a Map in JDK 8: Implementations and Analysis

Posted on 2022-04-23, updated on 2024-06-19, filed under Programming, Java

This post walks through several ways to convert a List to a Map in JDK 8 and the pros and cons of each.

Approach 1 (rarely used)

Since JDK 8, the usual way to convert a List to a Map is:

ArrayList<Student> list = new ArrayList<>(3);
Student student1 = new Student("张三",18);
Student student2 = new Student("李四",18);
Student student3 = new Student("王五",20);
list.add(student1);
list.add(student2);
list.add(student3);

Map<String, Integer> map = list.stream()
    .collect(Collectors.toMap(Student::getName, Student::getAge));

Problems with Approach 1

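The two-argument Collectors.toMap() is implemented as follows (this is the source as it appears in JDK 9 and later; JDK 8 instead delegates to the four-argument overload with a merge function that always throws, and it fails in the same two cases):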

public static <T, K, U> Collector<T, ?, Map<K,U>> toMap(
                                    Function<? super T, ? extends K> keyMapper,
                                    Function<? super T, ? extends U> valueMapper) {
    return new CollectorImpl<>(HashMap::new,
                               uniqKeysMapAccumulator(keyMapper, valueMapper),
                               uniqKeysMapMerger(),
                               CH_ID);
}

The accumulator handed to the CollectorImpl constructor is built by uniqKeysMapAccumulator(keyMapper, valueMapper), which processes each element as follows:

private static <T, K, V>
BiConsumer<Map<K, V>, T> uniqKeysMapAccumulator(Function<? super T, ? extends K> keyMapper,
                                                Function<? super T, ? extends V> valueMapper) {
    return (map, element) -> {
        K k = keyMapper.apply(element);
        V v = Objects.requireNonNull(valueMapper.apply(element));
        V u = map.putIfAbsent(k, v);
        if (u != null) throw duplicateKeyException(k, u, v);
    };
}

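Objects.requireNonNull checks the mapped value and throws a NullPointerException if it is null.
map.putIfAbsent(k, v) together with the check on its return value detects an existing key: if the key is already present, the IllegalStateException built by duplicateKeyException is thrown.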
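Both failure modes are easy to reproduce. The following is a minimal sketch, not code from the original post: the nested Student class is a hypothetical stand-in for the article's class (with an Integer age so that it can be null), and collectOutcome is a helper name made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ToMapFailureDemo {

    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String name;
        private final Integer age;
        Student(String name, Integer age) { this.name = name; this.age = age; }
        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Runs the two-argument toMap and reports which exception type (if any) it threw.
    static String collectOutcome(List<Student> students) {
        try {
            students.stream()
                    .collect(Collectors.toMap(Student::getName, Student::getAge));
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        List<Student> duplicateNames =
                Arrays.asList(new Student("张三", 18), new Student("张三", 20));
        List<Student> nullAge =
                Arrays.asList(new Student("李四", null));

        System.out.println(collectOutcome(duplicateNames)); // IllegalStateException
        System.out.println(collectOutcome(nullAge));        // NullPointerException
    }
}
```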

Approach 2 (flawed)

Collectors.toMap() also has an overload that takes a merge function:

public static <T, K, U> Collector<T, ?, Map<K,U>> toMap(
                                    Function<? super T, ? extends K> keyMapper,
                                    Function<? super T, ? extends U> valueMapper,
                                    BinaryOperator<U> mergeFunction) {
    return toMap(keyMapper, valueMapper, mergeFunction, HashMap::new);
}

The return statement delegates to the four-argument toMap overload in the same class:

public static <T, K, U, M extends Map<K, U>> Collector<T, ?, M> toMap(
                             Function<? super T, ? extends K> keyMapper,
                             Function<? super T, ? extends U> valueMapper,
                             BinaryOperator<U> mergeFunction,
                             Supplier<M> mapFactory) {
    BiConsumer<M, T> accumulator
        = (map, element) -> map.merge(keyMapper.apply(element),
                                      valueMapper.apply(element), mergeFunction);
    return new CollectorImpl<>(mapFactory, accumulator, mapMerger(mergeFunction), CH_ID);
}

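As the code shows, you can supply your own mergeFunction, which is passed to map.merge as its third argument (remappingFunction), to control how the values of duplicate keys are combined. The improved version looks like this: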

ArrayList<Student> list = new ArrayList<>(3);
Student student1 = new Student("张三",18);
Student student2 = new Student("李四",18);
Student student3 = new Student("王五",20);
list.add(student1);
list.add(student2);
list.add(student3);

Map<String, Integer> map = list.stream()
    .collect(Collectors.toMap(Student::getName, Student::getAge,(v1, v2)->v2));

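Here the lambda (v1, v2) -> v2 is passed in, so when a key is repeated, the value already stored under that key is replaced by the new one. The replacement itself happens inside map.merge.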
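The choice of merge function decides which value survives a key collision. The sketch below compares three common choices; the Student class and the toMapWith helper are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class MergeFunctionDemo {

    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String name;
        private final Integer age;
        Student(String name, Integer age) { this.name = name; this.age = age; }
        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Collects name -> age, resolving duplicate names with the given merge function.
    static Map<String, Integer> toMapWith(List<Student> students,
                                          BinaryOperator<Integer> mergeFunction) {
        return students.stream()
                .collect(Collectors.toMap(Student::getName, Student::getAge, mergeFunction));
    }

    public static void main(String[] args) {
        List<Student> list = Arrays.asList(
                new Student("张三", 18), new Student("张三", 20), new Student("李四", 18));

        System.out.println(toMapWith(list, (v1, v2) -> v2).get("张三")); // 20 (last wins)
        System.out.println(toMapWith(list, (v1, v2) -> v1).get("张三")); // 18 (first wins)
        System.out.println(toMapWith(list, Integer::sum).get("张三"));   // 38 (values combined)
    }
}
```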

Problems with Approach 2

Even with custom handling for duplicate keys, the call to Map.merge actually dispatches to HashMap.merge, whose first statement is:

if (value == null || remappingFunction == null) throw new NullPointerException();

So when value is null, a NullPointerException is thrown.
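The null check can be observed by calling HashMap.merge directly, without going through a stream at all. This is a small sketch; mergeThrowsNpe is a helper name invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeNullDemo {

    // Returns true if HashMap.merge rejects the given value with a NullPointerException.
    static boolean mergeThrowsNpe(Integer value) {
        Map<String, Integer> map = new HashMap<>();
        try {
            map.merge("key", value, (oldValue, newValue) -> newValue);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mergeThrowsNpe(null)); // true
        System.out.println(mergeThrowsNpe(18));   // false
    }
}
```

The merge function never gets a chance to run for a null value, which is why supplying one does not help here.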

Common workaround for the Approach 2 problem

Check for null up front and substitute a default value:

Map<String, Integer> map = list.stream()
    .collect(Collectors.toMap(
        Student::getName, student -> Optional.ofNullable(student.getAge()).orElse(1)
    ));
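The null default and a merge function can be combined so that neither null values nor duplicate keys abort the collection. The following is a sketch under assumptions: Student is a hypothetical stand-in for the article's class, toMapWithDefault is an illustrative helper, and the "last value wins" rule is just one possible policy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class DefaultValueDemo {

    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String name;
        private final Integer age;
        Student(String name, Integer age) { this.name = name; this.age = age; }
        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Replaces null ages with defaultAge and lets the last duplicate win.
    static Map<String, Integer> toMapWithDefault(List<Student> students, int defaultAge) {
        return students.stream()
                .collect(Collectors.toMap(
                        Student::getName,
                        s -> Optional.ofNullable(s.getAge()).orElse(defaultAge),
                        (v1, v2) -> v2));
    }

    public static void main(String[] args) {
        List<Student> list = Arrays.asList(
                new Student("张三", 18), new Student("张三", 20), new Student("李四", null));
        Map<String, Integer> map = toMapWithDefault(list, 1);
        System.out.println(map.get("张三")); // 20
        System.out.println(map.get("李四")); // 1
    }
}
```

Note that the default value is indistinguishable from a real age of 1 afterwards, so this only works when such a sentinel is acceptable.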

Approach 3 (usable)

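The three-argument .collect() method lets you supply your own accumulator container (supplier), the way elements are added to it (accumulator), and the way two containers are merged (combiner):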

Map<String, Integer> map = list.stream()
    .collect(HashMap::new, (m, v) -> m.put(v.getName(), v.getAge()), HashMap::putAll);
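Because this form calls HashMap.put directly, null values are stored as-is and a repeated key simply overwrites the earlier entry. A small sketch to confirm that behaviour follows; the Student class and the toMap helper are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThreeArgCollectDemo {

    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String name;
        private final Integer age;
        Student(String name, Integer age) { this.name = name; this.age = age; }
        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Same shape as the article's one-liner: supplier, accumulator, combiner.
    static Map<String, Integer> toMap(List<Student> students) {
        Map<String, Integer> map = students.stream()
                .collect(HashMap::new, (m, s) -> m.put(s.getName(), s.getAge()), HashMap::putAll);
        return map;
    }

    public static void main(String[] args) {
        List<Student> list = Arrays.asList(
                new Student("张三", 18), new Student("张三", 20), new Student("李四", null));
        Map<String, Integer> map = toMap(list);
        System.out.println(map.get("张三"));         // 20
        System.out.println(map.containsKey("李四")); // true
        System.out.println(map.get("李四"));         // null
    }
}
```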

Speed test

// Prepare the test data
ArrayList<Student> list = new ArrayList<>(100000);
for(int i = 0 ; i < 5000000 ;i++){
    Student student= new Student("测试"+i,i );
    list.add(student);
}

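Test methods

The three snippets below belong to one method and actually run in the order custom accumulator, manual null check, for-each loop: each one uses the end timestamp of the previous test (now1 to now4) as its start time.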

Plain for-each loop

// now3 is the timestamp taken at the end of the manual null-check test
Map<String, Integer> map3 = new HashMap<>(list.size());
for (Student student : list) {
    map3.put(student.getName(), student.getAge());
}
Instant now4 = Instant.now();
System.out.println("Plain for-each loop: " + Duration.between(now3, now4).toMillis());

Manual null check

// Approach 2 timing (now2 is the timestamp taken at the end of the custom-accumulator test)
Map<String, Integer> map2 = list.stream()
    .collect(Collectors.toMap(
        Student::getName, student -> Optional.ofNullable(student.getAge()).orElse(1)
    ));
Instant now3 = Instant.now();
System.out.println("Manual null check: " + Duration.between(now2, now3).toMillis());

Custom accumulator

// Approach 3 timing
Instant now1 = Instant.now();
Map<String, Integer> map = list.stream()
    .collect(HashMap::new, (m, v) -> m.put(v.getName(), v.getAge()), HashMap::putAll);
Instant now2 = Instant.now();
System.out.println("Custom accumulator: " + Duration.between(now1, now2).toMillis());

Test results

Plain for-each loop: 338 ms

Manual null check: 591 ms

Custom accumulator: 738 ms

Conclusions

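The plain for-each loop is the fastest, but you have to maintain the implementation details yourself. It is a good fit when performance matters and the data volume is large.
The custom accumulator avoids both the null-pointer and the duplicate-key problem. Building the map this way still costs extra time (putAll, the combiner, is only invoked for parallel streams, so it does not contribute in this sequential test), but the gap to the manual null check is small, so it is a reasonable choice when the data volume is small and performance is not critical.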
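For anyone who wants to repeat the comparison, here is a sketch of a harness that times each approach on its own, so that no measurement depends on a timestamp taken in another snippet. It is an illustration under assumptions rather than the original test program: the Student class, the method names, and the data size are all made up for the example, and single-shot timings like these remain sensitive to JIT warm-up and execution order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ListToMapTiming {

    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String name;
        private final Integer age;
        Student(String name, Integer age) { this.name = name; this.age = age; }
        String getName() { return name; }
        Integer getAge() { return age; }
    }

    static List<Student> testData(int size) {
        List<Student> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(new Student("test" + i, i));
        }
        return list;
    }

    static Map<String, Integer> viaLoop(List<Student> list) {
        Map<String, Integer> map = new HashMap<>(list.size());
        for (Student s : list) {
            map.put(s.getName(), s.getAge());
        }
        return map;
    }

    static Map<String, Integer> viaToMap(List<Student> list) {
        return list.stream().collect(Collectors.toMap(
                Student::getName, s -> Optional.ofNullable(s.getAge()).orElse(1)));
    }

    static Map<String, Integer> viaCollect(List<Student> list) {
        Map<String, Integer> map = list.stream()
                .collect(HashMap::new, (m, s) -> m.put(s.getName(), s.getAge()), HashMap::putAll);
        return map;
    }

    // Runs one conversion and returns the elapsed wall-clock time in milliseconds.
    static long elapsedMillis(Function<List<Student>, Map<String, Integer>> conversion,
                              List<Student> data) {
        long start = System.nanoTime();
        conversion.apply(data);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Student> data = testData(1_000_000); // size chosen for illustration only
        System.out.println("for-each loop: " + elapsedMillis(ListToMapTiming::viaLoop, data) + " ms");
        System.out.println("toMap with null check: " + elapsedMillis(ListToMapTiming::viaToMap, data) + " ms");
        System.out.println("three-argument collect: " + elapsedMillis(ListToMapTiming::viaCollect, data) + " ms");
    }
}
```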

Appendix

.collect: a terminal stream operation that takes a Collector<? super T, A, R> collector argument. Its main job is to gather the elements of the stream into a single result.

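The toMap overloads shown earlier all end by calling the four-argument constructor of CollectorImpl and returning the result, so CollectorImpl is worth a closer look. CollectorImpl implements the Collector interface, which is defined as follows: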

public interface Collector<T, A, R> {
    Supplier<A> supplier();

    BiConsumer<A, T> accumulator();

    BinaryOperator<A> combiner();

    Function<A, R> finisher();

    Set<Characteristics> characteristics();
}

supplier(): returns a new accumulator container

@FunctionalInterface
public interface Supplier<T> {
    T get();
}

accumulator(): adds an element to the accumulator container

@FunctionalInterface
public interface BiConsumer<T, U> {
    void accept(T t, U u);
}

combiner(): merges two accumulator containers

@FunctionalInterface
public interface BinaryOperator<T> extends BiFunction<T,T,T> {

}
@FunctionalInterface
public interface BiFunction<T, U, R> {

    R apply(T t, U u);
}

finisher(): converts the accumulator container into the final result container before the collection finishes

@FunctionalInterface
public interface Function<T, R> {
    R apply(T t);
}

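characteristics(): describes properties of the collector that the stream implementation may use to optimize the reduction, for example whether combiner() or finisher() can be skipped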

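enum Characteristics {
    // The accumulator() may be called concurrently from multiple threads on the same
    // result container (when the stream is parallel) and still produce a correct result,
    // so combiner() is not needed
    CONCURRENT,
    // The result does not depend on the encounter order of the elements
    UNORDERED,
    // The accumulator container is the final result container, so finisher() is not called
    IDENTITY_FINISH
}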
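The pieces described in this appendix can be assembled with the Collector.of factory method instead of CollectorImpl, which is not accessible outside Collectors. The sketch below builds a reusable toMap-style collector that tolerates null values and lets the last duplicate win; toNullTolerantMap and the Student class are hypothetical names for this illustration, not part of the JDK.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;

public class NullTolerantToMap {

    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String name;
        private final Integer age;
        Student(String name, Integer age) { this.name = name; this.age = age; }
        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Hypothetical helper: a collector whose accumulator uses put() instead of merge().
    static <T, K, V> Collector<T, ?, Map<K, V>> toNullTolerantMap(
            Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends V> valueMapper) {
        return Collector.of(
                HashMap::new,                                        // supplier
                (map, element) -> map.put(keyMapper.apply(element),  // accumulator
                                          valueMapper.apply(element)),
                (left, right) -> { left.putAll(right); return left; } // combiner
        );
        // This Collector.of overload has no finisher argument, so the resulting
        // collector reports the IDENTITY_FINISH characteristic.
    }

    public static void main(String[] args) {
        List<Student> list = Arrays.asList(
                new Student("张三", 18), new Student("张三", 20), new Student("李四", null));
        Map<String, Integer> map = list.stream()
                .collect(toNullTolerantMap(Student::getName, Student::getAge));
        System.out.println(map.get("张三")); // 20
        System.out.println(map.get("李四")); // null
    }
}
```

Wrapping the logic in a Collector keeps call sites as short as Collectors.toMap while preserving the behaviour of Approach 3.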