What is new in PooledArrays.jl?

personBlog by Bogumił Kamiński

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2021/03/05/pooledarrays.html

Introduction

Recently PooledArrays.jl 1.2.1 has been released. The most significant change

since 1.0 release is an improvement of performance of basic operations:
getindex, copy, copy!, and copyto!. The effect of the change is

especially significant for PooledArrays that have large pools. This change, is

one of the steps towards making Julia run fast in Database-like ops

benchmark
for joins.

Let me start with the examples and then I will comment on the internals.

The post was tested under Julia 1.6.0-rc1.

The Benchmarks

I will just share a recordng of a Julia session doing the benchmarks. We start

with PooledArrays.jl 1.0.0:

(@v1.6) pkg> activate . 
  Activating new environment at `~/Project.toml` 
 
(bkamins) pkg> add [email protected] 
   Resolving package versions... 
    Updating `~/Project.toml` 
  [2dfb63ee] + PooledArrays v1.0.0 
    Updating `~/Manifest.toml` 
  [9a962f9c] + DataAPI v1.6.0 
  [2dfb63ee] + PooledArrays v1.0.0 
  Progress [========================================>]  1/1 
1 dependency successfully precompiled in 2 seconds (1 already precompiled) 
 
julia> using BenchmarkTools, PooledArrays 
 
julia> x = PooledArray(string.(1:10^6)); 
 
julia> @benchmark copy($x) 
BenchmarkTools.Trial: 
  memory estimate:  37.44 MiB 
  allocs estimate:  13 
  -------------- 
  minimum time:     16.109 ms (0.00% GC) 
  median time:      17.820 ms (0.00% GC) 
  mean time:        29.851 ms (40.25% GC) 
  maximum time:     175.307 ms (88.17% GC) 
  -------------- 
  samples:          168 
  evals/sample:     1 
 
julia> @benchmark $x[1:1] 
BenchmarkTools.Trial: 
  memory estimate:  33.63 MiB 
  allocs estimate:  12 
  -------------- 
  minimum time:     15.334 ms (0.00% GC) 
  median time:      16.929 ms (0.00% GC) 
  mean time:        30.654 ms (44.06% GC) 
  maximum time:     188.496 ms (89.90% GC) 
  -------------- 
  samples:          164 
  evals/sample:     1 

In order to assess if the timings are good or bad let us do the same operations

using a plain Vector{String}:

julia> x = string.(1:10^6); 
 
julia> @benchmark copy($x) 
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB 
  allocs estimate:  2 
  -------------- 
  minimum time:     395.687 μs (0.00% GC) 
  median time:      441.842 μs (0.00% GC) 
  mean time:        757.598 μs (38.52% GC) 
  maximum time:     6.194 ms (90.51% GC) 
  -------------- 
  samples:          6598 
  evals/sample:     1 
 
julia> @benchmark $x[1:1] 
BenchmarkTools.Trial: 
  memory estimate:  96 bytes 
  allocs estimate:  1 
  -------------- 
  minimum time:     25.310 ns (0.00% GC) 
  median time:      26.013 ns (0.00% GC) 
  mean time:        34.384 ns (8.88% GC) 
  maximum time:     2.433 μs (98.05% GC) 
  -------------- 
  samples:          10000 
  evals/sample:     992 

As you can see things are really bad in PooledArrays.jl. Now start a fresh Julia

session and install 1.2.1 release of the package.

(bkamins) pkg> activate . 
  Activating environment at `~/Project.toml` 
 
(bkamins) pkg> add [email protected] 
   Resolving package versions... 
    Updating `~/Project.toml` 
  [2dfb63ee] ↑ PooledArrays v1.0.0 ⇒ v1.2.1 
    Updating `~/Manifest.toml` 
  [2dfb63ee] ↑ PooledArrays v1.0.0 ⇒ v1.2.1 
  [9fa8497b] + Future 
  [9a3f8284] + Random 
  [9e88b42a] + Serialization 
 
julia> using BenchmarkTools, PooledArrays 
 
julia> x = PooledArray(string.(1:10^6)); 
 
julia> @benchmark copy($x) 
BenchmarkTools.Trial: 
  memory estimate:  3.81 MiB 
  allocs estimate:  4 
  -------------- 
  minimum time:     893.391 μs (0.00% GC) 
  median time:      948.740 μs (0.00% GC) 
  mean time:        1.009 ms (5.08% GC) 
  maximum time:     103.288 ms (98.55% GC) 
  -------------- 
  samples:          4953 
  evals/sample:     1 
 
julia> @benchmark $x[1:1] 
BenchmarkTools.Trial: 
  memory estimate:  160 bytes 
  allocs estimate:  3 
  -------------- 
  minimum time:     47.806 ns (0.00% GC) 
  median time:      53.053 ns (0.00% GC) 
  mean time:        184.150 ns (44.04% GC) 
  maximum time:     55.779 μs (68.58% GC) 
  -------------- 
  samples:          10000 
  evals/sample:     976 

This looks better. Still you have to pay some cost over a Vector{Sting} but it

is much smaller (the cost is due to the fact that PooledArray constructor

performs some consistency checks of passed data to ensure extra safety).

You can expect that other operations that take a PooledArray or its view and

produce a PooledArray (like or copyto!) to experience similar

speedups.

So, what has changed between 1.0 and 1.2 release of PooledArrays.jl?

The Internals

In order to understand why the speedups were possible one needs to understand

first why the original code was slow. The reason is that PooledArray struct

contains three key fields:

  • refs: a vector of integer references (levels) of a PooledArray;
  • pool: a vector giving a mapping from references to actual values;
  • invpool: a dictionary providing a reverse mapping – from values to references.

As you can see this data structure is quite heavy if the size of pool relative

to the size of the PooledArray is large. In the example above I have shown an

extreme case where they were equal. But even if pool has e.g. 10% of size of

the PooledArray the cost is noticeable.

In PooledArrays.jl 1.0 all thee fields refs, pool and invpool were always

copied when a new PooledArray was created. This was expensive. What

PooeldArrays.jl 1.2 introduces is a well known from R and often asked about in

Julia copy on write behavior. What we now do is that we only copy refs. The
pool and invpool fields are shared across PooledArrays as this is a safe

thing to do as long as you are not changing the set of levels in the pool.

So where does the copy on write happen? PooledArrays.jl is now aware if you

share pool and invpool across several arrays and if this is the case and you

add levels to PooledArray then pool and invpool get copied. So essentially

we have a lazy mechanism that copies them only if needed.

One could ask why do we copy pool and invpool at all. One could just keep

sharing them without having to pay the cost of copying at all. The decision was

guided here by two considerations:

  • in practice new levels are added to PooledArray quite rarely (you mostly do

    it when constructing an initial source PooledArray);
  • it is safer to copy the pool and invpool in consideration of potential

    multi-threaded usages of PooledArray (where many tricky corner cases can

    happen).

Conclusions

The main take away is that you can expect your code using PooledArrays.jl to be

much faster since 1.2 release.

Before I finish let me comment on one important feature of PooledArray object

to keep in mind in the multi-threaded applications I have mentioned above.

It is currently not thread safe to add new levels to the pool

of PooledArray. So the rule is: if you add levels to PooledArray make sure

you are not performing any other operations on it in other threads.

However, you can safely perform any operations in multi-threaded context that do

not change the pool. So, in particular (barring standard considerations of

correct multi-threaded code), you are allowed to use setindex! on
PooledArray as long as you do not add new levels.

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.
You need to agree with the terms to proceed

Menu