DocOpt.jl icon indicating copy to clipboard operation
DocOpt.jl copied to clipboard

Performance/safety considerations for `partition`

Open jakewilliami opened this issue 4 years ago • 0 comments

Changes

  • Performance: made partition around 50 times faster, and considerably better memory usage;
  • String safety: use prevind and nextind instead of - 1 and + 1 respectively;
  • A Julian touch: return an NTuple{3, SubString{String}}, rather than NTuple{3, String}.

P.S., I know this package isn't worked on an awful lot, but any performance improvements are good, right?...


Benchmarking results:

For benchmarking, the original partitions function has kept its name, and my updated partitions function is called my_partition. For benchmarking, I have generated a random string 3 times, and for each string I choose a random character, and benchmark both functions with the character and string version of that character.

[ Info: Benchmarking methods for random string of length 100000

[ Info: 1/3: Checking partition for character C
  6.913 μs (4 allocations: 97.86 KiB)
[ Info: 1/3: Checking partition for string C
  7.285 μs (4 allocations: 97.86 KiB)
[ Info: 1/3: Checking my_partition for character C
  146.461 ns (2 allocations: 112 bytes)
[ Info: 1/3: Checking my_partition for string C
  146.576 ns (2 allocations: 112 bytes)
[ Info: 2/3: Checking partition for character t
  7.097 μs (4 allocations: 97.86 KiB)
[ Info: 2/3: Checking partition for string t
  6.822 μs (4 allocations: 97.86 KiB)
[ Info: 2/3: Checking my_partition for character t
  146.713 ns (2 allocations: 112 bytes)
[ Info: 2/3: Checking my_partition for string t
  145.491 ns (2 allocations: 112 bytes)
[ Info: 3/3: Checking partition for character t
  7.213 μs (4 allocations: 97.84 KiB)
[ Info: 3/3: Checking partition for string t
  7.149 μs (4 allocations: 97.84 KiB)
[ Info: 3/3: Checking my_partition for character t
  148.481 ns (2 allocations: 112 bytes)
[ Info: 3/3: Checking my_partition for string t
  147.291 ns (2 allocations: 112 bytes)

Benchmarking script:

function partition(s::AbstractString, delim::AbstractString)
    range = findfirst(delim, s)
    if range == nothing
        # no match
        return s, "", ""
    elseif length(range) == 1
        # delim is a single character
        return s[1:range[1]-1], delim, s[range[1]+1:end]
    else
        start, stop = range
        return s[1:start-1], delim, s[stop+1:end]
    end
end
partition(s::AbstractString, delim::Char) = partition(s::AbstractString, string(delim))

function my_partition(str::AbstractString, delim::Union{AbstractString, AbstractChar})
    indices = findfirst(delim, str)
    if isnothing(indices)
        empty_substring = SubString("")
        return SubString(str), empty_substring, empty_substring
    else
        return SubString(str, 1, prevind(str, first(indices))),
            SubString(delim isa AbstractChar ? string(delim) : delim),
            SubString(str, nextind(str, last(indices)))
    end
end

using BenchmarkTools, Random

str_len = 100_000
ntries = 3

@info "Benchmarking methods for random string of length $str_len\n"
for i in 1:ntries
    global str = randstring(str_len)
    global rand_c = rand(str)
    global rand_str = string(rand_c)
    for f in [:partition, :my_partition]
        @info "$i/$ntries: Checking $f for character $rand_c"
        @eval @btime $f(str, rand_c)
        @info "$i/$ntries: Checking $f for string $rand_c"
        @eval @btime $f(str, rand_c)
    end
end

jakewilliami avatar Sep 08 '21 11:09 jakewilliami