Post date: May 9, 2011 6:18:02 AM
Playing with erlang the other day I ran into the need to parse a simple key/value pair configuration list. After looking around I found a lot of different ways to do the same thing. Here are four of the ways that I created or found to search for a value:
loop: You can write a standard recursive looping function.
list: There is a keysearch function in the lists module.
prop: There is a lookup function in the proplists module.
comp: You can do this directly in a list comprehension statement.
First let's create a module and some test data.
-module(kv_test).
-compile(export_all).
test_data() ->
lists:flatten([
{host, "www.example.org"},
{port, 1234},
{test, three, elements},
{test, good},
lists:duplicate(1000, {extra, values}),
{message, "This is a test"},
{message, "Hello World"}
]).
Differences in the methods below show up in the way that they handle multiple entries, long lists and varied tuple structures. For the first cut of this article, I'll concentrate on a simple lookup without duplicates, but the data has some of the problems built in so we can discuss them.
The first method is a simple loop coded without any special library functions. In this version we can make use of the pattern matching in the call to do all our testing for us. This code is simple to write, fairly fast, and can be adjusted to match a wide range of matching needs. However each routine is specific to a particular pattern and you could end up writing several of these.
lookup_loop(_Key, []) ->
none;
lookup_loop(Key, [{Key,Value}|_]) ->
Value;
lookup_loop(Key, [_|Tail]) ->
lookup_loop(Key, Tail).
In the next method, I'll use the lists:keysearch function. Keysearch is one of a family of functions that search, replace, delete, map, sort, etc. lists of tuples based on a key. The key can be in any position of the tuple. Note that this call fails on our test data when passed a key of test. This is because lists:keysearch will return the first tuple it finds with a key of test, which happens to be the 3 element tuple. That doesn't match the {value,{K, V}} case clause. We could accept the 3 element list, but let's assume that the 2 element tuple is the desired value.
lookup_list(Key,Lst) ->
case lists:keysearch(Key, 1, Lst) of
{value,{Key,Value}} ->
Value;
_Else ->
none
end.
Proplists is a whole module dedicated to manipulating lists of key/value tuples. This function also suffers the problem that it finds the 3 element tuple when searching for test. As the tests below show, it is also a lot (10x) slower than the other routines when searching for an item near the start of the list (high startup cost, though it catches up fast).
lookup_prop(Key, Lst) ->
case proplists:lookup(Key, Lst) of
{Key, Value} ->
Value;
_Else ->
none
end.
List comprehensions are a compact way to code the search that has some of the advantages of a hand coded loop, with a concise syntax. It is very much like an SQL query:
SELECT V FROM (SELECT K V FROM Lst where K = Key);
This will select all two element tuples from the list where the first element == Key. It then gathers the second element from each tuple and returns a (possibly empty) list of those values. The nice thing about this method is that you can merge mutiple lists, compare aginst mutiple guard patterns, and run the result through a function; a lot of the flexability of a hand coded loop in a single statement. This one selects the correct 2 element tuple for 'test' but will return multiple results for 'message'. One negative side effect of scanning for all matches is that this routine scans all elements of the list even if it finds the answer in the first one. The same is probably true of proplists:lookup_all/2).
lookup_comp(Key, Lst) ->
case [V || {K, V} <- Lst,K == Key] of
[Value] ->
Value;
[Value|_MoreValues] ->
Value;
[] ->
none
end.
So which to use? Below are some timings I made. First to notice is that all of them are really fast (a few microseconds at most), so unless you have a lot of them to do in a loop, you can pick your favorite. The order of the list changed if we have a long or short list, wanted duplicates or not, had variable sized tuples or not, .... If performance is important, you'll probably have to time several variations. If not, pick the one that is least suprising. If you are sorting or spliting the list then you probably want the proplists version. If you need multiple results, or need to process the values as you collect them then list comprehensions or hand coded lists will probably be the choice.
timeit(F, N) ->
statistics(runtime),
for(1, N, F),
{_,Time} = statistics(runtime),
Time/1000.0.
for(N, N, F) -> F(), ok;
for(I, N, F) -> F(), for(I+1, N, F).
time_test(N) ->
K = host,
L = test_data(),
io:format("null: ~.3e~n",
[timeit(fun()->ok end, N)/N]),
io:format("list: ~.3e~n",
[timeit(fun()->lists:keysearch(host,1,L) end, N)/N]),
io:format("loop: ~.3e~n",
[timeit(fun()->lookup_loop(K,L) end, N)/N]),
io:format("comp: ~.3e~n",
[timeit(fun()->[V||{host,V}<-L] end, N)/N]),
io:format("prop: ~.3e~n",
[timeit(fun()->proplists:lookup(host,L) end, N)/N]),
ok.
% searching for message
%null: 1.30e-7
%list: 1.15e-5
%loop: 6.11e-5
%comp: 4.80e-5
%prop: 1.13e-4
% searching for host
%null: 1.40e-7
%list: 2.20e-7
%loop: 2.00e-7
%comp: 4.59e-5
%prop: 2.50e-7