This is already merged, so it would need to be reverted if I'm right about
the default behavior.
I'm partial to "natural sort" being the default. Then we could offer
non-default options: ASCIIbetical, or Unicode ordering that disregards
precomposed characters, etc., whichever is fastest.
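For readers unfamiliar with the terms, here is a minimal Python sketch (not Julia, and not anything from the merged change) of how "natural sort" differs from plain ASCIIbetical order. The helper name natural_key and the sample file names are made up for illustration.

```python
import re

def natural_key(s):
    """Sort key that compares runs of digits by numeric value (hypothetical helper)."""
    # re.split with a capture group alternates: text, digits, text, digits, ...
    # so even positions are always text and odd positions are always digits.
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["file10.txt", "file2.txt", "file1.txt"]

# Plain code-point ("ASCIIbetical") order compares character by character.
asciibetical = sorted(names)

# Natural order treats "10" and "2" as the numbers 10 and 2.
natural = sorted(names, key=natural_key)

print(asciibetical)
print(natural)
```

With this key, file2.txt sorts before file10.txt, whereas plain code-point order puts file10.txt first.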
Palli.
Post by Jeffrey Sarnoff
looks sane
Are you sure? I'm not sure what lexcmp does (or should do), but if,
say, one string has precomposed Unicode letters and the other
doesn't, should it be possible for them to compare as equal? Maybe
that's not preferable by default? Or is it?
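To make the question concrete, here is a small Python sketch (an illustration only; it says nothing about what lexcmp actually does). It uses the standard library's unicodedata.normalize; the helper name eq_normalized is hypothetical.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

def eq_normalized(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Code-point comparison sees two different strings.
print(precomposed == decomposed)

# Their UTF-8 encodings do not even have the same length.
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))

# After normalization they compare as equal.
print(eq_normalized(precomposed, decomposed))
```

So whether the two "compare the same" depends on whether the comparison normalizes first, which costs extra work on every call.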
These things can all be tricky. E.g., minor recent upgrades of
PostgreSQL are generally considered safe (and recommended over not
upgrading), with a rare "exception" (you just need to be careful):
https://www.postgresql.org/docs/9.5/static/release-9-5-2.html
"PostgreSQL 9.5 introduced logic for speeding up comparisons of
string data types by using the standard C library function
strxfrm() as a substitute for strcoll(). It now emerges that
most versions of glibc (Linux's implementation of the C library)
have buggy implementations of strxfrm() that, in some locales, can
produce string comparison results that do not match strcoll().
Until this problem can be better characterized, disable the
optimization in all non-C locales. (C locale is safe since it uses
neither strcoll() nor strxfrm().)
Unfortunately, this problem affects not only sorting but also entry
ordering in B-tree indexes, which means that B-tree indexes on text,
varchar, or char columns may now be corrupt if they sort according
to an affected locale and were built or modified under PostgreSQL
9.5.0 or 9.5.1. Users should REINDEX indexes that might be affected.
It is not possible at this time to give an exhaustive list of
known-affected locales. C locale is known safe, and there is no
evidence of trouble in English-based locales such as en_US, but some
other popular locales such as de_DE are affected in most glibc
versions."
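The invariant that broke there is that comparing strxfrm()-transformed keys must give the same ordering as calling strcoll() on the original strings. Below is a rough Python sketch of a check for that, using the standard locale module's wrappers around the C functions. The helper names and the word list are made up, and the sketch only pins the C locale, where the two are expected to agree; probing another locale would mean passing its name to setlocale, assuming it is installed on the machine.

```python
import locale

def sign(x):
    """Reduce any comparison result to -1, 0 or 1."""
    return (x > 0) - (x < 0)

def cmp_strcoll(a, b):
    # Direct locale-aware comparison of the two strings.
    return sign(locale.strcoll(a, b))

def cmp_strxfrm(a, b):
    # Transform each string once, then compare the transformed keys.
    ka, kb = locale.strxfrm(a), locale.strxfrm(b)
    return (ka > kb) - (ka < kb)

def find_mismatches(words):
    """Return every pair on which the two comparison routes disagree."""
    return [(a, b) for a in words for b in words
            if cmp_strcoll(a, b) != cmp_strxfrm(a, b)]

# Assumption: the C locale, which is always available.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["apple", "Apple", "banana", "zebra", "Zebra"]
mismatches = find_mismatches(words)
print(mismatches)
```

Any pair reported by find_mismatches would mean an index built with one routine could be searched incorrectly with the other.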
I have made a branch with an even faster version of == for
the String type (it took 28%-40% less time in my testing).
Instead of calling lexcmp, which has to deal with strings of
different length, it calls memcmp directly if the string
lengths are equal, and checks if the result is 0.
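The idea can be sketched in Python as follows. This is not the code from the branch (that is Julia calling C's memcmp); the function names are hypothetical, and Python's built-in bytes equality stands in for the memcmp(...) == 0 check.

```python
def lexcmp(a: bytes, b: bytes) -> int:
    """Three-way byte comparison that must handle inputs of different length."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return -1 if a[i] < b[i] else 1
    # Common prefix is identical: the shorter input sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))

def eq_via_lexcmp(a: bytes, b: bytes) -> bool:
    # Equality expressed through the general three-way comparison.
    return lexcmp(a, b) == 0

def eq_fast(a: bytes, b: bytes) -> bool:
    # Inputs of different length can never be equal, so no bytes are read.
    if len(a) != len(b):
        return False
    # Same length: a single block comparison (stand-in for memcmp == 0).
    return a == b

pairs = [(b"abc", b"abc"), (b"abc", b"abd"), (b"abc", b"abcd"), (b"", b"")]
for x, y in pairs:
    print(x, y, eq_via_lexcmp(x, y), eq_fast(x, y))
```

Both routes give the same answer on every pair; the fast path simply skips the ordering work that equality does not need.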
https://github.com/ScottPJones/julia/tree/spj/fasteqstr
If somebody could make a PR out of this branch, it would be
appreciated.
Thanks in advance,
Scott