Sort programs arrange lines in alphabetical order (or in any other order you manage to specify). However, they typically rank characters by their numeric codes (ASCII and its extensions), so A, B, ... precede a, b, ..., and z precedes á, é, ....
The first problem is usually handled by the sort program itself: for example, the Unix sort -f ignores case. For the second, you can set the locale on your computer so that accented characters land in the right place: once your operating system knows that you want the Hungarian conventions, á will be ordered between a and b.
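Both problems show up in a short Python sketch. Python is used here only for illustration; it is not how the scripts on this page work. Plain sorting compares code points, case folding plays the role of sort -f, and `locale.strxfrm` applies the system's collation rules. The locale name `hu_HU.UTF-8` is an assumption: locale names differ between systems, the locale must be installed, and `setlocale` changes the setting for the whole process.

```python
import locale

words = ["bab", "Zebra", "alma", "ágy"]

# Plain sorted() compares code points, the Unicode analogue of ASCII order:
# capitals precede lowercase letters, and "á" (U+00E1) follows "z".
by_code_point = sorted(words)  # ['Zebra', 'alma', 'bab', 'ágy']

# Case folding fixes the first problem (like `sort -f`), but "á" still comes last.
case_folded = sorted(words, key=str.casefold)  # ['alma', 'bab', 'Zebra', 'ágy']


def locale_sorted(items, name="hu_HU.UTF-8"):
    """Sort with the collation rules of locale `name`; return None if it is unavailable."""
    try:
        locale.setlocale(locale.LC_COLLATE, name)  # process-wide setting
    except locale.Error:
        return None
    return sorted(items, key=locale.strxfrm)
```

Where the Hungarian locale is installed, `locale_sorted(words)` moves ágy up among the a-words. The exact placement depends on the platform's collation tables, so no fixed output is shown for it.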
What justifies the perl scripts provided here, however, is that the conventions of some languages, Hungarian among them, require some multi-letter graphemes to be ordered elsewhere than a letter-by-letter comparison would put them. In Hungarian these are cs, dz, dzs, gy, ly, ny, sz, ty and zs. Each counts as a single letter placed right after its first component (and dzs right after dz), that is:
cucc < csap, gzip < gyík, zűr < zsír, ...
Unfortunately, the task is not entirely trivial: some sequences that look like multi-letter graphemes are in fact not. For example, bércsík may be ranked before or after bérczerge depending on its morphology: read as bér+csík it comes after bérczerge, read as bérc+sík it comes before. Deciding this requires a morphological/semantic parser, which is probably not worth the effort because the problem practically never comes up.
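The ordering rule can be expressed as a sort key that splits a word into Hungarian letters, trying the longest grapheme first, and ranks each letter by its position in the alphabet. This is a minimal Python sketch, not the author's perl scripts. `HUNGARIAN_ALPHABET` and `hungarian_key` are hypothetical names. The sketch treats each accented vowel as a separate letter right after its base, as in the simplified description above, and it ranks characters that are not Hungarian letters before all letters.

```python
import unicodedata

# Hungarian alphabet in collation order; the multi-letter graphemes
# (cs, dz, dzs, gy, ly, ny, sz, ty, zs) each count as a single letter.
HUNGARIAN_ALPHABET = [
    "a", "á", "b", "c", "cs", "d", "dz", "dzs", "e", "é", "f", "g", "gy",
    "h", "i", "í", "j", "k", "l", "ly", "m", "n", "ny", "o", "ó", "ö", "ő",
    "p", "q", "r", "s", "sz", "t", "ty", "u", "ú", "ü", "ű", "v", "w", "x",
    "y", "z", "zs",
]
RANK = {letter: i for i, letter in enumerate(HUNGARIAN_ALPHABET)}


def hungarian_key(word):
    """Return a sort key ranking each Hungarian letter by its alphabet position."""
    s = unicodedata.normalize("NFC", word).lower()  # case-insensitive, like sort -f
    key = []
    i = 0
    while i < len(s):
        # Greedy longest match: "dzs" before "dz" before "d", "cs" before "c", etc.
        for size in (3, 2, 1):
            chunk = s[i:i + size]
            if len(chunk) == size and chunk in RANK:
                key.append((1, RANK[chunk]))
                i += size
                break
        else:
            # Not a Hungarian letter (space, digit, punctuation): rank it first.
            key.append((0, ord(s[i])))
            i += 1
    return key


words = ["zsír", "zűr", "gyík", "gzip", "csap", "cucc", "bércsík", "bérczerge"]
ordered = sorted(words, key=hungarian_key)
print(ordered)  # ['bérczerge', 'bércsík', 'cucc', 'csap', 'gzip', 'gyík', 'zűr', 'zsír']
```

Because the matching is greedy, the sketch always reads bércsík as bér+csík and so places it after bérczerge. The bérc+sík reading would need the morphological analysis discussed above.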
What you do then is the following:
alias husort='p2abc | sort -fd | abc2p'
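[Note that this does not work like sort: husort myfile does not do what you want, because the alias is expanded as plain text, so myfile becomes an argument of abc2p, the last command in the pipeline, instead of being read by p2abc. Use cat myfile | husort instead, or write a small shell script or function that accepts file names the way sort does.]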
If your files are typically ISO-8859 encoded, extend the alias as follows:
alias husort='iso2p | p2abc | sort -fd | abc2p | p2iso'
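The whole pipeline can also be sketched as a single Python function that decodes the input, sorts the lines in Hungarian order, and re-encodes the result. This is an illustration under assumptions, not a port of iso2p, p2abc, abc2p or p2iso. `husort_bytes` and `hu_key` are hypothetical names, and ISO-8859-2 (Latin-2, the ISO-8859 part that contains ő and ű) is assumed as the file encoding. The tokenizer is a regular expression whose alternation lists "dzs" before "dz", so the three-letter grapheme wins.

```python
import re
import unicodedata

# Hungarian alphabet in collation order; multi-letter graphemes are single units.
ALPHABET = ("a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő "
            "p q r s sz t ty u ú ü ű v w x y z zs").split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
# Alternatives are tried left to right, so "dzs" must come before "dz".
GRAPHEME = re.compile(r"dzs|cs|dz|gy|ly|ny|sz|ty|zs|.", re.DOTALL)


def hu_key(line):
    """Rank each grapheme; non-letters (spaces, digits, punctuation) sort before letters."""
    text = unicodedata.normalize("NFC", line).lower()  # ignore case, like sort -f
    return [(1, RANK[g]) if g in RANK else (0, ord(g)) for g in GRAPHEME.findall(text)]


def husort_bytes(data, encoding="iso-8859-2"):
    """Decode, sort the lines in Hungarian order, and re-encode in the same encoding."""
    lines = data.decode(encoding).splitlines()
    lines.sort(key=hu_key)
    return "".join(line + "\n" for line in lines).encode(encoding)
```

A small wrapper could read the bytes of each file named on the command line, or of `sys.stdin.buffer` when there are none, and write the result to `sys.stdout.buffer`. That would give a husort that accepts file arguments the way sort does.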