The Perl scripts abc2p, cork2iso, cork2p, dnew2dold, dold2dnew, dos2html, dos2iso, dos2p, dos2tex, html2dos, html2p, html2tex, iso2cork, iso2dos, iso2p, p2abc, p2cork, p2dos, p2html, p2iso, p2sgml, p2tex, sgml2p, tex2dos, tex2html, tex2p and to2cork convert Hungarian accented letters between different encoding systems.

The letters involved are:
  a/A with acute/umlaut
  e/E with acute/umlaut
  i/I with acute
  o/O with acute/umlaut/double acute
  u/U with acute/umlaut/double acute

The systems are:
  abc:       my invention, used only for alphabetizing
  cork:      Cork encoding, the standard 8-bit TeX encoding (almost ISO-8859; only double accented o/O/u/U are different)
  dos:       DOS-type extended ASCII (old style, dold, and new style, dnew)
  iso:       ISO-8859
  tex:       the TeX encoding system
  to:        TeX output, which appears in .log files and in files written by \write (^^ followed by a hex number, e.g. ^^e1)
  html/sgml: used for web documents (the difference is described below)
  p:         the Pro1sze1ky encoding system

Conversion between any two of these systems is possible, sometimes in more than one step. The exception is "to" (TeX output), which you can only convert from, not into. When more than one step is needed, just pipe the result of one conversion into the next:

  cat myfile.iso | iso2p | p2sgml > myfile.sgml

(The resulting file will, of course, be an sgml file only as far as the accented vowels are concerned; you have to take care of the rest of the markup yourself.)

The difference between html and sgml is very slight: html has &ocirc; (ô) for double accented o (and likewise for O/u/U), while sgml has &odblac; (ő). The html2* scripts convert &ocirc; and &otilde; as well as &odblac; to double accented o. This means an "sgml" file can be submitted to html2*, unless you need to keep the difference between &ocirc; and &odblac;.

There are (at least) two types of encoding that can be called "dos". Here the more traditional (and less precise) one is used as the default, to maintain compatibility with other converters (like those in /home/kalman).
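TeX output (the "to" system) can be turned back into 8-bit codes by undoing TeX's ^^ notation. The following is a minimal Python sketch of that single step. It is not the Perl to2cork script, and the function name is hypothetical. The sketch assumes TeX's default behaviour, in which a character code from 128 to 255 is written as ^^ followed by two lowercase hex digits. It also assumes, as the name to2cork suggests, that the resulting byte values are Cork codes. TeX's other ^^ forms (such as ^^M for code 13) are deliberately left untouched.

```python
import re

# With default settings TeX writes a character code 128-255 as "^^" followed
# by two lowercase hex digits, e.g. code 0xE9 as "^^e9". Only this form is
# handled; single-character forms such as "^^M" (code 13) are left alone.
_HAT_HEX = re.compile(rb"\^\^([0-9a-f]{2})")


def tex_output_to_bytes(data: bytes) -> bytes:
    """Hypothetical helper: replace each ^^xx in TeX output with byte 0xXX.

    The resulting byte values are assumed to be Cork (T1) codes, as the
    script name to2cork suggests.
    """
    return _HAT_HEX.sub(lambda m: bytes([int(m.group(1), 16)]), data)


if __name__ == "__main__":
    log_line = b"Pr^^f3sz^^e9ky"          # as it might appear in a .log file
    print(tex_output_to_bytes(log_line))  # b'Pr\xf3sz\xe9ky'
```

In a pipeline, such a filter would take the place of to2cork. Its output could then go on to a cork2* script like any other Cork-encoded text.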
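The html/sgml difference is easiest to see from the p code. In that code, a digit after a vowel marks its accent (1 acute, 2 umlaut, 3 double acute), and a literal 'e1' is written 'e\1'. The following Python sketch covers only the accented vowels and is not the Perl scripts themselves. The function names are hypothetical. The sketch assumes the scripts emit the standard entity names (&oacute;, &ouml;, &odblac;, ...). It also assumes that the 'e\1' escape applies to every vowel-digit pair that would otherwise be an accent code.

```python
import re

# Pro1sze1ky (p) code: a digit after a vowel marks its accent
# (1 = acute, 2 = umlaut, 3 = double acute). Only the combinations listed
# in this README count as accent codes; "i2", for example, does not.
_LOWER = ["a1", "a2", "e1", "e2", "i1", "o1", "o2", "o3", "u1", "u2", "u3"]
P_CODES = frozenset(_LOWER + [c.upper() for c in _LOWER])

_SUFFIX = {"1": "acute", "2": "uml", "3": "dblac"}


def entity(code, html=False):
    """Entity for a p code; the html flavour uses circumflex for double acute."""
    vowel, digit = code[0], code[1]
    suffix = "circ" if (html and digit == "3") else _SUFFIX[digit]
    return "&" + vowel + suffix + ";"


# A vowel, an optional escaping backslash, and an accent digit.
_P_RE = re.compile(r"([aeiouAEIOU])(\\?)([123])")


def p_to_markup(text, html=False):
    """Sketch of p2sgml (html=False) or p2html (html=True) for accented vowels."""
    def repl(m):
        code = m.group(1) + m.group(3)
        if code not in P_CODES:
            return m.group(0)   # not an accent code: leave unchanged
        if m.group(2):
            return code         # escaped "e\1" stands for a literal "e1"
        return entity(code, html)
    return _P_RE.sub(repl, text)


# Both flavours map back to the same p code. &otilde; is also accepted for
# double accented o (&Otilde; is assumed to work the same way for O).
_TO_P = {entity(c, h): c for c in P_CODES for h in (False, True)}
_TO_P.update({"&otilde;": "o3", "&Otilde;": "O3"})
_ENT_RE = re.compile("|".join(re.escape(e) for e in _TO_P))
_PLAIN_RE = re.compile(r"[aeiouAEIOU][123]")


def _escape_literal(m):
    """Turn a literal accent-code lookalike such as "e1" into "e\\1"."""
    pair = m.group(0)
    return pair[0] + "\\" + pair[1] if pair in P_CODES else pair


def markup_to_p(text):
    """Sketch of html2p: escape literal vowel-digit pairs, then decode entities."""
    text = _PLAIN_RE.sub(_escape_literal, text)
    return _ENT_RE.sub(lambda m: _TO_P[m.group(0)], text)


if __name__ == "__main__":
    print(p_to_markup("Pro1sze1ky"))  # Pr&oacute;sz&eacute;ky
```

Because both flavours decode to the same p code, an sgml file passes through markup_to_p unchanged in meaning. After such a round trip, however, a real o with circumflex and an o with double acute can no longer be told apart.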
To have your accented characters converted to the other dos system instead, comment out the unwanted codes in the script and uncomment those you prefer (these usually follow the default code). When converting from dos, the two systems are merged; that is, character 143 and character 181 are both taken to be A with acute. Alternatively, you may filter your dos result through dold2dnew to change old-style dos to the newer system. One problem remains: neither dos system has a standard code for E with umlaut, but this character is extremely marginal anyway.

Conversion to TeX produces \'{\i} for i with acute and \H{o} for o with double acute. Conversion from TeX accepts both these forms as well as the alternatives "\'\i ", "{\'\i}" and \H o. (There may even be optional spaces, tabs and one newline between the accent command and the letter. For example, \' a, \'^Ia and \'^Ja all become a with acute, where ^I stands for a tab and ^J for a newline.)

The Pro1sze1ky code uses digits to represent accents: 1 for acute, 2 for umlaut and 3 for double acute. Confusion may arise when a text needs both a literal 'e1' and e with acute. (Think of a TeX line like "\font\xy = line10".) To avoid this, a literal 'e1' is represented as 'e\1' in the Pro1sze1ky code. The converters follow this convention, and it is advisable to create Pro1sze1ky files accordingly. (You should avoid a literal 'e\1' in Pro1sze1ky files, or add lines to the converters that turn it into 'e\\1' ...)

The two abc scripts, p2abc and abc2p, are meant to be used in tandem. First, p2abc converts your Pro1sze1ky-encoded text into a form that can be sorted (sort -fd produces the correct order). After sorting, abc2p converts the result back. The idea is to overcome a shortcoming of Linux (or my ignorance of a hidden Linux capability): it does not provide for sorting in an arbitrary (i.e., language-specific) order. What I long for is a switch --sort_order which would read a ".sort_order" file looking like this: a=á