
    The Unicode cookbook for linguists

    Managing writing systems using orthography profiles

    Author(s)
    Moran, Steven
    Cysouw, Michael
    Collection
    Knowledge Unlatched (KU)
    Number
    103595
    Language
    English
    Abstract
    This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection between the Unicode Standard and the International Phonetic Alphabet. Although these standards are often met with frustration by users, they nevertheless provide language researchers and programmers with a consistent computational architecture needed to process, publish and analyze lexical data from the world's languages. Thus we bring to light common, but not always transparent, pitfalls which researchers face when working with Unicode and IPA. Having identified and overcome these pitfalls involved in making writing systems and character encodings syntactically and semantically interoperable (to the extent that they can be), we created a suite of open-source Python and R tools to work with languages using orthography profiles that describe author- or document-specific orthographic conventions. In this cookbook we describe a formal specification of orthography profiles and provide recipes using open-source tools to show how users can segment text, analyze it, identify errors, and transform it into different written forms for comparative linguistics research.
    URI
    http://library.oapen.org/handle/20.500.12657/28277
    Keywords
    Linguistics
    DOI
    10.5281/zenodo.1296780
    ISBN
    9783961100903
    OCN
    1076699025
    Publisher
    Language Science Press
    Publisher website
    https://langsci-press.org/
    Publication date and place
    Berlin, 2018-07-11
    Grantor
    • Knowledge Unlatched - 103595 - Language Science Press 2018 - 2020
    Series
    Translation and Multilingual Natural Language Processing
    Rights
    https://creativecommons.org/licenses/by/4.0/legalcode
    • Imported or submitted locally
