Posted by: tamara89 | April 27, 2008

Some Characteristics of Translation (Q3)

According to a FEMTI report, the main characteristics of the translation task are the following:

  • Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.

 

  • Document routing or sorting: The purpose of document routing / sorting is to scan incoming translated documents quickly in order to send them to the appropriate points for further processing or storage.

 

  • Information extraction or summarization: The purpose of information extraction or summarization is to extract some portion(s) of the translated text, either manually or automatically, for subsequent processing or storage. Information extraction is typically concerned with filling templates by identifying atomic elements of events. In contrast, summarization aims to provide a self-contained and internally cohesive text which serves as a selective account of the original.
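
The template-filling side of information extraction can be sketched in a few lines of Python. This is only an illustration: the `EVENT_PATTERN` regular expression, the slot names (`buyer`, `target`, `price`) and the sample sentence are assumptions invented for this example and do not come from the FEMTI report.

```python
import re

# Hypothetical event pattern: "<Buyer> acquired <Target> for <price>".
EVENT_PATTERN = re.compile(
    r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+) "
    r"for (?P<price>\$[\d.]+ (?:million|billion))"
)


def fill_template(text):
    """Return one filled template (a dict of slots) per event found in the text."""
    return [match.groupdict() for match in EVENT_PATTERN.finditer(text)]


sample = "Yesterday Acme acquired Globex for $3.5 million. Analysts were surprised."
print(fill_template(sample))
```

A summarizer, in contrast, would return running text rather than a set of filled slots.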

 

SOURCES:

  • http://www.issco.unige.ch:8080/cocoon/femti/st-home.html

 

Posted by: tamara89 | April 25, 2008

Concepts Related to Translation

In this article I am going to explain some concepts that are closely related to the world of translation. The terms are the following:

  • Machine translation: At its most basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition and translation of idioms, as well as the isolation of anomalies. Output quality can also be improved by human intervention: for example, some systems translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used “as is”.
  • Computer-assisted translation: a form of translation in which a human translator translates texts using computer software designed to support and facilitate the translation process. CAT is sometimes called machine-assisted or machine-aided translation.
  • Translation technology: Translation is the interpretation of the meaning of a text and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the “source text”, the language it is to be translated into is called the “target language”, and the final product is sometimes called the “target text”.
  • Multilingual content management: A multilingual website is usually a mixture of global and local content. Local content presents no particular content management issues; global content, which has to be translated across all language locales, does. Deciding where multiple versions of content are going to be required and where content can be maintained separately for different locales is a critical decision that will affect how a site should be maintained and what it will cost.
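
The "simple substitution of words" mentioned in the machine translation entry can be sketched as follows. It is a toy illustration, not how any real MT system is built: the `LEXICON` dictionary and the function name are assumptions made up for this example.

```python
# Toy Spanish-to-English lexicon; the entries are illustrative assumptions.
LEXICON = {"el": "the", "gato": "cat", "negro": "black", "duerme": "sleeps"}


def translate_word_by_word(sentence, lexicon=LEXICON):
    """Replace each source word with its dictionary entry, keeping unknown words."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


# Word order is copied from the source sentence and never adjusted.
print(translate_word_by_word("El gato negro duerme"))
```

The result keeps the Spanish noun-adjective order, which is the kind of typological difference that plain substitution cannot handle and that corpus techniques try to address.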


Posted by: tamara89 | April 25, 2008

Explanation of Some of The Topics: Machine Learning (Q2)

As a broad subfield of artificial intelligence, machine learning is concerned with the design and development of algorithms and techniques that allow computers to “learn”. At a general level, there are two types of learning: inductive and deductive. Inductive machine learning methods extract rules and patterns out of massive data sets.

The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods. Hence, machine learning is closely related not only to data mining and statistics, but also to theoretical computer science.
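
As a very small sketch of what "extracting a rule from data" can mean, the following Python function induces a one-feature threshold rule from labelled examples. The data set, the spam scenario and the function name are all invented for this illustration; real inductive learners are far more elaborate.

```python
def induce_threshold(examples):
    """Pick the cut-off on one numeric feature that misclassifies the fewest examples.

    `examples` is a list of (value, label) pairs with boolean labels. The induced
    rule reads: predict True when value >= threshold.
    """
    candidates = sorted({value for value, _ in examples})

    def errors(threshold):
        # Count examples whose predicted label differs from the true label.
        return sum((value >= threshold) != label for value, label in examples)

    return min(candidates, key=errors)


# Made-up training data: (message length in words, is_spam).
data = [(3, False), (5, False), (8, False), (12, True), (15, True), (20, True)]
threshold = induce_threshold(data)
print(f"Induced rule: spam if length >= {threshold}")
```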

  • Applications

Machine learning has a wide spectrum of applications including natural language processing, syntactic pattern recognition, search engines, medical diagnosis, bioinformatics and cheminformatics, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, object recognition in computer vision, game playing and robot locomotion.

  • Human interaction

Some machine learning systems attempt to eliminate the need for human intuition in the analysis of the data, while others adopt a collaborative approach between human and machine. Human intuition cannot be entirely eliminated since the designer of the system must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data. Machine learning can be viewed as an attempt to automate parts of the scientific method.

Some statistical machine learning researchers create methods within the framework of Bayesian statistics.

SOURCES:

  • http://en.wikipedia.org/wiki/Machine_learning

Posted by: tamara89 | April 21, 2008

Explanation of Three Research Topics (Q2)

In this article we are going to talk about some of the most important research topics related to Human Language Technologies.

The German Research Center for Artificial Intelligence (DFKI) works on the following themes in its research:

  • Exploiting (and automatically extending) ontologies for content processing.
  • Tighter integration of shallow and deep techniques in processing.
  • Enriching deep processing with statistical methods.
  • Combining language checking with structuring tools in document authoring.
  • Document indexing for German and English.
  • Automatically associating recognized information with related information and thus building up collective knowledge.
  • Automatically structuring and visualizing extracted information.
  • Processing information encoded in multiple languages, among them Chinese and Japanese.

 

The Edinburgh Language Technology Group does research mainly in the following areas:

  • Combining Shallow Semantics and Domain Knowledge (EASIE).
  • Text Mining for Biomedical Content Curation (TXM).
  • Cross-lingual Multi-agent Retail Comparison (CROSSMARC).
  • Smart Qualitative Data: Methods and Community Tools for Data Mark-up (SQUAD).
  • Machine Learning for Named Entity Recognition (SEER).
  • Integrated Models and Tools for Fine-Grained Prosody in Discourse (Synthesis).
  • Joint Action Science and Technology (JAST).
  • Study of how pairs collaborate when planning a route on a map (Collaborating Using Diagrams).

 

The Austrian Research Institute for Artificial Intelligence (OFAI) develops linguistic resources and processes as well as application prototypes:

  • LINGUISTIC RESOURCES AND PROCESSES
    • Typed unification-based grammar formalisms.
    • Development of an HPSG-based grammar for German.
    • Natural Language Generation.
    • Speech Synthesis.
    • Computational morphology.
  • APPLICATION PROTOTYPES
    • Natural Language interfaces and advisory systems.
    • Concept-to-speech systems.

 


Posted by: tamara89 | April 17, 2008

Hans Uszkoreit (Q1)

Uszkoreit is Professor of Computational Linguistics at Saarland University. He also serves as Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Lab. In addition, he is a co-opted professor in the Computer Science Department.

He studied Linguistics and Computer Science at the Technical University of Berlin and the University of Texas at Austin. During his time in Austin he also worked as a research associate in a large machine translation project at the Linguistics Research Center. In 1984 he received his Ph.D. in linguistics from the University of Texas. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, CA. During this time he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart working for IBM Germany as a project leader in the LILOG project. During this time he also taught at the University of Stuttgart.

In 1988 he was appointed to a newly created chair of Computational Linguistics at Saarland University and started the Department of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at DFKI. He has been a co-founder and principal investigator of the Special Collaborative Research Division “Resource-Adaptive Cognitive Processes” of the DFG. He is co-founder and professor of the “European Postgraduate Program Language Technology and Cognitive Systems”, a joint Ph.D. program with the University of Edinburgh.

Some of his publications:

  • Uszkoreit, H. (2007) Methods and Applications for Relation Detection. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing, 2007.
  • Uszkoreit, H., V. Kordoni, V. Kubon, M. Rosner and S. Kirchmeier-Andersen. (2005). ‘Language Technology from a European Perspective’. In Chris Brew and Dragomir R. Radev (eds.), Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL, pp. 43 – 48, University of Michigan – Ann Arbor, June 2005, Association for Computational Linguistics.
  • Uszkoreit, H. & B. Joerg (2003). A Virtual Information Center for Language Technology: Ontology, Datastructure, Realization. In: Nordic Language Technology Yearbook, Museum Tusculanums Forlag, Copenhagen.


Posted by: tamara89 | April 14, 2008

HLT Research Centres (Q1)

There are many research centres, both European and international, that offer more information about Human Language Technologies on the Net. These are some of them:

  • National Centre for Language Technology (NCLT): Language is the key modality in communication. The National Centre for Language Technology conducts research into the processing of human language by computers, such as speech recognition and synthesis, machine translation, human-computer interfaces, information retrieval and extraction, the teaching and learning of languages using computers, and software localization and globalization. Research in Human Language Technology (HLT) is interdisciplinary and includes Natural Language Processing (NLP) and Computational Linguistics (CL). HLT has substantial economic implications and potential. The centre carries out basic research and develops applications. Director: Prof. Josef van Genabith. Administrator: Dr. Yafa Al-Raheb.
  • HKUST Human Language Technology Center (HLTC): a multidisciplinary research center at the Hong Kong University of Science and Technology (HKUST) whose mission is to lead state-of-the-art research directions that drive the development of new applications in both text and spoken language technology. HLTC is led by seven faculty members from the EE and CS departments: Oscar Au, Roland Chin, Pascale Fung, Brian Mak, Bertram Shi, Manhung Siu and Dekai Wu, specializing in speech and signal processing, statistical and corpus-based natural language processing, machine translation, text mining, information extraction, Chinese language processing, knowledge management, and related fields. Special emphasis is given to machine processing of Chinese language and Chinese information. Systems built at HLTC include automated language translation for the Internet, speech-based web browsing, and speech recognition for the telephone.
  • Language Technology Group: Language Technology (LT) has been a major research area at the Austrian Research Institute for Artificial Intelligence (OFAI) since its inception in 1984. We conduct research in modeling and processing human languages, especially German. This includes constructing linguistic resources, processing algorithms, and application prototypes. The Language Technology Group at OFAI is a member of the EU's European Network of Excellence in Human Language Technologies (ELSNET).


Posted by: tamara89 | March 30, 2008

Different definitions of Human Language Technologies (Q1)

In today's society the term “Human Language Technologies” is cited very often because of its great importance, and specialists have proposed many definitions of it. This article presents some of them.

One of the most important definitions of HLT is the one given by the free encyclopedia Wikipedia:

“Human Language Technology (HLT) consists of computational linguistics (or CL) and speech technology as its core but includes also many application oriented aspects of them. Language technology is closely connected to computer science and general linguistics.”

Another important definition of the term is given by Hans Uszkoreit, Professor of Computational Linguistics at the Department of Computational Linguistics and Phonetics of Saarland University in Saarbrücken. Uszkoreit, who is also Scientific Director at the German Research Center for Artificial Intelligence (DFKI) and head of the DFKI Language Technology Laboratory, defines it as follows:

“Language technology, sometimes also referred to as human language technology, comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics”.

A third definition of Human Language Technology is the one given by the Meraka Institute, the African Advanced Institute for Information and Communication Technology, which says:

“Human Language Technology (HLT) makes it easier for people to interact with machines. This can benefit a wide range of people – from illiterate farmers in remote villages who want to obtain relevant medical information over a cell phone, to scientists in state-of-the-art laboratories who want to focus on problem-solving with computers”.


Posted by: tamara89 | February 10, 2008

HTML

HTML stands for HyperText Markup Language (in Spanish, roughly “hypertext tag language”). It is a markup language designed for structuring and presenting texts in the form of hypertext, which is the standard format of Web pages. Thanks to the Internet and web browsers like Internet Explorer, Opera, Firefox, Netscape or Safari, HTML has become one of the most popular and easiest to learn formats for preparing documents for the Web.

HTML is not a programming language, although it allows code written in programming languages to be included under certain conditions. This extends its capabilities and functionality, but it goes beyond the scope of HTML itself.

Basic HTML tags

  • <html>: Marks the beginning of the HTML document and tells the browser that what follows should be interpreted as HTML code.
  • <head>: Defines the header of the HTML document. The header usually contains information about the document that is not shown directly to the user, such as the title of the browser window. Within <head> we find:
    • <title>: Sets the page title. The title usually appears in the title bar at the top of the window.
    • <link>: Links the page to style sheets or icons. For example: <link rel="stylesheet" href="/style.css" type="text/css">
    • <style>: Holds internal styles for the page, usually written in CSS. It is not needed if the page links to an external file with the <link> tag.
  • <body>: Defines the content or main body of the document. This is the part of the HTML document that is shown in the browser; properties common to the whole page, such as background color and margins, can be defined within this tag. Many other tags can appear inside <body>.
  • Most tags must be closed with a matching tag that repeats the tag name preceded by a slash ("/"), for example </body>.
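
To see these tags at work, the sketch below feeds a minimal page to the `HTMLParser` class from Python's standard library and records which tags are opened and which are closed. The page content and the `TagLogger` class are invented for this illustration.

```python
from html.parser import HTMLParser

# A minimal page using the tags described above; the content is invented.
PAGE = """<html>
<head>
<title>My first page</title>
<link rel="stylesheet" href="/style.css" type="text/css">
</head>
<body>
<p>Hello, web!</p>
</body>
</html>"""


class TagLogger(HTMLParser):
    """Record every opening and closing tag in document order."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


logger = TagLogger()
logger.feed(PAGE)
logger.close()
print("opened:", logger.opened)
print("closed:", logger.closed)
```

Every tag except <link> shows up in both lists, which illustrates why the rule says "most" tags must be closed rather than all of them.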


Posted by: tamara89 | February 1, 2008

What is XML?

XML is the Extensible Markup Language. It improves the functionality of the Web by letting you identify your information in a more accurate, flexible, and adaptable way.

It is extensible because it is not a fixed format like HTML (which is a single, predefined markup language). Instead, XML is actually a metalanguage—a language for describing other languages—which lets you design your own markup languages for limitless different types of documents. XML can do this because it’s written in SGML, the international standard metalanguage for text document markup (ISO 8879).
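
A small sketch of what "designing your own markup language" looks like in practice: the recipe vocabulary below (`recipe`, `ingredient`, `step`, `quantity`) is invented for this example and is not predefined anywhere in XML, yet a standard parser such as Python's `xml.etree.ElementTree` reads it without any special configuration.

```python
import xml.etree.ElementTree as ET

# An invented vocabulary for recipes; none of these names are predefined by XML.
DOCUMENT = """<recipe name="Tortilla">
  <ingredient quantity="4">egg</ingredient>
  <ingredient quantity="3">potato</ingredient>
  <step>Fry the potatoes.</step>
  <step>Add the beaten eggs.</step>
</recipe>"""

root = ET.fromstring(DOCUMENT)

# Collect (name, quantity) pairs from the custom <ingredient> elements.
ingredients = [
    (item.text, int(item.get("quantity"))) for item in root.findall("ingredient")
]
print(root.get("name"), ingredients)
```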

There are two current versions of XML. The first, XML 1.0, was initially defined in 1998. It has undergone minor revisions since then without being given a new version number, and is currently in its fourth edition, as published on August 16, 2006. It is widely implemented and still recommended for general use. The second, XML 1.1, was initially published on February 4, 2004, the same day as XML 1.0 Third Edition, and is currently in its second edition, as published on August 16, 2006. It contains features that are intended to make XML easier to use in certain cases, mainly enabling the use of line-ending characters used on EBCDIC platforms, and the use of scripts and characters absent from Unicode 2.0. XML 1.1 is not very widely implemented and is recommended for use only by those who need its unique features.

XML 1.0 and XML 1.1 differ in the requirements of characters used for element and attribute names: XML 1.0 only allows characters which are defined in Unicode 2.0, which includes most world scripts, but excludes those which were added in later Unicode versions. Among the excluded scripts are Mongolian, Cambodian, Amharic, Burmese, and others.

Almost any Unicode character can be used in the character data and attribute values of an XML 1.1 document, even if the character is not defined, aside from having a code point, in the current version of Unicode. The approach in XML 1.1 is that only certain characters are forbidden, and everything else is allowed, whereas in XML 1.0, only certain characters are explicitly allowed, thus XML 1.0 cannot accommodate the addition of characters in future versions of Unicode.

In character data and attribute values, XML 1.1 allows the use of more control characters than XML 1.0, but, for “robustness”, most of the control characters introduced in XML 1.1 must be expressed as numeric character references. Among the supported control characters in XML 1.1 are two line break codes that must be treated as whitespace. Whitespace characters are the only control codes that can be written directly.
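
The difference in allowed characters can be written down as three small predicates. The ranges are my transcription of the Char production of XML 1.0 and the Char and RestrictedChar productions of XML 1.1, so they should be checked against the W3C texts before being relied on; the function names are made up for this sketch.

```python
def is_xml10_char(cp):
    """Char in XML 1.0: tab, line feed, carriage return, plus the listed ranges."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp):
    """Char in XML 1.1: everything from #x1 up, minus surrogates, #xFFFE and #xFFFF."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def needs_reference_in_xml11(cp):
    """RestrictedChar in XML 1.1: legal only as a numeric character reference."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


# Columns: code point, allowed in 1.0, allowed in 1.1, reference required in 1.1.
for cp in (0x0, 0x1, 0x9, 0x85, 0x41):
    print(f"U+{cp:04X}", is_xml10_char(cp), is_xml11_char(cp), needs_reference_in_xml11(cp))
```

U+0085 (NEL) is one of the two extra line break codes, which is why it is left out of the restricted set.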

There are also discussions on an XML 2.0, although it remains to be seen whether one will ever come about. XML-SW (SW for skunk works), written by one of the original developers of XML, contains some proposals for what an XML 2.0 might look like: elimination of DTDs from syntax, and integration of namespaces, XML Base and XML Information Set (infoset) into the base standard.

The World Wide Web Consortium also has an XML Binary Characterization Working Group doing preliminary research into use cases and properties for a binary encoding of the XML infoset. The working group is not chartered to produce any official standards. Since XML is by definition text-based, ITU-T and ISO are using the name Fast Infoset for their own binary infoset to avoid confusion (see ITU-T Rec. X.891 | ISO/IEC 24824-1).

SOURCES:

  • http://xml.silmaril.ie/
  • Wikipedia, “XML”

Posted by: tamara89 | February 1, 2008

The XML Language

XML, short for Extensible Markup Language, is an extensible tag-based metalanguage developed by the World Wide Web Consortium (W3C). It is a simplification and adaptation of SGML and makes it possible to define the grammar of specific languages. It can be used in databases, text editors, spreadsheets and almost anything else imaginable.

Its development began in 1996 and the first version was released on February 10, 1998. The first definition to appear was: a system for defining, validating and sharing document formats on the Web. During 1998 XML grew exponentially, with appearances in the media, mentions on web pages, software support, and so on.

General characteristics

  • Directly usable on the Internet
  • Support for a wide variety of data transfer applications
  • Compatible with SGML
  • Simple XML processors can be written easily
  • XML documents are human-readable and reasonably clear (depending on the definition)
  • The language can be designed quickly
  • Simple, yet fully formalized
  • XML documents are easy to create

Finally, it is worth mentioning the advantages that the creation of XML has brought:

  • It is extensible: once a language has been designed and put into production, it can still be extended with new tags in such a way that consumers of the old version can still understand the new format.
  • The parser is a standard component, so there is no need to write a specific parser for each language. Any of the many parsers available can be used, which avoids bugs and speeds up application development.
  • If a third party decides to use a document created in XML, it is easy to understand its structure and process it. This improves compatibility between applications.

The immediate consequences of the system discussed here include the emergence of Web 2.0 and the development of systems such as the TEI.
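
The extensibility point can be sketched with Python's standard `xml.etree.ElementTree` parser. The `book` format, the later `isbn` element (with a placeholder value) and the `read_title` function are all invented for this illustration; the point is only that a consumer written for the old format keeps working when new tags are added.

```python
import xml.etree.ElementTree as ET

OLD_FORMAT = "<book><title>Don Quijote</title></book>"
# A later version of the same invented format adds an <isbn> element
# (the value is a placeholder, not a real ISBN).
NEW_FORMAT = "<book><title>Don Quijote</title><isbn>0000000000</isbn></book>"


def read_title(document):
    """An 'old' consumer: it only knows about <title> and ignores anything else."""
    return ET.fromstring(document).findtext("title")


print(read_title(OLD_FORMAT), "|", read_title(NEW_FORMAT))
```

The same standard parser handles both versions, so no format-specific parsing code had to be written or changed.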

Sources

(Forum article)
