https://hal.archives-ouvertes.fr/hal-02177397
Tip revision: e4268f65191281edb5713efc252800c1fe6b06fc authored by Software Heritage on 01 January 2008, 00:00:00 UTC
hal: Deposit 317 in collection hal
readme.txt

Jepa - Java Europarl API version 1.1

-----------------------------------------------------------------------

Copyright 2008-12 Vincent Labatut
Copyright 2013 Vincent Labatut & Banu Erdem

Contact: Vincent Labatut <vlabatut@gsu.edu.fr>

Jepa - Java Europarl API is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation.

For source availability and license information see licence.txt

-----------------------------------------------------------------------

Description: This API was designed to retrieve data from the Europarl website, 
i.e. the website of the European Parliament (http://www.europarl.europa.eu).
Version 1 is mainly MEP-centered (MEP: member of the European Parliament)
and does not allow retrieving all the data available on the website. It
focuses on the MEPs' personal information (name, national party, European
group, date/place of birth, etc.) and activity (reports, declarations, motions,
opinions, questions, debates, etc.).

Use: See the class TestEuroparl for some examples of how to use the API.
You basically have to instantiate a parser first: either to parse the page
of a member (EuroparlMemberParser) or a page containing a list of activities
(EuroparlActivityParser). The jar contains the lists of MEPs for each term
(only the ids are listed), which are available through the class EuroparlTerm.
The Javadoc is complete and gives access to many details. Please contact
us if you detect any bugs; we will do our best to correct them quickly.

Language: Note that although Jepa supports all 22 languages used on the
Europarl website, only English was exhaustively tested. The translations for
other languages are certainly incomplete. The user can change this though,
since all strings are defined in a set of XML files located in the res/languages
folder. See the corresponding Java classes to get more details regarding
the meaning of the codes they rely upon. Feel free to complete these
classes if you want to parse the Europarl resources in some language
other than English, or if the English version was made incomplete by
some later evolution of the website. Please keep us informed of any
contributions, so that we can include them in the API and make them
available to the other users. Finally, note that some concepts are
associated with distinct words in certain languages, but not in others.
For instance, the chair and president positions both translate to
president in many languages (at least on the Europarl website). This might
slightly affect the obtained objects (especially regarding the activities),
so we recommend using English.
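
The chair/president ambiguity described above can be illustrated with a
small, self-contained sketch. It does not use Jepa's classes or its real
file format: the <translations>/<entry code="..."> layout, the codes CHAIR
and PRESIDENT, the class TranslationSketch and its methods are all
hypothetical, and the sample strings are illustrative rather than copied
from res/languages. It also relies on the JDK's built-in DOM parser instead
of JDOM, only to stay dependency-free.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TranslationSketch {
    // Loads a translation document into a code -> word map.
    // Assumed (hypothetical) layout:
    //   <translations><entry code="CHAIR">Chair</entry>...</translations>
    static Map<String, String> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("entry");
            Map<String, String> byCode = new LinkedHashMap<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                byCode.put(entry.getAttribute("code"), entry.getTextContent().trim());
            }
            return byCode;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read translation XML", e);
        }
    }

    // Reverse lookup: all codes that a word found on a web page could stand for.
    static List<String> codesFor(Map<String, String> byCode, String word) {
        List<String> codes = new ArrayList<>();
        for (Map.Entry<String, String> e : byCode.entrySet()) {
            if (e.getValue().equalsIgnoreCase(word)) {
                codes.add(e.getKey());
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        String en = "<translations>"
                + "<entry code=\"CHAIR\">Chair</entry>"
                + "<entry code=\"PRESIDENT\">President</entry>"
                + "</translations>";
        // Illustrative French-like sample in which both codes share one word.
        String fr = "<translations>"
                + "<entry code=\"CHAIR\">Pr\u00e9sident</entry>"
                + "<entry code=\"PRESIDENT\">Pr\u00e9sident</entry>"
                + "</translations>";
        System.out.println(codesFor(load(en), "Chair"));          // [CHAIR]
        System.out.println(codesFor(load(fr), "Pr\u00e9sident")); // [CHAIR, PRESIDENT]
    }
}
```

With the English sample, a word maps back to exactly one code. With the
second sample, the same word maps to two codes, so a parser cannot tell the
two positions apart from the page text alone.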

Limitations: Jepa was initially limited to the set of functionalities
necessary for a social network mining project. It should be completed so
that it gives access to the rest of the data available on the Europarl website,
but no firm date has been set yet (it is not even certain that this task
will be accomplished at all). It could also be extended to process the
websites of other European bodies such as the European Commission
(http://ec.europa.eu) or the European Council (http://www.european-council.europa.eu).
However, this is even more uncertain! You can send suggestions (by email)
regarding the implementation of new features, but keep in mind that
there is no guarantee at all they will be present in the next versions ;)

-----------------------------------------------------------------------

This product uses the following open source software:
  + JDOM v1.1.1
    Copyright 2000-2004 Jason Hunter & Brett McLaughlin
    http://www.jdom.org/
    Modified Apache License
    Used to parse the translation XML files.   
  + HtmlCleaner v2.2
    Copyright 2006-2011 HtmlCleaner Team
    http://htmlcleaner.sourceforge.net
    BSD Licence
    Used to clean the Europarl webpages (which are not XHTML compliant) and parse them.