Jepa - Java Europarl API
version 1.1

-----------------------------------------------------------------------

Copyright 2008-12 Vincent Labatut
Copyright 2013 Vincent Labatut & Banu Erdem

Contact: Vincent Labatut

Jepa - Java Europarl API is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.

For source availability and license information see licence.txt

-----------------------------------------------------------------------

Description:
This API was designed to retrieve data from the Europarl website, i.e. the website of the European Parliament (http://www.europarl.europa.eu). Version 1 is mainly MEP-centered (MEP: member of the European Parliament) and does not allow retrieving all the data available on the website. It focuses on the MEPs' personal information (name, national party, European group, date and place of birth, etc.) and activity (reports, declarations, motions, opinions, questions, debates, etc.).

Use:
See the class TestEuroparl for some examples of how to use the API. You basically have to instantiate a parser first: either to parse the page of a member (EuroparlMemberParser) or a page containing a list of activities (EuroparlActivityParser). The jar contains the lists of MEPs for each term (only the ids are listed), which are available through the class EuroparlTerm. The Javadoc is complete and gives access to many details. Please contact us if you detect any bugs; we will do our best to correct them quickly.

Language:
Note that although Jepa supports all 22 languages used on the Europarl website, only English was exhaustively tested. The translations for other languages are certainly incomplete. The user can change this, though, since all strings are defined in a set of XML files located in the res/languages folder. See the corresponding Java classes for more details regarding the meaning of the codes they rely upon.
Feel free to complete these classes if you want to parse the Europarl resources in some language other than English, or if the English version was made incomplete by some later evolution of the website. Please keep us informed of any contributions, so that we can include them in the API and make them available to the other users.

Finally, note that some concepts are expressed by distinct words in certain languages but not in others. For instance, the chair and president positions both translate to president in many languages (at least on the Europarl website). This might slightly affect the obtained objects (especially regarding the activities), so we recommend using English.

Limitations:
Jepa was initially limited to the set of functionalities necessary for a social network mining project. It should be completed so that it gives access to the rest of the data available on the Europarl website; however, no firm date has been set yet (it is not even certain this task will be accomplished at all). It could also be extended to process the websites of other European bodies such as the European Commission (http://ec.europa.eu) or the European Council (http://www.european-council.europa.eu). However, this is even more uncertain!

You can send (by email) suggestions regarding the implementation of new features; however, keep in mind there is no guarantee at all that they will be present in the next versions ;)

-----------------------------------------------------------------------

This product uses open source software:

+ JDOM v1.1.1
  Copyright 2000-2004 Jason Hunter & Brett McLaughlin
  http://www.jdom.org/
  Modified Apache License
  Used to parse the translation XML files.

+ HTML Cleaner v2.2
  Copyright 2006-2011 HtmlCleaner Team
  http://htmlcleaner.sourceforge.net
  BSD Licence
  Used to clean the Europarl webpages (which are not XHTML compliant) and parse them.
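
-----------------------------------------------------------------------

Illustration (translation files):
The Language section mentions that strings are stored in XML files and that several concepts can collapse onto the same word in some languages. The sketch below may help when completing a translation: it loads a code-to-label table and reports the labels shared by more than one code. It is only a sketch under assumptions: the XML layout (a "strings" root with "string" elements carrying a "code" attribute), the codes ROLE_CHAIR and ROLE_PRESIDENT, the sample labels and the class LanguageFileSketch are all hypothetical and are not taken from Jepa's actual files. It also uses the DOM parser bundled with the JDK rather than JDOM, so that it runs without any dependency.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Jepa.
class LanguageFileSketch {

    // Reads <string code="...">label</string> elements into a code -> label map.
    // The element and attribute names are assumptions made for this example.
    static Map<String, String> loadStrings(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("string");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            result.put(element.getAttribute("code"), element.getTextContent().trim());
        }
        return result;
    }

    // Returns the labels used by more than one code, i.e. the words that
    // cannot be mapped back to a single concept when parsing a page.
    static Set<String> ambiguousLabels(Map<String, String> strings) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : strings.values()) {
            counts.merge(label, 1, Integer::sum);
        }
        Set<String> ambiguous = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                ambiguous.add(entry.getKey());
            }
        }
        return ambiguous;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative data only: two distinct English labels...
        String english = "<strings>"
                + "<string code=\"ROLE_CHAIR\">Chair</string>"
                + "<string code=\"ROLE_PRESIDENT\">President</string>"
                + "</strings>";
        // ...and a language in which both roles share one word.
        String other = "<strings>"
                + "<string code=\"ROLE_CHAIR\">Presidente</string>"
                + "<string code=\"ROLE_PRESIDENT\">Presidente</string>"
                + "</strings>";
        System.out.println("English ambiguous labels: " + ambiguousLabels(loadStrings(english)));
        System.out.println("Other ambiguous labels: " + ambiguousLabels(loadStrings(other)));
    }
}
```

A non-empty result for a language suggests that pages parsed in that language may yield less precise objects than the English ones, which is the reason English is recommended above.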