% Archived in Software Heritage: revision 5b94fd2c12712e04345d4b130358afbb352f6436 authored by vthierry on 28 October 2025, 18:54:37 UTC, committed by vthierry on 28 October 2025, 18:54:37 UTC
% swh:1:rev:5b94fd2c12712e04345d4b130358afbb352f6436
% swh:1:dir:28c1523fab318e277996fac88c12a1b3d8a9190d
% swh:1:cnt:a1d0e97e5e5190bfe6bbd51f2a27732165afe2ba

%
% This file is used in  
%   https://gitlab.inria.fr/line/aide-group/aide/-/blob/master/etc/tex_to_pdf.sh
% and commands are documented in http://aide-line.inria.fr/build/www/etc.html#.tex_to_pdf
%

% Defines a minimal layout for pdf display

\documentclass[a4paper,12pt,landscape,pdftoolbar=false,pdfmenubar=false]{article}
\pagestyle{empty} 
\topmargin 0cm \oddsidemargin 0cm \evensidemargin 0cm 
\setlength{\parindent}{0in} 
\setlength{\parskip}{3mm} 
\usepackage[margin=2cm]{geometry}

% Here are the used packages

\usepackage[utf8]{inputenc}
\DeclareUnicodeCharacter{00B0}{\textsuperscript{o}}

\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsfonts}
\usepackage{array}
\usepackage{times}
\usepackage{color}
%\usepackage{algpseudocode}\usepackage{algorithm}
\usepackage{listings}

% Here are some used commands

\newcommand{\deq}{\stackrel {\rm def}{=}} 
\newcommand{\eqline}[1]{~\vspace{0.3cm}\\\centerline{$#1$}\vspace{0.3cm}\\}
\newcommand{\tab}{\hspace{6mm}}
\newcommand{\hhref}[1]{\href{#1}{#1}}

% Here are some homemade commands to define minimal slides

\newcommand{\slide}[1]{\clearpage\fbox{\parbox[t][16.5cm][t]{\textwidth}{\Huge #1}}\newpage}
\newcommand{\stitle}[1]{~\vspace{1cm}\\\centerline{\fontsize{40}{50}\selectfont \bf #1}\vspace{0.5cm}\\}
\newcommand{\sright}[1]{\begin{flushright}#1\tab\end{flushright}}
\newcommand{\scenter}[1]{\begin{center}#1\end{center}}
\newcommand{\stwo}[4]{\begin{tabular}{ll}\parbox{#1\textwidth}{#3}&\parbox{#2\textwidth}{#4}\end{tabular}}

\begin{document}

\vspace{4cm} \centerline{\Huge Turtoise: Interfacing hierarchical data with knowledge graphs}

\subsubsection*{Position of the problem: Hierarchical and distributed representation of symbolic data}

Many data sets are defined using hierarchical data structures, i.e., ``records'' (or ``named tuples''), often collected in ``spreadsheets'' or ``tables''. The \href{https://www.json.org}{JavaScript Object Notation (JSON)}, for instance, defines the notion of record under the name of ``object'', i.e., a collection of unordered name and value pairs, each value being either a Boolean, string or numeric literal or a sub-structure. Here we use the \href{https://line.gitlabpages.inria.fr/aide-group/wjson/#semantic}{wJSON} semantic variant in which, for instance, a list is in one-to-one correspondence with a record whose names are the numbers indexing the list values. Furthermore, each data item is specified with a \href{https://line.gitlabpages.inria.fr/aide-group/symboling/index.html}{type}. Such a representation is universal in the sense that almost all data (e.g., everyday structured data, software parameters and other configuration data, digital object metadata, etc.) are routinely defined in such a format. 
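
As a minimal illustration of this list and record correspondence, the following Python sketch maps a list onto a record keyed by its indexes and back. It assumes 0-based integer indexes. The helpers {\tt list\_to\_record} and {\tt record\_to\_list} are hypothetical and are not part of the wJSON library.

\begin{lstlisting}[language=Python,basicstyle=\small\ttfamily]
def list_to_record(values):
    """View a list as a record whose names are the list indexes (0-based here)."""
    return {index: value for index, value in enumerate(values)}


def record_to_list(record):
    """Inverse mapping, defined only when the names are exactly 0, 1, ..., n-1."""
    if sorted(record) != list(range(len(record))):
        raise ValueError("record names are not the indexes 0..n-1")
    return [record[index] for index in range(len(record))]


record = list_to_record(["Alice", "Bob", "Eve"])
print(record)                  # {0: 'Alice', 1: 'Bob', 2: 'Eve'}
print(record_to_list(record))  # ['Alice', 'Bob', 'Eve']
\end{lstlisting}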

On the other hand, general-purpose languages for representing semantic information, used to define ontologies, linked data and metadata, such as semantic web contents \cite{hoekstra_ontology_2009}, are highly distributed. The \href{https://en.wikipedia.org/wiki/Resource_Description_Framework}{Resource Description Framework (RDF)} decomposes knowledge into atoms of the form {\tt (subject predicate object)}, an object being either a data value or another subject. For an individual, a data property corresponds to a qualitative or quantitative feature, while an object property (i.e., with another subject as object) corresponds to a relation. This corresponds to a graph data structure with predicates as labeled edges and subjects or objects as labeled nodes, i.e., ``linked data''. Data modeling is implemented via properties stated on individuals, specified via predicates such as those of the RDF schema\footnote{\url{https://www.w3.org/TR/rdf-schema}} or OWL2\footnote{\url{https://www.w3.org/TR/owl2-overview} with \hhref{https://www.w3.org/TR/owl2-primer} for an introduction.}, which formally define meaning over the stated facts, while derivation rules (e.g., SWRL\footnote{\url{https://www.w3.org/Submission/SWRL/}} rules) can also be used. See, e.g., \cite{mercier_formalizing_2021} for an introduction in this context.
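
The following Python sketch makes the graph reading of RDF concrete. It uses a few hypothetical {\tt ex:} individuals. It stores triples as tuples and builds the labeled adjacency view, with predicates as edges. It then separates data properties from object properties: data properties have literal objects, which this sketch marks by surrounding double quotes. Object properties have objects that are other nodes.

\begin{lstlisting}[language=Python,basicstyle=\small\ttfamily]
from collections import defaultdict

# Hypothetical facts; literals are marked by surrounding double quotes.
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
]


def to_graph(triples):
    """Adjacency view: each subject node maps to its (edge label, target) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def split_properties(triples):
    """Data properties point to literals, object properties to other nodes."""
    data, relations = [], []
    for triple in triples:
        (data if triple[2].startswith('"') else relations).append(triple)
    return data, relations


print(to_graph(triples)["ex:alice"])  # [('foaf:name', '"Alice"'), ('foaf:knows', 'ex:bob')]
data, relations = split_properties(triples)
print(relations)                      # [('ex:alice', 'foaf:knows', 'ex:bob')]
\end{lstlisting}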

Representing data in such a distributed manner corresponds to the A-box\footnote{\hhref{https://en.wikipedia.org/wiki/Abox}} of a knowledge graph. Data modeling using such ontology predicates or derivation rules defines a T-box\footnote{\hhref{https://en.wikipedia.org/wiki/Tbox}}, which generates {\em deduced} qualities, offering the possibility to perform inferences and thus implementing dynamic features at a purely symbolic level.
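
The following Python sketch is a toy illustration of how a T-box yields deduced facts. It saturates A-box {\tt rdf:type} facts with T-box {\tt rdfs:subClassOf} axioms by applying the RDFS entailment rules rdfs9 and rdfs11 until a fixpoint is reached. It is only a stand-in for a real reasoner, and the {\tt ex:} names are hypothetical.

\begin{lstlisting}[language=Python,basicstyle=\small\ttfamily]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def deduce(abox, tbox):
    """Return the facts entailed by rules rdfs9 and rdfs11, minus the stated ones."""
    stated = set(abox) | set(tbox)
    facts = set(stated)
    while True:
        subclass = {(s, o) for s, p, o in facts if p == SUBCLASS}
        new = set()
        for a, b in subclass:
            # rdfs11: rdfs:subClassOf is transitive.
            new |= {(a, SUBCLASS, d) for c, d in subclass if c == b}
            # rdfs9: instances of a class are instances of its superclasses.
            new |= {(x, TYPE, b) for x, p, c in facts if p == TYPE and c == a}
        if new <= facts:  # fixpoint: nothing new can be derived
            return facts - stated
        facts |= new


abox = {("ex:alice", TYPE, "ex:Student")}
tbox = {("ex:Student", SUBCLASS, "ex:Person"), ("ex:Person", SUBCLASS, "ex:Agent")}
for fact in sorted(deduce(abox, tbox)):
    print(fact)
\end{lstlisting}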

We thus need to map a hierarchical data structure onto an A-box and to reintroduce the deduced results into the original data structure in order to enrich it.

\newpage \subsubsection*{Position of the problem: Calculated and deduced values}

At the implementation level, a static data structure, implemented as \href{./Value.html}{\tt Value}, can also have dynamic features computed from other values, implemented as \href{./FValue.html}{\tt FValue}, and features deduced by a reasoner, implemented as \href{./LValue.html}{\tt LValue}, as represented here:

\centerline{\includegraphics[width=0.9\textwidth]{./symbolic-data-structure.png}}

\newpage \subsubsection*{A one-to-one correspondence between both representations}

Each hierarchical data structure is translated\footnote{The position is very different from \href{https://en.wikipedia.org/wiki/JSON-LD}{JavaScript Object Notation for Linked Data (JSON-LD)}, which is a method of encoding linked data designed to minimize the work of transforming {\em any} existing JSON data structure into RDF linked data. Here, only rather specific data structures are considered.} into RDF statements as follows: each record item is a ``subject'' and each named value corresponds to a ``property'', the value being the ``object'' targeted by the relationship. 

For instance, ``someone named Alice knows someone else, named Bob, who knows someone else named Eve, while Bob's email is bob@example.com'' is written as follows, using \href{https://line.gitlabpages.inria.fr/aide-group/wjson/#semantic}{wJSON} syntax on the one hand and \href{https://www.w3.org/TR/turtle}{Turtle} syntax on the other hand: \\
\begin{tabular}{ll}
{\em Turtoise implicit node syntax} & {\em Turtle blank node syntax}\\
\begin{minipage}[t]{0.45\textwidth}\begin{verbatim}
{
  @base: https://gitlab.inria.fr/line/aide-group/wjson/-/raw/master/src/test.nt
  @prefix: { 
    foaf: <http://xmlns.com/foaf/0.1/>
  }
  foaf:name: Alice
  foaf:knows: {
    foaf:name: Bob  
    foaf:knows: {
      foaf:name: Eve
    }
    foaf:mbox: bob@example.com
  }
}
\end{verbatim} \end{minipage} &
\begin{minipage}[t]{0.45\textwidth}\begin{verbatim}



@base <https://gitlab.inria.fr/line/aide-group/wjson/-/raw/master/src/test.nt> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
[ foaf:name "Alice" ]
  foaf:knows [
    foaf:name "Bob" ;
    foaf:knows [
        foaf:name "Eve" ] ;
    foaf:mbox "bob@example.com" ] .
\end{verbatim} \end{minipage} \\
\end{tabular} \\ which (leaving the {\tt @base} aside for the sake of clarity) expands in \href{https://en.wikipedia.org/wiki/N-Triples}{N-Triples} syntax to {\small\begin{verbatim}
<local:@prefix>               <local:foaf>                      <http://xmlns.com/foaf/0.1/> .
<local:>                      <http://xmlns.com/foaf/0.1/name>  "Alice" .
<local:foaf:knows>            <http://xmlns.com/foaf/0.1/name>  "Bob" .
<local:foaf:knows/foaf:knows> <http://xmlns.com/foaf/0.1/name>  "Eve" .
<local:foaf:knows>            <http://xmlns.com/foaf/0.1/mbox>  "bob@example.com" .
\end{verbatim}}
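
The translation above can be sketched in Python as a depth-first traversal. The traversal names each nested record after its location and expands prefixed names. This is a minimal sketch under stated assumptions, not the Turtoise implementation:
\\ - the input is a Python dictionary standing for the parsed wJSON record;
\\ - literals are strings, and lists and {\tt @base} are not handled;
\\ - a nested {\tt @prefix} applies to its whole record, rather than only to the items following it;
\\ - as in the listing above, the triples linking a record to its sub-records are left implicit in the location IRIs.

\begin{lstlisting}[language=Python,basicstyle=\small\ttfamily]
def expand(name, prefixes):
    """Expand a prefixed name such as foaf:name using the in-scope prefixes."""
    prefix, colon, local = name.partition(":")
    return prefixes[prefix] + local if colon and prefix in prefixes else name


def literal(value):
    """N-Triples string literal, escaping backslashes, quotes and newlines."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def flatten(record, prefixes=None, path=""):
    """Yield (subject, predicate, object) terms, depth first, in insertion order."""
    prefixes = dict(prefixes or {})  # a copy: nested declarations stay local
    location = f"{path}/" if path else ""
    for name, iri in record.get("@prefix", {}).items():
        prefixes[name] = iri
        # The directive itself is serialized so that the record can be rebuilt.
        yield (f"<local:{location}@prefix>", f"<local:{name}>", f"<{iri}>")
    for key, value in record.items():
        if key in ("@prefix", "@base"):
            continue
        if isinstance(value, dict):
            yield from flatten(value, prefixes, location + key)
        else:
            yield (f"<local:{path}>", f"<{expand(key, prefixes)}>", literal(value))


data = {
    "@prefix": {"foaf": "http://xmlns.com/foaf/0.1/"},
    "foaf:name": "Alice",
    "foaf:knows": {
        "foaf:name": "Bob",
        "foaf:knows": {"foaf:name": "Eve"},
        "foaf:mbox": "bob@example.com",
    },
}
for triple in flatten(data):
    print(*triple, ".")
\end{lstlisting}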

The ``Turtoise'' syntax corresponds to defining a data structure using \href{https://line.gitlabpages.inria.fr/aide-group/wjson/#semantic}{wJSON} syntax with optional {\tt @base} and {\tt @prefix} directives.

What corresponds to a blank node\footnote{We do not use blank node labels of the form {\tt \_:{\em URIref}} because it is useful to encode the relative location in the hierarchical data structure in the blank node identifier, which is not compatible with the {\tt URIref} lexical constraints.} is serialized as a {\tt local:} \href{https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier}{\tt IRI} encoding the relative location in the structure.
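
The lexical constraint behind this choice can be checked with the following Python sketch. The regular expressions are simplified, ASCII-only transcriptions of the {\tt BLANK\_NODE\_LABEL} and {\tt IRIREF} productions of the Turtle grammar; the full productions also accept non-ASCII characters and escape sequences. A location such as {\tt foaf:knows/foaf:knows} is not a valid blank node label, but it is valid inside a {\tt local:} IRI.

\begin{lstlisting}[language=Python,basicstyle=\small\ttfamily]
import re

# ASCII subset of Turtle's BLANK_NODE_LABEL production: no ':' or '/' allowed
# after the "_:" marker, and the label cannot end with '.'.
BLANK_NODE_LABEL = re.compile(r"_:[A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")
# Turtle's IRIREF production, without the \u and \U escape sequences.
IRIREF = re.compile(r'<[^\x00-\x20<>"{}|^`\\]*>')


def is_blank_node_label(text):
    return BLANK_NODE_LABEL.fullmatch(text) is not None


def is_iriref(text):
    return IRIREF.fullmatch(text) is not None


location = "foaf:knows/foaf:knows"
print(is_blank_node_label("_:" + location))   # False
print(is_iriref("<local:" + location + ">"))  # True
\end{lstlisting}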

The {\tt @prefix} directive is understood and implemented as specified\footnote{In fact, in Turtoise, the scope of the {\tt @prefix} directive includes all record items and sub-items following its declaration, but not the parent data structure; several {\tt @prefix} directives can thus be defined, with different scopes. This is not compliant with the \href{https://www.w3.org/TR/turtle}{Turtle} specification, and such usage is not recommended.} in the \href{https://www.w3.org/TR/turtle}{Turtle} specification, e.g., {\tt foaf:name} is expanded to the corresponding absolute \href{https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier}{\tt IRI}. The {\tt @prefix} directive itself is also serialized, so that the original data structure can be reconstructed. We thus have a one-to-one correspondence between the \href{https://en.wikipedia.org/wiki/N-Triples}{N-Triples} serialization and the original data structure. 
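
Conversely, since the {\tt @prefix} directive is serialized, the original record can be rebuilt from the triples. The following Python sketch does so for the N-Triples listed above. It assumes that names contain no {\tt /} and that string literals contain no escape sequences. It also treats prefixes as global, ignoring the scoping described in the footnote. Because Python dictionaries preserve insertion order, the rebuilt record also keeps the original order of its items.

\begin{lstlisting}[language=Python,basicstyle=\small\ttfamily]
def walk(root, parts):
    """Return the nested record at the given location, creating it if needed."""
    node = root
    for part in parts:
        node = node.setdefault(part, {})
    return node


def compact(iri, prefixes):
    """Turn an absolute IRI back into a prefixed name when a prefix matches."""
    for name, base in prefixes.items():
        if iri.startswith(base):
            return f"{name}:{iri[len(base):]}"
    return iri


def unflatten(triples):
    root, prefixes = {}, {}
    for subject, predicate, obj in triples:
        path = subject[len("<local:"):-1]
        parts = path.split("/") if path else []
        if parts and parts[-1] == "@prefix":
            # Serialized directive: <local:...@prefix> <local:name> <iri>
            name, iri = predicate[len("<local:"):-1], obj[1:-1]
            walk(root, parts[:-1]).setdefault("@prefix", {})[name] = iri
            prefixes[name] = iri
        else:
            walk(root, parts)[compact(predicate[1:-1], prefixes)] = obj[1:-1]
    return root


triples = [
    ("<local:@prefix>", "<local:foaf>", "<http://xmlns.com/foaf/0.1/>"),
    ("<local:>", "<http://xmlns.com/foaf/0.1/name>", '"Alice"'),
    ("<local:foaf:knows>", "<http://xmlns.com/foaf/0.1/name>", '"Bob"'),
    ("<local:foaf:knows/foaf:knows>", "<http://xmlns.com/foaf/0.1/name>", '"Eve"'),
    ("<local:foaf:knows>", "<http://xmlns.com/foaf/0.1/mbox>", '"bob@example.com"'),
]
print(unflatten(triples))
\end{lstlisting}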

The {\tt @base} directive is understood and implemented as specified in the \href{https://www.w3.org/TR/turtle}{Turtle} specification. If {\tt @base} is defined, the {\tt local:} prefix is expanded to the {\tt @base} \href{https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier}{\tt IRI}. The {\tt @base} directive is global to a whole data structure, must be unique and must be defined at the top level; this is compatible with, though more restrictive than, the \href{https://www.w3.org/TR/turtle}{Turtle} specification.

In compliance with the \href{https://line.gitlabpages.inria.fr/aide-group/wjson/#semantic}{wJSON} semantic, insertion order is made explicit (e.g., the node named {\tt Bob} is referenced as someone known by the node named {\tt Alice}). This does not change the reasoning on the data structure contents, but allows keeping track of both the structure and the insertion order, as in human spoken language, which is intrinsically sequential.

\subsubsection*{Comparison with Turtle specification}

{\small The idea is that ``Turtoise'' is a dialect of the \href{https://www.w3.org/TR/turtle}{Turtle} semantic using the \href{https://line.gitlabpages.inria.fr/aide-group/wjson/#semantic}{wJSON} syntax with the following characteristics:
\\ - The \href{https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier}{\tt IRIs} are defined as for \href{https://www.w3.org/TR/turtle/#sec-iri}{Turtle IRIs}, using {\tt @base} and {\tt @prefix} directives.
\\ - For this preliminary version, literals are only strings, without the capability to define \href{https://www.w3.org/TR/turtle/#literals}{Turtle literals}: Boolean and numeric values are parsed from their string representation in compliance with the \href{./Value.html}{Value} semantic, while language tags and region sub-tags cannot be used. There is no obstacle to extending the specification with these features, though they are not very useful in the present application context.
\\ - The \href{https://www.w3.org/TR/turtle/#predicate-lists}{predicate lists}, \href{https://www.w3.org/TR/turtle/#object-lists}{object lists} and \href{https://www.w3.org/TR/turtle/#BNodes}{blank nodes} correspond to the structure of the \href{https://www.json.org/json-en.html}{JSON} language using the \href{https://line.gitlabpages.inria.fr/aide-group/wjson/#semantic}{wJSON} syntax: a record corresponds to a \href{https://www.w3.org/TR/turtle/#predicate-lists}{predicate list}, an \href{https://www.w3.org/TR/turtle/#object-lists}{object list} can be defined using a \href{https://www.w3schools.com/js/js_json_arrays.asp}{JSON array}, while the hierarchical structure induces \href{https://www.w3.org/TR/turtle/#BNodes}{blank nodes}; to the best of our understanding, all common usages of \href{https://www.w3.org/TR/turtle/#BNodes}{blank nodes} correspond to such a hierarchical definition. 
\\ - The \href{https://www.w3.org/TR/turtle/#collections}{Turtle collections} correspond to \href{https://www.w3schools.com/js/js_json_arrays.asp}{JSON arrays}, but with a semantic difference: in \href{https://line.gitlabpages.inria.fr/aide-group/wjson/#semantic}{wJSON}, the underlying structure corresponds to \href{https://www.w3.org/TR/rdf11-mt/#rdf-containers}{RDF containers} (represented as a numerically indexed vector of values) and not, as in Turtle, to an \href{https://www.w3.org/TR/rdf11-mt/#rdf-collections}{RDF collection} (represented as a chained list of values). This seems preferable at the application level, though it would be straightforward to serialize a \href{https://www.w3schools.com/js/js_json_arrays.asp}{JSON array} as a chained list if required.}
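
The following Python sketch makes this difference concrete by serializing the same list in both ways with the standard RDF vocabulary. As a container, it uses the membership properties {\tt rdf:\_1}, {\tt rdf:\_2}, \ldots. As a collection, it uses a chain of {\tt rdf:first} and {\tt rdf:rest} cells ending with {\tt rdf:nil}. The node names ({\tt \_:list}, {\tt \_:c1}, \ldots) are hypothetical, and the sketch does not claim to reproduce the exact Turtoise output.

\begin{lstlisting}[language=Python,basicstyle=\small\ttfamily]
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def as_container(node, values):
    """RDF container: one membership property rdf:_n per value (1-based)."""
    return [(node, f"{RDF}_{n}", value) for n, value in enumerate(values, start=1)]


def as_collection(node, values):
    """RDF collection: a chained list of rdf:first / rdf:rest cells."""
    if not values:
        return []  # the empty collection is rdf:nil itself, so no cell is needed
    cells = [node] + [f"_:c{n}" for n in range(1, len(values))]
    triples = []
    for n, value in enumerate(values):
        rest = cells[n + 1] if n + 1 < len(values) else f"{RDF}nil"
        triples += [(cells[n], f"{RDF}first", value), (cells[n], f"{RDF}rest", rest)]
    return triples


values = ['"Alice"', '"Bob"', '"Eve"']
print(len(as_container("_:list", values)), "container triples")    # 3 container triples
print(len(as_collection("_:list", values)), "collection triples")  # 6 collection triples
\end{lstlisting}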

\newpage \subsubsection*{Interface with an external reasoner}

At the implementation level, the \href{./LValue.html}{\tt LValue} interface is implemented using external software, as schematized here:

\centerline{\includegraphics[width=0.9\textwidth]{./symbolic-data-interface.png}}

The T-box inference rules are not defined using the Turtoise syntax but using the \href{https://www.w3.org/TR/rdf-schema}{RDFS} or \href{https://www.w3.org/TR/owl2-overview}{OWL2} languages, or \href{https://www.w3.org/Submission/SWRL}{SWRL} rules, since there is no need to represent such information as a hierarchical data structure.
\newpage {\scriptsize \bibliographystyle{apalike}\bibliography{AIDE}}
\end{document}