swh:1:snp:7ce5f1105410d5ee1ad6abfdc873986c25b579e5
Raw File
Tip revision: b55d94055eff37df53a781acb55d46d494eb851f authored by Dirk Roorda on 24 January 2022, 19:50:11 UTC
fix in loading app from arbitrary location
Tip revision: b55d940
start.py
"""
# Start kernel and webserver

What the `text-fabric` script does is the same as:

```sh
python3 -m tf.server.start
```

During start up the following happens:

Kill previous processes
:   The system is searched for non-terminated incarnations of the processes
    it wants to start up.
    If they are encountered, they will be killed, so that they cannot prevent
    a successful start up.

Start TF kernel
:   A `tf.server.kernel` is started.
    This process loads the bulk of the TF data, so it can take a while.
    When it has loaded the data, it sends out a message that loading is done,
    which is picked up by the script.

Start TF web server
:   A short while after receiving the "data loading done" message,
    the TF web server is started.

    !!! hint "Debug mode"
        If you have passed `-d` to the `text-fabric` script,
        the **Flask** web server will be started in debug and reload mode.
        That means that if you modify `web.py` or a module it imports,
        the web server will reload itself automatically.
        When you refresh the browser you see the changes.
        If you have changed templates, the css, or the javascript,
        you should do a "refresh from origin".

Load web page
:   After a short while, the default web browser will be started
    with a url and port at which the
    web server will listen. You see your browser being started up
    and the TF page being loaded.

Wait
:   The script now waits till the web server is finished.
    You finish it by pressing Ctrl-C, and if you have used the `-d` flag,
    you have to press it twice.

Terminate the TF kernel"
:   At this point, the `text-fabric` script will terminate the TF kernel.

Clean up
:   Now all processes that have started up have been killed.

    If something went wrong in this sequence, chances are that a process keeps running.
    It will be terminated next time you call the `text-fabric`.

!!! hint "You can kill too"
    If you run

```sh
text-fabric -k
```

all tf-browser-related processes will be killed.

```sh
text-fabric -k ddd
```

will kill all such processes as far as they are for data source `ddd`.

## Implementation notes

Different corpora will use different ports for the kernel and webserver communication.

The ports are computed from the arguments with which text-fabric is called.

That is done by the [crc32](https://docs.python.org/3.7/library/zlib.html#zlib.crc32) function.
There is no guarantee that collisions occur, and that the ports computed this way are free.
So we will look for the first available port after this.

On the whole, the following things are fairly well taken care of:

*   Invocations of text-fabric with different arguments lead to different ports
*   Repeated invocations of text-fabric with the same arguments lead to the same ports.

In particular, the following invocations lead to different ports:

```sh
text-fabric banks
```

and

```
text-fabric banks:clone
```

and likewise for all other arguments.

"""

import sys
import os
from platform import system

import psutil
import webbrowser
from time import sleep
from subprocess import PIPE, Popen

from ..core.helpers import console
from ..parameters import NAME, VERSION, PROTOCOL, HOST

from .command import (
    argKill,
    argShow,
    argNoweb,
    argApp,
    getPort,
    repSlug,
)
from .kernel import TF_DONE, TF_ERROR

HELP = """
USAGE

text-fabric --help
text-fabric -v
text-fabric -k [app]
text-fabric -p [app]

text-fabric ./path/to/app --locations=locations-string [--modules=modules-string] args
text-fabric app[:specifier] args

where all args are optional and args have one of these forms:

  -noweb
  --checkout=specifier
  --mod=modules
  --set=file
  --modules=modules-string
  --locations=locations-string
  --version=version

EFFECT

If an app is given and the -k and -p flags are not passed,
a TF kernel for that app is started.
When the TF kernel is ready, a web server is started
serving a website that exposes the data through
a query interface.

The default browser will be opened, except when -noweb is passed.

Instead of a standard app that is available on https://github.com/annotation
you can also specify an app you have locally, or no app at all.


"" (empty string): no app
path-to-directory: the directory in which your app resides. This argument
must have a / inside (e.g. ./myapp).
The directory may contain zero or more of these app.py, config.yaml, static/display.css
If they are found, they will be used.

If no app is specified, TF features will be loaded according to the
--locations and --modules args.

For standard apps, the following holds:

:specifier (after the app)
--checkout=specifier

The TF app itself can be downloaded on the fly from GitHub.
The main data can be downloaded on the fly from GitHub.
The specifier indicates a point in the history from where the app should be retrieved.
  :specifier is used for the TF app code.
  --checkout=specifier is used for the main data.

Specifiers may be:
  local                 - get the data from your local text-fabric-data directory
  clone                 - get the data from your local github clone
  latest                - get the latest release
  hot                   - get the latest commit
  tag (e.g. v1.3)       - get specific release
  hash (e.g. 78a03b...) - get specific commit

No specifier or the empty string means: latest release if there is one, else latest commit.

--mod=modules

Optionally, you can pass a comma-separated list of modules.
Modules are extra sets of features on top op the chosen data source.
You specify a module by giving the github repository where it is created,
in the form

  {org}/{repo}/{path}
  {org}/{repo}/{path}:specifier

where
  {org} is the github organization,
  {repo} the name of the repository in that organization
  {path} the path to the data within that repo.
  {specifier} points to a release or commit in the history

It is assumed that the data is stored in directories under {path},
where the directories are named as the versions that exists in the main data source.

--set=file

Optionally, you can pass a file name with the definition of custom sets in it.
This must be a dictionary were the keys are names of sets, and the values
are node sets.
This dictionary will be passed to the TF kernel, which will use it when it runs
queries.

DATA LOADING

Text-Fabric looks for data in ~/text-fabric-data.
If data is not found there, it first downloads the relevant data from
github.

MISCELLANEOUS

-noweb Do not start the default browser


CLEAN UP

If you press Ctrl-C the web server is stopped, and after that the TF kernel
as well.
Normally, you do not have to do any clean up.
But if the termination is done in an irregular way, you may end up with
stray processes.

-p  Show mode. If a data source is given, the TF kernel and web server for that
    data source are shown.
    Without a data source, all local webinterface related processes are shown.
-k  Kill mode. If a data source is given, the TF kernel and web server for that
    data source are killed.
    Without a data source, all local webinterface related processes are killed.
"""

FLAGS = set(
    """
    -noweb
""".strip().split()
)

BANNER = f"This is {NAME} {VERSION}"


def filterProcess(proc):
    procName = proc.info["name"]
    commandName = "" if procName is None else procName.lower()

    kind = None
    slug = None

    trigger = "python"
    if commandName.endswith(trigger) or commandName.endswith(f"{trigger}.exe"):
        parts = [p for p in proc.cmdline() if p not in FLAGS]
        if parts:
            parts = parts[1:]
        if parts and parts[0] == "-m":
            parts = parts[1:]
        if not parts:
            return False
        (call, *args) = parts

        trigger = "text-fabric"
        if call.endswith(trigger) or call.endswith(f"{trigger}.exe"):
            if any(arg in {"-k", "-p"} for arg in args):
                return False
            slug = argApp(parts)[1]
            ports = ()
            kind = "text-fabric"
        else:
            if call == "tf.server.kernel":
                kind = "kernel"
            elif call == "tf.server.web":
                kind = "web"
            elif call.endswith("web.py"):
                kind = "web"
            else:
                return False
            (slug, *ports) = args

        return (kind, slug, *ports)
    return False


def indexProcesses():
    tfProcesses = {}
    for proc in psutil.process_iter(attrs=["pid", "name"]):
        test = filterProcess(proc)
        if test:
            (kind, pSlug, *ports) = test
            tfProcesses.setdefault(pSlug, {}).setdefault(kind, []).append(
                (proc.info["pid"], *ports)
            )
    return tfProcesses


def showProcesses(tfProcesses, slug, term=False, kill=False):
    item = ("killed" if kill else "terminated") if term else ""
    if item:
        item = f": {item}"
    myself = os.getpid()
    for (pSlug, kinds) in tfProcesses.items():
        if slug is None or (slug == pSlug):
            checkKinds = ("kernel", "web", "text-fabric")
            rSlug = repSlug(pSlug)
            for kind in checkKinds:
                pidPorts = kinds.get(kind, [])
                for pidPort in pidPorts:
                    pid = pidPort[0]
                    port = pidPort[-1] if len(pidPort) > 1 else None
                    portRep = "" if port is None else f": {port:>5}"
                    if pid == myself:
                        continue
                    processRep = f"{kind:<12} % {pid:>5}{portRep:>7}"
                    try:
                        proc = psutil.Process(pid=pid)
                        if term:
                            if kill:
                                proc.kill()
                            else:
                                proc.terminate()
                        console(f"{processRep} {rSlug}{item}")
                    except psutil.NoSuchProcess:
                        if term:
                            console(
                                f"{processRep} {rSlug}: already {item}", error=True,
                            )


def connectPort(tfProcesses, kind, pos, slug):
    pInfo = tfProcesses.get(slug, {}).get(kind, None)
    return pInfo[0][pos] if pInfo else None


def main(cargs=sys.argv):
    console(BANNER)
    if len(cargs) >= 2 and any(
        arg in {"--help", "-help", "-h", "?", "-?"} for arg in cargs[1:]
    ):
        console(HELP)
        return
    if len(cargs) >= 2 and any(
        arg == "-v" for arg in cargs[1:]
    ):
        return

    isWin = system().lower().startswith("win")
    pythonExe = "python" if isWin else "python3"

    kill = argKill(cargs)
    show = argShow(cargs)

    (appName, slug, newPortKernel, newPortWeb) = argApp(cargs)

    if appName is None and not kill and not show:
        return

    noweb = argNoweb(cargs)

    tfProcesses = indexProcesses()

    if kill or show:
        if appName is False:
            return
        showProcesses(tfProcesses, None if appName is None else slug, term=kill)
        return

    stopped = False
    portKernel = connectPort(tfProcesses, "kernel", 1, slug)
    portWeb = None

    processKernel = None
    processWeb = None

    if portKernel:
        console(f"Connecting to running kernel via {portKernel}")
    else:
        portKernel = getPort(portBase=newPortKernel)
        console(f"Starting new kernel listening on {portKernel}")
        if portKernel != newPortKernel:
            console(f"\twhich is the first free port after {newPortKernel}")
        processKernel = Popen(
            [pythonExe, "-m", "tf.server.kernel", slug, str(portKernel)],
            stdout=PIPE,
            bufsize=1,
            encoding="utf-8",
        )
        console(f"Loading data for {appName}. Please wait ...")
        for line in processKernel.stdout:
            sys.stdout.write(line)
            if line.rstrip() == TF_ERROR:
                return
            if line.rstrip() == TF_DONE:
                break
        sleep(1)
        stopped = processKernel.poll()

    if not stopped:
        portWeb = connectPort(tfProcesses, "web", 2, slug)
        if portWeb:
            console(f"Connecting to running webserver via {portWeb}")
        else:
            portWeb = getPort(portBase=newPortWeb)
            console(f"Starting new webserver listening on {portWeb}")
            if portWeb != newPortWeb:
                console(f"\twhich is the first free port after {newPortWeb}")
            processWeb = Popen(
                [
                    pythonExe,
                    "-m",
                    "tf.server.web",
                    slug,
                    str(portKernel),
                    str(portWeb),
                ],
                bufsize=0,
                encoding="utf8",
            )

    if not noweb:
        sleep(2)
        stopped = (not portWeb or (processWeb and processWeb.poll())) or (
            not portKernel or (processKernel and processKernel.poll())
        )
        if not stopped:
            console(f"Opening {appName} in browser")
            webbrowser.open(
                f"{PROTOCOL}{HOST}:{portWeb}", new=2, autoraise=True,
            )

    stopped = (not portWeb or (processWeb and processWeb.poll())) or (
        not portKernel or (processKernel and processKernel.poll())
    )
    if not stopped:
        try:
            console("Press <Ctrl+C> to stop the TF browser")
            if processKernel:
                for line in processKernel.stdout:
                    sys.stdout.write(line)
        except KeyboardInterrupt:
            console("")
            if processWeb:
                processWeb.terminate()
                console("TF web server has stopped")
            if processKernel:
                processKernel.terminate()
                console("TF kernel has stopped")


if __name__ == "__main__":
    main()
back to top