https://github.com/andrefs/derzis

Revision Message Commit Date
b735226 Extract jobs from Manager to CurrentJobs class 30 May 2021, 17:36:12 UTC
7a9108a Update readme 29 May 2021, 18:42:09 UTC
a7b92b9 Add README 29 May 2021, 18:41:07 UTC
42a8caa Cancel domain crawl when it times out. Manager emits an event and ManagerPubSub sends it to Redis. WorkerPubSub adds it to a list of canceled jobs, and Worker stops iterating over the resources. 29 May 2021, 18:18:24 UTC
3c87d64 Fix bug in job timeout, canceling and postponing 28 May 2021, 20:11:37 UTC
4ef96d6 minor 27 May 2021, 20:05:59 UTC
a7c4476 Read seeds from data/seeds.txt file 27 May 2021, 19:44:42 UTC
d870039 Worker ids are now UUID instead of PID 27 May 2021, 19:43:43 UTC
7c570f6 move stuff around 27 May 2021, 11:05:18 UTC
6907b00 Docker! 25 May 2021, 21:12:20 UTC
095e026 Fix stuff 25 May 2021, 21:11:57 UTC
a89642b Preparing for docker 25 May 2021, 19:52:14 UTC
1bc0a32 minor 20 May 2021, 12:10:00 UTC
e6e8690 Remove debug prints 20 May 2021, 12:09:39 UTC
93138bb Fix bug removing www from URLs. `mongoose-type-url` uses `normalize-url`, which by default removes parts of the URL, including the `www.` prefix. Replaced with a custom type using String and a validator function which uses `try { new URL(url) }`. 20 May 2021, 12:04:02 UTC
9743031 Add url validation library. Add tests 20 May 2021, 11:51:19 UTC
50398f5 Start fixing crawling with wikidata seeds 20 May 2021, 09:46:37 UTC
0437b8a Fix path in worker-pool script 20 May 2021, 09:38:36 UTC
cbdba03 Manager and worker starting ok 20 May 2021, 09:17:40 UTC
b6d8367 Fixing worker 19 May 2021, 19:47:35 UTC
3050a8d split package.json 19 May 2021, 19:28:50 UTC
b1ab942 more 19 May 2021, 18:56:46 UTC
696cce2 move stuff into separate folders 19 May 2021, 18:49:04 UTC
2faa3be Register and deregister jobs in Manager 19 May 2021, 18:43:10 UTC
6c1ad0d Fix bug in cheerio css selector 19 May 2021, 18:41:21 UTC
2943b46 Make Manager a es6 class 18 May 2021, 12:55:52 UTC
2daa6f4 Remove debug prints 18 May 2021, 12:26:41 UTC
f832984 ups db 13 May 2021, 20:41:35 UTC
db4a20a Mongodb connection URI 13 May 2021, 20:36:25 UTC
9e6a569 maybe fixed pathHeads and headCount 08 May 2021, 00:50:23 UTC
ba568a5 pathHeads 07 May 2021, 19:52:06 UTC
9e0a511 A LOT OF STUFF 06 May 2021, 22:17:29 UTC
821ffa3 Stuff 01 May 2021, 19:59:46 UTC
65521d7 Mark paths as active/finished/disabled 16 April 2021, 12:24:30 UTC
047db63 LCB 15 April 2021, 16:51:12 UTC
6118288 Building paths seems ok 15 April 2021, 15:50:53 UTC
23f00f0 Inserting triples 14 April 2021, 23:02:28 UTC
3212407 Crawling domains 13 April 2021, 22:55:22 UTC
b4e0e53 kickoff: init function working 13 April 2021, 16:50:22 UTC
4e959b9 Stuff 05 April 2021, 10:43:33 UTC
3bc35e4 Simplify delay function 11 March 2021, 18:14:46 UTC
0337eaa Dump triples 11 March 2021, 11:35:59 UTC
4e477d6 Dump triples 11 March 2021, 11:32:42 UTC
1024f3f Calc request interval histogram and more stuff 10 March 2021, 20:13:28 UTC
945147d Parse <link> tags in HTML files 09 March 2021, 15:58:34 UTC
0e35325 Add triples from crawlDomain 04 March 2021, 11:29:54 UTC
134329a Validator for URLs in models 02 March 2021, 18:09:33 UTC
eaa7b92 Split pubsub methods from Manager 02 March 2021, 16:26:59 UTC
7fd9eeb Split pubsub code from Worker to WorkerPubSub 02 March 2021, 12:44:06 UTC
047dcd7 Model.upsertMany for Resources and Domains 02 March 2021, 11:25:09 UTC
f8f976a Log axios with winston 01 March 2021, 17:42:23 UTC
a15b899 Improve logging 01 March 2021, 12:23:08 UTC
b660494 domain crawl implemented in worker 26 February 2021, 17:56:08 UTC
e106cc2 Db config 26 February 2021, 17:54:53 UTC
19d9f6a domainCheck working 25 February 2021, 19:44:52 UTC
5c824aa assignJobs published 1 job at a time 23 February 2021, 18:17:02 UTC
182db16 Stuff 23 February 2021, 17:37:23 UTC
35690fe Script for clearing db 23 February 2021, 17:36:58 UTC
56a3c91 Script for adding project 23 February 2021, 17:36:36 UTC
13e935e Add bluebird 23 February 2021, 17:36:27 UTC
0e7c43e Make worker pool kill workers on exit 23 February 2021, 17:35:48 UTC
68bca0e Add project, checkDomain 18 February 2021, 09:44:02 UTC
eaee1cc Save domain check 16 February 2021, 23:20:22 UTC
4e62484 Domain check 16 February 2021, 18:04:41 UTC
447fb3b stuff 16 February 2021, 15:08:32 UTC
af75008 stuff 15 February 2021, 17:05:21 UTC
58a994f Resource.getNext to pop next resource to crawl 10 February 2021, 22:16:07 UTC
0cea238 kickoff 10 February 2021, 21:27:56 UTC
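
The fix described in revision 93138bb (a plain String field validated with `try { new URL(url) }`) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper name `isValidUrl` and the field definition `urlField` are assumptions made for the example.

```javascript
// Hypothetical helper (name not taken from the repository): returns true
// when the WHATWG URL parser accepts the value, false otherwise.
function isValidUrl(value) {
  try {
    new URL(value); // throws a TypeError on input it cannot parse
    return true;
  } catch (err) {
    return false;
  }
}

// Assumed shape of a Mongoose schema field using a plain String plus a
// custom validator, instead of a URL type that normalizes the value.
const urlField = {
  type: String,
  validate: {
    validator: isValidUrl,
    message: (props) => `${props.value} is not a valid URL`,
  },
};
```

Because the value is stored as an ordinary string and only checked, not rewritten, a `www.` prefix in the host is left as it was given.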