https://github.com/cran/ff
Tip revision: 889932cdd9bf07ff065fe68b74b56dcbd0ef4c25 authored by Jens Oehlschlägel on 29 March 2012, 00:00:00 UTC
version 2.2-11
ANNOUNCEMENT-2.2.txt
Dear R community,

The next release of package ff is available on CRAN. With the kind help of Brian Ripley, it now supports the Win64 and Sun versions of R. It has three major functional enhancements:

a) new fast in-memory sorting and ordering functions (single-threaded)
b) ff now supports on-disk sorting and ordering of ff vectors and ffdf dataframes
c) ff integer vectors now can be used as subscripts of ff vectors and ffdf dataframes
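
A minimal sketch of how the three enhancements might be used together. It assumes ff (>= 2.2) is installed; the object names and sizes are illustrative only.

```r
library(ff)  # assumption: ff (>= 2.2) is installed

set.seed(1)
x <- ff(sample(1e4, 1e5, TRUE))   # on-disk integer vector
y <- ff(runif(1e5))               # on-disk double vector
d <- ffdf(x = x, y = y)           # on-disk data frame

# a) in-memory sorting: ramsort() sorts its argument in place
r <- x[]        # copy the ff vector into RAM
ramsort(r)      # r itself is modified, no assignment needed

# b) on-disk sorting and ordering
s  <- ffsort(x)     # sorted copy as a new ff vector
i  <- fforder(x)    # ordering positions as an ff integer vector
ds <- ffdfsort(d)   # sorted copy of the ffdf

# c) ff integer vectors as subscripts
xs     <- x[i]      # x rearranged by the order i
d_by_x <- d[i, ]    # rows of d rearranged by the order of x
```

Because ramsort() works in place, it is safest to call it only on a vector that no other variable refers to.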

a) is achieved by careful implementation of NA-handling and exploiting context information
b) although ff objects are stored on disk, sorting and ordering them can be faster than the standard routines in R
c) applying an order to ff vectors and ffdf dataframes is substantially slower than in pure R because it involves disk-access AND sorting index positions (to avoid random access). 
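
To illustrate the idea behind c), here is a sketch in base R of applying an order chunk by chunk while sorting the index positions within each chunk, so that the data are read at ascending positions. This is not the ff implementation; apply_order_chunked() and chunk_size are hypothetical names, and a plain R vector stands in for the on-disk data.

```r
# Hypothetical helper: apply the order 'o' to 'x' chunk by chunk.
# Within each chunk the requested positions are sorted first, so that
# 'x' (standing in for on-disk data) is read at ascending positions.
apply_order_chunked <- function(x, o, chunk_size = 1000L) {
  out <- vector(typeof(x), length(o))
  if (length(o) == 0L) return(out)
  for (start in seq(1L, length(o), by = chunk_size)) {
    idx    <- start:min(start + chunk_size - 1L, length(o))
    pos    <- o[idx]            # positions requested by this chunk
    by_pos <- order(pos)        # the extra sort of index positions
    vals   <- x[pos[by_pos]]    # read at ascending positions
    out[idx[by_pos]] <- vals    # put values back in the requested order
  }
  out
}

set.seed(1)
x <- runif(5000)
o <- order(x)
res <- apply_order_chunked(x, o, chunk_size = 512L)
```

The result equals x[o]; the additional order() call per chunk is the price paid for avoiding random access.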

There is still room for improvement; however, the current status should already be useful. I ran some comparisons with SAS:
- both could sort a dataset of German census size (81e6 rows) on a 3GB notebook
- ff sorts and orders faster on single columns
- sorting big multicolumn-tables is faster in SAS

Win64 binaries and version 2.2.1 supporting Sun should appear on CRAN within the next few days. For the impatient: check out revision 67 or higher from R-Forge.
Non-Windows users: please note that you need to set appropriate values for options 'ffbatchbytes' and 'ffmaxbytes' yourself.
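
A sketch of setting the two options, for instance in .Rprofile or right after loading ff. The values are illustrative assumptions only, not recommendations; choose them to fit the RAM of your machine.

```r
# Illustrative values only (assumptions, not recommended defaults)
options(
  ffbatchbytes = 16 * 2^20,    # bytes per batch in chunked processing (16 MB here)
  ffmaxbytes   = 512 * 2^20    # upper limit in bytes for RAM-hungry operations (512 MB here)
)

getOption("ffbatchbytes")
getOption("ffmaxbytes")
```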

Note that virtual window support is now deprecated because it makes the code too complex. Let us know if you urgently need this and why.

Feedback, ideas and contributions are appreciated. To those who offered code over the last few months: please forgive us that integrating and documenting it was not possible with this release.


Jens & Daniel




P.S. Below are some timings in seconds at 3e6, 9e6, 27e6 and 81e6 elements from a Lenovo 410s notebook 
(3GB RAM, i5 m520, 2 real cores, 4 hyperthreaded cores, SSD drive, Windows7 32bit)

Legend for software
  ram:  new in-ram inplace operations receiving enough RAM to optimize for speed, not for memory
   ff:  new on-disk operations limiting RAM for this operation to ~500MB
    R:  timings from standard sort() and order()
  SAS:  timings from SAS 9.2 allowing for multithreaded sorting
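
For readers who want to take comparable measurements for the "R" rows, a sketch using system.time() on the standard routines. This is not the script used for the tables below; the size n is deliberately small and is an assumption.

```r
set.seed(42)
n <- 3e5            # illustrative size; the tables below start at 3e6
x <- runif(n)

# elapsed wall-clock seconds for the standard routines
t_sort  <- system.time(sort(x))[["elapsed"]]
t_order <- system.time(order(x))[["elapsed"]]

c(sort = t_sort, order = t_order)
```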


Legend for type of random data
  rboolean:  bi-boolean with 50% FALSE and TRUE
  rlogical:  tri-boolean with 33% NA, FALSE and TRUE
    rubyte:  integers from 0..255
     rbyte:  33% NA and 67% -127..127
   rushort:  integers from 0..65535
    rshort:  33% NA and 67% -32767..32767
 ruinteger:  50% NA and 50% integers
  rinteger:  random integers
  rusingle:  50% NA and 50% singles
   rsingle:  random singles
  rudouble:  50% NA and 50% doubles
   rdouble:  doubles
   rfactor:  factor with 64 levels of length 66 (being different at bytes 65 and 66)
     rchar:  64 strings of length 66 (being different at bytes 65 and 66)
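
A sketch in base R of how test data of a few of these types could be generated. The generators actually used for the timings are not shown in this announcement, so everything here (the size n, the prefix, the suffix alphabet) is an assumption.

```r
set.seed(123)
n <- 1000L   # illustrative size only

rboolean <- sample(c(FALSE, TRUE), n, replace = TRUE)
rlogical <- sample(c(NA, FALSE, TRUE), n, replace = TRUE)
rubyte   <- sample(0:255, n, replace = TRUE)
rbyte    <- ifelse(runif(n) < 1/3, NA_integer_,
                   sample(-127:127, n, replace = TRUE))
rushort  <- sample(0:65535, n, replace = TRUE)
rdouble  <- runif(n)

# 64 strings of length 66 that differ only at bytes 65 and 66
prefix <- strrep("a", 64)
tails  <- as.vector(outer(letters[1:8], letters[1:8], paste0))  # 64 two-letter suffixes
lev    <- paste0(prefix, tails)

rchar   <- sample(lev, n, replace = TRUE)
rfactor <- factor(rchar, levels = lev)
```

Since the strings share their first 64 bytes, every string comparison has to scan almost the whole string, which is presumably why this type is so expensive to sort.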


Legend for abbreviations
  OOM:  out of memory
  OOD:  out of disk
   NT:  not timed because too slow
   NA:  not available
   


Results for sorting a single column
=====================================

, , 3e6

    rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar
ram     0.02     0.03   0.02  0.04    0.02   0.02      0.17     0.11     0.66    0.36     0.66    0.36    0.03    NA
ff      0.25     0.33   0.22  0.25    0.28   0.26      0.38     0.30     1.02    0.65     0.92    0.67    0.39    NA
R         NA     0.35     NA    NA      NA     NA      0.83     0.54       NA      NA     1.28    0.90   64.83 51.20
SAS       NA       NA     NA    NA      NA     NA      1.61     1.32       NA      NA     1.57    1.29      NA 17.01

, , 9e6

    rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar
ram     0.04     0.07   0.03  0.08    0.03   0.07      0.50     0.31     1.88    0.97     1.87    0.97    0.04    NA
ff      0.72     0.93   0.61  0.73    0.84   0.75      1.08     0.86     2.68    1.62     2.57    1.67    0.78    NA
R         NA     0.90     NA    NA      NA     NA      2.84     1.78       NA      NA     3.51    2.12      NA    NT
SAS       NA       NA     NA    NA      NA     NA      4.99     3.90       NA      NA     4.91    4.48      NA 62.76

, , 27e6

    rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor  rchar
ram     0.10     0.24   0.09  0.23    0.11   0.23      1.58     1.00     6.06    3.15     6.00    3.23    0.16     NA
ff      2.19     2.98   1.92  2.21    2.56   2.31      3.22     2.68     8.49    5.18     8.10    5.35    2.58     NA
R         NA     2.72     NA    NA      NA     NA      9.69     5.80       NA      NA    12.34    6.97      NA     NT
SAS       NA       NA     NA    NA      NA     NA     17.02    12.67       NA      NA    17.05   14.07      NA 176.63

, , 81e6

    rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar
ram     0.27     0.67   0.28  0.67    0.33   0.72      5.58     3.23       NA      NA       NA      NA    0.49    NA
ff      6.56     9.06   5.93  6.88    8.52   7.15     10.70     8.54    51.35   28.98    70.20   44.13    7.91    NA
R        OOM      OOM    OOM   OOM     OOM    OOM       OOM      OOM      OOM     OOM      OOM     OOM     OOM   OOM
SAS       NA       NA     NA    NA      NA     NA     61.45    44.94       NA      NA    63.14   46.56      NA   OOD




Results for calculating the order on a single column
====================================================

, , 3e6

    rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor  rchar
ram     0.05     0.07   0.04  0.07    0.09   0.11      0.92     0.53     1.46    0.81     1.31    0.64    0.06     NA
ff      0.14     0.19   0.77  0.58    0.87   0.67      1.04     0.60     1.66    0.81     1.43    0.85    0.74     NA
R         NA     3.23     NA    NA      NA     NA      4.57     4.07       NA      NA     5.27    4.61    4.59 193.75
SAS       NA       NA     NA    NA      NA     NA      1.86     1.48       NA      NA     1.63    1.39      NA  16.83

, , 9e6

    rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar
ram     0.16     0.21   0.17  0.20    0.30   0.28      3.07     1.61     4.24    2.16     4.22    2.19    0.19    NA
ff      0.48     0.51   2.45  1.84    2.91   2.15      3.38     1.92     4.72    2.48     4.54    2.45    1.91    NA
R         NA    12.31     NA    NA      NA     NA     17.02    15.56       NA      NA    16.96   15.47      NT    NT
SAS       NA       NA     NA    NA      NA     NA      6.71     5.97       NA      NA     6.25    5.41      NA 59.27

, , 27e6

    rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor  rchar
ram     0.51     0.67    0.5  0.69    0.92   0.94      9.89     5.31    15.13    7.69    15.15    7.70    0.58     NA
ff      1.33     1.51    7.6  5.77    9.25   6.79     10.72     6.12    15.98    8.53    15.96    8.92    5.80     NA
R         NA    46.37     NA    NA      NA     NA     65.57    59.17       NA      NA    63.74   58.37      NT     NT
SAS       NA       NA     NA    NA      NA     NA     21.41    18.77       NA      NA    20.22   18.84      NA 182.74

, , 81e6

    rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar
ram     1.49     2.03    1.5  2.06    3.15   2.98     34.33    17.89       NA      NA       NA      NA    1.90    NT
ff      3.98     4.65   22.9 17.42   30.33  21.82     36.68    20.36    77.16   49.55   125.01   59.27   17.39    NT
R        OOM      OOM    OOM   OOM     OOM    OOM       OOM      OOM      OOM     OOM      OOM     OOM     OOM   OOM
SAS       NA       NA     NA    NA      NA     NA     86.24    70.32       NA      NA    84.40   68.66      NA    NA




Results for sorting all columns of a table with m columns of random double data (without NAs)
=============================================================================================


, , 3e6

ncol   1    2    5   10    20
SAS 1.65 1.83 3.71 6.90 14.06
ff  1.97 2.37 3.75 6.21 10.86
R   4.70 5.67 5.65 6.46  8.06

, , 9e6

ncol    1     2     5    10    20
SAS  5.18  6.70 14.02 19.25 41.65
ff   6.38  7.96 12.12 19.58 45.43
R   18.86 19.20 20.58   OOM   OOM

, , 27e6

ncol    1     2     5    10     20
SAS 17.79 19.52 35.03 83.30 142.09
ff  22.68 25.79 46.25 87.55 157.62
R   65.56   OOM   OOM   OOM    OOM

, , 81e6

ncol     1      2      5     10     20
SAS  64.78  83.39 143.59 242.23 408.72
ff  167.52 220.03 324.03 502.42 884.03
R      OOM    OOM    OOM    OOM    OOM

