Infrequently Noted

Alex Russell on browsers, standards, and the process of progress.

Processing

As usual, Glen is ahead of everyone on this, but yesterday I got an overview of the Processing environment from the guys who wrote it, and it's so freaking HAWT. It runs on the JVM, but it fixes most of the annoying issues of "where's that jar file?" and "why do I need a class for the main loop?" by constraining the environment to a specific problem domain. And the results are both literally and figuratively beautiful. It feels to me like the kind of thing that the Chumby guys should have been using instead of Flash.

Absorbing

In the course of life, there are some moments where you are just so damned thankful to be alive that you almost feel guilty for being in your own shoes. Foo Camp was one of those times.

I'm still exhausted from the whole thing, and my brain is full. Entirely full. It's going to take some time to digest all the great stuff I learned, but a couple of things stood out such that I'm terrified of forgetting any of them.

First, Avi Bryant's "pipes for the web" session was a discussion of how we build the small, chainable pieces for the current and next generation of things that we're all hacking on. The idea of feeds as a generic transport type between processing systems (i.e., the Unix pipe) was amazing. Back-to-back with Tom Coates' talk on "Dirty Semantics", it left me with the feeling that we're finally organizing an answer to all the things that have bugged me about the semantic web vision of the future. By acknowledging that the web is dirty, and that it's OK, Tom presented a vision of the kinds of apps that I work on that doesn't have an undercurrent of academic condescension about how you should be doing things. It buckets things into "better, because the market will say so" and "worse, because the market will ignore it", and those are the kinds of quality metrics I can get behind.
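
To make the pipe analogy concrete, here's a sketch of my own of what chaining feed processors like Unix tools might look like. The URLs and the filter-by-tag.xsl stylesheet are made up:

# fetch a feed, filter it, and repost the result, Unix-pipe style
curl -s http://example.com/updates.atom \
  | xsltproc filter-by-tag.xsl - \
  | curl -s -X POST --data-binary @- http://example.org/republish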

I also got to meet Ed Loper, the guy who did Epydoc, and we got into a discussion about how computational linguistics and machine translation people can fix the problem of artificial test sets that push algorithms toward solutions that might not actually be desirable in the real world. What if, instead of some test suite that has a non-human testing the various algorithms, the system were a front-end to Babelfish that would allow researchers to submit a web-services call into a queue of potential translators? The system would shunt off some percentage of the overall traffic to each registered system and collect "good" or "bad" rankings (the UI is tricky here) for various translations. By using the scope of the system to test quality and then perhaps creating a leaderboard so research teams can compete, it would allow translation research teams both to provide results to sponsors that are trustworthy enough to fund ongoing work and, eventually, to provide data to support adoption of the resulting systems either through Open Source or commercialization.
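
From a research team's side, the hand-wavy version of that registration call might look something like this. Every URL and parameter below is hypothetical; this is just to pin down the shape of the idea:

# register a translation endpoint with the (hypothetical) router
curl -X POST http://translation-bakeoff.example/register \
  -d 'team=nlp-lab' \
  -d 'endpoint=http://nlp-lab.example/translate'

# the router would then shunt a percentage of live traffic to each
# registered endpoint and collect good/bad rankings on the results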

Thanks to Foo, I've got a hundred other things rattling around my head right now, and the worst bit of it is that there were so many people I wanted to meet and things I wanted to see but couldn't. I've never experienced that depth and breadth of experience in one place before. Yesterday morning, I woke up at 9:30 after having gone to sleep at about 4:30, and I was kicking myself for not having been up at 7 because I could have been talking to people instead of sleeping.

It was that awesome.

CRM 114 on OS X

A quick note to my future self on getting CRM 114 to build and install on OS X.

First, download the latest tarball to a suitable location (/tmp will do). Explode the tarball and cd into the TRE library directory inside of it, currently tre-0.7.4.
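
Assuming the download landed in /tmp, that looks something like this (the tarball name here is a placeholder; substitute the real one):

cd /tmp
tar -xzf crm114-<version>.src.tar.gz
cd crm114-<version>/tre-0.7.4

Next, run: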

sudo ./configure --enable-static && make && make install

Once TRE is installed, run man agrep and marvel at the wonder that is agrep. Holy crap is that cool.
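
For a quick taste of why: agrep is approximate grep, matching within some number of errors. Something like this should find the match despite the typo (the numeric flag sets the error budget; check the man page, since flag syntax varies between agrep implementations):

echo "necessaray" | agrep -2 necessary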

Next, edit the main CRM 114 Makefile. Comment out the line that reads:

LDFLAGS += -static

On OS X, dynamic library lookup is preferred and I wasn't able to get static linking working anyway. Next, uncomment these lines:

CFLAGS += -I/usr/local/include
LDFLAGS += -L/usr/local/lib

But make sure that this line is still commented out:

#LIBS += -lintl -liconv

Otherwise you'll be on a wild goose chase to find a package that includes a dynamic library for GNU gettext. Luckily, the entropy.ch PHP packages have such a beast, but to avoid more build path mucking than is absolutely necessary, just make sure that -lintl isn't in your GCC calls.
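
An easy way to double-check is to grep a dry run of the build for the flag; assuming GNU make, this should print nothing:

make -n | grep -- '-lintl'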

The last change is to modify the line that reads:

-lm -ltre -o crm114_tre

to omit the "-lm" flag. It should then read simply:

-ltre -o crm114_tre

At this point, it's safe to build with:

sudo make clean && make && make install

Huzzah!

Update: a couple of final snags, aside from the various setup bits and bobs that aren't automated. In order to actually process my spam/ham folders, it was necessary to patch crm114_config.h and rebuild. The substitution was:

//   default size of the data window: 8 megabytes.
// #define DEFAULT_DATA_WINDOW  8388608
#define DEFAULT_DATA_WINDOW 16777216

which doubles the data window for messages from 8MB to 16MB. Also, it was necessary, as per the comment in the file, to split the first line of mailreaver.crm into two lines, like this:

#!/usr/bin/crm
#    -( spam good cache dontstore stats_only outbound undo verbose maxprio minprio delprio)
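
From there, training should just be a matter of piping messages at the script. A hedged example, assuming crm is installed where the shebang points and the script is executable (the flag names come straight from the list above):

./mailreaver.crm --spam < a-spam-message.txt
./mailreaver.crm --good < a-ham-message.txt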
