February 8, 2007
Pipes
Yahoo just blew my mind... with Pipes. This web app applies the Quartz Composer UI gestalt to the problem of combining feeds and filters usefully. The initial data sources include Yahoo Search, flickr photos, RSS URLs, and even Google Base. In a couple of minutes, I was able to pipe articles from the We Make Money Not Art and Machine Project RSS feeds to a replace function, generating a vaguely random "we make flickr" photo feed. I also made a very foolish way to seek alpha through flickr.

November 23, 2004
Company Identity
Over the weekend, my SEC filings harvester completed its second pass at downloading ownership-related filings. My hard-working script took nearly two weeks to complete the pull. This project is turning out to be a good lesson in aggregation and identification.

After examining the data extracted from these filings, I found gaps in the filings for certain companies due to the re-assignment of CIK codes. For example, Berkshire Hathaway Inc (CIK:1067983) has an earlier thread of filings under Berkshire Hathaway Inc /DE/ (CIK:109694). Thankfully, data about changes in company identity can be gathered from termination filings and "formerly known as" information in SEC Edgar search results.

My third pass at downloading will now take into account changes in company identification from my data sources, including changing names, ticker symbols, and SEC CIKs. I am now thinking about how to additionally publish data about those points where separate threads of identity are joined.

October 14, 2004
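Looking back at the Company Identity entry (November 23, 2004): one way to picture the joining of separate identity threads is a small union-find structure keyed by CIK. This is only a sketch under my own assumptions, not the harvester's actual code. The class name `IdentityThreads` and its methods are hypothetical, and the only data used is the Berkshire Hathaway pair of CIKs mentioned in that entry.

```python
class IdentityThreads:
    """Hypothetical helper: groups CIK codes that refer to the same company."""

    def __init__(self):
        # Maps each known CIK to its parent CIK; a root points to itself.
        self._parent = {}

    def _find(self, cik):
        """Return the representative CIK of the thread containing `cik`."""
        self._parent.setdefault(cik, cik)
        root = cik
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every CIK on the walk directly at the root.
        while self._parent[cik] != root:
            self._parent[cik], cik = root, self._parent[cik]
        return root

    def join(self, cik_a, cik_b):
        """Record that two CIKs belong to the same company identity."""
        root_a, root_b = self._find(cik_a), self._find(cik_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_company(self, cik_a, cik_b):
        return self._find(cik_a) == self._find(cik_b)

    def thread(self, cik):
        """Return every CIK known to share an identity with `cik`, sorted."""
        root = self._find(cik)
        return sorted(c for c in list(self._parent) if self._find(c) == root)


threads = IdentityThreads()
# Join the two CIKs under which Berkshire Hathaway filings appear.
threads.join("1067983", "109694")
print(threads.thread("109694"))
```

Each call to `join` corresponds to one of the points where two threads of identity meet, so a log of those calls would itself be the data worth publishing.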
Role Playing
Paul Ford has posted the latest entry in his "Hacking Congress" series, correcting some mistakes made in the last round. He examines why he shouldn't have used a "USSenator" tag to describe the role of a person in government. Paul also talks about using Tag URIs to identify individuals.

I have also recently been dealing with roles in my processing of SEC filings. Downloading the filings needed to extract the people involved in companies for 2004 took over two weeks. I have pulled upwards of 6,000 officer titles from over 100,000 filings. Many titles associated with an officer actually refer to more than one role, and in a myriad of different ways. I have been able to extract a lot of meaningful data from this raw text.

As I get nearer to actually publishing this data in various formats (including FOAFCorp), I have been looking into creating an ontology for company roles. The basic role types will include Chairman, CEO, CFO, VP, and so on, but there is a need to add additional information. Looking at the huge amount of free-form text for officer titles, I found that a person's role at a company is very often nuanced by two additional concepts: a qualifier such as "Retired" or "Former", and a domain of responsibility such as "Marketing Division" or "Human Resources". I am now working on identifying and naming instances of each of these two concepts (as well as the core role types) in the raw officer title text.
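To make the three concepts concrete, here is a rough sketch of how a raw officer title might be split into qualifiers, core role types, and a domain of responsibility. It is an illustration under my own assumptions, not the extraction code itself: the function `parse_title`, the pattern table, and the sample titles are all hypothetical, and the rule that the domain is whatever text follows the last recognized role is a deliberate simplification.

```python
import re

# Qualifiers and core role types named in the entry; the spelled-out
# variants in the patterns are assumptions for illustration.
QUALIFIERS = ("Retired", "Former")
ROLE_PATTERNS = {
    "Chairman": r"\bChairman\b",
    "CEO": r"\b(?:CEO|Chief Executive Officer)\b",
    "CFO": r"\b(?:CFO|Chief Financial Officer)\b",
    "VP": r"\b(?:VP|Vice President)\b",
}


def parse_title(title):
    """Split a free-form officer title into qualifiers, roles, and a domain."""
    qualifiers = [q for q in QUALIFIERS
                  if re.search(rf"\b{q}\b", title, re.IGNORECASE)]

    roles, last_end = [], 0
    for name, pattern in ROLE_PATTERNS.items():
        match = re.search(pattern, title, re.IGNORECASE)
        if match:
            roles.append(name)
            last_end = max(last_end, match.end())

    # Simplifying assumption: the domain is the text after the last role,
    # minus a leading comma, dash, or "of".
    rest = title[last_end:] if roles else ""
    domain = re.sub(r"^[\s,\-]*(?:of\s+)?", "", rest,
                    flags=re.IGNORECASE).strip() or None

    return {"qualifiers": qualifiers, "roles": roles, "domain": domain}


for sample in ("Former VP, Human Resources",
               "Chairman and CEO",
               "Retired Vice President of Marketing Division"):
    print(sample, "->", parse_title(sample))
```

A single title can yield several roles (as in "Chairman and CEO"), which is why `roles` is a list rather than a single value.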
Copyright © Jamie Pitts