October 14, 2004

Role Playing

Paul Ford has posted his latest entry in his "Hacking Congress" series, correcting some mistakes he made in the previous round. He examines why he shouldn't have used a "USSenator" tag to describe the role of a person in government. Paul also talks about using Tag URIs to identify individuals.

I have also recently been dealing with roles in my processing of SEC filings. Downloading the filings I needed to extract the people involved with companies in 2004 took over two weeks. I have pulled upwards of 6,000 officer titles from over 100,000 filings. Many of the titles associated with officers actually refer to more than one role, and they express this in a myriad of different ways. I have been able to extract a lot of meaningful data from this raw text.

As I get nearer to actually publishing this data in various formats (including FOAFCorp), I have been looking into creating an ontology for company roles. The basic role types will include Chairman, CEO, CFO, VP, and so on, but they need to carry additional information. Looking at the huge amount of free-form text in officer titles, I found that a person's role at a company is very often nuanced by two additional concepts: a qualifier such as "Retired" or "Former", and a domain of responsibility such as "Marketing Division" or "Human Resources". I am now working on identifying and naming instances of each of these two concepts (as well as the core role types) in the raw officer title text.
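The decomposition described above (core role types, plus a qualifier and a domain of responsibility) can be illustrated with a small Python sketch. It assumes tiny hand-written vocabularies: the `ROLES`, `QUALIFIERS` and `DOMAINS` lists, the `ParsedTitle` class and the `parse_title` function are hypothetical names chosen for this example, not the code actually run against the SEC filings. Plain keyword matching like this will miss many real-world variants, so treat it as a starting point rather than a finished extractor.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabularies for illustration only; a real ontology would be
# derived from the officer titles actually extracted from the filings.
QUALIFIERS = ["former", "retired", "acting", "interim"]

# (role name, regex) pairs. Order matters: longer phrases come first so that
# "vice president" and "vice chairman" are consumed before the bare
# "president" and "chairman" patterns get a chance to match them.
ROLES = [
    ("CEO", r"chief executive officer|\bceo\b"),
    ("CFO", r"chief financial officer|\bcfo\b"),
    ("VP", r"(?:senior |executive )?vice president|\bs?vp\b|\bevp\b"),
    ("Vice Chairman", r"\bvice chairman\b"),
    ("Chairman", r"\bchairman\b|\bchair\b"),
    ("President", r"\bpresident\b"),
    ("Secretary", r"\bsecretary\b"),
    ("Treasurer", r"\btreasurer\b"),
]

DOMAINS = ["human resources", "marketing", "finance", "operations", "sales"]


@dataclass
class ParsedTitle:
    raw: str
    roles: list[str] = field(default_factory=list)
    qualifiers: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)


def parse_title(raw: str) -> ParsedTitle:
    """Split a free-form officer title into core roles, qualifiers and domains."""
    result = ParsedTitle(raw=raw)
    text = raw.lower()

    # A single title may name several roles ("Chairman and CEO"), so every
    # pattern is tried rather than stopping at the first hit.
    for name, pattern in ROLES:
        if re.search(pattern, text):
            result.roles.append(name)
            # Blank out the matched words so later, shorter patterns
            # cannot match the same text a second time.
            text = re.sub(pattern, " ", text)

    # Qualifiers and domains are looked up in whatever text remains.
    for word in QUALIFIERS:
        if re.search(rf"\b{word}\b", text):
            result.qualifiers.append(word.title())
    for phrase in DOMAINS:
        if re.search(rf"\b{phrase}\b", text):
            result.domains.append(phrase.title())

    return result


if __name__ == "__main__":
    for title in ["Former Senior Vice President, Marketing Division",
                  "Chairman and CEO",
                  "Retired EVP - Human Resources"]:
        print(parse_title(title))
```

For example, "Former Senior Vice President, Marketing Division" yields the role `VP`, the qualifier `Former` and the domain `Marketing`, while "Chairman and CEO" yields two roles, `CEO` and `Chairman`. Blanking out each matched role is what keeps "Vice President" from also being counted as "President".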
Comments

Sounds interesting, looking forward to the FOAFCorp data. Did you see my "translation" of They Rule data into RDF (http://www.wasab.dk/morten/blog/archives/2004/07/06/they-rule-as-rdf)? I'm assuming there will be a lot of overlap in the people mentioned in these two data sets. How could we go about making sure we are able to realise that?
I have looked at it, as well as the various FOAFCorp links and the They Rule SQL inserts. Your RDF and script should definitely help me understand FOAFCorp when it comes time to publish what I have, since this is my first big RDF project. Merging our data sets should be an interesting exercise in aggregation. Those of us who work with data about companies and corporate finance should band together and build a free aggregator! We could agree on how to match records, and could collectively feed off of and feed into the central store. I will be using SEC CIKs as identifiers throughout, which should help a lot if other data sets emerge from the SEC filings. The names of people are a fairly good identifier; I have names broken down into parts, which I will be publishing. For matching companies, the ticker symbol is obviously universal, although it changes too often for my comfort. I can also extract URLs and domain names for corporations, and I am looking at company name synonyms, which should help with identifying mentions in news stories.

Posted by: Jamie Pitts at October 15, 2004 2:10 PM
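The record matching proposed in the reply above (SEC CIKs where both data sets have them, otherwise names broken down into parts) might look something like the following minimal Python sketch. The `Person` record, `name_key` and `same_person` are hypothetical, and the key used here (last name, first name, middle initial, suffix) is an assumption made for illustration. It is deliberately strict, so "Jane A. Smith" will not match a record that has no middle name at all.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    first: str
    last: str
    middle: str = ""
    suffix: str = ""
    cik: Optional[str] = None  # SEC Central Index Key, if the person has one


def _norm(part: str) -> str:
    """Lowercase a name part and drop accents and punctuation."""
    # NFKD splits an accented letter into the base letter plus a combining
    # mark; the mark is not alphanumeric, so the filter below removes it
    # along with apostrophes and periods.
    decomposed = unicodedata.normalize("NFKD", part)
    return "".join(ch for ch in decomposed if ch.isalnum()).lower()


def name_key(p: Person) -> tuple[str, str, str, str]:
    """Hypothetical matching key built from the parts of a person's name."""
    return (_norm(p.last), _norm(p.first), _norm(p.middle)[:1], _norm(p.suffix))


def same_person(a: Person, b: Person) -> bool:
    """Decide whether two records from different data sets are the same person."""
    if a.cik and b.cik:
        # A shared identifier settles the question either way. EDGAR pads
        # CIKs with leading zeros, so compare them without the padding.
        return a.cik.lstrip("0") == b.cik.lstrip("0")
    # Otherwise fall back to the normalized name parts.
    return name_key(a) == name_key(b)
```

Company records could be handled the same way, trying the CIK first and treating the ticker symbol as a weaker fallback, since tickers change.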