Semantic Wave Blog
News feeds and commentary by Jamie Pitts


October 14, 2004

Role Playing

Paul Ford has posted the latest entry in his "Hacking Congress" series, correcting some mistakes he made in the last round. He examines why he shouldn't have used a "USSenator" tag to describe a person's role in government. Paul also discusses using Tag URIs to identify individuals.
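For reference, a tag URI (RFC 4151) is "tag:", then an authority (a domain name or email address), a comma, a date on which the minter controlled that authority, a colon, and a locally chosen specific part. Below is a minimal Python sketch of minting one. The function name make_tag_uri, the example.com authority and the "person/..." naming scheme are hypothetical and are not Paul's actual scheme.

```python
import re
from datetime import date
from urllib.parse import quote

# A domain name, or an email address, that the minter controlled on the given date.
_AUTHORITY = re.compile(r"(?:[^@\s]+@)?[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*")


def make_tag_uri(authority: str, minted: date, specific: str) -> str:
    """Mint a tag URI of the form tag:<authority>,<YYYY-MM-DD>:<specific>."""
    if not _AUTHORITY.fullmatch(authority):
        raise ValueError(f"not a domain name or email address: {authority!r}")
    # RFC 4151 also allows YYYY or YYYY-MM; the full date is used here.
    # Percent-encode anything in the specific part that is not URI-safe.
    return f"tag:{authority},{minted.isoformat()}:{quote(specific, safe='/:,')}"


print(make_tag_uri("example.com", date(2004, 10, 14), "person/jane-doe"))
# prints "tag:example.com,2004-10-14:person/jane-doe"
```

Because the date pins the authority to a moment when the minter owned it, two people who later own the same domain cannot mint colliding identifiers.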

I have also recently been dealing with roles in my processing of SEC filings. Downloading the filings needed to extract the people involved in companies during 2004 took over two weeks. I have pulled upwards of 6,000 officer titles from over 100,000 filings. Many titles associated with an officer actually refer to more than one role, combined in a myriad of different ways. I have been able to extract a lot of meaningful data from this raw text.
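Splitting a combined title into separate role phrases is a natural first pass. Here is a rough Python sketch. It assumes, hypothetically, that commas, semicolons, slashes, "&" and the word "and" are the only joiners; split_roles is an illustrative name, not part of any existing tool.

```python
import re
from typing import List

# Hypothetical joiners seen in combined titles such as
# "President, CEO and Director" or "Chairman & Chief Executive Officer".
_SEPARATORS = re.compile(r"\s*(?:,|;|/|&|\band\b)\s*", re.IGNORECASE)


def split_roles(title: str) -> List[str]:
    """Split one free-form officer title into its individual role phrases."""
    parts = _SEPARATORS.split(title)
    # Drop empty pieces left by leading, trailing or doubled separators.
    return [p.strip() for p in parts if p and p.strip()]


print(split_roles("President, CEO and Director"))
# prints ['President', 'CEO', 'Director']
```

The weakness is obvious: "Vice President of Research and Development" comes back as "Vice President of Research" and "Development", so a real pass would need a list of known domain phrases to protect before splitting.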

As I get nearer to actually publishing this data in various formats (including FOAFCorp), I have been looking into creating an ontology for company roles. The basic role types will include Chairman, CEO, CFO, VP, and so on, but they need additional information attached to them.
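One way to start on the core role types is a controlled vocabulary that maps spelling variants onto a small set of canonical roles. The CoreRole enum and the synonym table below are hypothetical starting points sketched in Python, not a published ontology.

```python
from enum import Enum
from typing import Optional


class CoreRole(Enum):
    CHAIRMAN = "Chairman"
    CEO = "CEO"
    CFO = "CFO"
    VP = "VP"


# Hypothetical synonym table; a real one would be grown from the extracted titles.
SYNONYMS = {
    CoreRole.CHAIRMAN: ["chairman", "chairman of the board", "chair"],
    CoreRole.CEO: ["ceo", "chief executive officer", "chief executive"],
    CoreRole.CFO: ["cfo", "chief financial officer"],
    CoreRole.VP: ["vp", "vice president", "vice-president"],
}


def _normalize(text: str) -> str:
    """Lowercase, drop periods ("C.E.O." -> "ceo") and collapse whitespace."""
    return " ".join(text.lower().replace(".", "").split())


# Invert the table once so each lookup is a single dict access.
_LOOKUP = {_normalize(alias): role for role, aliases in SYNONYMS.items() for alias in aliases}


def core_role(phrase: str) -> Optional[CoreRole]:
    """Return the canonical role for a phrase, or None if it is not in the vocabulary."""
    return _LOOKUP.get(_normalize(phrase))
```

Phrases that carry extra information, such as "Former CFO", will not match an exact lookup like this; they have to be decomposed first.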

Looking at the huge amount of free-form text for officer titles, I found that a person's role at a company is very often nuanced by two additional concepts: a qualifier such as "Retired" or "Former", and a domain of responsibility such as "Marketing Division" or "Human Resources". I am now working on identifying and naming instances of each of these two concepts (as well as the core role types) in the raw officer title text.
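A minimal Python sketch of that decomposition follows. It assumes a phrase reads qualifier first, then core role, then a domain introduced by "of", "for", a comma or a dash. The vocabularies, the ParsedRole structure and the parse_role function are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical starter vocabularies. Longer phrases come first because
# regex alternation takes the first alternative that matches.
QUALIFIERS = ["retired", "former", "acting", "interim"]
CORE_ROLES = ["chairman of the board", "chairman", "chief executive officer",
              "chief financial officer", "vice president", "president",
              "ceo", "cfo", "vp", "director"]

_QUAL_RE = re.compile(r"(%s)\s+" % "|".join(map(re.escape, QUALIFIERS)), re.IGNORECASE)
_ROLE_RE = re.compile(r"(%s)\b" % "|".join(map(re.escape, CORE_ROLES)), re.IGNORECASE)
# Text introduced by "of", "for", a comma or a dash is read as the domain.
_DOMAIN_RE = re.compile(r"\s*(?:\bof\b|\bfor\b|,|-)\s*(.+)", re.IGNORECASE)


@dataclass
class ParsedRole:
    qualifier: Optional[str]
    role: Optional[str]
    domain: Optional[str]


def parse_role(phrase: str) -> ParsedRole:
    text = " ".join(phrase.split())  # collapse runs of whitespace
    qualifier = role = domain = None
    m = _QUAL_RE.match(text)
    if m:
        qualifier, text = m.group(1), text[m.end():]
    m = _ROLE_RE.match(text)
    if m:
        role, text = m.group(1), text[m.end():]
        m = _DOMAIN_RE.match(text)
        if m:
            domain = m.group(1).strip()
    return ParsedRole(qualifier, role, domain)
```

For example, parse_role("Former Vice President of Marketing Division") yields qualifier "Former", role "Vice President" and domain "Marketing Division". Titles that put the domain first, such as "Marketing Vice President", would need additional patterns.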

Comments


Sounds interesting, looking forward to the FOAFCorp data.

Did you see my "translation" of They Rule data into RDF (http://www.wasab.dk/morten/blog/archives/2004/07/06/they-rule-as-rdf)?

I'm assuming there will be a lot of overlap in the people mentioned in these two data sets, how could we go about making sure we are able to realise that?

Posted by: Morten Frederiksen at October 15, 2004 3:34 AM


I have looked at it, as well as the various FOAFCorp links and the They Rule SQL inserts. Your RDF and script should definitely help me understand FOAFCorp when it comes to publishing what I have, as this is my first big RDF project.

Merging our data sets should be an interesting exercise in aggregation. Those of us who work with data about companies and corporate finance should band together and build a free aggregator! We could agree on how to match records, and could collectively feed off of and feed into the central store.

I will be using SEC CIKs (Central Index Keys) as identifiers throughout, which should help a lot should other data sets emerge from the SEC filings. The names of people are a fairly good identifier. I have names broken down into parts, which I will be publishing.
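A rough Python sketch of a person-matching key: use the CIK when a record has one, otherwise fall back to normalized name parts. The Person fields and the folding rules (case, accents, periods, middle initial only) are assumptions for illustration, not the actual published format.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Person:
    first: str
    last: str
    middle: str = ""
    suffix: str = ""
    cik: Optional[str] = None  # SEC Central Index Key, if the filing supplied one


def _fold(s: str) -> str:
    """Lowercase, strip accents and periods, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", s)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().replace(".", "").split())


def match_key(p: Person) -> Tuple[str, ...]:
    """Key under which records for the same person should collide."""
    if p.cik:
        # A CIK may be written zero-padded to ten digits or without padding.
        return ("cik", p.cik.strip().lstrip("0") or "0")
    # Keep only the middle initial, since one record may spell the middle
    # name out while another gives just the initial.
    return ("name", _fold(p.last), _fold(p.first), _fold(p.middle)[:1], _fold(p.suffix))
```

One limitation: a record with a CIK and a record without one get different keys even for the same person, so a second pass would still be needed to link them by name.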

For matching companies, the ticker symbol is obviously universal, although it changes too often for my comfort. I can extract URLs and domain names for corporations. I am also looking at company name synonyms, which should help with identifying mentions in news stories.
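A sketch of the two name-and-URL matching aids in Python: reduce a company name to a comparison key by dropping punctuation and legal-form suffixes, and reduce a corporate URL to a bare host name. The suffix list and the function names company_key and company_domain are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of legal-form suffixes to ignore when comparing names.
_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company",
             "ltd", "limited", "llc", "lp", "plc"}


def company_key(name: str) -> str:
    """Reduce a company name to a comparison key, e.g. 'Acme Widgets, Inc.' -> 'acme widgets'."""
    words = re.sub(r"[^a-z0-9& ]+", " ", name.lower()).split()
    while words and words[-1] in _SUFFIXES:
        words.pop()  # strip trailing legal forms such as "Inc." or "Corp"
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def company_domain(url: str) -> str:
    """Extract a bare host name such as 'example.com' from a corporate URL."""
    if "//" not in url:
        url = "//" + url  # let urlparse treat a bare host as the network location
    host = urlparse(url).hostname or ""  # hostname is returned lowercased
    return host[4:] if host.startswith("www.") else host
```

Stripping suffixes makes "Acme Widgets, Inc." and "The Acme Widgets Corporation" collide, which is the point; the tradeoff is that genuinely distinct firms differing only in legal form would also collide, so these keys are better used to propose matches than to confirm them.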

Posted by: Jamie Pitts at October 15, 2004 2:10 PM

When I talk about the semantic web, I feel a lot like Linus. No, not Linus Torvalds. I meant the other one. - JP



Projects:
  Winnow My Bloglines Down
  Memecat
  Listgasm


Currently Reading

The Art of Unix Programming
Eric Raymond

Semantic People
Danny Ayers
Dave Beckett
Tim Berners-Lee
Tim Bray
Dan Brickley
Marc Canter
Paul Ford
Seth Ladd
Seb Paquet
Clay Shirky
Roland Tanglao
Dave Winer

This weblog is licensed under a Creative Commons License.
