Microsoft Research e-Science Workshop
Meeting website: http://www.mscs06.net/main.asp. This was a very solid workshop (probably more than 300 participants from the US and UK).
The talks were filmed, so you should be able to download both the slides and MPEGs of the talks. Below are some notes on things that stood out.
MS e-Science Meeting Notes
Talk by Miron Livny
- Miron describes the Stork/Parrot/whatever system for data management. NeST is the network storage device. Why are they not looking at BitTorrent?
- Marty asks: how much is this a problem now? OSG has the problem right now, as does the UW GLOW grid.
- Also, they are still file driven.
COMPASS, Gerd Heber
- Using wikis and portals. But no grids?
- Lots of groups are using wikis. Must be something to innovate here.
- Mentions problems with visualization in a browser.
- Adobe's XMP lets you embed RDF in PPT or PDF files.
- They have a three-tiered architecture, built with ASP.NET and Atlas (?).
- Using javadx for visualization.
- Using or want to use RDF.
- https://compass.tc.cornell.edu/COMPASSPortal
- Browser-based visualization is an "open problem". Their screen shots all look like X windows.
- AuthN and AuthZ: they use what ASP.NET provides for this.
- Apparently they are not using Globus or related things like TeraGrid at all.
Talk by Marty Humphrey
- Building a web services interface to clusters.
- Trying to make this simple but extensible rather than complicated (i.e., compare to GGF efforts?)
- How is this different from RSL and the later Globus XML job descriptors?
- Why would a vendor use this? Without bells and whistles, it makes them no better than any competitor.
- JSDL
- SharePoint: MS web services platform, web presence and collaboration.
- They use Visual Studio to build a workflow, then publish it to a SharePoint server.
- CCS: Microsoft scheduling software for Windows clusters?
- No Condor scheduler? "Great question."
- How does this compare to GT4 GRAM? Simpler, scoped down. Globus people participate in the effort.
- Also, really what I want to do is run an application, so I need an application-specific WSDL on top of this.
- They also want to do workflow, but why not use a WS-based workflow tool?
- How will they add value to scientific workflow fields? Most e-science tools are "fire and forget", but SharePoint lets you do more "user in the loop" workflows and more collaborative workflow management.
- Comment: JSDL is very similar to something done by IBM.
- How broad has the testing been? How expressive is their approach? Marty: they drew this from 80% solution use cases. Have not evaluated this stringently.
- Miron's comment: 80% rule doesn't really work here in his experience. Has ignored too many earlier approaches. Marty: we have tried to be extensible. Have tried to keep the core very simple but broad.
- Q: in the example (watershed example), what other standards were used (besides HPCProfile)? WS-Security, various other things.
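To make the "simple but extensible" job description idea concrete, here is a minimal sketch of building a JSDL-style document in Python. This was not shown in the talk. The element names and namespace URIs are as I recall them from the JSDL 1.0 specification and should be checked before use; `build_job` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as recalled from the JSDL 1.0 spec (assumption: verify them).
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_job(executable, args):
    """Return a minimal JSDL-style job description as an XML string.

    Hypothetical helper: only the executable and its arguments are described.
    """
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)
    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in args:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    print(build_job("/bin/echo", ["hello"]))
```

An application-specific WSDL, as wished for above, would presumably sit on top of a generic description like this one.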
Talk by Julian Bunn
- HotGrid system: i.e., like "Hotmail". You get a certificate and a little power and work your way up. Use portals with really minimal authentication. They build this with Clarens, have an HEP and an NVO version.
NCBI talk (from Entrez guy)
- Nice talk by one of the architects of the Entrez system.
- They have an xml format (mln.dtd) for journal articles. This allows interlinking between large formal databases. The linking is at the level of concepts/scientific disciplines.
- In Entrez they are doing some interlinking of articles using text mining and statistical analysis. Also want to build a recommender system.
- They also have a way of generating online "books" out of data with some human-written glue.
- "Major new NIH effort" NCBI WGA (need to look up). Combination of human text (say, scanned pdf), data (in many formats), and data dictionary. Links the human text ("protocol") with data. Using XForms. dbGaP.
- Question: can you use UMLS (Unified Medical Language System)? No semantics, just tag matches. "Discovery initiative" is trying to do some semantic matching. User interface problem: semantic discovery can lead to combinatorial explosion.
- Have you enabled new discoveries? Yes, Entrez links have been mined.
- Comment: can construct RDF triples out of the relationships in PubMed, can use this as a basis for constructing ontologies. Response: have external APIs for others to use to construct their own ontologies, avoid the academic arguments for these things.
- Limitation: 1 request / 3 seconds for accessing NCBI systems. Response: NIH wants to remove this limitation. Makes data available via ftp, so you can mirror or export. But it is large, linked, and changes daily. Must avoid denial of service attacks (intentional or not). They realize they could raise the limit overnight, or could do more sophisticated load balancing, but this is difficult to implement practically.
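A client can respect the 1 request / 3 seconds limit with a small throttle. The sketch below is my own illustration, not anything NCBI provides. `MinIntervalLimiter` is a made-up name, and the clock and sleep functions are injectable only so the logic can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that successive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep if the previous request was too recent, then record this one."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Usage would be to create one limiter per process and call `limiter.wait()` immediately before each request to the remote service.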
Alex Szalay Talk (NVO)
- "Rule of 20 Queries": need to use this to develop your database and/or web service interface (Jim Gray).
- Barabási: power laws arise in social systems when people are faced with many choices. Choices arise from other choices, and social networks influence this. Leads to long tails in probabilities. 1/f power law. "This affected the stock price of Time-Warner"?
- Sensors will outnumber computers online in a few years. They need a simple programming interface. This means that our services need to always support URLs (REST) as well as SOAP.
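To illustrate the "URLs as well as SOAP" point, here is a minimal sketch that encodes the same call both ways. The service namespace, the operation name and the parameter names are all hypothetical. Only the SOAP 1.1 envelope namespace is a real identifier.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/sensors"  # hypothetical service namespace


def rest_url(base, operation, params):
    """Encode the call as a plain URL that any simple client can GET."""
    return f"{base}/{operation}?{urlencode(params)}"


def soap_envelope(operation, params):
    """Encode the same call as a SOAP 1.1 envelope (returned as a string)."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(env, encoding="unicode")


if __name__ == "__main__":
    call = {"id": "s1", "n": 2}
    print(rest_url("http://example.org/sensors", "getReading", call))
    print(soap_envelope("getReading", call))
```

The URL form is something a very small device or a shell script can produce, which is presumably the reason for insisting on it.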
Talk by Amit Sheth
- They are doing a lot of semantic work on the Entrez and PubChem databases. Need to look at these slides.
Talk by Mark Wilkinson
- BioMOBY: a community-built ontology for biology.
- Comment made that growing ontologies, rather than top-down approaches like caBIG, is the way to go. The top-down ones are ontologies designed by people who have retired from biology; they are brittle and will break when innovations occur.
Talk by Dan Sullivan
(Kepler workflow talk)
Talk by Furrukh Khan
- Portals with sharepoint.
- Atlas==Microsoft AJAX toolkit.
- MS Workflow Foundation: software libraries for workflow building.
- I think Apple has a similar thing.
- "Ajax bridge technology"
- Can use WPF to do C# client-side graphics (similar to Flash).
- I assume the downside is that this requires Windows on the server side.
- Mentions sim.eye and or-eye projects. See http://ease-team.ece.ohio-state.edu/sites/ease/default.aspx.
- Comment by Tom Oinn: a functional data flow model works best for scientific workflow. Windows WF looks like a control flow model. Any comments? Answer: it is mapping nicely to the process-oriented or state-machine-oriented systems. Tom: scientific flow charts map to data flows, not process flows.
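The data flow versus control flow distinction can be shown with a toy evaluator. This is my own sketch, unrelated to Taverna or Windows WF internals: each step declares only which data it consumes, and the execution order falls out of those dependencies rather than being written down as a sequence. `run_dataflow` and the pipeline steps are made-up names.

```python
def run_dataflow(graph, inputs):
    """Evaluate a data-flow graph.

    graph maps node name -> (function, [names of upstream nodes]).
    inputs maps source names -> values.
    No cycle detection: this is a sketch, not a workflow engine.
    """
    results = dict(inputs)

    def value(name):
        # Compute a node on demand, once, after its upstream data exists.
        if name not in results:
            func, deps = graph[name]
            results[name] = func(*(value(d) for d in deps))
        return results[name]

    for name in graph:
        value(name)
    return results


if __name__ == "__main__":
    # Hypothetical three-step pipeline, deliberately listed out of order.
    graph = {
        "report": (lambda mean: f"mean={mean:.1f}", ["mean"]),
        "mean": (lambda xs: sum(xs) / len(xs), ["clean"]),
        "clean": (lambda xs: [x for x in xs if x is not None], ["raw"]),
    }
    print(run_dataflow(graph, {"raw": [1.0, None, 3.0]})["report"])
```

A control flow version of the same pipeline would instead spell out "do clean, then mean, then report" as explicit steps or states.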
Talk by Bertram Ludäscher
- The SEEK project is a Kepler project.
- They have new projects for sensor grids: comet, etc. Newly funded. I googled around for this but could not find anything.
- Mentions that they want to become interoperable with Taverna.
Talk by Tony Hey
- Gives our CICC project and use of VOTables a plug.
- Mentions the meeting/conference tool that Gregor used. Jim Gray is in charge of this project at some level. Gregor should give feedback on using the tool.
- Mentions PubMed, says it needs to be decentralized.
- Multiple archiving technologies (DSpace, Fedora, etc.) will survive and so will need bridging. No reference, but this is a Mellon-funded project. Relevant to semantic scholar.
- In fact the whole talk would be good to review for the semantic scholar.
- Problem with mashups: they are not services, just client endpoints. WMS is maybe an example of a "service mashup".
- Comment: CI really needs domain scientists to do this, but the domain scientists don't get tenure for things like this.
- Comment: data archiving of everything implies several interesting problems in compression, filtering, organization, and mining.
- "Data mining, not data mine-ing"
- caBIG presented as a counterexample to astronomy and its open data. Millions spent, almost no data. Tony: funding agencies like NIH must (and do) mandate that publicly funded data must eventually be made public.
- Two things Tony promised to Steve Ballmer: will need to build applications for Linux, and will not do anything with GPL. Other open source licenses are OK. Comment: would use SQL Server if it could run on Linux.
- Question: databases are not the whole thing. What about knowledge capture? A: it is a difficult problem, semantic web may help but there are many issues and hard problems to solve.
Talk by Roger Barga
- They are using Microsoft Workflow Foundation libraries.
- XOML (an XML-based markup) is used to describe the workflows.
- They have a paper.
- This is mostly about provenance, they mention the need to reuse workflows.
- Carole Goble: will there be a community API to avoid SQL or RDF queries? A: great value in that.
- Comment: the tool assumes a batch processing model. You also need to support interactivity with the user. Another problem is that you can't jump out of the workflow-enabled tool and then back in.
