Blog posts for September 2011

Scrum in 90 seconds

Scrum in 90 Seconds, an introduction

Scrum is a lightly regimented software engineering methodology that maximizes developers' effort toward feature development without dwelling on how the work gets accomplished.  It provides just enough structure for an agile process while leaving the developers to self-organize and leverage their talent and individual motivation to achieve a common goal.

Scrum runs in short, fixed-length cycles called Sprints, typically lasting from one week to a month.  Software is delivered in partially-functional increments, each building upon the previous one.  Each Sprint has a set of feature goals defined by the Product Owner.

Team Members are Developers who self-organize into teams and work on the features for the current sprint.  Features are defined in terms of User Stories (similar to Use Cases), Interface Requirements or Functional Requirements.

Quality Assurance and other interested parties may inspect the work product at each sprint milestone and provide feedback and new feature direction to the Product Owner or Scrum Master.

The Scrum Master is like a project manager: he or she handles staffing issues, insulates the development team from outside politics and other distractions, and collaborates with the Product Owner to define the set of goals for the next stage/sprint.  The Scrum Master facilitates, collaborates and tracks the team's progress toward its goals.

The Product Owner is responsible for defining the product features and determines whether the goals for each sprint have been met.  The Product Owner works closely with the Scrum Master to define the goals (in terms of features) for the next sprint and collaborates with the Business Analysts and others to incorporate feedback and new feature direction.

Scrum leaves the developer Team Members to leverage their talent and individual self-motivation to find their way.  Of course there is always room for the more experienced to mentor other team members and help determine the route as the project moves forward.

Scrum enables innovation since it allows for adjustments and decisions on the fly.  The team is free to explore scenic routes and detours that can add significant value to the project outcome.

Scrum Ceremonies

To make quality software engineering a repeatable process, we conduct recurring meetings to form good work habits, identify project risk and communicate progress.

Daily Meetings should last only fifteen minutes.  Team members report their status in terms of 1) recent accomplishments  2) next steps  and  3) challenges encountered or foreseen.  No project planning or problem solving is allowed at the Daily Meeting.

Sprint Planning meetings codify innovative ideas for potential future implementation.  The Product Owner, in conjunction with other stakeholders or Business Analysts, defines the broad goals and business requirements for the project in outline form.  The outline can then be refined and fleshed out to define the various features in terms of User Stories (Use Cases).

The set of defined features constitutes a Project Backlog that can be recorded on a Wiki or Service Request ticketing system, or in a Spreadsheet.  The Product Owner and Scrum Master, along with the rest of the Team, can prioritize and sequence the features by determining where in the sequence of cycles/stages/sprints they belong.

Sprint Review meetings serve to implement a feedback loop in the Scrum process.  All agile methodologies generally highlight the importance of continuous, incremental improvement.  This is how value is consistently delivered.

Scrum Reports

At any time, the Project Backlog report can be created from a query of the system/spreadsheet used to store the full history of features or User Stories (both incomplete and completed) for the overall project.

For any given Sprint, a Sprint Backlog report would show the User Stories related to the given work cycle.  

A Change Report may be represented as the difference between the Project and Sprint backlogs.
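
If the backlog lives in a database-backed ticketing system, these reports are straightforward queries.  Here is a sketch, assuming a hypothetical stories table with story_id, title, sprint and status columns:

-- Project Backlog: every User Story, complete and incomplete.
SELECT story_id, title, sprint, status
FROM stories
ORDER BY sprint, story_id;

-- Sprint Backlog: only the stories planned for a given work cycle.
SELECT story_id, title, status
FROM stories
WHERE sprint = 3;

-- Change Report: stories in the Project Backlog not (yet) assigned to the current sprint.
SELECT story_id, title
FROM stories
WHERE sprint IS NULL OR sprint <> 3;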

A Burndown Report has time on the x-axis and feature count on the y-axis.  It is a type of trending report and ideally illustrates a decreasing number of outstanding User Stories as time marches on.
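
If a daily snapshot of story status is also kept (say, a hypothetical story_snapshots table with snapshot_date, story_id and status columns), the burndown data is a simple aggregate; plot snapshot_date on the x-axis and open_stories on the y-axis:

-- Count of User Stories still open on each day.
SELECT snapshot_date, COUNT(*) AS open_stories
FROM story_snapshots
WHERE status <> 'done'
GROUP BY snapshot_date
ORDER BY snapshot_date;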

ORA-28001: the password has expired

Did you forget to turn off password expiration in your multi-tier Oracle-based web application?

One of the changes in Oracle 11g is that, by default, password expiration is now turned on (the default profile's PASSWORD_LIFE_TIME is 180 days).  Passwords must be changed periodically, or the account your application uses to connect to Oracle will eventually expire and your web application will fail, typically at startup or seemingly at random one day.

In an application that uses Hibernate for O/R persistence, such as XWiki, you'll see the XWikiHibernateMigrationManager class fail to initialize and throw an Exception.  The Hibernate configuration file gets blamed because the connection pool can't be created, even though the configuration is fine and the real culprit is the expired password.

Cannot load class com.xpn.xwiki.store.migration.hibernate.XWikiHibernateMigrationManager from param xwiki.store.migration.manager.class
Wrapped Exception: Error number 0 in 3: Exception while hibernate execute
Wrapped Exception: Could not create a DBCP pool. There is an error in the hibernate configuration file, please review it.

An example from my XWiki (I love playing with Groovy) installation:

com.xpn.xwiki.XWikiException: Error number 3001 in 3: Cannot load class com.xpn.xwiki.store.migration.hibernate.XWikiHibernateMigrationManager from param xwiki.store.migration.manager.class Wrapped Exception: Error number 0 in 3: Exception while hibernate execute Wrapped Exception: Could not create a DBCP pool. There is an error in the hibernate configuration file, please review it. at com.xpn.xwiki.XWiki.createClassFromConfig(XWiki.java:1156) at com.xpn.xwiki.XWiki.initXWiki(XWiki.java:828) at com.xpn.xwiki.XWiki.<init>(XWiki.java:771) at com.xpn.xwiki.XWiki.getMainXWiki(XWiki.java:398) at com.xpn.xwiki.XWiki.getXWiki(XWiki.java:486) at com.xpn.xwiki.web.XWikiAction.execute(XWikiAction.java:137) at com.xpn.xwiki.web.XWikiAction.execute(XWikiAction.java:117) at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:431) at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:236) at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1196) at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:414) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166) at com.xpn.xwiki.web.ActionFilter.doFilter(ActionFilter.java:129) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at com.xpn.xwiki.wysiwyg.server.filter.ConversionFilter.doFilter(ConversionFilter.java:152) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at com.xpn.xwiki.plugin.webdav.XWikiDavFilter.doFilter(XWikiDavFilter.java:68) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.xwiki.container.servlet.filters.internal.SavedRequestRestorerFilter.doFilter(SavedRequestRestorerFilter.java:218) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.xwiki.container.servlet.filters.internal.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:112) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:536) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:915) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:405) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Wrapped Exception: 
com.xpn.xwiki.XWikiException: Error number 0 in 3: Exception while hibernate execute Wrapped Exception: Could not create a DBCP pool. There is an error in the hibernate configuration file, please review it. at com.xpn.xwiki.store.XWikiHibernateBaseStore.execute(XWikiHibernateBaseStore.java:1069) at com.xpn.xwiki.store.XWikiHibernateBaseStore.executeRead(XWikiHibernateBaseStore.java:1099) at com.xpn.xwiki.store.migration.hibernate.XWikiHibernateMigrationManager.getDBVersion(XWikiHibernateMigrationManager.java:68) at com.xpn.xwiki.store.migration.AbstractXWikiMigrationManager.<init>(AbstractXWikiMigrationManager.java:68) at com.xpn.xwiki.store.migration.hibernate.XWikiHibernateMigrationManager.<init>(XWikiHibernateMigrationManager.java:51) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.xpn.xwiki.XWiki.createClassFromConfig(XWiki.java:1148) at com.xpn.xwiki.XWiki.initXWiki(XWiki.java:828) at com.xpn.xwiki.XWiki.<init>(XWiki.java:771) at com.xpn.xwiki.XWiki.getMainXWiki(XWiki.java:398) at com.xpn.xwiki.XWiki.getXWiki(XWiki.java:486) at com.xpn.xwiki.web.XWikiAction.execute(XWikiAction.java:137) at com.xpn.xwiki.web.XWikiAction.execute(XWikiAction.java:117) at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:431) at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:236) at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1196) at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:414) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166) at com.xpn.xwiki.web.ActionFilter.doFilter(ActionFilter.java:129) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at com.xpn.xwiki.wysiwyg.server.filter.ConversionFilter.doFilter(ConversionFilter.java:152) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at com.xpn.xwiki.plugin.webdav.XWikiDavFilter.doFilter(XWikiDavFilter.java:68) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.xwiki.container.servlet.filters.internal.SavedRequestRestorerFilter.doFilter(SavedRequestRestorerFilter.java:218) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.xwiki.container.servlet.filters.internal.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:112) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:536) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:915) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:405) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Wrapped Exception: org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (ORA-28001: the password has expired ) at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1549) at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1388) at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044) at com.xpn.xwiki.store.DBCPConnectionProvider.configure(DBCPConnectionProvider.java:193) at org.hibernate.connection.ConnectionProviderFactory.newConnectionProvider(ConnectionProviderFactory.java:124) at org.hibernate.connection.ConnectionProviderFactory.newConnectionProvider(ConnectionProviderFactory.java:56) at org.hibernate.cfg.SettingsFactory.createConnectionProvider(SettingsFactory.java:414) at org.hibernate.cfg.SettingsFactory.buildSettings(SettingsFactory.java:62) at org.hibernate.cfg.Configuration.buildSettings(Configuration.java:2073) at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1298) at com.xpn.xwiki.store.XWikiHibernateBaseStore.initHibernate(XWikiHibernateBaseStore.java:166) at com.xpn.xwiki.store.XWikiHibernateBaseStore.checkHibernate(XWikiHibernateBaseStore.java:561) at com.xpn.xwiki.store.XWikiHibernateBaseStore.execute(XWikiHibernateBaseStore.java:1055) at com.xpn.xwiki.store.XWikiHibernateBaseStore.executeRead(XWikiHibernateBaseStore.java:1099) at com.xpn.xwiki.store.migration.hibernate.XWikiHibernateMigrationManager.getDBVersion(XWikiHibernateMigrationManager.java:68) at com.xpn.xwiki.store.migration.AbstractXWikiMigrationManager.<init>(AbstractXWikiMigrationManager.java:68) at com.xpn.xwiki.store.migration.hibernate.XWikiHibernateMigrationManager.<init>(XWikiHibernateMigrationManager.java:51) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.xpn.xwiki.XWiki.createClassFromConfig(XWiki.java:1148) at com.xpn.xwiki.XWiki.initXWiki(XWiki.java:828) at com.xpn.xwiki.XWiki.<init>(XWiki.java:771) at com.xpn.xwiki.XWiki.getMainXWiki(XWiki.java:398) at com.xpn.xwiki.XWiki.getXWiki(XWiki.java:486) at com.xpn.xwiki.web.XWikiAction.execute(XWikiAction.java:137) at com.xpn.xwiki.web.XWikiAction.execute(XWikiAction.java:117) at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:431) at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:236) at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1196) at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:414) 
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166) at com.xpn.xwiki.web.ActionFilter.doFilter(ActionFilter.java:129) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at com.xpn.xwiki.wysiwyg.server.filter.ConversionFilter.doFilter(ConversionFilter.java:152) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at com.xpn.xwiki.plugin.webdav.XWikiDavFilter.doFilter(XWikiDavFilter.java:68) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.xwiki.container.servlet.filters.internal.SavedRequestRestorerFilter.doFilter(SavedRequestRestorerFilter.java:218) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.xwiki.container.servlet.filters.internal.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:112) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:536) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:915) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:405) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.sql.SQLException: ORA-28001: the password has expired at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70) at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112) at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:406) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399) at oracle.jdbc.driver.T4CTTIoauthenticate.receiveOauth(T4CTTIoauthenticate.java:794) at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:391) at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:490) at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:202) at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:33) at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:465) at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38) at 
org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582) at org.apache.commons.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1556) at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1545) ... 61 more

If you scrolled down this far I'll reward you with the solution.  Simply change the default profile to allow an unlimited password lifetime.  As the SYS user, from SQL*Plus:

ALTER PROFILE DEFAULT LIMIT
  PASSWORD_LIFE_TIME UNLIMITED;
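
While you are connected as SYS it is also worth checking whether any application accounts have already expired, because changing the profile does not un-expire them; those passwords still need to be reset once.  A sketch (the user name and password below are placeholders):

-- Which accounts have already expired (or are in their grace period)?
SELECT username, account_status, expiry_date
FROM dba_users
WHERE account_status LIKE 'EXPIRED%';

-- Reset an already-expired application account (placeholder names).
ALTER USER webapp_user IDENTIFIED BY new_password ACCOUNT UNLOCK;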

Cassandra in 90 seconds

Big Data made easy with Cassandra

A friend of mine at Datastax introduced me to their new Brisk 1.0 toolkit for rapid (think 90 seconds rapid) deployment of a working Cassandra + Hadoop cluster for Big Data applications.

The release is called Brisk 1.0, and it lets a neophyte like me run CQL statements just a few minutes after downloading it.  CQL is Cassandra's SQL-like grammar for querying and moving data.

I am going to assume the reader has little familiarity with the concept of Hadoop data node clusters and MapReduce programming architectures but is interested in how a resilient, low-cost Data Warehouse or Operational Data Store (ODS) can be implemented using Cassandra.  I have embedded some links I found helpful.  I hope you enjoy your first Hadoop+Cassandra application experience!

Business Intelligence using Hadoop

The process of analyzing Big Data is typically referred to as Business Intelligence (BI), Data Warehousing, Online Analytical Processing (OLAP), Decision Support Systems (DSS) or Extract, Transform and Load (ETL).  They all rely on SQL queries at their core.
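
A typical example is an aggregate query over a large fact table; the table and column names below are invented for illustration:

-- A typical OLAP-style rollup over a hypothetical sales fact table.
SELECT region, product_line, SUM(sale_amount) AS total_sales
FROM sales_facts
WHERE sale_date BETWEEN '2011-01-01' AND '2011-06-30'
GROUP BY region, product_line
ORDER BY total_sales DESC;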

Hadoop, Hive, HBase and Cassandra architectures distribute the data alongside the processing power in a grid/cloud of Hadoop data nodes.  They leverage a non-relational approach to database schema definition.  For queries and updates, Cassandra implements a SQL-like grammar called CQL.  Hive provides a similar SQL-like grammar called HiveQL, which can also be used to query data stored in HBase.

NoSQL is non-relational and may even be schema-less

If you have been thinking about using a NoSQL (non-relational) database for your next multi-tier web application, this brisk tutorial may be a good way for you to evaluate the distributed, fault-tolerant features of Cassandra.

There are many aspects of these non-ACID database architectures that I won't pretend to understand (yet).  Consistency, Availability, Partition tolerance (the CAP theorem) and Replication are among them.  Cassandra features a neat Tunable Consistency approach that lets you choose, per operation, how to trade consistency against availability in a multi-master distributed database.

http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

One thing I have learned is that data models have to be designed up-front and stable before deploying new applications to production.  This is because it is difficult to make changes once the data has been deployed (laid out) across your cluster nodes' storage.  Data access patterns should also be settled during the design phase, since it may be difficult to accommodate different access patterns down the road.

Cassandra Concepts

Cassandra relies on a non-relational model for storing information.  For each uniquely identifiable record, the Cassandra schema provides a multidimensional Map (think Java HashMap) of Keys and Values.  It has roots in Google BigTable and Amazon Dynamo but distinguishes itself from both ancestors.

Cassandra doesn't have tables like an RDBMS, but one can map a traditional rows-and-columns table schema onto Cassandra.  For a given row, think of a record identified by its primary key, perhaps ordered by that primary key.  Each record has a Map where the Map Keys are like column names and the Map Values may be Objects of any pre-determined type.  Map Values are implemented as byte arrays and can thus store any serializable Object.
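
As a rough sketch of that analogy, here is how a simple column family might be declared and queried in the early CQL dialect bundled with Cassandra 0.8; the names are made up and the exact syntax has changed in later CQL versions:

-- Rough early-CQL sketch (Cassandra 0.8 era); hypothetical names.
-- A column family plays the role of a table; KEY identifies the row.
CREATE COLUMNFAMILY users (KEY varchar PRIMARY KEY, full_name varchar, email varchar);

-- Each row key holds a Map of column name -> value.
INSERT INTO users (KEY, full_name, email) VALUES ('jsmith', 'John Smith', 'jsmith@example.com');

-- Reads locate the row by its key, then pick column names out of its Map.
SELECT full_name, email FROM users WHERE KEY = 'jsmith';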

This models a one-table schema, with column access using the Cassandra Map's Keys, but what about the typical multiple-table join use case?  In the RDBMS world this is characterized by a schema with two or more tables related by foreign keys.  The next concept is that Cassandra schemas are based on multi-dimensional, or nested, Maps.

Take the one table schema we defined above, but let any Value be a nested Map with its own Keys and Values, like another table related by a foreign key.  Cassandra actually defines a tuple with the parent primary key value, child Map Key and child Map Value to keep track of this relationship.

Any given Map Value may point to another Map in a nested fashion.  The nesting can effectively be four or five levels deep, so the nested Map data structure should be more than sufficient for modeling most real-world schemas in a fully denormalized fashion.  Cassandra defines these relationships using the SuperColumn and Column grammar elements.

Physically, Cassandra will store that related/nested table data in close proximity to the parent/driving table.  I would posit this is analogous to a physical relational schema realized in a denormalized manner.  I'm learning this as I go.

http://schabby.de/cassandra-getting-started/
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
http://www.datastax.com/faq
http://wiki.apache.org/cassandra/HadoopSupport

Hadoop Concepts

MapReduce is the idea of splitting a large data set across many worker nodes, having each worker run a Map function that transforms its local records into intermediate key/value pairs, and then handing those intermediate results to Reduce workers that aggregate or summarize all of the values for each key.  Jobs can be chained, so the summarized output of one Map/Reduce pass becomes the input of the next.  Hadoop is a framework and concrete implementation that schedules these workers and provides them with local storage in the Hadoop Distributed File System (HDFS).

http://en.wikipedia.org/wiki/Hadoop
http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/InputSplit.html
http://hadoop.apache.org/common/docs/current/hdfs_design.html
http://www.michael-noll.com/tutorials/

CQL and HiveQL

HiveQL is the SQL-like grammar associated with Hive.  I think you could characterize it as a way to query table-oriented data stored in the Hadoop Distributed File System (HDFS).  Hive can also read and query data stored in HBase.
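
For example, a plain-looking aggregate like the one below (it uses the 10dayreturns table created by the demo script later in this post) is compiled by Hive into MapReduce jobs behind the scenes:

-- Hive turns this GROUP BY into a MapReduce job: the map phase emits
-- (ticker, return) pairs and the reduce phase averages them per ticker.
SELECT ticker, AVG(return) AS avg_return
FROM 10dayreturns
GROUP BY ticker;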

For data access and update, Cassandra introduces CQL as its SQL-like dialect.  I have seen CQL summed up as:
Thrift + Avro = CQL
Cassandra and CQL improve on the plain Hive-on-HDFS approach since they define and declare Tunable Consistency properties for Cassandra-controlled data.  DataStax replaces HDFS with a filesystem layer called CassandraFS.  CassandraFS addresses a key challenge of data consistency compared to the HDFS approach, yet it retains fundamental compliance with the HDFS API for maximum compatibility.

Cassandra also features a true peer-to-peer architecture for resolving the file system namespace, with no single point of failure.  In stock Hadoop, by contrast, the HDFS NameNode provides the file path (/directory/folder/filename) lookup mechanism, analogous to the UNIX filesystem's directory entry and inode pattern, and it is a single point of failure.

http://en.wikipedia.org/wiki/Apache_Hive
http://en.wikipedia.org/wiki/HBase
http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/

Brisk 1.0 beta2

The following comes directly from the Brisk release notes.
Brisk is an open-source Hadoop and Hive distribution developed by DataStax that
utilizes Apache Cassandra for its core services and storage. Brisk provides
Hadoop MapReduce capabilities using CassandraFS, an HDFS-compatible storage
layer inside Cassandra. By replacing HDFS with CassandraFS, users are able to
leverage their current MapReduce jobs on Cassandra’s peer-to-peer,
fault-tolerant, and scalable architecture. Brisk is also able to support dual
workloads, allowing you to use the same cluster of machines for both real-time
applications and data analytics without having to move the data around between
systems.

Brisk is comprised of the following components. For component-specific information, refer to their respective release notes and documentation.
• Apache Hadoop 0.20.203.0 + (HADOOP-7172, HADOOP-5759, HADOOP-7255)
• Cassandra 0.8.1
• Apache Hive 0.7
• Apache Pig 0.8.3

http://www.datastax.com/docs/0.8/brisk/about_brisk

Cassandra in 90 seconds

First, download the Brisk 1.0 binary package from the following URL and extract it in your Linux environment:

http://www.datastax.com/docs/0.8/brisk/install_brisk_packages

Second, verify that JAVA_HOME points to a Java 1.6 JRE and that java is in your PATH.

Third, reference the Brisk Portfolio Manager demo documentation and run these commands from the Linux bash command line to:

  • start the cluster
  • visit the Hadoop job console
  • load historical stock quote data
  • start Jetty with the sample portfolio viewer web app
  • run the SQL-like HiveQL to conduct the "demo ETL".  This populates new tables with summarized results.

bin/brisk cassandra -t
cd demos/portfolio_manager
./bin/pricer -o INSERT_PRICES
./bin/pricer -o UPDATE_PORTFOLIOS
./bin/pricer -o INSERT_HISTORICAL_PRICES -n 100
cd website
java -jar start.jar &
cd ../../..
./bin/brisk hive -f demos/portfolio_manager/10_day_loss.q

Here is the HiveQL that implements a sample Business Intelligence/ETL use case.  It is found in 10_day_loss.q in the Brisk Portfolio Manager demo.  To run HiveQL interactively, start hive with "brisk hive" from the command line.

--Access the data in cassandra
DROP TABLE IF EXISTS Portfolios;
create external table Portfolios(row_key string, column_name string, value string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key,:column,:value",
  "cassandra.ks.name" = "PortfolioDemo",
  "cassandra.ks.repfactor" = "1",
  "cassandra.ks.strategy" = "org.apache.cassandra.locator.SimpleStrategy",
  "cassandra.cf.name" = "Portfolios" ,
  "cassandra.host" = "127.0.0.1" ,
  "cassandra.port" = "9160",
  "cassandra.partitioner" = "org.apache.cassandra.dht.RandomPartitioner")
TBLPROPERTIES (
  "cassandra.input.split.size" = "64000",
  "cassandra.range.size" = "1000",
  "cassandra.slice.predicate.size" = "1000");

DROP TABLE IF EXISTS StockHist;
create external table StockHist(row_key string, column_name string, value string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ("cassandra.ks.name" = "PortfolioDemo");

--first calculate returns
DROP TABLE IF EXISTS 10dayreturns;
CREATE TABLE 10dayreturns(ticker string, rdate string, return double)
STORED AS SEQUENCEFILE;

INSERT OVERWRITE TABLE 10dayreturns
select a.row_key ticker, b.column_name rdate, (cast(b.value as DOUBLE) - cast(a.value as DOUBLE)) ret
from StockHist a JOIN StockHist b on
(a.row_key = b.row_key AND date_add(a.column_name,10) = b.column_name);


--CALCULATE PORTFOLIO RETURNS
DROP TABLE IF EXISTS portfolio_returns;
CREATE TABLE portfolio_returns(portfolio string, rdate string, preturn double)
STORED AS SEQUENCEFILE;


INSERT OVERWRITE TABLE portfolio_returns
select row_key portfolio, rdate, SUM(b.return)
from Portfolios a JOIN 10dayreturns b ON
    (a.column_name = b.ticker)
group by row_key, rdate;


--Next find worst returns and save them back to cassandra
DROP TABLE IF EXISTS HistLoss;
create external table HistLoss(row_key string, worst_date string, loss string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ("cassandra.ks.name" = "PortfolioDemo");

INSERT OVERWRITE TABLE HistLoss
select a.portfolio, rdate, cast(minp as string)
FROM (
  select portfolio, MIN(preturn) as minp
  FROM portfolio_returns
  group by portfolio
) a JOIN portfolio_returns b ON (a.portfolio = b.portfolio and a.minp = b.preturn);
