Curriculum - Biotechnology Innovation and Computation

M.S. in Biotechnology Innovation and Computation Curriculum



The Core Curriculum (72 units)

Our core curriculum is based on four main phases of innovation development: opportunity identification, opportunity development, business planning, and business incubation. These phases are described below.



Students enter the program by taking courses that correspond to these four phases. Each course is designed to provide them with the skills necessary to analyze a problem set, evaluate possible solutions, and synthesize this learning into a capstone project that is developed in segments throughout the program. Students also take additional courses to supplement their knowledge in specific areas such as bio-computation, machine learning, current industry issues and trends, information retrieval, software methods, and project management.


Phase One: Opportunity Identification

02-651 New Technologies & Future Markets (12 units)

In this first core course, students learn to analyze and synthesize emerging technological trends and how these trends can help shape or disrupt new and existing markets. Students are tasked with identifying an emerging trend and will then be challenged to perform detailed research and analysis of this trend and the drivers behind it. By focusing on how emerging trends can influence and create markets, students will learn how to identify key market opportunity inflection points in biotechnology.


In addition to understanding how technological trends affect broad markets, students are also exposed to the relationship between business processes and information technology (IT). Students will be introduced to business process workflow modeling and how these concepts are applied in large organizations. Through this method, students will learn the key drivers behind information systems and how to identify organizational opportunities and leverage them to create disruptive models. Students will also learn to assess new technology sectors for unsolved problems and commercially viable solutions.


The course is designed for students interested in finding new venture opportunities on the cutting edge of technology and evaluating them for further development.


Phase Two: Opportunity Development

11-695 Competitive Engineering (12 units)

In the second core course, students will be tasked with building a software application prototype for a biotech/pharmaceutical firm. Students will be introduced to a particular firm (through one of the program advisors) and will learn how to conduct requirements analysis and convert the results into feature definitions. Customer requirements are often a moving target: they are influenced by the emergence of competitive alternatives (e.g., internal consultants, off-the-shelf software) and by the team members' interactions with one another. Students will learn to create a product that strikes the best balance between customer priorities and feasibility and that distinguishes itself from competitive alternatives. They will then use this learning to develop their respective prototypes. At the conclusion of the term, teams will compete with each other to determine which team's product is superior. In addition to applying various aspects of software development and computational learning, the course will give students key insights into how biotech/pharmaceutical businesses operate.


In addition to concepts regarding market demand, students will learn how to aggregate and synthesize information related to demand, pricing, and competition. They will then apply this learning to define and prioritize market-driven requirements as they relate to a product. This information will then be used to build a product development plan. Students will use methods that enhance product quality and customer satisfaction: benchmarking, industry and customer analyses, project metrics, and a range of customer relationship management tools.


Phase Three: Enterprise Planning and Scaling Up

02-654 Biotechnology Enterprise Development (12 units)

In this course, students learn how to develop a biotech start-up and create a Minimum Viable Product (MVP), a business model, and a strategy for the product. Students will learn about business modeling, customer development, customer validation, proposal development, product branding, and marketing for their product. The course requires students to spend most of their time validating their start-up concepts and prototypes with potential customers, adapting to critical feedback, and revising their value propositions accordingly. Students learn to balance technical product development with customer requirements, business strategy, and budget constraints. This course provides real-world, hands-on learning about what it is like to start a company. Different approaches to business modeling will be covered. Understanding customer discovery and validation concepts will help students modify their original concepts to meet market demands. Student teams will learn how to revise and improve their prototypes by the end of the term.


This is a fast-paced course in which students are expected to spend most of their time outside the classroom interacting with potential customers to validate, test, verify, and integrate the essential elements of their start-up business proposal. Up to this point, students have been learning technologies and methods for solving problems in the life science industry and building a prototype for their start-up. However, a new venture proposal is not a collection of isolated bits. It should be thoroughly validated against customer input and market needs so that it tells a single story of how the venture will reach its end goals. The final deliverable is the creation and presentation of a well-explicated business proposal, together with a product prototype corresponding to that proposal.


Phase Four: Forming Companies and Growing Founders

11-699 Program Capstone (36 units)

The final term will integrate all of the acquired learning in the program towards the development of a formal business plan and software product beta. The effort involved in the capstone project is quite intense and will consist of approximately three months of full time work for each student. The expected deliverables (features to be developed, business plan, technical documentation, etc.) must be agreed to by the course instructor at the outset of the course.


The capstone can encompass the development of either an industry-sponsored software project or a software product intended for an entrepreneurial startup.


Students are expected to showcase their business and software projects and elicit feedback from academics, industry professionals, investors, and business executives. This phase also acts as an incubation period for companies that will be launched from the program. The capstone is expected to be completed by the end of the spring term.


The Knowledge Curriculum (48 units)

In addition to the core curriculum, students must take additional courses that supplement their knowledge in specific areas such as bio-computation and machine learning.


The following is a list of all required courses:


11-675 Big Data Systems for Biotechnology (12 units)

The Big Data for Biotechnology course focuses on the fundamentals of technologies used in manipulating, storing, and analyzing big data, such as Hadoop, MapReduce, and NoSQL storage solutions. Students will learn the basics and then apply them to a simulated biotechnology project in a public cloud, where they will perform statistical analysis and create a final report. The objective of the course is for students to develop skills in designing and analyzing systems that can accept, process, store, and analyze large volumes of structured and unstructured data in (near) real time.
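
As a rough illustration of the MapReduce model named above, the following single-process Python sketch counts k-mers in a handful of DNA reads. It is not taken from the course materials and does not use Hadoop; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `count_kmers`) and the sample reads are hypothetical, chosen only to show the map, shuffle, and reduce steps.

```python
from collections import defaultdict
from itertools import chain


def map_phase(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k=3):
    """Run map, shuffle, and reduce in sequence (one process, no cluster)."""
    mapped = chain.from_iterable(map_phase(read, k) for read in reads)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample data for illustration only.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads)
```

In a real cluster framework the map and reduce steps would run in parallel on separate machines, and the shuffle would move data between them; here they are ordinary function calls.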


11-693 Software Methods in Biotechnology & Life Science (12 units)

Moore's law describes how processing power continues to become faster, better, and cheaper. It has not only powered the computer industry forward but is also a key driver propelling biotechnology. It is hard to imagine the world of biotechnology without the world of software. Moreover, the future will further underscore software's importance in enabling biotechnology innovations.

This course focuses on the relationship between biotechnology processes and information technology. Students will be introduced to business process workflow modeling and how these concepts are applied in large organizations. Through this method, students will learn the key drivers behind information systems and how to identify organizational opportunities and leverage them to create disruptive models. Students will also learn to assess new technology sectors for unsolved problems and commercially viable solutions.

By taking this course, students will become conversant with the software technologies that can be applied to commercial life science problems in the present and future.


05-834 Applied Machine Learning (12 units)

Machine Learning is concerned with computer programs that enable the behavior of a computer to be learned from examples or experience rather than dictated through rules written by hand. It has practical value in many application areas of computer science such as on-line communities and digital libraries. This class is meant to teach the practical side of machine learning for applications, such as mining newsgroup data or building adaptive user interfaces. The emphasis will be on learning the process of applying machine learning effectively to a variety of problems rather than on understanding the theory behind what makes machine learning work. This course does not assume any prior exposure to machine learning theory or practice. In the first two-thirds of the course, we will cover a wide range of learning algorithms that can be applied to a variety of problems. In particular, we will cover topics such as decision trees, rule-based classification, support vector machines, Bayesian networks, and clustering. In the final third of the class, we will go into more depth on one application area, namely the application of machine learning to problems involving text processing, such as information retrieval or text categorization.

Students with a pre-existing working knowledge of probability, statistics and algorithms will be at an advantage, but the class has been designed so that anyone with a strong numerate background can catch up and fully participate. This 12-unit class is intended for students with some academic background in computing.


Students select one of the following two courses:

02-712 Computational Methods for Biological Modeling and Simulation (12 units)

This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduates and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Course work will include problem sets with significant programming components and independent or group final projects.


02-750 Automation of Biological Research (12 units)

Biology has been revolutionized by automated methods for generating large amounts of data on diverse biological processes. This, in addition to the finding that many more components are involved in each process than had earlier been thought, has led to a transition from a reductionist paradigm of biological research involving detailed study of single molecules or events to a systems biology paradigm involving comprehensive, systematic studies combined with computational data analysis. Integration of data from many types of experiments will be required to construct detailed, predictive models of cell, tissue or organism behaviors, and the complexity of the systems suggests the need for these models to be constructed automatically. This will require iterative cycles of acquisition, analysis, modeling, and experimental design, since it is not feasible to do all possible biological experiments. This course will cover a range of automated biological research methods, especially high-throughput screening and next generation sequencing, and a range of relevant computational methods, especially model structure learning and active learning. It assumes a basic knowledge of machine learning. Class sessions will consist of a combination of lectures and discussions of important research papers.
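
The iterative cycle of acquisition, analysis, modeling, and experimental design described above can be sketched as a toy active-learning loop. The example below is only an illustration under stated assumptions, not the course's method: the `assay` function is a hypothetical stand-in for a real experiment, the hidden threshold of 62 is invented, and the "model" is simply an interval believed to contain the dose at which a response first appears.

```python
def assay(dose, hidden_threshold=62):
    """Hypothetical stand-in for an experiment: True if a response occurs."""
    return dose >= hidden_threshold


def active_learning(candidates, budget, experiment):
    """Pick experiments one at a time, always testing the least certain dose."""
    untested = sorted(candidates)
    # Model: the response threshold lies somewhere between low and high.
    # Assumption: no response at the lowest dose, a response at the highest.
    low, high = untested[0], untested[-1]
    results = {}
    for _ in range(budget):
        # Experimental design: only doses strictly inside the interval
        # are still informative.
        unsure = [d for d in untested if low < d < high]
        if not unsure:
            break
        midpoint = (low + high) / 2
        dose = min(unsure, key=lambda d: abs(d - midpoint))
        untested.remove(dose)
        # Acquisition: run the chosen experiment.
        outcome = experiment(dose)
        results[dose] = outcome
        # Analysis and modeling: shrink the interval using the new result.
        if outcome:
            high = dose
        else:
            low = dose
    return low, high, results


low, high, results = active_learning(range(0, 101, 10), budget=10, experiment=assay)
```

With these made-up inputs, the loop narrows the threshold to between 60 and 70 after three experiments instead of testing all nine interior doses, which is the point of choosing experiments adaptively when not all of them can be run.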


Electives


02-652 Fundamentals of Biotechnology (12 units)

Biotechnology inventions and products are changing paradigms in healthcare, agriculture, and industrial processes. Great opportunities exist for those who have the technologies, skills, and perseverance to bring new technology products to market. These opportunities stem from the disruptive effect of biotechnology on existing markets and its ability to create new markets. This is an introductory course that provides a foundation for students in the Biotechnology Innovation and Computation program, or general science students, who do not have a strong background in biology, cell biology, genetics, and molecular biology. This course emphasizes the principles underlying biological processes and cell structures as well as the analysis of genetics and heredity from a molecular perspective. It also covers an introduction to computational molecular biology, using an applied algorithms approach as well as exploring emerging computational problems driven by the newest genomic research.


02-710 Computational Genomics (12 units)

In this course we will discuss classical approaches and the latest methodological advances in the context of the following biological problems: 1) computational genomics, focusing on gene finding, motif detection, and sequence evolution; 2) medical and population genetics, focusing on polymorphism analysis, linkage analysis, pedigree, and genetic demography; 3) analysis of high-throughput biological data, such as gene expression data, focusing on issues ranging from data acquisition to pattern recognition and classification; 4) molecular and regulatory evolution, focusing on phylogenetic inference and regulatory network evolution; and 5) systems biology, concerning how to combine sequence, expression, and other biological data sources to infer the structure and function of different systems in the cell. From the computational side, this course focuses on modern machine learning methodologies for computational problems in molecular biology and genetics, including probabilistic modeling, inference and learning algorithms, pattern recognition, data integration, time series analysis, and active learning.

Students are expected to have successfully completed 10-701 (Machine Learning), or an equivalent class.


02-712 Computational Methods for Biological Modeling and Simulation (12 units)

This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduates and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Course work will include problem sets with significant programming components and independent or group final projects.


02-713 Algorithms and Data Structures for Scientists (12 units)

Introduction to design and analysis of algorithms and data structures. Emphasis is placed on techniques that are useful for the analysis of scientific data. Topics include dynamic programming, linear programming, network flows, local and heuristic search, and randomization. NP-completeness and approximation algorithms will also be covered. Data structures discussed will include balanced trees, priority queues, trees for geometric data, string data structures, and hashing. Minimal previous algorithmic knowledge is assumed. Classwork will include programming assignments, but strong programming skills are not required.


02-730 Cell and Systems Modeling (12 units)

This course will introduce students to a range of theoretical concepts, software tools, and current applications related to cell and systems modeling. Throughout the course, students will be exposed to different approaches and associated consequences related to analytic, continuum, and stochastic modeling methods, as well as visualization, as they can be applied to different systems of varying chemical kinetic and spatial complexity.

This 12-unit course is designed primarily for entering graduate students with a wide variety of backgrounds, and begins with an overview of elementary chemical kinetics and physiological principles, together with hands-on exposure to the primary software tools. It then progresses to more detailed coverage of fundamental physiology and modeling, including diffusion theory, Brownian dynamics, principles of mass action kinetics, reaction order, enzyme kinetics, cooperativity, allosteric mechanisms and kinetics, electrochemistry, stochastic simulations under assumptions of well-mixed conditions and in spatially complex models, rule-based approaches for complex chemical systems, logical/Boolean modeling, agent-based models, and parameter estimation. Cellular and physiological examples are drawn from signaling networks, cell cycle regulation, oscillating networks, neural and cardiac simulations, and others.


08-741 Very Large Information Systems (12 units)

Students learn the basic technology for very large information systems. The following topics are covered first: database and information retrieval; file organization; indexes; centralized query processing; concurrency control and serializability theory for transactions. Students then consider parallel query processing, distributed query processing, distributed transaction processing, and replication. In the latter part of the course, the basics of data warehousing, data mining, publish-subscribe processing, and personal information management are addressed. At the end of the 12-unit course, students will understand the fundamental algorithms used in information systems.


11-641 Search Engines and Web Mining (12 units)


11-676 Big Data Management in Biotechnology (12 units)


11-683 Outsourcing and Growth Strategies (6 units)

An especially dangerous time for new ventures is right after the initial product launch. At startup, many ventures run lean with a small headcount and minimal operational overhead. After some success, the startup is compelled to expand headcount, increase capital expansion, and scale up operations. In many cases, what was a promising theoretical business model may fail due to inadequate growth management.

Biotechnology companies in particular are increasingly outsourcing key functions to reduce cost and increase efficiency. The capital cost of laboratories and specialized lab technicians is often prohibitive for biotech startups with a clear and narrow focus. Biotech startups are therefore running much leaner but with a distributed organizational structure. Under these circumstances, managing outsourced functions becomes critical and is a focus of this course. This course will introduce students to issues in growth strategy and outsourcing management.


11-691 Software Project Planning & Management (6 units)

There is a familiar picture regarding software development: it is often delivered late, over budget, and lacking important features. Teams often fail to capture the customer's actual way of accomplishing work and then to create a realistic project plan. This is especially important because software development in the life sciences involves creating applications that are relatively new to the industry.

The course will introduce students to the "Balanced Framework," a project management process that assists biotechnology organizations in planning and managing the software projects that support their product development. It covers the identification, structuring, evaluation, and ongoing management of software projects so that they deliver the benefits expected from the organization's investments. It focuses on the business value delivered by the project. It helps an organization answer the basic question "Are the things we are doing providing value to the business?"

In this course, students will learn how to examine and explain customer processes and create requirements that reflect how work is actually done. Students will additionally create a software project plan that incorporates problem framing, customer workflow, planning, project tracking, monitoring, and measurement.


11-741 Information Retrieval (12 units)

This 12-unit course studies the theory, design, and implementation of text-based information systems. The IR core components of the course include statistical characteristics of text, representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling), clustering algorithms, automatic text categorization, and experimental evaluation. The software architecture components include design and implementation of high-capacity text retrieval and text filtering systems. A variety of current research topics are also covered, including cross-lingual retrieval, document summarization, machine learning, topic detection and tracking, and multi-media retrieval.

This course involves written assignments, and programming assignments such as search engine implementation and automatic text classification. Good programming skills are a prerequisite.


11-796 Question Answering (6 units)

The Question Answering Lab course provides a chance for hands-on, in-depth exploration of core algorithmic approaches to question answering (QA). Students will work independently or in small teams to extend or adapt existing QA modules and systems to improve overall performance on known QA datasets (e.g. TREC, CLEF, NTCIR, Jeopardy!), using best practices associated with the Open Advancement of Question Answering initiative. Projects will utilize existing components and systems from LTI (JAVELIN, Ephyra) and other open source projects (UIMA-AS, OAQA) running on a 10-node distributed computing cluster. Each student project will evaluate one or more component algorithms on a given QA dataset and produce a conference-style paper describing the experimental setup and results. Format: The course will require weekly in-class progress meetings with the instructors, in addition to individual self-paced work outside the classroom.

