Data Scientist TS/SCI with CI Polygraph        
Bethesda, MD. TS/SCI with CI polygraph required. The Data Scientist shall provide data analysis support to counterintelligence (CI) evaluations of US national security issues. The Data Scientist will use technical and analytical expertise to explore and examine data from multiple disparate sources with the goal of discovering patterns and previously hidden insights, which in turn ...
          IBM Ventures Advances Corporate Goals On 3 Pillars        
Platforms - We provide access to IBM assets such as Watson, data science capabilities, cloud resources, security protocols, IoT platforms. IBM is also ...
          Former State Dept, Sotera Official John Hillen Joins Govini Board of Advisers        
“A data science approach is essential to clarifying and simplifying the business of government.” Hillen is also currently a member of the board of ...
          IST student files patent, sees bright future in the ‘golden age of data science’
UNIVERSITY PARK, Pa. – For Penn State undergraduate student Yuya Ong, data sciences isn't just his major — it's a way of thinking about life.
          Galvanize and Amazon Launch Series of Free Alexa Workshops Across the Country        
Galvanize is a 21st Century school for engineers, entrepreneurs and data scientists. On eight campuses across the U.S., the energy, intellect and ...
          Data Science Campus outlines plans for Fall        
The Office for National Statistics (ONS) Data Science Campus has been set up to act as a hub for the whole of government to gain practical advantage ...
          Rare Disease Treatments to Be Discovered by Machine Learning and Simulation Platform        
"I look forward to combining the GNS REFS platform with Alexion's deep expertise in data sciences to accelerate the discovery of innovative medicines ...
          Data Science Campus outlines plans for Fall        
Focused on exploring the potential for deep learning application to government data science challenges, Data Science is currently working on ten ...
          Data Science Engineer @ Semantive        
Data Science backend position at Semantive in Warsaw, Poland. 11000 - 17000 PLN / Month.
          QuanticMind Reports Record Year-over-Year Growth in Q2 2017        
“We're thrilled to partner with even more astute advertisers across the globe to conquer the challenges of digital with advanced data science, machine ...
          The Gamma and Digital News Innovation Fund        

Last year, I wrote a bit about my interest in building programming tools for data journalism. Data journalism might sound like a niche field, but that is not the case if we talk about data-driven storytelling more generally.

In programming, your outcome is typically some application that does stuff. In data science, your outcome is very often a report or a story that aims to influence people's behavior or company decisions. No matter whether you are a journalist writing an article about government spending or an analyst producing internal sales reports, you are telling stories with data.

Being able to tell stories with data (but also verify and assess other people's stories that can be backed by data) is becoming a vital skill in the modern world, which is partly why I find this topic extremely important and interesting. But to do this currently, you need to be a skilled programmer, great designer and good storyteller, all at the same time!

I have not written about this topic much over the last year (mainly because I was busy with Coeffects, fsharpConf, FsLab and fsharpWorks), but this will be changing - I'm very happy to announce that my data-journalism-related project The Gamma has been awarded funding from the DNI Innovation Fund and I'll be working on it over the next year at the Alan Turing Institute in London.


          Better F# data science with FsLab and Ionide        

At NDC Oslo 2016, I did a talk about some of the recent new F# projects that are making data science with F# even nicer than it used to be. The talk covered a wider range of topics, but one of the nice new things I showed was the improved F# Interactive in the Ionide plugin for Atom and the integration with FsLab libraries that it provides.

In particular, with the latest version of Ionide for Atom and the latest version of the FsLab package, you can run code in F# Interactive and you'll see the resulting time series, data frames, matrices, vectors and charts as nicely pretty-printed HTML objects, right in the editor. The following shows some of the features (click on it for a bigger version):

In this post, I'll write about how the new Ionide and FsLab integration works, how you can use it with your own libraries and also about some of the future plans. You can also learn more by getting the FsLab package, or watching the NDC talk.


          F# + ML |> MVP Summit: Talk recordings, slides and source code        

I was fortunate enough to make it to the Microsoft MVP summit this year. I didn't learn anything secret (and even if I did, I wouldn't tell you!) but one thing I did learn is that there is a lot of interest in data science and machine learning both inside Microsoft and in the MVP community. What was less expected and more exciting was that there was also a lot of interest in F#, which is a perfect fit for both of these topics!

When I visited Microsoft back in May to talk about Scalable Machine Learning and Data Science with F# at an internal event, I ended up chatting with the organizer about F#, and we agreed that it would be nice to do more with it - which is how we came to organize the F# + ML |> MVP Summit 2015 mini-conference on the Friday after the summit.


          Education for Real-World Data Science Roles (Part 2): A Translational Approach to Curriculum Development        

This study reports on the findings from Part 2 of a small-scale analysis of requirements for real-world data science positions and examines three further data science roles: data analyst, data engineer and data journalist. The study examines recent job descriptions and maps their requirements to the current curriculum within the graduate MLIS and Information Science and Technology Master's Programs in the School of Information Sciences (iSchool) at the University of Pittsburgh. From this mapping exercise, model ‘course pathways’ and module ‘stepping stones’ have been identified, as well as course topic gaps and opportunities for collaboration with other Schools. Competency in four specific tools or technologies was required by all three roles (Microsoft Excel, R, Python and SQL), as well as collaborative skills (with both teams of colleagues and with clients). The ability to connect the educational curriculum with real-world positions is viewed as further validation of the translational approach being developed as a foundational principle of the current MLIS curriculum review process.

 


          Data Scientist/Quantitative Analyst, Engineering - Google - Mountain View, CA        
4 years of relevant work experience (e.g., as a statistician / data scientist / computational biologist / bioinformatician)...
From Google - Sat, 05 Aug 2017 09:55:57 GMT - View all Mountain View, CA jobs
          Bowman to NYU        
Congratulations to Sam Bowman (BA/MA ’11), who shares with us the wonderful news that he’ll be starting as an assistant professor this fall at NYU in the Department of Linguistics and the Center for Data Science. Way to go, Sam!
          Java: Data Science Made Easy        

eBook Details: Paperback: 734 pages. Publisher: WOW! eBook (July 7, 2017). Language: English. ISBN-10: 1788475658. ISBN-13: 978-1788475655. eBook Description: Java: Data Science Made Easy - data collection, processing, analysis, and more.

The post Java: Data Science Made Easy appeared first on WOW! eBook: Free eBooks Download.


          Marketing Analytics Lab - Grand Opening        

Take a behind the scenes look at EMC's new Silicon Valley Marketing Science Lab, and see how Big Data analytics can help marketers make better connections with customers.

Cast: Dell Multimedia

Tags: EMC Corp, Marketing Analytics, Big Data, Analytics Lab, Michael Foley, Big Data Scientist, Big Data Analytics, Silicon Valley and Marketing Lab


          MOOC Majors: An Alternative Route to the Workforce         
What is the MOOC Business Model? That was one of the burning questions about MOOCs in 2013. In 2014 we may learn that no one business model will prevail, but MOOC Platform firms will develop a number of promising and complementary revenue streams.

MOOC Sequences and Specializations

One of these revenue streams will be the Sequence Certificate, or MOOC Mini-Major. Last year several MOOC platforms introduced course sequences. The most heralded is the Georgia Tech-Udacity master's degree in Computer Science, sponsored by AT&T.

But the MOOC firms also introduced several notable mini-courses-of-study that did not carry university credit or connect to a degree pathway. One notable example is the sequence of foundation MBA courses from University of Pennsylvania's Wharton School, announced by Wharton and Coursera last September. These courses, like others from Coursera, were offered free of charge. In January 2014 Laurie Pickard, a master's degree graduate of my university, Temple, was featured in Fortune magazine for patching together an entire MBA-type program from such MOOCs. 

In January 2014 Coursera announced their Specializations Certificate Programs. These programs package a number of distinct MOOCs - from 3 to 9 - offered by the same institution, plus a final capstone project and exam. Students pay for each course in the sequence, complete the capstone project and exam, and earn a certificate not just for each course but for the entire sequence. The sequences are thus MOOC-based near equivalents of college majors, at least in the sense that the courses are designed by a single institution's faculty to fit together in a sequence and to generate capabilities currently demanded in the economy. 

Two good examples of the new Specializations programs are the four-course sequence in cybersecurity, a hot field with a bright future, offered by the University of Maryland, with a certificate available to those who pass all four courses, complete a capstone project and pay $245; and the sequence of nine MOOCs in data science offered by Johns Hopkins, with certificates available to those who pass all nine, complete a capstone project and pay $490.

MOOC Mini-Majors

It is not difficult to imagine students in the near future offering up two, three or four of these Specialization certificates in lieu of a university "major". But obstacles remain.

EdX recently experimented with matching more than 800 top-performing MOOC learners with top-tier technology companies. The results, as reported by the Chronicle of Higher Education, were not merely disappointing but disastrous. Despite the sponsorship of edX, only three received interviews, and not one was hired. Subsequently edX has withdrawn from the 'employment agency' business.

That step may have been premature. The top-tier firms get thousands of applicants from the best university programs in computer science and information systems for every opening. Why would they be interested in experimenting with MOOC learners when they can take their pick of numerous Stanford, MIT and Purdue grads, who have shown the persistence to earn four-year degrees, rubbed shoulders with top professors, and networked with other top students who will soon enter the workforce and connect up with hundreds of other hot prospects?

Meanwhile, new business start-ups in Silicon Valley, on Massachusetts Route 128, in New York's Silicon Alley and throughout the country hunger for talent. Most organizations will not be able to compete for the top grads of the top-tier university programs. Is it not possible that edX, which is hardly an expert in the employment agency business, simply directed their efforts at the wrong job market?

One is reminded of Prof. Michael Lenox's Coursera MOOC on Business Strategy offered in February and March 2013. Lenox, a Professor at University of Virginia's Darden School of Business, used crowdsourcing techniques to locate business firms and non-profit organizations willing to involve his MOOC students in their strategic planning. As Lenox said at the time:

“The concept can be applied in any number of domains. Imagine a course on graphic design where students prepare solutions for real nonprofits, or a computer programming course where students develop code for small startups with limited budgets. The potential is enormous.”

As it turned out, the potential was enormous -  more than 100 small firms and non-profits participated in Lenox's MOOC, and close to 80% of the active student participants contributed to these organizations' strategic planning.

Now imagine a large-scale effort by one of the big MOOC Platform firms to source similar organizations to provide short-term internships or apprenticeships for top-performing MOOC students who have earned one or more Specialization Certificates. My guess is that, unlike the failed edX effort with Google, Intuit, Yahoo, and other top-tier firms, a crowdsourced effort to connect top MOOC learners and hungry organizations would place many learners with MOOC majors in the workforce.

* * *

In my next post I will consider the new promise of MOOCs in providing near equivalents of course credit that some universities, faced with declining enrollment and loss of tuition dollars, can accept as transfer credit to forge efficient and affordable 'mixed-mode' degree pathways. 

          AI – What Chief Compliance Officers Care About        


Arguably, there are more financial institutions located in the New York metropolitan area than anywhere else on the planet, so it was only fitting for a conference on AI, Technology Innovation & Compliance to be held in NYC – at the storied Princeton Club, no less. A few weeks ago I had the pleasure of speaking at this one-day conference, and found the attendees’ receptivity to artificial intelligence (AI), and creativity in applying it, to be inspiring and energizing. Here’s what I learned.

CCOs Want AI Choices

As you might expect, the Chief Compliance Officers (CCOs) attending the AI conference were extremely interested in applying artificial intelligence to their business, whether in the form of machine learning models, natural language processing or robotic process automation – or all three. These CCOs already had a good understanding of AI in the context of compliance, knowing that:

  • Working through sets of rules alone will not find “unknown unknowns”
  • They should take a risk-based approach in determining where and how to divert resources to AI-based methods in order to find the big breakthroughs.

All understood the importance of data, and that getting the data you need into the AI system is job number one. Otherwise, it's “garbage in, garbage out.” I also discussed how to provide governance around the single source of data, the importance of regular updating, and how to ensure permissible use and quality.
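
To make the first of those bullet points concrete, here is a minimal sketch, entirely my own illustration rather than anything shown at the conference, of why a hand-written rule misses “unknown unknowns” while an unsupervised anomaly detector can still flag them; the features, thresholds and data are invented for the example.

```python
# Illustrative only: a fixed rule catches just the patterns it was written for,
# while an unsupervised model can flag records that are merely unusual.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic transactions; columns: [amount, transactions_in_last_hour]
normal = rng.normal(loc=[50.0, 2.0], scale=[20.0, 1.0], size=(1000, 2))
odd = np.array([[55.0, 40.0]])   # ordinary amount, but an unusual burst of activity
transactions = np.vstack([normal, odd])

def rule_flags(x):
    """A hand-written rule: flag only very large single amounts."""
    return x[:, 0] > 10_000

model = IsolationForest(contamination=0.001, random_state=0).fit(transactions)
model_flags = model.predict(transactions) == -1   # -1 marks outliers

print("flagged by rule: ", int(rule_flags(transactions).sum()))   # 0; the rule never fires
print("flagged by model:", int(model_flags.sum()))                # typically picks up the burst
```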

AI Should Explain Itself

Explainable AI (XAI) is a big topic of interest to me, and among the CCOs at the conference there was an appreciation that AI needs to be explainable, particularly in the context of compliance with GDPR. The audience also recognized that their organizations need to layer in the right governance processes around model development, deployment, and monitoring, which are key steps in the journey toward XAI. I reviewed the current state of the art in Explainable AI methods, and where that road leads: toward AI that is more grey-box.
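
As a generic illustration of one model-agnostic starting point for explainability (not the specific XAI methods covered in the talk), permutation importance asks how much a trained model's held-out performance drops when each input is shuffled; the model and data below are synthetic.

```python
# Minimal permutation-importance sketch on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the drop in accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature_{idx}: importance {result.importances_mean[idx]:.3f}")
```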

Ethics and Safety Matter

In pretty much every AI conversation I have, ethics are the subject of lively discussion. The New York AI conference was no exception. The panel members and I talked about how any given AI system is not inherently ‘ethical’; it learns from the inputs it's given. The modelers who build the AI system need to avoid passing in sensitive data fields, and those same modelers need to examine whether inadvertent biases are derived from the inputs during training of the machine learning model.

Here, I was glad to be able to share some of the organizational learning FICO has accumulated over decades of work in developing analytic models for the FICO® Score, our fraud, anti-money laundering (AML) products and many others.
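
For illustration only, and not a description of FICO's process, a basic check of the kind described above might compare model outcomes across groups on held-out data, even when the sensitive attribute itself is excluded from the model's inputs; every column name and threshold here is hypothetical.

```python
# Hypothetical fairness spot-check: compare approval rates across groups.
import pandas as pd

scored = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "A"],   # sensitive attribute, held out of the model
    "approved": [1,    1,   0,   0,   0,   1,   0,   1],    # model decisions on the same records
})

rates = scored.groupby("group")["approved"].mean()
ratio = rates.min() / rates.max()   # a "disparate impact" style ratio

print(rates)
print(f"approval-rate ratio: {ratio:.2f}")
if ratio < 0.8:   # the common four-fifths rule of thumb
    print("Warning: outcomes differ substantially across groups; examine the inputs for proxies.")
```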

AI safety was another hot topic. I shared that although models will make mistakes and there needs to be a risk-based approach, machines often make better decisions than humans, as autopilots on airplanes do. Humans need to be there to step in when the environment or the character of the data changes to the point that the AI system may no longer make an optimal decision.

In the end, an AI system works with the data on which it was trained and is built to find patterns in that data, but the model itself is not curious; it remains constrained by the algorithm, the way the problem was posed, and the data it trains on.

Open Source Is Risky

Finally, the panel and I talked about AI software and development practices, including the risks of open source software and open source development platforms. I indicated that I am not a fan of open source, as it often leads to scientists using algorithms incorrectly, or relying on someone else’s implementation. Building an AI implementation from scratch, or from an open source development platform, gives data scientists more hands-on control over the quality of the algorithms, assumptions, and ultimately the AI model’s success in use.

I am honored to have been invited to participate in Compliance Week’s AI Innovation in Compliance conference. Catch me at my upcoming speaking events in the next month: The University of Edinburgh Credit Scoring and Credit Control XV Conference on August 30-September 1, and the Naval Air Systems Command Data Challenge Summit.

In between speaking gigs I’m leading FICO’s 100-strong analytics and AI development team, and commenting on Twitter @ScottZoldi. Follow me, thanks!

The post AI – What Chief Compliance Officers Care About appeared first on FICO.


          Three Keys to Advancing your Digital Transformation        


With today’s proliferation of data, digital transformation (DX) has become more than a hot topic: It’s an imperative for businesses of all shapes and sizes. The collision of data, analytics and technology has businesses, analysts and consumers excited — and scared — about what could happen next.

On one hand, everyone from banks to bagel shops and travel sites to tractor manufacturers has found new ways to connect the dots in their businesses while forging stronger, more dynamic customer engagement. Artificial intelligence (AI) has come of age in technologies such as smart sensors, robotic arms, and devices that can turn lights and heat on and off, adjust for changes in conditions and preferences, and even automatically reorder food and supplies for us.

However, today's Chief Analytics Officer (and Chief Data Officer and Chief Digital Officer, for example) faces both the promise and precariousness of digitizing business. While significant opportunities abound to drive revenues and customer connectivity, any leader will freely confess there are myriad technological, business and human obstacles to transforming even one element of business, introducing a new unique product or even meeting regulatory requirements.

The Big Data Dilemma

Big Data is at once the promise of the DX and its biggest roadblock. A recent Harvard Business Review article put it succinctly: “Businesses today are constantly generating enormous amounts of data, but that doesn’t always translate to actionable information.”

When 150 data scientists were asked if they had built a machine learning model, roughly one-third raised their hands. How many had deployed and/or used this model to generate value, and evaluated it? Not a single one.

This doesn’t invalidate the role of Big Data in achieving DX. To the contrary: The key to leveraging Big Data is understanding what its role is in solving your business problems, and then building strategies to make that happen — understanding, of course, that there will be missteps and possibly complete meltdowns along the way.

In fact, Big Data is just one component of DX that you need to think about. Your technology infrastructure and investments (including packaged applications, databases, and analytic and BI tools) need to similarly be rationalized and ultimately monetized, to deliver the true value they can bring to DX.

Odds are many components will either be retired or repurposed, and you’ll likely come to the same conclusion as everyone else that your business users are going to be key players in how DX technology solutions get built and used. That means your technology and analytic tools need to allow you the agility and flexibility to prototype and deploy quickly; evolve at the speed of business; and empower people across functions and lines of business to collaborate more than they’ve ever done before.

Beyond mapping out your overarching data, technology and analytic strategies, there are several areas to consider on your DX journey. Over the next three posts, I’ll focus on how to:

  1. Visualize your digital business, not your competitors’
  2. Unleash the knowledge hidden within your most critical assets
  3. Embrace the role and evolution of analytics within your journey

To whet your appetite, check out this short video on the role of AI in making DX-powered decisions.

 

The post Three Keys to Advancing your Digital Transformation appeared first on FICO.


          Google’s Pixel rules the bot world by thinking like an infant        

Google Assistant is the humbly named artificial intelligence agent that lives in Google’s new Pixel phone, and it represents the most powerful accumulation of applied data science in history. Google feeds its various intelligence processes, from Deep Mind to Knowledge Graph to Expander, with exabytes of data from its billions of active users. The data comes […]


          LinkedIn Joins CSU, Other Key Northeast Ohio Stakeholders to Analyze Vital Health IT Talent Data        
The project is a tactic within the HIT in the CLE Regional Talent Initiative

*Release via the BioEnterprise

LinkedIn, which operates the world’s largest professional network on the Internet with more than 500 million members in over 200 countries and territories, has teamed up with BioEnterprise, the City of Cleveland, Cleveland State University and Cuyahoga County to provide data, analysis, and market research on the talent flows of software developers, data scientists, data analysts and other computer science positions within the Northeast Ohio health IT sector. Supported by the Cleveland Foundation, the analysis will ultimately inform policy, educational curriculum, community programming and other talent alignment strategies within this regional growth sector.

The bioscience cluster is a primary growth engine reviving the Northeast Ohio economy. Within the bioscience cluster, the health IT industry is flourishing, creating hundreds of new jobs each year. However, an acute shortage of qualified local talent is a major barrier to growth.

“One of the critical limiting factors to growth in Northeast Ohio’s bioscience industry today is the availability of health IT talent,” explained Aram Nerpouni, BioEnterprise President and CEO. “Thriving health IT companies are hindered by the dearth of software developers and data scientists. The LinkedIn project should provide meaningful data and analysis to inform how we address this challenge.”

With the support of the Cleveland Foundation, BioEnterprise launched HIT in the CLE in 2015 to address the regional computer science and data science talent gap. The Initiative aims to grow and diversify the Northeast Ohio health IT talent pipeline to support a vibrant health IT industry.

“We felt it was crucial to partner with BioEnterprise to begin addressing the demand-supply gap in health IT and to deeply engage businesses to expand the talent pipeline,” said Shilpa Kedar, Cleveland Foundation Program Director for Economic Development. “LinkedIn’s involvement with HIT in the CLE is a tremendous win for the region and we anticipate that this work will prove to be extraordinarily beneficial.”

The LinkedIn project is an important tactic within the larger HIT in the CLE talent strategy. The effort aspires to provide insights into the education and experience of people currently employed in the regional health IT sector, pathways for securing regional health IT positions, and institutions from which the local sector most successfully attracts qualified talent. The insights discovered through the analysis may surface gaps and barriers in the local health IT talent pipeline and help inform strategy for addressing these important talent issues.

“At LinkedIn, our vision is to create economic opportunity for every worker,” said LinkedIn U.S. Head of Policy Nicole Isaac. “We’re excited to use the Economic Graph – a digital map of the global economy that when complete will include every member of the global workforce and their skills, all open jobs, all employers, and all educational institutions – to provide the City of Cleveland with a more holistic view of the computer science and data science skills local employers need, the skills its workers have and the disconnect between the two. The City can use those insights to create a stronger IT talent pipeline, and grow its IT industry.”

“Making our workforce a competitive advantage, which includes understanding our gaps as well as opportunities is a crucial strategic focus,” said Cuyahoga County Executive Armond Budish. “We know that the strength of our healthcare is a great advantage and we believe that the bioscience cluster will drive a lot of our job growth in the coming years. LinkedIn’s contribution to help inform and accelerate that growth is a welcome addition to the HIT in the CLE effort.”

Data provided by BioEnterprise and LinkedIn will be pulled throughout the summer. Ongoing analysis will take place through the summer and findings are expected in the fall.

“Cleveland is a City with a growing health research and information technology economy from the unseen power of the 100 gig fiber network along the Health-Tech Corridor and the health care research institutions within our community,” said Mayor Frank G. Jackson. “I welcome the opportunity for the City of Cleveland to collaborate with LinkedIn to provide research and data on the talent that is relocating to Cleveland and drawing talent to join the workforce here.”


          IBM releases Watson Machine Learning for a general audience        

Not content with beating humans at quiz shows, IBM is moving forward with its Watson Machine Learning service. Now generally available after a year’s worth of beta testing, WML promises to address the needs of both data scientists and devs.

The post IBM releases Watson Machine Learning for a general audience appeared first on JAXenter.


          Summer 2016 tech reading        

Hi there! Summer is here and almost gone. So here's a gigantic list of my favorite, recent articles, which I should've shared sooner.

Java

Other languages

Reactive programming

Persistent data structures

CRDT

Data

Systems and other computer science-y stuff

Fun/General

Until next time! Ashwin.

          African Governments Lead the Way on Data Revolution        

Millions to benefit from commitments that include: data-led agriculture plans to build food resilience and security, a birth registration drive using SMS and the web to widen access to healthcare and education, and African data science hubs that will anchor the region's future

(PRWeb June 29, 2017)

Read the full story at http://www.prweb.com/releases/2017/07/prweb14473188.htm


          An ad-targeting service, a medical consultant and a CRM for customer retention: the winners of the first Belarusian datathon

The Imaguru startup hub held the first datathon in Belarus. InData Labs was the event's data science partner, velcom the telecom partner, Priorbank the finance partner, Emerline the IT partner, and Microsoft the cloud partner.


          “This is more important than just a machine learning contest”: the first datathon in Belarus kicks off today at the Imaguru startup hub

Imaguru Datathon is a venue for solving concrete business problems using data science and big data techniques. The partner companies - velcom and Priorbank - formulated the challenges and provided their datasets.


          Sponsored Post: Apple, Domino Data Lab, Etleap, Aerospike, Clubhouse, Stream, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp        

Who's Hiring? 

  • Apple is looking for a passionate VoIP engineer with a strong technical background to transform our Voice platform to SIP. It will be an amazing journey with highly skilled, fast-paced, and exciting team members. Lead and implement the engineering of Voice technologies in Apple’s Contact Center environment. The Contact Center Voice team provides the real-time communication platform for customers’ interaction with Apple’s support and retail organizations. You will lead the global Voice, SIP, and network cross-functional engineers to develop a world-class customer experience. More details are available here.

  • Advertise your job here! 

Fun and Informative Events

  • Advertise your event here!

Cool Products and Services

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • What engineering and IT leaders need to know about data science. As data science becomes more mature within an organization, you may be pulled into leading, enabling, and collaborating with data science teams. While there are similarities between data science and software engineering, well intentioned engineering leaders may make assumptions about data science that lead to avoidable conflict and unproductive workflows. Read the full guide to data science for Engineering and IT leaders.

  • Etleap provides a SaaS ETL tool that makes it easy to create and operate a Redshift data warehouse at a small fraction of the typical time and cost. It combines the ability to do deep transformations on large data sets with self-service usability, and no coding is required. Sign up for a 30-day free trial.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Working on a software product? Clubhouse is a project management tool that helps software teams plan, build, and deploy their products with ease. Try it free today or learn why thousands of teams use Clubhouse as a Trello alternative or JIRA alternative.

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5-minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


          Stuff The Internet Says On Scalability For July 28th, 2017

Hey, it's HighScalability time:

 

Jackson Pollock painting? Cortical column? Nope, it's a 2 trillion particle cosmological simulation using 4000+ GPUs. (paper, Joachim Stadel, UZH)

If you like this sort of Stuff then please support me on Patreon.

 

  • 1.8x: faster code on iPad MacBook Pro; 1 billion: WhatsApp daily active users; 100 milliamps: heart stopping current; $25m: surprisingly low take from ransomware; 2,700x: improvement in throughput with TCP BBR; 620: Uber locations; $35.5 billion: Facebook's cash hoard; 2 billion: Facebook monthly active users; #1: Apple is the world's most profitable [legal] company; 500,000x: return on destroying an arms depot with a drone; 

  • Quotable Quotes:
    • Alasdair Allan: Jeff Bezos’ statement that “there’s not that much interesting about CubeSats” may well turn out to be the twenty first century’s “nobody needs more than 640kb.”
    • @hardmaru: Decoding the Enigma with RNNs. They trained a LSTM with 3000 hidden units to decode ciphertext with 96%+ accuracy. 
    • @tj_waldorf: Morningstar achieved 97% cost reduction by moving to AWS. #AWSSummit Chicago
    • Ed Sperling: Moore’s Law is alive and well, but it is no longer the only approach. And depending on the market or slice of a market, it may no longer be the best approach.
    • @asymco: With the end of Shuffle and Nano iPods Apple now sells only Unix-enabled products. Amazing how far that Bell Labs invention has come.
    • @peteskomoroch: 2017: RAM is the new Hadoop
    • Carlo Pescio: What if focusing on the problem domain, while still understanding the machine that will execute your code, could improve maintainability and collaterally speed up execution by a factor of over 100x compared to popular hipster code?
    • @stevesi: Something ppl forget: moving products to cloud, margins go down due to costs to operate scale services—costs move from Customer to vendor.
    • @brianalvey: The most popular software for writing fiction isn't Word. It's Excel.
    • @pczarkowski: How to make a monolithic app cloud native: 1) run it in a docker 2) change the url from .com to .io
    • drinkzima: There is a huge general misunderstanding in the profitability of directing hotel bookings vs flight bookings or other types of travel consumables. Rate parity and high commission rates mean that directing hotel rooms is hugely profitable and Expedia (hotels.com, trivago, expedia) and Priceline (booking.com) operate as a duopoly in most markets. They are both marketing machines that turn brand + paid traffic into highly profitable room nights.
    • Animats: This is a classic problem with AI researchers. Somebody gets a good result, and then they start thinking strong human-level AI is right around the corner. AI went through this with search, planning, the General Problem Solver, perceptrons, the first generation of neural networks, and expert systems. Then came the "AI winter", late 1980s to early 2000s, when almost all the AI startups went bust. We're seeing some of it again in the machine learning / deep neural net era.
    • Charity Majors: So no, ops isn't going anywhere. It just doesn't look like it used to. Soon it might even look like a software engineer.
    • @mthenw: As long as I need to pay for idle it’s not “serverless”. Pricing is different because in Lambda you pay for invocation not for the runtime.
    • Kelly Shortridge: The goal is to make the attacker uncertain of your defensive environment and profile. So you really want to mess with their ability to profile where their target is
    • @CompSciFact: 'About 1,000 instructions is a reasonable upper limit for the complexity of problems now envisioned.' -- John von Neumann, 1946
    • hn_throwaway_99: Few barriers to entry, really?? Sorry, but this sounds a bit like an inexperienced developer saying "Hey, I could build most of Facebook's functionality in 2 weeks." Booking.com is THE largest spender of advertising on Google. They have giant teams that A/B test the living shite out of every pixel on their screens, and huge teams of data scientists squeezing out every last bit of optimization on their site. It's a huge barrier to entry. 
    • callahad: It's real [performance improvements]. We've [Firefox] landed enormous performance improvements this year, including migrating most Firefox users to a full multi-process architecture, as well as integrating parts of the Servo parallel browser engine project into Firefox. There are still many improvements yet-to-land, but in most cases we're on track for Firefox 57 in November.
    • Samer Buna: One important threat that GraphQL makes easier is resource exhaustion attacks (AKA Denial of Service attacks). A GraphQL server can be attacked with overly complex queries that will consume all the resources of the server.
    • wheaties: This is stupid. Really. Here we are in a world where the companies that own the assets (you know, the things that cost a lot of money) are worth less than the things that don't own anything. This doesn't seem "right" or "fair" in the sense that Priceline should be a middleman, unable to exercise any or all pricing power because it does not control the assets producing the revenue. I wonder how long this can last?
    • platz: Apparently deep-learning and algae are the same thing.
    • @CompSciFact: "If you don't run experiments before you start designing a new system, your entire system will be an experiment." -- Mike Williams
    • Scott Aaronson: our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect. 
    • The Internet said many more interesting things this week. To read them all please click through to the full article.

  • Cool interview with Margaret Hamilton--NASA's First Software Engineer--on Makers. Programmers, you'll love this. One of the stories she tells is how her daughter was playing around and selected the prelaunch program during flight. That crashed the simulator. So like a good programmer she wanted to prevent this from happening. She tried to get a protection put in because an astronaut could actually do this during flight. Management would certainly allow this, right? She was denied. They said astronauts are trained never to make a mistake so it could never happen. Eventually she won the argument and was able to add code to protect against human error. So little has changed :-)

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


          Sponsored Post: Apple, Domino Data Lab, Etleap, Aerospike, Loupe, Clubhouse, Stream, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp        

Who's Hiring? 

  • Apple is looking for a passionate VoIP engineer with a strong technical background to transform our Voice platform to SIP. It will be an amazing journey with highly skilled, fast-paced, and exciting team members. Lead and implement the engineering of Voice technologies in Apple’s Contact Center environment. The Contact Center Voice team provides the real-time communication platform for customers’ interaction with Apple’s support and retail organizations. You will lead the global Voice, SIP, and network cross-functional engineers to develop a world-class customer experience. More details are available here.

  • Advertise your job here! 

Fun and Informative Events

  • Advertise your event here!

Cool Products and Services

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • What engineering and IT leaders need to know about data science. As data science becomes more mature within an organization, you may be pulled into leading, enabling, and collaborating with data science teams. While there are similarities between data science and software engineering, well intentioned engineering leaders may make assumptions about data science that lead to avoidable conflict and unproductive workflows. Read the full guide to data science for Engineering and IT leaders.

  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • Etleap provides a SaaS ETL tool that makes it easy to create and operate a Redshift data warehouse at a small fraction of the typical time and cost. It combines the ability to do deep transformations on large data sets with self-service usability, and no coding is required. Sign up for a 30-day free trial.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Working on a software product? Clubhouse is a project management tool that helps software teams plan, build, and deploy their products with ease. Try it free today or learn why thousands of teams use Clubhouse as a Trello alternative or JIRA alternative.

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5-minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


The Solution to Your Operational Diagnostics Woes

Scalyr gives you instant visibility of your production systems, helping you turn chaotic logs and system metrics into actionable data at interactive speeds. Don't be limited by the slow and narrow capabilities of traditional log monitoring tools. View and analyze all your logs and system metrics from multiple sources in one place. Get enterprise-grade functionality with sane pricing and insane performance. Learn more today


VividCortex Gives You Database Superpowers 

Database monitoring is hard, but VividCortex makes it easy. Modern apps run complex queries at large scales across distributed, diverse types of databases (e.g. document, relational, key-value). The “data tier” that these become is tremendously difficult to observe and measure as a whole. The result? Nobody knows exactly what’s happening across all those servers.

VividCortex lets you handle this complexity like a superhero. With VividCortex, you're able to inspect all databases and their workloads together from a bird's eye view, spotting problems and zooming down to individual queries and 1-second-resolution metrics in just two mouse clicks. With VividCortex, you gain a superset of system-monitoring tools that use only global metrics (such as status counters), offering deep, multi-dimensional slice-and-dice visibility into queries, I/O, and CPU, plus other key measurements of your system's work. VividCortex is smart, opinionated, and understands databases deeply, too: it knows about queries, EXPLAIN plans, SQL bugs, typical query performance, and more. It empowers you to find and solve problems that you can't even see with other types of monitoring systems, because they don’t measure what matters to the database.

Best of all, VividCortex isn’t just a DBA tool. Unlike traditional monitoring tools, VividCortex eliminates the walls between development and production -- anybody can benefit from VividCortex and be a superhero. With powerful tools like time-over-time comparison, developers gain immediate production visibility into databases with no need for access to the production servers. Our customers vouch for it: Problems are solved faster and they're solved before they ever reach production. As a bonus, DBAs no longer become the bottleneck for things like code reviews and troubleshooting in the database. 

Make your entire team into database superheroes: deploy VividCortex today. It only takes a moment, and you’re guaranteed to immediately learn something you didn’t already know about your databases and their workload.

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


          Advanced object-oriented programming in R: statistical programming for data science, analysis and finance
          How to Measure PR with PRTech        

What PR work makes sense to automate? Should you be scared of machine learning and data science? Rebekah Iliff says we should embrace the technology that can turn us into our super selves. She explains

The post How to Measure PR with PRTech appeared first on Spin Sucks.

      

          Bayesian Decision Theory        
Alright! You probably have been hearing a lot about Big Data, Data Scientists, etc. The big data craze was in full swing when the Harvard Business Review published an article three years ago titled "Data Scientist: The Sexiest Job of the 21st Century". And in order to become a […]
          Latinoware 2014, here we come!

Introduction

After 5 years I am returning to Latinoware, the Free Software community event held in Foz do Iguaçu, Paraná, Brazil.

Beyond the personal connections, PGP key exchanges, meeting online friends in person, beers and so on, there is an extensive and rich program. So, for my own organization, I list below the talks and workshops I intend to attend. If you are there at these times, we can share the same space-time coordinates :-)

What I plan to attend

The full program (with a synopsis of each talk/workshop/keynote) can be seen here.

15/10/2014

  • 10h - 11h - GNU/Linux - It is not 1984 (or 1969) anymore - Jon “Maddog” Hall
  • Simulating phenomena with GeoGebra - Marcela Martins Pereira and Eduardo Antônio Soares Júnior
  • 12h - 13h - Collaborative open spaces - Guilherme Guerra
  • 13h - 14h - (grab something to eat) and try to split myself between: Technological illiteracy and teacher training - Antonio Carlos C. Marques, and Internet of Things: Creating APIs for the real world with Raspberry Pi and Python - Pedro Henrique Kopper
  • 14h - 16h - Official opening of Latinoware
  • 16h - 17h - Hands-on video editing with Kdenlive - Carlos Cartola
  • 17h - 18h - red#matrix, much more than a social network - Frederico (aracnus) Gonçalves Guimarães

16/10/2014

  • 10h - 11h - Copyright and the care needed when using “cloud” and “free” services to build educational resources - Márcio de Araújo Benedito
  • 11h - 12h - Collaboration and Free Tools: possibilities for counter-hegemony in the School - Sergio F. Lima
  • 12h - 13h - Free Teacher! The use of free software in teaching degrees - Wendell Bento Geraldes
  • 13h - 14h - (grab something to eat) and Open documentation standards - ODF - Fa Conti
  • 14h - 15h - Bitcoin, the future of money is open source (and free) - Kemel Zaidan, and Open Hardware Platform for Robotics - Thalis Antunes De Souza and Thomás Antunes de Souza
  • 15h - 16h - Mozilla and Education: how we are revolutionizing the teaching of digital skills - Marcus Saad
  • 16h - 17h - Arduino Uno vs. MSP430 - Raphael Pereira Alkmim and Yuri Adan Gonçalves Cordovil
  • 17h - 18h - Inclusion of people with disabilities in Education - with Free Software it is possible - Marcos Silva Vieira

17/10/2014

  • 10h - 12h - Digital presence: it is not enough to be there, you have to take part - Frederico (aracnus) Gonçalves Guimarães. This will be a hands-on session :-)
  • 12h - 13h - I think I will have lunch :-)
  • 13h - 14h - Education and technology with free resources - Marcos Egito
  • 14h - 14:15h - Official event photo
  • 14:15h - 15:15h - Introduction to LaTeX - Ole Peter Smith
  • 15:15h - 16:15h - abnTeX2 and LaTeX: “absurd” standards and elegant documents - Lauro César
  • 16:15h - 17:15h - Data Science / Big Data / Machine Learning and Free Software - Eduardo Maçan

If you are there, get in touch!


          Combining Open Commerce Datasets to Drive Better Trade Business Intelligence        

“We can each define ambition and progress for ourselves. The goal is to work toward a world where expectations are not set by the stereotypes that hold us back, but by our personal passion, talents and interests.”—Sheryl Sandberg

As we bring private-sector innovators and technologies into our challenge to use public data to solve public problems, it’s striking how many are finding new ways to break through and apply their passions, talents and interests:

  • We have one company that is making our data available free and open on a platform with 700,000 data scientists.
  • We have another company that is wrangling, integrating, and presenting our data with information from a number of other public sources.
  • A third is making Commerce data more accessible via interactive visualizations and filters.

Counselor Justin Antonipillai, Economics & Statistics Administration (left) and Ike Kavas, Founder, Ephesoft

In today’s announcement, we are sharing what Ephesoft will be doing, free and open for the public, to advance the goals of democratizing our data using their technology.

We here at Commerce make a lot of data available in many formats, including in bulk and through our application programming interfaces (APIs). However, some of the data that we make available to the public might be available in pictures that contain the data—like files in “portable document format” (PDF) or in “tagged image file format” (TIFF). Documents in these formats are nearly impossible to derive insights from because the data itself is unstructured and hard to analyze.

In response, Ephesoft has used its digitization and machine learning technology to start extracting meaningful data from these images. Ephesoft first performed a proof-of-concept exercise on data from the US Patent and Trademark Office, by running patent data in image-based PDF format through their platform and identifying fields such as patent date and number.

Once these fields have been identified, Ephesoft’s algorithms extract pertinent data from the images and identify linkage across multiple documents. In its exercise on US patent data, Ephesoft’s resulting mind map visualization displays how one patent is connected to other patents, based on references, citations, and abstracts. This information can be used to analyze bright spots and clusters in US research, as well as identify gaps in patented technology or ‘lonely’ patents in spaces where little other patented art exists.

Before:

Proof-of-concept exercise on data from the US Patent and Trademark Office.

After:

Ephesoft’s mind map visualization displays how one patent is connected to other patents, based on references, citations, and abstracts.
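
As a rough sketch of the general idea, using common open-source pieces rather than Ephesoft's platform, an image-only patent PDF can be OCR'd and a few front-page fields pulled out with simple patterns. The regular expressions and file name below are simplified assumptions, not the actual extraction rules.

```python
# Hypothetical OCR-and-extract sketch: requires the poppler and tesseract binaries.
import re
from pdf2image import convert_from_path
import pytesseract

def extract_patent_fields(pdf_path: str) -> dict:
    # Render each page of the image-only PDF and OCR it into plain text.
    pages = convert_from_path(pdf_path, dpi=300)
    text = "\n".join(pytesseract.image_to_string(page) for page in pages)

    # Very simplified patterns for a US patent front page.
    patent_no = re.search(r"US\s?([\d,]{7,})\s?[AB]\d?", text)
    date = re.search(r"Date of Patent:?\s*([A-Z][a-z]+\.? \d{1,2}, \d{4})", text)
    cited = re.findall(r"\b\d{1,2},\d{3},\d{3}\b", text)   # other patent numbers referenced

    return {
        "patent_number": patent_no.group(1) if patent_no else None,
        "patent_date": date.group(1) if date else None,
        "cited_patents": sorted(set(cited)),
    }

# Example (hypothetical file name); linking documents by the patent numbers they cite
# is what produces the kind of reference graph shown in the mind map above.
# print(extract_patent_fields("us_patent_front_page.pdf"))
```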

In addition to its ability to extract data from images, Ephesoft can also house large sums of public data to create knowledge bases that allow organizations to compare their unstructured data against a series of benchmarks. For free and open use, their team is now working to combine trade data from the US International Trade Administration, the US Census Bureau, and the Bureau of Economic Analysis at Commerce to develop a public knowledge base for American industry.

This tool will be able to help U.S. businesses answer questions such as:

  • Will regulation requirements impact shifts in my export strategy?
  • Are my export practices compliant with all relevant trade regulations?
  • Which markets are most similar to my current trade portfolio?
  • How does my organization compare to other organizations in the same industry?

What continues to be exciting about these collaborations is that they highlight the many talents that the readers of this blog might bring to help improve the lives of the American people. It’s not about having a specific talent; it’s about how you can use your unique talent to serve.

Data can be used for just about anything these days—ordering a ride, booking a hotel, or determining which political candidate to cast a vote for. When the Commerce Department announced its challenge for private companies to use their technology for public good, we had no idea such a wide variety of organizations would come forward, offering to leverage their unique capabilities for the good of the American people. But they did come forward, in droves, using their individual talents to build tools for our citizens.

We hope you join us.

Thanks for reading.

Justin and Ike


          Brad Burke and Jeff Chen: Driving the Commerce Data Mission Forward        

“Any transition is easier,” Priyanka Chopra says, “if you believe in yourself and your talent.”

Who is Ms. Chopra?  She’s a young woman whose life has been in constant transition, from being shuttled all over India as a child to becoming one of the best-known and most decorated Bollywood stars and now the first South Asian to headline an American network thriller (Quantico).  Not to mention one of Time magazine’s 100 most influential people in the world and UNICEF Goodwill Ambassador for Child Rights.

With the change in Presidential Administrations, this is a time of great transition for our nation – and for many of us as professionals across government and here at Commerce and ESA.  For those moving to new chapters in your careers, I hope you believe in yourself and your talent as much as I do.  Because I have been incredibly fortunate to witness first-hand your hard work, dedication, focus, creativity, selflessness, teamwork – and ability to move mountains quickly.

Counselor Justin Antonipillai, Economics & Statistics Administration (left to right), Deputy Under Secretary for Economic Affairs Brad Burke, Deputy Chief Data Officer Jeff Chen

As Commerce transitions to the new Administration, it is critical that we have strong leadership in key career positions. That means our work to improve the accessibility and usability of Commerce data – and our mission to advance data equality in America – will continue, as I strongly believe it should and hope it will.

Toward this end, we’ve scoped out work for Census, Bureau of Economic Analysis, National Institute of Standards and Technology, and the US Patent and Trademark Office.  All are supporting CDS and working with us to ensure that our joint projects advance their data missions.

Commerce Data Service

And, in transition, these two terrific leaders will be the acting leaders of our Data Pillar:

Brad Burke, our Deputy Under Secretary for Economic Affairs, will serve as the Acting Under Secretary and as the leader of our Department’s strategic Data Pillar.

Brad brings a wealth of management experience to ESA. He has a deep understanding of the federal government and has taken the lead with department officials in addressing important issues including the transition to the next Presidential Administration.

For the past five years, as Director of Administration and Chief Financial Officer for ESA, Brad has assisted in the oversight of operations of ESA and BEA, driving the planning, budgeting, performance and financial management activities. He also provided critical oversight of the Census Bureau’s budget and financial management activities.

In his spare time (ha ha), Brad also has served as a principal liaison with Commerce officials in addressing departmental issues and priorities. 

Before joining ESA, Brad served for six years with the Commerce Bureau of Industry and Security, and prior to that, four years at TSA at the Department of Homeland Security. Preceding federal service, Brad served in various executive positions over 20 years in financial services and manufacturing within the private sector. 

We are so fortunate to have someone of Brad’s talents and experience carrying the baton forward – for Commerce, ESA and most important, the nation we serve.

Jeff Chen, our Deputy Chief Data Officer, will serve as our Department’s Acting Chief Data Officer.

Jeff also brings invaluable experience to his new position.

We appointed Jeff in 2015 as the Department’s first-ever Chief Data Scientist. He was a key early leader in the development of the Commerce Data Service and spearheaded key data science projects bringing Digital Age tools and technologies – like data wrangling, predictive analytics and search string analysis – to modernize government.

Jeff also was the lead mentor for the Commerce Data Academy’s in-residence program, where he guided emerging data scientists and data engineers from other Bureaus as they built tools to benefit their missions. These included the beaR library, and new innovations that streamlined management and operations at PTO and the International Trade Administration.

Last spring, Jeff stepped into a leadership vacuum within the CDS and helped to stabilize management and steer them through the completion of key 2015 projects. The team is now finely honed and functioning smoothly due to his leadership.

Jeff joins the team with extensive experience as a data science leader, having led efforts in over 30 fields with NASA, the White House Office of Science and Technology Policy, the New York City Fire Department and the New York City Mayor’s Office, among others. His extensive background developing products and services in a diverse set of fields, ranging from emergency services to public health to legal affairs to trade, will provide enormous value for the Department as we continue our mission to create the conditions for economic growth and opportunity.

We are so gratified that Brad and Jeff have agreed to assume these leadership positions.  We’ll be well served by their leadership as the Commerce data mission moves forward.

In my previous posts, I’ve cited the “flywheel effect” to describe our concerted effort, and tremendous progress, in turning “America’s Data Agency” into “America’s Data Engine” to fuel the economy, modernize government, address public needs and advance data equality. Together, we’ve gotten the wheel turning.  The challenge now is to propel that momentum into transformative change.

With deep belief in Brad and Jeff, and their talents, I’m thankful they’ll be bridging the transition, leading the charge – and driving future change. 

Justin

Justin Antonipillai—Counselor to Secretary Penny Pritzker, with the Delegated Duties of the Under Secretary for Economic Affairs


          Innovations in Training at Commerce        

Leonardo DaVinci.  Paul Revere.  Ben Franklin.  Vincent Van Gogh.  Elvis Presley.  David Beede.  April Blair. William Hawk. Andrea Julca.  Karlheinz Skowronek. Patricia Tomczyszyn.

Which names don’t seem to belong with the others?  Trick question.  While DaVinci and other names obviously are famous, the backgrounds of all of these women and men have something major in common: They trained as apprentices.

Cover image of The Benefits and Costs of Apprenticeships: A Business Perspective report.

DaVinci and Van Gogh apprenticed as painters.  Revere, as a silversmith, and Franklin, as a printer.  Elvis apprenticed not as a singer, but as an electrician.  As for Beede, Blair, Hawk, Julca, Skowronek and Tomczyszyn, they are all Commerce Department employees who trained on the job in data analysis, discovery and visualization at the Commerce Data Academy.  (More on this in a bit.)

This past November included National Apprenticeship Week, and as US Secretary of Labor Thomas Perez said, “Apprenticeships are experiencing a modern renaissance in America because the earn-while-learn model is a win-win proposition for workers looking to punch their ticket to the middle-class and for employers looking to grow and thrive in our modern global economy.”

The first-of-its-kind study of the business benefits and costs of apprenticeships by ESA’s Office of the Chief Economist and Case Western Reserve University underscores the point.  Key findings of the full report from 13 case studies include:

  • Companies turned to apprenticeships most often when they simply could not find skilled workers off the street locally.
  • Filling hard-to-fill jobs was the single most common benefit of apprenticeships.
  • Companies adapt apprenticeships to meet their unique needs.

The report also looked at the relative costs and benefits at two companies.  Siemens finds an 8 percent return on investment on its apprenticeship program relative to hiring skilled workers.  Dartmouth-Hitchcock Medical Center’s apprenticeships helped to reduce unpopular, unproductive, expensive overtime work from medical providers and to increase booked hours, which together more than paid for the apprentices.

ApprenticeshipUSA Toolkit - Advancing Apprenticeship as a Workforce Strategy

Overall, the companies studied were unanimous in their support of registered apprenticeships.  They found value in the program and identified benefits that more than justified the costs and commitments to the apprentices.

The Labor Department’s ApprenticeshipUSA program offers tools and information to employers and employees about adopting the age-old earn-while-learning model for 21st century workforce needs.

Meanwhile, Commerce is helping to pioneer a hands-on, data-centric instructional model for the federal government by establishing our Commerce Data Academy – winner of the FedScoop 50 Tech Program of the Year for 2016.

Launched by the Commerce Data Service, the Data Academy’s goal is to empower more Commerce employees to make data-driven decisions, advancing the data pillar of Secretary Penny Pritzker’s “Open for Business” strategy and bringing a data-driven approach to modernizing government.

The Academy began as a pilot last January to test demand for training in data science, data engineering and web development.  The pilot offered classes in agile development, HTML & CSS, Storytelling with Data, and Excel at Excel.

Commerce Data Academy - Educating and empowering Commerce Employees.

We crossed our fingers for at least 30 pilot enrollees.  We were thrilled to receive 422 registrations, and initial classes posted a nearly 90 percent attendance rate. 

This overwhelming response told us: We needed to officially launch the Academy. 

So we did, and to date, we have presented 20 courses (two more are coming) with topics from agile development, to storytelling with data, to machine learning, to programming in Python. We have more than 1,900 unique Commerce registrants and nearly 1,100 unique attendees so far.

For the truly committed (and talented), the Academy offers an in-depth residency program that features a chance to work alongside data scientists and professional developers within the Commerce Data Service.  Residents must make it through an immersive boot camp to prepare for a three-month detail, but the reward is a chance to work on a high-priority problem or need identified by their home bureaus.

The first class of 13 residents recently graduated, and their bureaus are thrilled with the data-oriented products they have created – several featured at the recent Opportunity Project Demo Day and the Commerce Data Advisory Council (CDAC) meeting – and they’ve been instrumental in many of the Commerce Data Service’s award-winning innovations.

We really appreciated the additional feedback from CDAC’s digital thought leaders who encouraged us to spread the word about our academy, join the online digital education community, harness the alumni network as it grows, and keep teaching our grads.

Back Row L-R: Kevin Markham (Instructor, General Assembly); Steven Finkelstein (Census); Dmitri Smith (BIS); William Hawk (ESA); David Garrow (Census); Adam Bray (Instructor, General Assembly); David Beede (ESA). Front Row L-R: Karlheinz Skowronek (PTO), April Blair (PTO), Tanya Shen (BEA), Stephen Devine (EDA), Laura Cutrer (NOAA), and Patricia Tomczyszyn (MBDA). Not pictured: Amanda Reynolds (ITA), Gregory Paige (OS), Andrea Julca (BEA), Jennifer Rimbach (MBDA) Photograph: Dr. Tyrone Grandison.

Data skills are invaluable in the 21st century digital economy, and we want to do our part to advance America’s economy and competitiveness. 

Jake Schwartz, CEO and co-founder of the education startup General Assembly (a collaborator in setting up the initial Commerce Data Academy courses), recently told CNBC that the number-one skill employers demand now centers on data science, development and analytics.  This demand is driving new opportunities for rewarding jobs and careers in data.  But not just for digital and data professionals.  Everyone, in almost every industry, every job and at every level, including CEOs, will need a basic grounding.

“An investment in knowledge,” onetime apprentice Ben Franklin said, “pays the best interest.” 

Franklin’s employers who invested in his printing apprenticeship certainly never imagined that he would leverage his trade to help found a nation.  Who knows how the Commerce Data Academy will launch David Beede, April Blair, William Hawk, Andrea Julca, Karlheinz Skowronek, Patricia Tomczyszyn and other alumni – current and future – toward futures that could change the world.

Thanks for reading.

Justin

Justin Antonipillai - Counselor to Secretary Penny Pritzker, with the Delegated Duties of the Under Secretary for Economic Affairs


          Kaggle’s Data Science Community to Solve Public Problems with Commerce Open Data        

Quick quiz:

  • Which state has the highest percentage of working moms?
  • Who is more likely to be employed—people with bachelor’s degrees or doctorates?
  • Who earns more income—people who get to work at 7 am, 8 am or 7 pm?
Counselor Justin Antonipillai and Kaggle CEO  Anthony Goldbloom.

When we think about the information that the American people should have at our fingertips to make decisions about the way we live and work, the above data is exactly the kind that needs to be accessible and available. And when Commerce—“America’s Data Agency”—issued a call to the private sector to get our data out to those who are not accessing it directly, these were exactly the kinds of answers we were looking for. You have seen our prior call for help for the public good, and Kaggle is one of those companies that have stepped up to the challenge.

With Commerce public datasets loaded on to the Kaggle platform, you can find the answers to the above questions. In fact, “Kagglers”—members of the Kaggle community—analyzed data from the Census Bureau’s American Community Survey, the nation’s premier source for information about America’s changing population, housing and workforce, to challenge conventional wisdom with these answers.

Kaggle has committed to putting valuable Commerce datasets in front of its global community of data scientists, developers and coders. Making public data more open and accessible in this way helps democratize our data, promote data equality, and show what’s possible when the private and nonprofit sectors collaborate to take public data and run with it to address public problems.

Kaggle’s Response to the Challenge of Data Inequality

As Anthony will tell you, Kaggle’s mission is to help the world learn from data, making it easier for researchers, data scientists, and hobbyists to work collaboratively on reproducible projects by allowing data, code, and discussion to live and grow in a single ecosystem.

Responding to the Department’s call to address data inequality, Kaggle has committed to taking a series of publicly available Commerce datasets from the US Patent and Trademark Office, the US Census Bureau and others, and challenging the Kaggle community to solve public problems. Kagglers will be challenged to analyze innovation, creativity, and technological progress in the United States, and dig deeply into the stories of how Americans live and work to uncover insights about our country.

And, how does putting Commerce datasets on the Kaggle platform and before the Kaggle community help address data inequality?

First, by publishing datasets into an active data science community of around 700,000 Kagglers, where sharing insights, analytic approaches or methods and learning is the norm, there is a real opportunity to bring insights from this data to people, charities, nonprofits and small companies around the country. In addition to data, the Kaggle platform offers conversational threads, visual stories and a repository of documented code to accompany datasets prepared for analysis.

Second, Kaggle also runs machine-learning competitions in domains ranging from the diagnosis of diabetic retinopathy to the classification of galaxies, and brings together machine-learning veterans and students with varied academic and professional backgrounds. Datasets shared on Kaggle enable data scientists, researchers, and others who work with data, to find and share anything from civic statistics to European soccer matches for open community collaboration. This permits combining consistent access to public data with reproducible analysis, visibility of results, and conversations on forums with others interested in the data.

The ability to combine our Commerce data with other public data sets could bring insights that may not exist in our data alone.

Third, the in-browser analytics platform, Kaggle Kernels, will allow open analysis, visualization, and modeling of the Commerce data sets, as you’ll see illustrated below. Each Commerce dataset will be accompanied by a repository of code and insights, which enables quick learning and active contribution by the whole community.
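
To give a flavor of what a Kernel-style analysis of a Commerce dataset can look like, here is a minimal pandas sketch; the file name and column names are hypothetical placeholders rather than the actual schema of the American Community Survey extract hosted on Kaggle.

    # Kernel-style sketch; the CSV and column names are hypothetical placeholders,
    # not the actual schema of the American Community Survey extract on Kaggle.
    import pandas as pd

    acs = pd.read_csv("acs_person_sample.csv")  # hypothetical person-level extract

    # Example question: how does median income vary with the hour people leave for work?
    commuters = acs.dropna(subset=["work_departure_hour", "income"])
    median_income_by_departure = (
        commuters.groupby("work_departure_hour")["income"].median().sort_index()
    )
    print(median_income_by_departure)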

The goal of all of this is to enable data scientists to find critical insights in our data and share them with the American people.

Kaggle will post more Commerce public datasets soon. We look forward to giving you an update—and of course, getting your thoughts, insights and comments.

– Justin and Anthony

PS: Here are the answers to the quiz at the top of this blog:

Kaggle Kernel, involving over 11,000 data scientists, found that Americans who start their day around 8 am earn the most.

This Kaggle Kernel investigated whether it pays to pursue a PhD and the best states to find a job post-degree. The analysis has received over 30,000 views and nearly 90 other data scientists have created reproducible forks of the code.

One working mother and data scientist uses the rich data provided by the Census Bureau’s American Community Survey to explore the stories of American working moms in this Kaggle Kernel viewed by over 14,000 people.


          100 announcements (!) from Google Cloud Next '17        

San Francisco — What a week! Google Cloud Next ‘17 has come to an end, but really, it’s just the beginning. We welcomed 10,000+ attendees including customers, partners, developers, IT leaders, engineers, press, analysts, cloud enthusiasts (and skeptics). Together we engaged in 3 days of keynotes, 200+ sessions, and 4 invitation-only summits. Hard to believe this was our first show as all of Google Cloud with GCP, G Suite, Chrome, Maps and Education. Thank you to all who were here with us in San Francisco this week, and we hope to see you next year.

If you’re a fan of video highlights, we’ve got you covered. Check out our Day 1 keynote (in less than 4 minutes) and Day 2 keynote (in under 5!).

One of the common refrains from customers and partners throughout the conference was “Wow, you’ve been busy. I can’t believe how many announcements you’ve had at Next!” So we decided to count all the announcements from across Google Cloud and in fact we had 100 (!) announcements this week.

For the list lovers amongst you, we’ve compiled a handy-dandy run-down of our announcements from the past few days:


Google Cloud is excited to welcome two new acquisitions to the Google Cloud family this week, Kaggle and AppBridge.

1. Kaggle - Kaggle is one of the world's largest communities of data scientists and machine learning enthusiasts. Kaggle and Google Cloud will continue to support machine learning training and deployment services in addition to offering the community the ability to store and query large datasets.

2. AppBridge - Google Cloud acquired Vancouver-based AppBridge this week, which helps you migrate data from on-prem file servers into G Suite and Google Drive.


Google Cloud brings a suite of new security features to Google Cloud Platform and G Suite designed to help safeguard your company’s assets and prevent disruption to your business: 

3. Identity-Aware Proxy (IAP) for Google Cloud Platform (Beta) - Identity-Aware Proxy lets you provide access to applications based on risk, rather than using a VPN. It provides secure application access from anywhere, restricts access by user, identity and group, deploys with integrated phishing-resistant Security Keys and is easier to set up than an end-user VPN.

4. Data Loss Prevention (DLP) for Google Cloud Platform (Beta) - Data Loss Prevention API lets you scan data for 40+ sensitive data types, and is used as part of DLP in Gmail and Drive. You can find and redact sensitive data stored in GCP, invigorate old applications with new sensitive data sensing “smarts” and use predefined detectors as well as customize your own.

5. Key Management Service (KMS) for Google Cloud Platform (GA) - Key Management Service allows you to generate, use, rotate, and destroy symmetric encryption keys for use in the cloud.

6. Security Key Enforcement (SKE) for Google Cloud Platform (GA) - Security Key Enforcement allows you to require security keys be used as the 2-Step verification factor for enhanced anti-phishing security whenever a GCP application is accessed.

7. Vault for Google Drive (GA) - Google Vault is the eDiscovery and archiving solution for G Suite. Vault enables admins to easily manage their G Suite data lifecycle and search, preview and export the G Suite data in their domain. Vault for Drive enables full support for Google Drive content, including Team Drive files.

8. Google-designed security chip, Titan - Google uses Titan to establish hardware root of trust, allowing us to securely identify and authenticate legitimate access at the hardware level. Titan includes a hardware random number generator, performs cryptographic operations in the isolated memory, and has a dedicated secure processor (on-chip).


New GCP data analytics products and services help organizations solve business problems with data, rather than spending time and resources building, integrating and managing the underlying infrastructure:

9. BigQuery Data Transfer Service (Private Beta) - BigQuery Data Transfer Service makes it easy for users to quickly get value from all their Google-managed advertising datasets. With just a few clicks, marketing analysts can schedule data imports from Google Adwords, DoubleClick Campaign Manager, DoubleClick for Publishers and YouTube Content and Channel Owner reports.

10. Cloud Dataprep (Private Beta) - Cloud Dataprep is a new managed data service, built in collaboration with Trifacta, that makes it faster and easier for BigQuery end-users to visually explore and prepare data for analysis without the need for dedicated data engineer resources.

11. New Commercial Datasets - Businesses often look for datasets (public or commercial) outside their organizational boundaries. Commercial datasets offered include financial market data from Xignite, residential real-estate valuations (historical and projected) from HouseCanary, predictions for when a house will go on sale from Remine, historical weather data from AccuWeather, and news archives from Dow Jones, all immediately ready for use in BigQuery (with more to come as new partners join the program).

12. Python for Google Cloud Dataflow in GA - Cloud Dataflow is a fully managed data processing service supporting both batch and stream execution of pipelines. Until recently, these benefits have been available solely to Java developers. Now there’s a Python SDK for Cloud Dataflow in GA (see the minimal pipeline sketch after this group of announcements).

13. Stackdriver Monitoring for Cloud Dataflow (Beta) - We’ve integrated Cloud Dataflow with Stackdriver Monitoring so that you can access and analyze Cloud Dataflow job metrics and create alerts for specific Dataflow job conditions.

14. Google Cloud Datalab in GA - This interactive data science workflow tool makes it easy to do iterative model and data analysis in a Jupyter notebook-based environment using standard SQL, Python and shell commands.

15. Cloud Dataproc updates - Our fully managed service for running Apache Spark, Flink and Hadoop pipelines has new support for restarting failed jobs (including automatic restart as needed) in beta, the ability to create single-node clusters for lightweight sandbox development (also in beta), and GPU support; the cloud labels feature, for more flexibility managing your Dataproc resources, is now GA.
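
As a concrete aside on announcement 12 above, here is a minimal batch pipeline written against the open-source Apache Beam Python SDK that Cloud Dataflow executes. It runs locally with the default runner; submitting it to the Dataflow service would additionally require runner, project and staging options that are omitted here.

    # Minimal word-count pipeline using the Apache Beam Python SDK (the SDK that
    # Cloud Dataflow executes); runs locally with the default DirectRunner.
    import apache_beam as beam

    with beam.Pipeline() as p:
        (
            p
            | "Create" >> beam.Create(["commerce data", "open data", "data engine"])
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "CountPerWord" >> beam.CombinePerKey(sum)
            | "Print" >> beam.Map(print)
        )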


New GCP databases and database features round out a platform on which developers can build great applications across a spectrum of use cases:

16. Cloud SQL for PostgreSQL (Beta) - Cloud SQL for PostgreSQL implements the same design principles currently reflected in Cloud SQL for MySQL, namely, the ability to securely store and connect to your relational data via open standards.

17. Microsoft SQL Server Enterprise (GA) - Available on Google Compute Engine, plus support for Windows Server Failover Clustering (WSFC) and SQL Server AlwaysOn Availability (GA).

18. Cloud SQL for MySQL improvements - Increased performance for demanding workloads via 32-core instances with up to 208GB of RAM, and central management of resources via Identity and Access Management (IAM) controls.

19. Cloud Spanner - Launched a month ago, but still, it would be remiss not to mention it because, hello, it’s Cloud Spanner! The industry’s first horizontally scalable, globally consistent, relational database service.

20. SSD persistent-disk performance improvements - SSD persistent disks now have increased throughput and IOPS performance, which are particularly beneficial for database and analytics workloads. Read these docs for complete details about persistent-disk performance.

21. Federated query on Cloud Bigtable - We’ve extended BigQuery’s reach to query data inside Cloud Bigtable, the NoSQL database service for massive analytic or operational workloads that require low latency and high throughput (particularly common in Financial Services and IoT use cases).
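
As a rough illustration of announcement 21, the sketch below issues a standard SQL query from Python against a BigQuery external table that is assumed to already be defined over a Cloud Bigtable instance; the project, dataset and table names are hypothetical placeholders, and the queryable schema depends on how that external table is configured.

    # Sketch only: query a (hypothetical) Bigtable-backed external table in BigQuery.
    # Assumes the external table definition already exists and that application
    # default credentials are configured for the google-cloud-bigquery client.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project id

    sql = """
        SELECT rowkey
        FROM `my-project.iot_dataset.bigtable_readings`  -- hypothetical external table
        LIMIT 10
    """

    for row in client.query(sql).result():
        print(row.rowkey)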


New GCP Cloud Machine Learning services bolster our efforts to make machine learning accessible to organizations of all sizes and sophistication:

22.  Cloud Machine Learning Engine (GA) - Cloud ML Engine, now generally available, is for organizations that want to train and deploy their own models into production in the cloud.

23. Cloud Video Intelligence API (Private Beta) - A first of its kind, Cloud Video Intelligence API lets developers easily search and discover video content by providing information about entities (nouns such as “dog,” “flower”, or “human” or verbs such as “run,” “swim,” or “fly”) inside video content.

24. Cloud Vision API (GA) - Cloud Vision API reaches GA and offers new capabilities for enterprises and partners to classify a more diverse set of images. The API can now recognize millions of entities from Google’s Knowledge Graph and offers enhanced OCR capabilities that can extract text from scans of text-heavy documents such as legal contracts or research papers or books (a brief OCR call sketch follows this group of announcements).

25. Machine learning Advanced Solution Lab (ASL) - ASL provides dedicated facilities for our customers to directly collaborate with Google’s machine-learning experts to apply ML to their most pressing challenges.

26. Cloud Jobs API - A powerful aid to job search and discovery, Cloud Jobs API now has new features such as Commute Search, which will return relevant jobs based on desired commute time and preferred mode of transportation.

27. Machine Learning Startup Competition - We announced a Machine Learning Startup Competition in collaboration with venture capital firms Data Collective and Emergence Capital, and with additional support from a16z, Greylock Partners, GV, Kleiner Perkins Caufield & Byers and Sequoia Capital.
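
Returning to announcement 24, here is a small sketch of calling the Cloud Vision API’s OCR over its public REST endpoint with an API key; the image URL is a placeholder and the key is assumed to be supplied via an environment variable.

    # Sketch of calling the Cloud Vision API's OCR over REST with an API key.
    # The image URL is a placeholder; set VISION_API_KEY to a real key to run.
    import os

    import requests

    api_key = os.environ.get("VISION_API_KEY", "your-api-key")
    endpoint = "https://vision.googleapis.com/v1/images:annotate"

    body = {
        "requests": [{
            "image": {"source": {"imageUri": "https://example.com/scanned-contract.png"}},
            "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
        }]
    }

    resp = requests.post(endpoint, params={"key": api_key}, json=body)
    resp.raise_for_status()
    annotation = resp.json()["responses"][0].get("fullTextAnnotation", {})
    print(annotation.get("text", "")[:500])  # first 500 characters of extracted text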


New GCP pricing continues our intention to create customer-friendly pricing that’s as smart as our products; and support services that are geared towards meeting our customers where they are:

28. Compute Engine price cuts - Continuing our history of pricing leadership, we’ve cut Google Compute Engine prices by up to 8%.

29. Committed Use Discounts - With Committed Use Discounts, customers can receive a discount of up to 57% off our list price, in exchange for a one or three year purchase commitment paid monthly, with no upfront costs.

30. Free trial extended to 12 months - We’ve extended our free trial from 60 days to 12 months, allowing you to use your $300 credit across all GCP services and APIs, at your own pace and schedule. Plus, we’ve introduced new Always Free products -- non-expiring usage limits that you can use to test and develop applications at no cost. Visit the Google Cloud Platform Free Tier page for details.

31. Engineering Support - Our new Engineering Support offering is a role-based subscription model that allows us to match engineer to engineer, to meet you where your business is, no matter what stage of development you’re in. It has 3 tiers:

  • Development engineering support - ideal for developers or QA engineers that can manage with a response within four to eight business hours, priced at $100/user per month.
  • Production engineering support provides a one-hour response time for critical issues at $250/user per month.
  • On-call engineering support pages a Google engineer and delivers a 15-minute response time 24x7 for critical issues at $1,500/user per month.

32. Cloud.google.com/community site - Google Cloud Platform Community is a new site to learn, connect and share with other people like you, who are interested in GCP. You can follow along with tutorials or submit one yourself, find meetups in your area, and learn about community resources for GCP support, open source projects and more.


New GCP developer platforms and tools reinforce our commitment to openness and choice and giving you what you need to move fast and focus on great code.

33. Google App Engine Flex (GA) - We announced a major expansion of our popular App Engine platform to new developer communities that emphasizes openness, developer choice, and application portability.

34. Cloud Functions (Beta) - Google Cloud Functions has launched into public beta. It is a serverless environment for creating event-driven applications and microservices, letting you build and connect cloud services with code.

35. Firebase integration with GCP (GA) - Firebase Storage is now Google Cloud Storage for Firebase and adds support for multiple buckets, support for linking to existing buckets, and integrates with Google Cloud Functions.

36. Cloud Container Builder - Cloud Container Builder is a standalone tool that lets you build your Docker containers on GCP regardless of deployment environment. It’s a fast, reliable, and consistent way to package your software into containers as part of an automated workflow.

37. Community Tutorials (Beta)  - With community tutorials, anyone can now submit or request a technical how-to for Google Cloud Platform.


Secure, global and high-performance, we’ve built our cloud for the long haul. This week we announced a slew of new infrastructure updates. 

38. New data center region: California - This new GCP region delivers lower latency for customers on the West Coast of the U.S. and adjacent geographic areas. Like other Google Cloud regions, it will feature a minimum of three zones, benefit from Google’s global, private fibre network, and offer a complement of GCP services.

39. New data center region: Montreal - This new GCP region delivers lower latency for customers in Canada and adjacent geographic areas. Like other Google Cloud regions, it will feature a minimum of three zones, benefit from Google’s global, private fibre network, and offer a complement of GCP services.

40. New data center region: Netherlands - This new GCP region delivers lower latency for customers in Western Europe and adjacent geographic areas. Like other Google Cloud regions, it will feature a minimum of three zones, benefit from Google’s global, private fibre network, and offer a complement of GCP services.

41. Google Container Engine - Managed Nodes - Google Container Engine (GKE) has added Automated Monitoring and Repair of your GKE nodes, letting you focus on your applications while Google ensures your cluster is available and up-to-date.

42. 64 Core machines + more memory - We have doubled the number of vCPUs you can run in an instance from 32 to 64, with up to 416GB of memory per instance.

43. Internal Load balancing (GA) - Internal Load Balancing, now GA, lets you run and scale your services behind a private load balancing IP address which is accessible only to your internal instances, not the internet.

44. Cross-Project Networking (Beta) - Cross-Project Networking (XPN), now in beta, is a virtual network that provides a common network across several Google Cloud Platform projects, enabling simple multi-tenant deployments.


In the past year, we’ve launched 300+ features and updates for G Suite and this week we announced our next generation of collaboration and communication tools.

45. Team Drives (GA for G Suite Business, Education and Enterprise customers) - Team Drives help teams simply and securely manage permissions, ownership and file access for an organization within Google Drive.

46. Drive File Stream (EAP) - Drive File Stream is a way to quickly stream files directly from the cloud to your computer. With Drive File Stream, company data can be accessed directly from your laptop, even if you don’t have much space on your hard drive.

47. Google Vault for Drive (GA for G Suite Business, Education and Enterprise customers) - Google Vault for Drive now gives admins the governance controls they need to manage and secure all of their files, including employee Drives and Team Drives. Google Vault for Drive also lets admins set retention policies that automatically keep what’s needed and delete what’s not.

48. Quick Access in Team Drives (GA) - Powered by Google’s machine intelligence, Quick Access helps to surface the right information for employees at the right time within Google Drive. Quick Access now works with Team Drives on iOS and Android devices, and is coming soon to the web.

49. Hangouts Meet (GA to existing customers) - Hangouts Meet is a new video meeting experience built on Hangouts that can run 30-person video conferences without accounts, plugins or downloads. For G Suite Enterprise customers, each call comes with a dedicated dial-in phone number so that team members on the road can join meetings without wifi or data issues.

50. Hangouts Chat (EAP) - Hangouts Chat is an intelligent communication app in Hangouts with dedicated, virtual rooms that connect cross-functional enterprise teams. Hangouts Chat integrates with G Suite apps like Drive and Docs, as well as photos, videos and other third-party enterprise apps.

51. @meet - @meet is an intelligent bot built on top of the Hangouts platform that uses natural language processing and machine learning to automatically schedule meetings for your team with Hangouts Meet and Google Calendar.

52. Gmail Add-ons for G Suite (Developer Preview) - Gmail Add-ons provide a way to surface the functionality of your app or service directly in Gmail. With Add-ons, developers only build their integration once, and it runs natively in Gmail on web, Android and iOS.

53. Edit Opportunities in Google Sheets - with Edit Opportunities in Google Sheets, sales reps can sync a Salesforce Opportunity List View to Sheets to bulk edit data and changes are synced automatically to Salesforce, no upload required.

54. Jamboard - Our whiteboard in the cloud goes GA in May! Jamboard merges the worlds of physical and digital creativity. It’s real time collaboration on a brilliant scale, whether your team is together in the conference room or spread all over the world.


Building on the momentum from a growing number of businesses using Chrome digital signage and kiosks, we added new management tools and APIs in addition to introducing support for Android Kiosk apps on supported Chrome devices. 

55. Android Kiosk Apps for Chrome - Android Kiosk for Chrome lets users manage and deploy Chrome digital signage and kiosks for both web and Android apps. And with Public Session Kiosks, IT admins can now add a number of Chrome packaged apps alongside hosted apps.

56. Chrome Kiosk Management Free trial - This free trial gives customers an easy way to test out Chrome for signage and kiosk deployments.

57. Chrome Device Management (CDM) APIs for Kiosks - These APIs offer programmatic access to various Kiosk policies. IT admins can schedule a device reboot through the new APIs and integrate that functionality directly in a third-party console.

58. Chrome Stability API - This new API allows Kiosk app developers to improve the reliability of the application and the system.


Attendees at Google Cloud Next ‘17 heard stories from many of our valued customers:

59. Colgate - Colgate-Palmolive partnered with Google Cloud and SAP to bring thousands of employees together through G Suite collaboration and productivity tools. The company deployed G Suite to 28,000 employees in less than six months.

60. Disney Consumer Products & Interactive (DCPI) - DCPI is on target to migrate out of its legacy infrastructure this year, and is leveraging machine learning to power next generation guest experiences.

61. eBay - eBay uses Google Cloud technologies including Google Container Engine, Machine Learning and AI for its ShopBot, a personal shopping bot on Facebook Messenger.

62. HSBC - HSBC is one of the world's largest financial and banking institutions and is making a large investment in transforming its global IT. The company is working closely with Google to deploy Cloud DataFlow, BigQuery and other data services to power critical proof of concept projects.

63. LUSH - LUSH migrated its global e-commerce site from AWS to GCP in less than six weeks, significantly improving the reliability and stability of its site. LUSH benefits from GCP’s ability to scale as transaction volume surges, which is critical for a retail business. In addition, Google's commitment to renewable energy sources aligns with LUSH's ethical principles.

64. Oden Technologies - Oden was part of Google Cloud’s startup program, and switched its entire platform to GCP from AWS. GCP offers Oden the ability to reliably scale while keeping costs low, to perform under heavy loads, and to consistently deliver sophisticated features including machine learning and data analytics.

65. Planet - Planet migrated to GCP in February, looking to accelerate their workloads and leverage Google Cloud for several key advantages: price stability and predictability, custom instances, first-class Kubernetes support, and Machine Learning technology. Planet also announced the beta release of their Explorer platform.

66. Schlumberger - Schlumberger is making a critical investment in the cloud, turning to GCP to enable high-performance computing, remote visualization and development velocity. GCP is helping Schlumberger deliver innovative products and services to its customers by using HPC to scale data processing, workflow and advanced algorithms.

67. The Home Depot - The Home Depot collaborated with GCP’s Customer Reliability Engineering team to migrate HomeDepot.com to the cloud in time for Black Friday and Cyber Monday. Moving to GCP has allowed the company to better manage huge traffic spikes at peak shopping times throughout the year.

68. Verizon - Verizon is deploying G Suite to more than 150,000 of its employees, allowing for collaboration and flexibility in the workplace while maintaining security and compliance standards. Verizon and Google Cloud have been working together for more than a year to bring simple and secure productivity solutions to Verizon’s workforce.


We brought together Google Cloud partners from our growing ecosystem across G Suite, GCP, Maps, Devices and Education. Our partnering philosophy is driven by a set of principles that emphasize openness, innovation, fairness, transparency and shared success in the cloud market. Here are some of our partners who were out in force at the show:

69. Accenture - Accenture announced that it has designed a mobility solution for Rentokil, a global pest control company, built in collaboration with Google as part of the partnership announced at Horizon in September.

70. Alooma - Alooma announced the integration of the Alooma service with Google Cloud SQL and BigQuery.

71. Authorized Training Partner Program - To help companies scale their training offerings more quickly, and to enable Google to add other training partners to the ecosystem, we are introducing a new track within our partner program to support their unique offerings and needs.

72. Check Point - Check Point® Software Technologies announced Check Point vSEC for Google Cloud Platform, delivering advanced security integrated with GCP as well as their joining of the Google Cloud Technology Partner Program.

73. CloudEndure - We’re collaborating with CloudEndure to offer a no cost, self-service migration tool for Google Cloud Platform (GCP) customers.

74. Coursera - Coursera announced that it is collaborating with Google Cloud Platform to provide an extensive range of Google Cloud training courses. To celebrate this announcement, Coursera is offering all NEXT attendees a 100% discount for the GCP fundamentals class.

75. DocuSign - DocuSign announced deeper integrations with Google Docs.

76. Egnyte - Egnyte announced an enhanced integration with Google Docs that will allow our joint customers to create, edit, and store Google Docs, Sheets and Slides files right from within Egnyte Connect.

77. Google Cloud Global Partner Awards - We recognized 12 Google Cloud partners that demonstrated strong customer success and solution innovation over the past year: Accenture, Pivotal, LumApps, Slack, Looker, Palo Alto Networks, Virtru, SoftBank, DoIT, Snowdrop Solutions, CDW Corporation, and SYNNEX Corporation.

78. iCharts - iCharts announced additional support for several GCP databases, free pivot tables for current Google BigQuery users, and a new product dubbed “iCharts for SaaS.”

79. Intel - In addition to the progress with Skylake, Intel and Google Cloud launched several technology initiatives and market education efforts covering IoT, Kubernetes and TensorFlow, including optimizations, a developer program and tool kits.

80. Intuit - Intuit announced Gmail Add-Ons, which are designed to integrate custom workflows into Gmail based on the context of a given email.

81. Liftigniter - Liftigniter is a member of Google Cloud’s startup program and focused on machine learning personalization using predictive analytics to improve CTR on web and in-app.

82. Looker - Looker launched a suite of Looker Blocks, compatible with Google BigQuery Data Transfer Service, designed to give marketers the tools to enhance analysis of their critical data.

83. Low interest loans for partners - To help Premier Partners grow their teams, Google announced that capital investments are available to qualified partners in the form of low interest loans.

84. MicroStrategy - MicroStrategy announced an integration with Google Cloud SQL for PostgreSQL and Google Cloud SQL for MySQL.

85. New incentives to accelerate partner growth - We are increasing our investments in multiple existing and new incentive programs, including low interest loans to help Premier Partners grow their teams, increased co-funding to accelerate deals, and expanded rebate programs.

86. Orbitera Test Drives for GCP Partners - Test Drives allow customers to try partners’ software and generate high quality leads that can be passed directly to the partners’ sales teams. Google is offering Premier Cloud Partners one year of free Test Drives on Orbitera.

87. Partner specializations - Partners demonstrating strong customer success and technical proficiency in certain solution areas will now qualify to apply for a specialization. We’re launching specializations in application development, data analytics, machine learning and infrastructure.

88. Pivotal - GCP announced Pivotal as our first CRE technology partner. CRE technology partners will work hand-in-hand with Google to thoroughly review their solutions and implement changes to address identified risks to reliability.

89. ProsperWorks - ProsperWorks announced Gmail Add-Ons, which are designed to integrate custom workflows into Gmail based on the context of a given email.

90. Qwiklabs - This recent acquisition will provide Authorized Training Partners the ability to offer hands-on labs and comprehensive courses developed by Google experts to our customers.

91. Rackspace - Rackspace announced a strategic relationship with Google Cloud to become its first managed services support partner for GCP, with plans to collaborate on a new managed services offering for GCP customers set to launch later this year.

92. Rocket.Chat - Rocket.Chat, a member of Google Cloud’s startup program, is adding a number of new product integrations with GCP including Autotranslate via Translate API, integration with Vision API to screen for inappropriate content, integration to NLP API to perform sentiment analysis on public channels, integration with G Suite for authentication and a full move of back-end storage to Google Cloud Storage.

93. Salesforce - Salesforce announced Gmail Add-Ons, which are designed to integrate custom workflows into Gmail based on the context of a given email.

94. SAP - This strategic partnership includes certification of SAP HANA on GCP, new G Suite integrations and future collaboration on building machine learning features into intelligent applications like conversational apps that guide users through complex workflows and transactions.

95. Smyte - Smyte participated in the Google Cloud startup program and protects millions of actions a day on websites and mobile applications. Smyte recently moved from self-hosted Kubernetes to Google Container Engine (GKE).

96. Veritas - Veritas expanded its partnership with Google Cloud to provide joint customers with 360 Data Management capabilities. The partnership will help reduce data storage costs, increase compliance and eDiscovery readiness and accelerate the customer’s journey to Google Cloud Platform.

97. VMware Airwatch - Airwatch provides enterprise mobility management solutions for Android and continues to drive the Google Device ecosystem to enterprise customers.

98. Windows Partner Program - We’re working with top systems integrators in the Windows community to help GCP customers take full advantage of Windows and .NET apps and services on our platform.

99. Xplenty - Xplenty announced the addition of two new services from Google Cloud into their available integrations: Google Cloud Spanner and Google Cloud SQL for PostgreSQL.

100. Zoomdata - Zoomdata announced support for Google’s Cloud Spanner and PostgreSQL on GCP, as well as enhancements to the existing Zoomdata Smart Connector for Google BigQuery. With these new capabilities Zoomdata offers deeply integrated and optimized support for Google Cloud Platform’s Cloud Spanner, PostgreSQL, Google BigQuery, and Cloud DataProc services.

We’re thrilled to have so many new products and partners that can help all of our customers grow. And as our final announcement for Google Cloud Next ’17 — please save the date for Next 2018: June 4–6 in San Francisco.

I guess that makes it 101. :-)



          Data science key to Monsanto improving its supply chain        

Monsanto CIO Jim Swanson has outlined five key elements of Monsanto’s digital transformation: customer centricity, internal business process disruption, technology and automation, data and decision science, and leadership and change management.

He views all of those facets as integral to breaking through the "clay layer" of mid-level employees, present in every organization, who may resist short-term changes in spite of long-term gain — the rank-and-file who can either inhibit digital transformation or become its greatest champions. In so doing, he asserts, IT is poised to take on its most rigorous and most critical role of "transformation agent," providing tangible and sustainable value while moving the business forward.



          What's Box Boy Richard Gage Up to These Days?        
If you check his events page, it looks like "not much" is the answer. The most recent event shown is the 15th anniversary, where Gage shared the stage with Munchkin Barbara Honegger.

But it turns out that the founder of AE911Truth participated in the recent Nation of Islam conference. It's not like Gage to avoid publicizing such events; when he appeared there in 2012 he was certainly crowing about the opportunity to expose Louis Farrakhan's followers to 9-11 Troof.

Back then we bashed him a bit based on inside information we had stating that Boy Wonder Kevin Barrett would be appearing with him. As it turned out, our insider was wrong. Waterboy Kevin Ryan appeared instead.

And indeed, this may give us something of a clue as to why Richard Gage is not anxious to publicize his appearance at the Nation of Islam rally. This time around, not only was he appearing with Kevin "the Holocaust is a hideously destructive myth" Barrett, but also Christopher Bollyn.

Bollyn gave us quite a bit of amusement back in the early days of this blog. Once a reporter for the Holocaust-denying, white separatist rag the American Free Press, Bollyn was an early investigoogler of 9-11 nuttery, with the result that when a more respectable and scholarly-seeming man like David Ray Griffin came along, Bollyn's work was often cited. Indeed, I used to joke that Griffin seemed unable to complete a book without referring to him once or twice.

One afternoon, Bollyn was apparently drinking when he noticed a suspicious looking vehicle parked in his neighborhood. When he confronted the occupants of the vehicle, they apparently freely admitted they were local cops on a stakeout of a nearby residence that they suspected of drug-dealing.

Well, paranoid people are going to be paranoid, and Bollyn assumed that the cops were in fact spying on him. Big altercation, Bollyn assaults a cop, and he's up on charges. He was convicted and probably facing about 90 days in the big house, but he lammed instead.

So this time around, Gage shared the stage with two Holocaust deniers, one of whom may still be a fugitive from justice.

I'm sure he'd much rather we talk about his exciting new NIST whistleblower.

But it's a classic false appeal to authority. As Peter Michael Ketcham himself notes in the video, he did not work on the WTC investigation. He states that he was in the mathematical computations area, which leads me to wonder if we are in for some real deep calculations that prove inside job.

Not to worry. Ketcham's cited evidence has nothing to do with number crunching. It's the usual "symmetrical collapse into its own footprint at near free-fall acceleration." Ketcham is Charlie Sheen with less hair.

Here's Ketcham's Linked-In page. His current occupation?

Mobile application developer currently building a data visualization application for Apple iOS devices with an emphasis on accessibility for disabled users. Other interests include data science, virtual reality environments, haptic technologies, hierarchical data formats, matrix computations, and Swift numeric data types for rational numbers, complex numbers, and quaternions.

Again, if he were questioning the numbers used by NIST he might have some credibility. But he's not, he's just parroting the Truther talking points.
          Data Scientist (m/f)        
VDI/VDE Innovation + Technik GmbH
          Data Science Reasoning, by Anne L. Washington        

During this year that I have been off I've been thinking about how to teach both people who are trained in technical parts of data science, and also policymakers, how we could have a common language. And then that way we could have these conversations so we could talk together.

Data Science Reasoning appeared first on Open Transcripts.


          Comment on Cheap shots at the Gartner Hype Curve by Spark is the Future of Analytics | ML/DL        
[…] One might question an analysis that equates real things like optimization with fake things like “Citizen Data Science.” Gartner’s Hype Cycle by itself proves nothing; it’s a conceptual salad, with neither empirical foundation nor predictive power. […]
          CfP: 17th Conference on Artificial and Computational Intelligence and its Applications to the Environmental Sciences, American Meteorological Society        
Philippe just let me know of the following fascinating opportunity (the deadline is August 8th, but it can be extended; in that case you need to get in touch directly with him).

Hi: If you are working on Artificial Intelligence/Machine Learning Applications to Environmental Sciences, we have a terrific conference coming up in Austin, Texas, January 7-11, 2018.
We are organizing the 17th Conference on Artificial and Computational Intelligence and its Applications to the Environmental Sciences as part of the 2018 annual meeting of the American Meteorological Society. 

We have sessions in areas such as weather predictions, extreme weather, energy, climate studies, the coastal environment, health warnings, high performance computing and general artificial intelligence application sessions. 
Two of our sessions, Machine Learning and Statistics in Data Science and Climate Studies, will be headlined by invited talks. 
Several of the sessions are co-organized with other conferences providing opportunities to network with researchers and professionals in other fields. 
We also have a few firsts, including sessions focused on Machine Learning and Climate Studies, on AI Applications to the Environment in Private Companies and Public-private Partnerships, and on early health warnings. 
To submit your abstract: AI Abstracts Submission 
For more information on our sessions: AI Sessions
More information on the AMS Annual Meeting: Overall AMS Annual Meeting Website
See you in Austin. 
The AMS AI Committee
More information on the AMS AI committee: AMS AI Committee Web Page



Here are the AI Sessions:

  • AI Applications to the Environment in Private Companies and Public-private Partnerships. Topic Description: With the rapid development of AI techniques in meteorological and environmental disciplines, a significant amount of research is occurring in the private sector and in collaborations between companies and academia. This session will focus on AI applications in private companies and public-private partnerships, showcasing new approaches and implementations that leverage AI to help solve complex problems.
  • AI Techniques Applied to Environmental Science
  • AI Techniques for Decision Support
  • AI Techniques for Extreme Weather and Risk Assessment
  • AI Techniques for Numerical Weather Predictions
  • AI and Climate Informatics
  • Joint Session: Applications of Artificial Intelligence in the Coastal Environment (Joint between the 17th Conf on Artificial and Computational Intelligence and its Applications to the Environmental Sciences and the 16th Symposium on the Coastal Environment). Topic Description: Contributions to this session are sought in the application of AI techniques to study coastal problems including coastal hydrodynamics, beach and marsh morphology, applications of remote sensing observations and other large data sets.
  • Joint Session: Artificial Intelligence and High Performance Computing (Joint between the 17th Conf on Artificial and Computational Intelligence and its Applications to the Environmental Sciences and the Fourth Symposium on High Performance Computing for Weather, Water, and Climate)
  • Joint Session: Machine Learning and Climate Studies (Joint between the 17th Conf on Artificial and Computational Intelligence and its Applications to the Environmental Sciences and the 31st Conference on Climate Variability and Change)
  • Joint Session: Machine Learning and Statistics in Data Science (Joint between the 17th Conf on Artificial and Computational Intelligence and its Applications to the Environmental Sciences and the 25th Conference on Probability and Statistics)
  • Statistical Learning in the Environmental Sciences










Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

          Nuit Blanche in Review (July 2017)        
Since the last Nuit Blanche in Review (June 2017), it was found that Titan had interesting chemistry. On Nuit Blanche, on the other hand, we had four implementations released by their authors, several interesting in-depth articles (some of them related to SGD and hardware), several slides and videos of meetings and schools, and three job offerings. Enjoy!


In-depth

SGD related

CS/ML Hardware


Slides

Videos

Job:

Other 


Credit: Northern Summer on Titan, NASA/JPL-Caltech/Space Science Institute



          Why is #Netneutrality Important to Data Science, Machine Learning and Artificial Intelligence ?        

So your ISP decides the speed or the kind of service you can get based on religion or what not. What happens to our field? Because you have spent much time on the following services, you are throttled down or have to pay for "premium" services. As a result, you may or may not get to:

  • follow Andrew Ng's Coursera or Siraj Raval classes
  • submit your Kaggle results on time
  • read ArXiv preprints
  • read the latest GAN paper on time
  • watch NIPS/ICLR/CVPR/ACL videos
  • download datasets
  • pay more to use ML/DL on the cloud
  • share reviews
  • download the latest ML/DL frameworks
  • have access to your Slack channels 
  • read Nuit Blanche
  • follow awesome DL thread on Twitter
  • get scholar google alerts
  • .... 
The rest is on Twitter


 







          Time waits for no one        

And technology changes as quickly as the numbers on a clock. A digital clock, of course – the numbers never change on an analogue one.

I think it’s nice to have this month’s T-SQL Tuesday (hosted by Koen Verbeeck (@ko_ver)) on this topic, as I delivered a keynote at the Difinity conference a couple of months ago on the same thing.

In the keynote, I talked about the fear people have of becoming obsolete as technology changes. Technology is introduced that trivialises their particular piece of skill – the database that removes the need for a filing cabinet, the expert system that diagnoses sick people, and the platform as a service that is managed by someone other than the company DBA. As someone who lives in Adelaide, where a major car factory has closed down, costing thousands of jobs, this topic is very much at the forefront of a lot of people’s thoughts. The car industry has been full of robots for a very long time – jobs have been disappearing to technology for ages. But now we are seeing the same happen in other industries, such as IT.

Does Automatic Tuning in Azure mean the end of query tuners? Does Self-Service BI in Excel and Power BI mean the end of BI practitioners? Does PaaS mean the end of DBAs?

I think yes. And no.

Yes, because there are tasks that will disappear. For people that only do one very narrow thing, they probably have reason to fear. But they’ve had reason to fear for a lot longer than Azure has been around. If all you do is check that backups have worked, you should have expected to be replaced by a script a very long time ago. The same has applied in many industries, from production lines in factories to ploughing lines in fields. If your contribution is narrow, you are at risk.

But no, because the opportunity here is to use the tools to become a different kind of expert. The person who drove animals to plough fields learned to drive tractors, but could use their skills in ploughing to offer a better service. The person who painted cars in a factory makes an excellent candidate for retouching dent repair, or custom paint jobs. Their expertise sets them apart from those whose careers didn’t have the same background.

As a BI practitioner today, self-service BI doesn’t present a risk. It’s an opportunity. The opportunity is to lead businesses in their BI strategies. In training and mentoring people to apply BI to their businesses. To help create visualisations that convey the desired meaning in a more effective way than the business people realise. This then turns the BI practitioner into a consultant with industry knowledge. Or a data scientist who can transform data to bring out messages that the business users couldn’t see.

As the leader of a company of database experts, these are questions I’ve had to consider. I don’t want my employees or me to become obsolete. We don’t simply offer health checks, BI projects, Azure migrations, troubleshooting, et cetera. We lead business through those things. We mentor and train. We consult. Of course, we deliver, but we are not simply technicians. We are consultants.

@rob_farley



          PASS Summit 2016 – Blogging again – Keynote 1        

So I’m back at the PASS Summit, and the keynote’s on! We’re all getting ready for a bunch of announcements about what’s coming in the world of the Microsoft Data Platform.

First up – Adam Jorgensen. Some useful stats about PASS, and this year’s PASSion Award winner, Mala Mahadevan (@sqlmal)

There are tweets going on using #sqlpass and #sqlsummit – you can get a lot of information from there.

Joseph Sirosh – Corporate Vice President for the Data Group, Microsoft – is on stage now. He’s talking about the 400M children in India (that’s more than all the people in the United States, Mexico, and Canada combined), and the opportunities because of student drop-out. Andhra Pradesh is predicting student drop-out using new ACID – Algorithms, Cloud, IoT, Data. I say “new” because ACID is an acronym database professionals know well.

He’s moving on to talk about three patterns: Intelligence DB, Intelligent Lake, Deep Intelligence.

Intelligence DB – taking the intelligence out of the application and moving it into the database. Instead of the application controlling the ‘smarts’, putting them into the database provides models, security, and a number of other useful benefits, and lets any application sit on top of it. It can use SQL Server, particularly with SQL Server R Services, and support applications whether in the cloud, on-prem, or hybrid.

Rohan Kumar – General Manager of Database Systems – is up now. Fully Managed HTAP in Azure SQL DB hits General Availability on Nov 15th. HTAP is Hybrid Transactional / Analytical Processing, which fits really nicely with my session on Friday afternoon. He’s doing a demo showing the predictions per second (using SQL Server R Services), and how it easily reaches 1,000,000 per second. You can see more of this at this post, which is really neat.

Justin Silver, a Data Scientist from PROS comes onto stage to show how a customer of theirs handles 100 million price requests every day, responding to each one in under 200 milliseconds. Again we hear about SQL Server R Services, which pushes home the impact of this feature in SQL 2016. Justin explains that using R inside SQL Server 2016, they can achieve 100x better performance. It’s very cool stuff.

Rohan’s back, showing a Polybase demo against MongoDB. I’m sitting next to Kendra Little (@kendra_little), who is pretty sure it’s the first MongoDB demo at PASS. He then moves on to show SQL on Linux. He not only installed SQL on Linux, but then restored a database from a backup that was taken on a Windows box, connected to it from SSMS, and ran queries. Good stuff.

Back to Joseph, who introduces Kalle Hiitola from Next Games – a Finnish gaming company – who created an iOS game that runs on Azure Media Services and DocumentDB, using BizSpark. 15 million installs, with 120GB of new data every day. 11,500 DocumentDB requests per second, and 43 million “Walkers” (zombies in their ‘Walking Dead’ game) eliminated every day. 1.9 million matches (I don’t think it’s about zombie dating though) per day. Nice numbers.

Now onto Intelligent Lake. Larger volumes of data than ever before take a different kind of strategy.

Scott Smith – VP of Product Development from Integral Analytics – comes in to show how Azure SQL Data Warehouse has allowed them to scale like never before in the electric-energy industry. He’s got some great visuals.

Julie Koesmarno on stage now. Can’t help but love Julie – she’s come a long way in the short time since leaving LobsterPot Solutions. She’s done Sentiment Analysis on War & Peace. It’s good stuff, and Julie’s demo is very popular.

Deep Intelligence is using Neural Networks to recognise components in images. eSmart Systems have a drone-based system for looking for faults in power lines. It’s got a familiar feel to it, based on discussions we’ve been having with some customers (but not with power lines).

Using R Services with ML algorithms, there are some great options available…

Jen Stirrup on now. She’s talking about Pokemon Go and Azure ML. I don’t understand the Pokemon stuff, but the Machine Learning stuff makes a lot of sense. Why not use ML to find out where to find Pokemon?

There’s an amazing video about using Cognitive Services to help a blind man interpret his surroundings. For me, this is the best demo of the morning, because it’s where this stuff can be really useful.

SQL is changing the world.

@rob_farley


          Webinar Series: How to Become Insight-Driven with Real-time Analytics        

Did you know that companies are twice as likely to outperform their peers if they use advanced analytics? It’s no wonder then that today about 74% of firms say they want to become more insight-driven and introduce analytics and data science

The post Webinar Series: How to Become Insight-Driven with Real-time Analytics appeared first on Insights into In-Memory Computing and Real-time Analytics.


          Charting the next Insight Platform Frontier: InsightEdge 2.1 GA        

If software is eating the world, then fast data analytics will surely chew it up. Once the digital transformation has made its way beyond the emergence phase, the next step is organic. Real-time decisions, operational analytics, and data science driven

The post Charting the next Insight Platform Frontier: InsightEdge 2.1 GA appeared first on Insights into In-Memory Computing and Real-time Analytics.


          Practical Data Science & Big Data/Data Analytics Online Training @ Sequelgate (Hyderabad)        
SequelGate is one of the best training institutes for Data Science & Big Data / Data Analytics training. We have been providing Online and Classroom Trainings as well as Corporate training. All our training sessions are COMPLETELY PRACTICAL. DATA S...
          Big Data - A Microsoft Tools Approach        

(As with all of these types of posts, check the date of the latest update I’ve made here. Anything older than 6 months is probably out of date, given the speed with which we release new features into Windows and SQL Azure)

I don’t normally like to discuss things in terms of tools. I find that whenever you start with a given tool (or even a tool stack) it’s too easy to fit the problem to the tool(s), rather than the other way around as it should be.

That being said, it’s often useful to have an example to work through to better understand a concept. But like many ideas in Computer Science, “Big Data” is too broad a term in use to show a single example that brings out the multiple processes, use-cases and patterns you can use it for.

So we turn to a description of the tools you can use to analyze large data sets. “Big Data” is a term used lately to describe data sets that have the “Four V’s”  as a characteristic, but I have a simpler definition I like to use:

Big Data involves a data set too large to process in a reasonable period of time

I realize that’s a bit broad, but in my mind it answers the question and is fairly future-proof. The general idea is that you want to analyze some data, and whatever current methods, storage, compute and so on you have at hand don’t allow you to finish processing it in a time period you are comfortable with. I’ll explain some new tools you can use for this processing.

Yes, this post is Microsoft-centric. There are probably posts from other vendors and open-source that cover this process in the way they best see fit. And of course you can always “mix and match”, meaning using Microsoft for one or more parts of the process and other vendors or open-source for another. I never advise that you use any one vendor blindly - educate yourself, examine the facts, perform some tests and choose whatever mix of technologies best solves your problem.

At the risk of being vendor-specific, and probably incomplete, I use the following short list of tools Microsoft has for working with “Big Data”. There is no single package that performs all phases of analysis. These tools are what I use; they should not be taken as a Microsoft authoritative testament to the toolset we’ll finalize for a given problem-space. In fact, that’s the key: find the problem and then fit the tools to that.

Process Types

I break up the analysis of the data into two process types. The first is examining and processing the data in-line, meaning as the data passes through some process. The second is a store-analyze-present process.

Processing Data In-Line

Processing data in-line means that the data doesn’t have a destination - it remains in the source system. But as it moves from an input or is routed to storage within the source system, various methods are available to examine the data as it passes, and either trigger some action or create some analysis.

You might not think of this as “Big Data”, but in fact it can be. Organizations have huge amounts of data stored in multiple systems. Many times the data from these systems does not end up in a database for evaluation. There are options, however, to evaluate that data in real time and either act on it or copy or stream it to another process for evaluation.

The advantage of an in-stream data analysis is that you don’t necessarily have to store the data again to work with it. That’s also a disadvantage - depending on how you architect the solution, you might not retain a historical record. One method of dealing with this requirement is to trigger a rollup collection or a more detailed collection based on the event.
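
To make the rollup-versus-detail idea concrete, here is a minimal sketch in Python (not StreamInsight itself; the window size and threshold are arbitrary illustrative values): keep a cheap rolling summary of the stream and switch to detailed capture only when that summary signals an event.

    from collections import deque

    def monitor(readings, window=60, threshold=100.0):
        # Keep a cheap, always-on rollup of the stream and trigger a
        # detailed collection only when the rolled-up value crosses a threshold.
        recent = deque(maxlen=window)
        for value in readings:
            recent.append(value)
            rollup = sum(recent) / len(recent)      # the summary we retain
            if rollup > threshold:
                yield ("detail", value, rollup)     # event: capture full detail
            else:
                yield ("rollup", None, rollup)      # otherwise keep only the summary

The point is only the shape of the pattern: the raw stream is never stored, yet a historical record (the rollup, plus detail around events) still exists.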

StreamInsight - StreamInsight is Microsoft’s “Complex Event Processing” or CEP engine. This product, hooked into SQL Server 2008R2, has multiple ways of interacting with a data flow. You can create adapters to talk with systems, and then examine the data mid-stream and create triggers to do something with it. You can read more about StreamInsight here: http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx 

BizTalk - When there is more latency available between the initiation of the data and its processing, you can use Microsoft BizTalk. This is a message-passing and Service Bus oriented tool, and it can also be used to join together data from systems that normally do not have a direct link, for instance a Mainframe system to SQL Server. You can learn more about BizTalk here: http://www.microsoft.com/biztalk/en/us/overview.aspx 

.NET and the Windows Azure Service Bus - Along the same lines as BizTalk but with a more programming-oriented design are the Windows and Windows Azure Service Bus tools. The Service Bus allows you to pass messages as well, and opens up web interactions and even inter-company routing. BizTalk can do this as well, but the Service Bus tools use an API approach for designing the flow and interfaces you want. The Service Bus offerings are also intended as near real-time, not as a streaming interface. You can learn more about the Windows Azure Service Bus here: http://www.windowsazure.com/en-us/home/tour/service-bus/ and more about the Event Processing side here: http://msdn.microsoft.com/en-us/magazine/dd569756.aspx 

Store-Analyze-Present

A more traditional approach with an organization’s data is to store the data and analyze it out-of-band. This began with simply running code over a data store, but as locking and blocking became an issue on a file system, Relational Database Management Systems (RDBMSs) were created. Over time a distinction was made between systems used for online transaction processing, meant to be highly available for writing data (OLTP), and systems designed for analytical and reporting purposes (OLAP).

Later the data grew larger than these systems were designed for, primarily due to consistency requirements. In analysis, however, consistency isn’t always a requirement, and so file-based systems for that analysis were re-introduced from the Mainframe concepts, with new technology layered in for speed and size.

I normally break up the process of analyzing large data sets into four phases:

  1. Source and Transfer - Obtaining the data at its source and transferring or loading it into the storage; optionally transforming it along the way
  2. Store and Process - Data is stored on some sort of persistence, and in some cases an engine handles the acquisition and placement on persistent storage, as well as retrieval through an interface.
  3.  Analysis - A new layer introduced with “Big Data” is a separate analysis step. This is dependent on the engine or storage methodology, is often programming language or script based, and sometimes re-introduces the analysis back into the data. Some engines and processes combine this function into the previous phase.
  4. Presentation - In most cases, the data needs a graphical representation to be comprehended, especially in a series or trend analysis. In other cases a simple symbolic representation is enough, similar to the “dashboard” elements in a Business Intelligence suite. Presentation tools may also have an analysis or refinement capability to allow end-users to work with the data sets. As in the Analysis phase, some methodologies bundle the Analysis and Presentation phases into one toolset.

Source and Transfer

You’ll notice in this area, along with those that follow, Microsoft is adopting not only its own technologies but those within open-source. This is a positive sign, and means that you will have a best-of-breed, supported set of tools to move the data from one location to another. Traditional file-copy, File Transfer Protocol and more are certainly options, but do not normally deal with moving datasets.

I’ve already mentioned the ability of a streaming tool to push data into a store-analyze-present model, so I’ll follow up that discussion with the tools that can extract data from one source and place it in another.

SQL Server Integration Services (SSIS)/SQL Server Bulk Copy Program (BCP) - SSIS is a SQL Server tool used to move data from one location to another, and optionally perform transform or other processes as it does so. You are not limited to working with SQL Server data - in fact, almost any modern source of data from text to various database platforms is available to move to various systems. It is also extremely fast and has a rich development environment. You can learn more about SSIS here: http://msdn.microsoft.com/en-us/library/ms141026.aspx BCP is a tool that has been used with SQL Server data since the first releases; it has multiple sources and destinations as well. It is a command-line utility, and has some limited transform capabilities. You can learn more about BCP here: http://msdn.microsoft.com/en-us/library/ms162802.aspx 
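
The same source-and-transfer step can also be scripted directly. Below is a minimal sketch in Python with pyodbc - it is not SSIS or BCP, and the driver name, connection details, file and table names are placeholders invented for illustration:

    import csv
    import pyodbc

    # Placeholder connection string; any ODBC-reachable SQL Server works the same way.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=Staging;UID=loader;PWD=secret"
    )
    cur = conn.cursor()
    cur.fast_executemany = True   # batch the inserts rather than sending rows one by one

    # Source: a text extract; a light transform on the way in is possible here, as with SSIS.
    with open("sales_extract.csv", newline="") as f:
        rows = [(r["order_id"], float(r["amount"])) for r in csv.DictReader(f)]

    cur.executemany("INSERT INTO dbo.SalesStaging (OrderId, Amount) VALUES (?, ?)", rows)
    conn.commit()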

Sqoop - Tied to Microsoft’s latest announcements with Hadoop on Windows and Windows Azure, Sqoop is a tool that is used to move data between SQL Server 2008R2 (and higher) and Hadoop, quickly and efficiently. You can read more about that in the Readme file here: http://www.microsoft.com/download/en/details.aspx?id=27584 

Application Programming Interfaces - APIs exist in most every major language that can connect to one data source, access data, optionally transform it, and store it in another system. Most every dialect of the .NET-based languages contains methods to perform this task.

Store and Process

Data at rest is normally used for historical analysis. In some cases this analysis is performed near real-time, and in others historical data is analyzed periodically. Systems that handle data at rest range from simple storage to active management engines.

SQL Server - Microsoft’s flagship RDBMS can indeed store massive amounts of complex data. I am familiar with two systems in excess of 300 Terabytes of federated data, and the Pan-Starrs project is designed to handle 1+ Petabyte of data. The theoretical limit of SQL Server DataCenter edition is 540 Petabytes. SQL Server is an engine, so the data access and storage is handled in an abstract layer that also handles concurrency for ACID properties. You can learn more about SQL Server here: http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx 

SQL Azure Federations - SQL Azure is a database service from Microsoft associated with the Windows Azure platform. Database Servers are multi-tenant, but are shared across a “fabric” that moves active databases for redundancy and performance. Copies of all databases are kept triple-redundant with a consistent commitment model. Databases are (at this writing - check http://WindowsAzure.com for the latest) capped at a 150 GB size limit per database. However, Microsoft released a “Federation” technology, allowing you to query a head node and have the data federated out to multiple databases. This improves both size and performance. You can read more about SQL Azure Federations here: http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx 

Analysis Services - The Business Intelligence engine within SQL Server, called Analysis Services, can also handle extremely large data systems. In addition to traditional BI data store layouts (ROLAP, MOLAP and HOLAP), the latest version of SQL Server introduces the Vertipaq column-storage technology allowing more direct access to data and a different level of compression. You can read more about Analysis Services here: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx and more about Vertipaq here: http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx

Parallel Data Warehouse - The Parallel Data Warehouse (PDW) offering from Microsoft is largely described by the title. Accessed in multiple ways, including via Transact-SQL (the Microsoft dialect of the Structured Query Language), this is an MPP appliance scaling in parallel to extremely large datasets. It is a hardware and software offering - you can learn more about it here: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx

HPC Server - Microsoft’s High-Performance Computing version of Windows Server deals not only with large data sets, but with extremely complicated computing requirements. A scale-out architecture and inter-operation with Linux systems, as well as dozens of applications pre-written to work with this server make this a capable “Big Data” system. It is a mature offering, with a long track record of success in scientific, financial and other areas of data processing. It is available both on premises and in Windows Azure, and also in a hybrid of both models, allowing you to “rent” a super-computer when needed. You can read more about it here: http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx 

Hadoop - Pairing up with Hortonworks, Microsoft has released the open-source Hadoop system - including HDFS, standardized Map/Reduce software, Hive and Pig - on Windows and the Windows Azure platform. This is not a customized version; off-the-shelf concepts and queries work well here. You can read more about Hadoop here: http://hadoop.apache.org/common/docs/current/ and you can read more about Microsoft’s offerings here: http://hortonworks.com/partners/microsoft/ and here: http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx

Windows and Azure Storage - Although not an engine - other than a triple-redundant, immediately consistent commit - Windows Azure can hold terabytes of information and make it available to everything from the R programming language to the Hadoop offering. Binary storage (Blobs) and Table storage (Key-Value Pair) data can be queried across a distributed environment. You can learn more about Windows Azure storage here: http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx 

Analysis

In a “Big Data” environment, it’s not unusual to have a specialized set of tasks for analyzing and even interpreting the data. This is a new field called “Data Science”, with a requirement not only for computing, but also a heavy emphasis on math.

Transact-SQL - T-SQL is the dialect of the Structured Query Language used by Microsoft. It includes not only robust selection, updating and manipulating of data, but also analytical and domain-level interrogation as well. It can be used on SQL Server, PDW and ODBC data sources. You can read more about T-SQL here: http://msdn.microsoft.com/en-us/library/bb510741.aspx 

Multidimensional Expressions and Data Analysis Expressions - The MDX and DAX languages allow you to query multidimensional data models that do not fit well with typical two-plane query languages. Pivots, aggregations and more are available within these constructs to query and work with data in Analysis Services. You can read more about MDX here: http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx and more about DAX here: http://www.microsoft.com/download/en/details.aspx?id=28572 

HPC Jobs and Tasks - Work submitted to the Windows HPC Server has a particular job - essentially a reservation request for resources. Within a job you can submit tasks, such as parametric sweeps and more. You can learn more about Jobs and Tasks here: http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx 

HiveQL - HiveQL is the language used to query a Hive object running on Hadoop. You can see a tutorial on that process here: http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx 
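
For a feel of what submitting HiveQL looks like, here is a small hedged sketch using the PyHive library from Python (one of several client options; the host, credentials and table names are invented for illustration and are not from the tutorial linked above):

    from pyhive import hive

    # Placeholder cluster details.
    conn = hive.Connection(host="my-hadoop-headnode", port=10000, username="analyst")
    cur = conn.cursor()
    cur.execute("""
        SELECT product, SUM(quantity) AS total_sold
        FROM sales
        GROUP BY product
        ORDER BY total_sold DESC
        LIMIT 10
    """)
    for product, total_sold in cur.fetchall():
        print(product, total_sold)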

Piglatin - Piglatin is the submission language for the Pig implementation on Hadoop. An example of that process is here: http://blogs.msdn.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx 

Application Programming Interfaces - Almost all of the analysis offerings have associated APIs - of special note is Microsoft Research’s Infer.NET, a framework for running Bayesian inference in graphical models, as well as probabilistic programming. You can read more about Infer.NET here: http://research.microsoft.com/en-us/um/cambridge/projects/infernet/ 
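
Infer.NET itself is a .NET library, but the kind of computation it automates can be sketched in a few lines of plain Python. The grid approximation below is only an illustration of Bayesian updating (a Beta-Bernoulli coin example) and uses none of Infer.NET's actual API:

    # Posterior over a coin's bias after observing 7 heads in 10 flips,
    # computed by brute-force grid approximation of Bayes' rule.
    grid = [i / 100 for i in range(101)]                      # candidate values of p
    prior = [1.0] * len(grid)                                 # flat prior
    heads, flips = 7, 10
    likelihood = [p**heads * (1 - p)**(flips - heads) for p in grid]
    unnorm = [pr * lk for pr, lk in zip(prior, likelihood)]
    posterior = [u / sum(unnorm) for u in unnorm]
    best_prob, best_p = max(zip(posterior, grid))
    print("posterior mode near p =", best_p)                  # p = 0.7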

Presentation

Lots of tools work in presenting the data once you have done the primary analysis. In fact, there’s a great video comparing various tools here: http://msbiacademy.com/Lesson.aspx?id=73, primarily focused on Business Intelligence. That term itself is no longer as precisely defined, but the tools I’ll show below can be used in multiple ways - not just traditional Business Intelligence scenarios. Application Programming Interfaces (APIs) can also be used for presentation; but I’ll focus here on “out of the box” tools.

Excel - Microsoft’s Excel can be used not only for single-desk analysis of data sets, but with larger datasets as well. It has interfaces into SQL Server, Analysis Services, can be connected to the PDW, and is a first-class job submission system for the Windows HPC Server. You can watch a video about Excel and big data here: http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f and you can also connect Excel to Hadoop: http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx

Reporting Services - Reporting Services is a SQL Server tool that can query and show data from multiple sources, all at once. It can also be used with Analysis Services. You can read more about Reporting Services here: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx 

Power View - Power View is a “Self-Service” Business Intelligence reporting tool, which can work with on-premises data in addition to SQL Azure and other data. You can read more about it and see videos of Power View in action here: http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx 

SharePoint Services - Microsoft has rolled several capable tools in SharePoint as “Services”. This has the advantage of being able to integrate into the working environment of many companies. You can read more about  lots of these reporting and analytic presentation tools here: http://technet.microsoft.com/en-us/sharepoint/ee692578 

This is by no means an exhaustive list - more capabilities are added all the time to Microsoft’s products, and things will surely shift and merge as time goes on. Expect today’s “Big Data” to be tomorrow’s “Laptop Environment”.


          The newsonomics of “Little Data,” data scientists, and conversion specialists        
OSLO — Arthur Sulzberger surprised some people recently when asked what he would do differently in the digital transition, given hindsight. It wasn’t an off-the-cuff comment. Each FTE is precious at The New York Times and at every other newspaper company these days, and the Times is indeed spending more than most on engineers and...
          Comment on Look out for Hadoop & Data Scientists! by Abhi shah        
very helpful and interesting information.
          Discover 7 new Microsoft MCSA and MCSE certifications         
Microsoft have announced the launch of 6 new MCSA certifications and 1 new MCSE certification. This demonstrates Microsoft’s commitment to a growing Azure, Big Data, Business Intelligence (BI) and Dynamics community.

These new certifications and courses will support Microsoft partners looking to upskill and validate knowledge in these technologies.  

Following the huge changes announced in September, these new launches will simplify your path to certification. They'll minimise the number of steps required to earn a certification, while allowing you to align your skills to industry-recognised areas of competence.

This blog will outline the new certifications Microsoft have announced, focusing on the technologies, skills and job roles they align to. 

So what's new?


MCSA: Microsoft Dynamics 365

This MCSA: Microsoft Dynamics 365 certification is one of three Dynamics 365 certifications launched. It demonstrates your expertise in upgrading, configuring and customising the new Microsoft Dynamics 365 platform.

There are currently no MOCs aligned to this certification. We have developed our own Firebrand material that will prepare you for the following two exams needed to achieve this certification:
  • MB2-715: Microsoft Dynamics 365 customer engagement Online Deployment
  • MB2-716: Microsoft Dynamics 365 Customization and Configuration 
This certification will validate you have the skills for a position as a Dynamics 365 developer, implementation consultant, technical support engineer or system administrator.

This certification is a prerequisite for the MCSE: Business Applications. 

MCSA: Microsoft Dynamics 365 for Operations

The second of these three Dynamics 365 certs is the MCSA: Microsoft Dynamics 365 for Operations. Here, you’ll get the skills to manage a Microsoft SQL Server database and customise Microsoft Dynamics 365.

On this course, you’ll cover the following MOC:
  • 20764: Administering a SQL Database Infrastructure 
The second part of this course, for which there is currently no MOC, will cover Firebrand's own material. 

To achieve this certification you’ll need to pass the following exams:
  • 70-764: Administering a SQL Database Infrastructure
  • MB6-890: Microsoft Dynamics AX Development Introduction 
Earning this cert proves you have the technical competence for positions such as Dynamics 365 developer, solutions architect or implementer.  

Just like the MCSA: Microsoft Dynamics 365, this certification is also a prerequisite to the new MCSE: Business Applications certification. 

MCSE: Business Applications

Earning an MCSE certification validates a more advanced level of knowledge. The MCSE: Business Applications certification proves an expert-level competence in installing, operating and managing Microsoft Dynamics 365 technologies in an enterprise environment.

In order to achieve this certification you’ll be required to pass either the MCSA: Microsoft Dynamics 365 or the MCSA: Microsoft Dynamics 365 for Operations. You’ll also be required to choose one of the following electives to demonstrate expertise on a business-specific area:
  • MB2-717: Microsoft Dynamics 365 for Sales
  • MB2-718: Microsoft Dynamics 365 for Customer Service
  • MB6-892: Microsoft Dynamics AX - Distribution and Trade
  • MB6-893: Microsoft Dynamics AX - Financials  
Earning your MCSE: Business Applications certification will qualify you for roles such as Dynamics 365 developer, implementation consultant, technical support engineer, or system administrator.

MCSA: Big Data Engineering

This MCSA: Big Data Engineering certification demonstrates you have the skills to design and implement big data engineering workflows with the Microsoft cloud ecosystem and Microsoft HD Insight to extract strategic value from your data.

On this course you’ll cover the following MOCs:
  • 20775A: Perform Data Engineering on Microsoft HDInsight – expected 28/6/2017
  • 20776A: Engineering Data with Microsoft Cloud Services – expected 08/2017
And take the following exams:
  • 70-775: Perform Data Engineering on Microsoft HD Insight – available now in beta
  • 70-776: Engineering Data with Microsoft Cloud Services – expected Q1 2018
This course is aimed at data engineers, data architects, data scientists and data developers.

Earning this MCSA acts as a prerequisite, and your first step, to achieving the MCSE: Data Management and Analytics credential.

MCSA: BI Reporting

This MCSA: BI Reporting certification proves your understanding of data analysis using Power BI. You’ll learn the skills to create and manage enterprise business intelligence solutions.

The MOCs you’ll cover on this course include:
  • 20778A: Analyzing Data with Power BI
  • 20768B: Developing SQL Data Models 
In order to achieve the certification, you’ll take the following exams:
  • 70-778: Analyzing Data with Power BI - expected Q1 2018
  • 70-768: Developing SQL Data Models 
This certification is aimed at database professionals needing to create enterprise BI solutions and present data using alternative methods.

This certification is a prerequisite for the MCSE: Data Management and Analytics credential. 

MCSA: Cloud Database Development 

This MCSA: Cloud Database Development certification will prove you have the skills to build and implement NoSQL solutions with DocumentDB and Azure Search for the Azure data platform.

This certification covers the following MOCs:
  • 40441: Designing and Implementing Cloud Data Platform Solutions
  • 20777: Implementing NoSQL Solutions with DocumentDB and Azure Search – expected in August 2017 
In order to achieve the certification, you'll have to pass the following exams: 
  • 70-473: Designing and Implementing Cloud Data Platform Solutions
  • 70-777: Implementing NoSQL Solutions with DocumentDB and Azure Search – expected in Q1 2018
This course is aimed at specialist professionals looking to validate their skills and knowledge of developing NoSQL solutions for the Azure data platform. 

This certification is also a prerequisite certification to the MCSE: Data Management and Analytics credential. 

MCSA: Data Science

This course will teach you the skills to operationalise Microsoft Azure Machine Learning and Big Data with R Server and SQL R Services. You'll learn to process and analyse large data sets using R and use Azure cloud services to build and deploy intelligent solutions.

This certification covers the following MOCs:
  • 20773A: Analyzing Big Data with Microsoft R – in development, expected May 2017
  • 20774A: Perform Cloud Data Science with Azure Machine Learning – in development, expected June 2017
To achieve this certification you’ll be required to pass the following exams:
  • 70-773: Analyzing Big Data with Microsoft R – available now in beta
  • 70-774: Perform Cloud Data Science with Azure Machine Learning – available now in beta 
This certification, which is your first step to the MCSE: Data Management and Analytics cert, is best suited to data science or data analyst job roles. 


          Super Simple Storage for Social Web Data with MongoDB (Computing Twitter Influence, Part 4)        
In the last few posts for this series on computing twitter influence, we’ve reviewed some of the considerations in calculating a base metric for influence and how to acquire the necessary data to begin analysis. This post finishes up all of the prerequisite machinery before the real data science fun begins by introducing MongoDB as a […]
          Blockchain as the Infrastructure for Science? (updated)        
Herbert Van de Sompel pointed me to Lambert Heller's How P2P and blockchains make it easier to work with scientific objects – three hypotheses as an example of the persistent enthusiasm for these technologies as a way of communicating and preserving research, among other things. Another link from Herbert, Chris H. J. Hartgerink's Re-envisioning a future in scholarly communication from this year's IFLA conference, proposes something similar:
Distributing and decentralizing the scholarly communications system is achievable with peer-to-peer (p2p) Internet protocols such as dat and ipfs. Simply put, such p2p networks securely send information across a network of peers but are resilient to nodes being removed or adjusted because they operate in a mesh network. For example, if 20 peers have file X, removing one peer does not affect the availability of the file X. Only if all 20 are removed from the network, file X will become unavailable. Vice versa, if more peers on the network have file X, it is less likely that file X will become unavailable. As such, this would include unlimited redistribution in the scholarly communication system by default, instead of limited redistribution due to copyright as it is now.
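
As a back-of-the-envelope illustration of the redundancy argument in that quote (assuming, quite unrealistically, that peers hold the file and fail independently with the same probability):

    def p_unavailable(peers: int, p_peer_loss: float) -> float:
        # File X is lost only if every peer holding it is gone.
        return p_peer_loss ** peers

    for n in (1, 5, 20):
        print(f"{n:>2} peers -> P(file lost) = {p_unavailable(n, 0.1):.0e}")
    # 1 peer -> 1e-01, 5 peers -> 1e-05, 20 peers -> 1e-20

The rest of this post is essentially about why those assumptions - independent failures, and peers that keep holding the file without being paid to - are the hard part.
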
I first expressed skepticism about this idea three years ago discussing a paper proposing a P2P storage infrastructure called Permacoin. It hasn't taken over the world. [Update: my fellow Sun Microsystems alum Radia Perlman has a broader skeptical look at blockchain technology. I've appended some details.]

I understand the theoretical advantages of peer-to-peer (P2P) technology. But after nearly two decades researching, designing, building, deploying and operating P2P systems I have learned a lot about how hard it is for these theoretical advantages actually to be obtained at scale, in the real world, for the long term. Below the fold, I try to apply these lessons.

For the purpose of this post I will stipulate that the implementations of both the P2P technology and the operating system on which it runs are flawless, and their design contains no vulnerabilities that the bad guys can exploit. Of course, in the real world there will be flaws and vulnerabilities, but discussing their effects on the system would distract from the message of this post.

Heller's three hypotheses are based on the idea of using a P2P storage infrastructure such as IPFS that names objects by their hash (a minimal sketch of such content-addressed naming follows the list):
  • It would be better for researchers to allocate persistent object names than for digital archives to do so. There are a number of problems with this hypothesis. First, it doesn't describe the current situation accurately. Archives such as the Wayback Machine or LOCKSS try hard not to assign names to content they preserve, striving to ensure that it remains accessible via its originally assigned URL, DOI or metadata (such as OpenURL). Second, the names Heller suggests are not assigned by researchers, they are hashes computed from the content. Third, hashes are not persistent over the timescales needed because, as technology improves over time, it becomes possible to create "hash collisions", as we have seen recently with SHA1.
  • From name allocation plus archiving plus x as a “package solution” to an open market of modular services. Heller is correct to point out that:
    The mere allocation of a persistent name does not ensure the long-term accessibility of objects. This is also the case for a P2P file system such as IPFS. ... Since name allocation using IPFS or a blockchain is not necessarily linked to the guarantee of permanent availability, the latter must be offered as a separate service.
    The upside of using hashes as names would be that the existence and location of the archive would be invisible. The downside of using hashes as names is that the archive would be invisible, posing insurmountable business model difficulties for those trying to offer archiving services, and insurmountable management problems for those such as the Keeper's Registry who try to ensure that the objects that should be preserved actually are being preserved. There can't be a viable market in archiving services if the market participants and their products are indistinguishable and accessible freely to all. Especially not if the objects in question are academic papers, which are copyright works.
  • It is possible to make large volumes of data scientifically usable more easily without APIs and central hosts. In an ideal world in which both storage and bandwidth were infinite and free, storing all the world's scientific data in an IPFS-like P2P service backed up by multiple independent archive services would indeed make the data vastly more accessible, useful and persistent than it is now. But we don't live in an ideal world. If this P2P network is to be sustainable for the long term, the peers in the network need a viable business model, to pay for both storage and bandwidth. But they can't charge for access to the data, since that would destroy its usability. They can't charge the researchers for storing their data, since it is generated by research that is funded by term-limited grants. Especially in the current financial environment, they can't charge the researchers' institutions, because they have more immediate funding priorities than allowing other institutions' researchers to access the data in the future for free.
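A minimal sketch of the content-addressed naming these hypotheses rely on (IPFS layers multihash and other encodings on top of this, so treat it only as the core idea):

    import hashlib

    def content_name(obj: bytes) -> str:
        # The name is derived solely from the bytes: same content, same name,
        # on every peer, with no naming authority involved.
        return hashlib.sha256(obj).hexdigest()

    print(content_name(b"dataset release 1.0"))
    print(content_name(b"dataset release 1.1"))   # any change produces a different name

It also shows why such names are only as persistent as the hash function behind them, which is exactly the collision concern raised in the first point.
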
I have identified three major problems with Heller's proposal which also apply to Hartgerink's:
  • They would populate the Web with links to objects that, while initially unique, would over time become non-unique. That is, it would become possible for objects to be corrupted. When the links become vulnerable, they need to be replaced with better hashes. But there is no mechanism for doing so. This is not a theoretical concern, the BitTorrent protocol underlying IPFS has been shown to be vulnerable to SHA1 collisions.
  • The market envisaged, at least for archiving services, does not allow for viable business models, in that the market participants are indistinguishable.
  • Unlike Bitcoin, there is no mechanism for rewarding peers for providing services to the network.
None of these has anything to do with the functioning of the software system. Heller writes:
There is hope that we will see more innovative, reliable and reproducible services in the future, also provided by less privileged players; services that may turn out to be beneficial and inspirational to actors in the scientific community.
I don't agree, especially about "provided by less privileged players". Leave aside that the privileged players in the current system have proven very adept at countering efforts to invade their space, for example by buying up the invaders. There is a much more fundamental problem facing P2P systems.

Four months after the Permacoin post, inspired in part by Natasha Lomas' Techcrunch piece The Server Needs To Die To Save The Internet about the MaidSafe P2P storage network, I wrote Economies of Scale in Peer-to-Peer Networks. This is a detailed explanation of how the increasing returns to scale inherent to technologies in general (and networked systems in particular) affect P2P systems, making it inevitable that they will gradually lose their decentralized nature and the benefits that it provides, such as resistance to some important forms of attack.

Unconfirmed transactions
The history of Bitcoin shows this centralizing effect in practice. It also shows that, even when peers have a viable (if perhaps not sustainable) business model, based in Bitcoin's case on financial speculation, Chinese flight capital and crime such as ransomware, resources do not magically appear to satisfy demand.

As I write, about 100MB of transactions are waiting to be confirmed. A week and a half ago, Izabella Kaminska reported that there were over 200,000 transactions in the queue. At around 5 transactions/sec, that's around an 11-hour backlog. Right now, the number is about half that. How much less likely are resources to become available to satisfy demand if the peers lack a viable business model?
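
The backlog arithmetic is easy to check; the 5 transactions/sec figure is the commonly quoted ballpark for Bitcoin's on-chain throughput:

    queued_tx = 200_000          # the queue Kaminska reported
    throughput_tx_s = 5          # rough on-chain capacity, transactions per second
    print(queued_tx / throughput_tx_s / 3600, "hours")   # ~11.1 hours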

Because Bitcoin has a lot of peers and speculation has driven its value sky-high, it is easy to assume that it is a successful technology. Clearly, it is very successful along some axes. Along others, not so much. For example, Kaminska writes:
The views of one trader:
... This is the biggest problem with bitcoin, it’s not just that it’s expensive to transact, it’s uncertain to transact. It’s hard to know if you’ve put enough of a fee. So if you significantly over pay to get in, even then it’s not guaranteed. There are a lot of people who don’t know how to set their fees, and it takes hours to confirm transactions. It’s a bad system and no one has any solutions.
Transactions which fail to get the attention of miners sit in limbo until they drop out. But the suspended state leaves payers entirely helpless. They can’t risk resending the transaction, in case the original one does clear eventually. They can’t recall the original one either. Our source says he’s had a significant sized transaction waiting to be settled for two weeks.

The heart of the problem is game theoretical. Users may not know it but they’re participating in what amounts to a continuous blind auction.

Legacy fees can provide clues to what fees will get your transactions done — and websites are popping up which attempt to offer clarity on that front — but there’s no guarantee that the state of the last block is equivalent to the next one.
Source
Right now, if you want a median-sized transaction in the next block you're advised to bid nearly $3. The uncertainty is problematic for large transactions and the cost is prohibitive for small ones. Kaminska points out that the irony is:
given bitcoin’s decentralised and real-time settlement obsession, ... how the market structure has evolved to minimise the cost of transaction.

Traders, dealers, wallet and bitcoin payments services get around transaction settlement choke points and fees by netting transactions off-blockchain.

This over time has created a situation where the majority of small-scale payments are not processed on the bitcoin blockchain at all. To the contrary, intermediaries operate for the most part as trusted third parties settling netted sums as and when it becomes cost effective to do so. ... All of which proves bitcoin is anything but a cheap or competitive system. With great irony, it is turning into a premium service only cost effective for those who can’t — for some reason, ahem — use the official system.
There's no guarantee that the axes on which Bitcoin succeeded are those relevant to other blockchain uses; the ones on which it is failing may well be. Among the blockchain's most hyped attributes were the lack of a need for trust, and the lack of a single point of failure. Another of Kaminska's posts:
Coinbase has been intermittently down for at least two days.

With an unprecedented amount of leverage in the bitcoin and altcoin market, a runaway rally that doesn’t seem to know when to stop, the biggest exchange still not facilitating dollar withdrawals and incremental reports about other exchanges encountering service disruption, it could just be there’s more to this than first meets the eye.

(Remember from 2008 how liquidity issues tend to cause a spike in the currency that’s in hot demand?)
These problems illustrate the difficulty of actually providing the theoretical advantages of a P2P technology "at scale, in the real world, for the long term".

Update: In Blockchain: Hype or Hope? Radia Perlman provides a succinct overview of blockchain technology, asks what is novel about it, and argues that the only feature of the blockchain that cannot be provided at much lower cost by preexisting technology is:
a ledger agreed upon by consensus of thousands of anonymous entities, none of which can be held responsible or be shut down by some malevolent government
But, as she points out:
most applications would not require or even want this property. And, as demonstrated by the Bitcoin community's reaction to forks, there really are a few people in charge who can control the system
She doesn't point out that, in order to make money, the "thousands of ... entities" are forced to cooperate in pools, so that in practice the system isn't very decentralized, and the "anonymous entities" are much less anonymous than they would like to believe (see here and here).

Radia's article is a must-read corrective to the blockchain hype. Alas, although I have it in my print copy of Usenix ;login:, it doesn't appear to be on the Usenix website yet, and even when it is it will only be available to members for a year. I've made a note to post about it again when it is available.


          Data Scientist // Grover        

Grover is a fresh alternative to owning things – giving people better ways to live, work and play using the latest tech products, by simply subscribing to them monthly. About your role: You will be working in the very advanced Data & Analytics team on the state-of-the-art Company Artificial Intelligence System. You […]

Check out all open positions at http://BerlinStartupJobs.com


          CHP Dean’s Forum: Vulnerable Population, Aging and Data Science        
CHP Student Orientation
          David Donoho’s “Fifty Years of Data Science”, Sep. 2015        
https://dl.dropboxusercontent.com/u/23421017/50YearsDataScience.pdf
          Ep. #3, The Future of Data        

In the latest Data Science Storytime, Kyle and Kevin imagine what the future will look like where more companies take a cue from Google and Facebook and start counting & measuring everything. Hear the pair discuss why there is so much potential in companies like Tesla and why all the deep learning buzz might not be all it's cracked up to be.

The post Ep. #3, The Future of Data appeared first on Heavybit.


          Ep. #2, The Pre-Keen Years        

Keen IO's Kevin Wofsy and Kyle Wild return for the second installment of Data Science Storytime. Kyle tells the tale of his early, Tom Sawyeresque business ventures. From selling the family's groceries at school, to paying kids to do his chores (with money from a bootleg video game website), entrepreneurship took many forms for the young Mr. Wild. But it wasn't always smooth sailing. Hear what life lessons were reluctantly learned by Kyle before Keen.

The post Ep. #2, The Pre-Keen Years appeared first on Heavybit.


          Ep. #1, The Fandom Menace        

In the debut of Data Science Storytime, Kevin Wofsy and Kyle Wild brainstorm the concept of the show, debate the difference between data science and non-data science, and recount the story of the action-hero data scientist who skipped a meeting with Kyle to rescue a little girl trapped on a mountain (or so he assumes).

The post Ep. #1, The Fandom Menace appeared first on Heavybit.


          Portland’s Precrime Experiment and the Limits of Algorithms        

Using data science to predict where crime might occur is problematic. When we have unfair metrics, we develop unfair algorithms.

Portland’s Precrime Experiment and the Limits of Algorithms was originally published on Lawyerist.com.


          Data Scientists Hyderabad        
Data Scientist online and classroom training offered by faculty with 13+ years of experience. Preferred by most students in Hyderabad and located at Ameerpet. Easy-to-understand hard and soft copies of quality study material are provided during the course.
          Case studies in error analysis and new product forecasting        

My colleague Gerhard Svolba (Solutions Architect at SAS Austria) has authored his third book, "Applying Data Science: Business Case Studies Using SAS®." While the book covers a broad range of data science topics, forecasters will be particularly interested in two lengthy case studies on "Explaining Forecast Errors and Deviations" and [...]

The post Case studies in error analysis and new product forecasting appeared first on The Business Forecasting Deal.


          Canberra .NET user group: An Introduction to Data Science on Azure (April 18)        
Pretty cool looking free talk at the next .NET UG in Canberra https://www.ssw.com.au/ssw/NETUG/Canberra.aspx
          AUA Students Present Their Projects at TUMO        
On July 4, a group of students from the American University of Armenia (AUA) visited TUMO Center for Creative Technologies (TUMO) and presented the projects they had prepared during the spring semester as a part of their course curriculum. As part of the CS252/342 course “Data Science with R,” students were instructed to scrape data off the web from a resource of their choice, visualize and analyze that data, and present their findings as their final projects.
          Data Science Workshop Organized by Zaven & Sonia Akian College of Science and Engineering        
On May 22-23, the American University of Armenia (AUA) hosted its inaugural Data Science Workshop with the participation of around 24 local scientists.
          Data Scientist: How to Get Value from Corporate Data        

Data Scientist: how to get value from data. The art of exploiting the asset of Big Data as a source for experimenting with and innovating brands' business models and fostering the creation of added value. Analysis and reflections on the emerging figure of the Data Scientist. An article full of information and curiosities, put together with practical and relevant support […]

The article Data Scientist: How to Get Value from Corporate Data appeared first on B2corporate.


          Application Scientist - Text Mining Software        
Penny Warren Recruitment - Cambridge - Our Client is a leading provider of text mining solutions with a current emphasis on high-value life science, chemistry and biomedical... data science and/or information management skills, familiarity with text mining. Salary: DOE. Location: Cambridge, UK. Our Client is improving...
          By: Krishna Sankar        
Yep, excellent set of topics. We have moved past the 3Vs of Big Data to the Three Amigos of Big Data - Interface, Intelligence & Inference. http://goo.gl/NqOWQ A couple of observations: Wearable Devices are missing from the list. Deep Learning definitely is an emerging topic, something close to my heart. I think an Analytic Sandbox supported by a Data Landing Zone is more accurate than convergence of databases. Lastly, the larger picture is the Big Data pipeline spanning Data Management & Data Science, viz. Collect-Store-Transform-Model-Reason-Visualize/Predict/Recommend-Explore. Cheers
          Morning Session: Can Librarians Help Legal Organizations Become More Data Driven?        
In the morning session, Professor Dan Katz will delve into data science, a growing field within the legal industry. As organizations within the legal industry attempt to become more data driven, the skills held by law librarians are well suited to help law firms, corporate legal departments, courts, and non-profits craft and execute a legal […]
          Infosys Shows Love For Data Science, Acquires Stakes In US Based Waterline For $4 Million        
Infosys is continuing its strategy of investing in promising overseas tech startups. This time, the company has expressed its love for cutting-edge technologies by pumping $4 million into US-based data science startup Waterline Data Science. The investment looks to have been made after considerable thought. Waterline Data Science […]
          Mystery of Longevity: Clue from DNA - Juvenon Health Journal        
The more data scientists generate about what makes us grow old, the clearer one thing becomes: aging is complicated.
          Harvard, Stanford, UC Berkeley Take Center Stage at Upcoming HIMSS Big Data & Healthcare Analytics Forum May 15-16        

HIMSS Gathers Top Data Scientists and Healthcare Pros To Talk Putting Data To Work to Reduce Costs and Improve Patient Care

(PRWeb May 02, 2017)

Read the full story at http://www.prweb.com/releases/2017/05/prweb14294220.htm


          LinkedIn Joins CSU, Other Key Northeast Ohio Stakeholders to Analyze Vital Health IT Talent Data        
Project is a tactic within HIT in the CLE Regional Talent Initiative

*Release via the BioEnterprise

LinkedIn, which operates the world’s largest professional network on the Internet with more than 500 million members in over 200 countries and territories, has teamed up with BioEnterprise, the City of Cleveland, Cleveland State University and Cuyahoga County to provide data, analysis, and market research on the talent flows of software developers, data scientists, data analysts and other computer science professionals within the Northeast Ohio health IT sector. Supported by the Cleveland Foundation, the analysis will ultimately inform policy, educational curriculum, community programming and other talent alignment strategies within this regional growth sector.

The bioscience cluster is a primary growth engine reviving the Northeast Ohio economy. Within the bioscience cluster, the health IT industry is flourishing, creating hundreds of new jobs each year. However, an acute shortage of qualified local talent is a major barrier to growth.

“One of the critical limiting factors to growth in Northeast Ohio’s bioscience industry today is the availability of health IT talent,” explained Aram Nerpouni, BioEnterprise President and CEO. “Thriving health IT companies are hindered by the dearth of software developers and data scientists. The LinkedIn project should provide meaningful data and analysis to inform how we address this challenge.”

With the support of the Cleveland Foundation, BioEnterprise launched HIT in the CLE in 2015 to address the regional computer science and data science talent gap. The Initiative aims to grow and diversify the Northeast Ohio health IT talent pipeline to support a vibrant health IT industry.

“We felt it was crucial to partner with BioEnterprise to begin addressing the demand-supply gap in health IT and to deeply engage businesses to expand the talent pipeline,” said Shilpa Kedar, Cleveland Foundation Program Director for Economic Development. “LinkedIn’s involvement with HIT in the CLE is a tremendous win for the region and we anticipate that this work will prove to be extraordinarily beneficial.”

The LinkedIn project is an important tactic within the larger HIT in the CLE talent strategy. The effort aspires to provide insights into the education and experience of people currently employed in the regional health IT sector, pathways for securing regional health IT positions, and institutions from which the local sector most successfully attracts qualified talent. The insights discovered through the analysis may surface gaps and barriers in the local health IT talent pipeline and help inform strategy for addressing these important talent issues.

“At LinkedIn, our vision is to create economic opportunity for every worker,” said LinkedIn U.S. Head of Policy Nicole Isaac. “We’re excited to use the Economic Graph – a digital map of the global economy that when complete will include every member of the global workforce and their skills, all open jobs, all employers, and all educational institutions – to provide the City of Cleveland with a more holistic view of the computer science and data science skills local employers need, the skills its workers have and the disconnect between the two. The City can use those insights to create a stronger IT talent pipeline, and grow its IT industry.”

“Making our workforce a competitive advantage, which includes understanding our gaps as well as opportunities is a crucial strategic focus,” said Cuyahoga County Executive Armond Budish. “We know that the strength of our healthcare is a great advantage and we believe that the bioscience cluster will drive a lot of our job growth in the coming years. LinkedIn’s contribution to help inform and accelerate that growth is a welcome addition to the HIT in the CLE effort.”

Data provided by BioEnterprise and LinkedIn will be pulled throughout the summer. Ongoing analysis will take place through the summer and findings are expected in the fall.

“Cleveland is a City with a growing health research and information technology economy from the unseen power of the 100 gig fiber network along the Health-Tech Corridor and the health care research institutions within our community,” said Mayor Frank G. Jackson. “I welcome the opportunity for the City of Cleveland to collaborate with LinkedIn to provide research and data on the talent that is relocating to Cleveland and drawing talent to join the workforce here.”


          Embrace R @ SQL Nexus 2017 & SQL Saturday #626        

R is the hottest topic in SQL Server 2016. If you want to learn how to use it for advanced analytics, join my seminar at the SQL Nexus conference on May 1st in Copenhagen. Although there is still nearly a month before the seminar, fewer than half of the places are still available. You are also very welcome to visit my session Using R in SQL Server, Power BI, and Azure ML during the main conference.

For beginners, I have another session in the same week, just this time in Budapest. You can join me at the Introducing R session on May 6th at SQL Saturday #626 Budapest.

Here is the description of the seminar.

As an open-source development, R is the most popular analytical engine and programming language among data scientists worldwide. The number of libraries with new analytical functions is enormous and continuously growing. However, there are also some drawbacks. R is a programming language, so you have to learn it to use it. Open-source development also means less control over the code. Finally, the free R engine is not scalable.

Microsoft added support for R code in SQL Server 2016, in Azure Machine Learning (Azure ML), and in Power BI. A parallelized, highly scalable execution engine is used to execute the R scripts. In addition, not every library is allowed in these environments.

Attendees of this seminar learn to program with R from scratch. Basic R code is introduced using the free R engine and the RStudio IDE. The seminar then shows more advanced data manipulation, matrix calculations and statistical analysis, together with graphing options. The mathematics behind them is briefly explained as well. The seminar then switches to more advanced data mining and machine learning analyses. Attendees also learn how to use R code in SQL Server and Azure ML, and how to create SQL Server Reporting Services (SSRS) reports that use R.

The seminar consists of the following modules:
  • Introduction to R
  • Data overview and manipulation
  • Basic and advanced visualizations
  • Data mining and machine learning methods
  • Scalable R in SQL Server
  • Using R in SSRS, Power BI, and Azure ML

Hope to see you there!


          Data Scientist/Quantitative Analyst, Engineering - Google - Mountain View, CA        
(e.g., as a statistician / data scientist / computational biologist / bioinformatician). 4 years of relevant work experience (e.g., as a statistician /...
From Google - Sat, 05 Aug 2017 09:55:57 GMT - View all Mountain View, CA jobs
          Eckerson Group Profiles Top Eight Innovations In Data Science        

Alpine Data, DataRobot, Domino Data Lab, FICO, Informatica, Nutonian, RapidMiner, and SAS are selected for their innovations in data science

(PRWeb August 08, 2017)

Read the full story at http://www.prweb.com/releases/2017/08/prweb14560218.htm


          SQLDay 2017 Conference Summary        

On May 15-17, 2017, the SQLDay 2017 conference took place at the Wrocław Conference Centre, organized by the SQL Server Users Group PLSSUG (now renamed Data Community).

More than 800 people took part in the event, and several dozen speakers delivered over 60 technical sessions on Microsoft Data Platform solutions.
The speakers included lecturers from the WSZiB postgraduate programs: Tomasz Libera, Michał Sadowski, Grzegorz Stolecki and Marcin Szeliga.

A novelty at this year's conference, the ninth SQLDay, was the APPLIED DATA SCIENCE research track. Our university acted as a patron of the event, and Dean Dr. Bartosz Banduła sat on the Program Committee alongside representatives of several other universities, including Poznań University of Technology, Lublin University of Technology, the AGH University of Science and Technology and the Jagiellonian University.

As in previous years, WSZiB students worked as volunteers during the conference.

WSZiB supports the Data Community organization by providing lecture rooms for its monthly meetings, during which participants share their ideas.

Conference website:
http://www.sqlday.pl

Research track:
http://science.sqlday.pl


           The Emerging Battleground in the Analytics & Business Intelligence Market         
A new battleground is emerging in the analytics and business intelligence (BI) market over embedded data science and smart data discovery.  In this new research, "Market Opportunity Map: Analytics and...
          Four Themes From the Visualized Conference        

The first Visualized conference was held in mid-town Manhattan last week. Even with Sandy and a nor’easter, the conference went off with only a few minor hiccups. The idea behind Visualized is a TED-like objective of exploring the intersection of … Continue reading

The post Four Themes From the Visualized Conference appeared first on Gnip Blog - Social Data and Data Science Blog.


          Cleaning Big Data the Easy Way        

Cleaning big data usually invokes big stress levels. With the advancements we’ve seen in technology over the years, many industries have been transformed from the very core. For academic researchers and data scientists, data has become more expansive and detailed than ever before. However, this has also affected business analysts and millions of others in […]

The post Cleaning Big Data the Easy Way appeared first on Dataladder.


          So you want to be a data scientist…        

There’s no doubt that data scientists are in demand, and have been for some time. As far back as 2012, the Harvard Business Review described data scientist as the sexiest job in the 21st century. And while, for some at least, it may conjure up images of geeks in glasses, staring at endless streams of […]

The post So you want to be a data scientist… appeared first on Dataladder.


          [raspberry-python] Readings in Programming        

"Ex-Libris" part IV: Code


I've made available part 4 of my "ex-libris" of a Data Scientist. This one is about code. 

No doubt, many have been waiting for the list that is most related to Python. In a recent poll by KDnuggets, the top tool used for analytics, data science and machine learning by respondents turned out to also be a programming language: Python.

The article goes from algorithms and theory, to approaches, to the top languages for data science, and more. In all, almost 80 books in just that part 4 alone. It can be found on LinkedIn:

"ex-libris" of a Data Scientist - Part IV

from Algorithms and Automatic Computing Machines by B. A. Trakhtenbrot




See also


Part I was on "data and databases": "ex-libris" of a Data Scientist - Part I

Part II, was on "models": "ex-libris" of a Data Scientist - Part II



Part III, was on "technology": "ex-libris" of a Data Scientist - Part III

Part V will be on visualization, and Part VI on communication. A bonus after that will be on management/leadership.

Francois Dion
@f_dion

P.S.
I will also have a list of publications in French.
In the near future I will make a list in Spanish as well.

          Google Is Matching Your Offline Buying With Its Online Ads, but It Isn’t Sharing How        

The Federal Trade Commission received a complaint Monday from privacy advocates requesting a full investigation into a new advertising scheme from Google that links individuals’ online browsing data and what they buy offline in stores.

The privacy group that launched the federal complaint, the Electronic Privacy Information Center, alleges that Google is using credit card data to track whether online ads lead to in-store purchases without providing an easy opt-out or clear information about how the system works. The complaint specifically calls out a new advertising program Google unveiled in May that reportedly relies on billions of credit card records, which are matched to data on what ads people click on when logged into Google services.

The ability to link online ads to actual in-store purchases is often described as the “holy grail” of data-driven advertising, according to David Carroll, a professor at the New School who studies the online data tracking industry.

Google says it can’t disclose which companies it works with to get customers’ offline shopping records because of confidentiality agreements it has with those partners. So at the moment, the only way for a Google user to prevent his or her offline purchasing history from being linked to their web browsing is to opt out of Google’s web and app tracking entirely, which could make it nigh-impossible to use other Google services.

If Google did share the names of its partners in its offline ad-tracking program, customers could presumably stop using those services.  There are plenty of reasons why a person wouldn’t want their offline purchasing data to mingle with their online accounts. What you buy at a drug store alone can point to health concerns, sexual history, or other personal information that you may want to keep to yourself.

But Google says not to worry about that information seeping out, since it “does not learn what was actually purchased by any individual person (either the product or the amount). We just learn the number of transactions and total value of all purchases in a time period, aggregated to protect privacy,” a spokesperson said in an email. In other words, Google is saying that the advertiser doesn’t learn who clicked on their ads, just how many of those clicks translated to offline sales.

But even if the data is anonymized by both the credit card payment data holders and Google, those in-store linkages are not truly anonymous, despite what companies claim, according to Chris Hoofnagle, a law professor at Berkeley who specializes in data privacy.

“There’s a long history to this,” Hoofnagle said. Ten years ago, a digital advertiser industry group, the Data and Marketing Association, argued that phone numbers were not personally identifiable information since one number is usually shared within a single household linked to multiple individuals. That logic is being recycled. Hoofnagle says that digital marketers’ “new trick is to take personally identifiable information and hash it.” That means the personal data is replaced with a scrambled token. “That would be fine,” Hoofnagle continued, “but everyone uses the same hashes, and so these hashes are essentially pseudonyms.” Or, as Wolfie Christl, a digital privacy researcher and author of the book Networks of Control, explains in a recent report, data companies generally use the same hashing method. If everyone is masked with the same pseudonym process, it's easy to track that pseudonym across the internet.
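To make the pseudonym problem concrete, here is a minimal Python sketch (the email address and the two record stores are invented for illustration): when two parties hash the same identifier with the same unsalted function, the resulting tokens match exactly, so the "anonymous" records can be joined like any other keyed data.

```python
import hashlib

def pseudonymize(email: str) -> str:
    # The common pattern: normalize the identifier, then apply an unsalted hash.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# Two hypothetical companies "anonymize" their records independently.
store_purchases = {pseudonymize("alice@example.com"): {"in_store_total": 120.50}}
ad_clicks = {pseudonymize("Alice@Example.com "): {"ads_clicked": 7}}

# Same input, same hash, same token: the datasets join on the pseudonym.
for token, purchase in store_purchases.items():
    if token in ad_clicks:
        print(token[:12], purchase, ad_clicks[token])
```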

Just last week, at the annual hacker conference Defcon in Las Vegas, a journalist and a data scientist shared how they were able to obtain a database tracking 3 million German users’ browsing history, spanning 9 million different websites. The data set was said to be anonymized, but the team was able to de-anonymize many of the users, according to a report in the Guardian. For some users, the researchers could read the identity straight out of the browsing history itself. For instance, a Twitter analytics page contains a URL with the username in it, so checking to see whether a tweet went viral could give away your identity in “anonymous” browsing data.
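The Twitter analytics example can be sketched in a few lines of Python (the token and URLs below are made up): if a visited URL embeds an account name, whoever holds the "anonymous" clickstream can read an identity straight out of it.

```python
import re

# Hypothetical rows from an "anonymized" clickstream: (user_token, visited_url)
browsing_history = [
    ("user_8431", "https://analytics.twitter.com/user/some_journalist/tweets"),
    ("user_8431", "https://www.example.com/news/article-123"),
    ("user_2207", "https://www.example.com/weather"),
]

# A Twitter analytics URL carries the account name in its path.
pattern = re.compile(r"analytics\.twitter\.com/user/([^/]+)/")

for token, url in browsing_history:
    match = pattern.search(url)
    if match:
        print(f"{token} is probably the Twitter user @{match.group(1)}")
```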

EPIC’s complaint also points out that Google isn’t sharing enough detail about how it’s encrypting the data. The complaint alleges Google uses a type of encryption, CryptDB, that has known security flaws. While it’s unclear whether Google’s offline-to-online ad-tracking system uses CryptDB, Google has not shared details on the math and software that it’s using to implement its encryption.

“We don’t know a lot about how this is implemented,” said Joseph Lorenzo Hall, a technologist with the Center for Democracy and Technology, which is in part funded by Google. Hall says that typically Google would publish a white paper or some further explanation of how its encryption works.

Google also wouldn’t clarify whether users consent to having their web browsing linked to their offline purchase history, but a spokesperson did say that their “payment partners have the rights necessary to use this data.”

Carroll of the New School says that Google’s ad practices here can be manipulative. “Google is in the market of predicting consumer behavior and commoditizing our behavior at scale,” said Carroll. “We don’t know how it works. We don’t know how they are protecting us.”

Even if Google is able to anonymize its ad data, it should still make it easier for people to opt out of linking their browsing history to their offline shopping. Right now, you have to navigate to the privacy settings of your account and then find the Activity Controls page.

It’s not super intuitive to find, but then again, Google, which is in the business of selling ads, would probably prefer you keep your personal data as accessible as possible.


          Venkatesh Saligrama        
Our webpage has moved to http://sites.bu.edu/data/. I run the Data Science & Machine Learning laboratory at Boston University. The lab is involved in projects related to Machine Learning, Vision & Learning, Structured Signal Processing, and Decision & Control. The laboratory is led by Prof. Venkatesh Saligrama. In the area of machine learning, recent research projects […]
          Data Scientist - Maths Modelling, Python, Forecasting, Machine Learning techniques, Cambridge, to 45k DoE: ECM SELECTION        
£Negotiable: ECM SELECTION
For more latest jobs and jobs in London & South East England visit brightrecruits.com
          CPS launches new training program in data science        
The College of Professional Studies is introducing a six-month program to teach data science and analytical skills to working professionals.
          Data-Driven Interactive Scientific Articles in a Collaborative Environment With Authorea        
Monday, January 23, 2017, 12:00 noon – 1:30 p.m. UC Davis, Shields Library, Data Science Initiative space, 3rd Floor. Lunch served. RSVP here. Most tools that scientists use for the preparation of scholarly manuscripts, such as Overleaf and ShareLaTeX, function offline and do not account for the born-digital nature of research objects. Authorea allows scientists to collaboratively … Continue reading Data-Driven Interactive Scientific Articles in a Collaborative Environment With Authorea
          New Computer and Math Camps for High School Students Offered at Ramapo College        
MAHWAH, N.J. – Ramapo College’s Center for Innovative & Professional Learning is offering a series of new computer and math camps for high school students this summer. These summer academic enrichment programs include CompTIA A+ Computer Camp, Game Design for Teens, Programming/Data Science Camp, Numerical Analysis Camp, and the online 3-credit Web Site Development course. Information […]
          Cloudera’s Data Science Workbench        
0. Matt Brandwein of Cloudera briefed me on the new Cloudera Data Science Workbench. The problem it purports to solve is: One way to do data science is to repeatedly jump through the hoops of working with a properly-secured Hadoop cluster. This is difficult. Another way is to extract data from a Hadoop cluster onto […]
          Seven new academic programs coming to campus this fall        
You might already know that IUPUI offers more than 350 undergraduate, graduate and professional programs. And come this fall, there will be a few more. Here’s a look at seven new academic programs from a variety of schools across campus:

Ph.D. in data science, School of Informatics and Computing: This degree, the first of its kind in Indiana and in the Big Ten and one of only a handful in the United States, leads to positions in academia as well as in industry. In fact, Glassdoor, a job and employment-recruiting website, ranks data scientist as the No. 1 job in America based on the number of job openings, salary and overall job-satisfaction rating. According to Glassdoor, the median base salary for a data scientist is $116,840. The field of data science involves collection, organization, management and extraction of knowledge and insights from massive, complex, heterogeneous data sets commonly known as "big data."

Ph.D. in American studies, School of Liberal Arts: This nontraditional doctoral program looks to recruit students interested in exploring issues through a multidisciplinary approach, drawing on courses already being offered across the School of Liberal Arts. A doctoral internship of at least a year will help students translate their research into a variety of careers. "The Ph.D. program in American studies at IUPUI does not tweak the traditional Ph.D. model, but rather builds an infrastructure for a collaborative and applied graduate school experience in order to close the distance between academia and the world that surrounds it," said Raymond Haberski Jr., professor of history and director of American studies.

Graduate minor in communicating science, Department of Communication Studies, School of Liberal Arts: Scientists and health professionals today need to connect to and engage with the lay public, policymakers, funders, students and professionals from other disciplines. As a result, they find the need to tailor their communication for a variety of audiences. This program, designed for future scientists, including researchers and practitioners who find themselves increasingly responsible for public speaking and writing, will increase students’ career prospects, help them secure funding and help them serve as effective teachers. "The courses will offer more than public speaking and writing tips," said Krista Hoffmann-Longtin, assistant professor of communication studies in the School of Liberal Arts and assistant dean for faculty affairs and professional development in the School of Medicine. "Scientists will learn to improvise messages; to tell relevant stories; and to connect effectively with students, collaborators and funders."

Liberal arts and management certificate, School of Liberal Arts: A 2013 study suggests that a liberal arts degree coupled with other skills can nearly double job prospects when those skills include marketing, business, data analysis and management, just to name a few. "This certificate offers a course of study from both liberal arts and business to better prepare the 21st-century liberal arts graduate to respond to the challenges of a more complex world," said Kristy Sheeler, associate dean for academic programs in the School of Liberal Arts and a professor in the Department of Communication Studies. Contact Sheeler with questions about this new program.

Doctor of public health in global health leadership, Richard M. Fairbanks School of Public Health: The school already knows what some students in this new program will do when they graduate: They’ll become state health commissioners; ministers of health; program officers; and mid- to senior-level managers in government agencies, foundations, nonprofits and nongovernmental organizations. That’s based on experiences of a similar program at the University of North Carolina at Chapel Hill. The person who helped design and lead that program is now at IUPUI: Sue Babich, associate dean of global health, director of the doctoral program in global health leadership, and professor of health policy and management. The degree prepares students to be leaders who can address the world’s challenging and complex public health issues. The three-year degree is a distance program, with classes delivered in real time via internet video. Students meet face-to-face three times each year in years one and two, and they complete dissertations in year three.

Master of Science degree in product stewardship, Richard M. Fairbanks School of Public Health: The only academic degree available today designed to prepare students for leadership roles in the emerging field of product stewardship will train professionals to help businesses in a wide range of industrial fields navigate increasingly complex regulations as they advocate for the production of products in ways that ease regulatory compliance, minimize risks to people and the environment, and boost profitability. The online 30-credit-hour degree is expected to attract, among others, professionals who are already active in the product-stewardship field seeking formal training that will allow them to move up in their product-stewardship organizations, and professionals from a wide range of other backgrounds, including environmental health, regulatory compliance, industrial hygiene, occupational health and safety, sustainability, product development, supply chain, and law.

Master of Arts in teaching English to speakers of other languages (TESOL), Department of English, School of Liberal Arts: This 31-credit-hour degree provides both a strong theoretical foundation and hands-on practical experience to prepare national and international graduate students to become effective teachers of English to adult learners who speak other native languages, both in the United States and abroad. Working with IUPUI’s award-winning faculty, students will experience rich opportunities in teaching practica, including not only English for academic purposes but also English for specific purposes, for example academic, legal, business and medical English. The program features a unique curricular strength in second-language research, materials preparation, curriculum design and the use of technology in second-language learning. "It is thrilling to be able to launch the Master of Arts in TESOL at IUPUI," said Ulla Connor, director of the program. "This program is the culmination of TESOL and applied linguistics programming in the Department of English at IUPUI over the past 30 years. Our previous programs include the English for Academic Purposes Program for international students, which began in 1985; the International Center for Intercultural Communication, which started in 1998; and the Program for Intensive English that we began in 2015."
          Size Matters: Empirical Evidence of the Importance of Training Set Size in Machine Learning        
There is much hype around "big data" these days - how it's going to change the world - which is causing data scientists to get excited about big data analytics, and technologists to scramble to understand how to employ scalable, distributed databases and compute clusters to store and process all this data. Interestingly, Gartner dropped "big [...]
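The excerpt cuts off before the evidence itself, but the underlying experiment is easy to reproduce in outline: train the same model on progressively larger slices of the training data and watch held-out accuracy change. A minimal scikit-learn sketch on synthetic data (the dataset and sizes are illustrative only, not the study's):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a real corpus; only the trend matters here.
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train on ever-larger slices of the training set and score on the same test set.
for n in (100, 1000, 5000, len(X_train)):
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"training examples: {n:>6}  held-out accuracy: {acc:.3f}")
```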
          Oracle Hospitality Introduces New Data Science Cloud Services to Help Food & Beverage Operators Optimize Every Sale        
Press Release

Oracle Hospitality Introduces New Data Science Cloud Services to Help Food & Beverage Operators Optimize Every Sale

Expert Analysis and Machine Learning Enable Improved Menu Optimization and Greater Cost and Inventory Control

Redwood Shores, Calif.—Aug 1, 2017


Empowering food and beverage operators to convert data into profit, Oracle Hospitality today announced Data Science Cloud Services. With the new services, food and beverage operators gain the ability to analyze key information such as sales, guest, marketing and staff performance data at unprecedented speed – generating insights that lead directly to actionable measures that improve their top and bottom lines.

The suite includes two cloud-driven offerings – Oracle Hospitality Menu Recommendations Cloud Service and Oracle Hospitality Adaptive Forecasts Cloud Service – currently available to operators worldwide, enabling them to improve up-sell and cross-sell opportunities, and optimize operations, respectively.

The new Data Science Cloud Services bring Oracle’s renowned machine learning and data-analytics expertise specifically to the food and beverage industry. This, combined with years of hospitality industry knowledge, delivers quick wins for operators, while saving them the significant expense of having to hire their own analysts and invest in a data processing infrastructure. In addition to Oracle technology, Data Science delivers the support of a team of leading data scientists, database engineers and experienced hospitality consultants. 

“Margins are being squeezed in hospitality like never before,” said Mike Webster, Senior Vice President and General Manager, Oracle Hospitality. “Labor and food costs are increasing, and competition for the dining dollar is high. With our Data Science Cloud Services, we are giving our customers the ability to be as profitable as possible, by helping them pinpoint cost-savings in each location while optimizing every single sales opportunity to deliver revenue growth.”

Making Every Sale Count with Oracle Hospitality Menu Recommendations Cloud Service

Oracle Hospitality Menu Recommendations Cloud Service allows food and beverage operators with multiple locations to evaluate their menus and identify enhancements to maximize every sales opportunity. The Data Science service can seek the best possible up-sell or cross-sell options by location or time of day, with recommendations dynamically updating based on customer behavior. Assumptions around cross-sells and up-sells can be analyzed, leading to better understanding of guest behavior and preferences.  

Speed to value is accelerated, thanks to integration between the Data Science service and the Oracle Hospitality technology platform. Recommendations are available at point-of-service terminals and displayed as localized cross-sells or timed up-sells. Such simplicity enables staff to optimize sales and serve guests without delay or confusion.

Predicting Stock and Labor Needs with Oracle Hospitality Adaptive Forecasts Cloud Service

Oracle Hospitality Adaptive Forecasts lets operators better predict stock and labor needs at every location. The service creates a single forecast by item, location and day part, and factors in weather, events, time of day, day of the week and Net Promoter scores. Such forecasting maintains appropriate levels of inventory and staffing in all business scenarios, helping store managers minimize wasted inventory, lower labor costs and, most importantly, ensure an exceptional guest experience.
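Oracle has not published the internals of Adaptive Forecasts, but the feature set described here (item, location, day part, weather, events, day of week) maps naturally onto an ordinary supervised forecasting setup. A rough sketch of that idea in Python, purely illustrative and not Oracle's implementation:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical sales history by item, location and day part, with the kinds of
# contextual signals the release mentions (weather, events, day of week).
history = pd.DataFrame({
    "item_id":     [1, 1, 2, 2, 1, 2],
    "location_id": [10, 10, 10, 11, 11, 11],
    "day_part":    [0, 1, 0, 1, 0, 1],      # 0 = lunch, 1 = dinner
    "day_of_week": [0, 0, 1, 1, 5, 5],
    "temperature": [18.0, 18.0, 22.5, 22.5, 9.0, 9.0],
    "local_event": [0, 0, 1, 1, 0, 0],
    "units_sold":  [40, 65, 30, 80, 55, 70],
})

features = ["item_id", "location_id", "day_part", "day_of_week", "temperature", "local_event"]
model = GradientBoostingRegressor().fit(history[features], history["units_sold"])

# Forecast Friday dinner demand for item 1 at location 10, mild weather, no event.
upcoming = pd.DataFrame([[1, 10, 1, 4, 16.0, 0]], columns=features)
print(model.predict(upcoming))
```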

For Oracle Hospitality customers, these Data Science Cloud Services complement the self-service data access and reporting solutions that are already available, including the InMotion mobile app that provides real-time access to restaurant KPIs and the Reporting and Analytics 9.0 service that was launched in April 2017.


About Oracle Hospitality

Oracle Hospitality brings 35 years of experience in providing technology solutions to food and beverage operators. We provide hardware, software, and services that allow our customers to deliver exceptional guest experiences while maximizing profitability. Our solutions include integrated point-of-sale, loyalty, reporting and analytics, inventory and labor management, all delivered from the cloud to lower IT cost and maximize business agility.

For more information about Oracle Hospitality, please visit www.Oracle.com/Hospitality

About Oracle

The Oracle Cloud offers complete SaaS application suites for ERP, HCM and CX, plus best-in-class database Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) from data centers throughout the Americas, Europe and Asia. For more information about Oracle (NYSE:ORCL), please visit us at www.oracle.com.

Trademarks

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Disclaimer

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.



          Nihar Parikh Joins IQ Workforce as Managing Director, Analytics...        

Nihar's expertise in sourcing and screening data science and analytics talent will enable the firm to keep pace with the fast growing demand for analytics talent.

(PRWeb August 10, 2017)

Read the full story at http://www.prweb.com/releases/IQWorkforce/analyticsrecruiting/prweb14588657.htm


          Experts define new ways to manage supply chain risk in a digital economy        
The next BriefingsDirect digital business thought leadership panel discussion explores new ways that companies can gain improved visibility, analytics, and predictive responses to better manage supply chain risk in the digital economy.

The panel examines how companies such as Nielsen are using cognitive computing search engines, and even machine learning and artificial intelligence (AI), to reduce risk in their overall buying and acquisitions.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn more about the exploding sophistication around gaining insights into advanced business commerce, we welcome James Edward Johnson, Director of Supply Chain Risk Management and Analysis at Nielsen; Dan Adamson, Founder and CEO of OutsideIQ in Toronto, and Padmini Ranganathan, Vice President of Products and Innovation at SAP Ariba.

The panel was assembled and recorded at the recent 2017 SAP Ariba LIVE conference in Las Vegas. The discussion is moderated by Dana Gardner, principal analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Padmini, we heard at SAP Ariba LIVE that risk is opportunity. That stuck with me. Are the technologies really now sufficient that we can fully examine risks to such a degree that we can turn that into a significant business competitive advantage? That is to say, those who take on risk seriously, can they really have a big jump over their competitors?

Ranganathan: I come from Silicon Valley, so we have to take risks for startups to grow into big businesses, and we have seen a lot of successful entrepreneurs do that. Clearly, taking risks drives bigger opportunity.

But in this world of supplier and supply chain risk management, it’s even more important and imperative that the buyer and supplier relationships are risk-aware and risk-free. The more transparent that relationship becomes, the more opportunity for driving more business between those relationships.

That context of growing business -- as well as growing the trust and the transparent relationships -- in a supply chain is better managed by understanding the supplier base. Understanding the risks in the supplier base, and then converting them into opportunities, allows mitigating and solving problems jointly. By collaborating together, they form partnerships.

Gardner: Dan, it seems that what was once acceptable risk can now be significantly reduced. How do people in procurement and supply chain management know what acceptable risk is -- or maybe they shouldn’t accept any risk?

Adamson: My roots are also from Silicon Valley, and I think you are absolutely right that at times you should be taking risks -- but not unnecessarily. What the procurement side has struggled with -- and this is from me jumping into financial institutions where they treat risk very differently through to procurement -- is risk versus the price-point to avoid that risk. That’s traditionally been the big problem.

For every vendor that you on-board, you have to pay $1,000 for a due diligence report and it's really not price-effective. But, being able to maintain and monitor that vendor on a regular basis at acceptable cost – then there's a real risk-versus-reward benefit in there.

What we are helping to drive are a new set of technology solutions that enable a deeper level of due diligence through technology, through cognitive computing, that wasn't previously possible at the price point that makes it cost-effective. Now it is possible to clamp down and avoid risk where necessary.

Gardner: James, as a consumer of some of these technologies, do you really feel that there has been a significant change in that value equation, that for less money output you are getting a lot less risk?

Knowing what you're up against  

Johnson: To some degree that value was always there; it was just difficult to help people see that value. Obviously tools like this will help us see that value more readily.

It used to be that in order to show the value, you actually had to do a lot of work, and it was challenging. What we are talking about here is that we can begin to boil the ocean. You can test these products, and you can do a lot of work just looking at test results.

And, it's a lot easier to see the value because you will unearth things that you couldn't have seen in the past.

Whereas it used to take a full-blown implementation to begin to grasp those risks, you can now just test your data and see what you find. Most people, once they have their eyes wide open, will be at least a little more fearful.  But, at the same time -- and this goes back to the opportunity question you asked -- they will see the opportunity to actually tackle these risks. It’s not like those risks didn't exist in the past, but now they know they are there -- and they can decide to do something about it, or not.

Gardner: So rather than avoid the entire process, now you can go at the process but with more granular tools to assess your risks and then manage them properly?

Johnson: That's right. I wouldn't say that we should have a risk-free environment; that would cost more money than we’re willing to pay. That said, we should be more conscious of what we're not yet willing to pay for.

Rather than just leaving the risk out there and avoiding business where you can’t access information about what you don't know -- now you'll know something. It's your choice to decide whether or not you want to go down the route of eliminating that risk, of living with that risk, or maybe something in between. That's where the sweet spot is. There are probably a lot of intermediate actions that people would be taking now that are very cheap, but they haven't even thought to do so, because they haven’t assessed where the risk is.

Gardner: Padmini, because we're looking at a complex landscape -- a supply chain, a global supply chain, with many tiers -- when we have a risk solution, it seems that it's a team sport. It requires an ecosystem approach. What has SAP Ariba done, and what is the news at SAP Ariba LIVE? Why is it important to be a team player when it comes to a fuller risk reduction opportunity?

Teamwork

Ranganathan: You said it right. The risk domain world is large, and it is specialized. The language that the compliance people use in the risk world is somewhat similar to the language that the lawyers use, but very different from the language that the information technology (IT) security and information security risk teams use.

The reason you can’t see many of the risks is partly because the data, the information, and the fragmentation have been too broad, too wide. It’s also because the type of risks, and the people who deal with these risks, are also scattered across the organization.

So a platform that supports bringing all of this together is number one. Second, the platform must support the end-to-end process of managing those supply chain relationships, and managing the full supply chain and gain the transparency across it. That’s where SAP Ariba has headed with Direct Materials Sourcing and with getting more into supply chain collaboration. That’s what you heard at SAP Ariba LIVE.

We all understand that supply chain much better when we are in SAP Ariba, and then you have this ecosystem of partners and providers. You have the technology with SAP and HANA to gain the ability to mash up big data and set it in context, and to understand the patterns. We also have the open ecosystem and the open source platform to allow us to take that even wider. And last but not the least, there is the business network.

So it’s not just between one company and another company, it's a network of companies operating together. The momentum of that collaboration allows users to say, “Okay, I am going to push for finding ethical companies to do business with,” -- and then that's really where the power of the network multiplies.

Gardner: Dan, when a company nowadays buys something in a global supply chain, they are not just buying a product -- they are buying everything that's gone on with that product, such as the legacy of that product, from cradle to PO. What is it that OutsideIQ brings to the table that helps them get a better handle on what that legacy really is?

Dig deep, reduce risk, save time

Adamson: Yes, and they are not just buying from that seller, they are buying from the seller that sold it to that seller, and so they are buying a lot of history there -- and there is a lot of potential risk behind the scenes.

That’s why this previously has been a manual process, because there has been a lot of contextual work in pulling out those needles from the haystack. It required a human level of digging into context to get to those needles.

The exciting thing that we bring is a cognitive computing platform that’s trainable -- and it's been trained by FinCrime’s experts and corporate compliance experts. Increasingly, supply management experts help us know what to look for. The platform has the capability to learn about its subject, so it can go deeper. It can actually pivot on where it's searching. If it finds a presence in Afghanistan, for example, well then that's a potential risk in itself, but it can then go dig deeper on that.

And that level of deeper digging is something that a human really had to do before. This is the exciting revolution that's occurring. Now we can bring back that data, it can be unstructured, it can be structured, yet we can piece it together and provide some structure that is then returned to SAP Ariba.

The great thing about the supply management risk platform or toolkit that's being launched at SAP Ariba LIVE is that there’s another level of context on top of that. Ariba understands the relationship between the supplier and the buyer, and that's an important context to apply as well.

How you determine risk scores on top of all of that is very critical. You need to weed out all of the noise, otherwise it would be a huge data science exercise and everyone would be spinning his or her wheels.

This is now a huge opportunity for clients like James to truly get some low-hanging fruit value, where previously it would have been literally a witch-hunt or a huge mining expedition. We are now able to achieve this higher level of value.

Gardner: James, Dan just described what others are calling investigative cognitive computing brought to bear on this supply chain risk problem. As someone who is in the business of trying to get the best tools for their organization, where do you come down on this? How important is this to you?

Johnson: It's very important. I have done the kinds of investigations that he is talking about. For example, if I am looking at a vendor in a high-risk country, particularly a small vendor that doesn't have an international presence, that is problematic for most supplier investigations. What do I do? I will go and do some of the investigation that Dan is talking about.

Now I'm usually sitting at my desk in Chicago. I'm not going out in the world. So there is a heightened level of due-diligence that I suspect neither of us are really talking about here. With that limitation, you want to look up not only the people, you want to look up all their connections. You might have had a due-diligence form completed, but that's an interested party giving you information, what do you do with it?

Well, I can run the risk search on more than just the entity that I'm transacting with.  I am going to run it on everyone that Dan mentioned. Then I am going to look up all their LinkedIn profiles, see who they are connected to. Do any of those people show any red flags? I’d look at the bank that they use. Are there any red flags with their bank?

I can do all that work, and I can spend several hours doing all that work. As a lawyer I might dig a little deeper than someone else, but in the end, it's human labor going into the effort.

Gardner: And that really doesn't scale very well.

Johnson: That does not scale at all. I am not going to hire a team of lawyers for every supplier. The reality here is that now I can do some level of that time-consuming work with every supplier by using the kind of technology that Dan is talking about.

The promise of OutsideIQ technology is incredible. It is an early and quickly expanding opportunity. It's because of relationships like the one between SAP Ariba and OutsideIQ that I see a huge opportunity between Nielsen and SAP Ariba. We are both on the same roadmap.

Nielsen has a lot of work to do, SAP Ariba has a lot of work to do, and that work will never end, and that’s okay. We just need to be comfortable with it, and work together to build a better world.

Gardner: Tell us about Nielsen. Then secondarily, what part of your procurement, your supply chain, do you think this will impact best first?

Automatic, systematic risk management

Johnson: Nielsen is a market research company. We answer two questions: what do people watch? And what do people buy? It sounds very simple, but when you cover 90% of the world’s population, which we do – more than six billion people -- you can imagine that it gets a little bit more complicated.

We house about 54 petabytes of database data. So the scale there is huge. We have 43,000 employees. It’s not a small company. You might know Nielsen for the set-top boxes in the US that tell what the ratings were overnight for the Super Bowl, for example, but it’s a lot more than that. And you can imagine, especially when you're trying to answer what people buy in developing countries with emerging economies, that you are touching some riskier things.

In terms of what this SAP Ariba collaboration can solve for us, the first quick hit is that we will no longer have to leverage multiple separate sources of information. I can now leverage all the sources of information at one time through one interface. It is already being used to deliver information to people who are involved in the procurement chain. That's the huge quick win.

The secondary win is from the efficiency that we get in doing that first layer of risk management. Now we can start to address that middle tier that I mentioned. We can respond to certain kinds of risk that, today, we handle ad hoc but not systematically. There is that systematic change that will allow us to target not only the 100 to 200 vendors that we might prioritize -- but also the thousands of vendors that are somewhere in our system, too.

That's going to revolutionize things, especially once you fold in the environmental, social and governance (ESG) work that, today, is very focused for us. If I can spread that out to the whole supply chain, that's revolutionary. There are a lot of low-cost things that you can do if you just have the information.

So it’s not always a question of, “am I going to do good in the world and how much is it going to cost me?” It’s really a question of, “What is the good in the world that’s freely available to me, that I'm not even touching?” That's amazing! And, that's the kind of thing that you can go to work for, and be happy about your work, and not just do what you need to do to get a paycheck.

Gardner: It’s not just avoiding the bad things; it’s the false positives that you want to remove so that you can get the full benefit of a diverse, rich supplier network to choose from.

Johnson: Right, and today we are essentially wasting a lot of time on suspected positives that turn out to be false. We waste time on them because we go deeper with a human than we need to. Let’s let the machines go as deep as they can, and then let the humans come in to take over where we make a difference.

Gardner: Padmini, it’s interesting to me that he is now talking about making this methodological approach standardized, part of due-diligence that's not ad-hoc, it’s not exception management. As companies make this a standard part of their supply chain evaluations, how can we make this even richer and easier to use?

Ranganathan: The first step was the data. It’s the plumbing; we have to get that right. It’s about the way you look at your master data, which is suppliers; the way you look at what you are buying, which is categories of spend; and where you are buying from, which is all the regions. So you already have the metrics segmentation of that master data, and everything else that you can do with SAP Ariba.

The next step is then the process, because it’s really not one-size-fits-all. It cannot be one-size-fits-all, where you ask every supplier that you on-board the same set of questions, check the box and move on.

I am going to use the print service vendor example again, which is my favorite. For marketing materials printing, you have a certain level of risk, and that's all you need to look at. But you still want, of course, to look at them for any adverse media incidents, or whether they suddenly got on a watch-list for something, you do want to know that.

But when one of your business units begins to use them for customer-confidential data and statement printing -- the level of risk shoots up. So the intensity of risk assessments and the risk audits and things that you would do with that vendor for that level of risk then has to be engineered and geared to that type of risk.

So it cannot be one-size-fits-all; it has to go past the standard. The standardization is not in the process; the standardization is in the way you look at risk, so that you can determine how much of the process you need to apply and stay in tune.
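The print-vendor example reduces to a simple mapping from what a supplier actually handles to how intensive the assessment should be. A toy Python sketch of that idea (the categories, tiers and checks are invented for illustration, not drawn from SAP Ariba):

```python
# Toy illustration: assessment intensity scales with the sensitivity of the work,
# not with who the supplier happens to be.
ASSESSMENT_BY_RISK = {
    "low":    ["adverse-media screening", "watch-list check"],
    "medium": ["adverse-media screening", "watch-list check", "annual questionnaire"],
    "high":   ["adverse-media screening", "watch-list check", "annual questionnaire",
               "on-site audit", "data-security review"],
}

def risk_tier(category: str, handles_confidential_data: bool) -> str:
    # Hypothetical rule: confidential data always escalates the tier.
    if handles_confidential_data:
        return "high"
    return "medium" if category in {"it_services", "logistics"} else "low"

# Same print vendor, two very different engagements.
print(ASSESSMENT_BY_RISK[risk_tier("print_services", handles_confidential_data=False)])
print(ASSESSMENT_BY_RISK[risk_tier("print_services", handles_confidential_data=True)])
```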

Gardner: Dan, clearly SAP Ariba and Nielsen, they want the “dials,” they want to be able to tune this in. What’s coming next, what should we expect in terms of what you can bring to the table, and other partners like yourselves, in bringing the rich, customizable inference and understanding benefits that these other organizations want?

Constructing cognitive computing by layer

Adamson: We are definitely in early days on the one hand. But on the other hand, we have seen historically many AI failures, where we fail to commercialize AI technologies. This time it's a little different, because of the big data movement, because of the well-known use cases in machine learning that have been very successful, the pattern matching and recommending and classifying. We are using that as a backbone to build layers of cognitive computing on top of that.

And I think as Padmini said, we are providing a first layer, where it’s getting stronger and stronger. We can weed out up to 95% of the false-positives to start from, and really let the humans look at the thorny or potentially thorny issues that are left over. That’s a huge return on investment (ROI) and a timesaver by itself.

But on top of that, you can add in another layer of cognitive computing, and that might be at the workflow layer that recognizes that data and says, “Jeez, just a second here, there's a confidentiality potential issue here, let's treat this vendor differently and let's go as far as plugging in a special clause into the contract.” This is, I think, where SAP Ariba is going with that. It’s building a layer of cognitive computing on top of another layer of cognitive computing.

Actually, human processes work like that, too. There is a lot of fundamental pattern recognition at the basis of our cognitive thought, and on top of that we layer on top logic. So it’s a fun time to be in this field, executing one layer at a time, and it's an exciting approach.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: SAP Ariba.



          Converged IoT systems: Bringing the data center to the edge of everything        
The next BriefingsDirect thought leadership panel discussion explores the rapidly evolving architectural shift of moving advanced IT capabilities to the edge to support Internet of Things (IoT) requirements.

The demands of data processing, real-time analytics, and platform efficiency at the intercept of IoT and business benefits have forced new technology approaches. We'll now learn how converged systems and high-performance data analysis platforms are bringing the data center to the operational technology (OT) edge.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To hear more about the latest capabilities in gaining unprecedented measurements and operational insights where they’re needed most, please join me in welcoming Phil McRell, General Manager of the IoT Consortia at PTC; Gavin Hill, IoT Marketing Engineer for Northern Europe at National Instruments (NI) in London, and Olivier Frank, Senior Director of Worldwide Business Development and Sales for Edgeline IoT Systems at Hewlett Packard Enterprise (HPE). The discussion is moderated by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: What's driving this need for a different approach to computing when we think about IoT and we think about the “edge” of organizations? Why is this becoming such a hot issue?

McRell: There are several drivers, but the most interesting one is economics. In the past, the costs that would have been required to take an operational site -- a mine, a refinery, or a factory -- and do serious predictive analysis, meant you would have to spend more money than you would get back.

For very high-value assets -- assets that are millions or tens of millions of dollars -- you probably do have some systems in place in these facilities. But once you get a little bit lower in the asset class, there really isn’t a return on investment (ROI) available. What we're seeing now is that's all changing based on the type of technology available.

Gardner: So, in essence, we have this whole untapped tier of technologies that we haven't been able to get a machine-to-machine (M2M) benefit from for gathering information -- or the next stage, which is analyzing that information. How big an opportunity is this? Is this a step change, or is this a minor incremental change? Why is this economically a big deal, Olivier?

Frank: We're talking about Industry 4.0, the fourth generation of change -- after steam, after the Internet, after the cloud, and now this application of IoT to the industrial world. It’s changing at multiple levels. It’s what's happening within the factories and within this ecosystem of suppliers to the manufacturers, and the interaction with consumers of those suppliers and customers. There's connectivity to those different parties that we can then put together.

While our customers have been doing process automation for 40 years, what we're doing together is unleashing IT standardization, taking technologies that were in the data centers and applying them to the world of process automation and opening it up.

The analogy is what happened when mainframes were challenged by mini computers and then by PCs. It's now open architecture in a world that has been closed.

Gardner: Phil mentioned ROI, Gavin. What is it about the technology price points and capabilities that have come down to the point where it makes sense now to go down to this lower tier of devices and start gathering information?


Hill: There are two pieces to that. The first one is that we're seeing that understanding more about the IoT world is more valuable than we thought. McKinsey Global Institute did a study that said that by about 2025 we're going to be in a situation where IoT in the factory space is going to be worth somewhere between $1.2 trillion and $3.7 trillion. That says a lot.

The second piece is that we're at a stage where we can make technology at a much lower price point. We can put that onto the assets that we have in these industrial environments quite cheaply.

Then, you deal with the real big value, the data. All three of us are quite good at getting the value from our own respective areas of expertise.

Look at someone that we've worked with, Jaguar Land Rover. In their production sites, in their power train facilities, they were at a stage where they created an awful lot of data but didn't do anything with it. About 90 percent of their data wasn't being used for anything. It doesn't matter how many sensors you put on something. If you can't do anything with the data, it's completely useless.

They have been using techniques similar to what we've been doing in our collaborative efforts to gain insight from that data. Now, they're at a stage where probably 90 percent of their data is usable, and that's the big change.

Collaboration is key

Gardner: Let's learn more about your organizations and how you're working collaboratively, as you mentioned, before we get back into understanding how to go about architecting properly for IoT benefits. Phil, tell us about PTC. I understand you won an award in Barcelona recently.

McRell: That was a collaboration that our three organizations did with a pump and valve manufacturer, Flowserve. As Gavin was explaining, there was a lot of learning that had to be done upfront about what kind of sensors you need and what kind of signals you need off those sensors to come up with accurate predictions.

When we collaborate, we rely heavily on NI for their scientists and engineers to provide their expertise. We really need to consume digital data. We can't do anything with analog signals and we don't have the expertise to understand what kind of signals we need. When we obtain that, then with HPE, we can economically crunch that data, provide those predictions, and provide that optimization, because of HPE's hardware that now can live happily in those production environments.

Gardner: Tell us about PTC specifically; what does your organization do?

McRell: For IoT, we have a complete end-to-end platform that allows everything from the data acquisition gateway with NI all the way up to machine learning, augmented reality, dashboards, and mashups, any sort of interface that might be needed for people or other systems to interact.

In an operational setting, there may be one, two, or dozens of different sources of information. You may have information coming from the programmable logic controllers (PLCs) in a factory and you may have things coming from a Manufacturing Execution System (MES) or an Enterprise Resource Planning (ERP) system. There are all kinds of possible sources. We take that, orchestrate the logic, and then we make that available for human decision-making or to feed into another system.

Gardner: So the applications that PTC is developing are relying upon platforms and the extension of the data center down to the edge. Olivier, tell us about Edgeline and how that fits into this?

Frank: We came up with this idea of leveraging the enterprise computing excellence that is our DNA within HPE. As our CEO said, we want to be the IT in the IoT.

According to IDC, 40 percent of the IoT computing will happen at the edge. Just to clarify, it’s not an opposition between the edge and the hybrid IT that we have in HPE; it’s actually a continuum. You need to bring some of the workloads to the edge. It's this notion of time of insight and time of action. The closer you are to what you're measuring, the more real-time you are.

We came up with this idea: What if we could bring the depth of computing we have in the data center into this sub-second environment, where I need to read the intelligent data created by my two partners here, but also act on it and do things with it?

Take the example of an electrical short circuit that for some reason caught fire. You don’t want to send the data to the cloud; you want to take immediate action. This is the notion of real-time, immediate action.

We take the deep compute. We integrate the connectivity with NI. We're the first platform that has integrated an industry standard called PXI, which allows NI to integrate the great portfolio of sensors and acquisition and analog-to-digital conversion technologies into our systems.

Finally, we bring enterprise manageability. Since we have a proliferation of systems, system management at the edge becomes a problem. So we bring our award-winning Integrated Lights-Out (iLO) technology, with millions of licenses sold across our ProLiant servers, to the edge as well.

Gardner: We have the computing depth from HPE, we have insightful analytics and applications from PTC, what does NI bring to the table? Describe the company for us, Gavin?

Working smarter

Hill: NI is about a $1.2 billion company worldwide. We get involved in an awful lot of industries. But in the IoT space, where we see ourselves fitting within this collaboration with PTC and HPE is in our ability to make a lot of machines smarter.

There are already some sensors on assets, machines, pumps, whatever they may be on the factory floor, but older devices, and potentially even some newer ones, don't natively have all the sensors that you need to be able to make really good decisions based on that data. To be able to feed into the PTC systems and the HPE systems, you need to have the right type of data to start off with.

We have the data acquisition and control units that allow us to take that data in, but then do something smart with it. Using something like our CompactRIO System, or as you described, using the PXI platform with the Edgeline products, we can add a certain level of understanding and just a smart nature to these potentially dumb devices. It allows us not only to take in signals, but also potentially control the systems as well.

We not only have some great information from PTC that lets us know when something is going to fail, but we could potentially use their data and their information to allow us to, let’s say, decide to run a pump at half load for a little bit longer. That means that we could get a maintenance engineer out to an oil rig in an appropriate time to fix it before it runs to failure. We have the ability to control as well as to read in.

The other piece of that is that sensor data is great. We like to be as open as possible in taking from any sensor vendor, any sensor provider, but you want to be able to find the needle in the haystack there. We do feature extraction to try to make sure that we give the important pieces of digital data back to PTC, so that they can be processed by the HPE Edgeline system as well.
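As a rough illustration of what feature extraction from a raw sensor signal can look like (a generic sketch, not NI's actual tooling; the sample rate and simulated signal are assumptions), a window of vibration samples can be reduced to a few summary features before being sent upstream:

    import numpy as np

    def extract_features(samples, sample_rate_hz):
        """Reduce a window of raw vibration samples to a few summary features."""
        samples = np.asarray(samples, dtype=float)
        rms = np.sqrt(np.mean(samples ** 2))              # overall vibration energy
        peak = np.max(np.abs(samples))                    # largest excursion
        spectrum = np.abs(np.fft.rfft(samples))
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate_hz)
        dominant_hz = freqs[np.argmax(spectrum[1:]) + 1]  # strongest non-DC frequency
        return {"rms": rms, "peak": peak, "dominant_hz": dominant_hz}

    # Simulated 1-second window: a 50 Hz vibration sampled at 1 kHz.
    t = np.arange(0, 1.0, 1.0 / 1000.0)
    window = 0.3 * np.sin(2 * np.pi * 50 * t)
    print(extract_features(window, sample_rate_hz=1000))  # dominant_hz comes out near 50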

Frank: This is fundamental. Capturing the right data is an art and a science, and that’s really what NI brings, because you don’t want to capture noise; otherwise it's just a proliferation of data. That’s a unique expertise that we're very glad to integrate in the partnership.

Gardner: We certainly understand the big benefit of IoT extending what people have done with operational efficiency over the years. We now know that we have the technical capabilities to do this at an acceptable price point. But what are the obstacles, what are the challenges that organizations still have in creating a true data-driven edge, an IoT rich environment, Phil?

Economic expertise

McRell: That’s why we're together in this consortium. The biggest obstacle is that because there are so many different requirements for different types of technology and expertise, people can become overwhelmed. They'll spend months or years trying to figure this out. We come to the table with end-to-end capability from sensors and strategy and everything in between, pre-integrated at an economical price point.

Speed is important. Many of these organizations are seeing the future, where they have to be fast enough to change their business model. For instance, some OEM discrete manufacturers are going to have to move pretty quickly from just offering product to offering service. If somebody is charging $50 million for capital equipment, and their competitor is charging $10 million a year and the service level is actually better because they are much smarter about what those assets are doing, the $50 million guy is going to go out of business.

We come to the table with the ability to quickly get that factory and those assets smart and connected, and make sure the right people, parts, and processes are brought to bear at exactly the right time. That drives all the things people are looking for -- the uptime, the safety, the yield, and the performance of that facility. It comes down to this challenge: if you don't have all the right parties together with that technology and expertise, you can very easily get stuck on something that takes a very long time to unravel.

Gardner: That’s very interesting when you move from a Capital Expenditure (CAPEX) to an Operational Expenditure (OPEX) mentality. Every little bit of that margin goes to your bottom line and therefore you're highly incentivized to look for whole new categories of ways to improve process efficiency.

Any other hurdles, Olivier, that you're trying to combat effectively with the consortium?

Frank: The biggest hurdle is the level of complexity; our customers don't know where to start. So the promise of us working together is really to show the value of this kind of open architecture injected into a 40-year-old process automation infrastructure, and to demonstrate, as we did yesterday with our robot powered by HPE Edgeline, that I can show immediate value to the plant manager, the quality manager, and the operations manager using the data that already resides in that factory, 70 percent or more of which is unused. That’s the value.

So how do you get that quickly and simply? That’s what we're working to solve so that our customers can enjoy the benefit of the technology faster and faster.

Bridge between OT and IT

Gardner: Now, this is a technology implementation, but it’s done in a category of the organization that might not think of IT in the same way as the business side -- back office applications and data processing. Is the challenge for many organizations a cultural one, where the IT organization doesn't necessarily know and understand this operational efficiency equation and vice versa, and how are we bridging that?

Hill: I'm probably going to give you the high-level view from the operational technology (OT) side. These guys will definitely have more input from their own domains of expertise. But the fact that each of us knows our own piece really well is exactly why this collaboration works so well.

You have situations with the idea of the IoT where a lot of people stood up and said, "Yeah, I can provide a solution. I have the answer," but without having a plan -- never mind a solution. But we've done a really good job of understanding that we can do one part of this system, this solution, really well, and if we partner with the people who are really good in the other aspects, we provide real solutions to customers. I don't think anyone can compete with us at this stage, and that is exactly why we're in this situation.

Frank: Actually, the biggest hurdle is more on the OT side, which doesn't really rely on the company's IT. For many of our customers, the factory is a silo. At HPE, we haven't been selling much into that environment. That’s also why, when working as a consortium, it’s important to get to the right audience, which is in the factory. We also bring our IT expertise, especially in the area of security, because the moment you put an IT device in an OT environment, you potentially have problems that you didn’t have before.

We're living in a closed world, and now the value is to open up. Bringing our security expertise, our managed service, our services competencies to that problem is very important.

Speed and safety out in the open

Hill: There was a really interesting piece in the HPE Discover keynote in December, when HPE Aruba started to talk about how they had an issue when they started bringing conferencing and technology out, and then suddenly everything wanted to be wireless. They said, "Oh, there's a bit of a security issue here now, isn’t there? Everything is out there."

We can see what HPE has contributed to helping them from that side. What we're talking about here on the OT side is a similar state from the security aspect, just a little bit further along in the timeline, and we are trying to work on that as well. Again, we have HPE here and they have a lot of experience in similar transformations.

Frank: At HPE, as you know, we have our Data Center and Hybrid Cloud Group and then we have our Aruba Group. When we do OT or our Industrial IoT, we bring the combination of those skills.

For example, in security, we have HPE Aruba ClearPass technology that’s going to secure the industrial equipment back to the network and then bring in wireless, which will enable the augmented-reality use cases that we showed onstage yesterday. It’s a phased approach, but you see the power of bringing ubiquitous connectivity into the factory, which is a challenge in itself, and then securely connecting the IT systems to this OT equipment, and you understand better the kind of the phases and the challenges of bringing the technology to life for our customers.

McRell: It’s important to think about some of these operational environments. Imagine a refinery the size of a small city and having to make sure that you have the right kind of wireless signal that’s going to make it through all that piping and all those fluids, and everything is going to work properly. There's a lot of expertise, a lot of technology, that we rely on from HPE to make that possible. That’s just one slice of that stack where you can really get gummed up if you don’t have all the right capabilities at the table right from the beginning. 

Gardner: We've also put this in the context of IoT not at the edge isolated, but in the context of hybrid computing and taking advantage of what the cloud can offer. It seems to me that there's also a new role here for a constituency to be brought to the table, and that’s the data scientists in the organization, a new trove of data, elevated abstraction of analytics. How is that progressing? Are we seeing the beginnings of taking IoT data and integrating that, joining that, analyzing that, in the context of data from other aspects of the company or even external datasets?

McRell: There are a couple of levels. It’s important to understand that when we talk about the economics, one of the things that has changed quite a bit is that you can actually go in, get assets connected, and do what we call anomaly detection, pretty simplistic machine learning, but nonetheless, it’s a machine-learning capability.

In some cases, we can get that going in hours. That’s a ground-zero type of capability. Over time, you learn about a line with multiple assets and how they all function together, then you learn how the entire facility functions, and then you compare that across multiple facilities. At some point, you're not going to be at the edge anymore; you're going to be doing systems-level analytics, and that's a different, combined view.

At that point, you're talking about looking across weeks, months, years. You're going to go into a lot of your back-end and maybe some of your IT systems to do some of that analysis. There's a spectrum that goes back down to the original idea of simply looking for something to go wrong on a particular asset.

The distinction I'm making here is that, in the past, you would have to get a team of data scientists to figure out almost asset by asset how to create the models and iterate on that. That's a lengthy process in and of itself. Today, at that ground-zero level, that’s essentially automated. You don't need a data scientist to get that set up. At some point, as you go across many different systems and long spaces of time, you're going to pull in additional sources and you will get data scientists involved to do some pretty in-depth stuff, but you actually can get started fairly quickly without that work.

The power of partnership

Frank: To echo what Phil just said, in HPE we're talking about the tri-hybrid architecture -- the edge, so let’s say close to the things; the data center; and then the cloud, which is a data center whose location you don't know. It's these three dimensions.

The great thing partnering with PTC is that the ThingWorx platform, the same platform, can run in any of those three locations. That’s the beauty of our HPE Edgeline architecture. You don't need to modify anything. The same thing works, whether we're in the cloud, in the data center, or on the Edgeline.

To your point about the data scientists, it's time-to-insight. There are things you want to do immediately, and as Phil pointed out, the notion of anomaly detection that we're demonstrating on the show floor is understanding those nominal parameters after a few hours of running your thing, and simply detecting something going off normal. That doesn't require data scientists. That takes us into the ThingWorx platform.
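To make that idea concrete, here is a minimal sketch of that kind of nominal-parameter anomaly detection (an illustrative example only, not ThingWorx or Edgeline code; the sensor values and the three-sigma threshold are assumptions): learn a mean and spread per sensor from a baseline window, then flag readings that drift too far from normal.

    import numpy as np

    def learn_nominal(baseline_readings):
        """Learn nominal parameters (mean and standard deviation) per sensor
        from a baseline window of readings, shape (samples, sensors)."""
        baseline = np.asarray(baseline_readings, dtype=float)
        return baseline.mean(axis=0), baseline.std(axis=0)

    def is_anomalous(reading, mean, std, threshold=3.0):
        """Flag a reading whose z-score exceeds the threshold on any sensor."""
        z = np.abs((np.asarray(reading, dtype=float) - mean) / (std + 1e-9))
        return bool((z > threshold).any())

    # Baseline: a few hours of normal vibration and temperature samples (made up).
    baseline = [[0.50, 71.2], [0.52, 70.8], [0.49, 71.0], [0.51, 71.4]]
    mean, std = learn_nominal(baseline)

    print(is_anomalous([0.51, 71.1], mean, std))  # False: within the nominal range
    print(is_anomalous([1.90, 88.0], mean, std))  # True: off-normal, raise an alert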

But then, for the industrial processes, we're involving systems integration partners and bringing our own knowledge to the mix, along with our customers, because they own the intelligence of their data. That’s where it creates a very powerful solution.

Gardner: I suppose another benefit that the IT organization can bring to this is process automation and extension. If you're able to understand what's going on in the device, not only would you need to think about how to fix that device at the right time -- not too soon, not too late -- but you might want to look into the inventory of the part, or you might want to extend it to the supply chain if that inventory is missing, or you might want to analyze the correct way to get that part at the lowest price or under the RFP process. Are we starting to also see IT as a systems integrator or in a process integrator role so that the efficiency can extend deeply into the entire business process?

McRell: It's interesting to see how this stuff plays out. Once you start to understand in your facility -- or maybe it’s not your facility, maybe you are servicing someone's facility -- what kind of inventory should you have on hand, what should you have globally in a multi-tier, multi-echelon system, it opens up a lot of possibilities.

Today PTC provides a lot of network visibility and a lot of spare-parts inventory management and systems, but there's a limit to what these algorithms can do. They're really the best that’s possible at this point, until you have everything connected. That feedback loop allows you to modify all your expectations in real time and get things on the move proactively, so the right person, parts, process, and kit all show up at the right time.

Then, you have augmented reality and other tools, so that maybe somebody hasn't done this service procedure before, maybe they've never seen these parts before, but they have a guided walk-through and have everything showing up all nice and neat the day of, without anybody having to actually figure that out. That's a big set of improvements that can really change the economics of how these facilities run.

Connecting the data

Gardner: Any other thoughts on process integration?

Frank: Again, the premise behind industrial IoT is indeed, as you're pointing out, connecting the consumer, the supplier, and the manufacturer. That’s why you have also the emergence of a low-power communication layer, like LoRa or Sigfox, that really can bring these millions of connected devices together and inject them into the systems that we're creating.

Hill: Just from the conversation, I know that we’re all really passionate about this. IoT and the industrial IoT is really just a great topic for us. It's so much bigger than what we're talking about. You've talked a little bit about security, you have asked us about the cloud, you have asked us about the integration of the inventory and to the production side, and it is so much bigger than what we are talking about now.

We probably could have twice as long a conversation on any one of these topics and still never get halfway to the end of it. It's a really exciting place to be right now. And the really interesting thing that I think all of us are now realizing, the way that we have made advancements as a partnership as well, is that you don't know what you don't know. A lot of companies are waking up to that as well, and we're using our collaborations to allow us to know what we don't know.

Frank: Which is why speed is so important. We can theorize and spend a lot of time in R&D, but the reality is, bring those systems to our customers, and we learn new use cases and new ways to make the technology advance.

Hill: The way technology has gone, no one releases a product anymore that is the finished piece and will stay there for 20 or 30 years. That’s not what happens. Products and services are being provided that get constantly updated. How many times a week does your phone update with different pieces of firmware, or an app get updated? You have to be able to change and use the data that you get to adjust everything that’s going on. Otherwise you will not stay ahead of the market.

And that’s exactly what Phil described earlier when he was talking about whether you sell a product or a service that goes alongside a set of products. For me, one of the biggest things is that constant innovation -- where we are going. And we've changed. We were in kind of a linear motion of progression. In the last little while, we've seen a huge amount of exponential growth in these areas.

We had a video at the end of the London HPE Discover keynote; it was one of HPE’s visions of what the future could be. We looked at it and thought it was quite funny. There was an automated suitcase that would follow you after you left the airport. I started to laugh at that, but then I took a second and realized that maybe it’s not as ridiculous as it sounds, because we as humans think linearly; that's just how we're built. But if the technology is changing in an exponential way, that means that we physically cannot ignore some of the most ridiculous ideas that are out there, because that’s what’s going to change the industry.

And even by having that video there and by seeing what PTC is doing with the development that they have and what we ourselves are doing in trying out different industries and different applications, we see three companies that are constantly looking through what might happen next and are ready to pounce on that to take advantage of it, each with their own expertise.

Gardner: We're just about out of time, but I'd like to hear a couple of ridiculous examples -- pushing the envelope of what we can do with these sorts of technologies now. We don’t have much time, so less than a minute each, if you can each come up perhaps with one example, named or unnamed, that might have seemed ridiculous at the time, but in hindsight has proven to be quite beneficial and been productive. Phil?

McRell: You can do this in engineering with us, you can do this in service, but we've been talking a lot about manufacturing. In a manufacturing journey, the opportunity, as Gavin and Olivier are describing here, is on the level of what happened between pre- and post-electricity: how fast things will run, the quality at which they will produce products, and therefore the business model that you can now have because of that capability. These are profound changes. You will see uptimes in some of the largest factories in the world go up by double digits. You will see lines run multiple times faster over time.

These are things that, if you walked in today and then walked in again in a couple of years to some of the hardest-run facilities, it would be really hard to believe what your eyes are seeing at that point, just as somebody who was around before factories had electricity would be astounded by what they see today.

Back to the Future

Gardner: One of the biggest issues at the most macro level in economics is the fact that productivity has plateaued for the past 10 or 15 years. People want to get back to what productivity was -- 3 or 4 percent a year. This sounds like it might be a big part of getting there. Olivier, an example?

Frank: Well, an example would be more about the impact on mankind and wealth for humanity. Think about it: with those technologies combined with 3D printing, you can have a new class of manufacturers anywhere in the world -- in Africa, for example. With real-time engineering, one of the concepts that we are demonstrating today, you can design anywhere.

Another part of PTC is Computer-Aided Design (CAD) systems and Product Lifecycle Management (PLM), and we're showing real-time engineering on the floor again. You design those products and you do quick prototyping with your 3D printing. That could be anywhere in the world. And you have your users testing the real thing, understanding whether your engineering choices were relevant and whether there are differences between the digital model and the physical model: the digital-twin idea.

Then, you're back to the drawing board. So you get a new class of manufacturers that we don’t even know yet, serving customers across the world and creating wealth in areas that are not yet industrialized.

Gardner: It's interesting that if you have a 3D printer you might not need to worry about inventory or supply chain.

Hill: Just to add on that one point, the bit that really, really excites me about where we are with technology, as a whole, not even just within the collaboration, you have 3D printing, you have the availability of open software. We all provide very software-centric products, stuff that you can adjust yourself, and that is the way of the future.

That means that among the changes that we see in the manufacturing industry, the next great idea could come from someone who has been in the production plant for 20 years, or it could come from Phil who works in the bank down the road, because at a really good price point, he has the access to that technology, and that is one of the coolest things that I can think about right now.

Where we've seen this sort of development and use of these technologies and implementations make a massive difference, look at someone like Duke Energy in the US. We worked with them before we realized where our capabilities were, never mind how we could implement a great solution with PTC and with HPE. Even there, based on our own technology, the people on the power-production side of things, working with some legacy equipment, decided to try this sort of application: predictive maintenance, to be able to see what’s going on in their assets, which are spread across the continent.

They began this at the start of 2013 and they have seen savings of an estimated $50 billion up to this point. That’s a number.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.



          IDOL-powered appliance delivers better decisions via comprehensive business information searches        
The next BriefingsDirect digital transformation case study highlights how a Swiss engineering firm created an appliance that quickly deploys to index and deliver comprehensive business information.

By scouring thousands of formats and hundreds of languages, the approach then provides via a simple search interface unprecedented access to trends, leads, and the makings of highly informed business decisions.

We will now explore how SEC 1.01 AG delivers a truly intelligent services solution -- one that returns new information to ongoing queries and combines internal and external information on all sorts of resources to produce a 360-degree view of end users’ areas of intense interest.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn how to access the best available information in about half the usual time, we're joined by David Meyer, Chief Technology Officer at SEC 1.01 AG in Switzerland. The discussion is moderated by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: What are some of the trends that are driving the need for what you've developed. It's called the i5 appliance?

Meyer: The most important thing is that we can provide instant access to company-relevant information. This is one of today’s biggest challenges that we address with our i5 appliance.

Decisions are only as good as the information bases they are made on. The i5 provides the ability to access more complete information bases to make substantiated decisions. Also, you don’t want to search all the time; you want to be proactively informed. We do that with our agents and our automated programs that are searching for new information that you're interested in.

Gardner: As an organization, you've been around for quite a while and involved with large, packaged applications -- SAP R/3, for example -- but over time, more data sources and the ability to gather information came on board, and you saw the need in the market for this appliance. Tell us a little bit about what led you to create it?

Accelerating the journey

Meyer: We started to dive into big data about the time that HPE acquired Autonomy, December 2011, and we saw that it’s very hard for companies to start to become a data-driven organization. With the i5 appliance, we would like to help companies accelerate their journey to become such a company.

Gardner: Tell us what you mean by a 360-degree view? What does that really mean in terms of getting the right information to the right people at the right time?

Meyer: In a company's information scope, you don’t just talk about internal information, but you also have external information like news feeds, social media feeds, or even governmental or legal information that you need and don’t have to time to search for every day.

So, you need to have a search appliance that can proactively inform you about things that happen outside. For example, if there's a legal issue with your customer or if you're in a contract discussion and your partner loses his signature authority to sign that contract, how would you get this information if you don't have support from your search engine?

Gardner: And search has become such a popular paradigm for acquiring information, asking a question, and getting great results. Those results are only as good as the data and content they can access. Tell us a little bit about your company SEC 1.01 AG, your size and your scope or your market. Give us a little bit of background about your company.

Meyer: We've been an HPE partner for 26 years, and we build business-critical platforms based on HPE hardware and also the HPE operating system, HP-UX. Since the merger of Autonomy and HPE in 2011, we started to build solutions based on HPE's big-data software, particularly IDOL and Vertica.

Gardner: What was it about the environment that prevented people from doing this on their own? Why wouldn't you go and just do this yourself in your own IT shop?

Meyer: The HPE IDOL software ecosystem is really an ecosystem of different software components, and these parts need to be packed together into something that can be installed very quickly and provide results just as quickly. That’s what we did with the i5 appliance.

We put all this good software from HPE IDOL together into one simple appliance, which is simple to install. We want to shorten the time needed to get started with big data, to get results from it, to begin the analytical part of using your data, and to gain value from it.

Multiple formats

Gardner: As we mentioned earlier, getting the best access to the best data is essential. There are a lot of APIs and a lot of tools that come with the IDOL ecosystem, as you described it, but you were able to dive into a thousand or more file formats and support 150 languages and 400 data sources. That's very impressive. Tell us how that came about.

Meyer: When you start to work with unstructured data, you need some important functionality. For example, you need to have support for a lot of languages. Imagine all these social media feeds in different languages. How do you track them if you don't support sentiment analysis on these messages?

On the other hand, you also need to understand any unstructured format. For example, if you have video broadcasts or radio broadcasts and you want to search for the content inside these broadcasts, you need to have a tool to translate the speech to text. HPE IDOL brings all the functionality that is needed to work with unstructured data, and we packed that together in our i5 appliance.

Gardner: That includes digging into PDFs and using OCR. It's quite impressive how deep and comprehensive you can be in terms of all the types of content within your organization.

How do you physically do this? If it's an appliance, you're installing it on-premises, you're able to access data sources from outside your organization, if you choose to do that, but how do you actually implement this and then get at those data sources internally? How would an IT person think about deploying this?

Meyer: We've prepared installable packages. Mainly, you need connectors to connect to repositories and data sources. For example, if you have a Microsoft Exchange server, you have a connector that understands very well how to communicate with that Exchange server. So you have the ability to connect to that data source and get any content, including the metadata.

Talking about metadata for an e-mail, for example, that's the “From,” the “To,” the “Subject,” and so on. You have the ability to put all that content and metadata into a centralized index, and then you're able to search that information and refine it. Then, you have a reference to your original document.

When you want to enrich the information that you have in your company with external information, we developed the so-called SECWebConnector, which can capture any information from the Internet. For example, you just need to enter an RSS feed or a webpage, and then you can capture the content and the metadata you want to search for, or that is important for your company.
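As a generic illustration of this kind of web capture (not the actual SECWebConnector, whose interface isn't shown here; the feed URL and field names are assumptions), the sketch below pulls an RSS feed with the Python feedparser library and keeps the content and metadata that would go into a search index:

    import feedparser  # third-party library: pip install feedparser

    def capture_feed(feed_url):
        """Fetch an RSS/Atom feed and return one dict per entry with the
        content and metadata worth indexing."""
        feed = feedparser.parse(feed_url)
        documents = []
        for entry in feed.entries:
            documents.append({
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "published": entry.get("published", ""),
                "summary": entry.get("summary", ""),  # body text to index
            })
        return documents

    # Hypothetical feed URL; in practice this would be a news or legal-notice feed.
    for doc in capture_feed("https://example.com/news.rss"):
        print(doc["published"], doc["title"])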

Gardner: So, it’s actually quite easy to tailor this specifically to an industry focus, if you wish, to a geographic focus. It’s quite easy to develop an index that’s specific to your organization, your needs, and your people.

Informational scope

Meyer: Exactly. In our crowded informational system that we have with the Internet and everything, it’s important that companies can choose where they want to have the information that is important for them. Do I need legal information, do I need news information, do I need social media information, and do I need broadcasting information? It’s very important to build your own informational scope that you want to be informed about, news that you want to be able to search for.

Gardner: And because of the way you structured and engineered this appliance, you're not only able to proactively go out and request things, but you can have a programmatic benefit, where you can tell it to deliver to you results when they arise or when they're discovered. Tell us a little bit how that works.

Meyer: We call them agents. You can define which topics you're interested in, and when some new documents are found by that search or by that topic, then you get informed, with an email or with a push notification on the mobile app.

Gardner: Let’s dig into a little bit of this concept of an appliance. You're using IDOL and you're using Vertica, the column-based or high-performance analytics engine, also part of HPE, but soon to be part of Micro Focus. You're also using 3PAR StoreServ and ProLiant DL380 servers. Tell us how that integration happened and why you actually call this an appliance, rather than some other name?

Meyer: Appliance means that all the software is packed together. Every component can talk to the others, speaks the same language, and can be configured the same way. We preconfigure a lot, we standardize a lot, and that's the appliance part.

And it's not bound to particular hardware, so it doesn't need to be this DL380 or whatever. It also depends on how big your environment will be; it can also be a c7000 Blade Chassis or whatever.

When we install an appliance, we have one or two days until it’s installed, and then it starts the initial indexing program, and this takes a while until you have all the data in the index. So, the initial load is big, but after two or three days, you're able to search for information.

You mentioned the HPE Vertica part. We use Vertica to log every action that happens on the appliance. On one hand, this is a security feature: you need to be able to prove that nobody has found the salary list, for example, and so you need to log it.

On the other hand, you can analyze what users are doing. For example, if they don’t find something and it’s always the same thing that people are searching in the company and can't find, perhaps there's some information you need to implement into the appliance.

Gardner: You mentioned security and privileges. How does the IT organization allow the right people to access the right information? Are you going to use some other policy engine? How does that work?

Mapped security

Meyer: It's included. It's called mapped security. The connector takes the security information with the document and indexes that security information within the index. So, you will never be able to find a document that you don't have access to in your environment. It's important that this security is given by default.
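As a generic sketch of the idea behind mapped security (not how IDOL implements it internally; the documents, group names, and matching logic are made up), each document is indexed with its access-control list and every query is filtered by the searching user's groups, so documents the user cannot read never show up:

    # Each indexed document carries the ACL captured by the connector.
    index = [
        {"id": 1, "title": "Marketing plan", "acl": {"staff", "marketing"}},
        {"id": 2, "title": "Salary list",    "acl": {"hr"}},
        {"id": 3, "title": "Print contract", "acl": {"staff", "procurement"}},
    ]

    def search(query, user_groups):
        """Return only documents that match the query AND that the user may read."""
        hits = []
        for doc in index:
            if query.lower() in doc["title"].lower() and doc["acl"] & user_groups:
                hits.append(doc)
        return hits

    # A staff user searching for the salary list finds nothing...
    print(search("salary", {"staff"}))        # []
    # ...while an HR user does.
    print(search("salary", {"hr", "staff"}))  # the "Salary list" document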

Gardner: It sounds to me, David, like we're, in a sense, democratizing big data. By gathering and indexing all the unstructured data that you can possibly want to point at and connect to, you're allowing anybody in a company to get access to queries without having to go through a data scientist or a SQL query author. It seems to me that you're really opening up the power of data analysis to many more people on their terms, which are basic search queries. What does that get an organization? Do you have any examples of the ways that people are benefiting from this democratization, this larger pool of people able to use these very powerful tools?

Meyer: Everything is more data-driven. The i5 appliance can give you access to all of that information. The appliance is here to simplify the beginning of becoming a data-driven organization and to find out what power is in the organization's data.

For example, we enabled a Swiss company called Smartinfo to become a proactive news provider. That means they put lots of public information, newspapers, online newspapers, TV broadcasts, radio broadcasts into that index. The customers can then define the topics they're interested in and they're proactively informed about new articles about their interests.

Gardner: In what other ways do you think this will become popular? I'm guessing that a marketing organization would really benefit from finding relationships within their internal organization, between product and service, go-to market, and research and development. The parts of a large distributed organization don't always know what the other part is doing, the unknown unknowns, if you will. Any other examples of how this is a business benefit?

Meyer: You mentioned the marketing organization. How could a marketing organization listen to what customers are saying? For example, they're communicating on social media, and when you have an engine like i5, you can capture these social media feeds, you can do sentiment analysis on them, and you will see an analyzed view of what's going on around your products, your company, or your competitors.

You can detect, for example, a shitstorm about your company, a shitstorm about your competitor, or whatever. You need to have an analytic platform to see that, to visualize that, and this is a big benefit.

On the other hand, it's also this proactive information you get from it, where you can see that your competitor has a new campaign and you get that information right now because you have an agent with the customer's name. You can see that there is something happening and you can act on that information.

Gardner: When you think about future capabilities, are there other aspects that you can add on? It seems extensible to me. What would we be talking about a year from now, for example?

Very extensible

Meyer: It's pretty much extensible. I think about all these different verticals. You can expand it for the health sector, for the transportation sector, whatever. It doesn't really matter.

We do network analysis. That means that when you prepare yourself to visit a company, you can have a network picture: what relationships this company has, which employees work there, who is a shareholder of that company, and which other companies it has contracts with.

This is a new way to get a holistic image of a company, a person, or of something that you want to know. It's thinking how to visualize things, how to visualize information, and that's the main part we are focusing on. How can we visualize or bring new visualizations to the customer?

Gardner: In the marketplace, because it's an ecosystem, we're seeing new APIs coming online all the time. Many of them are very low cost and, in many cases, open source or free. We're also seeing the ability to connect more adequately to LinkedIn and Salesforce, if you have your license for that of course. So, this really seems to me a focal point, a single pane of glass to get a single view of a customer, a market, or a competitor, and at the same time, at an affordable price.

Let's focus on that for a moment. When you have an appliance approach, what we're talking about used to be only possible at very high cost, and many people would need to be involved -- labor, resources, customization. Now, we've eliminated a lot of the labor, a lot of the customization, and the component costs have come down.

We've talked about all the great qualitative benefits, but can we talk about the cost differential between what used to be possible five years ago with data analysis, unstructured data gathering, and indexing, and what you can do now with the i5?

Meyer: You mentioned the price. We have an OEM contract, and that's something that makes us competitive in the market. Companies can build their own intelligence service. It's affordable also for small and medium businesses; it doesn't need to be a huge company with its own engineering and IT staff. It's affordable, it's automated, it's packed together, and it's simple to install.

Companies can increase the workplace performance and shorten the processes. Anybody has access to all the information they need in their daily work, and they can focus more on their core business. They don't lose time in searching for information and not finding it and stuff like that.

Gardner: For those folks who have been listening or reading, are intrigued by this, and want to learn more, where would you point them? How can they get more information on the i5 appliance and some of the concepts we have been discussing?

Meyer: That's our company website, sec101.ch. There you can find any information you would like to have. And this is available now.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.



          Entry Level Data Scientist - IBM - Canada        
If you are a Foreign National from any of the following embargoed countries (Cuba, Iran, North Korea, Sudan, and Syria) on a work permit, you are not eligible...
From IBM - Wed, 09 Aug 2017 15:02:06 GMT - View all Canada jobs
          Fund effort seeks to ID lung cancer        

A crowdfunding campaign aimed at building artificial intelligence into tools used to spot lung cancer is offering $100,000 to coders, engineers and researchers who will build the software that could identify the world’s most deadly cancer earlier and more accurately.

The Bonnie J. Addario Lung Cancer Foundation, which put up the funds for the collaborative effort, wants to develop software that will improve the ability of CT scans to pinpoint lung cancer while it’s most treatable. And the goal by April is to have a program that can be delivered to clinics.

“We wanted to focus on big data, machine learning,” Guneet Walia, the foundation’s senior director of research and medical affairs, told the Herald. “We wanted something that wasn’t just some code sitting on some engineer’s computer, but something that could work in a clinic.”

Cancer experts, big data scientists, engineers and others will contribute patches and improvements to a diagnostic tool. Participants will get points for adding lines of code to move the project forward and cash awards for gaining the most points in certain categories.

The foundation wants the push for earlier diagnoses to help it reach its ultimate goal of having lung cancer be a chronically managed disease by 2023 — “something you die with, not die of,” Walia said.

Lung cancer, which kills more people per year than any other cancer, is typically diagnosed late and after it has spread to other parts of the body. One of the problems with spotting lung cancer early, experts say, is it often doesn’t show symptoms until later stages when it has spread.

“Most people who have symptoms — a cough, coughing up blood, shortness of breath — they already have advanced disease,” said Dr. Christopher Lathan, of the Dana-Farber Cancer Institute. “Finding lung cancer when it is asymptomatic is the best way to cure people.”

Author: Brian Dowling (@be_d), Boston Herald

[Photo caption] LATE DIAGNOSIS: Dr. Christopher Lathan of the Dana-Farber Cancer Institute said people who have the symptoms of lung cancer ‘already have advanced disease.’

          MassMutual to Partner with UVM In Groundbreaking Data Science Initiative        
Seeking to expand the applications of computational, social and data science, Massachusetts Mutual Life Insurance Company (MassMutual) announced today that it is providing the University of Vermont (UVM) $500,000 to fund an innovative pilot program within the university’s Vermont Complex Systems Center.
          Gasser’s grant lays groundwork for student success        

A project led by iSchool Professor Les Gasser, "Simulating Social Systems at Scale (SSS)," has laid the groundwork for a prestigious award to a student researcher. Santiago Núñez-Corrales, an Informatics PhD student directed by Gasser, was recently chosen from among several hundred applicants to receive an ACM SIGHPC/Intel Computational & Data Science Fellowship, worth $15,000 per year for at least three years. 

Gasser's SSS project, which earned a 2016-2017 Faculty Fellowship from the National Center for Supercomputing Applications (NCSA), demonstrates new approaches to building very large computer models of social phenomena such as social change, the emergence of organizations, and the evolution of language and information. The project also explores new ways of connecting "live" social data to running simulations and new ways of visualizing social processes.

Núñez-Corrales is working on multidisciplinary problems in the project with three elements: (1) discrete event simulations that are too large and complex to compute complete solutions with available computing resources; (2) simulation elements that can be combined, condensed, or eliminated stochastically; and (3) specific driving applications that are very large data-driven models of social systems. He is also developing a novel method for comparing content of simulations based on "spectral analysis" of the simulation activity. 

"Research like this requires the ability to draw together knowledge from many disciplines including simulation, statistical physics, stochastic computing, and domain issues such as modeling social or biological structures and their evolution dynamics. Santiago has the fluency in all of these arenas to be able to synthesize novel solutions that push the state of the art, and this had a direct impact on his success," Gasser said.

Gasser has a joint appointment in the Department of Computer Science and faculty affiliate appointments in the Computational Science and Engineering program and the Beckman Institute at the University of Illinois. He also holds a faculty appointment in the Institute for Software Research at the University of California, Irvine. He has published over seventy technical papers and five books on the topics of social informatics and multi-agent systems.


          RR 319 Machine Learning with Tyler Renelle        


This episode of the Ruby Rogues Panel features panelists Charles Max Wood and Dave Kimura. They are joined by guest Tyler Renelle, who stops by to talk about machine learning. Tyler is the first guest to have appeared on Adventures in Angular, JavaScript Jabber, and Ruby Rogues. Tune in to find out more about Tyler and machine learning!

What is machine learning?

Machine learning is a different concept from what programmers are used to.

There are three phases in computing technology.

  • First phase – building computers in the first place; the logic was hard-coded into the physical computing machinery
  • Second phase – programmable computers, where you can reprogram your computer to do anything. This is the phase programmers fall under.
  • Third phase – machine learning falls under this phase.

Machine learning is where the computer programs itself to do something. You give the computer a measurement of how it’s doing based on data and it trains itself and learns how to do the task. It is beginning to get a lot of press and become more popular. This is because it is becoming a lot more capable by way of deep learning.
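A minimal sketch of that idea in Python (the data, the single-parameter model, and the learning rate are all made up for illustration): you give the program examples and a measurement of how wrong it is, and it adjusts its own parameter to do better.

    # Training data: inputs and the outputs we want the program to learn (y = 2x).
    examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

    w = 0.0            # the single parameter the machine "programs" for itself
    learning_rate = 0.01

    for epoch in range(200):
        for x, target in examples:
            prediction = w * x
            error = prediction - target        # measurement of how it's doing
            w -= learning_rate * error * x     # adjust the parameter to reduce the error

    print(round(w, 3))  # close to 2.0: the program has learned the rule from the data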

AI – Artificial Intelligence

Machine learning is a subfield of artificial intelligence. AI is the overarching field of computers simulating intelligence. Machine learning has become less and less a subfield over time and more the majority of AI. Now we can apply machine learning to vision, speech processing, planning, and knowledge representation. It is fast taking over AI. People are beginning to consider the terms artificial intelligence and machine learning synonymous.

Self-driving cars are a type of artificial intelligence. The connection between machine learning and self-driving cars can seem abstract, but machine learning is fundamental to self-driving cars: you program the car how to fix its own mistakes. Another example is facial recognition. The program learns about a person's face over time so it can make an educated guess as to whether the person is who they say they are. Because the matching is statistical, your face can be off by a hair or a hat; small variations won't throw it off.

How do we start solving the problems we want to be solved?

Machine learning has been applied since the 1950s to a broad spectrum of problems. You have to have a little bit of domain knowledge and do some research.

Machine Learning Vs Programming

Machine learning fits any sort of fuzzy programming situation; traditional programming is when you specify behavior explicitly and statically.

Why should you care to do machine learning?

People should care because this is the next wave of computing. There is a theory that it will displace jobs: self-driving cars will displace truck drivers, Uber drivers, and taxis. There are things like logo generators already, and machines are generating music, poetry, and website designs. We shouldn't be afraid, but we should keep an eye on it.

If a robot or computer program or AI were able to write its own code, at what point would it be able to overwrite or basically nullify the three laws of robotics?

Nick Bostrom wrote the book Superintelligence, which had many big names in technology talking about the dangers of AI. Artificial intelligence has been discussed widely because of the possibility of evil killer robots in the sci-fi community. There are also people who hold very real concerns, such as job automation.

Consciousness is a huge topic of debate on this subject right now. Is it an emergent property of the human brain? Is what we have with deep learning a rich enough representation to achieve consciousness? It remains an open question whether AI will achieve consciousness, and if it does, will we be able to tell there isn't a person there?

If people want to dive into this, where do they go?

Machine Learning Language

The main language used for machine learning is Python. This is not because of the language itself, but because of the tools built on top of it. The main framework is TensorFlow. TensorFlow code written in Python drops down to C and executes on the GPU to perform the matrix algebra that is essential for deep learning. You can also use C, C++, Java, and R. Data scientists mostly use R, while researchers use C and C++ so they can hand-code the matrix algebra themselves.
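As a rough illustration of that division of labor, here is a minimal sketch using the TensorFlow 1.x-era API that was current at the time of this episode (my own example, not one from the show): Python only describes the matrix computation, and the optimized C/C++ runtime executes it, on a GPU if one is available.

import tensorflow as tf

# Python builds a graph describing the matrix algebra...
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
product = tf.matmul(a, b)

# ...and the TensorFlow runtime (C/C++, optionally on a GPU) executes it.
with tf.Session() as sess:
    print(sess.run(product))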

Picks

Dave:

Charles:

Tyler:


          Big Data, A Rising Data Management Platform for Many- Global Market to Benefit from Mergers        
Companies using big data products have experienced timely completion of high-priority jobs, every time. Hadoop has become a globally recognized technology commonly used for big data, and companies such as Twitter, LinkedIn, Facebook, and eBay use big data management technologies. Historically, DBMS and SQL were the popular data management tools, but advancements in technology have paved the way for big data technologies such as Hadoop, which offer a more generalized data management platform than the traditional ones. Big data has become such a universal concept that it has been adopted even by small-scale organizations.

The high efficiency of open source software and its cost advantages are the major driving factors for the global big data market. As organizations merge to increase their share in the global market, they need to advance their big data usage to run business campaigns, applications, and activities. Social media giants such as Facebook and Twitter use big data to manage millions of daily database operations. A merger between two top-performing social media companies will clog the existing data clusters and create the need to update their currently operating big data services.

Mergers and Acquisitions Would Require Updating Existing Big Data Products

Recently, Microsoft, LinkedIn, and Twitter were in the news, with Twitter predicted to benefit from the acquisition of LinkedIn by Microsoft. Ever wonder how these social media giants manage a database of millions of profiles? The job is done by big data service providers. By adopting advanced big data products, organizations can keep their businesses running. There is no doubt that adoption of big data products is growing, not just among globally recognized organizations but also among small companies.

Big data services play an important role in helping data analysts, business experts, and data scientists derive valuable business insights and performance metrics. Leading players in the global big data market are working to introduce new products and services to fulfill the growing data management needs of organizations. Thus, the global big data market is expected to create new growth opportunities in the years to come.

Original Post Big Data, A Rising Data Management Platform for Many- Global Market to Benefit from Mergers source Twease
          Project Management in Data Science        
Given that more and more companies need to be data-driven, data science is one of the most sought-after fields of expertise today. But while data science is a relatively new field, its best practices reveal how much it depends on strong project management.
          [DSP2017] 19# An Affair with Python, part 2 (Visual Studio 2017 Preview 15.2 & SQL Server 2017 CTP 2.0)        

Today we continue the mini-series on Python. Last time I showed some of the secrets of the language's syntax. There are many more of them and I will still write about them; this time, however, I want to share some information about Python support in the latest versions of Microsoft tools.

Let's start with Visual Studio 2017. While installation modules for Python and Data Science were available before the final release, they ultimately did not make it into the final version. Update 1, i.e. version 15.1, does not bring them back either. Only the preview version 15.2 offers these features again, and we can install it without any complications alongside the stable production version. As we can see, from within Visual Studio we can point to a Python installation as well as to the Anaconda environment.

vs2017_preview_install

In the new Visual Studio we can create Python projects dedicated to machine learning as well as to the web. Machine learning in Python is a good topic for more than one post, but today I am focusing on the tools themselves. In the machine learning project I removed the predefined code from the .py file and wrote a few simple lines in it. Note that I get IntelliSense. It did not work for me right away, though: following the thread Visual Studio 2017 Preview with Python (Intellisense not working), I went into the editor settings and unchecked the "Hide advanced members" option.

vs_python_1b

There is also an interactive window for Python. I will add that the editor supports splitting code into cells, in the same format in which Jupyter saves a session. The #%% marker acts as the separator. If we press CTRL + Enter with the cursor inside a given cell, its code is copied into the interactive window and executed.

vs_python_2b
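The screenshot above does not come through here, so below is a minimal, hypothetical sketch of a script split into #%% cells (my own example, not the one from the screenshot):

#%% Cell 1 - imports and setup
import math
radius = 2.0

#%% Cell 2 - press CTRL + Enter inside this cell to run only this part
print("Area:", math.pi * radius ** 2)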

Visual Studio also provides convenient debugging.

vs_python3b

News about Python Tools for Visual Studio can be followed on the official Python Engineering at Microsoft blog. If for some reason we cannot use the latest 2017 version, not all is lost: Python Tools have existed for earlier versions for quite a while. I can also recommend the mini training series Getting Started with Python Development using Visual Studio. It is from 2015, but the features shown there are still relevant in 2017. 

Now let's move on to the newest SQL Server 2017. It all started with the Data Amp conference, and the essence is the presentation Python based machine learning in SQL Server, which I sincerely recommend watching. It is also worth visiting the SQL Server Blog from time to time, and in particular reading the post Python in SQL Server 2017: enhanced in-database machine learning. How can we quickly start enjoying Python in SQL Server? Download links for the latest CTP 2.0 are collected in the official post SQL Server 2017 Community Technology Preview 2.0 now available. We register and choose a platform. For now I picked the Windows installer. During installation I selected the components related to Python and machine learning.

sql_1

Next I installed the latest SQL Server Management Studio, available in version 17.0, which already supports SQL Server 2017. Then I enabled running external scripts:

sp_configure 'external scripts enabled', 1; 
RECONFIGURE;

I restarted the server and then ran a script like the one below:

sql_2
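The screenshot does not come through here, so below is a minimal, hypothetical sketch of what such a call can look like (the actual script in the screenshot may have differed). sp_execute_external_script hands the @script body to the in-database Python runtime and returns the result back to T-SQL:

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
# Python runs inside SQL Server; InputDataSet / OutputDataSet are pandas data frames
OutputDataSet = InputDataSet
',
    @input_data_1 = N'SELECT TOP (5) name FROM sys.objects'
WITH RESULT SETS ((object_name sysname));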

I admit, it churned for about 17 seconds (overhead between T-SQL and Python? an early CTP build?), after which the results of the execution appeared. All in all a very nice feature; the only thing that puzzles me is the somewhat long execution time compared to running the same thing in plain Python.

In the next episode about Python we will return to the language itself.


          [DSP2017] 17# An Affair with Python, part 1 (tidiness, mathematics, control flow, collections, functions, objects)        

Today, for a change, I will write something about Python. I am not abandoning augmented reality, HoloLens and Unity, which remain the leading topic for now. However, the idea of a mini-series about Python and a few things around it came to my mind; it will add some variety and at the same time stay within the DSP 2017 rules.

Why learn Python at all? Python is a general-purpose language whose popularity is not far behind C#, and it is frequently used alongside R in machine learning and data science (in server and web applications too, but that is no longer so exceptional). It is also the main language used for programming the Raspberry Pi (if you put Windows 10 IoT Core on it you can also write C#, which I did last year and earlier, but I am describing the general situation). Visual Studio 2017 and 2015 offer support in the form of Python Tools, and recently Python became a natively supported language in SQL Server 2017, which, with GPU support, is turning into an efficient platform for building AI applications.

python_logoball-python

For the first experiments with the language even a web browser may be enough, if we choose the Jupyter environment. I think the post Nadal w kosmicznym klimacie – Jupyter introduces the use of this tool briefly and to the point. We can also set Jupyter up locally and still work through a web browser, which is what I did because the online version disconnects after a period of inactivity. In that case we first install the popular Anaconda distribution, which significantly simplifies managing Python packages and installations.

And now a few remarks on learning the basics of Python, along with some curiosities and things that are often hard to find in other languages.

 

1#  Cleanliness and order

Block nesting is controlled with indentation. We get by without braces.

image
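The screenshots from the original post do not come through here, so in this section I add small illustrative snippets of my own (hypothetical examples, not the author's exact code). For indentation-based blocks:

def describe(n):
    # the indented lines form the body of the if/else blocks; no braces needed
    if n % 2 == 0:
        print(n, "is even")
    else:
        print(n, "is odd")

describe(4)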

2#  Easier mathematics

Does anyone remember the factorial from primary school? The standard math library has a function for it.

image

There is a power operator built into the language itself.

image

Bliss. Unlimited precision for int: numbers are limited only by the machine's memory.

image
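An illustrative sketch of the three features above:

import math

print(math.factorial(5))   # 120 - factorial from the standard library
print(2 ** 10)             # 1024 - the ** power operator
print(2 ** 1000)           # int has unlimited precision, limited only by memory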

At the language level we can define imaginary and complex numbers, which does not happen often!

image

What's more, thanks to the cmath package the square root of -1 exists as an imaginary number:

image

The cmath package lets us compute certain values for complex numbers, e.g. the phase.
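An illustrative sketch of complex numbers and cmath:

import cmath

z = 3 + 4j                 # a complex literal built into the language
print(z.real, z.imag)      # 3.0 4.0
print(cmath.sqrt(-1))      # 1j - the square root of -1 as an imaginary number
print(cmath.phase(z))      # the phase (angle) of the complex number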

What do we need such numbers for? They may not be useful for invoices, but they certainly are for calculations related to electricity. It also reminds me how last year I used the fast Fourier transform to decompose sound into frequency ranges. There were complex numbers there too, except that in Java or C# you have to model them yourself somehow.

float offers a large range and good precision.

For exact decimal arithmetic there is the Decimal class in the standard decimal module.

image

Non-integer values should always be passed in quotes, otherwise we get something like this:

image

Fractions can also be defined with the Fraction class from the standard fractions module.
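An illustrative sketch of Decimal and Fraction:

from decimal import Decimal
from fractions import Fraction

print(Decimal('0.1') + Decimal('0.2'))   # 0.3 - exact, because the values were passed as strings
print(Decimal(0.1) + Decimal(0.2))       # inexact - the float literal 0.1 is already imprecise
print(Fraction(1, 3) + Fraction(1, 6))   # 1/2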

 

3#  Conditions and loops

Instead of nesting if and else we can write it more briefly: elif is enough.

image

Besides for there is of course also a while loop.

image
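An illustrative sketch of elif and while:

x = 7
if x < 0:
    print("negative")
elif x == 0:
    print("zero")
else:
    print("positive")

i = 0
while i < 3:
    print("i =", i)
    i += 1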

4# Collections

string - Unicode; there is no separate char type, a single character is simply a 1-character string

image

string formatting (one of many variants):

image

bytes - the byte representation of strings; in the literal we put the letter b before the quotes

image

and now let's decode the teddy bear:

image
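An illustrative sketch of the string and bytes features above (the "teddy bear" in the original screenshots was the author's sample text, here recreated as the Polish word for it):

s = "miś"                          # a Unicode string; s[0] is simply a 1-character string
print(s[0], len(s))
print("word: {0}, length: {1}".format(s, len(s)))   # one of many formatting variants

b = s.encode('utf-8')              # the bytes representation, b'mi\xc5\x9b'
print(b)
print(b.decode('utf-8'))           # and now let's decode it back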

list - lists

image

Negative list indices are possible! The index of the last element can alternatively be written as -1, and the index of each element before it is smaller by 1. The traditional approach from other languages of computing the last index as list length - 1 is not recommended.

image

In Python even a slice from 1 to -1 makes sense:

image

Open-ended slices are also very handy:

image

And how do we copy a list? This is enough:

image

By the way, we can see the is operator, which in Python checks whether two variables point to the same object. The == operator, in turn, compares contents.

Multiplying a list in Python, especially when initializing one, makes sense:

image

Removing an element at a given index from a list is a bit odd, because in that case we use the del operator:

image
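An illustrative sketch pulling together the list features above:

nums = [10, 20, 30, 40, 50]
print(nums[-1])        # 50 - negative indices count from the end
print(nums[1:-1])      # [20, 30, 40] - a slice from 1 to -1 makes sense
print(nums[2:])        # [30, 40, 50] - an open-ended slice

copy = nums[:]         # copying a list is just a full slice
print(copy == nums)    # True  - same contents
print(copy is nums)    # False - different objects

zeros = [0] * 5        # multiplying a list is handy for initialization
print(zeros)

del nums[0]            # removing by index uses the del operator
print(nums)            # [20, 30, 40, 50]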

dict - dictionaries

image

tuple - tuples

image

A comma sometimes makes a difference:

image

By the way, we can see the type function for checking a value's type.

How do we swap the values of two variables in one line?

image

Quick conversion of one structure into another, e.g. a list of tuples into a dictionary:

image
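An illustrative sketch of dictionaries, tuples and the conversions mentioned above:

capitals = {"Poland": "Warsaw", "France": "Paris"}   # dict
point = (3, 4)                                       # tuple
print(capitals["Poland"], point[0])                  # Warsaw 3

print(type((42)))    # <class 'int'>   - just parentheses
print(type((42,)))   # <class 'tuple'> - the comma makes the difference

a, b = 1, 2
a, b = b, a          # swapping two values in one line
print(a, b)          # 2 1

pairs = [("one", 1), ("two", 2)]
print(dict(pairs))   # {'one': 1, 'two': 2} - a list of tuples turned into a dict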

range - ranges

Python is an example of a language with ranges, which may bring Objective-C or Swift to mind.

image

image

set - sets

image

Interestingly, an empty set cannot be created with {}, only with the set constructor.

image
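An illustrative sketch of range and set:

print(list(range(5)))        # [0, 1, 2, 3, 4]
print(list(range(2, 10, 3))) # [2, 5, 8]

s = {1, 2, 2, 3}
print(s)                     # {1, 2, 3} - duplicates disappear
print(type({}))              # <class 'dict'> - {} is an empty dict, not a set
print(type(set()))           # <class 'set'>  - an empty set needs the constructor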

5# Functions

We use the def keyword to define them.

image

Default values for parameters can be defined, much like in C#.
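An illustrative sketch of def and default parameter values:

def greet(name, greeting="Hello"):
    return "{0}, {1}!".format(greeting, name)

print(greet("Ada"))               # Hello, Ada!
print(greet("Ada", "Welcome"))    # Welcome, Ada!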

6# Objects

Everything is an object (including primitive types and functions). Every object has its own id, even a plain number:

image

The language is dynamic but strictly typed. There are no implicit type conversions. Let's look at the sum function defined earlier: when I passed it two strings, their "sum" turned out to be concatenation, which shows the flexibility of the language. However, if I pass the function mixed types such as a string and an int, no automatic conversion takes place and an error occurs.

image
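An illustrative sketch of object ids and of the strict typing described above (my_sum stands in for the author's earlier sum function and simply adds its two arguments):

def my_sum(a, b):          # stand-in for the sum function defined earlier in the post
    return a + b

print(id(42))              # even a plain number has an id
print(my_sum(2, 3))        # 5
print(my_sum("ab", "cd"))  # 'abcd' - for strings, + means concatenation

try:
    my_sum("ab", 1)        # mixed types: no implicit conversion...
except TypeError as e:
    print("TypeError:", e) # ...so a TypeError is raised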

Variable name scopes:

  • local - inside the current function
  • enclosing - inside any enclosing function
  • global - at the top level of a module (more about modules another time)
  • built-in - provided by the built-in modules

While the code below raises no doubts:

image

the one below comes as a surprise:

image

Inside the function, x was treated as a new local variable. But there is a way around this: we can explicitly declare that x inside the function refers to the global name:

image
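An illustrative sketch of the three cases described above:

x = 10

def read_only():
    print(x)        # no assignment, so the global x is read: prints 10

def shadow():
    x = 20          # an assignment makes x a new local variable; the global one is untouched
    print(x)

def modify():
    global x        # explicitly refer to the global name
    x = 30

read_only()
shadow()
modify()
print(x)            # 30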

That is enough for this time. Next time, if it is a Python episode, we will go into more advanced language constructs or experiment with the tools… Stay tuned.


          Docker Containers on Windows        
This post is a follow-up to Kaggle's "How to get started with data science in containers" post. I was having some trouble setting up the environment on Windows, so I decided to document my steps/discoveries. I also (like to think that I) improved the provided script a little by preventing multiple containers from being […]
          Making Data Simple and Accessible To All        
Learn how business professionals and data scientists alike can use analytics technologies to drive results. Published by: IBM
          Book Review: Weapons of Math Destruction        
Today’s book review is Weapons of Math Destruction, How big data increases inequality and threatens Democracy, by Cathy O’Neil. Cathy O’Neil is a data scientist, with a PhD in mathematics, who blogs here. She has built models, and also tried to deconstruct them for those affected by them. This book is a thoughtful examination of the uses…
          Nasscom - Big Data & Analytics Summit 2014        

Virtusa is a gold partner at the NASSCOM Big Data and Analytics Summit 2014. Srinivasan Jayaraman, Managing Director - Middle East & APAC, Virtusa, will be moderating the panel session "Developing a Seamless Communication Strategy for a Smooth Flow of Intelligence from Data Scientists to Customer Facing Business Users" at 2:30 PM IST onwards in the Kaveri hall, Hotel Trident.

  • Srinivasan Jayaraman - Managing Director – Middle East and APAC, Virtusa (Moderator)
  • Vyom Upadhyay, Analytics Head, ICICI Bank, (Panel Member)
  • Satish Srivastava, Chief Project Director, Steria, (Panel Member)
  • Ujjwal Sinha, Vice President – Enterprise Business Intelligence, Target India (Panel Member)

Meet our executives to learn more about what it takes to build an analytically mature organization with analytics embedded at the business core and across the business value chain, how to position brand India as the next big "global hub" for Analytics & Big Data, and thought leadership in the field of analytics; share best practices on the processes, tools, technology, techniques and applications used in the context of analytics; and brainstorm on how to build India's analytics talent strength.


          Predictive Analytics Innovation Summit (Chicago, IL)        
Join Virtusa at the Predictive Analytics Innovation Summit - the largest gathering of business executives at the forefront of predictive analytics initiatives. Predictive Analytics has been identified by Forbes as a key priority for top CIO's in a recent study; in order to gain an advantage over competitors, investment in analytics offers modern businesses greater insight into their customers, competitors and the market. The event brings together thought-leaders from various industries for an event acclaimed for its interactive sessions and high-level speakers.

Visit us at Booth 3 to learn about Virtusa's data expertise that caters to today's dynamic decisioning needs:
  • Claims analytics
  • Customer analytics
  • Healthcare analytics
  • Structured and Web Content Convergent analytics
  • Big Data analytics
  • Social Media, Mobile and Cloud analytics
Virtusa panel discussion:
November 14th 2013 Day 1 - 10:30 – 11:30 CST
Topic: Predictive Analytics – Is perfection or ROI a better way to measure its value?
Moderator:
  • Kumar Ramamurthy — Sr. Director, Enterprise at Virtusa
Panelists:
  • Mark Berry – VP Insights – ConAgra Foods
  • Christopher Gutierrez - Data Scientist - AirBNB
  • Xin Fu - Senior Data Scientist - LinkedIn

          Sören Auer Is the New Director of the Technische Informationsbibliothek (TIB) in Hannover        
At the same time, he takes over the professorship in "Data Science and Digital Libraries" at Leibniz Universität Hannover.
           Data Science Helps Us Ask the Right Questions         

We all do Data Science on a daily basis but sometimes we forget why we’re really doing it. It’s not to spend hours coding but rather it’s to answer often ambiguous questions.

We learn to ask the right questions at an early age. At an intersection, for example, a child might ask his parent: "Does red mean we must stop, or just that we should stop?" The validity of the question will be confirmed by the answer in that case. Years later we ask questions about all aspects of our lives — jobs, finance, relationships, etc. We hope to ask the right questions at the right time. via Forbes

The formulation of the right question (aka hypothesis) is key.

When we take on complex scientific problems using data science, asking the right questions at each stop is critical to the process. Failure to do so may make the difference between frustration and profound innovation. Aim carefully and with proper consideration in order to sculpt the right question. You may not get a second chance. [emphasis mine]


          STScI Appoints Head of Newly Created Data Science Mission Office        

Dr. Arfon Smith has been selected to lead the newly created Data Science Mission Office at the Space Telescope Science Institute (STScI) in Baltimore, Maryland. The Data Science Mission Head is responsible for maximizing the scientific returns from a huge archive containing astronomical observations from 17 space astronomy missions and ground-based observatories.

Since 2013, Smith has been a project scientist and program manager at GitHub, Inc., the world's largest platform for open source software. His duties included working to develop innovative strategies for sharing data and software in academia. Smith also helped to define GitHub's business strategy for public data products, and he played a key role in establishing the company's first data science and data engineering teams.


          Barry M. Lasker Data Science Fellowship        

The Space Telescope Science Institute (STScI) in Baltimore, Maryland, announces the initiation of the Barry M. Lasker Data Science Postdoctoral Fellowship. The Lasker Fellowship is a STScI-funded program designed to provide up to three years of support for outstanding postdoctoral researchers conducting innovative astronomical studies that involve the use or creation of one or more of the following: large astronomical databases, massive data processing, data visualization and discovery tools, or machine-learning algorithms. The first recipient of the fellowship is Dr. Gail Zasowski of the Johns Hopkins University (JHU) in Baltimore, Maryland. The fellowship is named in honor of STScI astronomer Barry M. Lasker (1939-1999).


          How Much Are You Worth? UI/UX Designer Salary Around the World        
The IT industry has become a high-paying industry as the Internet has become more popular, and it continues to grow around the world. The demand for talent has soared with the rapid development of society, and the vocational skill requirements that the Internet environment places on IT staff keep rising accordingly. If you want to gain a firm foothold in the IT industry, basic office skills, a professional application development language, and professional design tools are all essential career competences, depending on the requirements of the particular industry.

So, given the sustained growth of the Internet's popularity, you may ask: What is the most wanted skill? Which industry has the most potential? And what is the most promising job? Data from an IT survey shows that the 10 most popular IT skills/professions are as follows:
1. UI/UX designer
2. Full-stack Web and product developers
3. Network engineers
4. Security/network security experts
5. Mobile engineers
6. Business analysts
7. IT project manager
8. Cloud architect/integration
9. Data scientist
10. CMS

This survey shows that UI/UX designers rank at the top among these IT skills/professions. An article published on Forbes, 29 Best Jobs for Work-Life Balance, shows that the work-life balance scores of UI designers and UX designers are 4.0 and 4.1, placing them 5th and 2nd among the 29 best jobs respectively. Their typical annual salaries hover around $84,500 and $95,000. Clearly, UI/UX designer is one of the most popular and attractive jobs in the current environment: you can not only earn a high income but also keep a good balance between life and work.

How much do UI/UX designers earn in a year? The figure below shows, for reference, how much UI/UX designers earn in different parts of the world. Salary varies with economic development.

UI/UX Designer Average Annual Salaries Around the World:

Average Annual salary of UI/UX Designers


Popular Employer Salaries for UX Designer:

Popular Employer Salaries for UX Designer, Accenture, Google, SAP, Facebook

Source: payscale.com


Related Job Salaries:

Related Job Salaries, User experience designer

Source: payscale.com

Global Average UX Salaries by Years of Experience:

Here's how the average salary changes as more experience is gained.


Global Average UX Salaries by Years of Experience

Source: uxdesignersalaries.com

Finally, let's look at UI/UX salary levels in China. In recent years the UI/UX design industry has flourished there, and many people wonder what the situation is and how much one can earn in China. A survey from a Chinese UI/UX design website shows that the average monthly UI salary is more than 6K RMB, and monthly salaries over 10K RMB also account for a large proportion. UI/UX salary increases with the accumulation of work experience; experience can be wealth, too.

China Average UI Salaries by Years of Experience:

China Average UI Salaries by Years of Experience


It appears that the average salary of a fledgling designer is around 6.5K RMB. The salary level improves as experience grows. Of course, the salary of UI/UX designers is also connected to the skills and tools they have mastered: the more techniques and tools mastered, the more potential for salary and career development. The salaries listed in this article are only for reference; the final figure depends on your own situation. Please remember that there is no easy way to succeed. The best way to keep a high salary is to stay self-motivated and always look into the latest UI/UX design trends. If you are a newbie who has just entered this industry, here is a list of recommended must-read books & resources.

Some people are jealous of the high salary of UI/UX designers, while their understanding of UI/UX design and product design remains at a very superficial level: a designer is someone who uses prototyping/wireframing tools such as Axure or Mockplus, or Sketch and Photoshop, to draw beautiful wireframe models that help engineers develop better.

Even more disappointing, many people still think design is superficial and only about good appearance. "Make the product more eye-catching!", "Make the interface more artistic!" - these are often heard. This is how people think about design: superficially, with no technical content, let alone a clear understanding of the terms UI, UX, IA, and IxD.

As an enterprising UI/UX designer longing for a high income, what should you do to master the basic skills and reach your salary goal? You may get some inspiration from recruitment requirements. Generally speaking, in addition to job-specific requirements (for example, playing computer games well will be a plus when you apply to a game development company), a deep understanding of and insight into UI/UX design is always valued. Follow the latest UI/UX trends and keep learning to stay ahead. Good communication and collaboration skills are also required to be an excellent designer. Besides, skill improvement is inevitable, and popular prototyping tools such as Axure, Mockplus and Proto.io are usually expected to be mastered well.

You may also be interested in:

About Author

Grace Jia
Prototyping tools, UI&UX news & information, article sharer, writer.
Email: grace@jongde.com 
Web: Mockplus

          MisInfoCon Brings Together Journalists, Technologists, Academics        
They came from around the country. They came from around the world. Journalists. Professors. Librarians. Software Developers. Activists. And Data Scientists. All to take on the question: What is Fake News, and how do we replace fake with facts?
          (USA-UNITED STATES) Clinical Data Manager II        
Boehringer Ingelheim is an equal opportunity global employer who takes pride in maintaining a diverse and inclusive culture. We embrace diversity of perspectives and strive for an inclusive environment which benefits our employees, patients and communities. **Description:** Contributes to the development process for new substances and development and promotion of drugs on the market by providing expertise, expectations, direction and oversight for Clinical Data Management (CDM) deliverables. Takes a lead role with internal and external partners and represents the company at meetings with clinical investigators and in the interaction with Contract Research Organizations (CROs) and external vendors in all aspects of data management for assigned trial(s). May provide input into CDM standards and process developments. Assumes one or more of the following roles demonstrating the required expertise and capabilities as: + Trial Data Manager (TDM) + Central Monitor (CM) + Or developing expertise and capabilities under supervision as: + Project Data Manager (PDM), as an associate PDM or supporting a local submission (e.g. in Japan or China) + Risk Based Quality Management (RBQM) Business Partner (BP), e.g. as an associate RBQM BP or for a trial of low complexity As an employee of Boehringer Ingelheim, you will actively contribute to the discovery, development and delivery of our products to our patients and customers. Our global presence provides opportunity for all employees to collaborate internationally, offering visibility and opportunity to directly contribute to the companies' success. We realize that our strength and competitive advantage lie with our people. We support our employees in a number of ways to foster a healthy working environment, meaningful work, diversity and inclusion, mobility, networking and work-life balance. Our competitive compensation and benefit programs reflect Boehringer Ingelheim's high regard for our employees **Duties & Responsibilities:** + In the role of a Trial Data Manager (TDM) for clinical trials led in-house or using business process outsourcing (BPO) + Key liaison / Data Management lead to establish, align and confirm data management expectations for assigned trial(s), this requires regular interaction with other internal and external partners e.g. TCM, TSTAT, TPROG, TMCP, BPO partners. + Responsible for CDM trial level oversight. Builds effective relationships with CROs/ vendor partners. Review protocols and identifies requirements for proper data capture including electronic Case Report Form design and processing of clinical data ensuring accuracy, consistency and completeness + Oversee the design, creation and UAT Plan and testing of clinical study databases along with development of edit check specifications and manual data listings as required. + Define or reviews creation and maintenance of all essential data management documentation including CRF specifications, eCRFs, annotated eCRF, eCRF completion guidelines, Data Management Plans (detailing complete data management processes throughout clinical studies), Data Transfer specifications and Data Review Guidelines, in accordance with the protocol, BI and project data standards. + Integrates external data (non-CRF data) from vendors or other internal departments into the clinical trial database. 
+ Initiates and compiles Trial Master File (TMF) relevant documentation containing the necessary CDM / Biostatistics & Data Sciences (BDS) documentation for a trial together with other members of the trial team as appropriate. Therein ensures appropriate quality, scientific content, organization, clarity, accuracy, format, consistency and compliance with regulatory guidelines. + Establishes conventions and quality expectations for clinical data and plans and tracks the content, format, completeness, quality and timing of the trial data collection process and other CDM deliverables via data analytics throughout the conduct of a trial. + Throughout the trial, the function holder either performs or leads the respective trial level activities in the context of business process outsourcing (BPO) in CDM. + Collaborates with the trial team to ensure that the database can be locked according to the planned timelines and quality. Responsible for the database lock and accountable for the integrity of the database. + Ensures that SDTM (Study Data Tabulation Model) compliant data is available for analyses together with the Project Data Manager (PDM) and the SDTM programmer at the CRO (in the context of the BPO). + Leads and facilitates the Medical and Quality Review (MQR) process and other trial team meetings. Presents and trains at trial team, CRA at investigator meetings. + Ensures real-time inspection readiness of all CDM deliverables for a trial and participates in regulatory agency and BI internal audits as necessary. + Identifies and communicates lessons learned and best practices at the trial level within CDM. Identifies and participates in DM related process, system, and tool improvement initiatives within CDM/BDS. + In the role of a Trial Data Manager (TDM) for fully outsourced trials, supervises and instructs the CRO in performing the above TDM tasks and leads trial level oversight, including planned timelines and fulfillment of quality expectations. + Sets expectations for and defines specifications for data transmission with the CRO. Integrates the data from the CRO into the BI clinical trial database. Ensures that SDTM compliant data is available for analyses together with the responsible Project Data Manager (PDM). + In the role of a Central Monitor (CM) for clinical trials + Executes and manages the Risk Based Quality Management (RBQM) processes as described in the monitoring framework, this requires regular interaction with other internal and external functions e.g. clinical monitors, CRAs/site monitors, data managers, biostatistics, site personnel. + Conduct root cause analysis on the risk signals of aggregated site and trial data (pulled from various sources) using risk reports. Identifies and investigates potential risks and trends with subject protection and reliability of trial results and compliance with the investigational plan for impact on site/country/trial activities. + Provides direction to site monitors for additional remote and on-site monitoring activities for risk sites, within the scope of the trial monitoring plan. + Oversees potential issues and findings requiring further review and follow-up and ensures appropriate actions are taken by the trial team members to investigate, resolve and document potential risks identified, including adequate documentation of resolution. + Provides a regular and efficient mechanism of trial communication for the trial team including documentation and leads oversight meetings. 
+ Ensures real-time inspection readiness of responsible RBQM deliverables for a trial and participates in regulatory agency and BI internal audits as necessary, in conjunction with the RBQM BP. + Identifies and communicates lessons learned and best practices at the trial level and with other CMs. Identifies and participates in CM related process, system, and tool improvement initiatives within CDM/BDS. Performs user acceptance testing and supports the development and maintenance of RBQM tools. + In the role of a Project Data Manager (PDM), the function holder performs (selected) PDM tasks under the supervision of an experienced PDM, e. g. as an associate PDM or for a project of low complexity where existing standards, material and documentation can be re-used and built upon. + Accountabilities include the definition, leadership and oversight of data management processes and deliverables for clinical projects (with one project comprising multiple trials in a substance in one indication) such as establishing expectations for CRF-based/external dataset content and structure, definition of project standards (e.g. SDTM, CRF, specifications such as for MQR, data cleaning, data transmission), review and acceptance of project level database elements, programming and validation of the project database (PDB), preparation and creation of CDM deliverables for regulatory submission and support of safety updates. + Alternatively, the function holder may be responsible for the specific CDM deliverables and support for a local regulatory submission (e.g. in Japan or China). + In the role of a Risk Based Quality Management (RBQM) Business Partner (BP), the function holder performs (selected) RBQM BP tasks under the supervision of an experienced RBQM BP, e.g. as an associate RBQM BP or for a trial of low complexity. + Takes a leadership role with the project / trial team to establish, align and confirm RBQM expectations for assigned trial(s). The function holder performs (selected) RBQM BP tasks in the definition, leadership and oversight of Risk Based Quality Management (RBQM) processes and deliverables for one or multiple clinical trials such as guiding the project and trial team through the process of identifying and assessing risks at the beginning of a trial, initiating and facilitating RBQM risk review and assessment meetings, facilitating the implementation of required RBQM documentation and tools, authoring the quality report and assisting with any risk related questions that arise. **Requirements:** + Bachelor’s degree or Master’s degree from an accredited institution (e.g. MBA, MSc) with major/focus in Life Sciences, Computer Science, Statistics, or similar preferred. + Experience in clinical research including data management and/or clinical trial management required. Initial experience within the pharmaceutical industry, CROs or academic sites: >=3 years. + No leadership experience required. + **Technical / Analysis / Medicine** : + Any of the following skills: data visualization/reporting, analytics; i.e. able to interpret integrated data displays and metrics, identify and communicate trends. + Experiences with Electronic Data Capture (EDC) processes + Knowledge in and experience with any of the following: Data review in JReview, Risk Management Tools, Statistical Analysis Software (SAS) programming + Ability to adapt to new technologies. + Critical thinker and able to discern risks. Must be precise and able to detect subtle inconsistencies in data / structures. 
+ **Planning / Organization:** + Excellent organizational skills, problem solving abilities, negotiation skills, time management skills and initiative. + Must be able to work independently as well as part of a team. + Able to effectively manage multiple assignments and adapt flexibly to changing priorities. + Able to produce robust timelines and action plans, regularly review and follow up on progress and take decisive action in terms of follow up activities with local and global trial/project teams. Ensures work is completed effectively. + **Communication** : + Strong communication skills with the ability to simply summarize complex information. Ability to use a wide range of communication techniques and media (written and verbal). Confident and persuasive communicator to ensure that the message is clear and well understood. + Ability to work collaboratively on multi-disciplinary project teams and to pro-actively manage relationships with external vendors. + Mindful of local, global, internal and external cultures to ensure that messages are received positively and effectively. + Good written and oral communication skills in the English language. + Ability to lead and facilitate meetings. + Ability to develop and deliver (technical) training. + Responsible for the clinical trial database and the data collected within a clinical trial and/or for the identification, detection and assessment of risks in a clinical trial. + Knowledge and experience in and continuing education of clinical trial designs, data standards, clinical trial conduct and methodology (International Conference on Harmonization (ICH) regulations. + Good Clinical Practice (GCP), major regulatory authorities and relevant directives/regulations) are required. Internal and external negotiation skills are required. + Ensures all tasks are carried out in accordance with respective applicable BI Standard Operating Procedures (SOPs), BI and regulatory guidelines and BI working instructions. + Ensures that all interactions and engagements are carried out with the highest ethical and professional standards and that all work is accomplished with quality and in accordance with BI values. **Eligibility Requirements:** + Must be legally authorized to work in the United States without restriction. + Must be willing to take a drug test and post-offer physical (if required) + Must be 18 years of age or older **Our Culture:** Boehringer Ingelheim is a different kind of pharmaceutical company, a privately held company with the ability to have an innovative and long term view. Our focus is on scientific discoveries that improve patients' lives and we equate success as a pharmaceutical company with the steady introduction of truly innovative medicines. Boehringer Ingelheim is the largest privately held pharmaceutical corporation in the world and ranks among the world's 20 leading pharmaceutical corporations. At Boehringer Ingelheim, we are committed to delivering value through innovation. Employees are challenged to take initiative and achieve outstanding results. Ultimately, our culture and drive allows us to maintain one of the highest levels of excellence in our industry. Boehringer Ingelheim, including Boehringer Ingelheim Pharmaceuticals, Inc., Boehringer Ingelheim USA, Boehringer Ingelheim Vetmedica Inc. and Boehringer Ingelheim Fremont, Inc. 
is an equal opportunity employer - Minority/Female/Protected Veteran/Person with a Disability Boehringer Ingelheim is firmly committed to ensuring a safe, healthy, productive and efficient work environment for our employees, partners and customers. As part of that commitment, Boehringer Ingelheim conducts pre-employment verifications and drug screenings **Organization:** _US-BI Pharma/BI USA_ **Title:** _Clinical Data Manager II_ **Location:** _Americas-United States_ **Requisition ID:** _179190_
          (USA-UNITED STATES) Principal Clinical Data Manager        
Boehringer Ingelheim is an equal opportunity global employer who takes pride in maintaining a diverse and inclusive culture. We embrace diversity of perspectives and strive for an inclusive environment which benefits our employees, patients and communities. **Description:** Contributes to the development process for new substances and development and promotion of drugs on the market by providing expertise, expectations, direction and oversight for Clinical Data Management (CDM) deliverables at project / trial level. Takes a lead role with internal and external partners and represents the company at meetings with regulatory authorities, clinical investigators and in the interaction with Contract Research Organizations (CROs) and external vendors in all aspects of data management for assigned project /trial(s). Provides input into CDM standards and process developments. Assumes primary responsibilities in one or more of the following roles demonstrating the required expertise and capabilities as: + Trial Data Manager (TDM) for complex trials or as subject matter expert of CDM responsibilities and processes e.g. TMCP process expert + Project Data Manager (PDM) + Risk Based Quality Management (RBQM) Business Partner (BP) As an employee of Boehringer Ingelheim, you will actively contribute to the discovery, development and delivery of our products to our patients and customers. Our global presence provides opportunity for all employees to collaborate internationally, offering visibility and opportunity to directly contribute to the companies' success. We realize that our strength and competitive advantage lie with our people. We support our employees in a number of ways to foster a healthy working environment, meaningful work, diversity and inclusion, mobility, networking and work-life balance. Our competitive compensation and benefit programs reflect Boehringer Ingelheim's high regard for our employees **Duties & Responsibilities:** + In the role of a Trial Data Manager (TDM) for complex trials or as subject matter expert of TDM responsibilities and processes for clinical trials led in-house or using business process outsourcing (BPO) + Takes a leadership role as subject matter expert for CDM responsibilities and processes in global projects/working groups and provides mentoring for less experienced Data Managers (DMs). + Takes a lead role in the specific setting of special trials, like mega trials or complex TMCP trials. Existing SOPs, guidelines and WIs do not cover these and the trial CDM has to make sure that the processes are developed according to the trial’s needs but adheres to the principles of GCP and other regulations like FDA guidance / regulations and documented in SOP variations as necessary. + Takes a lead role with internal and external partners to establish, align and confirm data management expectations for assigned trial(s). + Responsible for CDM trial level oversight. + Builds effective relationships with vendor partners. + Review protocols and identifies requirements for proper data capture. + In the role of a Trial Data Manager (TDM): Continued… + Oversee the design, creation and UAT Plan and testing of clinical study databases along with development of edit check specifications and manual data listings as required. 
+ Define or reviews creation and maintenance of all essential data management documentation including CRF specifications, eCRFs, annotated eCRF, eCRF completion guidelines, Data Management Plans (detailing complete data management processes throughout clinical studies), Data Transfer specifications and Data Review Guidelines, in accordance with the protocol, BI and project data standards. + Integrates external data (non-CRF data) from vendors or other internal departments into the clinical trial database. + Initiates and compiles Trial Master File (TMF) relevant documentation containing the necessary CDM / Biostatistics & Data Sciences (BDS) documentation for a trial together with other members of the trial team as appropriate. Therein ensures appropriate quality, scientific content, organization, clarity, accuracy, format, consistency and compliance with regulatory guidelines. + Establishes conventions and quality expectations for clinical data and plans and tracks the content, format, completeness, quality and timing of the trial data collection process and other CDM deliverables via data analytics throughout the conduct of a trial. + Throughout the trial, the function holder leads the respective trial level activities in the context of business process outsourcing (BPO) in CDM. + Collaborates with the trial team to ensure that the database can be locked according to the planned timelines and quality. Responsible for the database lock and accountable for the integrity of the database. + Ensures that SDTM (Study Data Tabulation Model) compliant data is available for analyses together with the Project Data Manager (PDM) and the SDTM programmer at the CRO (in the context of the BPO). + Leads and facilitates the Medical and Quality Review (MQR) process and other trial team meetings. Presents and trains at trial team, CRA and investigator meetings. + Ensures real-time inspection readiness of all Clinical Data Management deliverables for a trial and participates in regulatory agency and BI internal audits as necessary. + Identifies and communicates lessons learned and best practices at the trial level and within CDM. Identifies and participates in DM related process, system, and tool improvement initiatives within CDM/BDS. + Leads trial data managers in support of their trial in aspects of the data management work. + In the role of a Project Data Manager (PDM), the function holder performs PDM tasks for multiple early stage project e.g. PDM TMCP or for an international development project that has gone beyond the stage of Proof of Clinical Principal (PoCP) and involves complex and large international phase III trials. + Builds effective relationships with CROs/ vendor partners utilized within the project. + Gives input to the core clinical trial protocol (CTP). + Defines, reviews and approves key Clinical Data Management Project level deliverables, including: Core Case Report Form (CRF) design, instructions for CRF completion, Project data management plan; e.g. database specifications including derivations, edit check specifications, data cleaning plan, Electronic data transmission agreements in accordance with the core protocol , BI, TA Level data standards and Project needs. + Initiates and compiles PDMAP documentation containing the necessary CDM / Biostatistics & Data Sciences (BDS) documentation for a project together with other members of the trial team as appropriate. 
Therein ensures appropriate quality, scientific content, organization, clarity, accuracy, format, consistency and compliance with regulatory guidelines. Ensures that SDTM (Study Data Tabulation Model) compliant data is available for analyses together with the Trial Data Manager (TDM) and the SDTM programmer at the CRO (in the context of the BPO). + The PDM sets up, maintains and validates the project database consistent with the latest industry, BI and project standards. Ensures that the SDTM project database is compliant with the requirements from the project statistical analyses plan and collaborates with the PSTAT, PPROG on a regular basis. + Establishes conventions and quality expectations for clinical data and plans and tracks the content, format, completeness, quality and timing of the project database via data analytics throughout the conduct of a project. + Compiles and ensures compliance of all elements of the electronic submission deliverables: (e.g.): datasets, trial level SDTM Reviewers Guide and define.xml. + As part of inspection readiness, the PDM ensures that the TDMAP documentation of the pivotal trials is complete and consistent and communicates with trial data managers during conduct of these trials to set expectations. + Identifies and communicates lessons learned and best practices at the project level and within CDM. Identifies and participates in DM related process, system, and tool improvement initiatives within CDM. + Leads / mentors project/trial data managers that support the project in aspects of the data management work. + In the role of a Risk Based Quality Management (RBQM) Business Partner (BP), the function holder performs RBQM BP tasks for one or multiple clinical trials. + Leads the project or trial team through the process of identifying and assessing risks at the beginning of a trial. + Initiate and facilitates the RBQM risk review and assessment meetings, + Develop and maintain trial specific Risk Based Quality Management (RBQM) documentation and assists with any risk related questions that arise. + Authors the quality statement / Quality Report at the conclusion of the trial - for the Clinical Trial Report. + RBQM mentor / trainer for Central Monitors (CM), new RBQM BPs and other trial team members e.g. CRAs, CMLs + RBQM BP may also perform CM tasks as needed. + Supports the development, and maintenance of RBQM tools **Requirements:** + Bachelor’s degree or Master’s degree from an accredited institution (e.g. MBA, MSc) with major/focus in Life Sciences, Computer Science, Statistics, or similar preferred. + Experience in clinical research including data management and/or clinical trial management required. Initial experience within the pharmaceutical industry, CROs or academic sites: >=6 years + International exposure in daily business: more than 50% of international business/customers/staff over more than four (4) years. + **Technical / Analysis / Medicine** : + Technical expertise including: industry data structure knowledge (e.g. CDASH/CDISC); EDC use and database specification experience. + Experience with data visualization/reporting, analytics; i.e. able to interpret integrated data displays and metrics identify and communicate trends. + Experience using Statistical Analysis Software (SAS) programming and Risk Management Tools, including data review in JReview + Ability to adapt to new technologies. + Critical thinker and able to discern risks. Must be precise and able to detect subtle inconsistencies in data / structures. 
+ **Planning / Organization:** + Excellent organizational skills, problem solving abilities, negotiation skills, time management skills and initiative. + Must be able to work independently as well as part of a team. + Able to effectively manage multiple assignments and adapt flexibly to changing priorities. + Able to produce robust timelines and action plans, regularly review and follow up on progress and take decisive action in terms of follow up activities with local and global trial/project teams. Ensures work is completed effectively. + **Communication** : + Strong communication skills with the ability to simply summarize complex information. + Ability to use a wide range of communication techniques and media (written and verbal). Confident and persuasive communicator to ensure that the message is clear and well understood. + Ability to work collaboratively on multi-disciplinary project teams and to proactively manage relationships with external vendors. + Mindful of local, global, internal and external cultures to ensure that messages are received positively and effectively. + Good written and oral communication skills in the English language. + Ability to lead and facilitate meetings. + Ability to develop and deliver (technical) training. + Responsible for the project collection standards, database and submission deliverables within a substance/project. + Knowledge and experience in and continuing education of clinical trial designs, data standards, clinical trial conduct and methodology (International Conference on Harmonization (ICH) regulations. + Good Clinical Practice (GCP), major regulatory authorities and relevant directives/regulations) are required. + Strong project management skills and internal and external negotiation skills are required. + Ensures all tasks are carried out in accordance with respective applicable BI Standard Operating Procedures (SOPs), BI and regulatory guidelines and BI working instructions. + Ensures that all interactions and engagements are carried out with the highest ethical and professional standards and that all work is accomplished with quality and in accordance with BI values. **Eligibility Requirements:** + Must be legally authorized to work in the United States without restriction. + Must be willing to take a drug test and post-offer physical (if required) + Must be 18 years of age or older **Our Culture:** Boehringer Ingelheim is a different kind of pharmaceutical company, a privately held company with the ability to have an innovative and long term view. Our focus is on scientific discoveries that improve patients' lives and we equate success as a pharmaceutical company with the steady introduction of truly innovative medicines. Boehringer Ingelheim is the largest privately held pharmaceutical corporation in the world and ranks among the world's 20 leading pharmaceutical corporations. At Boehringer Ingelheim, we are committed to delivering value through innovation. Employees are challenged to take initiative and achieve outstanding results. Ultimately, our culture and drive allows us to maintain one of the highest levels of excellence in our industry. Boehringer Ingelheim, including Boehringer Ingelheim Pharmaceuticals, Inc., Boehringer Ingelheim USA, Boehringer Ingelheim Vetmedica Inc. and Boehringer Ingelheim Fremont, Inc. 
is an equal opportunity employer - Minority/Female/Protected Veteran/Person with a Disability Boehringer Ingelheim is firmly committed to ensuring a safe, healthy, productive and efficient work environment for our employees, partners and customers. As part of that commitment, Boehringer Ingelheim conducts pre-employment verifications and drug screenings **Organization:** _US-BI Pharma/BI USA_ **Title:** _Principal Clinical Data Manager_ **Location:** _Americas-United States_ **Requisition ID:** _179066_
+ **Planning / Organization:** + Excellent organizational skills, problem solving abilities, negotiation skills, time management skills and initiative. + Must be able to work independently as well as part of a team. + Able to effectively manage multiple assignments and adapt flexibly to changing priorities. + Able to produce robust timelines and action plans, regularly review and follow up on progress and take decisive action in terms of follow up activities with local and global trial/project teams. Ensures work is completed effectively. + **Communication** : + Strong communication skills with the ability to simply summarize complex information. Ability to use a wide range of communication techniques and media (written and verbal). Confident and persuasive communicator to ensure that the message is clear and well understood. + Ability to work collaboratively on multi-disciplinary project teams and to pro-actively manage relationships with external vendors. + Mindful of local, global, internal and external cultures to ensure that messages are received positively and effectively. + Good written and oral communication skills in the English language. + Ability to lead and facilitate meetings. + Ability to develop and deliver (technical) training. + Responsible for the clinical trial database and the data collected within a clinical trial and/or for the identification, detection and assessment of risks in a clinical trial. + Knowledge and experience in and continuing education of clinical trial designs, data standards, clinical trial conduct and methodology (International Conference on Harmonization (ICH) regulations. + Good Clinical Practice (GCP), major regulatory authorities and relevant directives/regulations) are required. Internal and external negotiation skills are required. + Ensures all tasks are carried out in accordance with respective applicable BI Standard Operating Procedures (SOPs), BI and regulatory guidelines and BI working instructions. + Ensures that all interactions and engagements are carried out with the highest ethical and professional standards and that all work is accomplished with quality and in accordance with BI values. **Eligibility Requirements:** + Must be legally authorized to work in the United States without restriction. + Must be willing to take a drug test and post-offer physical (if required) + Must be 18 years of age or older **Our Culture:** Boehringer Ingelheim is a different kind of pharmaceutical company, a privately held company with the ability to have an innovative and long term view. Our focus is on scientific discoveries that improve patients' lives and we equate success as a pharmaceutical company with the steady introduction of truly innovative medicines. Boehringer Ingelheim is the largest privately held pharmaceutical corporation in the world and ranks among the world's 20 leading pharmaceutical corporations. At Boehringer Ingelheim, we are committed to delivering value through innovation. Employees are challenged to take initiative and achieve outstanding results. Ultimately, our culture and drive allows us to maintain one of the highest levels of excellence in our industry. Boehringer Ingelheim, including Boehringer Ingelheim Pharmaceuticals, Inc., Boehringer Ingelheim USA, Boehringer Ingelheim Vetmedica Inc. and Boehringer Ingelheim Fremont, Inc. 
is an equal opportunity employer - Minority/Female/Protected Veteran/Person with a Disability Boehringer Ingelheim is firmly committed to ensuring a safe, healthy, productive and efficient work environment for our employees, partners and customers. As part of that commitment, Boehringer Ingelheim conducts pre-employment verifications and drug screenings **Organization:** _US-BI Pharma/BI USA_ **Title:** _Clinical Data Manager II_ **Location:** _Americas-United States_ **Requisition ID:** _179187_
          (USA-CT-RIDGEFIELD) Clinical Data Manager II        
Boehringer Ingelheim is an equal opportunity global employer who takes pride in maintaining a diverse and inclusive culture. We embrace diversity of perspectives and strive for an inclusive environment which benefits our employees, patients and communities. **Description:** Contributes to the development process for new substances and development and promotion of drugs on the market by providing expertise, expectations, direction and oversight for Clinical Data Management (CDM) deliverables. Takes a lead role with internal and external partners and represents the company at meetings with clinical investigators and in the interaction with Contract Research Organizations (CROs) and external vendors in all aspects of data management for assigned trial(s). May provide input into CDM standards and process developments. Assumes one or more of the following roles demonstrating the required expertise and capabilities as: + Trial Data Manager (TDM) + Central Monitor (CM) + Or developing expertise and capabilities under supervision as: + Project Data Manager (PDM), as an associate PDM or supporting a local submission (e.g. in Japan or China) + Risk Based Quality Management (RBQM) Business Partner (BP), e.g. as an associate RBQM BP or for a trial of low complexity As an employee of Boehringer Ingelheim, you will actively contribute to the discovery, development and delivery of our products to our patients and customers. Our global presence provides opportunity for all employees to collaborate internationally, offering visibility and opportunity to directly contribute to the companies' success. We realize that our strength and competitive advantage lie with our people. We support our employees in a number of ways to foster a healthy working environment, meaningful work, diversity and inclusion, mobility, networking and work-life balance. Our competitive compensation and benefit programs reflect Boehringer Ingelheim's high regard for our employees **Duties & Responsibilities:** + In the role of a Trial Data Manager (TDM) for clinical trials led in-house or using business process outsourcing (BPO) + Key liaison / Data Management lead to establish, align and confirm data management expectations for assigned trial(s), this requires regular interaction with other internal and external partners e.g. TCM, TSTAT, TPROG, TMCP, BPO partners. + Responsible for CDM trial level oversight. Builds effective relationships with CROs/ vendor partners. Review protocols and identifies requirements for proper data capture including electronic Case Report Form design and processing of clinical data ensuring accuracy, consistency and completeness + Oversee the design, creation and UAT Plan and testing of clinical study databases along with development of edit check specifications and manual data listings as required. + Define or reviews creation and maintenance of all essential data management documentation including CRF specifications, eCRFs, annotated eCRF, eCRF completion guidelines, Data Management Plans (detailing complete data management processes throughout clinical studies), Data Transfer specifications and Data Review Guidelines, in accordance with the protocol, BI and project data standards. + Integrates external data (non-CRF data) from vendors or other internal departments into the clinical trial database. 
+ Initiates and compiles Trial Master File (TMF) relevant documentation containing the necessary CDM / Biostatistics & Data Sciences (BDS) documentation for a trial together with other members of the trial team as appropriate. Therein ensures appropriate quality, scientific content, organization, clarity, accuracy, format, consistency and compliance with regulatory guidelines. + Establishes conventions and quality expectations for clinical data and plans and tracks the content, format, completeness, quality and timing of the trial data collection process and other CDM deliverables via data analytics throughout the conduct of a trial. + Throughout the trial, the function holder either performs or leads the respective trial level activities in the context of business process outsourcing (BPO) in CDM. + Collaborates with the trial team to ensure that the database can be locked according to the planned timelines and quality. Responsible for the database lock and accountable for the integrity of the database. + Ensures that SDTM (Study Data Tabulation Model) compliant data is available for analyses together with the Project Data Manager (PDM) and the SDTM programmer at the CRO (in the context of the BPO). + Leads and facilitates the Medical and Quality Review (MQR) process and other trial team meetings. Presents and trains at trial team, CRA at investigator meetings. + Ensures real-time inspection readiness of all CDM deliverables for a trial and participates in regulatory agency and BI internal audits as necessary. + Identifies and communicates lessons learned and best practices at the trial level within CDM. Identifies and participates in DM related process, system, and tool improvement initiatives within CDM/BDS. + In the role of a Trial Data Manager (TDM) for fully outsourced trials, supervises and instructs the CRO in performing the above TDM tasks and leads trial level oversight, including planned timelines and fulfillment of quality expectations. + Sets expectations for and defines specifications for data transmission with the CRO. Integrates the data from the CRO into the BI clinical trial database. Ensures that SDTM compliant data is available for analyses together with the responsible Project Data Manager (PDM). + In the role of a Central Monitor (CM) for clinical trials + Executes and manages the Risk Based Quality Management (RBQM) processes as described in the monitoring framework, this requires regular interaction with other internal and external functions e.g. clinical monitors, CRAs/site monitors, data managers, biostatistics, site personnel. + Conduct root cause analysis on the risk signals of aggregated site and trial data (pulled from various sources) using risk reports. Identifies and investigates potential risks and trends with subject protection and reliability of trial results and compliance with the investigational plan for impact on site/country/trial activities. + Provides direction to site monitors for additional remote and on-site monitoring activities for risk sites, within the scope of the trial monitoring plan. + Oversees potential issues and findings requiring further review and follow-up and ensures appropriate actions are taken by the trial team members to investigate, resolve and document potential risks identified, including adequate documentation of resolution. + Provides a regular and efficient mechanism of trial communication for the trial team including documentation and leads oversight meetings. 
+ Ensures real-time inspection readiness of responsible RBQM deliverables for a trial and participates in regulatory agency and BI internal audits as necessary, in conjunction with the RBQM BP. + Identifies and communicates lessons learned and best practices at the trial level and with other CMs. Identifies and participates in CM related process, system, and tool improvement initiatives within CDM/BDS. Performs user acceptance testing and supports the development and maintenance of RBQM tools. + In the role of a Project Data Manager (PDM), the function holder performs (selected) PDM tasks under the supervision of an experienced PDM, e. g. as an associate PDM or for a project of low complexity where existing standards, material and documentation can be re-used and built upon. + Accountabilities include the definition, leadership and oversight of data management processes and deliverables for clinical projects (with one project comprising multiple trials in a substance in one indication) such as establishing expectations for CRF-based/external dataset content and structure, definition of project standards (e.g. SDTM, CRF, specifications such as for MQR, data cleaning, data transmission), review and acceptance of project level database elements, programming and validation of the project database (PDB), preparation and creation of CDM deliverables for regulatory submission and support of safety updates. + Alternatively, the function holder may be responsible for the specific CDM deliverables and support for a local regulatory submission (e.g. in Japan or China). + In the role of a Risk Based Quality Management (RBQM) Business Partner (BP), the function holder performs (selected) RBQM BP tasks under the supervision of an experienced RBQM BP, e.g. as an associate RBQM BP or for a trial of low complexity. + Takes a leadership role with the project / trial team to establish, align and confirm RBQM expectations for assigned trial(s). The function holder performs (selected) RBQM BP tasks in the definition, leadership and oversight of Risk Based Quality Management (RBQM) processes and deliverables for one or multiple clinical trials such as guiding the project and trial team through the process of identifying and assessing risks at the beginning of a trial, initiating and facilitating RBQM risk review and assessment meetings, facilitating the implementation of required RBQM documentation and tools, authoring the quality report and assisting with any risk related questions that arise. **Requirements:** + Bachelor’s degree or Master’s degree from an accredited institution (e.g. MBA, MSc) with major/focus in Life Sciences, Computer Science, Statistics, or similar preferred. + Experience in clinical research including data management and/or clinical trial management required. Initial experience within the pharmaceutical industry, CROs or academic sites: >=3 years. + No leadership experience required. + **Technical / Analysis / Medicine** : + Any of the following skills: data visualization/reporting, analytics; i.e. able to interpret integrated data displays and metrics, identify and communicate trends. + Experiences with Electronic Data Capture (EDC) processes + Knowledge in and experience with any of the following: Data review in JReview, Risk Management Tools, Statistical Analysis Software (SAS) programming + Ability to adapt to new technologies. + Critical thinker and able to discern risks. Must be precise and able to detect subtle inconsistencies in data / structures. 
+ **Planning / Organization:** + Excellent organizational skills, problem solving abilities, negotiation skills, time management skills and initiative. + Must be able to work independently as well as part of a team. + Able to effectively manage multiple assignments and adapt flexibly to changing priorities. + Able to produce robust timelines and action plans, regularly review and follow up on progress and take decisive action in terms of follow up activities with local and global trial/project teams. Ensures work is completed effectively. + **Communication** : + Strong communication skills with the ability to simply summarize complex information. Ability to use a wide range of communication techniques and media (written and verbal). Confident and persuasive communicator to ensure that the message is clear and well understood. + Ability to work collaboratively on multi-disciplinary project teams and to pro-actively manage relationships with external vendors. + Mindful of local, global, internal and external cultures to ensure that messages are received positively and effectively. + Good written and oral communication skills in the English language. + Ability to lead and facilitate meetings. + Ability to develop and deliver (technical) training. + Responsible for the clinical trial database and the data collected within a clinical trial and/or for the identification, detection and assessment of risks in a clinical trial. + Knowledge and experience in and continuing education of clinical trial designs, data standards, clinical trial conduct and methodology (International Conference on Harmonization (ICH) regulations. + Good Clinical Practice (GCP), major regulatory authorities and relevant directives/regulations) are required. Internal and external negotiation skills are required. + Ensures all tasks are carried out in accordance with respective applicable BI Standard Operating Procedures (SOPs), BI and regulatory guidelines and BI working instructions. + Ensures that all interactions and engagements are carried out with the highest ethical and professional standards and that all work is accomplished with quality and in accordance with BI values. **Eligibility Requirements:** Must be legally authorized to work in the United States without restriction. Must be willing to take a drug test and post-offer physical (if required) Must be 18 years of age or older **Our Culture:** Boehringer Ingelheim is one of the world’s top 20 pharmaceutical companies and operates globally with approximately 50,000 employees. Since our founding in 1885, the company has remained family-owned and today we are committed to creating value through innovation in three business areas including human pharmaceuticals, animal health and biopharmaceutical contract manufacturing. Since we are privately held, we have the ability to take an innovative, long-term view. Our focus is on scientific discoveries and the introduction of truly novel medicines that improve lives and provide valuable services and support to patients and their families. Employees are challenged to take initiative and achieve outstanding results. Ultimately, our culture and drive allows us to maintain one of the highest levels of excellence in our industry. We are also deeply committed to our communities and our employees create and engage in programs that strengthen the neighborhoods where we live and work. 
Boehringer Ingelheim, including Boehringer Ingelheim Pharmaceuticals, Inc., Boehringer Ingelheim USA, Boehringer Ingelheim Animal Health USA, Inc., Merial Barceloneta, LLC and Boehringer Ingelheim Fremont, Inc. is an equal opportunity and affirmative action employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race; color; creed; religion; national origin; age; ancestry; nationality; marital, domestic partnership or civil union status; sex, gender identity or expression; affectional or sexual orientation; disability; veteran or military status, including protected veteran status; domestic violence victim status; atypical cellular or blood trait; genetic information (including the refusal to submit to genetic testing) or any other characteristic protected by law. Boehringer Ingelheim is firmly committed to ensuring a safe, healthy, productive and efficient work environment for our employees, partners and customers. As part of that commitment, Boehringer Ingelheim conducts pre-employment verifications and drug screenings. **Organization:** _US-BI Pharma/BI USA_ **Title:** _Clinical Data Manager II_ **Location:** _Americas-United States-CT-Ridgefield_ **Requisition ID:** _175443_
          (USA-CT-RIDGEFIELD) Principal Clinical Data Manager        
Boehringer Ingelheim is an equal opportunity global employer who takes pride in maintaining a diverse and inclusive culture. We embrace diversity of perspectives and strive for an inclusive environment which benefits our employees, patients and communities. **Description:** Contributes to the development process for new substances and development and promotion of drugs on the market by providing expertise, expectations, direction and oversight for Clinical Data Management (CDM) deliverables at project / trial level. Takes a lead role with internal and external partners and represents the company at meetings with regulatory authorities, clinical investigators and in the interaction with Contract Research Organizations (CROs) and external vendors in all aspects of data management for assigned project /trial(s). Provides input into CDM standards and process developments. Assumes primary responsibilities in one or more of the following roles demonstrating the required expertise and capabilities as: + Trial Data Manager (TDM) for complex trials or as subject matter expert of CDM responsibilities and processes e.g. TMCP process expert + Project Data Manager (PDM) + Risk Based Quality Management (RBQM) Business Partner (BP) As an employee of Boehringer Ingelheim, you will actively contribute to the discovery, development and delivery of our products to our patients and customers. Our global presence provides opportunity for all employees to collaborate internationally, offering visibility and opportunity to directly contribute to the companies' success. We realize that our strength and competitive advantage lie with our people. We support our employees in a number of ways to foster a healthy working environment, meaningful work, diversity and inclusion, mobility, networking and work-life balance. Our competitive compensation and benefit programs reflect Boehringer Ingelheim's high regard for our employees **Duties & Responsibilities:** + In the role of a Trial Data Manager (TDM) for complex trials or as subject matter expert of TDM responsibilities and processes for clinical trials led in-house or using business process outsourcing (BPO) + Takes a leadership role as subject matter expert for CDM responsibilities and processes in global projects/working groups and provides mentoring for less experienced Data Managers (DMs). + Takes a lead role in the specific setting of special trials, like mega trials or complex TMCP trials. Existing SOPs, guidelines and WIs do not cover these and the trial CDM has to make sure that the processes are developed according to the trial’s needs but adheres to the principles of GCP and other regulations like FDA guidance / regulations and documented in SOP variations as necessary. + Takes a lead role with internal and external partners to establish, align and confirm data management expectations for assigned trial(s). + Responsible for CDM trial level oversight. + Builds effective relationships with vendor partners. + Review protocols and identifies requirements for proper data capture. + In the role of a Trial Data Manager (TDM): Continued… + Oversee the design, creation and UAT Plan and testing of clinical study databases along with development of edit check specifications and manual data listings as required. 
+ Define or reviews creation and maintenance of all essential data management documentation including CRF specifications, eCRFs, annotated eCRF, eCRF completion guidelines, Data Management Plans (detailing complete data management processes throughout clinical studies), Data Transfer specifications and Data Review Guidelines, in accordance with the protocol, BI and project data standards. + Integrates external data (non-CRF data) from vendors or other internal departments into the clinical trial database. + Initiates and compiles Trial Master File (TMF) relevant documentation containing the necessary CDM / Biostatistics & Data Sciences (BDS) documentation for a trial together with other members of the trial team as appropriate. Therein ensures appropriate quality, scientific content, organization, clarity, accuracy, format, consistency and compliance with regulatory guidelines. + Establishes conventions and quality expectations for clinical data and plans and tracks the content, format, completeness, quality and timing of the trial data collection process and other CDM deliverables via data analytics throughout the conduct of a trial. + Throughout the trial, the function holder leads the respective trial level activities in the context of business process outsourcing (BPO) in CDM. + Collaborates with the trial team to ensure that the database can be locked according to the planned timelines and quality. Responsible for the database lock and accountable for the integrity of the database. + Ensures that SDTM (Study Data Tabulation Model) compliant data is available for analyses together with the Project Data Manager (PDM) and the SDTM programmer at the CRO (in the context of the BPO). + Leads and facilitates the Medical and Quality Review (MQR) process and other trial team meetings. Presents and trains at trial team, CRA and investigator meetings. + Ensures real-time inspection readiness of all Clinical Data Management deliverables for a trial and participates in regulatory agency and BI internal audits as necessary. + Identifies and communicates lessons learned and best practices at the trial level and within CDM. Identifies and participates in DM related process, system, and tool improvement initiatives within CDM/BDS. + Leads trial data managers in support of their trial in aspects of the data management work. + In the role of a Project Data Manager (PDM), the function holder performs PDM tasks for multiple early stage project e.g. PDM TMCP or for an international development project that has gone beyond the stage of Proof of Clinical Principal (PoCP) and involves complex and large international phase III trials. + Builds effective relationships with CROs/ vendor partners utilized within the project. + Gives input to the core clinical trial protocol (CTP). + Defines, reviews and approves key Clinical Data Management Project level deliverables, including: Core Case Report Form (CRF) design, instructions for CRF completion, Project data management plan; e.g. database specifications including derivations, edit check specifications, data cleaning plan, Electronic data transmission agreements in accordance with the core protocol , BI, TA Level data standards and Project needs. + Initiates and compiles PDMAP documentation containing the necessary CDM / Biostatistics & Data Sciences (BDS) documentation for a project together with other members of the trial team as appropriate. 
Therein ensures appropriate quality, scientific content, organization, clarity, accuracy, format, consistency and compliance with regulatory guidelines. Ensures that SDTM (Study Data Tabulation Model) compliant data is available for analyses together with the Trial Data Manager (TDM) and the SDTM programmer at the CRO (in the context of the BPO). + The PDM sets up, maintains and validates the project database consistent with the latest industry, BI and project standards. Ensures that the SDTM project database is compliant with the requirements from the project statistical analyses plan and collaborates with the PSTAT, PPROG on a regular basis. + Establishes conventions and quality expectations for clinical data and plans and tracks the content, format, completeness, quality and timing of the project database via data analytics throughout the conduct of a project. + Compiles and ensures compliance of all elements of the electronic submission deliverables: (e.g.): datasets, trial level SDTM Reviewers Guide and define.xml. + As part of inspection readiness, the PDM ensures that the TDMAP documentation of the pivotal trials is complete and consistent and communicates with trial data managers during conduct of these trials to set expectations. + Identifies and communicates lessons learned and best practices at the project level and within CDM. Identifies and participates in DM related process, system, and tool improvement initiatives within CDM. + Leads / mentors project/trial data managers that support the project in aspects of the data management work. + In the role of a Risk Based Quality Management (RBQM) Business Partner (BP), the function holder performs RBQM BP tasks for one or multiple clinical trials. + Leads the project or trial team through the process of identifying and assessing risks at the beginning of a trial. + Initiate and facilitates the RBQM risk review and assessment meetings, + Develop and maintain trial specific Risk Based Quality Management (RBQM) documentation and assists with any risk related questions that arise. + Authors the quality statement / Quality Report at the conclusion of the trial - for the Clinical Trial Report. + RBQM mentor / trainer for Central Monitors (CM), new RBQM BPs and other trial team members e.g. CRAs, CMLs + RBQM BP may also perform CM tasks as needed. + Supports the development, and maintenance of RBQM tools **Requirements:** + Bachelor’s degree or Master’s degree from an accredited institution (e.g. MBA, MSc) with major/focus in Life Sciences, Computer Science, Statistics, or similar preferred. + Experience in clinical research including data management and/or clinical trial management required. Initial experience within the pharmaceutical industry, CROs or academic sites: >=6 years + International exposure in daily business: more than 50% of international business/customers/staff over more than four (4) years. **Technical / Analysis / Medicine** : + Technical expertise including: industry data structure knowledge (e.g. CDASH/CDISC); EDC use and database specification experience. + Experience with data visualization/reporting, analytics; i.e. able to interpret integrated data displays and metrics identify and communicate trends. + Experience using Statistical Analysis Software (SAS) programming and Risk Management Tools, including data review in JReview + Ability to adapt to new technologies. + Critical thinker and able to discern risks. Must be precise and able to detect subtle inconsistencies in data / structures. 
**Planning / Organization:** + Excellent organizational skills, problem solving abilities, negotiation skills, time management skills and initiative. + Must be able to work independently as well as part of a team. + Able to effectively manage multiple assignments and adapt flexibly to changing priorities. + Able to produce robust timelines and action plans, regularly review and follow up on progress and take decisive action in terms of follow up activities with local and global trial/project teams. Ensures work is completed effectively. **Communication** : + Strong communication skills with the ability to simply summarize complex information. + Ability to use a wide range of communication techniques and media (written and verbal). Confident and persuasive communicator to ensure that the message is clear and well understood. + Ability to work collaboratively on multi-disciplinary project teams and to proactively manage relationships with external vendors. + Mindful of local, global, internal and external cultures to ensure that messages are received positively and effectively. + Good written and oral communication skills in the English language. + Ability to lead and facilitate meetings. + Ability to develop and deliver (technical) training. + Responsible for the project collection standards, database and submission deliverables within a substance/project. + Knowledge and experience in and continuing education of clinical trial designs, data standards, clinical trial conduct and methodology (International Conference on Harmonization (ICH) regulations. + Good Clinical Practice (GCP), major regulatory authorities and relevant directives/regulations) are required. + Strong project management skills and internal and external negotiation skills are required. + Ensures all tasks are carried out in accordance with respective applicable BI Standard Operating Procedures (SOPs), BI and regulatory guidelines and BI working instructions. + Ensures that all interactions and engagements are carried out with the highest ethical and professional standards and that all work is accomplished with quality and in accordance with BI values. **Eligibility Requirements:** Must be legally authorized to work in the United States without restriction. Must be willing to take a drug test and post-offer physical (if required) Must be 18 years of age or older **Our Culture:** Boehringer Ingelheim is one of the world’s top 20 pharmaceutical companies and operates globally with approximately 50,000 employees. Since our founding in 1885, the company has remained family-owned and today we are committed to creating value through innovation in three business areas including human pharmaceuticals, animal health and biopharmaceutical contract manufacturing. Since we are privately held, we have the ability to take an innovative, long-term view. Our focus is on scientific discoveries and the introduction of truly novel medicines that improve lives and provide valuable services and support to patients and their families. Employees are challenged to take initiative and achieve outstanding results. Ultimately, our culture and drive allows us to maintain one of the highest levels of excellence in our industry. We are also deeply committed to our communities and our employees create and engage in programs that strengthen the neighborhoods where we live and work. 
Boehringer Ingelheim, including Boehringer Ingelheim Pharmaceuticals, Inc., Boehringer Ingelheim USA, Boehringer Ingelheim Animal Health USA, Inc., Merial Barceloneta, LLC and Boehringer Ingelheim Fremont, Inc. is an equal opportunity and affirmative action employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race; color; creed; religion; national origin; age; ancestry; nationality; marital, domestic partnership or civil union status; sex, gender identity or expression; affectional or sexual orientation; disability; veteran or military status, including protected veteran status; domestic violence victim status; atypical cellular or blood trait; genetic information (including the refusal to submit to genetic testing) or any other characteristic protected by law. Boehringer Ingelheim is firmly committed to ensuring a safe, healthy, productive and efficient work environment for our employees, partners and customers. As part of that commitment, Boehringer Ingelheim conducts pre-employment verifications and drug screenings. **Organization:** _US-BI Pharma/BI USA_ **Title:** _Principal Clinical Data Manager_ **Location:** _Americas-United States-CT-Ridgefield_ **Requisition ID:** _175445_
          I-CHASS and OAS/ARTCA Announce Details for Summer Institute in Data Science        
Organized in partnership with the Organization of American States (OAS) and the Advanced Research and Technology Collaboratory for the Americas (ARTCA), the NSF-funded Pan-American Advanced Studies Institute (PASI) will take place at the Universidad del Valle de Guatemala, east of Guatemala City, from July 15 through July 26, 2013, and will offer training in Methods of Computation-Based Discovery (CBD) to about 40 participants from across the Americas.
          PARTNERING, INTEGRATING, TO MAKE THE COMPLEX SIMPLE        

In the recent top-rated book *Simple Rules: How to Thrive in a Complex World*, Donald Sull and Kathleen Eisenhardt offer that, “When many parties must work together, simple trumps complex” (p. 44). This is a beautiful fit for the future of work: a future made up of complex work, performed in complex ways. Freelancers, contractors, and global project work all intermingle with traditional organizational forms. Rather than try to understand all the complexities yourself, partner with those who do -- and do it in a simple way. By simple, in this instance, I mean pushing decision-making to where the information is, close to the work itself.

Complex Work and Partnerships Require Simple Rules and Direct Connections to Feedback

This is such a strong idea that Sull and Eisenhardt use it as the conclusion of their book:

…simple rules work because they provide a threshold level of structure while leaving ample scope to exercise discretion…

Close to the facts on the ground, individuals can draw on their judgment and creativity to manage risks and seize unexpected opportunities. The latitude to exercise discretion not only makes simple rules effective, it makes them attractive. People [and organizations, my addition here, but also covered in the book] thrive when given the opportunity to apply their judgment and creativity to the situations they face from day to day. And if they benefit from simple rules, they are more likely to use them and use them well" (p. 228).

The “threshold level of structure” is what keeps the ground-level decision making from just being tactical. Key is that the structure is understood and committed to across all actors. Nilofer Merchant talks about the value of co-creating strategy so that the vision and the tactics are tied across all levels of the work from inception. Co-creation can support commitment and innovation. Sull and Eisenhardt provide detailed notes on the value of working throughout the organization as rules are created -- and are clear that strategy and execution cannot be separated.

The Future of Work Is Complex, But the Underlying Technologies Can Help

Internet-enabled collaboration, product development supported by real-time data, the Internet of Things: these all mean we spend more time and effort checking and connecting with data and others throughout our days and nights. The process is not simple, but it could be simpler. Some organizations have found ways to leverage the complexity of data in ways that simplify the work.

Pulse Mining Systems

Pulse Mining Systems provides integrated business management tools to mining companies. (I’m looking forward to writing a more historical piece remarking on how much mining has taught us about management.) They offer resources for operations, human resources, marketing, and more. The key is that they don’t do it alone -- and their tools aren’t meant just for executives or data scientists.

I spoke with Rob Parvin, then their visualization and analytics manager. I was looking for an example of the value of offering access to operational data to people doing the work, but I found much more. Yes, he described examples where mines with five-kilometer conveyors are progressing from manual reporting to real-time, sensor-based feedback to the shift managers. Yes, maintenance and staffing decisions are made with better data. (More on those soon.) But what surprised me was how they were creating these opportunities.

Pulse Partners to Co-Innovate

Pulse partners to simplify both their strategic decision making and how they then take action on that strategy. They co-innovate -- work with their strategic clients -- to identify the specific information needed by the client for decision making (going for simple rather than complex), key metrics, and prototyping. The product is eventually rolled out as a general offering -- but with the knowledge that it’s a tool that’s valuable in the industry and works. The implicit rule is that products are co-developed rather than created away from the work itself. They’ve been able to create early versions in as little as three weeks.

Pulse is able to move this quickly because they’ve partnered with two analytics companies rather than trying to build out their own capabilities (implicit rule: Don’t reinvent the wheel). They work with Birst (see an earlier mention here) and Tableau to provide analytics and visualization building blocks that are rapidly prototyped and tested in the field. The complexity is managed by focusing on pre-built, reusable capabilities. The partners are bound by a common interest in answering operational questions.

In prior posts I’ve written about how we can lead by letting go (of old-school management techniques), but that creates an image of chaos for some. Instead, let’s think about a structured handoff of responsibility. We are unlikely to be expert in all the areas where we need expertise. Pulse has found like-minded partners. SAP has done the same with their co-innovation labs. Each seems to have developed simple rules of organization to hand off pieces of the innovation process to partners with appropriate skills.

My Own Simple Rules

Rereading Simple Rules: How to Thrive in a Complex World, and considering the issues in the context of our quickly changing work environment, has inspired me to think about my own simple rules. I work with a variety of audiences interested in designing organizations for innovation and offer a process for creating designs unique to their settings (I’m in full agreement that the local creation of the rules is an important piece of the process). That said, I think there are a few rules many can work with and I share them here in hope that you will help me improve them.

  • Base decisions on data, with decision makers as close to the work as possible.

  • Build teams with diverse skills but common interests; highlight the interest.

  • Bundle similar work, and where possible, pass off to automation.

  • Be transparent and pay attention to what others are sharing with you.

Sull and Eisenhardt use the second half of their book to discuss how to refine and improve your rules. The above are just a start for me; are they also an interesting start for you?




          Working Directly With the Twitter Data Ecosystem        

One of the reasons Twitter acquired Gnip was because Twitter believes the best way to support the distribution of Twitter data is to have direct data relationships with its data customers – the companies building analytic solutions using Twitter’s data … Continue reading

The post Working Directly With the Twitter Data Ecosystem appeared first on Gnip Blog - Social Data and Data Science Blog.


          Twitter and IBM Partner to Transform Decision Making        

I’m thrilled to announce that Twitter and IBM are partnering to transform how businesses and institutions understand their customers, markets and trends – and inform every business decision. For details, see our post on the Twitter blog and IBM’s press … Continue reading

The post Twitter and IBM Partner to Transform Decision Making appeared first on Gnip Blog - Social Data and Data Science Blog.


          Tweeting in the Rain, Part 4: Tweets during the 2013 Colorado flood        

In August 2013, we posted two “Tweeting in the Rain” (Part 1 & Part 2) articles that explored important roles social data could play in flood early-warning systems. These two posts focused on determining whether there was a Twitter “signal” … Continue reading

The post Tweeting in the Rain, Part 4: Tweets during the 2013 Colorado flood appeared first on Gnip Blog - Social Data and Data Science Blog.


          Historical PowerTrack Requests, Now Faster Than Ever        

The Twitter Data Product Team is excited to share an update with you around recent enhancements to our Historical PowerTrack offering. In an effort to improve our customer experience for historical data requests, we’ve made substantial technology investments to reduce … Continue reading

The post Historical PowerTrack Requests, Now Faster Than Ever appeared first on Gnip Blog - Social Data and Data Science Blog.


          The Gnip Usage API: A New Tool for Monitoring Data Consumption        

At Gnip we know that reliable, sustainable, and complete data delivery products are core to enabling our customers to surface insights and value from social data. Today we’re excited to announce a new API that will make it easier for … Continue reading

The post The Gnip Usage API: A New Tool for Monitoring Data Consumption appeared first on Gnip Blog - Social Data and Data Science Blog.


          The Power of Command Centers        

The ability to integrate enterprise data alongside social data and visualize the output in one place is a powerful one and one that brands are leveraging through the use of command centers. With this tool not only can brands combine … Continue reading

The post The Power of Command Centers appeared first on Gnip Blog - Social Data and Data Science Blog.


          Smoke vs. Smoke: DiscoverText helps public health researchers        

These days manually sorting data isn’t an option. The ability to easily and accurately classify and search Twitter data can save valuable time, whether for academic research or brand marketing analysis. That’s why we’re excited to add Texifter as a … Continue reading

The post Smoke vs. Smoke: DiscoverText helps public health researchers appeared first on Gnip Blog - Social Data and Data Science Blog.


          Leveraging the Search API        

Brands these days are savvy about comprehensively tracking keywords, competitors, hashtags, and so on. But there will always be unanticipated events or news stories that pop up. The keywords associated with these events are rarely ever tracked in advance. So … Continue reading

The post Leveraging the Search API appeared first on Gnip Blog - Social Data and Data Science Blog.


          Hacking to Improve Disaster Response with Qlik, Medair and Gnip        

At Gnip, we’re always excited to hear about groups and individuals who are using social data in unique ways to improve our world. We were recently fortunate enough to support this use of social data for humanitarian good first-hand. Along … Continue reading

The post Hacking to Improve Disaster Response with Qlik, Medair and Gnip appeared first on Gnip Blog - Social Data and Data Science Blog.


          Streaming Data Just Got Easier: Announcing Gnip’s New Connector for Amazon Kinesis        

I’m happy to announce a new solution we’ve built to make it simple to get massive amounts of social data into the AWS cloud environment. I’m here in London for the AWS Summit where Stephen E. Schmidt, Vice President of … Continue reading

The post Streaming Data Just Got Easier: Announcing Gnip’s New Connector for Amazon Kinesis appeared first on Gnip Blog - Social Data and Data Science Blog.


          Comment on The Community Data Science Collective Dataverse by Links 18/6/2017: New Debian Release, Catchup With a Lot of News | Techrights        
[…] The Community Data Science Collective Dataverse […]
          Comment on The Community Data Science Collective Dataverse by The Community Data Science Collective Dataverse https://mako.cc/cop… | Dr. Roy Schestowitz (罗伊)        
[…] The Community Data Science Collective Dataverse https://mako.cc/copyrighteous/cdsc-dataverse […]
          Big Data in Security – Part V: Anti-Phishing in the Cloud        
In the last chapter of our five part Big Data in Security series, expert Data Scientists Brennan Evans and Mahdi Namazifar join me to discuss their work on a cloud anti-phishing solution. Phishing is a well-known historical threat. Essentially, it’s social engineering via email and it continues to be effective and potent. What is TRAC currently doing […]
          Big Data in Security – Part IV: Email Auto Rule Scoring on Hadoop        
Following part three of our Big Data in Security series on graph analytics, I’m joined by expert data scientists Dazhuo Li and Jisheng Wang to talk about their work in developing an intelligent anti-spam solution using modern machine learning approaches on Hadoop. What is ARS and what problem is it trying to solve? Dazhuo: From a high-level view, Auto […]
          Big Data in Security – Part III: Graph Analytics        
Following part two of our Big Data in Security series on University of California, Berkeley’s AMPLab stack, I caught up with talented data scientists Michael Howe and Preetham Raghunanda to discuss their exciting graph analytics work. Where did graph databases originate and what problems are they trying to solve? Michael: Disparate data types have a lot of connections between […]
          Pushing and Polling Data Differences in Approach on the Gnip platform        

Obviously we have some understanding of the concepts of pushing and polling data from service endpoints, since we basically founded a company on the premise that the world needed a middleware push data service. Over the last year we … Continue reading

The post Pushing and Polling Data Differences in Approach on the Gnip platform appeared first on Gnip Blog - Social Data and Data Science Blog.


          Clarity on What jobs are safe in an automated future?        

I'm hoping that some novel jobs come out of deregulation of highly general schooling, inspired by statements early in this video.

Politicians: politicians already find it hard to find jobs after leaving politics… extremely high job insecurity too. Their jobs aren't automated; they're downshifted to bureaucrats as something becomes economic orthodoxy and unpolitical. I reckon politics is a bad career choice irrespective of automation.

Agricultural occupations: already automated - precision agriculture is already automating everything from plant disease identification to treatment. Think about how few agriculturalists there are for a given tract of land. The only good reason I can see that gardeners may not be automated is for their aesthetic skills, not their botanical skills.

Data scientists: automated in 10 years - sexiest profession? I'm sure data scientists, in their current form, won't exist for very long. Machine learning visual programming tools already exist. All someone needs to do is to match a dataset to them and it can optimise for a particular parameter. It could foreseeably take a short few-hour course to train lay computer users to do data science in 5 years, with it becoming a basic skill like Excel spreadsheet use in 10 years.

Escorts: not automatable in our generation - sex bots? pfffft. How can people think gardening is super hard to automate but sex is easy? Speaking as someone who's had my fair share of escorts, good quality sex bots will be very hard to design. Different Johns vary enormously, and acting out the emotional elements will be difficult - not to mention that part of the fun comes from the human element. However, escorts tend to have a working life of around 1.5 decades, so that's something to consider if you're choosing your occupation based on this.

Managers: Management is highly nonspecific. Let's talk about corporate governance occupations. Will they be automated? That depends on whether political sentiment swings to the right and gets liberated from regulatory burdens, or swings to the left and protects CEOs, chairmen and their ilk. Hold up, left and right aren't the right terms for this. It would be some other parameter based on sentiment around whether we want people to be held to account for others, as responsibility shifts more and more to the technology that manages us, or for the operators or designers to be held to account. Verdict: automatable in 50 years+

Scientific writers/methodologists/statisticians/economists: automatable in 20 years, but a new class of them will emerge with the function of public education and such, and innovation/program design.

finance: already largely automated except for those parts involving selling, e.g. mergers and acquisitions, tellers, etc. See the documentary at the top of this.

lawyers: increasingly automated - legal research is already highly automated; adversarial actions like representation by attorneys still exist but are indirectly automated through a common pool of cases and protocol through automated systems.

Policing and security: not automatable in the foreseeable future due to human discretion and negotiation. Good career choice.

Military: already automating.

computer science: already automating, except for AI research.

AI research: highly promising - not automatable in the foreseeable future (till the singularity, if that ever occurs). However, not a good career option at the moment since you're likely to be funnelled into data science or finance.

humanitarian logistics: automatable

entertainment: partially automatable, but diminishing opportunities. Already highly limited opportunities for people to entertain professionally - see documentary.

media: automatable, see above.

healthcare professions: the rate of AI diagnostics and decision rule development is surprisingly slow. Also, incredibly powerful professional associations of incumbents. I reckon automation will be available in 50 years. A most promising career option for youngsters. However, for those of us with jobs already, the highly general training programs (general science, then medical, then specialised training) and the non-evidence base for quite a few allied health practices mean that it's probably not a very good option for people looking to make a career move at this very moment or change career directions (huge education investment for later payoff). For general practice, AI is already automating lots - Dr. Google is very capable.

manufacturing: already automated/automating

transport: automating

clergy: not automatable; good career option

food development and preparation: already automating

sports and recreation: already automating

did I miss anything?

Verdict: become an AI developer or entrepreneur, then just hope you have (or make) enough capital before things get out of hand, or live in a democratic enough rich country, to enjoy the rents you capture from AI developers! However, the actual size of the AI field is tiny - say 2000 people at the moment. I reckon we're simply going to be resting on the rents of our past hard work and previous generations for a while until there is some kind of political reform to enable a stable transition to the coming AI-developed world.


          Could Elon Musk, creator of Tesla and the Hyperloop, be the next Robert Moses?        

Best known for Tesla, the Hyperloop, and wanting to colonize Mars, Elon Musk recently put forth his newest idea for technological advancement: a series of underground tunnels in Los Angeles that would take cars underground and move them along a track like a conveyor belt. There are all kinds of legitimate questions about whether Musk’s ideas will work, but there’s another matter at hand too: would they perpetuate racial and class inequalities that exist as a result of previous urban planning, or would they help alleviate them?

Even though Musk's latest proposal is for Los Angeles, Emily Gorcenski, a data scientist from Charlottesville, says Musk’s ambitions bring up broader points that apply to our region as well. More specifically, Musk reminds Gorcenski of Robert Moses, who planned and designed much of New York’s public infrastructure from the 1930s to 1960s.

Moses was a visionary, but also a deeply racist man who deliberately kept parts of the city inaccessible to the poor and people of color. For example, he wanted to build bridges without tracks for public transit and at lower heights to prevent buses, which lower income people tended to use more frequently, from accessing the Long Island Parkway and taking people to beaches, parks, and more exclusive (and white) parts of town.

The result is that to this day, parts of New York City are divided by class and race. One example is the Cross Bronx, an expressway that cut through what was, at the time the road was built, a historically Jewish neighborhood in the Bronx. The expressway cut off more than 5,000 families from the same opportunities as their neighbors.

Unlike Moses, Musk’s ideas seem to spring out of a genuine desire to innovate and improve other people’s lives. For example, he has suggested that Tesla’s batteries could help areas around the world without electricity infrastructure power their homes.

At the same time, most of Musk’s proposals are only for the elite: only relatively well off people can buy Teslas, and nobody without very deep pockets is going to outer space.  Musk has really exciting ideas, but I personally think he sometimes comes off as wanting to use the world as his laboratory while others have to deal with the planning and the side effects.

These are all good things to think about. As the Washington region tries to improve and expand, residents and planners alike should consider how technological changes, from an expanded DC Streetcar to self-driving vehicles, may be a benefit or detriment to communities.

Who will take advantage of these new technologies? How will leaders ensure equity? Can the less fortunate have the same opportunities as their neighbors?

Comment on this article


          What's new in ag? A collection of five new items: June 26-30        

Here is a collection of media releases gathered into one place.

Check out this gallery of what's hot in agriculture. This gallery is a collection of media releases sent by manufacturers and other agriculture businesses gathered into one place. This week's issue tells us about a new tire spreader and an air flow system for Case and New Holland tractors. Plus, Anuvia garners a national award and a new marketing service for the digital age. 


          Associate Data Science Business & Operations Management Director - Astellas Pharmaceuticals - Northbrook, IL        
Astellas offers an environment where our employees can make a real difference. Establishes, manages and updates the training curricula for vendor staff in line...
From Astellas Pharmaceuticals - Mon, 19 Jun 2017 20:03:12 GMT - View all Northbrook, IL jobs
          Predicting Case Outcomes: Polytopic-ness to Measure Procedural-ness        
[Ed. Note: Please welcome guest blogger, Ravi Soni, data scientist from Casetext. I was introduced to Ravi by Casetext's Vice-President, Pablo Arredondo, and asked to publish Ravi's discussion on how he uses analytics at Casetext to determine if "the holding in a case is more procedural or more substantive," and how to leverage that information to potentially predict outcomes. - GL]

___________________



One of the biggest constraints to innovation in legal research is how hard it is to scalably classify and quantify information without significant human intervention. At Casetext we’ve made real progress using advanced analytics to better leverage the wealth of content within the law to predict certain outcomes with more precision. The applications for this range from practice management to case strategy to, in my case, legal research. There is one such challenge I’m particularly interested in, namely, how to quantifiably determine whether the holding in a case is more procedural or more substantive.

I started with a collection of 47,464 briefs written by top law firms in the country. Using the citations and nature of suit (NOS) code associated with each brief, I was able to determine how many unique NOS codes were associated with each case. I defined this as how “polytopic” a case is. In other words, I counted all the unique NOS codes from the briefs that cited to each case and assigned that number as the polytopic score for each case. Ultimately, my goal was to use polytopicness as a proxy to measure proceduralness.
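To make that computation concrete, here is a minimal sketch of the scoring step in Python/pandas. It assumes a hypothetical table with one row per citation found in a brief, with columns named `brief_id`, `nos_code`, and `case_id`; the file and column names are illustrative, not Casetext's actual schema.

```python
import pandas as pd

# Hypothetical input: one row per citation found in a brief.
# Columns: brief_id, nos_code (nature-of-suit code of the brief),
# case_id (the case the brief cites).
citations = pd.read_csv("brief_citations.csv")

# Polytopic score: number of distinct NOS codes among the briefs citing each case.
polytopic = citations.groupby("case_id")["nos_code"].nunique().rename("polytopic_score")

# Raw citation count per case, kept alongside for the later analysis.
cite_count = citations.groupby("case_id")["brief_id"].count().rename("citation_count")

scores = pd.concat([polytopic, cite_count], axis=1)
print(scores.sort_values("polytopic_score", ascending=False).head(10))
```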

The idea behind using polytopicness to measure proceduralness comes from a simple concept. Let’s say we have a lawyer at an AmLaw 50 firm working on a massive M&A, a public defender in a small county appealing a death penalty verdict, and a boutique immigration firm working on a deportation case, and they all cite to the same case. What does this case have that all three of these attorneys found useful? The short answer is probably nothing substantive. What is more likely is that they are all citing to this case because it is a foundational case that sets the framework for some common motion that transcends practice area.

Let’s look at a concrete example. If I ask a roomful of lawyers if they know about A to Z Maintenance Corp. v. Dole 710 F. Supp. 853 (D.D.C. 1989), it’s quite unlikely that any of them would be able to tell me much, or anything at all. If I asked about a case like Bell Atl. Corp. v. Twombly 550 U.S. 544 (2007), any attorney in the room should be able to tell me how it changed the standards for dismissal. Looking at Figure 1, we can see how there is a difference in citation count and polytopic score between these two procedurally distinct cases.


In this example, comparing these two metrics clearly shows a difference between the procedural and the substantive case – but does this hold for all cases in the data set? 

To find the answer, I first looked at the average number of citations per distinct polytopic score, as seen in Figure 2. To clarify what that means, I’ll use the point at roughly (50, 2500) as an example. This point can be translated to the following: cases that have a polytopic score of 50 will on average be cited a total of 2500 times in the briefs data set. The fact that the slope is positive is intuitive and somewhat trivial, since a case that has a polytopic score of 5 must have been cited at least 5 times. The interesting piece here is the exponential growth, which means that, proportionally, the cases that have a higher polytopic score will have a higher citation count. This finding was the first bit of evidence used to confirm our initial assumptions.



Next, I wanted to see what the distribution of polytopic scores looks like in order to better understand how many cases are monotopic, bi-topic, etc. To do this, I aggregated the count of cases based on polytopic score (see Figure 3). We can easily see that most cases in our brief data set are mono- or bi-topic. However, when looking closer at the NOS codes (there are 102 in total), it seemed like some of the NOS codes could have been clustered together to make larger groups. For instance, there were codes like Personal Injury: Other, Personal Injury: Marine, Personal Injury: Automotive, etc. that could have been grouped together to make our groups more distinct from one another. As such, after grouping, it seemed like any case that is associated with a polytopic score of 6 or more could be considered more procedural.


Although looking at polytopic score is useful, there are some corner cases where this metric would fail in measuring proceduralness. For instance, if a case has a polytopic score of 7, and it has only been cited 7 times ever, then to say it is procedural may not be correct. This is due to the fact that such a small number of citations may not be enough to give us an accurate polytopic score. As such, we need to account for how often cases are cited and adjust the polytopic score accordingly. Looking to Figure 4 we can see the overall distribution of case citations to better understand how often cases are cited. Figure 4 specifically looks at cases that have been cited at least once. 

Here, we can see that roughly half of all cited cases are cited less than 20 times. (In the same light, of the 8.99 million total cases that make up the common law, 5.65 million or about 63% have never been cited at all.) Using this citation information and the polytopic score for each case, I was able to distill an updated polytopic score that accounted for the number of times a case is cited. 
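The post does not spell out the exact adjustment, so the following is only one plausible way to do it, continuing the sketch above: shrink the raw polytopic score toward zero when a case has few citations, so that a case cited seven times with seven distinct NOS codes is not treated as highly procedural.

```python
# Assumed adjustment, not Casetext's actual formula: discount thinly cited cases
# by shrinking the raw score as the citation count gets small.
K = 20  # prior strength; roughly the median citation count reported above

scores["adjusted_polytopic"] = (
    scores["polytopic_score"] * scores["citation_count"] / (scores["citation_count"] + K)
)
```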

With the help of lawyers, I was able to manually go through 10% of cases that were most procedural and 10% of cases that were most substantive based on our polytopic scoring. I used this to determine whether or not this measurement was accurate in determining if a case is procedural or not. Overall, our assumptions were verified and we can say with some confidence that using polytopicness is a reliable measure of proceduralness for a case. For reference, here are the 10 cases that were shown to be the most procedural. 

ASHCROFT V. IQBAL 556 U.S. 662 (2009)
BELL ATL. CORP. V. TWOMBLY 550 U.S. 544 (2007)
CELOTEX CORP. V. CATRETT 477 U.S. 317 (1986)
ANDERSON V. LIBERTY LOBBY, INC 477 U.S. 242 (1986)
MATSUSHITA ELEC. INDUSTRIAL CO. V. ZENITH RADIO 475 U.S. 574 (1986)
LUJAN V. DEFENDERS OF WILDLIFE 504 U.S. 555 (1992)
CONLEY V. GIBSON 355 U.S. 41 (1957)
DAUBERT V. MERRELL DOW PHARMACEUTICALS, INC 509 U.S. 579 (1993)
KOKKONEN V. GUARDIAN LIFE INS. CO. OF AMER 511 U.S. 375 (1994)
FOMAN V. DAVIS 371 U.S. 178 (1962)

While this analysis has shown a strong relationship between polytopicness and procedurality, there is still some fine-tuning needed to address the small subset of corner cases. The next step in continuing forward with this would be to see how clustering of NOS codes could be used to further refine the polytopic score. In the same light, this analysis has also opened up different avenues to explore. Some of these include looking at different relationships between a brief and the cases it cites, how citation counts for cases differ between briefs and court opinions, or whether we can predict what a case is about using substantive citations in the case documents. 

If you have any questions, comments, or concerns, please feel free to send me an email at ravi@casetext.com.

_________________________________________
Ravi Soni is a recent University of California, Berkeley graduate with a degree in Applied Mathematics. He is currently working as a Data Scientist at Casetext Inc., a legal technology company using AI to enhance legal research. Prior to joining Casetext, Ravi spent some time at other legal technology companies and worked as a legal assistant at a boutique IP firm where he focused on trademarks.



          Cross-Border Project Investigates the Situation of People in Nursing Homes        
Data Science for Dementia Research
          What We’ve Been Up To: 2017        

ODSC East 2017: On May 5th & 8th, we attended ODSC East 2017 (Open Data Science Conference). On Day 1, we participated in the AI showcase, displaying our AI experience demo and some literature about...

The post What We’ve Been Up To: 2017 appeared first on GroupVisual.io.


          R Programming Tool For Data Science - Simplilearn Americas Inc. , Online         
The Data Science with R training course has been designed to impart an in-depth knowledge of the various data analytics techniques which can be performed using R. The course is packed with real-life projects, case studies, and includes R CloudLabs for practice.

Mastering the R language: The course provides an in-depth understanding of the R language, RStudio, and R packages. You will learn the various types of apply functions as well as packages such as dplyr, gain an understanding of data structures in R, and perform data visualizations using the various graphics available in R.

Mastering advanced statistical concepts:   The course also includes the various statistical concepts like linear and logistic regression, cluster analysis, and forecasting. You will also learn hypothesis testing.

As a part of the course, you will be required to execute real-life projects using CloudLab. The compulsory projects are spread over four case studies in the domains of healthcare, retail, and the Internet. R CloudLab has been provided to ensure practical, hands-on experience. Additionally, we have four more projects for further practice.

Cost:

Certified


          Data Science & Machine Learning - OfCourse , Online         
Includes 68 lectures and 9 hours of video content.
  • Learn how to perform machine learning on "big data" using Apache Spark and its MLLib package.
  • Apply best practices in cleaning and preparing your data prior to analysis
  • Be able to design experiments and interpret the results of A/B tests
  • Suitable for software developers or programmers who want to transition into the data science career path.

This course will teach you the techniques used by real data scientists in the tech industry and prepare you for a move into this career path. It includes hands-on Python code examples which you can use for reference and for practice. It also contains an entire section on machine learning with Apache Spark, which lets you scale up these techniques to "big data" analysed on a computing cluster.

Frank Kane spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to millions of customers. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. He also started his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

This course is intended for software developers or programmers who want to transition into the lucrative data science career path. It would also suit data analysts in finance or other non-tech industries who want to transition into the tech industry. You will learn how to analyse data using code instead of tools, and it covers the machine learning and data mining techniques real employers are looking for.

Introduction
  • Introduction
  • Share your course with friends and family!
  • Say hi to your fellow students!
Getting Started
  • [Activity] Installing Enthought Canopy
  • Python Basics, Part 1
  • [Activity] Python Basics, Part 2
  • Running Python Scripts
Statistics and Probability Refresher, and Python Practise
  • Types Of Data
  • Mean, Median, Mode
  • [Activity] Using mean, median, and mode in Python
  • [Activity] Variation and Standard Deviation
  • Probability Density Function; Probability Mass Function
  • Common Data Distributions
  • [Activity] Percentiles and Moments
  • [Activity] A Crash Course in matplotlib
  • [Activity] Covariance and Correlation
  • [Exercise] Conditional Probability
  • Exercise Solution: Conditional Probability of Purchase by Age
  • Bayes' Theorem
Predictive Models
  • [Activity] Linear Regression
  • [Activity] Polynomial Regression
  • [Activity] Multivariate Regression, and Predicting Car Prices
  • Multi-Level Models
Machine Learning with Python
  • Supervised vs. Unsupervised Learning, and Train/Test
  • Bayesian Methods: Concepts
  • [Activity] Implementing a Spam Classifier with Naive Bayes
  • K-Means Clustering
  • [Activity] Clustering people based on income and age
  • Measuring Entropy
  • [Activity] Install GraphViz
  • Decision Trees: Concepts
  • Ensemble Learning
  • Support Vector Machines (SVM) Overview
  • [Activity] Using SVM to cluster people using scikit-learn
Recommender Systems
  • User-Based Collaborative Filtering
  • Item-Based Collaborative Filtering
  • [Activity] Finding Movie Similarities
  • [Activity] Improving the Results of Movie Similarities
  • [Activity] Making Movie Recommendations to People
  • [Exercise] Improve the recommender's results
More Data Mining and Machine Learning Techniques
  • K-Nearest-Neighbors: Concepts
  • [Activity] Using KNN to predict a rating for a movie
  • Dimensionality Reduction; Principal Component Analysis
  • [Activity] PCA Example with the Iris data set
  • Data Warehousing Overview: ETL and ELT
  • Reinforcement Learning
  • External Resources
Dealing with Real-World Data
  • [Activity] K-Fold Cross-Validation to avoid overfitting
  • Data Cleaning and Normalization
  • [Activity] Cleaning web log data
  • Normalizing numerical data
  • [Activity] Detecting outliers
Apache Spark: Machine Learning on Big Data
  • [Activity] Installing Spark - Part 1
  • [Activity] Installing Spark - Part 2
  • Spark Introduction
  • Spark and the Resilient Distributed Dataset (RDD)
  • Introducing MLLib
  • [Activity] Decision Trees in Spark
  • TF / IDF
  • [Activity] Using the Spark 2.0 DataFrame API for MLLib
  • [Activity] Searching Wikipedia with Spark
  • Installing Spark file
Experimental Design
  • A/B Testing Concepts
  • T-Tests and P-Values
  • [Activity] Hands-on With T-Tests
  • Determining How Long to Run an Experiment
  • A/B Test Gotchas
Recommended Courses
  • Recommended Courses

Cost:

Discount: 85% Off for Laimoon Users!

Next Session:

Duration: Flexible


          R Programming Tool For Data Science - Excelr Solutions , Online         

R is a statistical tool which is making a buzz in the field of data analytics. R is open source, comes for free, and is used by research scholars across the globe and by students & professors across universities, making it the most sought-after tool of the 21st century. With a 50% market share, R enjoys a pivotal position among the statistical tools. With the recent acquisition of Revolution Analytics (which works extensively on enterprise versions of R) by Microsoft, the demand for R will increase in leaps & bounds. R enjoys a lot of freebies in terms of GUI (Graphical User Interface) based tools, with RStudio being the pioneer among the many.

ExcelR provides classroom, online and e-learning access to R & RStudio. We offer extensive coverage of the R tool with live projects, and the training will be delivered by one of the best trainers in the industry.

Cost:

Certified


          Tableau 10 Desktop, Online, Server Online Training - Excelr Solutions , Online         

ExcelR offers an in-depth understanding of Tableau Desktop 10 Associate Certification training for Tableau developers and complete Tableau Server training for Tableau administrators. 

Training includes 30 hours of hands-on exposure to ensure that you are left with a feeling of being an expert at using the Tableau tool. We have considered the industry requirements & devised the course to ensure that you have the practical exposure required to swim through interviews with ease. The case studies explained towards the end will reinforce the practical learning and prepare you to face the real-world projects & problems which are solved using Tableau. The datasets chosen ensure that you learn every option completely. With a lot of industry connections, you get to know about job opportunities which you would not hear of otherwise. Mock interview questions & the final project help you establish yourself as an adroit practitioner in the space of data visualization. Learning the leading data visualization principles will ensure that you always work with a combo of "Data Visualisation Dos & Don'ts + Tableau Tool".

  • Tableau is in the leaders quadrant of data visualization according to Gartner's magic quadrant. Key differentiators of Tableau over other business intelligence tools are:
  • Tableau connects to a lot of other native databases & servers
  • Tableau has a lot of analytics capability
  • Tableau connects with most of the leading Big Data tools
  • Tableau is designed for end users so that customers directly make changes as required
  • Tableau has varied licensing cost for different uses of different customers
  • Tableau Server for managing security & managing the reports sharing
  • Tableau Desktop for developers to develop reports, dashboard & story maps
  • Tableau Online for customers who want to view visualizations from anywhere
  • Tableau Mobile for the users using pad (iPad, notepad, etc.)
  • Tableau Public for basic users in trying to connect to excel workbook
  • Tableau Reader for users who want to read the Tableau developed visualizations

Who Should do Tableau Certification Training
  • Professionals who should pursue Tableau Certification Training include:
  • Business intelligence professionals
  • Data Reporting professionals
  • Content Management Professionals
  • Senior management who provide reports to leadership teams
  • Leadership team who presents reports to customers
  • Media folks who create visualisations for leading magazines
  • Freshers who want to kick start their careers in IT/Software industry
  • Database administrators who always manage data
  • Data scientists who work on data to build prediction models
Though there are umpteen data visualisation tools which follow the data visualisation principles, Tableau holds the No. 1 position in the Leaders quadrant for data visualisation. Hence the data visualisation training offered by ExcelR is exclusively on Tableau Desktop.


Cost:

Certified


          Certificate in Big Data and Data Analytics - PLUS Specialty Training , Dubai         

Across all lines of business, sharp and timely data insights are required to keep an organization competitive in this digital era. Big data is a change agent that challenges the ways in which organizational leaders have traditionally made decisions. Used effectively, it provides accurate business models and forecasts to support better decision-making across all facets of an organization. This course provides participants with the data literacy they need to remain efficient, effective and ahead of the curve. Participants will learn why, where and how to deploy technologies and methodologies from big data and Hadoop to data analytics and data science. During the course, all participants will be given access to proprietary online resources for viewing and downloading, including multiple coding demonstrations/examples.

Cost:

Discount: Summer Discounts Available!

Next Session:

Duration: 5 Days


          Business Analytics Online Course - Excelr Solutions , Online         

What is the No. 1 profession of the 21st century? What is the profession termed the sexiest of the 21st century? Which profession provides salaries like never seen before? Which profession are most (all) companies hunting for at full throttle? Which profession ensures that your salary grows exponentially with experience?

The answer to all the above questions is the term "DATA SCIENTIST", also referred to as Data Analytics or Business Analytics. All it takes to become a successful data scientist is working knowledge of 5 core concepts - Statistical Analysis, Forecasting, Data Mining, Data Visualisation & Text Mining. ExcelR provides 50 hours of hands-on training using live case studies being implemented in industry. In addition, participants are provided with assignments, mini-projects, quizzes, case studies & a final capstone project to ensure that you are ready to crack any interview immediately after the last day of training.

As part of the Statistical Analysis training we start from the very basics & move on to discuss very advanced concepts, including Linear, Logistic, Poisson, Binomial, Negative Binomial and Zero-Inflated regression techniques, imputation, etc. These core concepts provide you with an edge over other aspirants who are trained elsewhere. Aspirants can also opt for only the statistical analysis training & thereby get a statistical analysis certification. This will provide more confidence to employers.

As part of the Forecasting training you will learn about the various time series techniques, which include Auto Regression (AR), Moving Average (MA), Exponential Smoothing (ES), ARMA, ARIMA, ARCH & GARCH.

The Data Mining training includes two streams - unsupervised data mining & supervised data mining. As part of this machine learning training you will be exposed to various techniques within unsupervised learning which will help you perform clustering, build recommender systems, perform network analysis, etc.

Data Visualisation training is a must-have for any data analyst. You will be exposed to Tableau, which is arguably the number one data visualisation tool, with a lot of analytical capabilities. 

Text Mining is the most sought-after skill in a data scientist, the reason being that 80% of unstructured data is textual. Data is being generated on social media in the form of tweets, posts, etc., and on e-commerce websites in the form of review comments, etc.


Cost:

Certified


          Data Science - Skillsology , Online         

Discover the skill set of a Data Scientist, a new role meeting the increased demands and opportunities of the web and modern technology.

In this course you will learn:

  • How to use your analytical skills to manipulate data
  • Develop business acumen, so findings are applicable in the real world
  • Master statistics, to separate vital signals from irrelevant noise
  • The basics of Excel and the R programming language

Understand the key elements of data science, allowing you to solve real business problems.

The course teaches the analytical and statistical skills to allow students to turn data into actionable insights. It also covers how to use an analytical toolkit consisting of widely available or free software (principally Microsoft Excel and the R programming language), to allow statistical analysis and visualization.

Cost:


          Most Popular Pandora Stations in New York City        

The Pandora data science team compiled a list of the most popular Pandora stations for New York City neighborhoods segmented by zip code.

Check out the map below and additional neighborhood breakdowns. Any surprises?

What Are The Other Top Pandora Station ... Read More


          Amazon S3 Introduction: Basics and Features        
Amazon S3 stands for Simple Storage Service and, as the name suggests, it is an online cloud storage service from Amazon. With Amazon S3 you can easily store and retrieve any kind of data at any time from the cloud. It comes with various features that make it one of the best cloud storage options for all kinds of data demands. It is growing day by day and, with time, it is becoming well known among data scientists. Here in this article we will look at some of the best features that Amazon S3 comes with and why it is
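As a rough illustration of that store-and-retrieve workflow, here is a minimal sketch using the boto3 SDK for Python; the bucket name, object key and file names are placeholders, not values from this article.

```python
import boto3

s3 = boto3.client("s3")  # credentials come from the environment or ~/.aws config

BUCKET = "my-example-bucket"      # placeholder bucket name
KEY = "datasets/experiment.csv"   # placeholder object key

# Store: upload a local file to S3.
s3.upload_file("experiment.csv", BUCKET, KEY)

# Retrieve: download the same object back to a local file.
s3.download_file(BUCKET, KEY, "experiment_copy.csv")
```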
          10 Ways To Attract Tech Talent As A Growing Startup        

Hiring tech talent — software engineers, data scientists, DevOps, and related — is hard, especially when your business is new […]

The post 10 Ways To Attract Tech Talent As A Growing Startup appeared first on Cox Blue.


          DATA SCIENCE        
How much would you pay for a newspaper with tomorrow's news? We don't do that yet, but we're close. DATA SCIENCE, the future for your company TODAY http://WWW.SPPERU.COM
          â€œThe Only Metric That Matters”: Total Time Reading        
That’s the claim of this piece by Medium product scientist Pete Davies, which goes into why they prefer to push past vanity metrics that create giant PR-friendly numbers. We’ve crossed a point at which the availability of data has exceeded what’s required for quality metrics. Most data scientists that I meet tell me that they’re...
          Senior/Lead Python/R Data Scientist, Front Office Quantitative Fixed Income Trading - Global Investment Bank (Hong Kong)        
Our client, an innovative, technology-driven global investment bank, is looking for a proven big data / machine learning data scientist to join their front office team; they are an industry leader in incorporating the latest technology into their cor
          Bot for Teacher        

Today a future without schools. Instead of gathering students into a room and teaching them, everybody learns on their own time, on tablets and guided by artificial intelligence.

First, I talk to Ashok Goel, a computer scientist who developed an artificially intelligent TA named Jill Watson and didn’t tell any of his students she wasn’t a human.

Then I talk to two people building future, app based educational systems. Jessie Woolley-Wilson from DreamBox explains what adaptive learning is, and how it can help create a better learning experience for kids. She also talks about all the data they collect on kids to better serve them (data we’ll come back to later in the episode.) Along with Jessie, Julia Stiglitz from Coursera explains how this kind of self-directed learning can extend into the college and post-college world.

Jessie and Julia see a future with these kinds of learning apps that could be more democratic, more creative, more fun and more effective. But there are some downsides too. Neither of them see apps or algorithms replacing teachers, but there are other organizations and projects that do.

In 2013, a guy named Sugata Mitra won the TED Prize which comes with a pretty healthy million dollar check. He won this prize for his work on what he calls “A school in the cloud.” Mitra founded this organization named Hole in the Wall, where he went around the slums of India and installed these kiosks that children could use and play with. His whole thesis is that students can be taught by computers, on their own time. Without teachers. Here’s his TED talk.

And this Hole in the Wall thing is one of the classic examples that a lot of people working on education apps point to to show that kids don’t need teachers to learn. Kids are naturally curious, they’re going to want to seek out information, you don’t have to force them into a tiny room to listen to a boring teacher.

But we talk to some people who question that narrative. Audrey Watters, who runs the site Hack Education, says that projects like Hole in the Wall often don’t last. Nearly all the kiosks that Mitra set up are abandoned and vandalized, she says, and when you look at footage and images of the kiosks you can see that older, bigger boys dominate and push the smaller boys and girls out.

And this gets to a question that came up with literally every person I talked to for this episode. What is the purpose of school? Is it to teach content? Or is it to teach students how to relate to one another, how to empathize, how to think, how to be good citizens? Nobody really knows. But we talk about it on the episode!

We also talk about some of the other downsides of these systems. Jade Davis, the associate director of digital learning projects at LaGuardia Community College in Queens, New York, tells us about her concerns that algorithms might pigeonhole kids who might not take to the system immediately. Kids like her own.

In the end, we talk about whether or not these kinds of solutions are really for everyone. Or if they’re just going to be used on poor, disadvantaged kids. Because, are Harvard students really going to be taught by robots? Probably not.

Bonus: Listen to the very end for a fun surprise.  

Further reading:

Flash Forward is produced by me, Rose Eveleth, and is part of the Boing Boing podcast family. The intro music is by Asura and the outro music is by Broke for Free. The break music is by M.C. Cullah. Special thanks this week to listeners who sent in their kiddos for the intro — that’s Ari, David, Kevin, Sharon, Beth, Kim and Nav. The episode art is by Matt Lubchansky.

If you want to suggest a future we should take on, send us a note on Twitter, Facebook or by email at info@flashforwardpod.com. I love hearing your ideas! And if you think you’ve spotted one of the little references I’ve hidden in the episode, email us there too. If you’re right, I’ll send you something cool. Oh, and on the survey some of you asked what these references are that I’m talking about! If you go to flashforwardpod.com/references you’ll see a list of past hidden gems from season one so you can see what you should be looking for.

And if you want to support the show, there are a few ways you can do that too! We have a Patreon page, where you can donate to the show. But if that’s not in the cards for you, you can head to iTunes and leave us a nice review or just tell your friends about us. Those things really do help.

That’s all for this future, come back next week and we’ll travel to a new one.

 

▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹▹

TRANSCRIPT

Rose: Hello and welcome to Flash Forward. I’m Rose and I’m your host. Flash Forward is a show about the future. Every week we travel to a different possible, or not so possible, tomorrow.

Before we start this week I wanted to just thank everybody who took the listener survey and talk about a few things that came up in it. So, one big thing many of you said was that there are these weird long pauses. Here I was thinking that I was doing fancy sound design with these very low ambient noises but apparently you can’t hear those! So I’ll cut that out, and hopefully the pauses will go away.

Another thing a lot of people asked about was why there is now a musical break in the middle of the show. The reason that’s there, and this is a little bit of inside baseball podcast stuff, but basically I have ads on the show. And some of those ads get placed in the middle of the show. And they’re dynamic ads, which means that I put a little marker where the ad goes, and different ads get dropped in at different times. So if I didn’t have the musical break, you’d just get BOOM an ad right there in the middle. And sometimes the ad marker can get a little funny, so I need to leave a little space on either end to make sure it’s not cutting off a word or doing something weird. Which is why there’s music there. I’m trying to come up with a more elegant solution, but, for now that’s what it is.

Okay, two last things. I asked what kinds of rewards I might be able to offer to entice you to donate to the show. And a lot of you actually put down things that I already offer on Patreon! Which means that I haven’t done a good job telling you what the rewards are! If you donate $1 an episode, you get access to transcripts and bonus full interviews from certain futures. If you donate $2 an episode you get access to all that, and a special fan only newsletter that’s full of links and often pictures of my dog. If you donate $5 an episode you get all of those things, and every other week you get a short story about a future we’ve traveled to. And if you donate $10 an episode, you get all those things AND your voice gets to be in a future scene. ALSO for those of you who wrote in saying you hate the ads, I’ve got something for you as well. If you become an Acast+ member, which costs $5 a month, you get an ad free show. You can get all the info about Acast+ and become a member at Acast.com/plus.

And finally! Transcripts. Enough of you said you’d like transcripts of the show that I’m going to start posting them. I’m not exactly sure how I’ll do this yet, because I want them to be easy to read and not super cumbersome, but in the next couple of weeks you’ll get transcripts of at least this season’s episodes. The transcripts I have for last season’s episodes are not very good so I’ll need to figure out how to get those up to snuff. But stay tuned, this will happen, and when it does I’ll let you know.

OKAY enough blah blah from me, let’s go to the future! Let’s start this episode in the year 2099.

Computer voice: Hello there Kara, welcome to school today. I hope you’re ready to learn!

Computer voice: Today we are going to continue our lessons on trigonometry. Did you watch the homework video?

Computer voice: Hello Kara, let’s continue our lessons on American History. Please upload your essay on reparations to the portal.

Computer voice: Hello Kara. I have read your essay on reparations, let’s discuss. Please turn on your webcam.

Computer voice: We have identified some key points in your essay for discussion. It seems that perhaps you did not do some of the assigned reading. Did you read the Atlantic essay in your homework packet?

Let’s go over it together. We’ll start of page 3 of the assigned essay.

Computer voice: Hello Kara, we haven’t detected any movement in front of the screen in a while. Are you still there?

Computer voice: Hello Kara, your tracker indicates that you haven’t been at the computer during lessons. We have reset all lessons marked complete while you were not here.

Computer Voice: Do you remember having a human teacher? What was the best thing about human teachers?

Kid voice: Playing with Ms. Lloyd, and that Ms. Lloyd helps us. And that she gives us ice packs when we bump our head, yeah, how cool is that?

Kid voice2: Ummmmmm that they love me.

Kid voice3: I like that they love me.

Kid voice4: She’s very kind

Computer voice: Do you like me?

Kid voice4: No

Kid voice3: Mmmmmmm half and half.

Computer voice: Hello Kara, a new version of the LearnFuture software is now available. Would you like to update now?

Computer voice: Hello Kara, remember, your regional exam is tomorrow. You still have 12 lessons to complete in order to be ready for the exam.

Computer voice: Hello Kara, our records indicate that you missed the regional exam. Without this test you cannot move on in your education. Education is the key to your future.

Computer voice: Hello Kara, it has been 34 days since you logged on to your lessons portal. Are you still there? Education is the key to your future.

Rose: Okay so today’s future is one without schools. One in which everybody learns everything online, or at computer kiosks, guided by algorithms and pre-made lessons and artificial intelligence. And I want to start this future, with a story from Georgia Tech.

If you were a master's student at Georgia Tech this spring, you might have taken a class called Knowledge Based Artificial Intelligence (KBAI). It's an online class and there are about 300 students who take it. And because it's an online class the students rely on this online forum to post their assignments, have discussions, all that stuff. Every time this class is offered, the students post about 10,000 messages on this online forum. And one of the things they use this online forum for is for asking the TAs questions. Everything from "when is this assignment due" to "what is the nature of intelligence?" Normally there are nine or ten TAs and they hang out on the forums to answer these questions.

But this spring, one of those TAs, was not human.

Student: Should we be aiming for 1000 or 2000 words? I know, it’s variable, but that is a big difference…
Jill (computer voice): There isn't a word limit, but we will grade on both depth and succinctness. It is important to explain your design in enough detail so that others can get a clear overview of your approach. It's also important to keep things clear and short.

This is Jill Watson. Yes, Watson, like the supercomputer.

Ashok Goel: So we started on this journey almost exactly a year back I think it was in May or June of 2015 that we first started thinking about this. I had already done some work with Watson for different projects so I was familiar with Watson.

Rose: And this is Jill's creator, I guess: Ashok Goel, a computer science professor at Georgia Tech.

Ashok: I had been teaching this online course since fall of 2014, so for two years, and I knew that the number of questions the students were raising was really large. The teaching staff was answering all of these questions, but it was taking a lot of time and effort to do that.

Ashok says that a lot of the questions that students ask on these forums are pretty much the same: when is this due, how long should it be, where is this assignment, I can’t find the material I need, that kind of stuff. And TAs have to spend their time answering these questions, which the students need to know but that just kind of take a lot of time and aren’t super interesting. So he figured, why not have an AI do it?

So they got all the questions students had ever asked in this class, and fed them to a computer system powered by IBM’s Watson. Hence the name. And once they were sure she wouldn’t go rogue and answer questions incorrectly willy nilly, they let her loose on the students. But they never told them that Jill wasn’t a human.

Ashok: We built an AI that was sophisticated enough that students couldn't tell the difference between the responses coming from a human TA and an AI TA. And that was part of the reason why we did not tell students right from the beginning that Jill Watson was an AI, because we wanted to see whether, you know, students would be able to figure it out.

So this TA went along, answering student questions.
JILL: It’s fine if your agent takes a few minutes to run. If it’s going to take more than 15 minutes to run, please leave notes in the submission about how long we should expect it to take. We can’t have all the projects taking a long time because we have to run them in a reasonable period of time.

Rose: And then at the end of the semester once all the finals were turned in, Ashok revealed his secret. And he was kind of nervous about it, he didn’t know how the students were going to react.

Ashok: So when we first told the class that Jill Watson was an AI we were actually very concerned. We did not know whether the class would view this positively or negatively. We were worried students might say, what do you mean we have been dealing with an AI all of this time, how dare you, you know, that kind of reaction.

Rose: The students were actually into it.

Ashok: On the other hand, the response turned out to be not only positive but uniformly positive. It was like a wow kind of thing, and students were thrilled with it.

Rose: So, Ashok has plans to do this again.

Ashok: And I don’t know how much I want to share with you. Not because I don’t want to share with you but because you will put it on a podcast and I don’t want students necessarily to know about it. So it’s not you.

Rose: This fall Ashok will teach the same class, but this time he says that the students will know that not all the TAs are human. He’ll change the name, but there will be some TAs that are AIs, and some that are human.

Ashok: I don't know the exact number yet, but I can share with you that more than one of them will be an AI. We will not call it Jill; I will just tell the students some are human, some are AIs. You deal with it, you figure out which one is human and which one is an AI. I'm curious how long it takes them to figure it out. And to go to your question, I'm even more curious: will the kind of questions they ask, and the interactions they have, change compared to what happened in the spring or what happened last year?

Rose: Now a lot of the media reports I read about this Jill Watson thing characterized the whole thing as a prank. That Ashok had pulled a prank on his students. He does not see it that way.

Ashok: Yeah, it was weird to me too. I did not think of this as a prank at all. As a teacher, pulling a prank on my students would be completely unprofessional. I don't know why I would do that.

Rose: No, Ashok doesn’t see this as a prank. He sees this as, the future.

And there are tons of companies out there working on this kind of future — a future in which computers and algorithms take over the classroom. Some companies see computers as helping human teachers. Others see computers doing the vast majority of the teaching itself.

And one of the key terms that I encountered when I was researching all of this is something called "adaptive learning": the idea that you use an algorithm to tailor lessons to the individual child's skill set. This is how an app called DreamBox works.

Jessie Woolley-Wilson: So if you and I were second graders and DreamBox wanted to understand how well we group numbers, DreamBox might say, use the virtual math rack to build the number 37.

Rose: DreamBox is a math learning app, and this is Jessie Woolley-Wilson, the President and CEO of the company.

Jessie: And let's say you have better math skills than I do. So you, Rose, would say, I want to build some tens: you take five individual beads, you group them into a five, and you do that twice to make a ten. You do that process three times. Then you take an individual set of five, and two individual beads, and in literally five moves you get to the number 37. I, on the other hand, take 37 individual beads and move them over individually. I get thirty-seven right, and you get 37 right. But I clearly didn't know how to group numbers, or at least I didn't demonstrate that I understood that. So should you and I have the same next lesson? In DreamBox that will never happen. In fact DreamBox takes it even a step further. DreamBox will be monitoring me while I am thinking, and while I am trying to solve the problem it will recognize that I am not on track for doing any kind of grouping, let alone efficient grouping. It will pull me out of the lesson before I get frustrated, before maybe my confidence is eroded, and it will move me earlier into the lesson, really introduce me to effective grouping strategies, and then give me another problem and another opportunity to be successful.

Rose: So right, adaptive learning basically means that the computer learns what you do and don’t know, and tailors the next thing you see to that. So every kid gets their own specific, personalized lesson that plays to their preferences and strengths and even just, which games they think are the most fun.

Jessie: With a little kid, they might not like pirates as much as pixies.

Rose: And the result is, in theory, a whole curriculum that is totally custom to every child.

Jessie: We envisioned a learning experience that was age and grade agnostic. So we don't make judgments about what a child should know by a certain time. We just evaluate what a child is ready to learn next.

And this idea of self-directed learning is a really common thread in pretty much every educational technology company that I read about for this episode. The idea being that we should all be able to build our own curriculum, our own learning experiences, based on our interests and our curiosity.

Julia Stiglitz: So I think we imagine a future where learning is something that happens throughout your life so that it’s not confined to those four years when you’re in an institution or you know the 12 years when you’re in primary and secondary school.

Rose: That’s Julia Stiglitz, the head of business development for a company called Coursera. You might have heard of Coursera, it’s this website where you can take college courses from institutions from all over the place. I have attempted several Coursera courses, in my life, but I will admit that I… have never finished one.

Coursera was started four and a half years ago by a computer science professor at Stanford, who basically put his machine learning course up online for anybody to see, and 100,000 people signed up. Today there are hundreds of courses you can take online with Coursera.

Julia: One of the, you know, really excellent courses is of course the modern poetry course from the University of Pennsylvania. There is a public speaking course from the University of Washington where people videotape themselves speaking and then get peer feedback from other people in the class on how they did in their presentations. There's songwriting, and there is, you know, really a host of creative writing classes from Wesleyan.

Rose: So, in this future, where there are no schools, people will all be doing things like DreamBox and Coursera to learn whatever it is they need to learn. Instead of having education force fed to them.

Jessie: Right now I feel like, oftentimes, student engagement and enjoyment in the learning process is secondary. We have to give them medicine. And medicine tastes bad, but they need the medicine, and we have to make sure that it's good medicine.

Rose: We have a fun, game based teaching system! And one that changes with the student so that they’re never bored or frustrated or behind! And when you’re applying for a job, you just list all the courses you’ve taken online!

Julia: And we see people who have done our data science specialization from Johns Hopkins, and a lot of people post this on their LinkedIn profile, and often it is the top educational experience that they have on their LinkedIn profile, before where they went to school, before anything else. And it signals to employers, and it signals to the world, that they finished this very rigorous curriculum and know this content.

Rose: And all of this could be really great!

Jessie: The promise of democratized learning opportunity will come into full focus as we build affordable, available, personalized learning solutions that are accessible to, and that model the learning potential of, every child, regardless of where they happen to be born, what they look like, or what language they speak. We will take a huge step in unleashing their human potential. And I personally believe that that is the pathway to a happier, more tolerant, more peaceful, more sustainable world.

But, I think you probably know where I’m going here, there are some downsides. And when we come back we’re going to talk about what we lose, when we lose schools. But first, a quick break.

[[AD]]

Rose: So we’re talking about a future in which there are no more centralized schools. All learning is done on phones or computers or tablets or whatever future thing we might invent. And this is a future that some people are indeed driving towards.

I want to be clear that both Jessie from Dreambox and Julia from Coursera who we just heard from, they don’t actually think we should get rid of school or teachers. We’re going to get to why in a second, but there are actually people working in education technology who do see a future world where schools as we know them are obsolete.

In 2013, a guy named Sugata Mitra won the TED Prize which comes with a pretty healthy million dollar check. He won this prize for his work on what he calls “A school in the cloud.” Mitra founded this organization named Hole in the Wall, where he went around the slums of India and installed these kiosks that children could use and play with. His whole thesis is that students can be taught by computers, on their own time. Without teachers.

And this Hole in the Wall thing is one of the classic examples that a lot of people working on education apps point to to show that kids don’t need teachers to learn. Kids are naturally curious, they’re going to want to seek out information, you don’t have to force them into a tiny room to listen to a boring teacher.

Audrey Watters: Right, so throughout the 20th century at the very least there have been lots of imaginings that we would, thanks to machines, be able to sort of radically change what teaching and learning looks like.

Rose: That’s Audrey Watters, she runs a website called Hack Education and covers the intersection of education and technology.

Audrey: And so I think a lot of that, with the Internet, is the promise of that. You know, you hear all the time that you can access anything you want online, right? We have access to more information, more knowledge, than at any time in the past, and there's an expectation that without the constraints of a formal schooling system, children will all be eager, curious learners who are motivated by their own sort of innate curiosity about the world, and that somehow school stands in the way of students actually learning.

Rose: And she says that when you look closely at these assumptions, this idea that you just give people access to all this information, if you just let them have it, they’ll learn, has some holes in it! Like, the Hole in the Wall kiosks, for example.

Audrey: Almost all of the places where these computer kiosks were installed are now abandoned and vandalized. And I think that if we actually thought that these were somehow a magic pill, a silver bullet, for educating students in a better way than the school system, in the slums in India for example, or in Africa, then I think we would have probably seen a different, more respectful treatment of these sites. But they've largely been abandoned.

Rose: And not only have they been abandoned, but when they were there, they weren’t serving every kid the same way.

Audrey: It was really interesting: mostly it was older boys who dominated the kiosks. It doesn't necessarily look very just. Girls, for example, were excluded from participating at these kiosks. Younger, smaller boys were elbowed away. So that hardly seems like a solution for the future of education if it's something that really only benefits the older boys.

Rose: And this gets to a question that came up with literally every person I talked to for this episode: what is the purpose of school?

Audrey: Why do we do this thing where, in what it looks like today at least, we gather students of a particular age, right, from the age of five through 17, 18, and we mandate, like we actually say: you must go to school.

Julia: So, you know, it's interesting, and it gets kind of to the question of what is the purpose of education, and what role should it play.

Ashok: So you know, we always talk about cognitive skills: how do you do algebra, how do you write an essay in English, things of that kind, how do you solve a problem in physics. Those cognitive skills are of course terribly important, but there are two other kinds of skills where this sort of metaphor of sitting in front of a computer doesn't quite work.

Jessie: There is something very important that happens in a classroom setting, when you bring people who live in different towns and different zip codes into the same place. There's something that is added to the social fabric that cannot be achieved for the child when that single child is in front of a single screen.

Jade: One of my ongoing questions throughout my entire education is: what is the point of learning? What do you get from being in this space, outside of those obvious things like academics and critical thinking?

Rose: That last person you heard is a new voice: Jade Davis. She's the associate director of digital learning projects at LaGuardia Community College in Queens, New York. And she says that to reduce education down to simply "did you get the skills you need to move to the next lesson, or to perform in your job" doesn't account for all the other things that students are looking for. And to assume that the things that students are naturally curious about are directly linked to marketable job skills erases the experience of a whole lot of people.

Jade: And for me, for the students that I work with, and this isn't just a problem for community college students, it's a problem for college students everywhere, everybody's talking about it. For some of the children I was exposed to growing up, the thing that they would be curious about isn't science. It's: where am I going to get food? How can I get toilet paper? Oh my gosh, where can I take a shower? And so what we don't realize is that when students come in to learn, they aren't just coming in as a learner, they're coming in as a human, and many of the students are facing challenges that make it so that they don't have the space to be curious about the things that the algorithms are trying to measure.

Rose: Now, these are big picture questions: what is the nature of school? What is the value of different types of curiosity? How self driven are children really?

But I also want to talk about some more specific stuff. Like, if kids aren’t going to school, um, who is taking care of them? Tons of people rely on school for childcare during the day so that they can go to work.

Audrey: Totally. I mean, I think that's the odd sort of argument, this sort of vision that, you know, bless their hearts, I think you get from a lot of maybe 20-something tech entrepreneurs who really just haven't thought through any of the realities themselves. Again, they're perfectly happy with a narrative that makes it about the individual, that makes it about autodidacts who can find the resources, who can even connect to powerful networks themselves. But I think that's absolutely not the reality for the vast majority of people in this country.

Rose: And what about all that data that’s being collected to create new lessons for the kid? Who has access to that data? Does it go on their permanent record? Is that a good thing?

Jade: I don't know how much ends up in their permanent school record, and it's a little bit scary to have, like, kindergartners and first graders on adaptive learning things and having all of that kept in their records, potentially, because all of this will keep that data in as the child keeps learning on their own. And if all the algorithms are speaking to each other, maybe not knowing what one plus one is will be really bad in the future.

Rose: What this could mean is that kids who maybe don't get the interface, or start out on the wrong track, could have a much harder time catching up or switching tracks.

Jade has some experience with this.

Jade: So, how old are my children? They are seven and nine. They just finished first and third grade, actually, and we are in the New York City metro area; we moved here from North Carolina. And when we got here they were both reading below grade level and sort of underperforming, which was really bad. But one of the things that was going on in North Carolina is that very early on in both of their school careers they switched their homework system from being on paper to being on the computer or tablet.

Rose: So in North Carolina, her kids were using one of these adaptive learning systems that we talked about before. And it wasn’t working for them. But when they moved to New York, and started doing their homework on paper again, instead of in the app, they caught up.

Jade: But as soon as they started reading on paper and having to write on paper, and their handwriting is abysmal, it's really bad, they got up to grade level in like a month, which was pretty amazing. In terms of writing I think it's gotten better as well.

Rose: But if they had stayed in North Carolina, Jade worries that they would have been stuck on the same track.

Jade: Yeah, you're already in a chain reaction, right? That's sort of what algorithms are. So if the input at the very beginning was bad and you couldn't adapt fast enough, for whatever reason, you might get lost. Like, they legitimately thought my son had a learning disability; we had them tested, they didn't know what was going on. And had we not gone out of our way to get testing done, and had we not, especially when we moved, made sure that we repeated the testing, and had we not gone out of our way to find things that they wanted to do and worked with them, the algorithm that they were on would have said, you know, this person is behind grade level, and they stay behind when they're behind grade level. Even now, because he was behind grade level, he has an IEP, an individual action plan thing, even though he doesn't necessarily need it anymore.

Rose: And this made me think about my relationship with school. I was not a great student. Because I was, okay, am still, pretty ornery. And if I didn’t think an assignment was good, I just wouldn’t do it.

Jade: I was the same way with multiple choice tests; I thought they were really stupid. And I remember distinctly in third grade being really upset with where all the dots were, so I erased all of my answers and did a pattern. And I failed the test. But the teacher knew that I knew everything, so it was fine, and my grades were high enough everywhere else that it balanced out. But for students like you and me who are just like "this isn't my thing," we would be really bad people on the algorithm.

Rose: So, why does it seem like so many people want to "disrupt education" — I'm doing finger quotes right now around disrupt. Why does this idea of a school in the cloud, or a hole in the wall, or a world where students just teach themselves everything they need to know on their own, seem to be so trendy right now?

Audrey says that it’s partially because we don’t value teachers.

Audrey: Teaching is a feminized profession; it has historically been mostly women who've been teachers. And so do we think it's a valid profession? Do we respect this profession that's dominated by women? I don't think we do.

Rose: But it also has to do with the rise of tech as this huge and influential sector. So many of the people working as programmers in tech right now didn't learn programming in school, because nobody taught programming in school. They taught themselves.

Audrey: They didn't have opportunities to learn computer science in school, and so they think: here, I know this thing, this thing that everyone is talking about as being absolutely crucial to the future of the economy, crucial to the future of education, and I did it myself. So I think it makes it really easy to believe that schools are somehow irrelevant if the one thing you do, you taught yourself, and surely everyone in the world must be an autodidact.

Rose: And it's true, a lot of the top programmers at these companies are self-taught, sort of. But does that necessarily mean that that's what we should all be striving for? I'm not so sure. Besides, even though many of those people taught themselves the programming skills they need for their jobs, they still benefitted immensely from the schools they went to. Yes, you learn things at Harvard and Yale and Princeton, but along with that you get a very, very powerful network of people and connections that get you in the door at powerful companies. Google's hiring team is notorious for rejecting applications outright from anyone who doesn't have an Ivy League degree. Learning online, those connections, that degree, that all goes away. You're no longer a prime candidate at Google. That doesn't mean you won't get a job, but the leg up you got from simply having an Ivy League degree, that goes away.

But here’s the thing, when people talk about replacing teachers with algorithms and computers, they’re not talking about replacing the teachers in extremely expensive prep schools or at Harvard or Princeton. They’re talking about replacing the teachers in Africa (the whole continent, usually, a lot of these sites don’t tend to get more specific than that) or in slums in India.

Jade: Yes, there's a very long history of education as a tool to advance sort of the missionary objectives of colonization. So a lot of the language, if you actually look at it, will also sort of map onto the way that people talked about how Christianity was the way to save people as they were expanding colonial empires. So: if we educate them there will be fewer multiple births, there will be much less poverty, it's going to fix everything. They will enter the world like the rest of us.

Rose: Rich kids aren’t going to be taught by artificial intelligence. It’s going to be the poor kids, the kids who are already left behind, the kids who nobody thinks are worth it.

Audrey: This is one of the futures that I fear, one in which actually being in contact with a caring, skilled human teacher will become the privilege of the rich. Right, so the poor will get computers, they'll get flash cards; hopefully, you know, hopefully the computers will work, hopefully they'll have internet access. And having attention from a human, having a powerful relationship with a teacher, will increasingly become something for the privileged.

Rose: This is actually a fear that even the people at education tech companies talked about. Here’s Julia from Coursera.

Julia: I mean, I think the most scary part about it is that essentially, you know, people's demographics will determine who learned what they needed to learn. So you would have families who really knew how to play the game and really knew what you needed to do, and they would support their students going through it, and families who, you know, have different circumstances wouldn't have that.

Rose: It’s kind of this weird circle. Educational technology can, according to its proponents, democratize education — make it better and cheaper and more accessible to everybody. But if you democratize it too far, you wind up making everything even less fair than it was before.

KIDS SURPRISE

Robot: Is that your final answer? Or would you like to review your choices?

[[MUSIC UP]]

That’s all for this week’s future. What do you think? Would you let your kid be taught by a robot? Do you think

There’s so much we didn’t get into this episode, so if you go online to flashforwardpod.com you’ll see more on education and technology and robots and all that good stuff that I couldn’t squeeze in here.

Flash Forward is produced by me, Rose Eveleth, and is part of the Boing Boing podcast family. The intro music is by Asura and the outro music is by Broke for Free. Special thanks this week to listeners who sent in their kiddos for the intro — that's Ari, David, Kevin, Sharon, Beth, Kim and Nav. The episode art is by Matt Lubchansky.

If you want to suggest a future we should take on, send us a note on Twitter, Facebook or by email at info@flashforwardpod.com. I love hearing your ideas! And if you think you've spotted one of the little references I've hidden in the episode, email us there too. If you're right, I'll send you something cool. Oh and on the survey, some of you have asked what these references are that I'm talking about! If you go to flashforwardpod.com/references you'll see a list of past hidden gems from season one so you can see what you should be looking for.

And if you want to support the show, there are a few ways you can do that too! We have a Patreon page, where you can donate to the show. But if that’s not in the cards for you, you can head to iTunes and leave us a nice review or just tell your friends about us. Those things really do help.

That’s all for this future, come back next week and we’ll travel to a new one.


          Comment on Why I’m Learning Data Science by Does Automation Kill the DBA "Store" | SQL Server Consulting        
[…] passionate about helping clients solve their business problems with SQL Server.  While some think machines are close to automating away your job as a DBA, I would hate it if new DBAs or experienced DBAs see written content like this and assume they need […]
          Comment on Why I’m Learning Data Science by codegumbo » The DBA is dead; long live the DBA!        
[…] Thomas LaRock – Why I’m Learning Data Science: […]
          Big Data Analytics: From Sometime Later to Real-Time        

In today's financial industry, data scientists can use big data analytics to extract useful insights for traders.  Bigger data produces better visibility, but generating valuable takeaways from big data sets has so far been something that takes a long time to accomplish.  Answers come in days or weeks, not the seconds or microseconds that matter

  Read More

The post Big Data Analytics: From Sometime Later to Real-Time appeared first on XIO Technologies.


          Big Data & Analytics Innovation Summit        
Start Date: Wed, 06 Sep 2017
End Date: Thu, 07 Sep 2017
City: #
Description:

The Big Data & Analytics Innovation Summit is one of the major gatherings of senior business executives leading Big Data initiatives.

Topics to be covered include:

  • Data Analytics
  • Big Data Strategy
  • Data Science
  • IoTs & Smart Cities


          Big Data Innovation Summit        
Start Date: Thu, 07 Sep 2017
End Date: Fri, 08 Sep 2017
City: #
Description:

The Big Data Innovation Summit is the largest gathering of Fortune 500 business executives leading Big Data initiatives.

Big Data Innovation will include ground-breaking keynote presentations and will blossom into both technical & business focused tracks throughout the day with interactive workshops, breakouts & more.

Key Exploration Areas Include:

  • Data Science
  • Machine Learning
  • Product Analytics
  • Data Governance & Security
  • Marketing Intelligence
  • Customer Analysis
  • Driving the Data Culture
  • The Chief Data Officer Role
  • Data Democratization
  • Pricing Analytics
  • Big Data Strategy
  • & so much more across 40 sessions.

Check out some testimonials from last year.




          Big Data & Analytics Innovation Summit        
Start Date: Thu, 21 Sep 2017
End Date: Fri, 22 Sep 2017
City: #
Description:

The Big Data & Analytics Innovation Summit returns to Sydney as the largest gathering of senior business executives leading Big Data initiatives in Australasia.

The summit's key purpose is to join business leaders and innovators from across industry, to share their big data initiatives, and provide a platform to learn, share and come away with the tools, techniques, and knowledge to take back and implement in your organization. 

Topics Areas include:

  • Big Data & Analytics in Business
  • How to Embrace Data Science
  • Adopting Cloud Solutions in Your Enterprise
  • Advanced Analytics Offering Insight


          Big Data & Analytics Innovation Summit        
Start Date: Wed, 22 Nov 2017
End Date: Thu, 23 Nov 2017
City: #
Description:

The Big Data & Analytics Innovation Summit is one of the largest gatherings of senior business executives leading Big Data initiatives in Asia Pacific. This is the 5th year we are back in Beijing!

Simultaneous translation (Chinese/English) is available at this summit.

Topics including but not limited to:

  • Data Science - Machine Learning, Artificial Intelligence 
  • Data Governance - Business Intelligence, Data Security
  • Predictive Modelling - Marketing, Consumer Intelligence 
  • Data Strategy - Data Culture, Data Products
  • Big Data challenges in 2018
  • Data Technologies 
  • FinTech


          Big Data & Analytics for Marketing Summit        
Start Date: Mon, 11 Dec 2017
End Date: Tue, 12 Dec 2017
City: #
Description:

Big Data has already proven to help marketers reach and engage with consumers in new ways. With so much consumer information and prospect data now available, organizations embracing data analytics and metrics are seeing improvements in the performance of their campaigns. 

Topics covered include:

  • Using Data to Drive Consumer Engagement
  • Organizing for Customer Data Management
  • User Level Web Analytics Across Multiple Platforms
  • Cognitive Science to Drive Customer Targeting
  • Big Data & Marketing Modelling
  • Data Science for Customer Lifecycle Management
  • Mobile Marketing Analytics
  • Consumer Behaviour Tracking

Click here to register today



          Big Data & Analytics Innovation Summit        
Start Date: Wed, 07 Feb 2018
End Date: Thu, 08 Feb 2018
City: #
Description:

Big Data & Analytics Innovation will bring you right up to speed to assist you with your every need covering an array of topics, themes and problem points.

Topics Included:

  • Data Analytics
  • Data Science
  • Advanced Analytics
  • Predictive Analytics
  • Machine Learning & Algorithms
  • Cloud Computing
  • & much more...


          Big Data & Analytics Innovation Summit        
Start Date: Wed, 07 Mar 2018
End Date: Thu, 08 Mar 2018
City: #
Description:

This gathering of 250+ big data pioneers will bring you 30+ keynote speeches across industries, providing the latest big data practices as well as cutting-edge knowledge. 

The summit will have four tracks to focus on, including:

  • Big Data Analytics
  • Data Science & Machine Learning
  • Big Data Strategy
  • Smart Cities

Click here to get more information about the tracks. 

If you are not sure what to expect, please feel free to contact Ryan at Ryuan@theiegroup.com for a free past presentation to get a taste of the summit.



          Big Data Innovation Summit        
Start Date: Thu, 29 Mar 2018
End Date: Fri, 30 Mar 2018
City: #
Description:

The Big Data Innovation Summit London 2018 schedule will bring together executives from the data community for two days of keynotes, panel sessions, discussions & networking.

This summit will cover all areas of the big data journey, including:

  • Data Analytics Case Studies
  • Data Science: The extraction of knowledge from data
  • Cultural Transformation: Driving the use of data
  • Hadoop: Getting value from unstructured data
  • Advanced Analytics: Solutions to predict future trends
  • Customer Insights: Getting the most from your customer data
  • Data Mining: Identifying behaviour
  • & so much more...

For the latest information please contact Roy Asterley.



          Gaming Analytics Summit        
Start Date: Thu, 26 Apr 2018
End Date: Fri, 27 Apr 2018
City: #
Description:

Bringing together leaders and innovators for an event acclaimed for its interactive format; combining keynote presentations, interactive breakout sessions and open discussion. 

Topics at this year's summit focus on how the industry is embracing data science to drive success in areas including:

- In-game analytics

- Player acquisition

- Retention

- Customer Insight



          Big Data & Analytics for Banking Summit        
Start Date: Wed, 11 Jul 2018
End Date: Thu, 12 Jul 2018
City: #
Description:

With greater constraints and challenges facing the banking industry every day, hear how forward-thinking organizations are driving success in an immensely competitive market. 

Key topics covered at the summit include:

  • Customer Analytics
  • Fraud Analytics
  • Data Science in Banking
  • Risk Modelling & Reporting
  • Data Science
  • Text Analytics
  • Data Governance & Security
  • & much more

We are currently accepting speaker submissions for the 2018 event - if you have something to share you can submit a speaker submission here.



          Big Data & Analytics for Banking Summit        
Start Date: Tue, 11 Jul 2017
End Date: Tue, 11 Jul 2017
City: #
Description:

With greater constraints and challenges facing the banking industry every day, hear how forward-thinking organizations are driving success in an immensely competitive market. 

Key topics covered at the summit include:

  • Customer Analytics
  • Fraud Analytics
  • Data Science in Banking
  • Risk Modelling & Reporting
  • Data Science
  • Text Analytics
  • Data Governance & Security
  • & much more

We are currently accepting speaker submissions for the 2017 event - if you have something to share you can submit a speaker submission here.



          Big Data & Analytics for Marketing Summit        
Start Date: Mon, 12 Jun 2017
End Date: Tue, 13 Jun 2017
City: #
Description:

Big Data has already proven to help marketers reach and engage with consumers in new ways. With so much consumer information and prospect data now available, organizations embracing data analytics and metrics are seeing improvements in the performance of their campaigns. 

Topics covered include:

  • Using Data to Drive Consumer Engagement
  • Organizing for Customer Data Management
  • User Level Web Analytics Across Multiple Platforms
  • Cognitive Science to Drive Customer Targeting
  • Big Data & Marketing Modelling
  • Data Science for Customer Lifecycle Management
  • Mobile Marketing Analytics
  • Consumer Behaviour Tracking

Click here to register today



          Gaming Analytics Summit        
Start Date: Wed, 26 Apr 2017
End Date: Thu, 27 Apr 2017
City: #
Description:

Bringing together leaders and innovators for an event acclaimed for its interactive format; combining keynote presentations, interactive breakout sessions and open discussion. 

Topics at this year's summit focus on how the industry is embracing data science to drive success in areas including:

- In-game analytics

- Player acquisition

- Retention

- Customer Insight



          Big Data Innovation Summit        
Start Date: Thu, 30 Mar 2017
End Date: Fri, 31 Mar 2017
City: #
Description:

The Big Data Innovation Summit London 2017 schedule will bring together executives from the data community for two days of keynotes, panel sessions, discussions & networking.

This summit will cover all areas of the big data journey, including:

  • Data Analytics Case Studies
  • Data Science: The extraction of knowledge from data
  • Cultural Transformation: Driving the use of data
  • Hadoop: Getting value from unstructured data
  • Advanced Analytics: Solutions to predict future trends
  • Customer Insights: Getting the most from your customer data
  • Data Mining: Identifying behaviour
  • & so much more...

For the latest information please contact Roy Asterley.



          Big Data & Analytics Innovation Summit        
Start Date: Wed, 01 Mar 2017
End Date: Thu, 02 Mar 2017
City: #
Description:

This gathering of 250+ big data pioneers will bring you 30+ keynote speeches across industries, providing the latest big data practices as well as cutting-edge knowledge. 

The summit will have four tracks to focus on, including:

  • Big Data Analytics
  • Data Science & Machine Learning
  • Big Data Strategy
  • Smart Cities

Click here to get more information about the tracks. 

If you are not sure what to expect, please feel free to contact Ryan at Ryuan@theiegroup.com for a free past presentation to get a taste of the summit.



          Big Data & Analytics Innovation Summit        
Start Date: Wed, 08 Feb 2017
End Date: Thu, 09 Feb 2017
City: #
Description:

Big Data & Analytics Innovation will bring you right up to speed to assist you with your every need covering an array of topics, themes and problem points.

Topics Included:

  • Data Analytics
  • Data Science
  • Advanced Analytics
  • Predictive Analytics
  • Machine Learning & Algorithms
  • Cloud Computing
  • & much more...

The full agenda is out - take a look here!



          Big Data & Analytics Innovation Summit        
Start Date: Wed, 14 Sep 2016
End Date: Thu, 15 Sep 2016
City: #
Description:

The Big Data & Analytics Innovation Summit returns to Sydney as the largest gathering of senior business executives leading Big Data initiatives in Australasia.

The summit's key purpose is to join business leaders and innovators from across industry, to share their big data initiatives, and provide a platform to learn, share and come away with the tools, techniques, and knowledge to take back and implement in your organization. 

Topics Areas include:

  • Big Data & Analytics in Business
  • How to Embrace Data Science
  • Adopting Cloud Solutions in Your Enterprise
  • Advanced Analytics Offering Insight


          Big Data Innovation Summit        
Start Date: Thu, 08 Sep 2016
End Date: Fri, 09 Sep 2016
City: #
Description:

The Big Data Innovation Summit is the largest gathering of Fortune 500 business executives leading Big Data initiatives.

This year we have a variety of tracks and other sessions covering every aspect of the Big Data journey, including:




          Big Data & Analytics Innovation Summit        
Start Date: Tue, 06 Sep 2016
End Date: Wed, 07 Sep 2016
City: #
Description:

The Big Data & Analytics Innovation Summit is one of the major gatherings of senior business executives leading Big Data initiatives.

Topics to be covered include:

  • Data Analytics
  • Big Data Strategy
  • Data Science
  • IoTs & Smart Cities


          Big Data & Analytics for Banking Summit        
Start Date: Tue, 12 Jul 2016
End Date: Wed, 13 Jul 2016
City: #
Description:

With greater constraints and challenges facing the banking industry every day, hear how forward-thinking organizations are driving success in an immensely competitive market. 

Key topics covered at the summit include:

  • Customer Analytics
  • Fraud Analytics
  • Data Science in Banking
  • Risk Modelling & Reporting
  • Data Science
  • Text Analytics
  • Data Governance & Security
  • & much more

We are currently accepting speaker submissions for the 2016 event - if you have something to share you can submit a speaker submission here.



          Chief Data Officer Summit        
Start Date: Wed, 25 May 2016
End Date: Fri, 26 Aug 2016
City: #
Description:

The current explosion of data coupled with its role informing critical business decisions is driving the need for chief data officers. Boost revenue generation through more effective use of enterprise data insight at every level.

Topics covered will include:

  • Developing an Analytic Strategy
  • Using 'Open Data' to Drive Innovation
  • Organisational Structure of a Data Team
  • Hiring the Right People
  • The Data Science Transformation of Business


          Big Data Innovation Summit        
Start Date: Tue, 10 May 2016
End Date: Wed, 11 May 2016
City: #
Description:

The Big Data Innovation Summit London 2016 schedule will bring together executives from the data community for two days of keynotes, panel sessions, discussions & networking.

Big Data Innovation will cover but not limited to:

  • Big Data Case Studies: How our speaker organisations are dealing with their data
  • Data Science: The extraction of knowledge from data
  • Hadoop: Getting value from unstructured data
  • Advanced Analytics: Solutions to predict future events
  • Customer Insights: Getting the most from your customer data
  • Data Mining: Identifying behaviour
  • & so much more...

For the latest information please contact Jordan Charalampous.



          Gaming Analytics Summit        
Start Date: Wed, 27 Apr 2016
End Date: Thu, 28 Apr 2016
City: #
Description:

Bringing together leaders and innovators for an event acclaimed for its interactive format; combining keynote presentations, interactive breakout sessions and open discussion. 

Topics at this year's summit focus on how the industry is embracing data science to drive success in areas including:

- In-game analytics

- Player acquisition

- Retention

- Customer Insight



          Big Data Innovation Summit        
Start Date: Thu, 21 Apr 2016
End Date: Fri, 22 Apr 2016
City: #
Description:

Big Data Innovation will include ground-breaking keynote presentations and then will break into both technical & business focused tracks throughout the day with interactive workshops, breakouts & more.

Key Exploration Areas Include:

  • Data Science
  • Machine Learning
  • Product Analytics
  • Data Governance & Security
  • Marketing Intelligence
  • Customer Analysis
  • Driving the Data Culture
  • The Chief Data Officer Role
  • Data Democratization
  • Pricing Analytics
  • Big Data Strategy
  • & so much more across 80 sessions.

CLICK HERE TO VIEW THE SCHEDULE



          Big Data & Analytics Innovation Summit        
Start Date: Tue, 12 Apr 2016
End Date: Wed, 13 Apr 2016
City: #
Description:

The summit brings together business leaders and innovators from the industry for an event acclaimed for its interactive format; combining keynote presentations, interactive breakout sessions and open discussion. 

Make sure to check back regularly for schedule additions and changes. Click the box on your right to view the full agenda.

Topics covered include:

  • Data Analytics Use Cases from Industry Speakers
  • Capitalizing on the Power of Big Data
  • Big Data and Social Media
  • Logical Data Warehousing
  • Data Science
  • Real Time Analytics
  • Consumer Analytics


          Big Data & Analytics Innovation Summit        
Start Date: Wed, 02 Mar 2016
End Date: Thu, 03 Mar 2016
City: #
Description:

The Big Data & Analytics Innovation Summit provides 25+ industry case studies and over 20 hours of networking opportunities across 2 days. Make sure to check back regularly for schedule additions and changes.

Topics covered include:

  • Data Analytics Use Cases from Industry Speakers
  • Big Data & Marketing
  • Data Governance & Data Security
  • Big Data & Social Media
  • Data-Driven Decision Making
  • Logical Data Warehousing
  • Data Science
  • Real Time Analytics
  • Consumer Analytics


          After all, it might not matter - A commentary on the status of .NET        

Do you know what the most menacing nightmare for a peasant soldier in Medieval wars was? The approach of a knight.

Approaching of a knight - a peasant soldier's nightmare [image source]

Famous for gallantry and bravery, armed to the teeth and having many years of training and battle experience, knights were the ultimate war machine for the better part of Medieval times. The likelihood of survival for a peasant soldier in an encounter with a knight was very small: he had to somehow deflect or evade the attack of the knight's sword or lance while wielding a heavy sword himself, landing his blow at exactly the right moment as the knight passed. Not many peasants had the right training or prowess to do so.


Appearing around 1000 AD, knights rose to dominance following the conquest of William of Normandy in the 11th century, and their dominance reached its height in the 14th century:
“When the 14th century began, knights were as convinced as they had always been that they were the topmost warriors in the world, that they were invincible against other soldiers and were destined to remain so forever… To battle and win renown against other knights was regarded as the supreme knightly occupation” [Knights and the Age of Chivalry,1974]
And then something happened. Something that changed military combat for centuries to come: projectile weapons.
“During the fifteenth century the knight was more and more often confronted by disciplined and better equipped professional soldiers who were armed with a variety of weapons capable of piercing and crushing the best products of the armourer’s workshop: the Swiss with their halberds, the English with their bills and long-bows, the French with their glaives and the Flemings with their hand guns” [Arms and Armor of the Medieval Knight: An Illustrated History of Weaponry in the Middle Ages, 1988]
The development of the longsword had made the knight's attack more effective, but no degree of training or improved plate armour could stop the rise of projectile weapons:
“Armorers could certainly have made the breastplates thick enough to withstand arrows and bolts from longbows and crossbows, but the knights could not have carried such a weight around all day in the summer time without dying of heat stroke.”
And the final blow was the handguns:
“The use of hand guns provided the final factor in the inevitable process which would render armor obsolete” [Arms and Armor of the Medieval Knight: An Illustrated History of Weaponry in the Middle Ages, 1988]
And with the advent of arbalests, the importance of lifelong training disappeared, since "an inexperienced arbalestier could use one to kill a knight who had a lifetime of training".

Projectile weapons [image source]

Over the course of the century, knighthood gradually disappeared from the face of the earth.

A paradigm shift. A disruption.

*       *       *

After the big promise of web 1.0 went undelivered, resulting in the dot-com crash of 2000-2001, the development of robust RPC technologies combined with better languages and tooling gradually rose to fulfill the same promise in web 2.0. On the enterprise front, the need to reduce cost by automating business processes led to the growth of IT departments in virtually any company that had a chance of surviving the 2000s.

In small-to-medium enterprises, the solutions almost invariably involved some form of database in the backend, storing the results of CRUD operations performed through data entry forms. The need for reporting on those databases resulted in Business Intelligence functions employing more and more SQL experts.

With the rise of e-commerce, most companies needed an online presence and the ability to offer some form of shopping experience online. At the same time, to reduce the cost of postage and paper, companies started offering account management online.

Whether SOA or not, these systems functioned pretty well for the limited functionality they were offering. The important skills the developers of these systems needed were a good command of the language used, object-oriented design principles (e.g. SOLID), TDD and knowledge of agile principles and process. In terms of scalability and performance, these systems were rarely, if ever, pressed hard enough to break: even sticky sessions could work as long as you had enough servers (it was often said "we are not Google or Facebook"). Obviously availability suffered, but downtime was something businesses had got used to, and it was accepted as a general failure of IT.

True, some of these systems were actually “lifted and shifted” to the cloud, but in reality not much had changed from the naive solutions of the early 2000s. And I call these systems The Simpleton Swamps.

Did you see what was lacking in all of above? Distributed Computing.

*       *       *

It is a fair question that we need to ask ourselves: what was it that we, as the .NET community, were doing during the last 10 years of innovations? The first wave of innovations was the introduction of the revolutionary papers on BigTable and Dynamo, which later resulted in the emergence of the NoSQL movement with Apache Cassandra, Riak and Redis (and later Elasticsearch). [During this time I guess we were busy with WPF and Silverlight. Where are they now?]

The second wave was the Big Data revolution with the Apache Hadoop ecosystem (HDFS, Pig, Hive, Mahout, Flume, HBase). [I guess we were doing Windows Phone development, building Metro UIs back then. Where are they now?]

The third wave started with Kafka (and the streaming solutions that followed), Grid Computing platforms such as YARN and Mesos, and also the extended Big Data family such as Spark, Storm, Impala and Drill, too many to name. In the meantime, Machine Learning became mainstream and the success of Deep Learning brought yet another dimension to the industry. [I guess we were rebuilding our web stack with the Katana project. Where is it now?]

And finally we have the Docker family and extended Grid Computing (registry, discovery and orchestration) software such as DCOS, Kubernetes, Marathon, Consul, etcd… Also the logging/monitoring stacks such as Kibana, Grafana, InfluxDB, etc., which started along the way as essential ingredients of any such serious venture. The point is that neither the creators nor the consumers of these frameworks could do any of this without in-depth knowledge of Distributed Computing. These platforms are not built to shield you from it, but merely to empower you to make the right decisions without having to implement a consensus algorithm from scratch or deal with the subtleties of building a gossip protocol.


And what was it that we have been doing recently? Well, I guess we were rebuilding our stacks again with #vNext aka #DNX aka #aspnetcore. Where are they now? Well, actually a release is coming soon: 27th of June to be exact. But anyone who has been following the events closely knows that due to recent changes in direction, we are still - give or take - 9 to 18 months away from a stable platform that can be built upon.

So a big storm of paradigm shifts swept the whole industry while we were still tinkering with our simpleton swamps. Please just have a look at this big list: only a single one of them is C#, Greg Young's EventStore. And by looking at the list you see the same pattern, the same shifts in focus.

The .NET ecosystem is dangerously oblivious to distributed computing. True, we have recent exceptions such as Akka.NET (a port of the JVM's Akka) or Orleans, but they have not really penetrated and infused the ecosystem. If all we want to do is simply build front-end APIs (akin to nodejs) or cross-platform native apps (using Xamarin Studio), that is not a problem. But if we are not supposed to build a sizeable chunk of the backend services, let's make that clear here.

*       *       *

Actually there is a fair amount of distributed computing happening in .NET. Over the last 7 years Microsoft has built a significant number of services that are out to compete with the big list mentioned above: Azure Table Storage (arguably a BigTable implementation), Azure Blob Storage (an Amazon Dynamo?) and EventHub (rubbing shoulders with Kafka). Also a highly-available RDBMS (SQL Azure), a message broker (Azure Service Bus) and a consensus implementation (Service Fabric). There is plenty of Machine Learning as well, and, although slowly, Microsoft is picking up on Grid Computing, with the Mesosphere alliance and the DCOS offering on Azure.

But none of these have been open sourced. True, Amazon does not open source its bread-and-butter cloud either. But AWS has mainly been an IaaS offering, while Azure is banking on its PaaS capabilities, making Distributed Computing easy for its predominantly .NET consumers. It feels as if Microsoft is saying: you know, let me deal with the really hard stuff, but for sure, I will leave a button in Visual Studio so you can deploy it to Azure.


At points it feels as if Microsoft, as the Lord of the .NET-stack fiefdom, having discovered gunpowder, is charging us knights and peasant soldiers to attack with our lances, axes and swords while keeping the gunpowder weapons and their science safely locked away for the protection of the castle. The .NET community is to a degree contributing to #dotnetcore while also waiting for the silver bullet that #dotnetcore has been promised to be, revolutionising and disrupting the entire stack. But ask yourself: when was the last time that better abstractions and tooling brought about disruption? The knight is dead, gunpowder has changed the horizon, yet there seem to be no ears to hear.

Fiefdom of .NET stack
We cannot fault any business entity for keeping its trade secrets. But if the soldiers fall, ultimately the castle will fall too.

In fact, a single company is not able to pull the weight of re-inventing the emerging innovations. While the quantity of technologies that have emerged from Azure is astounding, quality has not always followed. After complaints to Microsoft about the performance of Azure Table Storage, others have reported finding the same, and some have abandoned the Azure ship completely.


No single company is big enough to do it all by itself. Not even Microsoft.

*       *       *

I remember when we used to make fun of Java and Java developers (uninspiring, slow, Eclipse was a nightmare). They actually built most of the innovations of the last decade, from Hadoop to Elasticsearch to Storm to Kafka... In fact, looking at the top 100 Java repositories on GitHub (minus Android Java), you find 24 distributed computing projects, 4 machine learning repos and 2 languages. In C# you get only 3 with claims to distributed computing: ServiceStack, Orleans and Akka.NET.

But maybe it is fine, we have our jobs and we focus on solving different kinds of problems? Errrm... let's look at some data.

The market share of the IIS web server has halved over the last 6 years - according to multiple independent sources [this source confirms the share was >20% in 2010].

IIS share of the market has almost halved in the last 6 years [source]

Now the market share of C# ASP.NET developers is also decreasing towards half of its peak of around 4%:

Job trend for C# ASP.NET developer [source]
And if you do not believe that, see another comparison with other stacks from another source:

Comparing the trend of C# (dark blue) and ASP.NET (red) jobs with that of Python (yellow), Scala (green) and nodejs (blue). C# and ASP.NET are dropping while the rest are growing [source]

OK, that was actually nothing; what I care about more is OSS. The Open Source revolution in .NET, which had been growing at a steady pace since 2008-2009, almost reached a peak in 2012 with the ASP.NET Web API excitement and then grew at a slower pace (almost a plateau, visible on the 4M chart - see appendix). [By the way, I have had my share of these repos: 7 of them are mine.]

OSS C# project creation in Github over the last 6 years (10 stars or more). Growth slowed since 2012 and there is a marked drop after March 2015 probably due to "vNext". [Source of the data: Github]

What is worse is that the data shows that with the announcement of #vNext aka #DNX aka #dotnetcore there was a sharp decline in new OSS C# projects - the community is in limbo waiting for the release: people find it pointless to create OSS projects on the current platform, and the future platform is so much in flux that it is not stable enough for innovation. With the recent changes announced, it will practically take another 12-18 months to stabilise (some might argue 6-12 months; fair enough, take what you like). For me this is the most alarming signal of all.

So all is lost?

All is never lost. You still find good COBOL or FoxPro developers and since it is a niche market, they are usually paid very well. But the danger is losing relevance…

Can Microsoft practically pull it off? Perhaps. I do not believe it is hopeless; I feel that with a radical change, by taking the steps below, Microsoft could materially reverse the decay:
  1. Your best community brains in Distributed Computing and Machine Learning are in the F# community; they have already built many OSS projects in both areas - sadly remaining obscure and used by only a few. Support and promote F# not just as a first-class language but as THE preferred language of the .NET stack (and by the way, wherever I said .NET stack, I meant C# and VB). Ask everyone to gradually move. I don't know why you have not done it; I think someone somewhere in Redmond does not like it, and he/she is your biggest enemy.
  2. Open source a good part of Azure's distributed services. Let the community help you improve them. Believe me, you are behind the state of the art; frankly, no one will look to copy it. Someone will copy Azure Table Storage and not Cassandra?!
  3. Stop promoting deployment to Azure from Visual Studio with the click of a button, making Distributed Computing look trivial. Tell them the truth: tell them it is hard, tell them so few succeed and hence they need to go back and study, and forever forget about the one-button-click stuff. You are doing a favour neither to them nor to yourself. No one should be encouraged to deploy anything in a distributed fashion without sound knowledge of Distributed Computing.

Last word

So when I am asked whether I am optimistic about the future of .NET or about the progress of dotnetcore, I usually keep silent: we seem to be missing the point of where we need to go with .NET - a paradigm shift has been ignored by our ecosystem. True, dotnetcore will be released on the 27th, but after all, it might not matter as much as we care to think. One of the reasons we are losing to other stacks is that we are losing our relevance. We do not have all the time in the world. Time is short...

Appendix

Github Data

Gathering the data from GitHub is possible, but because search results are limited to 1000 items and due to rate-limiting, it takes a while to process. The best approach I found was to list repos by update date and keep moving up. I used a Python script to gather the data.
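A rough sketch of one way to do this kind of counting against the GitHub search API is shown below. This is not the original script: the date windows, the star threshold and the sleep interval are illustrative assumptions, and each query is kept to a narrow creation-date window so that its result set stays well under the 1000-item search cap.

    # Sketch: count repos per creation-date window via the GitHub search API.
    import time
    import requests

    def count_repos(language, created_from, created_to, min_stars=10):
        # One narrow date window per query keeps results under the 1000-item cap.
        query = "language:{} created:{}..{} stars:>={}".format(
            language, created_from, created_to, min_stars)
        response = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": query, "per_page": 1},
            headers={"Accept": "application/vnd.github.v3+json"},
        )
        response.raise_for_status()
        return response.json()["total_count"]

    if __name__ == "__main__":
        windows = [("2015-01-01", "2015-01-31"), ("2015-02-01", "2015-02-28")]
        for start, end in windows:
            print(start, count_repos("csharp", start, end))
            time.sleep(10)  # crude pause to stay within the search API rate limit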

It is sensible to use the number of stars as the bar for the quality and importance of GitHub projects. But choosing the threshold is not easy, and there is usually a lag between the creation of a project and it gaining popularity. That is why the threshold has been chosen very low. But if you think the drop in creation of C# projects on GitHub was due to this lag, think again. Here is the chart of all C# projects regardless of their stars (0 stars and more):


All C# projects in github (0 stars and more) - marked drop in early 2015 and beyond

F# is showing healthy growth, but the number of projects and stars is much smaller than that of C#. Hence here we look at projects with 3 stars or more:


OSS F# projects in Github - 3 stars or more
The chart of projects with 0 stars and more (possibly showing people starting to pick it up and play with it) is looking very healthy:


All F# projects regardless of stars - steady rise.


Data is available for download: C# here and F# here

My previous predictions

This is actually my second post of this nature. I wrote one 2.5 years ago, raising alarm bells about the lack of innovation in .NET and predicting 4 things that would happen within 5 years (that is, 2.5 years from now):
  1. All Data problems will be Big Data problems
  2. Any server-side code that cannot be horizontally scaled is gonna die
  3. Data locality will still be an issue so technologies closer to data will prevail
  4. We need 10x or 100x more data scientists and AI specialists
Judge for yourself...


Deleted section

For the sake of brevity, I had to delete this section, but it puts in context how we now have many more hyperscale companies:

"In the 2000s, not many had the problem of scale. We had Google, Yahoo and Amazon, and later Facebook and Twitter. These companies had to solve serious computing problems in terms of scalability and availability that on one hand lead to the Big Data innovations and on the other hand made Grid Computing more accessible.

By commoditising the hardware, cloud computing allowed companies to experiment with scale problems and innovate to achieve high availability. The results have been completely re-platformed enterprises (such as Netflix) and the emergence of a new breed of hyperscale startups such as LinkedIn, Spotify, Airbnb, Uber, Gilt and Etsy. The rise of companies building software to solve problems related to these architectures, such as HashiCorp, Docker and Mesosphere, has added another dimension to all this.

And last but not least is the importance of a close relationship between academia and industry, which seems to be re-forming after a long (and sad) hiatus. This has led many academics, acting as Chief Scientists and the like, to influence the science underlying the disruptive changes.

There was a paradigm shift here. Did you see it?"


          Predictive Analytics Innovation Summit        
Start Date: Thu, 18 Feb 2016
End Date: Fri, 19 Feb 2016
City: #
Description:

The Predictive Analytics Innovation Summit is the largest gathering of Fortune 500 business executives leading initiatives in Data, Marketing & Customer Analytics, as well as Machine Learning, Personalization and Advanced Data Science.

We are currently accepting speaker submissions for the 2016 event - if you have something to share you can submit a speaker submission here.

This year we have a number of exciting tracks, panel sessions and interactive workshops all happening over the two days. Topics covered will include:

  • Structuring Data to Improve Predictive Models
  • Better Consumer Experiences with Applied Machine Learning 
  • Evolving Predictive Analytics from Score to Story 
  • Models for the Performance of Big Data Systems 

Purchase an Access All Areas Pass to also gain access to the co-located Apache Hadoop Innovation Summit and Data Science Innovation Summit.



          Data Science Innovation Summit         
Start Date: Thu, 18 Feb 2016
End Date: Fri, 19 Feb 2016
City: #
Description:

The Data Science Innovation Summit aims to explore this interdisciplinary field. Organizations are now well equipped to extract knowledge and/or insights from data, both structured and unstructured. Our agenda will guide you through the process, learning from experts working day in, day out with data.

Topic Areas Covered Include: 

  • Data Science Innovation
  • Extracting Knowledge from Data
  • Insights within Data
  • Structured & Unstructured Data
  • Statistics
  • Data Mining
  • Predictive Analytics
  • Knowledge Discovery 
  • & more...

Click Here to Register



          Big Data & Analytics Innovation Summit        
Start Date: Wed, 10 Feb 2016
End Date: Thu, 11 Feb 2016
City: #
Description:

Big Data & Analytics Innovation will bring you right up to speed to assist you with your every need covering an array of topics, themes and problem points.

Topics to Include:

  • Data Analytics
  • Data Science
  • Advanced Analytics
  • Predictive Analytics
  • Machine Learning & Algorithms
  • Cloud Computing
  • & much more...


          Big Data Innovation Summit        
Start Date: Thu, 28 Jan 2016
End Date: Fri, 29 Jan 2016
City: #
Description:

As organizations evolve and embrace technological advances, data becomes a key currency on which they can hope to gain an advantage over competitors and push business success. The agenda will explore all topics and themes on how we can better embrace data and push it to the limits.

Topics to include: 

  • Data Analytics
  • Data Science
  • Emerging Algorithms
  • Machine Learning
  • Big Data Technologies
  • Data Security
  • Data in the Cloud
  • & much more...


          Big Data & Analytics for Banking Summit        
Start Date: Tue, 01 Dec 2015
End Date: Wed, 02 Dec 2015
City: #
Description:

With greater constraints and challenges facing the banking industry every day, hear how forward-thinking organizations are driving success in an immensely competitive market. 

Key topics covered at the summit include:

  • Customer Analytics
  • Fraud Analytics
  • Data Science in Banking
  • Risk Modelling & Reporting
  • Data Science
  • Text Analytics
  • Data Governance & Security
  • & much more

The schedule is now online - click on the right to take a look!



          Future of Programming - Rise of the Scientific Programmer (and fall of the craftsman)        
Level [C3]

[Disclaimer: I am by no means a Scientific Programmer, but I am striving to become one] It is the turn of yet another year and the time is ripe for last-year reviews, predictions for the new year and its resolutions. Last year I made some bold statements and some radical decisions to start transitioning. I picked up a Mac, learnt some Python and Bash, and a year on I think it was good and I really enjoyed it. Still (as I predicted), I spent most of my time writing C#. [Working on a Reactive Cloud Actor micro-framework, in case it interests you for any reason.] Now, a year on, Microsoft is a different company: a new CEO, moving towards Open Source and embracing non-Windows operating systems. How that is going to shift the innovation imbalance is a wait-and-see. But anyway, that was last year and is behind us.

Now let's talk about 2015. And perhaps programming in general. Are you sick of hearing Big Data buzzwords? Do you believe Data Science is a pile of mumbo jumbo to bamboozle us, actually used by a teeny tiny number of companies and producing value in even fewer? Is IoT just another hype? I hope that by reading the below I will have been able to answer you. Sorry, no TL;DR.

*     *     *

It was a warm, sunny and all-around really nice day in June. The year is 2007 and I am on a university day trip (and punting) to Cambridge along with my classmates, many of whom are at least 15 years younger than me. Punting is fun, but as a part-time student this is one of the few times I have leisurely access to our Image Processing lecturer - a bright young guy, again younger than me. And I open the discussion with how we have not moved much since the 80s in the field of Artificial Intelligence. We improve and optimise algorithms, but there is no game-changing giant leap. And he argues that the state of the art usually improves little by little.


"Day out punting in cambridge"

The next year, we work on a project involving some machine learning to recognise road markings. I spend a lot of time on feature extraction and use a 2-layer Neural Network, since I get the best result out of it compared to 3. I am told not to use many layers of neurons, as training usually gets stuck in a local minimum - I actually tried it and saw that happen. Overall the result was OK, but it involved many pre- and post-processing techniques to achieve acceptable recognition.

*     *     *

I wake up and it is 2014. Many universities, research organisations (and companies) across the world have successfully implemented Deep Learning using Deep Neural Networks - which have many layers of neurons. Watson answers all the questions in Double Jeopardy. Object recognition from images is almost a solved case - with essentially no feature extraction.

A Deep Neural Network
Perhaps my lecturer was right: with improved training algorithms and a great deal of labelled data, we suddenly have a big leap in the science (or was I right?!). It seems that for the first time the implementation has got ahead of the mathematics: we do not fully understand why Deep Learning works - but it works. And when these networks fail, we still don't know why they fail.

And guess what, industry and the academia have not been this close for a long time.

And what has all this got to do with us? The rise of machine intelligence is going to change programming. Forever.

*     *     *

Honestly, I am sick of the amount of bickering and fanboyism that goes on today in the programming world. The culture of "nah... I don't like this" or "ahhh... that is s..t" or "ah that is a killer" is what has plagued our community. One day Angular is super hot, the next week it is the worst thing. Be it zsh or Bash. Be it vim vs. Emacs vs. Sublime Text vs. Visual Studio. Be it Ruby, Node.js, Scala, Java, C#, you name it. And the same goes for technologies such as MongoDB, Redis... subjectivism instead of facts. As if we have forgotten that we come from a line of scientists.

Like children we get attached to new toys, and with the attention span of a goldfish, instead of solving real-world problems, we ruminate over how we can improve our coding experience. We are ninjas, and what we do no one else can do. And we can do whatever we want to do.

"I have got power"

Yes, we are lucky. A 23-year-old kid with a couple of years of programming experience can earn double what a 45-year-old retail manager with 20 years of experience earns annually. And what do we do with that money? Spend all of it on booze, specialty burgers, travelling, conferences, gadgets - basically whatever we want to.

But those who remember the first .com crash can tell you it has not always been like this. In fact, back in 2001-2002 it was really hard to get a job. And the problem was, there were many really good candidates. The IT industry became almost impenetrable, since there was the catch-22 of requiring job experience to get the job experience. But anyway, the good ones, the stubborn ones and those with little talent but a lot of passion (that includes me) stayed on for the good days that we have now. The reality was that many programmers of the time had read "Access in 24 Hours" and landed a fat salary in a big company. And on the other hand, projects were failing since we spent most of our time writing documentation. The industry had to weed out bad coders and inefficient practices.

And so we got the software craftsmanship movement and agile practices.

*     *     *

The opposition has already started. You might have seen the discussions DHH has had with Kent Beck and Martin Fowler on TDD. I do not agree 100% with what Erik Meijer says here (only 90%), but there is a lot of truth in it. We have replaced a fact-based, data-backed attitude with a faith-based, wishy-washy, peace-hug-freedom hippie agile way, forcing us to mechanically follow some steps and believe that it will be good for us. Agile has taken us a long way from where we started at the turn of the century, but there are problems. From personal experience, I see no difference in the quality of developers who do TDD and those who do not. And to be frank, I actually see a negative effect: people who do TDD do not fully think hard about the consequences of the code they write - I know this could be inflammatory, but hand on heart, that is my experience. I think TDD and agile have given us a safety net such that, as tightrope walkers, instead of focusing on our walking technique, we improve the safety net. As long as we go through the motions, we are safe. Unit tests, coverage, planning poker, retrospectives, definition of done, stories, tasks, creating tickets, moving tickets. How many bad programmers have you seen who are masters of agile?

You know what? It is the mediocrity we have been against all this time. Mediocre developers who in the first .com boom got into the market by taking a class or reading a book are back in a different shape: those who know how to be opinionated, look cool, play the game and take the paycheck. We are in another .com boom now, and if there is a crash, sadly they are out - even if that includes me.


*     *     *

I think we have neglected the scientific side of our jobs. Our maths is rusty, and those who did study CompSci do not remember a lot of what they read. We cannot calculate the complexity of our code and fall into the trap of thinking that machines are fast now - yes, it didn't matter for a time, but what about when you are dealing with petabytes of data and paying by processing hours? When our team first started working on recommendations, the naive implementation took 1000 nodes for 2 days; now the implementation uses 24 nodes for a few hours, and perhaps this is still way, way too much.
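As a toy illustration of why this still matters (not taken from that project; the duplicate-counting task and the numbers are made up for the example), compare a quadratic and a linear way of answering the same question - the difference is invisible on a thousand items and fatal on a few hundred million:

    # Counting duplicate pairs: O(n^2) pairwise comparison vs. an O(n) hash-based pass.
    from collections import Counter

    def duplicate_pairs_quadratic(items):
        # Compares every pair - fine for small lists, hopeless at Big Data scale.
        return sum(1 for i, x in enumerate(items) for y in items[i + 1:] if x == y)

    def duplicate_pairs_linear(items):
        # A single pass with a hash table gives the same answer.
        return sum(c * (c - 1) // 2 for c in Counter(items).values())

    if __name__ == "__main__":
        data = [i % 1000 for i in range(10000)]
        assert duplicate_pairs_quadratic(data) == duplicate_pairs_linear(data)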

"we are craftsmen and craftswomen" (from Anders Drachen)


But really, since when did our job look like that of a craftsman (a carpenter)? We are Ninjas? And we do code katas to keep our skills/swords sharp? This has all gone too far into the world of fantasy. The world of Warcraft. This is now a full-blown New Age religion.

What an utter rubbish.

*     *     *

Now, back on earth, the languages of the 90s and early 2000s are on the decline. Java, C#, C++, all on the decline. But they are being replaced by other languages such as Scala, right? I leave that to you to decide based on the diagram below.
Google trends of "Java", "Scala", "C#" and "Python Programming" (so that it does not get mixed up with Python the snake) - source: google
The only counter-trend is Python. The recent rise in Python's popularity is what I call the "rise of the scientific programmer" - and that is just one of the signs. Python is a very popular language in the academic space. It is easy to pick up, works everywhere and has some functional aspects that make it terse. But that is not all: it sits on top of a huge wealth of scientific libraries, and it can talk to Java and C as well. Industry innovations have started to come straight from the universities - from the early 2000s, when academia seemed completely irrelevant, to now, when it leads the innovation. PySpark has come straight from the heart of UC Berkeley. Many of the contributors to the Hadoop code base and its wide ecosystem are in academia.

We are now in need of people who can argue scientifically about algorithms and data (is coding anything but code + data?), most of whom could implement an algorithm given the paper or the mathematical notation. And guess what, this is the trend for jobs mentioning "Machine Learning":
Trend of jobs containing "Machine Learning" - Source: ITJobsWatch

And this is really not just Hadoop. According to the source above, Machine Learning jobs had a 41% rise from 2013 to 2014, while Hadoop jobs had only 16%.

This Deep Learning thing is real. It is already here. All those existing algorithms need to be polished and integrated with the new concepts, and some will simply be replaced. If you can feed a person's interactions with a site to a deep network, it can predict with high confidence whether they are gonna buy, leave or stay undecided. It can find patterns in diseases that we as humans cannot. This is what we were waiting for (and were afraid of?). Machine intelligence is here.

The scientific programmer [and yes, they have to know more]


Now one might say that the answer is Data Scientists. True. But first, we don't have enough of them, and second, based on first-hand experience, we need people with engineering rigour to produce production-ready software - something that some Data Scientists certainly have, but not all. So I feel that a programmer turned statistician can build more robust software than the other way around. We need people who understand what it takes to build software that you can put in front of millions of customers to use. People who understand linear scalability, SLAs, monitoring and architectural constraints.

*     *     *

Horizon is shifting.

We can pick a new language (be it Go, Haskell, Julia, Rust, Elixir or Erlang), start re-inventing the wheel and start from pretty much the same scratch again because, hey, this is easy now, we have done it before and don't have to think. We can pick a new, albeit cleaner, abstraction and re-implement thousands of hours of hard work and sweat that we and the community have already suffered through - since, hey, we can. We can rewrite the same HTTP pipeline in thousands of different ways and never be happy with what we have achieved, be it Ruby on Rails, Sinatra, Nancy, ASP.NET Web API, Flask, etc., and stay happy that we are striving for that perfection, that unicorn. We can argue about how to version APIs and about how one service is so RESTful and another not RESTful. We can mull over the pettiest of things, such as semicolons or the gender of a pronoun, and let insanely clever people leave our community. We can exchange the worst of words over "females in the industry" while we are more or less saying the same thing. Too much drama.

But soon this will be no good. Not good enough. We have got to grow up and go back to school, relearn all about maths, statistics and scientific reasoning in general. We need to man up and re-learn that being a good coder has nothing to do with the number of stickers you have on the back of your Mac. It is all scientific - we come from a long line of scientists, and we have got to live up to our heritage.

We need to go and build novelties for the second half of the decade. This is what I hope to be able to do.
          Big Data & Marketing Innovation Summit        
Start Date: Thu, 05 Nov 2015
End Date: Fri, 06 Nov 2015
City: #
Description:

Big Data has already proven to help marketers reach and engage with  consumers in new ways. With so much consumer information and prospect data now available, organizations embracing data analytics and metrics are seeing improvements in the performance of their campaigns. 

Topics covered include:

  • Driving Consumer Engagement through Big Data Analytics
  • Organizing for Customer Data Management
  • User Level Web Analytics Across Multiple Platforms
  • Cognitive Science to Drive Customer Targeting
  • Big Data & Marketing Modelling
  • Data Science for Customer Lifecycle Management
  • Mobile Marketing Analytics
  • Consumer Behaviour Tracking

Click here to Register Today



          Thank you Microsoft, nine months on ...        
Level [C1]

I felt that, almost a year after my blog post Thank you Microsoft and so long, it is the right time to look back and contemplate the decision I made back then. If you have not read the post, well, in a nutshell, I decided to gradually move towards non-Microsoft technologies - mainly due to Microsoft's lack of innovation, especially in the Big Data space.

A few things have changed since then. TL;DR: generally it really feels like I made the right decision to diversify. Personally, I have learnt a lot and at the same time I have had a lot of fun. I have built a bridge to the other side, and I can easily communicate with non-Microsoft peers and translate my skills. On the technology landscape, however, there have been some major changes that make me feel having a hybrid skillset is much more important than a complete shift to any particular platform.

I will first look at the technology landscape as it stands now and then will share my personal journey so far in adopting alternative platforms and technologies.

A point on the predictions

OK, I made some predictions in the previous post, such as "In 5 years, all data problems become Big Data problems". Some felt this was completely wrong - which could be - and left some not very nice messages. At the end of the day, predictions are free (and that is why I like them) and you could do the same. I am sharing my views; take them for what they are worth to you. I have a track record of making predictions; some came true and some did not. I predicted a massive financial crash in 2011, which did not happen and instead led to one of the biggest bull markets ever (well, my view is they pumped money into the economy and artificially made the bull market), and I lost some money. On the other hand, back in 2010 I predicted in my StackOverflow profile something that I think is now called the Internet of Things, so I guess I was lucky (by the way, I am predicting a financial crash in the next couple of months). Anyway, take it easy :)

Technology Horizon

The New Microsoft

Since I wrote the blog post, a lot has changed, especially at Microsoft. It now has a new CEO and a radically different view on Open Source. Releasing the source of a big chunk of the .NET Framework is the harbinger of a shift whose size is difficult to guess at the moment. Is it a mere gesture? I doubt it; this adoption was the result of years of internal campaigning by the likes of Phil Haack and Scott Hanselman, and it has finally worked its way up the hierarchy.

But adopting Open Source is not just a community-service gesture; it has important financial significance. With the rate of change in the industry, you need an army of developers to constantly work on and improve products at this scale. No company is big enough on its own to build what can be built by an organic and healthy ecosystem. So crowd-sourcing is an important technique for improving your product without paying for the time spent. It is probably true that the community around your product, rather than the actual code, is the real IP of most cloud platforms.

Microsoft is also relinquishing its push strategy for its Operating System, and to be honest, I am not surprised at all. Many have talked about the WebOS, but the reality is we have already had it for the last couple of years. Your small smartphone or tablet comes to life when it is connected - enabling you to do most of what you can do on a laptop/PC, with the only limitation being the screen size. On the other hand, Microsoft has released the web version of Office, and to be fair it is capable of doing pretty much everything you can do in the desktop versions, and sometimes it does it better. So for the majority of consumers, all you need is the WebOS. It feels that the value of a desktop operating system becomes less and less important when most of the applications you use daily are web-based or cloud-based.


Cloud and Azure

I have been doing a lot of Azure, both at work and outside it. Apart from HDInsight, I think Azure is expanding at a phenomenal rate in both features and reliability, and this is where I feel Microsoft is closing the innovation gap. It is mind-blowing to look at the list of new features that come out of Azure every month.

Focusing mainly on the PaaS products, I think the future of Azure in terms of adoption by smaller companies is looking more and more attractive compared to AWS, which has traditionally been the IaaS platform of choice. Companies like Netflix have built their entire software empire on AWS, but they had an army of great developers to write the tooling and integration stuff.

All in all, I feel Azure is here to stay and might even overtake AWS in the next 5 years. What will be the decider is the pace of innovation.

Non-Hadoop platforms

A recent trend that could change the balance is the proliferation of non-Hadoop approaches to Big Data, which will favour Microsoft and Google. With Hadoop 2.0 trying to further abstract the algorithm away from the resource management, I think there is an opportunity for Microsoft to jump in and plug in a whole host of Microsoft languages in a real way - it was possible to use C# and F# before, but no one really used them.

Microsoft announced the release of AzureML, which is the PaaS offering of Machine Learning on the Azure platform. It is early to say, but it looks like this could be used for smaller-than-Big-Data machine learning and analysis. This platform is basically the productionising of the Machine Learning platform behind the Bing search engine.

Another sign that Hadoop's elephant is getting old is Google's announcement that it is dropping MapReduce: "We invented it and now we are retiring it". Basically, in-memory processing looks more and more appealing due to the need for a quicker feedback cycle and for speeding up processes. Also, it seems that there is a resurgence of focus on in-memory grid computing, perhaps as a result of the popularity of Actor frameworks.

In terms of technologies, Spark and, to a degree, Storm are getting a lot of traction, and the next few months are essential to confirm the trend. These still very much come from the JVM ecosystem, but there is potential for building competitor products.

Personal progress

MacBook

This is the first thing I did after making the decision 9 months ago: I bought a MacBook. I was probably the farthest thing from being an Apple fanboy, but, well, it has put its hooks in me too now. I wasn't sure if I should get a Windows laptop and run a Linux VM on it, or buy a MacBook and run a Windows VM. Funnily enough, and despite my presumptions, I found the second option to be cheaper. In fact, I could not find a Windows ultrabook with 16 GB of RAM, and that is what I needed to be able to comfortably run a VM. So buying a 13.3" MacBook Pro proved both economical (in light of what you get back for the money) and the right choice - since you want your VM to be your secondary platform.

Initially I did not like OSX, but it helped me get better at using the command line - albeit the OSX variant of the Linux commands. Six months on, just as some of my Twitter friends had said, I don't think I will ever go back to Windows.

I have used the Mac for everything apart from Visual Studio and the occasional Visio - also, using some Azure tools had to be done on Windows. I think I now spend probably only 20% of my time in Windows, and the rest in Linux (Azure VMs) and OSX.

Linux, Shell scripting and command line

I felt like an ignoramus when I found out about the wealth of command-line tools at my disposal in OSX and Linux. find, grep, sort, sed, tail, head, etc. - just amazing stuff. I admit, for some there might be Windows equivalents that I have not heard of (which I doubt), but it really makes life so much easier when automating and managing your servers. So I have been working on understanding services on Linux and OSX, learning about Apache and how to configure it... I am no expert by any stretch, but it has been fun and I have learnt a lot.
And yes, I did use vim - and yes, I did find it difficult to exit the first time :) I am not mad about it; I just have to use it on the Linux VMs where I manage configs, etc., but I cannot see myself using it for development - at least not anytime soon.

Languages

As I said then, I had decided to start with some JVM languages. Scala felt like the right choice then, and knowing more about it now, even more so. It is powerful, yet all the wealth of Java libraries is at your fingertips. It is widely adopted (Clojure, the second candidate, not so much). Erlang is probably not appropriate for now, and Go is non-JVM, so I am happy with that decision.

Having said that, I could not learn a lot of it. Instead I had to focus on Python for a personal NLP project - well, as you know, most NLP and data science tools are in Python. I had to learn to code in it, understand its OOP and functional sides, its versioning and distribution and, finally and above all, how to serve REST APIs (using Flask and Flask-RESTful) for interop with my other C# code.
My view on it? Python is simple and has nice built-in support for the important data structures (list, dict, tuple, etc.), making it ideal for working with data. So it is a very useful language, but it is not anywhere near as elegant as Scala or even C#. So for complex stuff I would still rather code in C#, until I properly pick up Scala again. I am also not very comfortable with distributing non-compiled code - although that is what we normally do in JavaScript (minification aside), perhaps another point of similarity between the two.
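As a minimal sketch of that interop pattern (this is not the original project; the endpoint name and the tokenising logic are made up for illustration), a small Flask + Flask-RESTful service that a C# HttpClient could call might look like this:

    # Sketch: a tiny Flask + Flask-RESTful service exposing a toy NLP endpoint over HTTP.
    from flask import Flask, request
    from flask_restful import Api, Resource

    app = Flask(__name__)
    api = Api(app)

    class Tokens(Resource):
        def post(self):
            # Accept {"text": "..."} and return a naive whitespace tokenisation.
            text = request.get_json(force=True).get("text", "")
            return {"tokens": text.split()}

    api.add_resource(Tokens, "/tokens")

    if __name__ == "__main__":
        app.run(port=5000)  # any HTTP client (e.g. HttpClient in C#) can POST to /tokens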

Apart from these, I have still been doing a ton of C#, as I had predicted in the previous blog post. I have been working on a Cloud Actor mini-framework called BeeHive, which I am currently using myself. I still enjoy writing C# and am planning to try out Mono as well (.NET on OSX and Linux). Having said that, I feel tools and languages are best used in their native platform and ecosystem, so I am not sure whether Mono would be a viable option for me.

Conclusion

All in all, I think that by embracing the non-Microsoft world I have made the right decision. A new world has suddenly opened up for me, with a lot of exciting things to learn and do. I wish I had done this earlier.

Do I think I will completely abandon my previous skills? I really doubt it: the future is not mono-colour, it is a democratised hybrid one, where different skillsets will result in cross-pollination and better software. It feels like having a hybrid skillset is becoming more and more important, and if you are looking to position yourself better as a developer/architect, this is the path you need to take. Currently cross-platform/hybrid skills are a plus; in 5 years they will be a necessity.
          Big Data & Analytics Innovation Summit        
Start Date: Tue, 15 Sep 2015
End Date: Wed, 16 Sep 2015
City: #
Description:

The Big Data & Analytics Innovation Summit is the largest gathering of senior business executives leading Big Data initiatives in Australasia.

The summit brings together business leaders and innovators from the industry for an event acclaimed for its interactive format; combining keynote presentations, interactive breakout sessions and open discussion.

Topics include:

  • Big Data & Analytics in Business
  • How to Embrace Data Science
  • Adopting Cloud Solutions in Your Enterprise
  • Advanced Analytics Offering Insight




          Big Data Innovation        
Start Date: Wed, 09 Sep 2015
End Date: Thu, 10 Sep 2015
City: #
Description:

The Big Data Innovation Summit is the largest gathering of Fortune 500 business executives leading Big Data initiatives.

We are currently accepting speaker submissions for the 2015 event, if you have something to share you can submit a speaker submission here.

This year we have a number of exciting tracks, panel sessions and interactive workshops all happening over the two days. Click on the track titles below for more details:

The full agenda is now out, click on the link to the right to see what's happening at the event!



          Big Data Innovation Summit        
Start Date: Mon, 11 May 2015
End Date: Tue, 12 May 2015
City: #
Description:

The Big Data Innovation Summit London 2015 schedule will bring together executives from the data community for two days of keynotes, panel sessions, discussions & networking.

Big Data Innovation will cover but not limited to:

  • Big Data Case Studies: How our speaker organisations are dealing with their data
  • Data Science: The extraction of knowledge from data
  • Hadoop: Getting value from unstructured data
  • Advanced Analytics: Solutions to predict future events
  • Customer Insights: Getting the most from your customer data
  • Data Mining: Identifying behaviour
  • & so much more...

For the latest information please contact Hayley Law



          Gaming Analytics Summit        
Start Date: Wed, 29 Apr 2015
End Date: Thu, 30 Apr 2015
City: #
Description:

Bringing together leaders and innovators for an event acclaimed for its interactive format; combining keynote presentations, interactive breakout sessions and open discussion. 

Topics at this year's summit focus on how the industry is embracing data science to drive success in areas including:

- In-game analytics

- Player acquisition

- Retention

- Customer Insight



          Big Data Innovation Summit        
Start Date: Tue, 28 Apr 2015
End Date: Wed, 29 Apr 2015
City: #
Description:

Big Data Innovation is back, bigger and better than ever before. With an action packed schedule over two days we promise to bring you the latest case studies, lessons, direction, cautionary tales, success stories & challenges that Big Data has to offer.

Big Data Innovation 2015 includes the following tracks:

  • Big Data Innovation Keynotes
  • Data Analytics
  • Hadoop & NoSQL
  • Data Science
  • Data & the Cloud
  • Machine Learning
  • Data Architecture
  • plus hours of networking, breakout sessions & more


          Big Data & Analytics Innovation Summit        
Start Date: Tue, 21 Apr 2015
End Date: Wed, 22 Apr 2015
City: #
Description:

The summit brings together business leaders and innovators from the industry for an event acclaimed for its interactive format; combining keynote presentations, interactive breakout sessions and open discussion. 

Make sure to check back regularly for schedule additions and changes. Full agenda is to be launched in February 2015

Topics covered include:

  • Data Analytics Use Cases from Industry Speakers
  • Capitalizing on the Power of Big Data
  • Big Data and Social Media
  • Logical Data Warehousing
  • Data Science
  • Real Time Analytics
  • Consumer Analytics


          Predictive Analytics Innovation Summit        
Start Date: Thu, 12 Feb 2015
End Date: Fri, 13 Feb 2015
City: #
Description:

Bringing together business leaders and innovators for an event acclaimed for its interactive format; combining keynote presentations, interactive breakout sessions and open discussion. Click through to see the current agenda. Purchase an Access All Areas Pass to receive access to the co-located Apache Hadoop Innovation Summit and Data Science Innovation Summit.

Hear topics including:

  • Taking Advantage of Structure in Data to Improve Predictive Models
  • Better Consumer Experiences with Applied Machine Learning 
  • Evolving Predictive Analytics from Score to Story 
  • Models for the Performance of Big Data Systems 



          Data Science Innovation Summit        
Start Date: Thu, 12 Feb 2015
End Date: Fri, 13 Feb 2015
City: #
Description:

As organizations evolve and embrace technological advances, data becomes a key currency on which they can hope to gain an advantage over competitors and push business success. Creating a data-driven culture - ensuring that your business embraces data insight at every level - becomes essential.

Topics covered will include:   
  • The Role of the Chief Data Scientist  
  • Innovating Data-Driven Products 
  • Leveraging Data Science to Create Competitive Advantage 
  • Implementing an Effective Data Strategy
  • Creating a Data Driven Culture

For more information on speaking opportunities for the Data Science Innovation Summit please contact Gaby Morse at gmorse@theiegroup.com



          Big Data & Analytics Innovation Summit        
Start Date: Wed, 11 Feb 2015
End Date: Thu, 12 Feb 2015
City: #
Description:

Big Data & Analytics Innovation will bring you right up to speed to assist you with your every need covering an array of topics, themes and problem points.

Topics to Include:

  • Data Analytics
  • Data Science
  • Advanced Analytics
  • Predictive Analytics
  • Machine Learning & Algorithms
  • Cloud Computing
  • & much more...


          Big Data Innovation Summit        
Start Date: Thu, 22 Jan 2015
End Date: Fri, 23 Jan 2015
City: #
Description:

As organizations evolve and embrace technological advances, data becomes a key currency on which they can hope to gain an advantage over competitors and push business success. The agenda will explore all topics and themes on how we can better embrace data and push it to the limits.

Topics to include: 

  • Data Analytics
  • Data Science
  • Emerging Algorithms
  • Machine Learning
  • Big Data Technologies
  • Data Security
  • Data in the Cloud
  • & much more...


          Data Science Leadership Summit        
Start Date: Wed, 12 Nov 2014
End Date: Wed, 12 Nov 2014
City: #
Description:

Bringing together leaders from multiple industries, this event is acclaimed for its interactive format and open discussion. Schedule listed is from the 2013 summit.

Hot topics covered include:

- Establishing a data-driven culture

- Building a data science team

- Opportunities afforded by data science



          Big Data & Marketing Innovation Summit        
Start Date: Thu, 06 Nov 2014
End Date: Fri, 07 Nov 2014
City: #
Description:

The Big Data revolution has the potential to help marketers reach and engage with customers and consumers in new ways. With so much consumer information and prospect data now available, organizations embracing data analytics and metrics are seeing improvements in the performance of their campaigns. The Big Data & Marketing Innovation summit will look at metrics and Big Data in Marketing to help your organization gain greater consumer insight; giving you the competitive edge.

Topics covered include:

  • Driving Consumer Engagement through Big Data Analytics
  • Organizing for Customer Data Management
  • User Level Web Analytics Across Multiple Platforms
  • Cognitive Science to Drive Customer Targeting
  • Big Data & Marketing Modelling
  • Consumer Communication
  • Data Science for Customer Lifecycle Management
  • Mobile Marketing Analytics
  • Consumer Behaviour Tracking

Check out the schedule for the co-located Social & Digital Analytics Summit - purchase a Diamond Pass for access to all sessions across both events



          Big Data Innovation        
Start Date: Thu, 25 Sep 2014
End Date: Fri, 26 Sep 2014
City: #
Description:

The Big Data Innovation Summit is the largest gathering of Fortune 500 business executives leading Big Data initiatives.

We are currently accepting speaker submissions for the 2014 event, if you have something to share you can submit a speaker submission here.

The summit will comprise of multiple tracks, covering the most current topics in Big Data today:



          Big Data Innovation Summit        
Start Date: Wed, 09 Apr 2014
End Date: Thu, 10 Apr 2014
City: #
Description:

5 Epic Tracks make up #datawest14

  • Big Data Innovation - The 2014 Keynotes
  • Hadoop & NoSQL
  • Data Science 
  • Data Analytics
  • Algorithms & Machine Learning
  • PLUS Breakouts, Roundtables, Workshops & more...

The Agenda is now live...with more speakers being added every week (subject to change) 



          Big Data & Analytics for Banking Summit        
Start Date: Tue, 03 Dec 2013
End Date: Wed, 04 Dec 2013
City: #
Description:

With greater constraints and challenges facing the banking industry every day, hear how forward-thinking organizations are driving success in an immensely competitive market. 

Key topics covered at the summit include:

- Customer Analytics

- Fraud Analytics

- Data Science in Banking

& much more



          Data Science Leadership Summit        
Start Date: Thu, 14 Nov 2013
End Date: Thu, 14 Nov 2013
City: #
Description:

Bringing together leaders from multiple industries, this event is acclaimed for its interactive format and open discussion.

Hot topics covered include:

- Data-driven culture

- Building a data science team

- Opportunities afforded by data science



          Predictive Analytics in Government Summit        
Start Date: Wed, 22 May 2013
End Date: Thu, 23 May 2013
City: #
Description:

Bringing together decision makers and thought leaders, this event offers answers to the issues facing government agencies and insight into how data science and data analytics can be applied to an ever more diverse number of challenges. 

The initial lineup of influential senior-level executives, innovators, technologists, strategists, analysts and pundits for 2013 is being recruited now. For speaking opportunities please email Michaela at mmorrison@theiegroup.com



          BIG DATA AND DATA SCIENCE (HYDERABAD)        
SequelGate is one of the best training institutes for Data Science & Big Data / Data Analytics Training. We have been providing Classroom Trainings and Corporate training. All our training sessions are COMPLETELY PRACTICAL. DATA S...
          RapidMiner [Business Development Representative]        
DESCRIPTION RapidMiner builds software for real data science, fast and simple. We make data science teams more productive through a single platform that unifies data prep, machine learning, and model deployment. More than 200,000 users in over 150 countries use RapidMiner products to acquire more customers, reduce financial risk, and increase operational performance. RapidMiner boasts...
          Scala and Pyspark specialization certification courses started        

    Data science is a promising field, where you have to continuously update your skill set by learning new techniques, algorithms, and newly created tools. As the learning journey never ends, we always seek to find the best resources to start learning these new skill sets. We should be thankful for the great
+ Read More

The post Scala and Pyspark specialization certification courses started appeared first on Dataaspirant.


          EUDAT data management summer school        

3 - 7 July 2017

From 3-7 July 2017, EUDAT is organising a Summer School to introduce early-career researchers to the principles and tools needed for careers in data science and data management. The week-long training course will be set in the stunning landscapes of Heraklion, Crete, and kindly hosted by FORTH.

Read more


          Math/Computer Science/Statistics (VPAA 2018/19 - 66) - California State University - San Diego, CA        
SDSU College of Sciences Mathematics and Statistics Data Science - Assistant Professor Please click on the link to apply
From California State University - Tue, 08 Aug 2017 03:23:32 GMT - View all San Diego, CA jobs
          AWS re:Invent 2016 Video & Slide Presentation Links with Easy Index        
As with last year, here is my quick index of all re:Invent sessions. I'll keep running the tool to fill in the index. It usually takes Amazon a few weeks to fully upload all the videos and presentations. This year it looks like Amazon got the majority of the content onto YouTube and SlideShare very quickly, with a few SlideShares still trickling in.

See below for how I created the index (with code):


ALX201 - How Capital One Built a Voice-Based Banking Skill for Amazon Echo
As we add thousands of skills to Alexa, our developers have uncovered some basic and more complex tips for building better skills. Whether you are new to Alexa skill development or if you have created skills that are live today, this session helps you understand how to create better voice experiences. Last year, Capital One joined Alexa on stage at re:Invent to talk about their experience building an Alexa skill. Hear from them one year later to learn from the challenges that they had to overcome and the results they are seeing from their skill. In this session, you will learn the importance of flexible invocations, better VUI design, how OAuth and account linking can add value to your skill, and about Capital One's experience building an Alexa skill.
ALX202 - How Amazon is enabling the future of Automotive
The experience in the auto industry is changing. For both the driver and the car manufacturer, a whole new frontier is on the near horizon. What do you do with your time while the car is driving itself? How do I have a consistent experience while driving shared or borrowed cars? How do I stay safer and more aware in the ever increasing complexity of traffic, schedules, calls, messages and tweets? In this session we will discuss how the auto industry is facing new challenges and how the use of Amazon Alexa, IoT, Logistics services and the AWS Cloud is transforming the Mobility experience of the (very near) future.
ALX203 - Workshop: Creating Voice Experiences with Alexa Skills: From Idea to Testing in Two Hours
This workshop teaches you how to build your first voice skill with Alexa. You bring a skill idea and well show you how to bring it to life. This workshop will walk you through how to build an Alexa skill, including Node.js setup, how to implement an intent, deploying to AWS Lambda, and how to register and test a skill. Youll walk out of the workshop with a working prototype of your skill idea. Prerequisites: Participants should have an AWS account established and available for use during the workshop. Please bring your own laptop.
ALX204 - Workshop: Build an Alexa-Enabled Product with Raspberry Pi
Fascinated by Alexa, and want to build your own device with Alexa built in? This workshop will walk you through to how to build your first Alexa-powered device step by step, using a Raspberry Pi. No experience with Raspberry Pi or Alexa Voice Service is required. We will provide you with the hardware and the software required to build this project, and at the end of the workshop, you will be able to walk out with a working prototype of Alexa on a Pi. Please bring a WiFi capable laptop.
ALX301 - Alexa in the Enterprise: How JPL Leverages Alexa to Further Space Exploration with Internet of Things
The Jet Propulsion Laboratory designs and creates some of the most advanced space robotics ever imagined. JPL IT is now innovating to help streamline how JPLers will work in the future in order to design, build, operate, and support these spacecraft. They hope to dramatically improve JPLers' workflows and make their work easier for them by enabling simple voice conversations with the room and the equipment across the entire enterprise. What could this look like? Imagine just talking with the conference room to configure it. What if you could kick off advanced queries across AWS services and kick off AWS Kinesis tasks by simply speaking the commands? What if the laboratory could speak to you and warn you about anomalies or notify you of trends across your AWS infrastructure? What if you could control rovers by having a conversation with them and ask them questions? In this session, JPL will demonstrate how they leveraged AWS Lambda, DynamoDB and CloudWatch in their prototypes of these use cases and more. They will also discuss some of the technical challenges they are overcoming, including how to deploy and manage consumer devices such as the Amazon Echo across the enterprise, and give lessons learned. Join them as they use Alexa to query JPL databases, control conference room equipment and lights, and even drive a rover on stage, all with nothing but the power of voice!
ALX302 - Build a Serverless Back End for Your Alexa-Based Voice Interactions
Learn how to develop voice-based serverless back ends for Alexa Voice Service (AVS) and Alexa devices using the Alexa Skills Kit (ASK), which allows you to add new voice-based interactions to Alexa. Well code a new skill, implemented by a serverless backend leveraging AWS services such as Amazon Cognito, AWS Lambda, and Amazon DynamoDB. Often, your skill needs to authenticate your users and link them back to your backend systems and to persist state between user invocations. User authentication is performed by leveraging OAuth compatible identity systems. Running such a system on your back end requires undifferentiated heavy lifting or boilerplate code. Well leverage Login with Amazon as the identity provider instead, allowing you to focus on your application implementation and not on the low-level user management parts. At the end of this session, youll be able to develop your own Alexa skills and use Amazon and AWS services to minimize the required backend infrastructure. This session shows you how to deploy your Alexa skill code on a serverless infrastructure, leverage AWS Lambda, use Amazon Cognito and Login with Amazon to authenticate users, and leverage AWS DynamoDB as a fully managed NoSQL data store.
ALX303 - Building a Smarter Home with Alexa
Natural user interfaces, such as those based on speech, enable customers to interact with their home in a more intuitive way. With the VUI (Voice User Interface) smart home, now customers don't need to use their hands or eyes to do things around the home they only have to ask and it's at their command. This session will address the vision for the VUI smart home and how innovations with Amazon Alexa make it possible.
ALX304 - Tips and Tricks on Bringing Alexa to Your Products
Ever wonder what it takes to add the power of Alexa to your own products? Are you curious about what Alexa partners have learned on their way to a successful product launch? In this session you will learn about the top tips and tricks on how to go from VUI newbie to an Alexa-enabled product launch. Key concepts around hardware selection, enabling far field voice interaction, building a robust Alexa Voice Service (AVS) client and more will be discussed along with customer and partner examples on how to plan for and avoid common challenges in product design, development and delivery.
ALX305 - From VUI to QA: Building a Voice-Based Adventure Game for Alexa
Hitting the submit button to publish your skill is similar to sending your child to their first day of school. You want it to be set up for a successful launch day and for many days thereafter. Learn how to set your skill up for success from Andy Huntwork, Alexa Principal Engineer and one of the creators of the popular Alexa skill The Magic Door. You will learn the most common reasons why skills fail and also some of the more unique use cases. The purpose of this session is to help you build better skills by knowing what to look out for and what you can test for before submitting. In this session, you will learn what most developers do wrong, how to successfully test and QA your skill, how to set your skill up for successful certification, and the process of how a skill gets certified.
ALX306 - State of the Union: Amazon Alexa and Recent Advances in Conversational AI
The way humans interact with machines is at a turning point, and conversational artificial intelligence (AI) is at the center of the transformation. Learn how Amazon is using machine learning and cloud computing to fuel innovation in AI, making Amazon Alexa smarter every day. Alexa VP and Head Scientist Rohit Prasad presents the state of the union for Alexa and recent advances in conversational AI. He addresses Alexa's advances in spoken language understanding and machine learning, and shares Amazon's thoughts about building the next generation of user experiences.
ALX307 - Voice-enabling Your Home and Devices with Amazon Alexa and AWS IoT
Want to learn how to Alexa-power your home? Join Brookfield Residential CIO and EVP Tom Wynnyk and Senior Solutions Architect Nathan Grice for Alexa Smart Home, an overview of building the next generation of integrated smart homes using Alexa to create voice-first experiences. Understand the technologies used and how to best expose voice experiences to users through Alexa. They cover the difference between custom Alexa skills and Smart Home Skill API skills, and build a home automation control from the ground up using Alexa and AWS IoT.
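As a rough illustration of the custom-skill half of that spectrum, the sketch below forwards a spoken command from an Alexa skill handler to a device over AWS IoT using the boto3 `iot-data` client. The MQTT topic and the `PowerState` slot are assumptions made for the example, not part of the session material or of the Smart Home Skill API.

```python
# A minimal sketch of forwarding a voice command to a device via AWS IoT,
# assuming a hypothetical MQTT topic "home/livingroom/lights" that the
# device subscribes to. Intent parsing is reduced to a single slot.
import json
import boto3

iot = boto3.client("iot-data")

def handle_lights_intent(event, context):
    # The slot name "PowerState" is an assumption about the skill's
    # interaction model, not a standard Alexa construct.
    state = event["request"]["intent"]["slots"]["PowerState"]["value"]

    iot.publish(
        topic="home/livingroom/lights",   # hypothetical topic
        qos=1,
        payload=json.dumps({"power": state}),
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText",
                             "text": f"Turning the lights {state}."},
            "shouldEndSession": True,
        },
    }
```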
ARC201 - Scaling Up to Your First 10 Million Users
Cloud computing gives you a number of advantages, such as the ability to scale your web application or website on demand. If you have a new web application and want to use cloud computing, you might be asking yourself, "Where do I start?" Join us in this session to understand best practices for scaling your resources from zero to millions of users. We show you how to best combine different AWS services, how to make smarter decisions for architecting your application, and how to scale your infrastructure in the cloud.
ARC202 - Accenture Cloud Platform Serverless Journey
Accenture Cloud Platform helps customers manage public and private enterprise cloud resources effectively and securely. In this session, learn how we designed and built new core platform capabilities using a serverless, microservices-based architecture that is based on AWS services such as AWS Lambda and Amazon API Gateway. During our journey, we discovered a number of key benefits, including a dramatic increase in developer velocity, a reduction (to almost zero) of reliance on other teams, reduced costs, greater resilience, and scalability. We describe the (wild) successes we've had and the challenges we've overcome to create an AWS serverless architecture at scale. Session sponsored by Accenture. AWS Competency Partner
ARC203 - Achieving Agility by Following Well-Architected Framework Principles on AWS
The AWS Well-Architected Framework enables customers to understand best practices around security, reliability, performance, and cost optimization when building systems on AWS. This approach helps customers make informed decisions and weigh the pros and cons of application design patterns for the cloud. In this session, you'll learn how National Instruments used the Well-Architected Framework to follow AWS guidelines and best practices. By developing a strategy based on the AWS Well-Architected Framework, National Instruments was able to triple the number of applications running in the cloud without additional head count, significantly increase the frequency of code deployments, and reduce deployment times from two weeks to a single day. As a result, National Instruments was able to deliver a more scalable, dynamic, and resilient LabVIEW platform with agility.
ARC204 - From Resilience to Ubiquity - #NetflixEverywhere Global Architecture
Building and evolving a pervasive, global service requires a multi-disciplined approach that balances requirements with service availability, latency, data replication, compute capacity, and efficiency. In this session, we'll follow the Netflix journey of failure, innovation, and ubiquity. We'll review the many facets of globalization and then delve deep into the architectural patterns that enable seamless, multi-region traffic management; reliable, fast data propagation; and efficient service infrastructure. The patterns presented will be broadly applicable to internet services with global aspirations.
ARC205 - Born in the Cloud; Built Like a Startup
This presentation provides a comparison of three modern architecture patterns that startups are building their business around. It includes a realistic analysis of cost, team management, and security implications of each approach. It covers Elastic Beanstalk, Amazon ECS, Docker, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, and Amazon CloudFront.
ARC207 - NEW LAUNCH! Additional transparency and control for your AWS environment through AWS Personal Health Dashboard
When your business is counting on the performance of your cloud solutions, having relevant and timely insights into events impacting your AWS resources is essential. AWS Personal Health Dashboard serves as the primary destination for you to receive personalized information related to your AWS infrastructure, guiding you through scheduled changes and accelerating the troubleshooting of issues impacting your AWS resources. The service, powered by AWS Health APIs, integrates with your in-house event management systems and can be programmatically configured to proactively get the right information into the right hands at the right time. The service is integrated with the Splunk App for AWS to enhance Splunk's dashboards, reports, and alerts to deliver real-time visibility into your environment.
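The programmatic side of this can be as simple as polling the AWS Health API with boto3 and forwarding open events to an in-house system. The sketch below is an assumption-laden illustration rather than the dashboard's own integration; note that direct AWS Health API access typically requires a Business or Enterprise support plan, and the API is served from the us-east-1 endpoint.

```python
# A minimal sketch of pulling open events from the AWS Health API that
# backs the Personal Health Dashboard, so they can be forwarded to an
# in-house event management system. Assumes the account has API access
# to AWS Health (typically a Business or Enterprise support plan).
import boto3

# The AWS Health API is served from the us-east-1 endpoint.
health = boto3.client("health", region_name="us-east-1")

def open_events():
    response = health.describe_events(
        filter={"eventStatusCodes": ["open", "upcoming"]}
    )
    for event in response["events"]:
        # Forwarding to a ticketing or chat system would go here.
        print(event["service"], event["eventTypeCode"], event["region"])

if __name__ == "__main__":
    open_events()
```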
ARC208 - Hybrid Architectures: Bridging the Gap to the Cloud
AWS provides many services to assist customers with their journey to the cloud. Hybrid solutions offer customers a way to continue leveraging existing investments on-premises, while expanding their footprint into the public cloud. This session covers the different technologies available to support hybrid architectures on AWS. We discuss common patterns and anti-patterns for solving enterprise workloads across a hybrid environment.
ARC209 - Attitude of Iteration
In today's world, technology changes at a breakneck speed. What was new this morning is outdated at lunch. Working in the AWS Cloud is no different. Every week, AWS announces new features or improvements to current products. As AWS technologists, we must assimilate these new technologies and make decisions to adopt, reject, or defer. These decisions can be overwhelming: we tend to either reject everything and become stagnant, or adopt everything and never get our project out the door. In this session, we will discuss the attitude of iteration, which allows us to face the challenges of change without overwhelming our technical teams with a constant tug-of-war between implementation and improvement. Whether you're an architect, engineer, developer, or AWS newbie, prepare to laugh, cry, and commiserate as we talk about overcoming these challenges. Session sponsored by Rackspace.
ARC210 - Workshop: Addressing Your Business Needs with AWS
Come and participate with other AWS customers as we focus on the overall experience of using AWS to solve business problems. This is a great opportunity to collaborate with existing and prospective AWS users to validate your thinking and direction with AWS peers, discuss the resources that aid AWS solution design, and give direct feedback on your experience building solutions on AWS.
ARC211 - Solve common problems with ready to use solutions in 5 minutes or less
Regularly, customers at AWS assign resources to create solutions that address common problems shared between businesses of all sizes. Often, this results in taking resources away from products or services that truly differentiate the business in the marketplace. The Solutions Builder team at AWS focuses on developing and publishing a catalog of repeatable, standardized solutions that can be rapidly deployed by customers to overcome common business challenges. In this session, the Solutions Builder team will share ready-to-use solutions that make it easy for anyone to create a transit VPC, centralized logging, a data lake, scheduling for Amazon EC2, and VPN monitoring. Along the way, the team reveals the architectural tenets and best practices they follow for the development of these solutions. In the end, customers are introduced to a catalog of freely available solutions, with a peek into the architectural approaches used by an internal team at AWS.
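These packaged solutions ship as CloudFormation templates, so deploying one programmatically looks roughly like the sketch below. The template URL and stack name are placeholders for illustration, not the real location or interface of any specific solution.

```python
# A minimal sketch of launching a packaged solution that is distributed
# as a CloudFormation template. The TemplateURL below is a placeholder,
# not the actual location of any published solution.
import boto3

cloudformation = boto3.client("cloudformation")

response = cloudformation.create_stack(
    StackName="centralized-logging-demo",
    TemplateURL="https://s3.amazonaws.com/example-bucket/centralized-logging.template",  # placeholder
    Capabilities=["CAPABILITY_IAM"],   # such solutions typically create IAM roles
)
print("Launching stack:", response["StackId"])
```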
ARC212 - Salesforce: Helping Developers Deliver Innovations Faster
Salesforce is one of the most innovative enterprise software companies in the world, delivering 3 major releases a year with hundreds of features in each release. In this session, come learn how we enable thousands of engineers within Salesforce to utilize a flexible development environment to deliver these innovations to our customers faster. We show you how we enable engineers at Salesforce to test not only individual services they are developing but also large scale service integrations. Also learn how we can achieve setup of a representative production environment in minutes and teardown in seconds, using AWS.
ARC213 - Open Source at AWS—Contributions, Support, and Engagement
Over the last few years, we have seen a dramatic increase in the use of open source projects as the mainstay of architectures in both startups and enterprises. Many of our customers and partners also run their own open source programs and contribute key technologies to the industry as a whole (see DCS201). At AWS, we engage with open source projects in a number of ways. We contribute bug fixes and enhancements to popular projects, including our work with the Hadoop ecosystem (see BDM401), Chromium (see BAP305), and (obviously) Boto. We have our own standalone projects, including the security library s2n (see NET405) and the machine learning project MXNet (see MAC401). We also have services that make open source easier to use, like ECS for Docker (see CON316), and RDS for MySQL and PostgreSQL (see DAT305). In this session you will learn about our existing open source work across AWS, and our next steps.
ARC301 - Architecting Next Generation SaaS Applications on AWS
AWS provides a broad array of services, tools, and constructs that can be used to design, operate, and deliver SaaS applications. In this session, Tod Golding, an AWS Partner Solutions Architect, shares the wisdom and lessons learned from working with dozens of customers and partners building SaaS solutions on AWS. We discuss key architectural strategies and patterns that are used to deliver multi-tenant SaaS models on AWS and dive into the full spectrum of SaaS design and architecture considerations, including tenant isolation models, tenant identity management, serverless SaaS, and multi-tenant storage strategies. This session connects the dots between general SaaS best practices and what it means to realize these patterns on AWS, weighing the architectural tradeoffs of each model and assessing its influence on the agility, manageability, and cost profile of your SaaS solution.
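To give a flavor of one of the multi-tenant storage strategies mentioned above, the sketch below pools all tenants in a single DynamoDB table and isolates them by partition key. The table and attribute names are hypothetical; the session itself covers several other isolation models.

```python
# A minimal sketch of a pooled multi-tenant storage model: one DynamoDB
# table shared by all tenants, with the tenant identifier as the
# partition key. Table and attribute names are hypothetical.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
orders = dynamodb.Table("SaasOrders")  # hypothetical pooled table

def put_order(tenant_id, order_id, payload):
    # Every item is scoped to a tenant via the partition key.
    orders.put_item(Item={"tenantId": tenant_id, "orderId": order_id, **payload})

def orders_for_tenant(tenant_id):
    # Queries can only ever see a single tenant's partition.
    result = orders.query(KeyConditionExpression=Key("tenantId").eq(tenant_id))
    return result["Items"]
```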
ARC302 - From One to Many: Evolving VPC Design
As more customers adopt Amazon VPC architectures, the features and flexibility of the service are squaring off against evolving design requirements. This session follows this evolution of a single regional VPC into a multi-VPC, multi-region design with diverse connectivity into on-premises systems and infrastructure. Along the way, we investigate creative customer solutions for scaling and securing outbound VPC traffic, securing private access to Amazon S3, managing multi-tenant VPCs, integrating existing customer networks through AWS Direct Connect, and building a full VPC mesh network across global regions.
ARC303 - Cloud Monitoring - Understanding, Preparing, and Troubleshooting Dynamic Apps on AWS
Applications running in a typical data center are static entities. Dynamic scaling and resource allocation are the norm in AWS. Technologies such as Amazon EC2, Docker, AWS Lambda, and Auto Scaling make tracking resources and resource utilization a challenge. The days of static server monitoring are over. In this session, we examine trends we've observed across thousands of customers using dynamic resource allocation and discuss why dynamic infrastructure fundamentally changes your monitoring strategy. We discuss some of the best practices we've learned by working with New Relic customers to build, manage, and troubleshoot applications and dynamic cloud services. Session sponsored by New Relic. AWS Competency Partner
ARC304 - Effective Application Data Analytics for Modern Applications
IT is evolving from a cost center to a source of continuous innovation for business. At the heart of this transition are modern, revenue-generating applications, based on dynamic architectures that constantly evolve to keep pace with end-customer demands. This dynamic application environment requires a new, comprehensive approach to traditional monitoring, one based on real-time, end-to-end visibility and analytics across the entire application lifecycle and stack, instead of monitoring piecemeal. This presentation highlights practical advice on how developers and operators can leverage data and analytics to glean critical information about their modern applications. In this session, we will cover the types of data important for today's modern applications. We'll discuss visibility and analytics into data sources such as AWS services (e.g., Amazon CloudWatch, AWS Lambda, VPC Flow Logs, Amazon EC2, Amazon S3, etc.), the development tool chain, and custom metrics, and describe how to use analytics to understand business performance and behaviors. We discuss a comprehensive approach to monitoring, troubleshooting, and customer usage insights, provide examples of effective data analytics to improve software quality, and describe an end-to-end customer use case that highlights how analytics applies to the modern app lifecycle and stack. Session sponsored by Sumo Logic. AWS Competency Partner
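On the custom-metrics side, instrumenting an application can be as small as a single CloudWatch `put_metric_data` call, as in the hedged sketch below; the namespace, metric name, and dimension are invented for illustration and are not tied to any particular monitoring product.

```python
# A minimal sketch of emitting a custom application metric to Amazon
# CloudWatch, one of the data sources discussed above. The namespace,
# metric name, and dimension are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_checkout_latency(milliseconds):
    cloudwatch.put_metric_data(
        Namespace="MyApp/Checkout",          # hypothetical namespace
        MetricData=[{
            "MetricName": "CheckoutLatency",
            "Dimensions": [{"Name": "Environment", "Value": "production"}],
            "Value": milliseconds,
            "Unit": "Milliseconds",
        }],
    )

record_checkout_latency(182.5)
```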
ARC305 - From Monolithic to Microservices: Evolving Architecture Patterns in the Cloud
Gilt, a global e-commerce company, implemented a sophisticated microservices architecture on AWS to handle millions of customers visiting their site at noon every day. The microservices architecture pattern enables independent service scaling, faster deployments, better fault isolation, and graceful degradation. In this session, Emerson Loureiro, Sr. Software Engineer at Gilt, will share Gilt's experiences and lessons learned during their evolution from a single monolithic Rails application in a traditional data center to more than 300 Scala/Java microservices deployed in the cloud. Derek Chiles, AWS Solutions Architect, will review best practices and recommended architectures for deploying microservices on AWS.
ARC306 - Event Handling at Scale: Designing an Auditable Ingestion and Persistence Architecture for 10K+ events/second
How does McGraw-Hill Education use the AWS platform to scale and reliably receive 10,000 learning events per second? How do we provide near-real-time reporting and event-driven analytics for hundreds of thousands of concurrent learners in a reliable, secure, and auditable manner that is cost effective? MHE designed and implemented a robust solution that integrates Amazon API Gateway, AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Elasticsearch Service, Amazon DynamoDB, HDFS, Amazon EMR, Amazon EC2, and other technologies to deliver this cloud-native platform across the US and soon the world. This session describes the challenges we faced, architecture considerations, how we gained confidence for a successful production roll-out, and the behind-the-scenes lessons we learned.
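A minimal sketch of the consumer end of such a pipeline is shown below: an AWS Lambda function triggered by an Amazon Kinesis stream, decoding the base64-encoded learning events and persisting them to DynamoDB. The table and event field names are assumptions for illustration, not MHE's actual schema.

```python
# A minimal sketch of a Kinesis-triggered Lambda consumer that decodes
# learning events and persists them to DynamoDB. Table and field names
# are hypothetical.
import base64
import json
import boto3

dynamodb = boto3.resource("dynamodb")
events_table = dynamodb.Table("LearningEvents")  # hypothetical table

def lambda_handler(event, context):
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded in the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        events_table.put_item(Item={
            "eventId": payload["eventId"],       # assumed field
            "learnerId": payload["learnerId"],   # assumed field
            "eventType": payload.get("type", "unknown"),
        })
    return {"processed": len(event["Records"])}
```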
ARC307 - Accelerating Next Generation Healthcare Business on the AWS Cloud
Hear Geneia's design principles for using multiple technologies like Elastic Load Balancing and Auto Scaling in end-to-end solutions to meet regulatory requirements. Explore how to meet HIPAA regulations by using native cloud services like Amazon EC2, Amazon EBS volumes, encryption services, and monitoring features in addition to third-party tools to ensure end-to-end data protection, privacy, and security for protected health information (PHI) data hosted in the AWS Cloud. Learn how Geneia leveraged multiregion and multizone backup and disaster recovery solutions to address the recovery time objective (RTO) and recovery point objective (RPO) requirements. Discover how automated build, deployment, provisioning, and virtual workstations in the cloud enabled Geneia's developers and data scientists to quickly provision resources and work from any location, expediting the onboarding of customers, getting to market faster, and capturing bigger market share in healthcare analytics while minimizing costs. Session sponsored by Cognizant. AWS Competency Partner
ARC308 - Metering Big Data at AWS: From 0 to 100 Million Records in 1 Second
Learn how AWS processes millions of records per second to support accurate metering across AWS and our customers. This session shows how we migrated from traditional frameworks to AWS managed services to support a large processing pipeline. You will gain insights on how we used AWS services to build a reliable, scalable, and fast processing system using Amazon Kinesis, Amazon S3, and Amazon EMR. Along the way we dive deep into use cases that deal with scaling and accuracy constraints. Attend this session to see AWS's end-to-end solution that supports metering at AWS.
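For a sense of what the ingestion side of a metering pipeline can look like, the sketch below batches usage records into Amazon Kinesis with `put_records`, partitioning by customer so each customer's records stay ordered on a shard. The stream name and record shape are hypothetical and not AWS's internal implementation.

```python
# A minimal sketch of a metering producer that batches usage records
# into an Amazon Kinesis stream. Stream name and record shape are
# hypothetical.
import json
import boto3

kinesis = boto3.client("kinesis")

def publish_usage(records, stream_name="metering-stream"):  # hypothetical stream
    entries = [
        {
            "Data": json.dumps(r),
            # Partition by customer so each customer's records stay ordered.
            "PartitionKey": r["customerId"],
        }
        for r in records
    ]
    response = kinesis.put_records(StreamName=stream_name, Records=entries)
    return response["FailedRecordCount"]

failed = publish_usage([{"customerId": "c-1", "service": "ec2", "usage": 12.5}])
print("Failed records:", failed)
```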
ARC309 - Moving Mission Critical Apps from One Region to Multi-Region active/active
In gaming, low latencies and connectivity are bare minimum expectations users have while playing online on PlayStation Network. Alex and Dustin share key architectural patterns to provide low latency, multi-region services to global users. They discuss the testing methodologies and how to programmatically map out a large multi-region dependency deployment with data-driven techniques. The patterns shared show how to adapt to changing bottlenecks and sudden, several-million-request spikes. You'll walk away with several key architectural patterns that can serve users at global scale while being mindful of costs.
ARC310 - Cost Optimizing Your Architecture: Practical Design Steps For Big Savings
Did you know that AWS enables builders to architect solutions for price? Beyond the typical challenges of function, performance, and scale, you can make your application cost effective. Using different architectural patterns and AWS services in concert can dramatically reduce the cost of systems operation and per-transaction costs. This session uses practical examples aimed at architects and developers. Using code and AWS CloudFormation in concert with services such as Amazon EC2, Amazon ECS, Lambda, Amazon RDS, Amazon SQS, Amazon SNS, Amazon S3, CloudFront, and more, we demonstrate the financial advantages of different architectural decisions. Attendees will walk away with concrete examples, as well as a new perspective on how they can build systems economically and effectively. Attendees at this session will receive a free 30-day trial of AWS Trusted Advisor.
ARC311 - Evolving a Responsive and Resilient Architecture to Analyze Billions of Metrics
Nike+ is at the core of the Nike digital product ecosystem, providing services to enhance your athletic experience through quantified activity tracking and gamification. As one of the first movers at Nike to migrate out of the data center to AWS, they share the evolution of building a reactive platform on AWS to handle large, complex data sets. They provide a deep technical view of how they process billions of metrics a day in their quantified-self platform, supporting millions of customers worldwide. You'll leave with ideas and tools to help your organization scale in the cloud. Come learn from experts who have built an elastic platform using Java, Scala, and Akka, leveraging the power of many AWS technologies like Amazon EC2, ElastiCache, Amazon SQS, Amazon SNS, DynamoDB, Amazon ES, Lambda, Amazon S3, and a few others that helped them (and can help you) get there quickly.
ARC312 - Compliance Architecture: How Capital One Automates the Guard Rails for 6,000 Developers