Corporate educator TDWI has published an eBook on five engineering requirements for machine learning deployment.
As another year passes, more organizations are deploying machine learning technologies in their businesses. But for every success, there are quite a few failures by companies that haven’t taken the appropriate steps to ensure their data is ready for the change.
TDWI’s eBook is aimed at business leaders who are exploring machine learning technologies to improve their business, or who have had trouble with prior deployments.
Rich Data Sources
Businesses that draw on a variety of data sources will see more of the benefits, according to author Fern Halper. Data virtualization and formatting services enable businesses to collect data from both online and offline sources and format it automatically to avoid data corruption.
The use of new technologies like data lakes and cloud platforms makes it even easier and cheaper for businesses to find and store data online. TDWI recommends businesses look into these new data management platforms before deploying machine learning.
Data Quality
Ensuring that the data going into the machine learning algorithm is of high quality is another key to success. Halper provides five considerations for checking quality: standardization, deduplication, reasonableness, completeness, and accuracy.
Some platforms can automatically check for inconsistencies or red flags, such as data that is unstructured or outside the valid parameters, but some manual checks may still be necessary to ensure that all data is being processed and that it is accurate.
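To make the automated side of these checks concrete, here is a minimal Python sketch of four of the five considerations: standardization, deduplication, completeness, and reasonableness. It is not taken from the eBook. The `Reading` record, its fields, and the 0 to 120 age range are all hypothetical, chosen only for illustration. Accuracy is left out on purpose, since it usually needs a trusted reference source or a manual review.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout, invented for this illustration only.
@dataclass(frozen=True)
class Reading:
    customer_id: str
    country: Optional[str]
    age: Optional[int]


def standardize(r: Reading) -> Reading:
    """Standardization: trim whitespace and upper-case the country code."""
    country = r.country.strip().upper() if r.country else None
    return Reading(r.customer_id.strip(), country, r.age)


def is_complete(r: Reading) -> bool:
    """Completeness: every field must be present."""
    return bool(r.customer_id) and r.country is not None and r.age is not None


def is_reasonable(r: Reading, min_age: int = 0, max_age: int = 120) -> bool:
    """Reasonableness: the value must fall inside an assumed valid range."""
    return r.age is not None and min_age <= r.age <= max_age


def clean(records):
    """Return (accepted, flagged); flagged records are left for manual review."""
    seen = set()
    accepted, flagged = [], []
    for raw in records:
        rec = standardize(raw)
        if rec in seen:
            continue  # Deduplication: skip records identical after standardizing.
        seen.add(rec)
        if is_complete(rec) and is_reasonable(rec):
            accepted.append(rec)
        else:
            flagged.append(rec)
    return accepted, flagged
```

Standardizing before deduplicating matters here: two records that differ only in spacing or letter case are treated as the same record. Flagged records are kept rather than discarded so that a person can review them.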
Right Features For Problem
Having quality data sources and systems for ensuring quality is a strong start, but without a clear understanding of what an engineer needs to fix a problem or build a program, the business may end up building features that aren’t of much use.
Businesses need to give engineers access to disparate data to ensure the best feature selection. This may include data from mainframes, databases, or streams.
Keep Data Current
It is essential that businesses remove stale and out-of-date data from the system. The rate of removal depends on the type of model: a pricing model may need updating every few hours, a cybersecurity model every few minutes. Businesses will need to look into automating these updates, as manual changes every few hours will be difficult to manage.
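As a rough illustration of automated removal, the Python sketch below drops records older than a per-model maximum age. The `MAX_AGE` values, the `observed_at` field name, and the `drop_stale` helper are assumptions made for this example; the article only says "every few hours" and "every few minutes", so the exact thresholds would have to be chosen for each model.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical maximum ages per model type; the specific values are assumptions.
MAX_AGE = {
    "pricing": timedelta(hours=3),
    "cybersecurity": timedelta(minutes=5),
}


def drop_stale(rows, model, now=None):
    """Keep only rows whose 'observed_at' timestamp is within the model's maximum age."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - MAX_AGE[model]
    return [row for row in rows if row["observed_at"] >= cutoff]
```

A function like this would typically run from a scheduler at roughly the same interval as the model's refresh rate, so that no one has to prune the data by hand.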
Data Governance
Data governance applies to most engineering and development processes, but because of the short timeframe for feature building and updating, it is especially important for machine learning. A TDWI survey found that most businesses do not do a good job of governing data, which leads to problems down the road when engineers try to build another feature, update an existing one, or move to a new platform.
Read the full ebook here: https://bit.ly/2R8wT0M