Hsiao-Chien Shih
Data Scientist
Education: PhD in Geography (San Diego State University, UC Santa Barbara), MS in Geographic Information Science (San Diego State University), BA in Geography (National Taiwan Normal University)
The following profile was compiled by Jessica Embury (San Diego State University) for the Encoding Geography initiative. To learn more, visit: https://www.ncrge.org/encoding-geography/
Please describe your job, employer, and the primary tasks you perform in your position.
I am a data scientist at E Source, a utility consulting firm that provides data science as a service to its clients. Our main task is to reduce power outage risk. My job tasks include acquiring, exploring, extracting, and transforming geospatial (and sometimes time series) data, including remotely sensed imagery, elevation, land cover, and utility assets, as inputs for data science and modeling. In addition, I compile and aggregate the output risk in a geographic data format.
What is your educational background? How did you initially discover geocomputation and why did you ultimately choose a career that uses geography and computer science?
I am a geographer, and I have doctoral, master’s, and bachelor’s degrees in Geography. I first discovered geocomputation in an introductory GIS course during my undergraduate studies. I kept taking other geospatial courses, including remote sensing and spatial statistics. I found that my abilities in math and reasoning made geocomputation attractive. In addition, geographic information represents real-world big datasets that can be used to solve real-world problems. Thus, I aimed to build my career in this field.
When thinking about geography, what specific background knowledge and conceptual ideas are important and useful to know?
I tend to emphasize the absolute and relative geographic locations of a phenomenon, and then I shift my attention to the associated spatial distribution and temporal dynamics. Spatial-temporal processes always interest me, and I want to understand the mechanisms behind them. For example, the spatial-temporal processes of urbanization were my dissertation topic.
When thinking about computer science, what specific background knowledge and conceptual ideas are important and useful to know?
Programming skills are critical for handling large amounts of data iteratively. Machine learning and its associated applications are central to the work I perform. With an understanding of various sources of data and their caveats, I compile data to create key inputs for machine learning. Finally, cloud computing is essential in the current job market because geospatial data are large in volume.
What procedural knowledge is important and useful to know, from either geography or computer science?
Knowing how to handle (read/write) geospatial data is necessary. Understanding how to analyze raster, vector, and point cloud data is highly valuable in the utility consulting industry today.
What is an example of a social, economic, environmental, or other issue that you have recently investigated in a project at work?
My colleague and I regularly work to estimate and reduce power outage risks due to environmental effects. We monitor vegetation growth around utility assets using remotely sensed data, and then provide feedback to our clients for prioritizing vegetation management. To accomplish this task, we need to know how to compile remotely sensed, elevation, and other sources of data for monitoring the environment. Thus, knowledge of remote sensing, image processing, geometry calculation, spatial analysis, and machine learning is necessary.
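As a rough illustration of monitoring vegetation growth from imagery, here is a minimal sketch in Python. It assumes the Normalized Difference Vegetation Index (NDVI), a common vegetation index computed from near-infrared and red reflectance; the interview does not say which index or method is actually used. The function names and the sample reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (nir - red) / total


def mean_ndvi(nir_band, red_band):
    """Average NDVI over two same-shaped 2D grids (lists of rows)."""
    values = [
        ndvi(n, r)
        for nir_row, red_row in zip(nir_band, red_band)
        for n, r in zip(nir_row, red_row)
    ]
    return sum(values) / len(values)


# Hypothetical reflectance values for one small area on two dates.
nir_t1, red_t1 = [[0.30, 0.35]], [[0.20, 0.15]]
nir_t2, red_t2 = [[0.50, 0.55]], [[0.10, 0.05]]

# A positive change suggests denser or greener vegetation at the later date.
change = mean_ndvi(nir_t2, red_t2) - mean_ndvi(nir_t1, red_t1)
print(f"Mean NDVI change: {change:.3f}")
```

In practice the band values would come from imagery such as Sentinel-2 or NAIP, and the change would be summarized for the area around each utility asset.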
What kind of questions did you ask and think about during this project?
For the above task, we consider the spatial scale and resolution of multiple sources of data. It is important to know how to incorporate data across multiple scales and what each scale represents in terms of the data source. We often calculate the relative location between any two objects to understand the risk of power outage. Therefore, calculating geometry is necessary. In addition, we need to know how to compile these data programmatically, and we need to know some graph theory to speed up computation.
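To make the idea of relative location concrete, here is a minimal sketch in Python of one such geometry calculation: the shortest distance from a point (for example, a tree) to a line segment (for example, a span of power line). It assumes projected coordinates in meters, and the function name and sample coordinates are hypothetical; it is not the method described in the interview.

```python
import math


def point_to_segment_distance(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the segment from A to B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # A and B coincide, so the segment is a single point.
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    closest_x, closest_y = ax + t * dx, ay + t * dy
    return math.hypot(px - closest_x, py - closest_y)


# Hypothetical example: a tree at (5, 3) and a line span from (0, 0) to (10, 0).
distance_m = point_to_segment_distance(5, 3, 0, 0, 10, 0)
print(f"Distance from tree to line: {distance_m:.1f} m")
```

With real data, a geospatial library would usually handle this calculation along with coordinate reference systems and spatial indexing.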
What types of data did you acquire to support your project? Please identify up to three datasets you utilize most.
Multiple sources of data are often used in our projects, and most of the data are publicly accessible. First, utility data are used to define the geographic coverage of territories, and samples are generated from the areas in proximity to those data. We also use multiple sources of remotely sensed data, including Sentinel-2 and NAIP, as well as elevation data, such as LiDAR and the national hydrology dataset. Recently, we started using commercial satellite imagery (e.g., PlanetScope data).
What types of content knowledge and skills did you use to evaluate, process, and analyze the data you gathered for your project?
Several skills are critical. First, knowledge of remote sensing and associated image processing approaches is necessary. Specifically, image spectral analysis and image preprocessing are helpful. Second, machine learning knowledge, including classification and clustering, is another must-have skill for data preparation and data quality confirmation. Finally, cloud computing (e.g., AWS SageMaker and S3) is a nice-to-have skill because we often handle very large amounts of data.
How did you apply geography and computer science to communicate the results of your project? Do you have a recent product or publication that you could share with us as an example?
We usually use Python and QGIS for data visualization, and we use Python for data compilation. Specifically, we rely on open-source Python libraries (e.g., GDAL) to handle geospatial data and create big data tables. Machine learning classification and regression are applied to estimate outage risks. Then, we create web-based apps to visualize the risk of power outage for our clients.
Here is the link to our publication: https://www.esource.com/001201rt0d/data-science-company-improves-vegetation-management
Reflecting on your work, how does it align with your personal values and your community or civic interests?
I often think about how what I learned during my undergraduate and graduate studies can contribute to a better world, especially in urban areas. My knowledge of remote sensing, GIS, and geocomputation helps me work toward the goal of moving the world to a decarbonized future. I am glad that I can apply my knowledge in the utility industry.
This material is based upon work supported by the National Science Foundation under Grants No. 2031418, 2031407, and 2031380 (Collaborative Research: Encoding Geography – Scaling up an RPP to achieve inclusive geocomputational education). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.