Academic Key Words: Real Estate Finance, Statistical Machine Learning (ML), Hedonic Pricing Model

AsRES International Conference 2025

Together with the CUHK Centre of Real Estate Research, I attended the 2025 AsRES Conference in Melbourne, where I presented my working paper "Measuring More Than Distance: Multi-dimensional Perceived Accessibility of Urban Green Space and Green Premium Transmission Mechanism in Real Estate Prices".


(The details regarding this "Park-Hedonic" paper will be covered in another post.)


I was scheduled to present my paper in the first PhD Colloquium session and was honored to receive the Best Paper Presentation award for that session. The discussants were Prof. Wayne Wan (Monash University) and Dr. Yang Shi (Deakin University). Because my paper relies heavily on machine learning (ML), Prof. Wan provided valuable feedback on the modeling framework. In particular, since my work involves computer vision–based preprocessing of Google Street View (GSV) images, he suggested adopting the 0° / 90° / 180° / 270° viewpoints as the cutting scheme, which connects closely to his earlier REE-published project on using building façade features for real estate valuation. Wayne also pointed out that, as observed in the peer-review process, one of the greatest contributions of ML in the real estate field is its ability to improve predictive accuracy (e.g., by how many percentage points). Therefore, integrating sensory data into an Automated Valuation Model (AVM) and testing both real estate returns and the robustness of valuation accuracy could be the next direction for this project.
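To make the four-direction cutting scheme concrete, here is a minimal sketch that builds one Street View Static API request per heading for a single sampling point. The helper name `gsv_request_urls`, the coordinates, the image size, and the 90° field of view are illustrative assumptions, not the settings used in the paper.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Street View Static API.
GSV_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def gsv_request_urls(lat, lng, api_key, headings=(0, 90, 180, 270),
                     size="640x640", fov=90):
    """Return one request URL per compass heading for a sampling point.

    With fov=90, the four headings together cover the full 360 degrees
    around the point (an assumption chosen for illustration).
    """
    urls = []
    for heading in headings:
        params = {
            "size": size,
            "location": f"{lat},{lng}",
            "heading": heading,   # 0 = north, 90 = east, 180 = south, 270 = west
            "fov": fov,
            "pitch": 0,           # camera level with the street
            "key": api_key,
        }
        urls.append(f"{GSV_ENDPOINT}?{urlencode(params)}")
    return urls


# Hypothetical sampling point; replace the placeholder key before sending requests.
example_urls = gsv_request_urls(22.2830, 114.1550, api_key="YOUR_API_KEY")
```

Each URL returns one image, so a sampling point on a walking path yields four views that can be scored separately or averaged before entering the hedonic model.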


Back to the paper itself: since I did not have access to commercial real estate data, my panel regression was based on residential housing transaction data obtained from Co-Star, which makes endogeneity and the intuition behind the pricing factors a major concern. Essentially, my paper argues that the green premium in housing prices is determined not only by the intrinsic quality of nearby parks, but also by the sensory experience of the pedestrian routes leading to them. Using machine learning techniques, we attempted to proxy how people perceive these walking paths. However, as Prof. Desmond Tsang suggested prior to the conference, an important question remains: why would people choose to "walk to" the park in the first place? Perhaps the initial motivation is not recreational but functional. The sensory quality of the pedestrian paths leading to the nearest transit stations, which people use daily for commuting, may in fact be a more significant determinant of housing prices.


Another concern was raised by PhD student Minghang Yu (NUS, Singapore): both Hong Kong and Singapore have hilly terrain, so 2D pedestrian distances may not fully capture the real walking experience. Incorporating elevation changes along the 3D walking paths could provide a more comprehensive measure.
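As a rough illustration of the difference between 2D and 3D walking distance, the sketch below sums a path's flat (map) length, its slope-adjusted length, and its cumulative climb. It assumes the path is already available as a list of (latitude, longitude, elevation in metres) points; the function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_lengths(points):
    """points: list of (lat, lon, elevation_m) in walking order.

    Returns (flat_m, slope_m, climb_m):
      flat_m  - 2D map distance,
      slope_m - 3D distance along the slope of each segment,
      climb_m - total uphill elevation gain.
    """
    flat = slope = climb = 0.0
    for (la1, lo1, e1), (la2, lo2, e2) in zip(points, points[1:]):
        d = haversine_m(la1, lo1, la2, lo2)
        dz = e2 - e1
        flat += d
        slope += math.hypot(d, dz)   # hypotenuse of horizontal run and rise
        climb += max(dz, 0.0)        # count only uphill segments
    return flat, slope, climb


# Hypothetical three-point path: uphill 30 m, then downhill 20 m.
sample_path = [(22.280, 114.150, 10), (22.281, 114.150, 40), (22.282, 114.150, 20)]
flat_m, slope_m, climb_m = path_lengths(sample_path)
```

Cumulative climb is reported separately because two paths with the same 3D length can feel very different to a pedestrian depending on how much of the route is uphill.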


After the presentation, Prof. Cheng Keat Tang (Nanyang Technological University) approached me and raised the same concern. He suggested that I consider doing research on the Singapore market, since commercial real estate transactions there are highly transparent and commercial data are relatively easy to obtain. Compared with the residential sector, focusing on commercial real estate would be more intuitive. Moreover, in recent years, the "green building" rating system has become a hot topic in Asia, which would make the research more relevant.


I would like to thank my research peers from NUS, the University of Tokyo, Tsinghua University, NTU, the University of Hong Kong, and the Hong Kong Polytechnic University for their constructive advice, especially on improving the machine learning (ML) design, which greatly strengthened my project.


This is our regression result for industrial real estate, which shows that the Street-Wall Index from the building to the nearest street segment is negatively correlated with the building's transaction price. We applied forward, backward, and stepwise regression methods to ensure the robustness of our results.


Our interaction terms test indicates that only truck traffic shows a significant moderation. Therefore, there is no evidence of a transmission or moderated effect based on pedestrian-path-related sensory factors between commercial real estate and their nearest parks.
Robustness test using alternative stepwise regression procedures and ML gradient boosting

Activities Photos:

Photo with Prof. Desmond Tsang (CUHK) and my colleague Julian Chen (PhD, CUHK)
Prof. Desmond Tsang presenting the "Rotten-tail" buildings project on China's real estate downturn.
CUHK group picture in front of Melbourne Business School, together with Prof. Yang Shi (Deakin University) and Prof. Wayne Wan (Monash University)

I have had extensive discussions with Wayne regarding machine learning and real estate research. Over the past five months, I have been conducting research to address the "missing data" problem that frequently occurs in the real estate field. I was also considering whether bootstrapping, or some form of supervised bootstrapping run on a computer cluster, could serve as a potential solution. The type of missing data I am referring to is not the case where, for example, 8,000 out of 10,000 residential properties are recorded in Co-Star or CompStak and we use hedonic methods to impute or proxy the remaining 2,000. Rather, in the context of machine learning, the question is how much missing data can be tolerated when applying deep learning models, that is, how much can be missing without compromising the robustness of the model. Although deep learning is often treated as a black box, I believe that the sample size threshold remains a worthwhile topic for investigation.
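One simple way to frame the sample-size question is a retention curve: repeatedly subsample the training set at different fractions, refit, and track out-of-sample error. The sketch below is only an illustration under strong assumptions. It uses a one-variable least-squares line as a stand-in for a deep learning model, and the data are synthetic (price as a noisy linear function of floor area); all function names are hypothetical.

```python
import random
import statistics


def make_synthetic(n, seed):
    """Synthetic (floor_area, price) pairs: price = 5 + 0.1 * area + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(20, 200) for _ in range(n)]
    ys = [5 + 0.1 * x + rng.gauss(0, 2) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(model, xs, ys):
    """Root mean squared prediction error of a fitted line."""
    slope, intercept = model
    return (sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def retention_curve(xs, ys, test_xs, test_ys, fractions, repeats=50, seed=0):
    """Average test RMSE when only a fraction of the training data is kept."""
    rng = random.Random(seed)
    n = len(xs)
    curve = {}
    for frac in fractions:
        k = max(2, int(frac * n))           # at least two points to fit a line
        errors = []
        for _ in range(repeats):
            idx = rng.sample(range(n), k)   # random subsample without replacement
            model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
            errors.append(rmse(model, test_xs, test_ys))
        curve[frac] = statistics.fmean(errors)
    return curve


train_x, train_y = make_synthetic(500, seed=42)
test_x, test_y = make_synthetic(100, seed=7)
curve = retention_curve(train_x, train_y, test_x, test_y, fractions=(1.0, 0.5, 0.1, 0.02))
for frac, err in curve.items():
    print(f"keep {frac:>4.0%} of training data -> mean test RMSE {err:.3f}")
```

The fraction at which the error starts to rise sharply is a candidate "yardstick". For a real deep learning model, the same loop would apply with the line fit replaced by the model's training routine, at a much higher compute cost.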


Apart from my interest in identifying an underlying "sample size yardstick" to justify the application of deep learning in real estate, Wayne emphasized that ML research should above all be outcome-driven, namely, it should show how much better the predictive performance is relative to conventional panel regression. Moreover, in many ML labs, datasets with millions of properties are used for training, and GPU resources are both critical and limited. In many postdoctoral-level studies, the main constraint is often not insufficient data, but rather insufficient GPU capacity to process the available data.
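The outcome-driven comparison can be reported as a gain in percentage points of prediction error on the same hold-out set. Below is a minimal sketch assuming mean absolute percentage error (MAPE) as the accuracy metric; the metric choice, the function names, and the toy numbers are assumptions for illustration, not results from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def improvement_pp(actual, baseline_pred, ml_pred):
    """Percentage-point reduction in MAPE of the ML model versus the baseline.

    Positive values mean the ML model predicts more accurately.
    """
    return mape(actual, baseline_pred) - mape(actual, ml_pred)


# Toy hold-out prices and two sets of predictions (made-up numbers).
prices = [100.0, 200.0, 400.0]
regression_pred = [110.0, 180.0, 440.0]   # stand-in for a panel regression
ml_pred = [105.0, 190.0, 420.0]           # stand-in for an ML model
gain = improvement_pp(prices, regression_pred, ml_pred)
print(f"ML improves MAPE by {gain:.1f} percentage points")
```

Both models must be evaluated on the same hold-out transactions for the comparison to be meaningful.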



Some fun stories from LSE PhDs about their "Criminology Economics" research experiences, haha.

Melbourne's Sunset
Nice city view of Melbourne, SEE YOU AGAIN.

 
 
 



I’m Thomas Chan. 

And I do Real Estate and Machine Learning (ML).

陳裕傑 
